Stars and Relations
I’ve been talking quite a lot about databases lately. So it should kinda give away a hint that I find this subject rather important… :’)
Because of that, one more post about databases. But this time something less technical but more theoretical. No no! Don’t run off! It will be quite okay.
There are different types of databases and setups of databases. Allow me to discuss the 2 commonly used.
The first one is the relational database. All the tables defined in this database are, what is called, normalized. This is the process to minimize the data’s redundancy. You want to store as less duplicated data as possible.
(For example you want to avoid storing names, addresses, etc, more than once in the database).
By doing so the available data will be split up in to multiple, smaller tables which are in some way linked to each other. This link (or join) is called the relation.
If you would draw a small relationtal database it would look as following (and once again, yes I leeched this of the internet, sorry!):
As you can see not all tables are linked to all tables. There are linked by an entire path of joins. This, by the way, looks very hard to read but it’s quite simple really. Once you know what the primary key and foreign key of the tables are it’s very easy.
Which brings me to the following:
Primary Key: a column which contains a unique value for the entire table.
Foreign Key: this is for example the primary key of table A which is used as a reference key in table B (remember: to be able to join). This key is unique in table A (because it’s the primary key) but doesn’t have to be unique in table B.
For example if you would like link department info to a person, you will most likely create a department_ID field in the person table.
Good, now, the relational model we were discussing. This type of model is typically used for OLTP (Online Transaction Processing) systems. These systems are built to insert data very quickly. CRM (Siebel, SAP, Salesforce, …) applications for example make use of this.
But ofcourse you do not always want to input data. It may also be handy to be able to extract data, in order to create a report for example.
This is moreover done in a datawarehouse. And a datawarehouse model looks quite different than a RDB.
This type of schema is called a Star schema… For obvious reasons :’)…
In this type of schema there are 2 main types of tables. Facts (per star you have 1 fact, the table you see in the middel of the schema) and Dimensions (all the other tables around the fact).
The difference between these tables is following… A dimension hold attribute information about a certain subject. Name, first name, address, brand, product, supplier, … .
You will never find anything ‘calculated’ here. That is why the fact is there. This table is the “highway” between dimensions. Every possible combination between the dimension linked to the fact is defined here. That is why facts are most of the time very large tables (don’t be surprised if you find fact tables with 20 million records).
If you want to join 2 dimensions together you will always need to make use of the fact table to link them.
A fact table can also contain some attributes (for performance reasons) but most of the time a fact table only contains calculated or aggregated information. Sales, # customers, # calls, # products, … .
This table always contains information about… surprise surprise … a fact. This can be as good as anything.
1 computer can be a fact, 100 boxes can be a fact as well. It depends on what you are planning to do or want to achief.
The data in a star schema is loaded the OLTP system. By the process of ETL (Extract Transform Load) the data is extracted from the source database (OLTP), transformed to a format you can use in the datawarehouse and loaded in the dimensions and facts.
This is called an OLAP database (Online Analytical Processing) and is build to extract data fast.
It seems that this is quite enough for today :)