9/8/2011 7:11:08 AM
Raunak Jhawar -
Slowly Changing Dimension
Pre-requisite: Understand what a dimension in a datawarehouse means
Nothing in life is for permanent. The same applies to the underlying data at your data warehouse or data marts. In the following text I wish to highlight one of the brilliant aspects of data upserts (INSERT and/or UPDATE). The information provided will elaborate more about the Slowly Changing Dimension or commonly attributed as SCD and its types.
To begin with, there exist 6 different types of SCD implementation that are possible on any enterprise DW implementation. Though, it is rare to experience a SCD type beyond type 3.
Let’s take this in parts, so as to not burden you with lots and lots of information. In the part 1 of the series I will take Type 0, Type 1, Type 2 and Type 3.
In the follow up section, I plan to take Type 4, Type 6 and lastly the Combining type.
Fixed. No history preservation
Overwrite old data with new data
Create new record by tuple versioning
Maintain change data in a separate column
Maintain change in a separate history table
Concept derived using Type 1,2 and 3
A combination of transformation on the underlying dimension data
The dimension data is assumed to be fixed i.e. the data will never change. Any change data will be ignored and will not be validated.
Sales Person ID
If the ‘State’ column is following Type 0 SCD transformation, the value of State will continue to remain as ‘MH’, irrespective of its new value i.e. ‘UP’
In lighter terms, any updates are ignored and the data will new be versioned or updated.
Expert comment: This approach for data management is not advisable. The data in the warehouse is bound to change over a period of time. The underlying DW implementation must be designed in a manner to accommodate any/all changes.
The dimension data is fixed. But any or all changes will be validated and not ignored as in Type 0.
If the ‘State’ column is following Type 1 SCD transformation, the value of State will no longer continue to remain as ‘MH’, its new value will now be ‘UP’
In lighter terms, any updates are not ignored and the data will new be updated.
Expert comment: This approach for data management is better than Type 0, but cannot be graded as the best approach. If you see the tabular representation, earlier the state had a value of ‘MH’ which was updated to ‘UP’. Once the column is updated it will appear to the user that the current value is the only value the data represented, which is not the case.
Pre-requisite: Understand what a surrogate key means and its practical usability.
Surrogate Key: A random meaning-less integer value derived from the ‘Natural Key’. This key is used as a reference to make table joins between the dimension table and the fact table(s) of the OLAP system.
The dimension data is not updated as in Type 1, which forced to lose all history of the underlying data. Here, a new dimension row is inserted into the dimension table to facilitate the change. The natural key remains the same. A new surrogate key is created. An effective timestamp value is updated also, at times implementation can use a boolean flag value to indicate status of the row in study. The example which follows will help you to understand the Type 2 more clearly.
So far, you have learnt about the major SCD types and methods you can handle the upserts on them. Onwards, the topics are fairly advanced and require very specific circumstances to implement them. In most of the real life EDW implementations, you will rarely go beyond Type 2. So, I must congratulate you for learning a very important aspect of any DW implementation.
In Type 3 transformations, the good part is that you can track history change and save yourself by not creating all the additional data rows on your dimension table(s).
Now about the part, which I would call ‘not so good’ is that you will never be able to track all the changes made to the dimension entity. So let us understand more about the very unique Type 3 SCD technique.
Spot the change. The change is that no new data row is created; instead we have 2 additional columns here to rack the change. The ‘current state’ and ‘effective date’ column value will suffice the user to track the change.
Expert Comment: Type 3 is rarely encountered. You may implement when you are sure that the column will not change frequently.
Tune in for the remaining SCD Types in the follow up chapter.
Raunak Jhawar (Member since: 7/12/2011 6:42:24 AM)
Specilaist for Intensive Data Computing.
View Raunak Jhawar 's profile
Your SCD type 0 is wrong
Rest article is in sense
Right. Nice catch! Type mistake, though
Leave a comment