SQL Server Slowly Changing Dimensions

Pre-requisite: Understand what a dimension in a datawarehouse means

Nothing in life is for permanent. The same applies to the underlying data at your data warehouse or data marts. In the following text I wish to highlight one of the brilliant aspects of data upserts (INSERT and/or UPDATE). The information provided will elaborate more about the Slowly Changing Dimension or commonly attributed as SCD and its types.

To begin with, there exist 6 different types of SCD implementation that are possible on any enterprise DW implementation. Though, it is rare to experience a SCD type beyond type 3.

Let’s take this in parts, so as to not burden you with lots and lots of information. In the part 1 of the series I will take Type 0, Type 1, Type 2 and Type 3.

In the follow up section, I plan to take Type 4, Type 6 and lastly the Combining type.

TypeDescription
Type 0Fixed. No history preservation
Type 1Overwrite old data with new data
Type 2Create new record by tuple versioning
Type 3Maintain change data in a separate column
Type 4Maintain change in a separate history table
Type 6Concept derived using Type 1,2 and 3
Combining TypeA combination of transformation on the underlying dimension data

Type 0

The dimension data is assumed to be fixed i.e. the data will never change. Any change data will be ignored and will not be validated.

Specific scenario:

Earlier:

Natural KeySales Person IDFirst NameLast NameState
S1234561234RaunakJhawarMH

Now:

Natural KeySales Person IDFirst NameLast NameState
S1234561234RaunakJhawarUP

If the ‘State’ column is following Type 0 SCD transformation, the value of State will continue to remain as ‘MH’, irrespective of its new value i.e. ‘UP’

In lighter terms, any updates are ignored and the data will new be versioned or updated.

Expert comment:  This approach for data management is not advisable. The data in the warehouse is bound to change over a period of time. The underlying DW implementation must be designed in a manner to accommodate any/all changes.

Type 1

The dimension data is fixed. But any or all changes will be validated and not ignored as in Type 0.

Specific scenario:

Earlier:

Natural KeySales Person IDFirst NameLast NameState
S1234561234RaunakJhawarMH

Now:

Natural KeySales Person IDFirst NameLast NameState
S1234561234RaunakJhawarUP

If the ‘State’ column is following Type 1 SCD transformation, the value of State will no longer continue to remain as ‘MH’, its new value will now be ‘UP’

In lighter terms, any updates are not ignored and the data will new be updated.

Expert comment:  This approach for data management is better than Type 0, but cannot be graded as the best approach. If you see the tabular representation, earlier the state had a value of ‘MH’ which was updated to ‘UP’. Once the column is updated it will appear to the user that the current value is the only value the data represented, which is not the case.

Type 2

Pre-requisite: Understand what a surrogate key means and its practical usability.

Surrogate Key: A random meaning-less integer value derived from the ‘Natural Key’. This key is used as a reference to make table joins between the dimension table and the fact table(s) of the OLAP system.

The dimension data is not updated as in Type 1, which forced to lose all history of the underlying data. Here, a new dimension row is inserted into the dimension table to facilitate the change. The natural key remains the same. A new surrogate key is created. An effective timestamp value is updated also, at times implementation can use a boolean flag value to indicate status of the row in study. The example which follows will help you to understand the Type 2 more clearly.

Specific scenario:

Earlier:

Natural KeyPerson IDFNameLNameStateSurrogate KeyFrom DateTo DateCurrent
S1234561234RaunakJhawarMH236821641-1-1990-Y

Now:

Natural KeyPerson IDFNameLNameStateSurrogate KeyFrom DateTo DateCurrent
S1234561234RaunakJhawarUP2738468724-1-199023-2-1990Y
S1234561234RaunakJhawarMH236821641-1-199023-1-1990N

Pros:

  • A thorough history of all changes made to the dimension entity is maintained in the dimension table with appropriate effective date and timestamps. 

Cons:

  • The dimension table can grow very large in size, for every change encountered; a new data row is created. Though this can be gracefully handled using the SCD Type 4 – History table method which will be shortly introduced to you in the following series.

Type 3

So far, you have learnt about the major SCD types and methods you can handle the upserts on them. Onwards, the topics are fairly advanced and require very specific circumstances to implement them. In most of the real life EDW implementations, you will rarely go beyond Type 2. So, I must congratulate you for learning a very important aspect of any DW implementation.

In Type 3 transformations, the good part is that you can track history change and save yourself by not creating all the additional data rows on your dimension table(s).

Now about the part, which I would call ‘not so good’ is that you will never be able to track all the changes made to the dimension entity. So let us understand more about the very unique Type 3 SCD technique.

Specific scenario:

Earlier:

Natural KeyPerson IDFNameLNameOriginal StateSurrogate KeyCurrent StateEffective Date
S1234561234RaunakJhawarMH23682164MH1-1-1900

Now:

Natural KeyPerson IDFNameLNameOriginal StateSurrogate KeyCurrent StateEffective Date
S1234561234RaunakJhawarMH23682164UP23-1-1900

Spot the change. The change is that no new data row is created; instead we have 2 additional columns here to rack the change. The ‘current state’ and ‘effective date’ column value will suffice the user to track the change.

Expert Comment: Type 3 is rarely encountered. You may implement when you are sure that the column will not change frequently.

Tune in for the remaining SCD Types in the follow up chapter. 

 

Regards

Raunak Jhawar

Like us on FaceBook Follow us on Twitter | Join the fastest growing SQL Server group on FaceBook

Follow me on TwitterFollow me on FaceBook

SQL Server Replication Configuration – Part 1
SQL Server and SSIS Data Types