SQL Server Slowly Changing Dimensions

Pre-requisite: Understand what a dimension in a datawarehouse means

Nothing in life is for permanent. The same applies to the underlying data at your data warehouse or data marts. In the following text I wish to highlight one of the brilliant aspects of data upserts (INSERT and/or UPDATE). The information provided will elaborate more about the Slowly Changing Dimension or commonly attributed as SCD and its types.

To begin with, there exist 6 different types of SCD implementation that are possible on any enterprise DW implementation. Though, it is rare to experience a SCD type beyond type 3.

Let’s take this in parts, so as to not burden you with lots and lots of information. In the part 1 of the series I will take Type 0, Type 1, Type 2 and Type 3.

In the follow up section, I plan to take Type 4, Type 6 and lastly the Combining type.

Type Description
Type 0 Fixed. No history preservation
Type 1 Overwrite old data with new data
Type 2 Create new record by tuple versioning
Type 3 Maintain change data in a separate column
Type 4 Maintain change in a separate history table
Type 6 Concept derived using Type 1,2 and 3
Combining Type A combination of transformation on the underlying dimension data

Type 0

The dimension data is assumed to be fixed i.e. the data will never change. Any change data will be ignored and will not be validated.

Specific scenario:

Earlier:

Natural Key Sales Person ID First Name Last Name State
S123456 1234 Raunak Jhawar MH

Now:

Natural Key Sales Person ID First Name Last Name State
S123456 1234 Raunak Jhawar UP

If the ‘State’ column is following Type 0 SCD transformation, the value of State will continue to remain as ‘MH’, irrespective of its new value i.e. ‘UP’

In lighter terms, any updates are ignored and the data will new be versioned or updated.

Expert comment:  This approach for data management is not advisable. The data in the warehouse is bound to change over a period of time. The underlying DW implementation must be designed in a manner to accommodate any/all changes.

Type 1

The dimension data is fixed. But any or all changes will be validated and not ignored as in Type 0.

Specific scenario:

Earlier:

Natural Key Sales Person ID First Name Last Name State
S123456 1234 Raunak Jhawar MH

Now:

Natural Key Sales Person ID First Name Last Name State
S123456 1234 Raunak Jhawar UP

If the ‘State’ column is following Type 1 SCD transformation, the value of State will no longer continue to remain as ‘MH’, its new value will now be ‘UP’

In lighter terms, any updates are not ignored and the data will new be updated.

Expert comment:  This approach for data management is better than Type 0, but cannot be graded as the best approach. If you see the tabular representation, earlier the state had a value of ‘MH’ which was updated to ‘UP’. Once the column is updated it will appear to the user that the current value is the only value the data represented, which is not the case.

Type 2

Pre-requisite: Understand what a surrogate key means and its practical usability.

Surrogate Key: A random meaning-less integer value derived from the ‘Natural Key’. This key is used as a reference to make table joins between the dimension table and the fact table(s) of the OLAP system.

The dimension data is not updated as in Type 1, which forced to lose all history of the underlying data. Here, a new dimension row is inserted into the dimension table to facilitate the change. The natural key remains the same. A new surrogate key is created. An effective timestamp value is updated also, at times implementation can use a boolean flag value to indicate status of the row in study. The example which follows will help you to understand the Type 2 more clearly.

Specific scenario:

Earlier:

Natural Key Person ID FName LName State Surrogate Key From Date To Date Current
S123456 1234 Raunak Jhawar MH 23682164 1-1-1990 - Y

Now:

Natural Key Person ID FName LName State Surrogate Key From Date To Date Current
S123456 1234 Raunak Jhawar UP 27384687 24-1-1990 23-2-1990 Y
S123456 1234 Raunak Jhawar MH 23682164 1-1-1990 23-1-1990 N

Pros:

  • A thorough history of all changes made to the dimension entity is maintained in the dimension table with appropriate effective date and timestamps. 

Cons:

  • The dimension table can grow very large in size, for every change encountered; a new data row is created. Though this can be gracefully handled using the SCD Type 4 – History table method which will be shortly introduced to you in the following series.

Type 3

So far, you have learnt about the major SCD types and methods you can handle the upserts on them. Onwards, the topics are fairly advanced and require very specific circumstances to implement them. In most of the real life EDW implementations, you will rarely go beyond Type 2. So, I must congratulate you for learning a very important aspect of any DW implementation.

In Type 3 transformations, the good part is that you can track history change and save yourself by not creating all the additional data rows on your dimension table(s).

Now about the part, which I would call ‘not so good’ is that you will never be able to track all the changes made to the dimension entity. So let us understand more about the very unique Type 3 SCD technique.

Specific scenario:

Earlier:

Natural Key Person ID FName LName Original State Surrogate Key Current State Effective Date
S123456 1234 Raunak Jhawar MH 23682164 MH 1-1-1900

Now:

Natural Key Person ID FName LName Original State Surrogate Key Current State Effective Date
S123456 1234 Raunak Jhawar MH 23682164 UP 23-1-1900

Spot the change. The change is that no new data row is created; instead we have 2 additional columns here to rack the change. The ‘current state’ and ‘effective date’ column value will suffice the user to track the change.

Expert Comment: Type 3 is rarely encountered. You may implement when you are sure that the column will not change frequently.

Tune in for the remaining SCD Types in the follow up chapter. 

 

Regards

Raunak Jhawar

Like us on FaceBook Follow us on Twitter | Join the fastest growing SQL Server group on FaceBook

Follow me on TwitterFollow me on FaceBook

Learn SQL & related technologies from top-notch experts. 3 days. 30+ speakers. 70+ sessions. Join Asia’s first SQL Conference.

SQL Server Replication Configuration – Part 1
SQL Server and SSIS Data Types