The Most Demanding Concept in Data Engineering — Slowly Changing Dimensions (SCD)

SCD mastery fuels the Data Engineer’s success? Let’s Investigate!!

Kamireddy Mahendra
Towards AWS

--

A proficient Data Engineer possesses expertise in optimizing data storage methods to ensure data efficiency and enhance data utility for further applications.

Image Designed by author Kamireddy Mahendra.

As Data Engineer, we mostly work on large amounts of data to store process, and retrieve either historical or real-time data.

I recently started my services as a freelancer. You can contact me via my Upwork page or hire me for any assistance in content writing, Personal Training, and any data analytics-related work.

Click on the About Section to see more details about my freelance services.

Let’s dive into in detail about SCD → Slowly Changing Dimensions.

What is meant by slowly changing?

In this context Slowly Changing means the Changes happen in non-uniform intervals of time.

What are Dimensions?

Dimensions are nothing but the attributes or columns in any entity table.

Slowly Changing Dimensions(SCD)

Slowly Changing Dimensions (SCD) is nothing but the handling of dimensions that change slowly and at non-uniform intervals of time is known as slowly changing dimensions.

There are different ways in which we can store data in the data warehouse(SCD1,2,3,4,5,6). The most used methods are of 1,2 and 3 types of SCD. All these are based on the requirements or different use cases.

Significance:

  • To track Historical Data in a Data Warehouse, we generally use these techniques.
  • To ensure the accuracy and integrity of historical data, its analysis, and reporting we use these techniques.

In this article, we are going to look into the complexity of SCDs and explore their types.

Types of Slowly Changing Dimensions

There are several types of SCD, But three of them are the most used types of Slowly Changing Dimensions:

SCD1(Over Written with updated data)

  • In this type, changes to the attributes of a dimension table are not tracked. Instead, the old values are simply overwritten by the new ones.
  • This approach maintains simplicity, but it lacks historical data preservation and may lead to the loss of valuable information.

SCD2(Historical + updated Data)

  • This method tracks historical data by creating multiple records for a given natural key in the dimensional tables with separate surrogate keys with different version numbers.
  • SCD2 involves creating a new record whenever there is a change in any attribute of a dimension. This ensures historical data integrity by retaining old values alongside new values.

SCD3(Historical + updated data with new columns)

  • SCD3 strikes a balance between simplicity and historical tracking. Here, only certain attributes of a dimension are maintained as historical, while others are updated in place like in SCD1.
  • This approach reduces storage overhead compared to SCD2 but sacrifices some level of historical granularity.

There are up to 6 types of SCDs available, but these 3 are the most used SCD Types.

By understanding the types of SCDs, and adopting the best practices, data engineers can effectively manage the evolution of dimensional data and empower data-driven decision-making processes.

I hope this article helps you understand the significance of SCD techniques in managing data and their implementation in data warehousing.

The Essential PySpark Cheat Sheet for Data Engineers.

Most Commonly Asked Apache Spark Questions in Data Engineer Interviews.

Python Coding Questions for Data Engineer Interview Part-I (Easy Level)

I hope you will Bring your hands together to create resounding claps, to show your support and encouragement for me to share even more valuable content in the future.

Follow me and subscribe to my newsletter to catch any updates from me instantly.

Thank you:)

--

--

Data Engineer - Analyst (Up-Coming Data Scientist) and Content Creator, Freelance Tutor with Affordable rate (SQL, Python, Power BI, Mathematics, and Aptitude).