Slowly Changing Dimension (SCD) in Power BI, Part 2, Implementing SCD 1

I explained what SCD means in a Business Intelligence solution in my previous post. We also discussed that while we do not expect to handle SCD2 in a Power BI implementation, we can handle scenarios similar to SCD1. In this post, I explain how to do so.

Scenario

We have a retail company selling products. The company releases the list of products in Excel format, including list price and dealer price, every year. The product list is released on the first day of July when the financial year starts. We have to implement a Power BI solution that keeps the latest product data to analyse the sales transactions. The following image shows the Product list for 2013:

Products List 2013

So each year, we receive an Excel file similar to the one in the image above. The files are stored on a SharePoint Online site.

Scenario Explained

As the previous post explains, SCD1 always keeps the current data by overwriting the old data with the new data. So an ETL process reads the data from the source, identifies the existing data in the destination table, inserts the new rows into the destination, updates the existing rows, and deletes the removed rows.
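To make the insert, update and delete steps concrete, here is a minimal Python sketch of an SCD1-style synchronisation. It only illustrates the logic and is not how the Power BI solution in this post is built. The function name `apply_scd1`, the `ListPrice` column name and the sample product numbers are all made up for the example.

```python
def apply_scd1(destination, source, key):
    """Return the destination rows after an SCD 1 style synchronisation with source."""
    source_by_key = {row[key]: row for row in source}
    destination_keys = {row[key] for row in destination}

    result = []
    for row in destination:
        if row[key] in source_by_key:
            # Existing row: overwrite it with the current source values (update).
            result.append(dict(source_by_key[row[key]]))
        # Otherwise the row was removed from the source, so it is dropped (delete).

    for row in source:
        if row[key] not in destination_keys:
            # Row not seen before: add it (insert).
            result.append(dict(row))
    return result


# Hypothetical sample data, not taken from the real product list.
destination = [
    {"ProductNumber": "A-100", "ListPrice": 10.0},
    {"ProductNumber": "B-200", "ListPrice": 20.0},
]
source = [
    {"ProductNumber": "A-100", "ListPrice": 12.0},  # price changed
    {"ProductNumber": "C-300", "ListPrice": 30.0},  # new product
]

current = apply_scd1(destination, source, key="ProductNumber")
print(current)
```

In this sample, A-100 is updated to the new price, C-300 is inserted, and B-200 is deleted because it no longer exists in the source. No history of the old price is kept, which is the defining property of SCD1.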

Here is why our scenario is similar to SCD1, with one exception:

  • We do not update the data in the Excel files, and we do not create an ETL process that reads the Excel files, identifies the changes and applies them to an intermediary Excel file.
  • Instead, we read the data from the source Excel files, keep the latest data while filtering out the old rows, and load the result into the data model.

As you see, while we are taking a very different implementation approach, the results are very similar with an exception: we do not delete any rows.

Implementation

Here is what we should do to achieve the goal:

  • We get the data in Power Query Editor using the SharePoint Folder connector
  • We combine the files
  • We use the ProductNumber column to identify the duplicated products
  • We use the Reporting Date column to identify the latest dates
  • We only keep the latest rows
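The last three steps boil down to "for each ProductNumber, keep the row with the latest Reporting Date". The following Python sketch shows that logic on a small combined table. It is an illustration only, since the post implements these steps in Power Query Editor. The function name `keep_latest_rows`, the `ListPrice` column name and the sample rows are assumptions made for the example.

```python
from datetime import date


def keep_latest_rows(rows, key="ProductNumber", date_column="Reporting Date"):
    """Keep, for every key value, only the row with the latest date."""
    latest = {}
    for row in rows:
        k = row[key]
        # Keep the row if the key is new or this row is more recent.
        if k not in latest or row[date_column] > latest[k][date_column]:
            latest[k] = row
    return list(latest.values())


# Hypothetical rows, as if the 2013 and 2014 files had already been combined.
combined = [
    {"ProductNumber": "A-100", "ListPrice": 10.0, "Reporting Date": date(2013, 7, 1)},
    {"ProductNumber": "B-200", "ListPrice": 20.0, "Reporting Date": date(2013, 7, 1)},
    {"ProductNumber": "A-100", "ListPrice": 12.0, "Reporting Date": date(2014, 7, 1)},
    {"ProductNumber": "C-300", "ListPrice": 30.0, "Reporting Date": date(2014, 7, 1)},
]

products = keep_latest_rows(combined)
for product in products:
    print(product)
```

Note that B-200 appears only in the 2013 rows and still survives. This matches the exception described earlier: unlike a full SCD1 process, this approach never deletes rows.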

Getting Data from SharePoint Online Folder

As we get the data from multiple files stored on SharePoint Online, we have to use the SharePoint Folder connector. Follow these steps:

  1. Log in to SharePoint Online, navigate to the site holding the Product list Excel files, and copy the site URL from the browser
Getting SharePoint Online Site URL
  2. From Get Data in Power BI Desktop, select the SharePoint Folder connector
  3. Click Connect
Connecting to SharePoint Online Folder from Power BI
  4. Paste the site URL copied in step 1
  5. Click OK
Connecting to SharePoint Online Folder from Power BI using the SharePoint Folder connector
  6. Click Transform Data
Transforming data in Power Query Editor

Slowly Changing Dimension (SCD) in Power BI, Part 1, Introduction to SCD

Slowly changing dimension (SCD) is a data warehousing concept coined by the amazing Ralph Kimball. The SCD concept deals with moving a specific set of data from one state to another. Imagine a human resources (HR) system having an Employee table. As the following image shows, Stephen Jiang is a Sales Manager having ten sales representatives in his team:

Image 1: Stephen Jiang is the sales manager of a team of 10 sales representatives

Today, Stephen Jiang was promoted to the Vice President of Sales role, so his team has grown in size from 10 to 17. Stephen is the same person, but his role has now changed, as shown in the following image:

Image 2: Stephen’s team after he was promoted to Vice President of Sales

Another example is when a customer’s address changes in a sales system. Again, the customer is the same, but their address is now different. From a data warehousing standpoint, we have different options to deal with the data depending on the business requirements, leading us to different types of SCDs. It is crucial to note that the data changes in the transactional source systems (in our examples, the HR system or a sales system). We move and transform the data from the transactional systems via ETL (Extract, Transform, and Load) processes and land it in a data warehouse, where the SCD concept kicks in. SCD is about how changes in the source systems are reflected in the data warehouse. These kinds of changes in the source system do not happen very often, hence the term slowly changing. Many SCD types have been developed over the years. Covering all of them is out of the scope of this post, but for your reference, we cover the first three types as follows.

SCD type zero (SCD 0)

With this type of SCD, we ignore all changes in a dimension. So, when a person’s residential address changes in the source system (an HR system, in our example), we do not change the landing dimension in our data warehouse. In other words, we ignore the changes within the data source. SCD 0 is also referred to as fixed dimensions.
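As a rough illustration, the following Python sketch shows the SCD 0 behaviour: existing dimension rows are never overwritten, whatever the source says. The function name `apply_scd0`, the column names and the sample employee row are hypothetical. Appending rows for keys that are not yet in the dimension is an assumption of this sketch, not something stated above.

```python
def apply_scd0(destination, source, key):
    """Return the dimension after an SCD 0 style load: existing rows stay as they are."""
    existing_keys = {row[key] for row in destination}
    # Existing rows are copied unchanged; source changes to them are ignored.
    result = [dict(row) for row in destination]
    for row in source:
        if row[key] not in existing_keys:
            # Assumption of this sketch: brand-new members are still added.
            result.append(dict(row))
    return result


# Hypothetical sample data.
dimension = [{"EmployeeKey": 1, "Address": "1 Old Street"}]
hr_system = [{"EmployeeKey": 1, "Address": "99 New Avenue"}]  # address changed

loaded = apply_scd0(dimension, hr_system, key="EmployeeKey")
print(loaded)
```

The address in the dimension stays at its original value even though the HR system now holds a new one.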
