Inside VertiPaq in Power BI — Compress for success!

Have you ever wondered what makes Power BI so fast and powerful when it comes to performance? So powerful, that it performs complex calculations over millions of rows in a blink of an eye.

In this series of articles, we will dig deep to discover what is “under the hood” of Power BI, how your data is being stored, compressed, queried, and finally, brought back to your report. Once you finish reading, I hope that you will get a better understanding of the hard work happening in the background and appreciate the importance of creating an optimal data model in order to get maximum performance from the Power BI engine.

As you might recall, in the previous article we scratched the surface of VertiPaq, a powerful storage engine, which is “responsible” for the blazing-fast performance of most of your Power BI reports (whenever you are using Import mode or Composite model).

3, 2, 1…Fasten your seatbelts!

One of the key characteristics of the VertiPaq is that it’s a columnar database. We learned that columnar databases store data optimized for vertical scanning, which means that every column has its own structure and is physically separated from other columns.

Image for post

That fact enables VertiPaq to apply different types of compression to each of the columns independently, choosing the optimal compression algorithm based on the values in that specific column.

Compression is being achieved by encoding the values within the column. But, before we dive deeper into a detailed overview of encoding techniques, just keep in mind that this architecture is not exclusively related to Power BI — in the background is a Tabular model, which is also “under the hood” of SSAS Tabular and Excel Power Pivot.

Value Encoding

This is the most desirable value encoding type since it works exclusively with integers and, therefore, require less memory than, for example, when working with text values.

How does this look in reality? Let’s say we have a column containing a number of phone calls per day, and the value in this column varies from 4.000 to 5.000. What the VertiPaq would do, is to find the minimum value in this range (which is 4.000) as a starting point, then calculate the difference between this value and all the other values in the column, storing this difference as a new value.

#power-bi #data-science #towards-data-science #data #data-modeling #data analytic

3, 2, 1…Fasten your seatbelts!

Value Encoding

towardsdatascience.com

Inside VertiPaq in Power BI — Compress for success!