Can We Apply the DRY Principle to Business Intelligence?

How can we apply a software engineering principle to mitigate the problem of duplication?
“Don’t Repeat Yourself” is beneficial, not only in software engineering

Have you ever encountered the same queries being applied over and over again in various dashboards? Or the same KPIs being calculated in nearly every single report? If your answer is yes, you are not alone. It’s common among business users to simply copy-paste the same queries, data definitions, and KPI calculations. But there is a better way.
The dangers of knowledge duplication

Most software engineers are taught from day one the “DRY” principle: Don’t Repeat Yourself. This principle states that:

“Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.” (The Pragmatic Programmer)

Even though most data and software engineers know how to write DRY code, many data analysts and business users are unaware of the dangers of copy-pasting their queries and data definitions. As a result, we often end up with Business Intelligence reports that contain conflicting representations of KPIs and business logic. To make the argument more concrete, here are some of the consequences of code duplication in analytical workloads:
If something in your code needs to change, you need to make that change in every place the code was copied to.

You risk conflicting definitions if one “copy” gets modified while another one does not.

In the end, nobody knows what the single source of truth is, because each time the same KPI is used, it is calculated or defined differently.

If you add documentation to a duplicated query, piece of code, or data definition in one place, that documentation is missing in the other places, which leads to further inconsistencies.

I remember when my team had a discussion about the best place for a data dictionary. One of the suggestions was to use Confluence (or a shared spreadsheet) for that purpose. What’s wrong with this suggestion? It violates the DRY principle, as it would inherently cause duplication: one definition in the database and one in the external documentation. Manually created data dictionaries never stay up to date; don’t believe anyone who claims otherwise.
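To make the single-source-of-truth idea concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. Everything in it is an assumption made for illustration: the `orders` table, the `net_revenue_by_region` view, the sample rows, and the two reporting functions are hypothetical and do not come from any particular BI tool. The point is only that the KPI logic (which orders count as revenue) lives in one database view, and every report reads from that view instead of copying the query.

```python
import sqlite3

# Hypothetical sample data; table, view, and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        region   TEXT,
        amount   REAL,
        status   TEXT
    );
    INSERT INTO orders (region, amount, status) VALUES
        ('EU', 100.0, 'completed'),
        ('EU',  50.0, 'cancelled'),
        ('US', 200.0, 'completed'),
        ('US', 300.0, 'completed');

    -- The KPI "net revenue" is defined exactly once, in this view.
    CREATE VIEW net_revenue_by_region AS
    SELECT region, SUM(amount) AS net_revenue
    FROM orders
    WHERE status = 'completed'
    GROUP BY region;
""")


def sales_dashboard(conn):
    """Revenue per region, read from the shared view."""
    rows = conn.execute("SELECT region, net_revenue FROM net_revenue_by_region")
    return dict(rows)


def executive_report(conn):
    """Company-wide total, derived from the same view rather than a copied query."""
    (total,) = conn.execute(
        "SELECT SUM(net_revenue) FROM net_revenue_by_region"
    ).fetchone()
    return total


print(sales_dashboard(conn))
print(executive_report(conn))
```

If the business later decides that refunded orders should also be excluded, only the view definition changes, and both reports pick up the new logic without any copy-pasted query to hunt down.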