Data engineering is one of the fastest-growing professions of the century. Since I started working in the field, I have encountered various approaches to ensuring data and code quality across organizations. Even though each company may follow different processes and standards, there are some universal principles that can help us enhance development speed, improve code maintenance, and make working with data easier.

1. Functional programming

The first programming language I learned during my studies was Java. Even though I understood the benefits of object-oriented programming for creating reusable classes and modules, I found it hard to apply when working with data. Two years later, I came across R, a functional programming language, and I fell in love. Being able to use the dplyr package to simply pipe functions together, transform the data, and quickly see the results was life-changing.

But these days, Python allows us to combine both worlds: we can write object-oriented, modular code while at the same time making use of the functional style that works so well when interacting with data in R.
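For instance, the `.pipe` method in pandas enables dplyr-style chaining of plain Python functions. Here is a minimal sketch; the `orders` data and the two helper functions are hypothetical, invented purely for illustration:

```python
import pandas as pd

def drop_missing_ids(df: pd.DataFrame) -> pd.DataFrame:
    """Remove rows that have no customer ID."""
    return df.dropna(subset=["customer_id"])

def add_total_price(df: pd.DataFrame) -> pd.DataFrame:
    """Enrich each order with its total price."""
    return df.assign(total_price=df["quantity"] * df["unit_price"])

orders = pd.DataFrame(
    {
        "customer_id": [1, None, 2],
        "quantity": [3, 1, 2],
        "unit_price": [9.99, 4.50, 19.99],
    }
)

# .pipe chains plain functions, much like dplyr's %>% operator in R
result = orders.pipe(drop_missing_ids).pipe(add_total_price)
print(result)
```

Each step is an ordinary, independently testable function, and the chain reads top to bottom like a data recipe.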

Functional programming fits data work so well because nearly any data engineering task can be expressed as taking input data, applying some function to it (the T in ETL: transforming, cleaning, or enriching the data), and loading the output into a centralized repository or serving it for reporting or data science use cases.
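To make that pattern concrete, here is a small, self-contained sketch of a transform step built from pure functions; the record shape and the individual steps are assumptions made up for this example:

```python
from functools import reduce
from typing import Callable

Record = dict
Transform = Callable[[Record], Record]

def strip_name(record: Record) -> Record:
    """Cleaning step: trim stray whitespace from the name."""
    return {**record, "name": record["name"].strip()}

def add_name_length(record: Record) -> Record:
    """Enrichment step: derive a new attribute from existing data."""
    return {**record, "name_length": len(record["name"])}

def transform(record: Record, steps: list[Transform]) -> Record:
    """The T in ETL: apply each pure step to the input in order."""
    return reduce(lambda rec, step: step(rec), steps, record)

raw = {"name": "  Ada Lovelace  "}
print(transform(raw, [strip_name, add_name_length]))
# {'name': 'Ada Lovelace', 'name_length': 12}
```

Because every step takes a record in and returns a new record out, with no hidden state, the pipeline is easy to test, reorder, and extend.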

The functional programming paradigm is so common in data engineering that many blog posts have been written about it.

