In this article series, we walk through how you can create your own data observability monitors from scratch, mapping to five key pillars of data health. Part 1 of this series was adapted from Barr Moses and Ryan Kearns’ O’Reilly training, Managing Data Downtime: Applying Observability to Your Data Pipelines, the industry’s first-ever course on data observability. The associated exercises are available here, and the adapted code shown in this article is available here.
From null values and duplicate rows, to modeling errors and schema changes, data can break for many reasons. Data testing is often our first line of defense against bad data, but what happens if data breaks during its life cycle?
We call this phenomenon data downtime, and it refers to periods of time where data is missing, erroneous, or otherwise inaccurate. Data downtime prompts us to ask questions such as:
#2021 feb tutorials # overviews #data engineering #data quality #data science #data science platform #sql