How DAGs Grow: When People Trust A Data Source, They'll Ask More Of It


This blog post is a refresh of a talk that James and I gave at Strata back in 2017. Why recap a 3-year-old conference talk? Well, the core ideas have aged well, we’ve never actually put them into writing before, and we’ve learned some new things in the meantime. Enjoy!

A data project tends to go through three phases. First, when you start building a data system, there’s an up-front cost to figure out how it all should work, fit it together, and build it.

Second, for a while it just works. We call this phase the “miracle of software.”

Third, after a while, you start noticing that you’re spending a lot of time on maintenance: fixing bugs, doing root cause analysis to figure out why things are breaking in your data pipelines, and making additions and adjustments as other systems appear or change.

Over time, this turns into a steady, compounding creep in the maintenance time that you’re putting into the project. If that gets too far out of control, the work just stops being fun. Stuff breaks unpredictably, timelines become highly variable, and people burn out.

We wanted to articulate why this happens, to understand how to prevent it. In the end, we arrived at this mental model of the core dynamics.
