How DAGs Grow: When People Trust A Data Source, They'll Ask More Of It

This blog post is a refresh of a talk that James and I gave at Strata back in 2017. Why recap a 3-year-old conference talk? Well, the core ideas have aged well, we’ve never actually put them into writing before, and we’ve learned some new things in the meantime. Enjoy!

Data systems tend to go through three phases. First, when you start building one, there’s an up-front cost to figure out how it should all work, fit the pieces together, and build it.

Second, for a while it just works. We call this phase the “miracle of software.”

Third, after a while, you start noticing that you’re spending a lot of time on maintenance: fixing bugs, doing root cause analysis to figure out why things are breaking in your data pipelines, and making additions and adjustments as other systems appear or change.

Over time, this turns into a steady, compounding creep in the maintenance time that the project demands. If that gets too far out of control, the work just stops being fun. Stuff breaks unpredictably, timelines become highly variable, and people burn out.

We wanted to articulate why this happens, to understand how to prevent it. In the end, we arrived at this mental model of the core dynamics.

