7 Rules for Bulletproof, Reproducible Machine Learning R&D

So, if you’re a nose-to-the-keyboard developer, there’s a good chance that this analogy is outside your comfort zone … bear with me.

Imagine two Olympic-level figure skaters working together on the ice, day in and day out, to develop and perfect a medal-winning performance. Each has his or her role, and they work in sync to merge their actions and fine-tune the results. Each tiny change affects the other’s movements: ideally it improves their routine, but often it ruins it. Over time, they develop an ongoing communication channel so that each always knows what the other is doing, which keeps the result consistent and steadily improving.

Machine learning has a curiously similar dynamic: your models and code work in tandem with the training data to produce the intended results. The path to optimization is, like that of the ice skaters, driven by small adjustments that need to be systematically tried and retried (and retried and retried), carefully and intentionally. But every change, every adjustment, and every new angle of attack also opens the door to error, confusion, and inconsistent inferences. In short, a lack of disciplined structure and planning undermines reproducibility, which quickly stalls ML development.

Even if the metaphor above is too cliché for you, anyone who has dealt with ML in a business use case should be nodding along by now.

