Nowadays, learning from data to gain business insights is common in almost every industry. These insights include predictability, customer churn behavior, forecasting, and more. Machine learning is the key player in generating these insights.

Building a good ML model requires many experiments: multiple iterations of different algorithms over the data, creating new variables, adding more data, and so on. As the number of iterations grows, it becomes harder to keep track of these experiments.

In this article I will talk about a system to effectively version control a machine learning project. I will also share some tools that will help you implement this system easily.

In what scenarios do you get a new version?

1. Data changes

Whenever the modeling data changes, you get a new version of the model. ML models are trained on the modeling data, so when that data changes, the model parameters change too. You change the data when you do any of the following:

  • When you capture more data.
  • When you normalize or standardize the data.
  • When you create new variables: combinations of variables, dummy variables, trends, etc.
  • When you drop existing variables.
  • When you fill in or drop missing data.
  • Or when the data changes in any other way.
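
To make the idea concrete, here is a minimal sketch of how a data change can be detected and treated as a new version. It is not the system described in this article, only an illustration using Python's standard library: the `data_fingerprint` helper and the toy `age`/`income` rows are hypothetical names made up for this example.

```python
import hashlib
import json


def data_fingerprint(rows):
    """Return a short SHA-256 digest of a list of row dictionaries.

    Hypothetical helper for illustration: any change to the rows
    (new rows, new or dropped columns, rescaled values) changes the digest.
    """
    # Sort keys so the digest depends on the content, not on dict ordering.
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


# Toy modeling data (made-up values).
raw = [
    {"age": 30, "income": 40000.0},
    {"age": 50, "income": 80000.0},
]

# Change 1: normalize a variable.
max_income = max(row["income"] for row in raw)
normalized = [{**row, "income": row["income"] / max_income} for row in raw]

# Change 2: drop an existing variable.
dropped = [{"age": row["age"]} for row in raw]

versions = {
    "raw": data_fingerprint(raw),
    "normalized": data_fingerprint(normalized),
    "dropped_income": data_fingerprint(dropped),
}

for name, digest in versions.items():
    print(f"{name}: {digest}")
```

Each of the three datasets gets a different fingerprint, so a model trained on any of them can be tied back to the exact data version it saw.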
