Today, I’m happy to introduce Data Science Pull Requests (DS PRs) on DAGsHub, which are Pull Requests (PRs), re-imagined for the data science (DS) workflow. This new capability unlocks a standard review process for data science teams, enabling them to merge data across different branches and accept data contributions across forks. This provides a better collaborative experience for teams in data science organizations and enables truly Open Source Data Science (OSDS) projects.

For more details, read on…


When we started DAGsHub, we were focused on making data science collaboration possible. Specifically, we care deeply about and rely on Open Source Software (OSS), and we set out on a mission to make OSDS as accessible and prevalent as OSS is today.

This meant that we were concerned about the **_discoverability_** of data science projects and experiments to work on, the understandability of the context of an experiment, the **reproducibility** of its results, and finally, the contributability of code, data, and model changes back to the original project.

When reviewing these processes and the existing solutions, some things become clear:

  • **_Discoverability_** means being able to answer the question “What should I do next?”: finding a project to work on, and within that project, finding which experiments might be interesting or important. It is addressed mainly by experiment tracking systems, many of which use proprietary or black-box formats that are hard to understand and to migrate to or from. DAGsHub goes beyond this by creating an experiment tracking system that relies on simple open formats (YAML and CSV). This means you don’t need to add obscure lines of code; everything works by automatically scanning and analyzing the git commits pushed to the platform.
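To make the "simple open formats" idea concrete, here is a minimal sketch of writing an experiment's hyperparameters as flat YAML and its metrics as CSV, using only the Python standard library. The function names, the column names (`name`, `value`, `step`), and the example values are assumptions made for illustration; they are not necessarily the exact layout DAGsHub expects, so check the platform documentation before relying on them.

```python
import csv
import io
import json


def write_params(params, stream):
    """Write a flat dict of hyperparameters as simple `key: value` YAML lines."""
    for key in sorted(params):
        # JSON scalars are also valid YAML scalars, so json.dumps handles quoting.
        stream.write(f"{key}: {json.dumps(params[key])}\n")


def write_metrics(rows, stream):
    """Write (name, value, step) metric rows as CSV with a header line."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(["name", "value", "step"])  # hypothetical column names
    writer.writerows(rows)


# In-memory streams keep the sketch self-contained; in a real project these
# would be ordinary files committed to git alongside the code.
params_file = io.StringIO()
metrics_file = io.StringIO()

write_params({"learning_rate": 0.01, "epochs": 5, "model": "logreg"}, params_file)
write_metrics([("loss", 0.71, 1), ("loss", 0.52, 2)], metrics_file)

params_text = params_file.getvalue()
metrics_text = metrics_file.getvalue()
print(params_text)
print(metrics_text)
```

Because both outputs are plain text, they diff cleanly between commits and can be read by any tool, which is the property the paragraph above contrasts with proprietary tracking formats.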


Data Science Pull Requests — A Method for Data Science Review & Merging