With the latest advances in cloud computing, it has become even more necessary to adopt tools that are both scalable and ensure reproducible execution. With this need in mind, tools such as Docker emerged to do the job, allowing developers to write a “recipe” for an application and ensuring that different builds of the same application run identically.
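
To make the “recipe” idea concrete, here is a minimal sketch of what such a Dockerfile could look like for a hypothetical Python project (the file names `requirements.txt` and `train.py` are placeholders, not part of any specific project):

```dockerfile
# Hypothetical example: a minimal "recipe" for a Python application.
# File names (requirements.txt, train.py) are placeholders.
FROM python:3.10-slim

WORKDIR /app

# Install dependencies first so Docker can cache this layer between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the project and define the default command
COPY . .
CMD ["python", "train.py"]
```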

Unlike a Virtual Machine (VM), which provides infrastructure through a hypervisor and emulates processors and memory, Docker shares these resources across its containers, allowing the developer to focus less on infrastructure and more on developing the application. Moreover, containerizing projects and applications mitigates the classic “it runs on my machine” problem, since it aims to ensure that, regardless of the platform the developer chooses, the Docker container always runs in the same way.

Since the benefits of containerization go beyond application development and are useful in other fields, many data scientists have begun using Docker to containerize their analyses, model training, dashboards, and APIs, both to make project delivery easier (since it reduces the possibility of bugs) and to ensure that results obtained once can always be reproduced.
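
As a rough illustration of how that reproducibility plays out in practice, the commands below build a hypothetical image and run it anywhere Docker is available, mounting a local folder so the results persist outside the container (the image name and paths are placeholders):

```bash
# Hypothetical usage: build the image once, then run the same analysis
# on any machine with Docker installed. Names and paths are placeholders.
docker build -t my-ds-project:1.0 .

# Run the containerized training, mounting a local folder to keep the outputs
docker run --rm -v "$(pwd)/outputs:/app/outputs" my-ds-project:1.0
```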
