Let’s paint a scenario: you’re working on a Data Science project, and at first you had a model accuracy of 80%, so you deployed the model to production, serving it as an API using Flask. A few days later you pick the project back up, and after tuning some of the parameters and adding some more data, you get better accuracy than the previous model. Now you plan to deploy this new model, and you have to go through the trouble of building, testing and deploying to production all over again, which is a lot of work. In this article, I will show you how we can use a powerful tool called **Jenkins** to automate this process.


Photo by Yancy Min on Unsplash

What is Jenkins?

Jenkins is a free and open-source automation server. It helps automate the parts of software development related to building, testing, and deploying, facilitating continuous integration and continuous delivery — Wikipedia

With Jenkins, you can automate and accelerate software delivery processes throughout the entire lifecycle using a vast collection of plugins. For example, you can set up Jenkins to automatically detect a code commit in a repository and trigger commands in response: building a Docker image from a Dockerfile, running unit tests, pushing the image to a container registry, or deploying it to the production server, all without doing anything manually. I’ll be explaining some basic concepts we need to know in order to perform some automation in our Data Science project.
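To make that commit-to-deploy flow concrete, here is a minimal sketch of a declarative Jenkins pipeline that polls a repository for new commits, builds a Docker image, runs the tests, and pushes the image to a registry. The image name `my-registry/my-model-api`, the `tests/` directory and the five-minute polling schedule are assumptions for illustration, not part of the original article.

```groovy
pipeline {
    agent any
    triggers {
        // Poll the repository every ~5 minutes for new commits
        // (a webhook trigger is a common alternative)
        pollSCM('H/5 * * * *')
    }
    stages {
        stage('Build image') {
            steps {
                // Build a Docker image from the Dockerfile in the repo root
                sh 'docker build -t my-registry/my-model-api:latest .'
            }
        }
        stage('Unit tests') {
            steps {
                // Run the test suite inside the freshly built image
                sh 'docker run --rm my-registry/my-model-api:latest pytest tests/'
            }
        }
        stage('Push image') {
            steps {
                // Push to a registry the build agent is already logged in to
                sh 'docker push my-registry/my-model-api:latest'
            }
        }
    }
}
```

Once a pipeline like this is in place, every commit kicks off the same build–test–push sequence, so redeploying a retrained model is no extra work.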

Benefits of Jenkins

  1. It is Open Source
  2. Easy to use and install
  3. A large number of plugins that fit into a DevOps environment
  4. Spend more time on your code and less time on deployment
  5. Massive community

Jenkins Installation

Jenkins supports installation across platforms, whether you’re a Windows, Linux or Mac user. You can even install it on a cloud server that supports either PowerShell or Linux instances. To install Jenkins, you can refer to the documentation here.

Jenkins has a lot of amazing features and some are beyond the scope of this article, to get the hang of Jenkins you can check the documentation.

Before we jump into the practical side of things, there are some important terms I want to explain:

Jenkins Job

A Jenkins job simply refers to a runnable task that is controlled by Jenkins. For instance, you can assign a job to Jenkins to perform certain operations, like printing “Hello World” or running unit and integration tests. Creating a job is very easy in Jenkins, but in a software environment you rarely build a single job; instead, you’ll be doing what is referred to as a pipeline.
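The simplest possible job, the “Hello World” example above, can be written as a one-stage pipeline script. This is a minimal sketch you could paste into a new Pipeline job’s script box:

```groovy
pipeline {
    agent any          // run on any available Jenkins agent
    stages {
        stage('Hello') {
            steps {
                echo 'Hello World'   // prints to the build's console output
            }
        }
    }
}
```

When the job runs, the message appears in that build’s console output in the Jenkins UI.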

Jenkins Pipeline

A pipeline runs a collection of jobs in a particular order or sequence. Let me explain this with an example. Suppose I am developing an application on Jenkins and I want to pull the code from a code repository, build the application, test it and deploy it to a server. To do this, I will create four jobs to perform each of those processes. The first job (Job 1) will pull the code from the repository, the second job (Job 2) will build the application, the third job (Job 3) will perform unit and integration tests, and the fourth job (Job 4) will deploy the code to production. I can use the Jenkins build pipeline plugin to perform this task. After I create the jobs and chain them in a sequence, the plugin will run them as a pipeline.
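The four chained jobs described above can also be expressed as four stages in a single declarative Jenkinsfile, which is the more common approach today than chaining separate freestyle jobs with the build pipeline plugin. This is a sketch under assumed names: the repository URL, the `your-app` image tag and the `deploy.sh` script are hypothetical placeholders.

```groovy
pipeline {
    agent any
    stages {
        stage('Pull code') {    // Job 1: fetch the source
            steps {
                git url: 'https://github.com/your-user/your-repo.git', branch: 'main'
            }
        }
        stage('Build') {        // Job 2: build the application image
            steps {
                sh 'docker build -t your-app:${BUILD_NUMBER} .'
            }
        }
        stage('Test') {         // Job 3: unit and integration tests
            steps {
                sh 'pytest tests/'
            }
        }
        stage('Deploy') {       // Job 4: release to production
            steps {
                sh './deploy.sh'   // hypothetical deployment script
            }
        }
    }
}
```

Each stage only runs if the previous one succeeds, so a failing test automatically blocks the deploy — exactly the safety net you want before shipping a retrained model.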

#devops #machine-learning #jenkins

Automating Data Science Projects with Jenkins