With Pachyderm and Actions, the MLOps process can be automated. Engineers and data scientists can write their machine learning code, and it seamlessly and automatically gets deployed to a production-scale data workflow.
This is a guest post by Jimmy Whitaker, senior data science evangelist at Pachyderm
In the last few years, DevOps has begun to shift—from a culture of continuous integration and continuous deployment (CI/CD) best practices to leveraging Git-based techniques to manage software deployments. This transition has made software less error prone, more scalable, and has increased collaboration by making DevOps more developer-centric.
At its heart, this simplification lets developers use the same powerful version control system they’re used to with Git as a way to deliver infrastructure as code, documentation, and even Kubernetes configurations. It works by using Git as a single source of truth, letting you build robust declarative apps and infrastructure with ease. But when it comes to machine learning applications, DevOps gets much more complex.
The kinds of problems we face in machine learning are fundamentally different than the ones we face in traditional software coding. Functional issues, like race conditions, infinite loops, and buffer overflows, don’t come into play with machine learning models. Instead, errors come from edge cases, lack of data coverage, adversarial assault on the logic of a model, or overfitting. Edge cases are the reason so many organizations are racing to build AI Red Teams to diagnose problems before things go horribly wrong.
It’s simply not enough to port your CI/CD and infrastructure code to machine learning workflows and call it done. Handling this new generation of machine learning operations (MLOps) problems requires a brand new set of tools that focus on the gap between code-focused operations and MLOps. The key difference is data. We need to version our data and datasets in tandem with the code. That means we need tools that specifically focus on data versioning, model training, production monitoring, and many others unique to the challenges of machine learning at scale.
Luckily, we have a strong tool for MLOps that does seamless data version control: Pachyderm. When we link the power of Pachyderm to our declarative Git-based workflows, we can bring the smooth flow of CI/CD to the world of machine learning models.
We’ll use two key tools to combine our existing CI/CD tooling and MLOps:
To set up a serverless CI/CD pipeline in your AWS environments, there are several key services that you need to use. Find out more here.
A common challenge that cloud native application developers face is manually testing against inconsistent environments. GitHub Actions can be triggered based on nearly any GitHub event making it possible to build in accountability for updating tests and fixing bugs.
This article gives direction to getting your CI/CD pipeline up and running on the Kubernetes cluster by the GitLab CICD pipeline.
What is DevOps? How are organizations transitioning to DevOps? Is it possible for organizations to shift to enterprise DevOps? Read more to find out!
The ultimate showdown between Travis CI vs Jenkins. Check out this guide to know who wins the race! Travis CI and Jenkins are both popular CI/CD tools and were launched in the same year i.e. 2011. As of July 2020, Jenkins has been the more obvious choice as CI/CD tool with 15.9k stars & 6.3k forks, in comparison to TravisCI which has 8k stars & 756 forks. However, these numbers alone don’t imply which CI/CD tool is more suitable for your upcoming or existing project. Jenkins is an open-source & Travis CI is free for open-source projects.