This is a guest post by Jimmy Whitaker, senior data science evangelist at Pachyderm
In the last few years, DevOps has begun to shift—from a culture of continuous integration and continuous deployment (CI/CD) best practices to leveraging Git-based techniques to manage software deployments. This transition has made software less error prone, more scalable, and has increased collaboration by making DevOps more developer-centric.
At its heart, this simplification lets developers use the same powerful version control system they’re used to with Git as a way to deliver infrastructure as code, documentation, and even Kubernetes configurations. It works by using Git as a single source of truth, letting you build robust declarative apps and infrastructure with ease. But when it comes to machine learning applications, DevOps gets much more complex.
The kinds of problems we face in machine learning are fundamentally different than the ones we face in traditional software coding. Functional issues, like race conditions, infinite loops, and buffer overflows, don’t come into play with machine learning models. Instead, errors come from edge cases, lack of data coverage, adversarial assault on the logic of a model, or overfitting. Edge cases are the reason so many organizations are racing to build AI Red Teams to diagnose problems before things go horribly wrong.
It’s simply not enough to port your CI/CD and infrastructure code to machine learning workflows and call it done. Handling this new generation of machine learning operations (MLOps) problems requires a brand new set of tools that focus on the gap between code-focused operations and MLOps. The key difference is data. We need to version our data and datasets in tandem with the code. That means we need tools that specifically focus on data versioning, model training, production monitoring, and many others unique to the challenges of machine learning at scale.
Luckily, we have a strong tool for MLOps that does seamless data version control: Pachyderm. When we link the power of Pachyderm to our declarative Git-based workflows, we can bring the smooth flow of CI/CD to the world of machine learning models.
We’ll use two key tools to combine our existing CI/CD tooling and MLOps:
#enterprise #partners #actions #ci/cd #devops #git-based #mlops #pachyderm #workflow