MLOps meets DevOps - Pachyderm: Enables DevOps for data

MLOps meets DevOps - Pachyderm: Enables DevOps for data

With Pachyderm and Actions, the MLOps process can be automated. Engineers and data scientists can write their machine learning code, and it seamlessly and automatically gets deployed to a production-scale data workflow.

This is a guest post by Jimmy Whitaker, senior data science evangelist at Pachyderm

In the last few years, DevOps has begun to shift—from a culture of continuous integration and continuous deployment (CI/CD) best practices to leveraging Git-based techniques to manage software deployments. This transition has made software less error prone, more scalable, and has increased collaboration by making DevOps more developer-centric.

At its heart, this simplification lets developers use the same powerful version control system they’re used to with Git as a way to deliver infrastructure as code, documentation, and even Kubernetes configurations. It works by using Git as a single source of truth, letting you build robust declarative apps and infrastructure with ease. But when it comes to machine learning applications, DevOps gets much more complex.

The difference between MLOps and DevOps

The kinds of problems we face in machine learning are fundamentally different than the ones we face in traditional software coding. Functional issues, like race conditions, infinite loops, and buffer overflows, don’t come into play with machine learning models. Instead, errors come from edge cases, lack of data coverage, adversarial assault on the logic of a model, or overfitting. Edge cases are the reason so many organizations are racing to build AI Red Teams to diagnose problems before things go horribly wrong.

It’s simply not enough to port your CI/CD and infrastructure code to machine learning workflows and call it done. Handling this new generation of machine learning operations (MLOps) problems requires a brand new set of tools that focus on the gap between code-focused operations and MLOps. The key difference is data. We need to version our data and datasets in tandem with the code. That means we need tools that specifically focus on data versioning, model training, production monitoring, and many others unique to the challenges of machine learning at scale.

Pachyderm: Enabling DevOps for data

Luckily, we have a strong tool for MLOps that does seamless data version control: Pachyderm. When we link the power of Pachyderm to our declarative Git-based workflows, we can bring the smooth flow of CI/CD to the world of machine learning models.

We’ll use two key tools to combine our existing CI/CD tooling and MLOps:

  • GitHub Actions builds powerful, flexible automation directly into the developer workflow. Actions can be triggered by nearly any GitHub event, such as pull requests, releases, push events, etc. While the triggered workflow can be almost anything, in the context of this post, it will most commonly be a build, CI/CD step, or release.
  • Pachyderm delivers data versioning and lineage to our data science project. It provides Git-like functionality, baked right into a version-controlled file system. This gives you the ability to control your binary assets, both big and small, and to create processing pipelines triggered by data changes.

enterprise partners actions ci/cd devops git-based mlops pachyderm workflow

Bootstrap 5 Complete Course with Examples

Bootstrap 5 Tutorial - Bootstrap 5 Crash Course for Beginners

Nest.JS Tutorial for Beginners

Hello Vue 3: A First Look at Vue 3 and the Composition API

Building a simple Applications with Vue 3

Deno Crash Course: Explore Deno and Create a full REST API with Deno

How to Build a Real-time Chat App with Deno and WebSockets

Convert HTML to Markdown Online

HTML entity encoder decoder Online

Serverless CI/CD on the AWS Cloud

To set up a serverless CI/CD pipeline in your AWS environments, there are several key services that you need to use. Find out more here.

GitHub Demo Days - Using GitHub Actions for testing cloud native applications

A common challenge that cloud native application developers face is manually testing against inconsistent environments. GitHub Actions can be triggered based on nearly any GitHub event making it possible to build in accountability for updating tests and fixing bugs.

How To Setup a CI/CD Pipeline With Kubernetes 2020 - DZone DevOps

This article gives direction to getting your CI/CD pipeline up and running on the Kubernetes cluster by the GitLab CICD pipeline.

What Is DevOps and Is Enterprise DevOps Any Good?

What is DevOps? How are organizations transitioning to DevOps? Is it possible for organizations to shift to enterprise DevOps? Read more to find out!

Travis CI vs Jenkins: Which CI/CD Tool Is Right For You?

The ultimate showdown between Travis CI vs Jenkins. Check out this guide to know who wins the race! Travis CI and Jenkins are both popular CI/CD tools and were launched in the same year i.e. 2011. As of July 2020, Jenkins has been the more obvious choice as CI/CD tool with 15.9k stars & 6.3k forks, in comparison to TravisCI which has 8k stars & 756 forks. However, these numbers alone don’t imply which CI/CD tool is more suitable for your upcoming or existing project. Jenkins is an open-source & Travis CI is free for open-source projects.