Hunter Krajcik

1627128660

The Road to a Serverless ML Pipeline in Production — Part I

How Nutrino designed a serverless MLOps stack in production

Bringing ML models to production today is complicated: different companies have different requirements from the ML stack, and there are many tools out there, each trying to solve a different aspect of the ML lifecycle. These tools are still a work in progress, and there’s no single “clear-cut” solution for MLOps. In this article, I’d like to share the process we went through in creating our own MLOps stack, including how our team worked before the process started, the research we did on different MLOps tools, and how we chose the solution that fit our non-standard models.

For the detailed solution, including more technical explanations, check out Part II of this article.

TL;DR: We managed to run different types of models (our own Python models), with multiple production versions of each, all in a serverless environment!

Our ML stack pre-refactoring

This was our ML stack before we began the process of refactoring:

Image by Author

Research Environment

We were using a data lake environment with an ETL process that transferred production data to S3 as Parquet files. The data scientists did their research in Zeppelin notebooks running on EMR clusters in that environment (thus utilizing Spark’s distributed processing).

Feature Extraction

Feature extraction was done by AWS Lambda functions, deployed with the Serverless Framework and triggered by a Kinesis stream every time new data arrived in our centralized data store.
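The article does not show the feature-extraction code. As a minimal sketch, assuming a Python Lambda function and JSON payloads, a Kinesis-triggered handler receives a batch of records whose `data` field is base64-encoded. `extract_features` and the record schema (`user_id`, `values`) are hypothetical placeholders for a real feature computation.

```python
import base64
import json


def extract_features(record: dict) -> dict:
    """Hypothetical feature computation: count and mean of a list of raw values."""
    values = record.get("values", [])
    return {
        "user_id": record["user_id"],
        "count": len(values),
        "mean": sum(values) / len(values) if values else 0.0,
    }


def handler(event, context):
    """Lambda entry point for a batch of records delivered by a Kinesis trigger."""
    features = []
    for record in event["Records"]:
        # Kinesis record payloads arrive base64-encoded in record["kinesis"]["data"].
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        features.append(extract_features(payload))
    # A real function would write the features to a store; returning them keeps the sketch testable.
    return features
```

Because the handler is a plain function, it can be exercised locally with a hand-built event before being deployed with the Serverless Framework.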

Training

  1. Once research on a model was complete, the data scientist created a training notebook for it (in the same data lake environment).
  2. We ran each training notebook periodically using Apache Airflow (leveraging its strength in running scheduled jobs). To do so, we created a DAG (Directed Acyclic Graph) for each model’s training notebook.
  3. Airflow’s DAGs were deployed automatically on every push to the master branch.
  4. The DAGs created an EMR cluster that ran the training notebooks. The notebooks were connected to a GitHub repository, so every commit to a notebook was effectively an automated “deployment” of the training code.

Hermann Frami

1655426640

Serverless Plugin for Microservice Code Management and Deployment

Serverless M

Serverless M (or Serverless Modular) is a plugin for the Serverless Framework. It helps you manage multiple serverless projects with a single serverless.yml file, and it gives you supercharged CLI options that you can use to create new features, build them in a single file, and deploy them all in parallel.


Currently, this plugin has only been tested with the following stack:

  • AWS
  • NodeJS λ
  • REST API (other events can be used as well)

Prerequisites

Make sure you have the Serverless CLI installed:

# Install serverless globally
$ npm install serverless -g

Getting Started

To start a Serverless Modular project locally, you can either begin with the ES5 or ES6 template, or add it as a plugin to an existing project.

ES6 Template install

# Step 1. Download the template
$ sls create --template-url https://github.com/aa2kb/serverless-modular/tree/master/template/modular-es6 --path myModularService

# Step 2. Change directory
$ cd myModularService

# Step 3. Create a package.json file
$ npm init

# Step 4. Install dependencies
$ npm i serverless-modular serverless-webpack webpack --save-dev

ES5 Template install

# Step 1. Download the template
$ sls create --template-url https://github.com/aa2kb/serverless-modular/tree/master/template/modular-es5 --path myModularService

# Step 2. Change directory
$ cd myModularService

# Step 3. Create a package.json file
$ npm init

# Step 4. Install dependencies
$ npm i serverless-modular --save-dev

If you don't want to use the templates above, you can add the plugin to your existing project.

Adding it as a plugin

plugins:
  - serverless-modular

You are now all set to start building your serverless modular functions.

API Reference

The Serverless Modular CLI can be accessed with:

# Serverless Modular CLI
$ serverless modular

# shorthand
$ sls m

The Serverless Modular CLI is based on 5 main commands:

  • sls m init
  • sls m feature
  • sls m function
  • sls m build
  • sls m deploy

init command

sls m init

The init command creates a basic .gitignore that is useful for Serverless Modular projects.

The basic .gitignore for Serverless Modular looks like this:

#node_modules
node_modules

#sm main functions
sm.functions.yml

#serverless file generated by build
src/**/serverless.yml

#main serverless directories generated for sls deploy
.serverless

#feature serverless directories generated sls deploy
src/**/.serverless

#serverless logs file generated for main sls deploy
.sm.log

#serverless logs file generated for feature sls deploy
src/**/.sm.log

#Webpack config copied in each feature
src/**/webpack.config.js

feature command

The feature command helps you build new features for your project.

options (feature Command)

This command comes with three options

--name: Specify the name you want for your feature

--remove: Set the value to true if you want to remove the feature

--basePath: Specify the base path you want for your feature. The base path should be unique across all features; it helps when running offline with the offline plugin and with API Gateway

| option | shortcut | required | values | default value |
|---|---|---|---|---|
| --name | -n |  | string | N/A |
| --remove | -r |  | true, false | false |
| --basePath | -p |  | string | same as name |

Examples (feature Command)

Creating a basic feature

# Creating a jedi feature
$ sls m feature -n jedi

Creating a feature with different base path

# A feature with different base path
$ sls m feature -n jedi -p tatooine

Deleting a feature

# Anakin is going to delete the jedi feature
$ sls m feature -n jedi -r true

function command

The function command adds a new function to a feature.

options (function Command)

This command comes with four options

--name: Specify the name you want for your function

--feature: Specify the name of the existing feature

--path: Specify the path for the HTTP endpoint; this helps when running offline with the offline plugin and with API Gateway

--method: Specify the HTTP method; this helps when running offline with the offline plugin and with API Gateway

| option | shortcut | required | values | default value |
|---|---|---|---|---|
| --name | -n |  | string | N/A |
| --feature | -f |  | string | N/A |
| --path | -p |  | string | same as name |
| --method | -m |  | string | 'GET' |

Examples (function Command)

Creating a basic function

# Creating a cloak function for jedi feature
$ sls m function -n cloak -f jedi

Creating a basic function with different path and method

# Creating a cloak function for jedi feature with custom path and HTTP method
$ sls m function -n cloak -f jedi -p powers -m POST

build command

The build command builds the project with local or global scope.

options (build Command)

This command comes with two options

--scope: Specify the scope of the build; use this with the "--feature" option

--feature: Specify the name of the existing feature you want to build

| option | shortcut | required | values | default value |
|---|---|---|---|---|
| --scope | -s |  | string | local |
| --feature | -f |  | string | N/A |

Saving build Config in serverless.yml

You can also save the build config in the serverless.yml file:

custom:
  smConfig:
    build:
      scope: local

Examples (build Command)

All features build (local scope)

# Building all local features
$ sls m build

Single feature build (local scope)

# Building a single feature
$ sls m build -f jedi -s local

All features build (global scope)

# Building all features with global scope
$ sls m build -s global

deploy command

The deploy command deploys serverless projects to AWS (it uses the sls deploy command).

options (deploy Command)

This command comes with four options

--sm-parallel: Specify whether to deploy in parallel (deployments only run in parallel when there are multiple of them)

--sm-scope: Specify whether to deploy local or global features

--sm-features: Specify the local features you want to deploy (comma separated if multiple)

| option | shortcut | required | values | default value |
|---|---|---|---|---|
| --sm-parallel |  |  | true, false | true |
| --sm-scope |  |  | local, global | local |
| --sm-features |  |  | string | N/A |
| --sm-ignore-build |  |  | string | false |

Saving deploy Config in serverless.yml

You can also save the deploy config in the serverless.yml file:

custom:
  smConfig:
    deploy:
      scope: local
      parallel: true
      ignoreBuild: true

Examples (deploy Command)

Deploy all features locally

# deploy all local features
$ sls m deploy

Deploy all features globally

# deploy all global features
$ sls m deploy --sm-scope global

Deploy single feature

# deploy only the jedi feature
$ sls m deploy --sm-features jedi

Deploy Multiple features

# deploy the jedi, sith and dark_side features
$ sls m deploy --sm-features jedi,sith,dark_side

Deploy Multiple features in sequence

# deploy the jedi, sith and dark_side features one after another
$ sls m deploy --sm-features jedi,sith,dark_side --sm-parallel false

Author: aa2kb
Source Code: https://github.com/aa2kb/serverless-modular 
License: MIT license

studio52 dubai

1621769539

How to find the best video production company in Dubai?

How to find the best video production company in Dubai?We are the best video production company in Dubai, UAE. We offer Corporate Video, event video, animation video, safety video and timelapse video in most engaging and creative ways.


Serverless Applications - Pros and Cons to Help Businesses Decide - Prismetric

In the past few years, especially after Amazon Web Services (AWS) introduced its Lambda platform, serverless architecture has become a buzzword in the business world. As serverless applications grew in popularity, market leaders like Netflix, Airbnb and Nike adopted serverless architecture to better handle their backend functions. Moreover, the serverless architecture market is expected to reach a whopping $9.17 billion by 2023.

Figure: Global serverless architecture market, 2019-2023

Why use serverless computing?
As a business, it is best to approach a professional mobile app development company to build apps that are deployed on various servers; nevertheless, businesses should understand that the benefits of serverless applications lie in the concrete business needs they can serve, not in the hype created by cloud vendors. With a serverless architecture, developers can write and run code on demand without worrying about the underlying hardware.

But as with all game-changing trends, many businesses opt for serverless applications just to keep up with their peers, without thinking about what their business actually needs.

Serverless applications work well for stateless use cases: operations that execute cleanly and then hand off to the next operation in a sequence. On the other hand, serverless architecture is a poor fit for applications with predictable, steady workloads and a lot of reading and writing in the backend system.

Another benefit of serverless software architecture is that the third-party service provider charges based on the total number of requests. As the number of requests grows, so does the charge, but for many workloads the total can still be significantly lower than the cost of dedicated IT infrastructure.
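To make the pay-per-request trade-off concrete, here is a small sketch that computes the monthly request volume at which request-based billing costs as much as a fixed infrastructure bill. The prices are hypothetical placeholders, not any vendor's rates, and real providers usually also bill for compute duration and memory.

```python
def monthly_request_cost(requests: int, price_per_million: float) -> float:
    """Pay-per-request bill: grows linearly with the number of requests."""
    return requests / 1_000_000 * price_per_million


def break_even_requests(fixed_monthly_cost: float, price_per_million: float) -> float:
    """Monthly request volume at which pay-per-request costs as much as a fixed server."""
    return fixed_monthly_cost / price_per_million * 1_000_000


# Hypothetical figures: $2.00 per million requests vs. a $100/month dedicated server.
PRICE_PER_MILLION = 2.00
FIXED_MONTHLY_COST = 100.00

if __name__ == "__main__":
    for requests in (1_000_000, 10_000_000, 100_000_000):
        cost = monthly_request_cost(requests, PRICE_PER_MILLION)
        cheaper = "serverless" if cost < FIXED_MONTHLY_COST else "dedicated"
        print(f"{requests:>11,} requests: ${cost:,.2f} -> {cheaper} is cheaper")
    print(f"break-even: {break_even_requests(FIXED_MONTHLY_COST, PRICE_PER_MILLION):,.0f} requests")
```

With these hypothetical numbers the break-even point is 50 million requests a month: below it, paying per request is cheaper; above it, the fixed server wins.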

Defining serverless software architecture
In serverless software architecture, the application logic runs in an environment where operating systems, servers and virtual machines are not visible to the developer. The logic still executes on an operating system running on physical servers; the difference is that managing that infrastructure is the sole responsibility of the service provider, and the app developer focuses only on writing code.

There are two different approaches to serverless applications:

Backend as a service (BaaS)
Function as a service (FaaS)

  1. Backend as a Service (BaaS)
    A growing number of third-party services provide server-side logic and maintain their own internal state. This has led to applications that have no server-side or application-specific logic of their own and instead depend on third-party services for everything.

Examples of such third-party services include Auth0, AWS Cognito (authentication as a service), Amazon Kinesis and Keen IO (analytics as a service).

  2. Function as a Service (FaaS)
    FaaS is the modern alternative to traditional architecture for applications that still require server-side logic. With Function as a Service, developers focus on implementing stateless functions that are triggered by events and communicate with the outside world.

FaaS is mostly used together with a microservices architecture, since each function can be developed, deployed and scaled independently. AWS Lambda and Google Cloud Functions are examples of FaaS platforms.
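A FaaS function is typically just a handler that receives an event and returns a result without keeping state between calls. A minimal sketch in Python, assuming AWS Lambda behind an API Gateway proxy integration, which passes query parameters in `queryStringParameters` (null when there are none) and expects a response with `statusCode` and `body`; the greeting logic is purely illustrative.

```python
import json


def handler(event, context):
    """Stateless function: everything it needs arrives in the event."""
    # queryStringParameters is None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

Since no state survives between invocations, the platform is free to run as many copies of this function in parallel as the incoming traffic requires.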

Pros of Serverless applications
Serverless applications can redefine the way business is done in the modern age, and they have some distinct advantages over traditional cloud platforms. Here are a few:

🔹 Highly Scalable
The flexible nature of serverless architecture makes it ideal for scaling applications. The vendor runs each function in its own container and scales and optimizes it automatically. Moreover, unlike with a traditional cloud setup, you don’t need to purchase a fixed amount of resources up front, so you can be as flexible as possible.

🔹 Cost-Effective
Organizations don’t need to spend thousands of dollars on hardware, nor do they need to pay engineers to maintain it. The pricing model of serverless applications is execution-based: the organization is charged for the executions it actually makes.

Each execution is metered: its price depends on how long it runs and on the memory it requires. The cost of keeping a physical or virtual server running for tasks such as presence detection, access authorization and image processing is eliminated.
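Duration-and-memory billing of this kind is commonly expressed in GB-seconds (allocated memory in GB multiplied by execution time in seconds). A minimal sketch of a cost estimate, with hypothetical prices passed in as parameters rather than quoted from any vendor:

```python
def invocation_cost(memory_mb: int, duration_ms: float,
                    price_per_gb_second: float, price_per_request: float) -> float:
    """Estimated cost of one invocation: compute (GB-seconds) plus a flat request fee."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * price_per_gb_second + price_per_request


def monthly_cost(invocations: int, memory_mb: int, duration_ms: float,
                 price_per_gb_second: float, price_per_request: float) -> float:
    """Estimated monthly bill; nothing is charged when there are no invocations."""
    return invocations * invocation_cost(memory_mb, duration_ms,
                                         price_per_gb_second, price_per_request)
```

For example, one million invocations of a 512 MB function running 200 ms each consume 100,000 GB-seconds, so halving either the memory or the duration halves the compute part of the bill.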

🔹 Focuses on user experience
Because companies no longer have to think about maintaining servers, they can focus on more productive work, such as developing and improving customer service features. According to a recent survey, about 56% of users are either using serverless applications or planning to do so within the next six months.

Moreover, the money companies save by not maintaining hardware can be used to improve their customer service and the features of their apps.

🔹 Ease of migration
It is easy to get started with serverless applications by porting individual features and running them as on-demand events. For example, a video plugin in a CMS needs to transcode videos into different formats and bitrates. Doing this on a WordPress server might not be a good fit, since that server’s resources are dedicated to serving pages rather than encoding video.

Serverless functions are also well suited to handling metadata encoding and creation. Similarly, they can take over the work of other plugins that are often prone to critical vulnerabilities.

Cons of serverless applications
Despite some clear benefits, serverless applications are not suitable for every use case. Here are the main things an organization should keep in mind when opting for serverless applications.

🔹 Complete dependence on third-party vendor
In the realm of serverless applications, the third-party vendor is king, and organizations have no option but to play by its rules. For example, an application built on AWS Lambda is not easy to port to Azure. The same goes for programming languages: support varies between providers, and Python and Node.js developers have the widest choice among existing serverless options.

Therefore, if you are planning to consider serverless applications for your next project, make sure that your vendor has everything needed to complete the project.

🔹 Challenges in debugging with traditional tools
Debugging is hard, especially for large enterprise applications made up of many individual functions. Traditional debugging tools don’t carry over well, and there is no option to attach a debugger to a function running in the public cloud. The organization can either debug locally or rely on logging instead. In addition, serverless DevOps tools do not support quickly deploying small bits of code into running applications.
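Because a debugger cannot be attached to the deployed function, a common workaround is to keep the handler a plain function that can be invoked locally with a sample event, and to emit structured log lines that stay useful once it runs in the cloud (on AWS Lambda, standard logging output ends up in CloudWatch Logs). A minimal sketch; the order-total logic and event shape are hypothetical:

```python
import json
import logging

logger = logging.getLogger("orders")  # hypothetical function name
logger.setLevel(logging.INFO)


def handler(event, context):
    # Structured log lines are the main debugging aid once the function is deployed.
    logger.info(json.dumps({"stage": "received", "keys": sorted(event)}))
    total = sum(item["price"] * item["qty"] for item in event.get("items", []))
    logger.info(json.dumps({"stage": "computed", "total": total}))
    return {"total": total}


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Local invocation with a hand-written sample event: a regular debugger
    # (e.g. breakpoint()) can be used here, unlike in the public cloud.
    sample_event = {"items": [{"price": 2.5, "qty": 2}, {"price": 1.0, "qty": 3}]}
    print(handler(sample_event, context=None))
```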

Hunter Krajcik

1627136040

The Road to a Serverless ML Pipeline in Production — Part II

The MLOps architecture we designed and how it’s implemented

In Part I, I explained the process we went through in creating a fully automated MLOps architecture, and our decision-making process, considering our needs and the stack we were already using.

In this post, I’ll show you the solution we ended up with, including code examples of how to implement it yourself.

Let’s look at the architecture again…

Nutrino’s MLOps architecture (Image by Author)

How does the new pipeline work?

Data scientists work in a research environment on their notebooks (we use Zeppelin) to explore and develop their models. Once they’ve figured out the model they want to write, they go to the model’s project in our source control and start developing the required scripts — the model’s inference service, the training script, and the validation script.

Using PyCharm, the data scientists can work locally (with a local Spark) or against a remote EMR cluster to run and test their scripts with all the data they’re used to working with in the research environment. They can also easily write unit tests for any of the model parts (serving, training, validation, etc.).
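As an illustration of what such tests can look like, here is a minimal `unittest` sketch for a model’s inference function. `predict` and its features (`num_items`, `is_weekend`) are hypothetical stand-ins, not Nutrino’s code.

```python
import unittest


def predict(features: dict) -> float:
    """Hypothetical inference function: a fixed linear score clipped to [0, 1]."""
    score = 0.1 * features.get("num_items", 0) + 0.2 * features.get("is_weekend", 0)
    return min(max(score, 0.0), 1.0)


class PredictTest(unittest.TestCase):
    def test_score_is_bounded(self):
        # Large inputs are clipped to 1.0; missing features default to 0.
        self.assertEqual(predict({"num_items": 100}), 1.0)
        self.assertEqual(predict({}), 0.0)

    def test_weekend_increases_score(self):
        self.assertGreater(predict({"num_items": 1, "is_weekend": 1}),
                           predict({"num_items": 1, "is_weekend": 0}))
```

Saved in the model’s project, tests like these can be run with `python -m unittest`, locally in the IDE or as the first step of the CI/CD process.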

We chose to use git tags to trigger the model’s CI/CD: when a data scientist pushes a tag with the new version number, it triggers a CI/CD process that does the following:

  1. Run all the tests in the project.
  2. If they pass, copy the training and validation scripts to a specific S3 bucket (hereafter models_bucket) under the path /<model_name>/<version_from_tag>.
  3. Deploy the model service.
  4. Call mlflow.create_registered_model, passing <model_name>_v<version_from_tag> as the model name (e.g. my_model_v1).
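The naming conventions in steps 2 and 4 can be captured in a small helper that a CI job might call after reading the pushed tag. A minimal sketch, assuming tags of the form `1` or `v1` (the article only says the tag carries the new version number); `release_targets` is a hypothetical name.

```python
import re

# Assumption: release tags are a bare integer or an integer prefixed with "v" ("1", "v1").
TAG_PATTERN = re.compile(r"v?(\d+)")


def release_targets(model_name: str, git_tag: str) -> dict:
    """Derive the S3 prefix (step 2) and registered model name (step 4) from a git tag."""
    match = TAG_PATTERN.fullmatch(git_tag)
    if match is None:
        raise ValueError(f"not a version tag: {git_tag!r}")
    version = match.group(1)
    return {
        # Scripts are copied to models_bucket under /<model_name>/<version_from_tag>.
        "s3_prefix": f"/{model_name}/{version}",
        # The registered model is named <model_name>_v<version_from_tag>.
        "registered_model_name": f"{model_name}_v{version}",
    }
```

For example, the tag `v1` on `my_model` yields the S3 prefix `/my_model/1` and the registered model name `my_model_v1`, while a non-version tag such as `latest` is rejected before anything is uploaded or registered.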
