Creating and Deploying a Python Machine Learning Service

This tutorial will help you deploy your own machine learning models and apps. We build a hate speech detection system with scikit-learn and deploy it via Docker on Heroku.

Introduction

Imagine you’re the moderator of a message board or comment section. You don’t want to read everything your users write online, yet you want to be alerted in case a discussion turns sour or people start spewing racial slurs all over the place. So, you decide to build yourself an automated system for hate speech detection.

Text classification via machine learning is an obvious choice of technology. However, turning model prototypes into working services has proven to be a widespread challenge. To help bridge this gap, this four-step tutorial illustrates an exemplary deployment workflow for a hate speech detection app:

  1. Train and persist a prediction model with scikit-learn
  2. Create an API endpoint with firefly
  3. Create a Docker container for this service
  4. Deploy the container on Heroku

The code for this project is available here.


1. Create prediction model

Dataset

The approach is based on the paper Automated Hate Speech Detection and the Problem of Offensive Language by Davidson, Warmsley, Macy and Weber. Their results are based on more than 20,000 labelled tweets, which are available on the corresponding GitHub page.

The .csv file is loaded as a dataframe:

import pandas as pd
import re

df = pd.read_csv('labeled_data.csv', usecols=['class', 'tweet'])

df['tweet'] = df['tweet'].apply(lambda tweet: re.sub('[^A-Za-z]+', ' ', tweet.lower()))

The last line cleans the tweet column by converting all text to lowercase and removing non-alphabetic characters.
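To see what this cleaning step does to a single string, here is a minimal sketch that wraps the same regular expression in a function. The name clean_tweet and the sample tweet are our own illustrations and are not taken from the dataset.

```python
import re


def clean_tweet(tweet):
    """Lowercase the text and replace every run of non-letters with one space."""
    return re.sub('[^A-Za-z]+', ' ', tweet.lower())


# Hypothetical example tweet, not taken from the dataset.
cleaned = clean_tweet("RT @user: You're the WORST!!! #fail 123")
print(repr(cleaned))
```

Mentions, hashtags, digits and punctuation all collapse into single spaces, so a trailing space can remain at the end of the string. This is harmless here because the vectorizer splits the text into words anyway.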

Result

The class attribute can assume three category values: 0 for hate speech, 1 for offensive language and 2 for neither.


Model training

We have to convert our predictors, i.e. the tweet text, into a numeric representation before we can train a machine learning classifier. We can use scikit-learn’s TfidfVectorizer for this task, which transforms texts into a matrix of term-frequency times inverse document-frequency (tf-idf) values, suitable for machine learning. Additionally, we can remove stop words (common words such as the, is, …) from the processing.
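To make the tf-idf idea more concrete, the following pure-Python sketch computes the weights for a toy corpus. It assumes plain whitespace tokenization, a smoothed idf of ln((1 + n) / (1 + df)) + 1 and L2 normalisation per document, which should correspond to TfidfVectorizer's default settings apart from the tokenization. The function name tfidf and the three example sentences are our own.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumptions of this sketch: whitespace tokenization, raw term counts,
    smoothed idf = ln((1 + n) / (1 + df)) + 1, L2 normalisation per document.
    """
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        }
        # Scale each document vector to unit length.
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        rows.append({term: w / norm for term, w in weights.items()})
    return rows


# Toy corpus, purely illustrative.
docs = ["you are great", "you are awful", "great day"]
rows = tfidf(docs)
print(rows[1])
```

In the second document, the rare word "awful" receives a higher weight than "you", which also appears in the first document. This is the effect that makes tf-idf useful for telling texts apart.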

For text classification, support vector machines (SVMs) are a reliable choice. As they are binary classifiers, we will use a One-Vs-Rest strategy, where for each category an SVM is trained to separate this category from all others.
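The One-Vs-Rest idea can be sketched without any library: each class gets its own binary target vector for training, and at prediction time the class whose binary classifier reports the highest score wins. The helper names and the example scores below are hypothetical; scikit-learn's OneVsRestClassifier handles these steps internally.

```python
def one_vs_rest_targets(labels):
    """Build one binary target list per class: 1 for the class, 0 for the rest."""
    classes = sorted(set(labels))
    return {c: [1 if label == c else 0 for label in labels] for c in classes}


def predict_one_vs_rest(scores_per_class):
    """Pick the class whose binary classifier reports the highest score."""
    return max(scores_per_class, key=scores_per_class.get)


# Toy labels using the three classes of the dataset (0, 1, 2).
labels = [0, 1, 1, 2, 1]
targets = one_vs_rest_targets(labels)
print(targets[1])

# Hypothetical decision scores of the three binary classifiers for one text.
print(predict_one_vs_rest({0: -0.3, 1: 1.2, 2: 0.1}))
```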

Both text vectorization and SVM training can be performed in one command by using scikit-learn’s Pipeline feature and defining the respective steps:

from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from stop_words import get_stop_words

clf = make_pipeline(
    TfidfVectorizer(stop_words=get_stop_words('en')),
    OneVsRestClassifier(SVC(kernel='linear', probability=True))
)

clf = clf.fit(X=df['tweet'], y=df['class'])

Now, the performance of the model should be evaluated, e.g. using a cross-validation approach to calculate classification metrics. However, as this tutorial focusses on model deployment, we will skip this step (never do this in an actual project). The same goes for parameter tuning or additional techniques of natural language processing which are described in the original paper.
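As a rough idea of what such an evaluation computes, here is a minimal sketch of per-class precision and recall in plain Python. In a real project, the predictions would come from held-out folds (for instance via scikit-learn's cross_val_predict) and not from the toy lists used below. The function name and the example labels are our own.

```python
def precision_recall(y_true, y_pred, label):
    """Compute precision and recall for one class from two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels, purely illustrative (0 = hate speech, 1 = offensive, 2 = neither).
y_true = [0, 1, 1, 2, 2, 1]
y_pred = [1, 1, 1, 2, 1, 1]

for label in (0, 1, 2):
    print(label, precision_recall(y_true, y_pred, label))
```

Looking at each class separately matters here: a model can score well on the frequent classes while missing the hate speech class entirely, as class 0 does in this toy example.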


Test the model

We can now try a test text and have the model predict the probabilities:

text = "I hate you, please die!"
clf.predict_proba([text.lower()])

Output:

array([[0.64, 0.14, 0.22]])

The numbers in the array correspond to the probabilities for the three categories (hate speech, offensive language, neither).
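If we only need the most likely category and not all three probabilities, a small helper can pair the values with their labels. This is a sketch under the assumption that the probabilities arrive in the class order 0, 1, 2; the names LABELS and most_likely_label are our own.

```python
# Assumed order: class 0, class 1, class 2 of the dataset.
LABELS = ['hate speech', 'offensive language', 'neither']


def most_likely_label(probas):
    """Return the (label, probability) pair with the highest probability."""
    return max(zip(LABELS, probas), key=lambda pair: pair[1])


print(most_likely_label([0.64, 0.14, 0.22]))
```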


Model persistence

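Using the joblib module, we can save the model as a binary object to disk. This will allow us to load and use the model in an application. Note that sklearn.externals.joblib is only bundled with older scikit-learn releases such as the 0.20.2 used here; it was deprecated in 0.21 and later removed, so with newer versions the standalone joblib package has to be imported instead.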

from sklearn import externals

model_filename = 'hatespeech.joblib.z'
externals.joblib.dump(clf, model_filename)

2. Create REST API

Create API endpoint

The Python file app.py loads the model and defines a simple module-level function which wraps the call to the model’s predict_proba function:

from sklearn import externals

model_filename = 'hatespeech.joblib.z'
clf = externals.joblib.load(model_filename)

def predict(text):
    probas = clf.predict_proba([text.lower()])[0]
    return {'hate speech': probas[0],
            'offensive language': probas[1],
            'neither': probas[2]}

Now, we use firefly, a lightweight Python module for deploying functions as a service. For advanced configuration or use in a production environment, Flask or Falcon might be a better choice as they’re well established with a large community. For rapid prototyping, we’re fine with firefly.

We’ll use firefly on the command line to bind the predict function to port 5000 on localhost:

$ firefly app.predict --bind 127.0.0.1:5000

Test API locally

Via curl, we can make a POST request to the created endpoint and obtain a prediction:

$ curl -d '{"text": "Please respect each other."}' http://127.0.0.1:5000/predict

Output:

{"hate speech": 0.04, "offensive language": 0.31, "neither": 0.65}
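The same request can also be sent from Python using only the standard library. The sketch below assumes the service is running locally on port 5000 as described above. The function names are our own, and the explicit Content-Type header is an assumption, since the curl call does not set it.

```python
import json
import urllib.request

# Address of the locally running service from the step above.
PREDICT_URL = 'http://127.0.0.1:5000/predict'


def build_request(text, url=PREDICT_URL):
    """Create a POST request carrying the text as a JSON body."""
    body = json.dumps({'text': text}).encode('utf-8')
    return urllib.request.Request(
        url,
        data=body,
        headers={'Content-Type': 'application/json'},  # assumption, see above
        method='POST',
    )


def predict_remote(text, url=PREDICT_URL):
    """Send the request and decode the JSON answer (needs a running service)."""
    with urllib.request.urlopen(build_request(text, url), timeout=10) as response:
        return json.loads(response.read().decode('utf-8'))


request = build_request('Please respect each other.')
print(request.get_method(), request.full_url)
```

Calling predict_remote('Please respect each other.') should then return a dictionary like the output shown above, provided the firefly process is still running.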

Of course, a full-fledged application would need many more features (logging, input and output validation, exception handling, …) and work steps (documentation, versioning, testing, monitoring, …), but here we’re merely deploying a simple prototype.
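As an example of what input validation could look like, the following sketch checks the text before it would be passed to the model. The function validate_text and the length limit of 280 characters are hypothetical choices for illustration and not part of the tutorial's app.py.

```python
MAX_LENGTH = 280  # hypothetical limit, chosen here only for illustration


def validate_text(text):
    """Return the stripped text, or raise ValueError if it is unusable."""
    if not isinstance(text, str):
        raise ValueError('text must be a string')
    text = text.strip()
    if not text:
        raise ValueError('text must not be empty')
    if len(text) > MAX_LENGTH:
        raise ValueError('text must not exceed %d characters' % MAX_LENGTH)
    return text


print(validate_text('  Please respect each other.  '))
```

A predict function could call validate_text first and let the web layer turn a ValueError into an error response; how that mapping is done depends on the framework in use.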

3. Create a Docker container

Why Docker? A Docker container runs an application in an isolated environment, with all dependencies included, and can be shipped as an image, thus simplifying service setup and scaling.


Build image

We have to configure the contents and start-actions of our container in a file named Dockerfile:

FROM python:3.6
RUN pip install scikit-learn==0.20.2 firefly-python==0.1.15
COPY app.py hatespeech.joblib.z ./

CMD firefly app.predict --bind 0.0.0.0:5000
EXPOSE 5000

The first three lines take python:3.6 as the base image, install scikit-learn and firefly (the same versions as in the development environment) and copy the app and model files into the image. The last two lines tell Docker which command to execute when a container is started and that port 5000 should be exposed.

The build process that creates the image hatespeechdetect is started via:

$ docker build . -t hatespeechdetect

Run container

The run command starts a container derived from an image. Additionally, we’re binding the container’s port 5000 to the host’s port 3000 via the -p option:

$ docker run -p 3000:5000 -d hatespeechdetect

Use prediction service

Now, we can send a request and obtain a prediction:

$ curl -d '{"text": "You are fake news media! Crooked!"}' http://127.0.0.1:3000/predict

Output:

{"hate speech": 0.08, "offensive language": 0.76, "neither": 0.16}

In this example, the container runs locally. Of course the actual purpose is to keep it running at a permanent location, and possibly scale the service by starting multiple containers in an enterprise cluster.

4. Deploy as a Heroku app

One way to make the app publicly available is to use a platform as a service such as Heroku, which supports Docker and, at the time of writing, offered a free basic membership. To use it, we have to register an account and install the Heroku CLI.

Heroku’s application containers expose a dynamic port, which requires an edit in our Dockerfile: We have to change port 5000 to the environment variable PORT:

CMD firefly app.predict --bind 0.0.0.0:$PORT
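In the Dockerfile, the shell expands $PORT when the container starts. If we ever launched the service from a Python script instead, the same lookup might look like the following sketch. The helper bind_address and the fallback to port 5000 for local runs are our own illustrative choices, not part of firefly or Heroku.

```python
import os


def bind_address(environ=None, default_port='5000'):
    """Build the bind address from PORT, falling back to a local default."""
    if environ is None:
        environ = os.environ
    return '0.0.0.0:' + environ.get('PORT', default_port)


# With an explicit mapping, as a platform such as Heroku would provide it.
print(bind_address({'PORT': '8123'}))
# Without PORT, e.g. on a development machine.
print(bind_address({}))
```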

After this change, we are ready for deployment. On the command line, we log in to Heroku (which will prompt us for credentials in the browser) and create an app named hate-speech-detector:

$ heroku login

$ heroku create hate-speech-detector

Then we log in to the container registry. heroku container:push will build an image based on the Dockerfile in the current directory and send it to the Heroku Container registry. After that, we can release the image to the app:

$ heroku container:login

$ heroku container:push web --app hate-speech-detector

$ heroku container:release web --app hate-speech-detector

As before, the API can be addressed via curl. However, this time, the service is not running locally, but is available to the world!

$ curl -d '{"text": "You dumb idiot!"}' https://hate-speech-detector.herokuapp.com/predict

Output:

{"hate speech": 0.26, "offensive language": 0.68, "neither": 0.06}

Now, scaling the app would be just a few clicks or commands away. To be useful in practice, the service still needs to be connected to the message board, a trigger threshold needs to be set and alerting needs to be implemented.
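A trigger threshold could be as simple as the following sketch, which raises an alert when the combined probability of the two problematic categories reaches a limit. Both the rule of summing the two probabilities and the default threshold of 0.5 are assumptions for illustration; a real threshold should be chosen from evaluation results.

```python
def should_alert(prediction, threshold=0.5):
    """Return True if hate speech plus offensive language reaches the threshold."""
    flagged = prediction['hate speech'] + prediction['offensive language']
    return flagged >= threshold


# The two responses shown earlier in this tutorial.
print(should_alert({'hate speech': 0.26, 'offensive language': 0.68, 'neither': 0.06}))
print(should_alert({'hate speech': 0.04, 'offensive language': 0.31, 'neither': 0.65}))
```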

Python Programming for Data Science and Machine Learning

This article provides an overview of Python and its application to Data Science and Machine Learning and why it is important.

Originally published by Chris Kambala at dzone.com

Python is a general-purpose, high-level, object-oriented, and easy to learn programming language. It was created by Guido van Rossum who is known as the godfather of Python.

Python is a popular programming language because of its simplicity, ease of use, open source licensing, and accessibility — the foundation of its renowned community, which provides great support and help in creating tons of packages, tutorials, and sample programs.

Python can be used to develop a wide variety of applications, ranging from web and desktop GUI programs to scientific and mathematical software, machine learning and other big data computing systems.

Let’s explore the use of Python in Machine Learning, Data Science, and Data Engineering.

Machine Learning

Machine learning is a relatively new and evolving system development paradigm that has quickly become a mandatory requirement for companies and programmers to understand and use. See our previous article on Machine Learning for the background. Due to the complex, scientific computing nature of machine learning applications, Python is considered the most suitable programming language. This is because of its extensive and mature collection of mathematics and statistics libraries, extensibility, ease of use and wide adoption within the scientific community. As a result, Python has become the recommended programming language for machine learning systems development.

Data Science

Data science combines cutting-edge computer and storage technologies with data representation and transformation algorithms and scientific methodology to develop solutions for a variety of complex data analysis problems encompassing raw and structured data in any format. A Data Scientist possesses knowledge of solutions to various classes of data-oriented problems and expertise in applying the necessary algorithms, statistics, and mathematical models to create the required solutions. Python is recognized as one of the most effective and popular tools for solving data science problems.

Data Engineering

Data Engineers build the foundations for Data Science and Machine Learning systems and solutions. Data Engineers are technology experts who start with the requirements identified by the data scientist. These requirements drive the development of data platforms that leverage complex data extraction, loading, and transformation to deliver structured datasets that allow the Data Scientist to focus on solving the business problem. Again, Python is an essential tool in the Data Engineer’s toolbox — one that is used every day to architect and operate the big data infrastructure that is leveraged by the data scientist.

Use Cases for Python, Data Science, and Machine Learning

Here are some example Data Science and Machine Learning applications that leverage Python.

  • Netflix uses data science to understand user viewing patterns and behavioral drivers. This, in turn, helps Netflix to understand user likes and dislikes and to predict and suggest relevant items to view.
  • Amazon, Walmart, and Target make heavy use of data science, data mining and machine learning to understand users’ preferences and shopping behavior. This helps both to predict demand for inventory management and to suggest relevant products to users online or via email marketing.
  • Spotify uses data science and machine learning to make music recommendations to its users.
  • Spam filters use data science and machine learning algorithms to detect and block spam emails.

This article provided an overview of Python and its application to Data Science and Machine Learning and why it is important.