Imagine you’re the moderator of a message board or comment section. You don’t want to read everything your users write online, yet you want to be alerted in case a discussion turns sour or people start spewing racial slurs all over the place. So, you decide to build yourself an automated system for hate speech detection.
Text classification via machine learning is an obvious choice of technology. However, turning model prototypes into working services has proven to be a widespread challenge. To help bridge this gap, this four-step tutorial illustrates an exemplary deployment workflow for a hate speech detection app:
The code for this project is available here.
The approach is based on the paper Automated Hate Speech Detection and the Problem of Offensive Language by Davidson, Warmsley, Macy and Weber. Their results are based on more than 20,000 labelled tweets, which are available on the corresponding GitHub page.
The .csv file is loaded as a dataframe:
import pandas as pd
import re

df = pd.read_csv('labeled_data.csv', usecols=['class', 'tweet'])
df['tweet'] = df['tweet'].apply(lambda tweet: re.sub('[^A-Za-z]+', ' ', tweet.lower()))
The last line cleans the tweet column by converting all text to lowercase and removing non-alphabetic characters.
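For instance, the cleaning step turns a raw tweet into a plain lowercase string; a small illustration (the sample tweet below is made up, not from the dataset):

```python
import re

tweet = "RT @user: This is GREAT!!! #winning"
# Lowercase the text, then collapse every run of non-letters into one space
cleaned = re.sub('[^A-Za-z]+', ' ', tweet.lower())
print(cleaned)  # 'rt user this is great winning'
```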
The class attribute can assume three category values: 0 for hate speech, 1 for offensive language and 2 for neither.
We have to convert our predictors, i.e. the tweet text, into a numeric representation before we can train a machine learning classifier. We can use scikit-learn’s TfidfVectorizer for this task, which transforms texts into a matrix of term-frequency times inverse document-frequency (tf-idf) values, suitable for machine learning. Additionally, we can remove stop words (common words such as the, is, …) from the processing.
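As a small illustration of what the vectorizer produces (toy documents, not the tweet data):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat", "the dog sat", "cats and dogs bark"]
vectorizer = TfidfVectorizer()
# Fit the vocabulary and transform the documents into a sparse tf-idf matrix:
# one row per document, one column per vocabulary term
X = vectorizer.fit_transform(docs)
print(X.shape)
print(sorted(vectorizer.vocabulary_))
```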
For text classification, support vector machines (SVMs) are a reliable choice. As they are binary classifiers, we will use a One-Vs-Rest strategy, where for each category an SVM is trained to separate this category from all others.
Both text vectorization and SVM training can be performed in one command by using scikit-learn’s Pipeline feature and defining the respective steps:
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from stop_words import get_stop_words  # provided by the stop-words package

clf = make_pipeline(
    TfidfVectorizer(stop_words=get_stop_words('en')),
    OneVsRestClassifier(SVC(kernel='linear', probability=True))
)

clf = clf.fit(X=df['tweet'], y=df['class'])
Now the performance of the model should be evaluated, e.g. using cross-validation to calculate classification metrics. However, as this tutorial focuses on model deployment, we will skip this step (never do this in an actual project). The same goes for hyperparameter tuning and the additional natural language processing techniques described in the original paper.
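For completeness, a minimal sketch of how such an evaluation could look. The corpus and labels below are made up for illustration (in the real project you would pass df['tweet'] and df['class']), and LinearSVC stands in for the full pipeline to keep the example fast:

```python
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Tiny made-up corpus: class 0 = hostile, class 1 = friendly
texts = ["I hate you", "you are awful", "so terrible", "I despise this",
         "have a nice day", "what a lovely morning", "great work", "wonderful news"]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

pipe = make_pipeline(TfidfVectorizer(), LinearSVC())
# Stratified 2-fold cross-validation, scoring macro-averaged F1 per fold
scores = cross_val_score(pipe, texts, labels, cv=2, scoring='f1_macro')
print(scores)
```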
We can now try a test text and have the model predict the probabilities:
text = "I hate you, please die!"
clf.predict_proba([text.lower()])
Output:
array([0.64, 0.14, 0.22])
The numbers in the array correspond to the probabilities for the three categories (hate speech, offensive language, neither).
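To reduce the array to a single label, pick the category with the highest probability; a minimal sketch using the example output above:

```python
labels = ['hate speech', 'offensive language', 'neither']
probas = [0.64, 0.14, 0.22]  # example output from above

# Index of the highest probability, then look up the corresponding label
best = max(range(len(probas)), key=lambda i: probas[i])
prediction = labels[best]
print(prediction)  # 'hate speech'
```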
Using the joblib module, we can save the model as a binary object to disk. This will allow us to load and use the model in an application.
from sklearn import externals

model_filename = 'hatespeech.joblib.z'
externals.joblib.dump(clf, model_filename)  # in scikit-learn 0.21+, use the standalone joblib package instead
The Python file app.py loads the model and defines a simple module-level function that wraps the call to the model's predict_proba function:
from sklearn import externals

model_filename = 'hatespeech.joblib.z'
clf = externals.joblib.load(model_filename)

def predict(text):
    probas = clf.predict_proba([text.lower()])[0]
    return {'hate speech': probas[0],
            'offensive language': probas[1],
            'neither': probas[2]}
Now we use firefly, a lightweight Python module for function-as-a-service deployment. For advanced configuration or use in a production environment, Flask or Falcon might be a better choice, as they are well established with large communities. For rapid prototyping, firefly is fine.
We’ll use firefly on the command line to bind the predict function to port 5000 on localhost:
$ firefly app.predict --bind 127.0.0.1:5000
Via curl, we can make a POST request to the created endpoint and obtain a prediction:
$ curl -d '{"text": "Please respect each other."}' \
    http://127.0.0.1:5000/predict
Output:
{"hate speech": 0.04, "offensive language": 0.31, "neither": 0.65}
Of course, a full-fledged real application would have many additional features (logging, input and output validation, exception handling, …) and work steps (documentation, versioning, testing, monitoring, …), but here we are merely deploying a simple prototype.
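As one small example of what such input validation might look like, here is a hypothetical wrapper (not part of the actual app) that rejects bad input before it reaches the model:

```python
def validated_predict(predict, text):
    # Reject non-string and empty input before calling the model
    if not isinstance(text, str):
        raise TypeError('text must be a string')
    if not text.strip():
        raise ValueError('text must not be empty')
    return predict(text)

# Usage with a stand-in predict function (the real one wraps the pipeline)
result = validated_predict(lambda t: {'neither': 1.0}, 'Hello there')
print(result)
```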
Why Docker? A Docker container runs an application in an isolated environment, with all dependencies included, and can be shipped as an image, thus simplifying service setup and scaling.
We have to configure the contents and start actions of our container in a file named Dockerfile:
FROM python:3.6
RUN pip install scikit-learn==0.20.2 firefly-python==0.1.15
COPY app.py hatespeech.joblib.z ./
CMD firefly app.predict --bind 0.0.0.0:5000
EXPOSE 5000
The first three lines take python:3.6 as the base image, install scikit-learn and firefly (the same versions as in the development environment) and copy the app and model files into the image. The last two lines tell Docker which command to execute when a container is started and that port 5000 should be exposed.
The build process that creates the image hatespeechdetect is started via:
$ docker build . -t hatespeechdetect
The run command starts a container derived from an image. Additionally, we bind the container's port 5000 to the host's port 3000 via the -p option:
$ docker run -p 3000:5000 -d hatespeechdetect
Now, we can send a request and obtain a prediction:
$ curl -d '{"text": "You are fake news media! Crooked!"}' \
    http://127.0.0.1:3000/predict
Output:
{"hate speech": 0.08, "offensive language": 0.76, "neither": 0.16}
In this example, the container runs locally. Of course the actual purpose is to keep it running at a permanent location, and possibly scale the service by starting multiple containers in an enterprise cluster.
A way to make the app publicly available to others is using a platform as a service such as Heroku, which supports Docker and offers a free basic membership. To use it, we have to register an account and install the Heroku CLI.
Heroku's application containers expose a dynamic port, which requires an edit in our Dockerfile: we have to change port 5000 to the environment variable PORT:
CMD firefly app.predict --bind 0.0.0.0:$PORT
After this change, we are ready for deployment. On the command line, we log in to Heroku (which will prompt us for credentials in the browser) and create an app named hate-speech-detector:
$ heroku login
$ heroku create hate-speech-detector
Then we log in to the container registry. heroku container:push will build an image based on the Dockerfile in the current directory and push it to the Heroku Container Registry. After that, we can release the image to the app:
$ heroku container:login
$ heroku container:push web --app hate-speech-detector
$ heroku container:release web --app hate-speech-detector
As before, the API can be addressed via curl. However, this time, the service is not running locally, but is available to the world!
$ curl -d '{"text": "You dumb idiot!"}' \
    https://hate-speech-detector.herokuapp.com/predict
Output:
{"hate speech": 0.26, "offensive language": 0.68, "neither": 0.06}
Now, scaling the app would be just a few clicks or commands away. Beyond that, the service needs to be connected to the message board, a trigger threshold needs to be set and alerting needs to be implemented.
When installing Machine Learning Services in SQL Server, a few Python packages are installed by default. In this article, we will look at how to get information about those installed Python packages.
When we choose Python as the Machine Learning Service during installation, the following packages are installed in SQL Server:
Install via pip:
$ pip install pytumblr
Install from source:
$ git clone https://github.com/tumblr/pytumblr.git
$ cd pytumblr
$ python setup.py install
A pytumblr.TumblrRestClient is the object you'll make all of your calls to the Tumblr API through. Creating one is this easy:
client = pytumblr.TumblrRestClient(
'<consumer_key>',
'<consumer_secret>',
'<oauth_token>',
'<oauth_secret>',
)
client.info() # Grabs the current user information
Two easy ways to get your credentials are the interactive_console.py tool (if you already have a consumer key & secret) and the Tumblr API console at https://api.tumblr.com/console.

client.info() # get information about the authenticating user
client.dashboard() # get the dashboard for the authenticating user
client.likes() # get the likes for the authenticating user
client.following() # get the blogs followed by the authenticating user
client.follow('codingjester.tumblr.com') # follow a blog
client.unfollow('codingjester.tumblr.com') # unfollow a blog
client.like(id, reblogkey) # like a post
client.unlike(id, reblogkey) # unlike a post
client.blog_info(blogName) # get information about a blog
client.posts(blogName, **params) # get posts for a blog
client.avatar(blogName) # get the avatar for a blog
client.blog_likes(blogName) # get the likes on a blog
client.followers(blogName) # get the followers of a blog
client.blog_following(blogName) # get the publicly exposed blogs that [blogName] follows
client.queue(blogName) # get the queue for a given blog
client.submission(blogName) # get the submissions for a given blog
Creating posts
PyTumblr lets you create all of the post types that Tumblr supports, and a few default options (such as state, tags and format) can be used with any post type. The examples below showcase these defaults alongside each specific post type.
Creating a photo post
Creating a photo post supports a bunch of different options plus the default options described above:
* caption - a string, the user-supplied caption
* link - a string, the "click-through" url for the photo
* source - a string, the url for the photo you want to use (use this or the data parameter)
* data - a list or string, a list of filepaths or a single file path for multipart file upload
#Creates a photo post using a source URL
client.create_photo(blogName, state="published", tags=["testing", "ok"],
source="https://68.media.tumblr.com/b965fbb2e501610a29d80ffb6fb3e1ad/tumblr_n55vdeTse11rn1906o1_500.jpg")
#Creates a photo post using a local filepath
client.create_photo(blogName, state="queue", tags=["testing", "ok"],
tweet="Woah this is an incredible sweet post [URL]",
data="/Users/johnb/path/to/my/image.jpg")
#Creates a photoset post using several local filepaths
client.create_photo(blogName, state="draft", tags=["jb is cool"], format="markdown",
data=["/Users/johnb/path/to/my/image.jpg", "/Users/johnb/Pictures/kittens.jpg"],
caption="## Mega sweet kittens")
Creating a text post
Creating a text post supports the same options as default and just two other parameters:
* title - a string, the optional title for the post. Supports markdown or html
* body - a string, the body of the post. Supports markdown or html
#Creating a text post
client.create_text(blogName, state="published", slug="testing-text-posts", title="Testing", body="testing1 2 3 4")
Creating a quote post
Creating a quote post supports the same options as default and two other parameters:
* quote - a string, the full text of the quote. Supports markdown or html
* source - a string, the cited source. HTML supported
#Creating a quote post
client.create_quote(blogName, state="queue", quote="I am the Walrus", source="Ringo")
Creating a link post
#Create a link post
client.create_link(blogName, title="I like to search things, you should too.", url="https://duckduckgo.com",
description="Search is pretty cool when a duck does it.")
Creating a chat post
Creating a chat post supports the same options as default and two other parameters:
* title - a string, the title of the chat post
* conversation - a string, the text of the conversation/chat, with dialog labels (no html)
#Create a chat post
chat = """John: Testing can be fun!
Renee: Testing is tedious and so are you.
John: Aw.
"""
client.create_chat(blogName, title="Renee just doesn't understand.", conversation=chat, tags=["renee", "testing"])
Creating an audio post
Creating an audio post allows for all default options and has three other parameters. Keep in mind that you must use either the external_url parameter or data, but not both at the same time:
* caption - a string, the caption for your post
* external_url - a string, the url of the site that hosts the audio file
* data - a string, the filepath of the audio file you want to upload to Tumblr
#Creating an audio file
client.create_audio(blogName, caption="Rock out.", data="/Users/johnb/Music/my/new/sweet/album.mp3")
#lets use soundcloud!
client.create_audio(blogName, caption="Mega rock out.", external_url="https://soundcloud.com/skrillex/sets/recess")
Creating a video post
Creating a video post allows for all default options and has three other parameters. Like the other post types, it has a restriction: you cannot use the embed and data parameters at the same time:
* caption - a string, the caption for your post
* embed - a string, the HTML embed code for the video
* data - a string, the path of the file you want to upload
#Creating an upload from YouTube
client.create_video(blogName, caption="Jon Snow. Mega ridiculous sword.",
embed="http://www.youtube.com/watch?v=40pUYLacrj4")
#Creating a video post from local file
client.create_video(blogName, caption="testing", data="/Users/johnb/testing/ok/blah.mov")
Editing a post
Updating a post requires knowing what type of post you're updating. For updates, you can supply any of the options given above for that post type.
client.edit_post(blogName, id=post_id, type="text", title="Updated")
client.edit_post(blogName, id=post_id, type="photo", data="/Users/johnb/mega/awesome.jpg")
Reblogging a Post
Reblogging a post just requires knowing the post id and the reblog key, which is supplied in the JSON of any post object.
client.reblog(blogName, id=125356, reblog_key="reblog_key")
Deleting a post
Deleting just requires that you own the post and have the post id
client.delete_post(blogName, 123456) # Deletes your post :(
A note on tags: When passing tags, as params, please pass them as a list (not a comma-separated string):
client.create_text(blogName, tags=['hello', 'world'], ...)
Getting notes for a post
In order to get the notes for a post, you need to have the post id and the blog that it is on.
data = client.notes(blogName, id='123456')
The results include a timestamp you can use to make future calls.
data = client.notes(blogName, id='123456', before_timestamp=data["_links"]["next"]["query_params"]["before_timestamp"])
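A sketch of how those timestamps could drive a loop that collects every note on a post. The function name and stopping condition are my own; the _links structure follows the example above:

```python
def all_notes(client, blog_name, post_id):
    """Collect notes page by page until the response carries no 'next' link."""
    notes = []
    data = client.notes(blog_name, id=post_id)
    while True:
        notes.extend(data.get('notes', []))
        next_link = data.get('_links', {}).get('next')
        if next_link is None:
            break
        # Follow the pagination cursor returned by the previous call
        data = client.notes(blog_name, id=post_id,
                            before_timestamp=next_link['query_params']['before_timestamp'])
    return notes
```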
# get posts with a given tag
client.tagged(tag, **params)
This client comes with a nice interactive console to run you through the OAuth process, grab your tokens (and store them for future use).
You'll need pyyaml installed to run it, but then it's just:
$ python interactive-console.py
and away you go! Tokens are stored in ~/.tumblr and are also shared by other Tumblr API clients like the Ruby client.
The tests (and coverage reports) are run with nose, like this:
python setup.py test
Author: tumblr
Source Code: https://github.com/tumblr/pytumblr
License: Apache-2.0 license
If you want to become a machine learning professional, you’d have to gain experience using its technologies. The best way to do so is by completing projects. That’s why in this article, we’re sharing multiple machine learning projects in Python so you can quickly start testing your skills and gain valuable experience.
However, before you begin, make sure that you’re familiar with machine learning and its algorithm. If you haven’t worked on a project before, don’t worry because we have also shared a detailed tutorial on one project:
Check out the five latest machine learning technology trends for boosting business growth in 2021, along with the best digital development tools. It is the right time to improve the user experience by bringing these advances into everyday life.
Visit Blog- https://www.xplace.com/article/8743
The Iris dataset is easily one of the most popular machine learning projects in Python. It is relatively small, but its simplicity and compact size make it perfect for beginners. If you haven’t worked on any machine learning projects in Python, you should start with it. The Iris dataset is a collection of flower sepal and petal sizes of the flower Iris. It has three classes, with 50 instances in every one of them.
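A minimal starting point might look like this; the model choice (k-nearest neighbors) and the train/test split are illustrative, not prescriptive:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features, 3 classes
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Fit a simple k-nearest-neighbors classifier and check held-out accuracy
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print(round(knn.score(X_test, y_test), 2))
```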
We've provided sample code in various places, but you should only use it to understand how it works. Implementing the code without understanding it would defeat the purpose of doing the project, so be sure to understand the code well before implementing it.