Zero-Shot Text Classification with Hugging Face

A few weeks ago I was implementing a POC in which one of the requirements was to detect text sentiment in an unsupervised way, that is, without preparing training data and building a model in advance. More specifically, it was about data extraction: based on some predefined topics, my task was to automate information extraction from text data. While researching the best ways to solve this problem, I found out that the Hugging Face transformers library supports zero-shot text classification.

What is zero-shot text classification? See the post Zero-Shot Learning in Modern NLP. There is a live demo from the Hugging Face team, along with a sample Colab notebook. In simple words, a zero-shot model lets us classify data that wasn’t used to build the model. In practice, the model was built by someone else, and we run it against our own data.
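To make the idea concrete, here is a minimal sketch of a zero-shot call with the transformers pipeline. The helper names `build_classifier` and `label_scores` are hypothetical and not taken from the Hugging Face notebook. The pipeline downloads a pretrained model chosen by the library on first use, so creating it needs network access.

```python
def build_classifier():
    """Create the zero-shot pipeline (downloads a pretrained model on first use)."""
    from transformers import pipeline  # imported lazily because it is a heavy dependency
    return pipeline("zero-shot-classification")


def label_scores(classifier, text, candidate_labels):
    """Return a {label: score} mapping for one text.

    `classifier` is any callable with the zero-shot pipeline interface:
    given a text and a list of labels, it returns a dict with parallel
    "labels" and "scores" lists.
    """
    result = classifier(text, candidate_labels)
    return dict(zip(result["labels"], result["scores"]))
```

With the real pipeline you would call something like `label_scores(build_classifier(), "some text", ["technology", "politics", "sports"])`, where the text and the labels are placeholders for your own data. No training step is involved: the candidate labels are supplied at call time.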

I thought it would be useful to show an example in which I fetch Twitter messages and run classification to group them into topics. This can be used as a starting point for more complex use cases.

I’m using the GetOldTweets3 library to scrape Twitter messages. Zero-shot classification with transformers is straightforward; I followed the Colab example provided by Hugging Face.

List of imports:

import GetOldTweets3 as got
import pandas as pd
from tqdm import tqdm
import matplotlib.pyplot as plt
import seaborn as sns

from transformers import pipeline
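As a rough sketch of how classification and grouping could fit together, the helper below turns zero-shot results into a pandas DataFrame with one topic per message. The function name `assign_topics`, the `min_score` threshold and the "other" bucket are assumptions made for illustration, not part of the original notebook. The `classifier` argument is expected to behave like `pipeline("zero-shot-classification")`.

```python
import pandas as pd


def assign_topics(classifier, messages, topics, min_score=0.5):
    """Label each message with its best-scoring topic.

    `classifier` must behave like the transformers zero-shot pipeline:
    called with one text and a list of labels, it returns a dict whose
    "labels" and "scores" lists are sorted from best to worst.
    Messages whose best score is below `min_score` (an assumed cutoff)
    are put into an "other" bucket.
    """
    rows = []
    for message in messages:
        result = classifier(message, topics)
        label, score = result["labels"][0], result["scores"][0]
        rows.append({
            "message": message,
            "topic": label if score >= min_score else "other",
            "score": score,
        })
    return pd.DataFrame(rows, columns=["message", "topic", "score"])
```

Calling `assign_topics(classifier, messages, topics)["topic"].value_counts()` then gives the number of messages per topic, which is a convenient input for a bar chart.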

