At the heart of any sentiment analysis project is a good set of labeled data. Pre-labeled datasets can be found on various sites all over the internet. But…

  • What if you have come up with a custom dataset that has no labels?
  • What if you have to provide those labels before proceeding with your project?
  • What if you are not willing to pay to outsource the task of labeling?

I was recently faced with this very issue while retrieving text data from the Twitter Streaming API for a sentiment analysis project. I quickly discovered annotating the data myself would be a painful task without a good tool. This was the inspiration behind building tortus, a tool that makes it easy to label your text data within a Jupyter Notebook!

Begin by installing tortus and enabling the appropriate notebook extension (nbextension).

$ pip install tortus
$ jupyter nbextension enable --py widgetsnbextension

After opening a Jupyter Notebook, read your data into a pandas DataFrame.

import pandas as pd
movie_reviews = pd.read_csv('movie_reviews.csv')
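
Before annotating, it can help to confirm which column holds the text and to drop rows that have nothing to label. The sketch below uses a tiny in-memory CSV in place of movie_reviews.csv; the column names `id` and `review` are made up for illustration and will likely differ in your file.

```python
import io

import pandas as pd

# Hypothetical stand-in for movie_reviews.csv; the column names are assumptions.
csv_text = """id,review
1,"Loved it, would watch again."
2,"Two hours I will never get back."
3,
"""

movie_reviews = pd.read_csv(io.StringIO(csv_text))

# Drop rows with no text, since there is nothing to label in them.
movie_reviews = movie_reviews.dropna(subset=["review"]).reset_index(drop=True)

print(movie_reviews.columns.tolist())
print(len(movie_reviews))
```

Whatever the text column is called in your own data is the name to pass along when creating the Tortus instance.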

Import the package and create an instance of Tortus.

from tortus import Tortus

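# movie_reviews is the dataframe read above; 'text' stands in for the name of your text column.
tortus = Tortus(movie_reviews, 'text', num_records=10, id_column=None, annotations=None, random=True, labels=['Positive', 'Negative', 'Neutral'])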
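
The arguments num_records=10 and random=True suggest that ten records are drawn at random for an annotation session. A rough pandas equivalent of that idea is sketched below. It is not how tortus is implemented, and the `review` and `label` columns are hypothetical names chosen for the example.

```python
import pandas as pd

# Hypothetical data; in practice this would be the dataframe read from your CSV.
movie_reviews = pd.DataFrame({"review": [f"review number {i}" for i in range(50)]})

labels = ["Positive", "Negative", "Neutral"]

# Draw 10 records at random; random_state only makes the sketch repeatable.
batch = movie_reviews.sample(n=10, random_state=0).copy()

# An empty label column to fill in while annotating.
batch["label"] = None

# Record one label by hand, checking it against the allowed set first.
first = batch.index[0]
choice = "Positive"
if choice not in labels:
    raise ValueError(f"unknown label: {choice}")
batch.loc[first, "label"] = choice

print(batch.head())
```

Doing this by hand for every record is exactly the tedium a widget-based tool is meant to remove.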

Easy Text Annotation in a Jupyter Notebook