Data does not always come in tabular form. As we enter the big data era, data arrives in diverse formats, including images, text, graphs, and many more.

Because these formats vary so much from one dataset to another, it is essential to preprocess the data into a format that computers can read.

In this article, I will show you how to preprocess text data using Python. As mentioned in the title, all you need is NLTK and the re library.

To show you how this works, I will use a dataset from a Kaggle competition called "Real or Not? NLP with Disaster Tweets".
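Before going through the dataset, here is a minimal sketch of the kind of regex-based cleaning that tweets usually need. It is an illustration under stated assumptions, not the exact pipeline used later: the function name `clean_tweet` and the tiny `STOPWORDS` set are hypothetical, and the sample tweet is made up. With NLTK installed, you could swap in the fuller English list from `nltk.corpus.stopwords.words("english")` after running `nltk.download("stopwords")` once.

```python
import re

# Hypothetical, deliberately tiny stopword set for illustration only.
# NLTK's nltk.corpus.stopwords.words("english") provides a much fuller list.
STOPWORDS = {"a", "an", "the", "is", "are", "in", "on", "at", "of", "to", "and"}


def clean_tweet(text: str) -> str:
    """Lowercase a tweet, strip URLs, mentions, HTML entities and non-letters, drop stopwords."""
    text = text.lower()
    # Remove URLs such as http://t.co/abc or www.example.com
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    # Remove @mentions entirely
    text = re.sub(r"@\w+", " ", text)
    # Remove HTML entities such as &amp;
    text = re.sub(r"&\w+;", " ", text)
    # Replace everything except lowercase letters and whitespace
    # (this also drops digits and the '#' of hashtags, keeping the hashtag word)
    text = re.sub(r"[^a-z\s]", " ", text)
    # Split on whitespace and keep only non-stopword tokens
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


if __name__ == "__main__":
    sample = "Forest fire near La Ronge Sask. Canada #wildfire @user http://t.co/abc &amp; more"
    print(clean_tweet(sample))
    # prints "forest fire near la ronge sask canada wildfire more"
```

The order of the substitutions matters: URLs and mentions are removed before the catch-all character filter, otherwise the filter would break them into meaningless fragments such as "http" and "t co".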


How to Clean Text Data with Python