The full code is available on GitHub, Colab, and Kaggle.

This article walks through analysing tweets, from collecting Twitter data (no API, no limits!) to visualizing the tweets with PCA and K-Means, in (hopefully!) fewer than 100 lines of code, using some lesser-known libraries: Twint, TextHero, and SweetViz.

TextHero, Twint, SweetViz

Let’s start right away.

0. Scraping Twitter Data using Twint

Let’s collect data from Twitter using the twint library.

Question 1: Why are we using twint instead of Twitter’s Official API?

Ans: Because twint requires no authentication and no API access, and, importantly, imposes no rate limits.

	# to install on Google Colab
	!pip3 install twint

	import twint

	# Create a function to scrape a user's account.
	def scrape_user():
		print ("Fetching Tweets")
		c = twint.Config()
		# choose username (optional)
		c.Username = input('Username: ') # I used a different account for this project. Changed the username to protect the user's privacy.
		# choose beginning time (narrow results)
		c.Since = input('Date (format: "%Y-%m-%d %H:%M:%S"): ')
		# store the scraped tweets in CSV format
		c.Store_csv = True
		# file name to be saved as
		c.Output = input('File name: ')
		twint.run.Search(c)

	# run the above function
	scrape_user()
	print('Scraping Done!')
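
Because the start date comes from `input()`, a mistyped value is easy to miss until the scrape misbehaves. A minimal sketch of checking the string against the format shown in the prompt above, before handing it to `c.Since`, might look like this. The helper name `parse_since` is hypothetical (it is not part of Twint), and it uses only the standard library.

```python
from datetime import datetime

# Format string copied from the input() prompt in the scraper above.
SINCE_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_since(text):
    """Return `text` normalised to SINCE_FORMAT, or raise ValueError.

    Hypothetical helper for illustration; it is not part of Twint.
    """
    # strptime raises ValueError if the text does not match the format.
    parsed = datetime.strptime(text.strip(), SINCE_FORMAT)
    return parsed.strftime(SINCE_FORMAT)
```

Inside `scrape_user()`, the prompt could then be wrapped as `c.Since = parse_since(input('Date (format: "%Y-%m-%d %H:%M:%S"): '))`, so a bad date fails immediately with a `ValueError`.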

1. Reading Data using Pandas

	# pandas to read our csv file
	import pandas as pd

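	# save the csv file into a dataframe 'df'
	df = pd.read_csv('/content/elonmusk.csv', low_memory=False)
	# combine the 'date' and 'time' columns into a single datetime column
	# (passing nested lists to parse_dates is deprecated in recent pandas versions)
	df['date_time'] = pd.to_datetime(df['date'] + ' ' + df['time'])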

	# make a deep copy if you need one, so that later changes to the original df don't affect the copy
	df_copy = df.copy(deep=True)

	print('-----------------------------------------------------------------------------')
	print('Dataframe:')
	print('-----------------------------------------------------------------------------')
	# check the whole df
	display(df)
	print('-----------------------------------------------------------------------------')
	print('Dataframe Info: ')
	print('-----------------------------------------------------------------------------')
	# check an overview of the df (info() prints directly and returns None, so no display() is needed)
	df.info()
	print('-----------------------------------------------------------------------------')
	print('Dataframe details: ')
	print('-----------------------------------------------------------------------------')
	# gives out quick analysis, notice the max retweets_count and min retweets_count and so on
	display(df.describe())
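
To see which tweets sit behind the max and min `retweets_count` that `describe()` reports, the matching rows can be pulled out directly. The sketch below runs on a tiny made-up CSV so that it is self-contained; the rows are invented for illustration only, and the `tweet` column name is an assumption about the scraped file.

```python
import io

import pandas as pd

# Made-up rows standing in for the scraped CSV (illustration only).
SAMPLE_CSV = """tweet,retweets_count
first example tweet,5
second example tweet,42
third example tweet,0
"""

sample_df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# idxmax/idxmin return the index labels of the largest and smallest counts;
# .loc then fetches the full rows for those labels.
most_retweeted = sample_df.loc[sample_df["retweets_count"].idxmax()]
least_retweeted = sample_df.loc[sample_df["retweets_count"].idxmin()]

print(most_retweeted["tweet"], most_retweeted["retweets_count"])
print(least_retweeted["tweet"], least_retweeted["retweets_count"])
```

The same two lookups should work on the real `df`, provided its columns carry these names.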
