The full code is available on GitHub, Colab, and Kaggle.
This article walks through analysing tweets end to end, from collecting Twitter data (NO API, NO LIMITATIONS!) to visualizing tweets with PCA and K-Means, in less than 100 lines of code (hopefully!) using some lesser-known libraries: Twint, TextHero, and SweetViz!
TextHero, Twint, SweetViz
Let’s start right away.
Let’s collect data from Twitter using the twint library.
Question 1: Why are we using twint instead of Twitter’s Official API?
Ans: Because twint requires no authentication, no API, and importantly no limits.
To install twint on Google Colab and scrape a user's tweets:
!pip3 install twint
import twint

# Create a function to scrape a user's account.
def scrape_user():
    print("Fetching Tweets")
    c = twint.Config()
    # choose username (optional)
    c.Username = input('Username: ')  # I used a different account for this project. Changed the username to protect the user's privacy.
    # choose beginning time (narrows the results)
    c.Since = input('Date (format: "%Y-%m-%d %H:%M:%S"): ')
    # store the results as a CSV file
    c.Store_csv = True
    # file name to be saved as
    c.Output = input('File name: ')
    twint.run.Search(c)

# run the above function
scrape_user()
print('Scraping Done!')
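Since the prompt in scrape_user() asks for the date in one fixed format, it can help to check the input before handing it to twint. Here is a minimal sketch using only the standard library. The helper name `parse_since` is hypothetical and not part of twint, and accepting a bare date padded to midnight is my own assumption about what is convenient.

```python
from datetime import datetime

# Format quoted in the prompt of scrape_user().
SINCE_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_since(text):
    """Return text normalised to SINCE_FORMAT, or raise ValueError.

    Hypothetical helper, not part of twint: it also accepts a bare
    date such as "2021-01-01" and pads it to midnight.
    """
    text = text.strip()
    for fmt in (SINCE_FORMAT, "%Y-%m-%d"):
        try:
            return datetime.strptime(text, fmt).strftime(SINCE_FORMAT)
        except ValueError:
            continue
    raise ValueError(f"expected a date like 2021-01-01 00:00:00, got {text!r}")


print(parse_since("2021-01-01"))
print(parse_since(" 2021-03-15 08:30:00 "))
```

Inside scrape_user(), this could wrap the existing prompt, for example `c.Since = parse_since(input('Date: '))`, so a mistyped date fails immediately instead of after the scrape has started.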
# pandas to read our csv file
import pandas as pd
# display() is built into Colab/Jupyter; importing it lets the script run elsewhere too
from IPython.display import display

# save the csv file into a dataframe 'df', merging 'date' and 'time' into one 'date_time' column
df = pd.read_csv('/content/elonmusk.csv', low_memory=False, parse_dates=[['date', 'time']])
# make a deep copy so that later changes to df don't affect the copy
df_copy = df.copy(deep=True)
print('-----------------------------------------------------------------------------')
print('Dataframe:')
print('-----------------------------------------------------------------------------')
# check the whole df
display(df)
print('-----------------------------------------------------------------------------')
print('Dataframe Info: ')
print('-----------------------------------------------------------------------------')
# check an overview of the df (info() prints directly and returns None, so it needs no display())
df.info()
print('-----------------------------------------------------------------------------')
print('Dataframe details: ')
print('-----------------------------------------------------------------------------')
# gives a quick statistical summary; notice the max and min of retweets_count and so on
display(df.describe())
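The `parse_dates=[['date', 'time']]` argument asks pandas to merge the two columns into a single `date_time` column. Recent pandas releases warn that this nested-list form is deprecated, so here is a minimal sketch of an equivalent step done after loading. The in-memory CSV is made-up sample data standing in for the file twint writes. Only the `date`, `time`, and `retweets_count` column names are taken from the code above.

```python
import io

import pandas as pd

# Made-up rows standing in for the CSV written by twint.
sample_csv = io.StringIO(
    "date,time,retweets_count\n"
    "2021-01-01,09:15:00,12\n"
    "2021-01-02,18:40:30,7\n"
)

df = pd.read_csv(sample_csv)

# Join the two text columns and parse them in one pass.
df["date_time"] = pd.to_datetime(
    df["date"] + " " + df["time"], format="%Y-%m-%d %H:%M:%S"
)
df = df.drop(columns=["date", "time"])

# Confirm the new column is a real datetime and show the latest tweet.
print(df.dtypes)
print(df.sort_values("date_time", ascending=False).head(1))
```

With a proper datetime column in place, sorting by time or filtering to a date range works directly on `date_time`.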
#twitter #nlp #text-analysis #twint #data analysis