In this article, we build a small data science project: a network graph of Twitter users. We collect data from Twitter because it holds an enormous amount of data and makes that data available through its API. We use Java because it is a compiled language with a strong concurrency library. Finally, we summarise the data as a graph using Gephi, an open-source graph platform.

Graph generated with Gephi from sample tweets collected on 9/14/20. Tweet language: TH (Thai).

We need the following for the project:

  • Java IDE. Our choice is Eclipse.
  • Twitter4J libraries. Get the jar files and the tutorial from here.
  • Twitter developer account. We need this in order to call the Twitter API. Several online resources explain how to get access.
  • Any JDBC-compliant database. We use SQLite. It is very lightweight: no software installation and no daemon process are required. Just copy the SQLite jar file into the project. However, it has some limitations that require workarounds.
  • Gephi, an open-source graph tool. Download it from here.
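
To illustrate how little setup SQLite needs over JDBC, here is a minimal sketch. The file name `twitter.db` and the `users`, `tweets`, and `friends` tables with their columns are hypothetical names chosen for this example, not the schema used later in the project. It assumes the SQLite JDBC driver jar is on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

/** Sketch: open a SQLite file over JDBC and create illustrative tables. */
final class TwitterDb {
    // Hypothetical file name; SQLite creates the file on the first connection.
    static final String JDBC_URL = "jdbc:sqlite:twitter.db";

    // Hypothetical schema: table and column names are assumptions for this example.
    static List<String> schemaStatements() {
        return List.of(
            "CREATE TABLE IF NOT EXISTS users ("
                + "id INTEGER PRIMARY KEY, screen_name TEXT NOT NULL)",
            "CREATE TABLE IF NOT EXISTS tweets ("
                + "id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, lang TEXT, text TEXT)",
            "CREATE TABLE IF NOT EXISTS friends ("
                + "user_id INTEGER NOT NULL, friend_id INTEGER NOT NULL, "
                + "PRIMARY KEY (user_id, friend_id))"
        );
    }

    // Runs every DDL statement on the given connection.
    static void createSchema(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            for (String ddl : schemaStatements()) {
                st.executeUpdate(ddl);
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        // No server or daemon: the connection just opens a local file.
        try (Connection conn = DriverManager.getConnection(JDBC_URL)) {
            createSchema(conn);
        }
    }
}
```

Because the code only touches the standard `java.sql` interfaces, switching to another JDBC-compliant database should mostly be a matter of changing the URL and adjusting the column types.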

By the way, readers can use whatever language or platform they like, such as Python or Node.js. Our sample code is in Java.

The steps to build the Twitter network graph are as follows:

  • Collect tweets and users and save them into a database.
  • Retrieve users’ friends. For each user collected in the previous step, get that user’s friends and save them into tables.
  • Filter for the data we’d like to see in the graph.
  • Export the data to CSV files.
  • Import the CSV files into Gephi, then apply some formatting and a layout to get a Twitter social graph.
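
The filter and export steps above can be sketched as follows. This is only an illustration under stated assumptions: the class `GephiCsvExport`, its methods, the sample IDs and names, and the output file names are all hypothetical, and the data lives in in-memory maps rather than in the database. The `Id,Label` and `Source,Target` headers are the column names Gephi's spreadsheet import looks for in node and edge tables.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch: turn user/friend data into node and edge CSV lines for Gephi. */
final class GephiCsvExport {

    // One "Id,Label" row per user. Assumes screen names contain only
    // letters, digits and underscores, so no CSV quoting is applied.
    static List<String> nodeLines(Map<Long, String> screenNames) {
        List<String> lines = new ArrayList<>();
        lines.add("Id,Label");
        for (Map.Entry<Long, String> e : screenNames.entrySet()) {
            lines.add(e.getKey() + "," + e.getValue());
        }
        return lines;
    }

    // One "Source,Target" row per friendship, kept only when both
    // endpoints are in the set of users we want to see in the graph.
    static List<String> edgeLines(Map<Long, List<Long>> friendsByUser, Set<Long> keep) {
        List<String> lines = new ArrayList<>();
        lines.add("Source,Target");
        for (Map.Entry<Long, List<Long>> e : friendsByUser.entrySet()) {
            if (!keep.contains(e.getKey())) {
                continue;
            }
            for (Long friend : e.getValue()) {
                if (keep.contains(friend)) {
                    lines.add(e.getKey() + "," + friend);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample data standing in for rows read from the database.
        Map<Long, String> names = new LinkedHashMap<>();
        names.put(1L, "alice");
        names.put(2L, "bob");

        Map<Long, List<Long>> friends = new LinkedHashMap<>();
        friends.put(1L, List.of(2L, 3L)); // user 3 was never collected
        friends.put(2L, List.of(1L));

        Files.write(Paths.get("nodes.csv"), nodeLines(names));
        Files.write(Paths.get("edges.csv"), edgeLines(friends, names.keySet()));
    }
}
```

Dropping edges that point to users outside the collected set is one possible filter; it keeps the graph limited to accounts we actually have data for.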

#graph-analytics #data-science #twitter-data #java #sql

Building a network graph from Twitter data