A Big Data Analysis of Meetup Events using Spark NLP, Kafka and Vegas

Finding trending Meetup topics using Streaming Data, Named Entity Recognition and Zeppelin Notebooks.

We started out as a working group from bigdata.ro. The team consisted of Valentina Crisan, Ovidiu Podariu, Maria Catana, Cristian Stanciulescu, Edwin Brinza and me, Andrei Deusteanu. Our main purpose was to learn and practice Spark Structured Streaming, Machine Learning and Kafka. We designed the entire use case and then built the architecture from scratch.

Since Meetup.com provides data through a real-time API, we used it as our main data source. We did not use the data for commercial purposes, just for testing.

This is a learning case story. We did not really know from the beginning what would be possible or not. Looking back, some of the steps could have been done better. But, hey, that’s how life works in general.

The problems we tried to solve:

  • Allow meetup organizers to identify trending topics related to their meetup. We computed Trending Topics based on the description of the events matching the tags of interest to us. We did this using the John Snow Labs Spark NLP library for extracting entities.
  • Determine which Meetup events attract the most responses within our region. Therefore we monitored the RSVPs for meetups based on certain tags, related to our domain of interest — Big Data.

For this we developed 2 sets of visualizations:

  • Trending Keywords
  • RSVPs Distribution

Architecture

The first 2 elements are common to both sets of visualizations. This is the part that reads data from the Meetup.com API and saves it in 2 Kafka topics.

  1. The Stream Reader script fetches data on Yes RSVPs filtered by certain tags from the Meetup Stream API. It then selects the relevant columns that we need. After that it saves this data into the rsvps_filtered_stream Kafka topic.
  2. For each RSVP, the Stream Reader script then fetches the event data for it, but only if the event_id does not already exist in the events.idx file. This way we make sure that we read event data only once. The setup for the Stream Reader script can be found in: Install Kafka and fetch RSVPs.

  3. The Spark ML - NER Annotator reads data from the Kafka topic events and then applies a Named Entity Recognition Pipeline with Spark NLP. Finally it saves the annotated data in the Kafka topic TOPIC_KEYWORDS. The Notebook with the code can be found here.

  4. Using KSQL we create 2 successive streams to transform the data and finally 1 table that Spark uses for the visualization. In Big Data architectures, SQL engines only build logical objects that assign metadata to the physical-layer objects. In our case these were the streams we built on top of the topics. We link data from TOPIC_KEYWORDS to a new stream, called KEYWORDS, via KSQL. Then, using a Create as Select, we create a new stream, EXPLODED_KEYWORDS, that explodes the data, since all of the keywords were in an array. Now we have 1 row for each keyword. Next, we count the occurrences of each keyword and save the result into a table, KEYWORDS_COUNTED. The steps to set up the streams and the table, together with the KSQL code, can be found here: Kafka - Detailed Architecture.
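
To make the data flow more concrete, here is a minimal plain-Python sketch of two ideas from the steps above: the "fetch each event only once" check from step 2 and the explode-then-count logic from step 4. It is not our actual Stream Reader or KSQL code. The function names, the `keywords` field name and the one-id-per-line layout of the index file are assumptions made for illustration only. Kafka itself is left out so that the example runs on its own.

```python
from collections import Counter
from pathlib import Path


def is_new_event(event_id: str, index_path: Path) -> bool:
    """Return True the first time an event_id is seen and record it.

    Assumption: the index file (events.idx in the article) stores one
    event_id per line. The real file layout may differ.
    """
    seen = set(index_path.read_text().split()) if index_path.exists() else set()
    if event_id in seen:
        return False
    with index_path.open("a") as index_file:
        index_file.write(event_id + "\n")
    return True


def explode_keywords(rows):
    """Turn one row per event (keywords in an array) into one row per keyword.

    This mirrors what the EXPLODED_KEYWORDS stream does conceptually.
    The "keywords" field name is hypothetical.
    """
    for row in rows:
        for keyword in row["keywords"]:
            yield {"event_id": row["event_id"], "keyword": keyword}


def count_keywords(exploded_rows) -> Counter:
    """Count occurrences per keyword, like the KEYWORDS_COUNTED table."""
    return Counter(row["keyword"] for row in exploded_rows)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp_dir:
        index_path = Path(tmp_dir) / "events.idx"
        for event_id in ["e1", "e2", "e1"]:
            # Only fetch event data when the id is not yet in the index.
            action = "fetch" if is_new_event(event_id, index_path) else "skip"
            print(event_id, action)

    annotated = [
        {"event_id": "e1", "keywords": ["Kafka", "Spark"]},
        {"event_id": "e2", "keywords": ["Kafka"]},
    ]
    print(count_keywords(explode_keywords(annotated)).most_common())
```

In the real pipeline the counting runs continuously over a stream, so the counts keep updating as new events arrive. The sketch only shows the shape of the transformation on a small fixed batch.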
