Finding trending Meetup topics using Streaming Data, Named Entity Recognition and Zeppelin Notebooks
We started out as a working group from bigdata.ro. The team consisted of Valentina Crisan, Ovidiu Podariu, Maria Catana, Cristian Stanciulescu, Edwin Brinza and me, Andrei Deusteanu. Our main goal was to learn and practice Spark Structured Streaming, Machine Learning and Kafka. We designed the entire use case and then built the architecture from scratch.
Since Meetup.com provides data through a real-time API, we used it as our main data source. We did not use the data for commercial purposes, only for testing.
This is a learning story. We did not really know from the beginning what would be possible and what would not. Looking back, some of the steps could have been done better. But, hey, that's how life works in general.
The problems we tried to solve:
For this we developed 2 sets of visualizations:
The first 2 elements are common to both sets of visualizations. This is the part that reads data from the Meetup.com API and saves it into 2 Kafka topics.
3. The Spark ML — NER Annotator reads data from the events Kafka topic, applies a Named Entity Recognition pipeline built with Spark NLP, and saves the annotated data to the Kafka topic TOPIC_KEYWORDS. The notebook with the code can be found here.
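Conceptually, this step turns each event description into a list of entity keywords and republishes them as a new message. The minimal stdlib sketch below illustrates that payload transformation only; the real job runs on Spark Structured Streaming with a pretrained Spark NLP NER model, and the fixed entity set and field names here are illustrative assumptions:

```python
import json

# Toy stand-in for the NER model: a fixed set of entities we can recognize.
# The real pipeline uses a pretrained Spark NLP NER annotator instead.
KNOWN_ENTITIES = {"Spark", "Kafka", "Zeppelin", "Bucharest"}

def annotate_event(raw_message: str) -> str:
    """Take one JSON event from the 'events' topic and return the
    JSON payload that would be published to TOPIC_KEYWORDS."""
    event = json.loads(raw_message)
    tokens = event.get("description", "").split()
    keywords = [t.strip(".,") for t in tokens if t.strip(".,") in KNOWN_ENTITIES]
    return json.dumps({"event_id": event["event_id"], "keywords": keywords})

# Example: one event message in, one annotated message out.
msg = json.dumps({"event_id": "ev-1",
                  "description": "Intro to Spark and Kafka in Bucharest."})
print(annotate_event(msg))
# {"event_id": "ev-1", "keywords": ["Spark", "Kafka", "Bucharest"]}
```

The important point is the shape of the output message: each event carries an array of keywords, which is why the KSQL step later has to explode that array into one row per keyword.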
4. Using KSQL, we create 2 subsequent streams to transform the data, and finally 1 table that Spark uses for the visualization. In Big Data architectures, SQL engines only build logical objects that assign metadata to the physical-layer objects; in our case, these were the streams we built on top of the topics. First, we map the data from TOPIC_KEYWORDS to a new stream, KEYWORDS. Then, using a CREATE STREAM ... AS SELECT, we create a new stream, EXPLODED_KEYWORDS, that explodes the keyword arrays, giving us 1 row per keyword. Next, we count the occurrences of each keyword and save the result into a table, KEYWORDS_COUNTED. The KSQL code for setting up the streams and the table can be found here: Kafka — Detailed Architecture.
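As a sketch, the KSQL statements for those three steps could look like the following. The column names and the JSON schema are assumptions, and EXPLODE is the table function available in newer ksqlDB releases; the actual statements we used are in the linked document:

```sql
-- Register a stream over the annotated topic (schema assumed).
CREATE STREAM KEYWORDS (EVENT_ID VARCHAR, KEYWORDS ARRAY<VARCHAR>)
  WITH (KAFKA_TOPIC = 'TOPIC_KEYWORDS', VALUE_FORMAT = 'JSON');

-- Explode the keyword arrays: one row per keyword.
CREATE STREAM EXPLODED_KEYWORDS AS
  SELECT EVENT_ID, EXPLODE(KEYWORDS) AS KEYWORD
  FROM KEYWORDS;

-- Count the occurrences of each keyword into a table.
CREATE TABLE KEYWORDS_COUNTED AS
  SELECT KEYWORD, COUNT(*) AS OCCURRENCES
  FROM EXPLODED_KEYWORDS
  GROUP BY KEYWORD;
```

Because KEYWORDS_COUNTED is a table rather than a stream, each keyword's count is continuously updated in place, which is exactly what the visualization needs to read.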