Businesses today generate data continuously, at massive scale and speed, from a wide range of sources. Whether it is clickstream data from websites, telemetry data from IoT devices or log data from applications, continuously analysing that data helps businesses learn what their customers, applications and products are doing right now and react promptly.
In this article, we explore how Amazon Kinesis Firehose can be used to make new data available in a Snowflake data warehouse in real time, rather than through traditional batch processing over long intervals. The motivation is to help businesses understand how they can build the foundation for powerful real-time applications and analytics use cases.
For those who may not be familiar with these technologies, we’ll provide a very quick summary, but the main focus of this post is demonstrating how to code a simple application that sends data to Snowflake. For this demonstration we’ll use the Twitter streaming API to receive tweets in real time.
Snowflake is a cloud-native, fully relational ANSI SQL data warehouse service available on both AWS and Azure. It offers a consumption-based usage model with unlimited scalability, and it can load both structured and semi-structured data such as JSON, Avro or XML.
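To make the semi-structured part concrete, here is a minimal sketch using the snowflake-connector-python package to create a table with a VARIANT column that can hold raw JSON documents. All connection values below are placeholders:

```python
# Minimal sketch: connect to Snowflake and create a table with a VARIANT
# column for semi-structured JSON. All connection values are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="your_account",      # placeholder account identifier
    user="your_user",
    password="your_password",
    warehouse="your_warehouse",
    database="your_database",
    schema="public",
)

cur = conn.cursor()
# A VARIANT column lets Snowflake ingest raw JSON documents as-is.
cur.execute("CREATE TABLE IF NOT EXISTS tweets (tweet VARIANT)")
# Individual JSON fields can then be queried with path notation, e.g.:
#   SELECT tweet:user:screen_name, tweet:text FROM tweets;
conn.close()
```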
Amazon Kinesis Firehose is the easiest way to load streaming data into AWS. It can capture, transform and load streaming data into Amazon S3, Amazon Redshift and Amazon Elasticsearch Service, and it can feed Amazon Kinesis Analytics applications. It’s a fully managed service that automatically scales to match the throughput of your data, and it can batch, compress and encrypt the data before loading it.
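As a quick illustration of the ingestion side, the following sketch uses boto3 to write a single record to a Firehose delivery stream. The stream name, region and payload are placeholders for this example:

```python
# Minimal sketch: write one record to a Kinesis Firehose delivery stream
# with boto3. Firehose buffers records and delivers them to the configured
# destination (e.g. S3). Stream name and region are placeholders.
import json
import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

record = {"event": "page_view", "user_id": 42}  # example payload
firehose.put_record(
    DeliveryStreamName="my-delivery-stream",  # placeholder stream name
    # Newline-delimit records so the destination can split them apart again.
    Record={"Data": json.dumps(record) + "\n"},
)
```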
Tweets are posted on Twitter continuously, which makes it a great source for a real-time stream. For this example, we’ll use Twitter’s streaming API to receive new tweets as they are posted. To do this, we open a persistent connection to the Twitter API, read the response incrementally, and process each tweet quickly so that our program doesn’t get backed up, as the sketch below shows.
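To make the persistent-connection pattern concrete before we bring in Tweepy, here is a rough sketch against Twitter’s statuses/sample endpoint using the requests and requests_oauthlib libraries; the credentials are placeholders:

```python
# Rough sketch of a persistent streaming connection, read line by line.
# Credentials are placeholders; Tweepy (used later) wraps all of this for us.
import json
import requests
from requests_oauthlib import OAuth1

auth = OAuth1("CONSUMER_KEY", "CONSUMER_SECRET",
              "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")

# stream=True keeps the HTTP connection open instead of buffering a response.
response = requests.get(
    "https://stream.twitter.com/1.1/statuses/sample.json",
    auth=auth,
    stream=True,
)

# Each non-empty line on the connection is one JSON-encoded tweet.
for line in response.iter_lines():
    if line:
        tweet = json.loads(line)
        print(tweet.get("text"))
```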
We will build a simple Python program to fetch the tweets. Before you can do this, you must set up a Twitter developer account and create an app. First, go to the Twitter developer center and create a developer account. Then go to the application console and create a new Twitter application. This gives you the application-specific credentials (consumer key/secret and access token/secret) needed to connect to the API.
Next, let’s write a simple app that calls the Twitter API over a persistent session. The streaming API differs from a REST API in that it pushes messages to a persistent session, whereas a REST API is used to pull data. This allows the streaming API to deliver data in real time as it becomes available.
To set up our application, we’ll install a Python library called Tweepy (along with boto3: `pip install tweepy boto3`).
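Putting the pieces together, here is a sketch of the kind of listener we have in mind, assuming the Tweepy 3.x API: it receives each raw tweet and forwards the JSON to a Firehose delivery stream. The stream name, credentials and track keywords are all placeholders:

```python
# Sketch of the pipeline, assuming Tweepy 3.x: stream tweets and forward
# the raw JSON to a Kinesis Firehose delivery stream (placeholder name).
import boto3
import tweepy

firehose = boto3.client("firehose", region_name="us-east-1")

class FirehoseListener(tweepy.StreamListener):
    def on_data(self, raw_data):
        # raw_data is one tweet as a JSON string; newline-delimit for Firehose.
        firehose.put_record(
            DeliveryStreamName="twitter-to-snowflake",  # placeholder
            Record={"Data": raw_data + "\n"},
        )
        return True  # keep the stream open

    def on_error(self, status_code):
        # Disconnect if Twitter rate-limits us (HTTP 420).
        return status_code != 420

# Credentials come from the Twitter app created earlier (placeholders here).
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")

stream = tweepy.Stream(auth=auth, listener=FirehoseListener())
stream.filter(track=["aws", "snowflake"])  # example keywords
```

Returning True from on_data keeps the connection alive; Firehose then buffers and delivers the records to its configured destination, from where Snowflake can pick them up.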