Redis Streams in Action: Part 1 (Intro and overview)


Welcome to this series of blog posts, which covers Redis Streams with the help of a practical example. We will use a sample application to make Twitter data available for search and query in real-time. RediSearch and Redis Streams serve as the backbone of this solution, which consists of several co-operating components. Each of them will be covered in a dedicated blog post.

  • Part 1 — this blog
  • Part 2
  • Part 3 — coming soon
  • Part 4 — coming soon

The code is available in this GitHub repo: [https://github.com/abhirockzz/redis-streams-in-action](https://github.com/abhirockzz/redis-streams-in-action)

This first part explores the use case and the motivation behind it, and provides a high-level overview of the Redis features used in the solution.

Solution Architecture

High level architecture

The use case is relatively simple. As an end goal, we want a service that allows us to search for tweets based on criteria such as hashtags, user, location, etc. Of course, there are existing solutions for this. The one presented in this blog series is an example scenario, and the same approach can be applied to similar problems.

Here is a summary of the individual components:

  1. Twitter Stream Consumer: A Rust application that consumes streaming Twitter data and passes it on to Redis Streams. I will demonstrate how to run it as a Docker container in Azure Container Instances.
  2. Tweets Processor: A Java application processes the tweets from Redis Streams. It too will be deployed (and scaled) using Azure Container Instances.
  3. Monitoring service: The last part is a Go application that monitors the progress of the tweets processor service and ensures that any failed records are re-processed. This is a serverless component that will be deployed to Azure Functions, where it runs on a Timer trigger and you only pay for the time it actually runs.

I have used a few Azure services (including the Enterprise tier of Azure Cache for Redis, which supports Redis modules such as RediSearch, RedisTimeSeries and RedisBloom) to run different parts of the solution, but you can tweak the instructions a little and apply them to your own environment; for example, you can use Docker to run everything locally! Although the individual services are written in different programming languages, the same concepts apply (in terms of Redis Streams, RediSearch, scalability, etc.) and can be implemented in the language of your choice.

The “Need for scale”

I had previously written a blog post, RediSearch in Action, that covered the same use case, i.e. how to implement a set of applications that consume tweets in real-time, index them in RediSearch and query them using a REST API. However, the solution presented here has been implemented with the help of Redis Streams along with other components, in order to make the architecture scalable and fault-tolerant. In this specific example, the goal is to be able to process a large volume of tweets, but the same idea can be extended to other use cases that deal with high-velocity data, e.g. IoT, log analytics, etc. Such problems benefit from an architecture where you can horizontally scale out your applications to handle increasing data volumes. Typically, this involves introducing a messaging system that acts as a buffer between producers and consumers. Since this is a common requirement and the problem space is well understood, there are a lot of established solutions in the distributed messaging world, ranging from JMS (Java Message Service), Apache Kafka, RabbitMQ and NATS to, of course, Redis.

Lots of options in Redis!

There is something unique about Redis, though. From a messaging point of view, Redis is quite flexible since it provides multiple options to support different paradigms, hence serving a wide range of use cases. Its features include Pub/Sub, Lists (worker queue approach) and Redis Streams. Since this blog series focuses on Redis Streams, I will provide a quick overview of the other options before moving on.

  • Pub/Sub: it follows a broadcast paradigm where multiple receivers can consume messages sent to a specific channel. Producers and consumers are completely decoupled, but note that there is no concept of message persistence, i.e. if a consumer app is not up and running, it does not get the messages published in the meantime when it comes back later.
  • Lists: they allow us to adopt a worker-queue approach that distributes load among worker apps. Messages are removed once they are consumed. Some level of fault-tolerance and reliability is possible using RPOPLPUSH (and BRPOPLPUSH), which atomically move a message to a separate list while it is being processed (as of Redis 6.2, LMOVE and BLMOVE are the recommended replacements).
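To make the difference between the two concrete, here is a minimal in-memory sketch in Python. It is a toy model, not Redis and not a Redis client: `ToyPubSub`, `ToyListQueue` and the `ack` helper are hypothetical names chosen for illustration, and since everything runs in a single process, the atomicity that RPOPLPUSH provides in Redis is only simulated.

```python
from collections import deque


class ToyPubSub:
    """Broadcast: every *currently* subscribed receiver gets a copy; nothing is stored."""

    def __init__(self):
        self.subscribers = {}  # channel name -> list of subscriber inboxes

    def subscribe(self, channel):
        inbox = []
        self.subscribers.setdefault(channel, []).append(inbox)
        return inbox

    def publish(self, channel, message):
        receivers = self.subscribers.get(channel, [])
        for inbox in receivers:
            inbox.append(message)
        # Like PUBLISH, report how many receivers got the message; 0 means it is gone.
        return len(receivers)


class ToyListQueue:
    """Worker queue: each message is handed to exactly one worker."""

    def __init__(self):
        self.lists = {}  # key -> deque; left end = head, right end = tail

    def lpush(self, key, value):
        self.lists.setdefault(key, deque()).appendleft(value)

    def rpoplpush(self, source, destination):
        # Move the tail (oldest) element of `source` to the head of `destination`.
        # In Redis this is one atomic command, so an in-flight message is always
        # recorded in the "processing" list, not only in the worker's memory.
        src = self.lists.get(source)
        if not src:
            return None
        value = src.pop()
        self.lists.setdefault(destination, deque()).appendleft(value)
        return value

    def ack(self, processing_key, value):
        # Hypothetical helper: with real Redis, a worker would typically run LREM
        # on the processing list once it has finished with the message.
        self.lists[processing_key].remove(value)


# Pub/Sub: a message published before anyone subscribes is simply lost.
bus = ToyPubSub()
lost_count = bus.publish("tweets", "t1")
inbox_a = bus.subscribe("tweets")
inbox_b = bus.subscribe("tweets")
bus.publish("tweets", "t2")  # both subscribers receive "t2"

# List queue: each message goes to one worker and is parked until acknowledged.
q = ToyListQueue()
for tweet in ["t1", "t2", "t3"]:
    q.lpush("tweets", tweet)
first = q.rpoplpush("tweets", "processing")   # one worker takes "t1"
second = q.rpoplpush("tweets", "processing")  # another worker takes "t2"
q.ack("processing", first)                    # "t1" is done; "t2" is still in flight
```

If the second worker crashed at this point, "t2" would still be sitting in the processing list, where some recovery job could find it and put it back on the queue.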

Redis Streams

Introduced in Redis 5.0, Redis Streams combine the best of Pub/Sub and Lists with reliable messaging, durability for message replay, consumer groups for load balancing, the Pending Entries List for monitoring and much more! What makes it different is the fact that it is an append-only log data structure. In a nutshell, producers add records (using XADD) and consumers can subscribe to new items arriving in the stream (with XREAD). It supports range queries (XRANGE etc.), and thanks to consumer groups, a group of apps can distribute the processing load (XREADGROUP) and it's possible to monitor their state (XPENDING etc.).

Since the magic of Redis lies in its powerful command system, let’s go over some of the Redis Streams commands, grouped by functionality for easier understanding:

Add entries

There is only one way you can add messages to a Redis Stream. XADD appends the specified stream entry to the stream at the specified key. If the key does not exist, as a side effect of running this command the key is created with a stream value.
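The sketch below models how XADD assigns entry IDs of the form `<millisecondsTime>-<sequenceNumber>`, and how the key is created on the first append. It is a hypothetical in-memory toy: the `ToyStream` class, the `xadd` function and its `now_ms` parameter are made up for illustration, and only the full `<ms>-<seq>` form is accepted for explicit IDs. A real application would send XADD through a Redis client instead.

```python
import time


class ToyStream:
    """In-memory model of a stream: an append-only list of (ID, fields) pairs."""

    def __init__(self):
        self.entries = []       # always in ID order, because IDs only ever grow
        self.last_id = (0, 0)   # (milliseconds, sequence) of the newest entry


def _format_id(entry_id):
    return f"{entry_id[0]}-{entry_id[1]}"


def xadd(db, key, fields, entry_id="*", now_ms=None):
    """Toy XADD: append an entry to the stream at `key`, creating the key if needed."""
    if key not in db:
        db[key] = ToyStream()  # side effect: the key is created with a stream value
    stream = db[key]

    if entry_id == "*":
        # Auto-generated ID: current time in ms, plus a sequence number.
        ms = int(time.time() * 1000) if now_ms is None else now_ms
        if ms > stream.last_id[0]:
            new_id = (ms, 0)
        else:
            # Same millisecond (or the clock went backwards): keep the last
            # timestamp and bump the sequence so IDs stay strictly increasing.
            new_id = (stream.last_id[0], stream.last_id[1] + 1)
    else:
        # Explicit ID: must be strictly greater than the current last ID.
        ms_part, seq_part = entry_id.split("-")
        new_id = (int(ms_part), int(seq_part))
        if new_id <= stream.last_id:
            raise ValueError(
                f"ID {entry_id} must be greater than {_format_id(stream.last_id)}"
            )

    stream.entries.append((new_id, dict(fields)))
    stream.last_id = new_id
    return _format_id(new_id)


db = {}
a = xadd(db, "tweets", {"text": "hello"}, now_ms=1000)   # "1000-0"
b = xadd(db, "tweets", {"text": "again"}, now_ms=1000)   # "1000-1": same millisecond
c = xadd(db, "tweets", {"text": "skewed"}, now_ms=999)   # "1000-2": clock went back
d = xadd(db, "tweets", {"text": "manual"}, entry_id="2000-5")
```

Because IDs never decrease, appending keeps the log sorted for free, which is what makes the ID-based range reads cheap.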

Read entries

  • XRANGE returns the stream entries matching a given range of IDs (the - and + special IDs mean, respectively, the minimum and the maximum ID possible inside a stream).
  • XREVRANGE is exactly like XRANGE, but returns the entries in reverse order (specify the end ID first and the start ID second).
  • XREAD reads data from one or multiple streams, only returning entries with an ID greater than the last received ID reported by the caller.
  • XREADGROUP is a special version of the XREAD command with support for consumer groups. You can create groups of clients that consume different parts of the messages arriving in a given stream.
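Since entry IDs are ordered first by their millisecond part and then by their sequence part, the range and read commands above boil down to comparisons on `(ms, seq)` pairs. The following hypothetical Python functions mimic XRANGE, XREVRANGE and a non-blocking XREAD over an in-memory list of entries. They are a sketch of the semantics only; blocking reads, multiple streams and the `$` special ID are not modeled.

```python
MAX_SEQ = 2**64 - 1  # stand-in for the largest possible ID component


def to_tuple(entry_id):
    ms, seq = entry_id.split("-")
    return (int(ms), int(seq))


def parse_bound(bound, is_start):
    """Resolve a range bound, including the special '-' and '+' IDs."""
    if bound == "-":
        return (0, 0)
    if bound == "+":
        return (MAX_SEQ, MAX_SEQ)
    if "-" not in bound:
        # Incomplete ID: a start bound gets sequence 0, an end bound the maximum.
        return (int(bound), 0 if is_start else MAX_SEQ)
    return to_tuple(bound)


def xrange(entries, start="-", end="+", count=None):
    lo, hi = parse_bound(start, True), parse_bound(end, False)
    out = [(eid, f) for eid, f in entries if lo <= to_tuple(eid) <= hi]
    return out if count is None else out[:count]


def xrevrange(entries, end="+", start="-", count=None):
    # Same as xrange but newest first; note that the end ID comes first.
    out = xrange(entries, start, end)[::-1]
    return out if count is None else out[:count]


def xread(entries, last_id, count=None):
    # Only entries with an ID strictly greater than the caller's last seen ID.
    last = to_tuple(last_id)
    out = [(eid, f) for eid, f in entries if to_tuple(eid) > last]
    return out if count is None else out[:count]


def ids(result):
    return [eid for eid, _ in result]


entries = [
    ("1000-0", {"text": "a"}),
    ("1000-1", {"text": "b"}),
    ("1001-0", {"text": "c"}),
    ("1002-0", {"text": "d"}),
]

all_ids = ids(xrange(entries))                            # every entry, oldest first
same_ms = ids(xrange(entries, "1000", "1000"))            # ["1000-0", "1000-1"]
newest_two = ids(xrevrange(entries, "+", "-", count=2))   # ["1002-0", "1001-0"]
new_since = ids(xread(entries, "1000-1"))                 # ["1001-0", "1002-0"]
```

A consumer that keeps passing the last ID it has seen to `xread` effectively tails the log, which is the basic idea behind XREAD.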

Manage Redis Streams

  • XACK removes one or multiple messages from the Pending Entries List (PEL) of a stream consumer group.
  • XGROUP is used to manage the consumer groups associated with a Redis stream.
  • XPENDING is used to inspect the list of pending messages, to observe and understand what is happening with a stream's consumer groups.
  • XCLAIM changes the ownership of a pending message, so that another consumer can take over and continue processing it.
  • XAUTOCLAIM transfers ownership of pending stream entries that match the specified criteria. Conceptually, XAUTOCLAIM is equivalent to calling XPENDING and then XCLAIM.
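The management commands revolve around the Pending Entries List. Here is a minimal, single-process sketch of one consumer group, assuming an injected clock (`now_ms`) so the idle-time logic is easy to follow. `ToyConsumerGroup` and its methods are hypothetical: they only approximate XREADGROUP with the `>` ID, XACK, the summary form of XPENDING, XCLAIM and XAUTOCLAIM, and they return plain IDs where Redis returns full entries. The crash-and-reclaim scenario at the end is the kind of situation the monitoring service described earlier is meant to handle, but that service is a Go application and this is not its code.

```python
def to_tuple(entry_id):
    ms, seq = entry_id.split("-")
    return (int(ms), int(seq))


class ToyConsumerGroup:
    """Toy consumer group: a last-delivered ID plus a Pending Entries List (PEL)."""

    def __init__(self, entries, start_id="0-0"):
        self.entries = entries                  # (id, fields) pairs in ID order
        self.last_delivered = to_tuple(start_id)
        self.pel = {}                           # id -> owner, delivery time, delivery count

    def xreadgroup(self, consumer, count, now_ms):
        """Like XREADGROUP with '>': only entries never delivered to this group."""
        batch = []
        for eid, fields in self.entries:
            if len(batch) == count:
                break
            if to_tuple(eid) > self.last_delivered:
                batch.append((eid, fields))
                self.last_delivered = to_tuple(eid)
                self.pel[eid] = {"consumer": consumer, "delivered_at": now_ms, "deliveries": 1}
        return batch

    def xack(self, *entry_ids):
        """Remove entries from the PEL; returns how many were actually pending."""
        acked = 0
        for eid in entry_ids:
            if self.pel.pop(eid, None) is not None:
                acked += 1
        return acked

    def xpending(self):
        """Summary form: (count, lowest ID, highest ID, pending count per consumer)."""
        if not self.pel:
            return (0, None, None, {})
        ordered = sorted(self.pel, key=to_tuple)
        per_consumer = {}
        for info in self.pel.values():
            per_consumer[info["consumer"]] = per_consumer.get(info["consumer"], 0) + 1
        return (len(ordered), ordered[0], ordered[-1], per_consumer)

    def xclaim(self, consumer, min_idle_ms, entry_ids, now_ms):
        """Take over pending entries that have been idle for at least min_idle_ms."""
        claimed = []
        for eid in entry_ids:
            info = self.pel.get(eid)
            if info is not None and now_ms - info["delivered_at"] >= min_idle_ms:
                info["consumer"] = consumer
                info["delivered_at"] = now_ms   # idle time starts over
                info["deliveries"] += 1         # delivery counter goes up
                claimed.append(eid)
        return claimed

    def xautoclaim(self, consumer, min_idle_ms, now_ms, count=100):
        """Conceptually XPENDING (find idle entries) followed by XCLAIM."""
        idle = [eid for eid in sorted(self.pel, key=to_tuple)
                if now_ms - self.pel[eid]["delivered_at"] >= min_idle_ms]
        return self.xclaim(consumer, min_idle_ms, idle[:count], now_ms)


stream = [("1-0", {"text": "a"}), ("2-0", {"text": "b"}), ("3-0", {"text": "c"})]
group = ToyConsumerGroup(stream)
batch1 = group.xreadgroup("worker-1", count=2, now_ms=0)  # gets 1-0 and 2-0
batch2 = group.xreadgroup("worker-2", count=2, now_ms=0)  # only 3-0 is left
group.xack("1-0")            # worker-1 finishes 1-0, then crashes before acking 2-0
summary = group.xpending()   # 2-0 (worker-1) and 3-0 (worker-2) are pending
group.xack("3-0")            # worker-2 finishes normally
# Later, a hypothetical "monitor" consumer reclaims anything idle for 60 s or more.
claimed = group.xautoclaim("monitor", min_idle_ms=60_000, now_ms=90_000)
```

Note that the delivery counter of a reclaimed entry keeps growing; a monitoring job can use it to spot messages that keep failing and set them aside instead of retrying forever.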

Delete

  • XDEL removes the specified entries from a stream, and returns the number of entries deleted, that may be different from the number of IDs passed to the command in case certain IDs do not exist.
  • XTRIM trims the stream by evicting older entries (entries with lower IDs) if needed.
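A toy version of the two deletion commands, again over a plain Python list of `(id, fields)` pairs. The function names are hypothetical, and only exact trimming by length (like `XTRIM key MAXLEN n`) is modeled; Redis also offers approximate trimming with `~`, which may leave a few extra entries in exchange for efficiency.

```python
def xdel(entries, *entry_ids):
    """Toy XDEL: remove the given IDs and return how many actually existed."""
    wanted = set(entry_ids)
    kept = [(eid, f) for eid, f in entries if eid not in wanted]
    deleted = len(entries) - len(kept)
    entries[:] = kept  # modify the caller's list in place
    return deleted


def xtrim_maxlen(entries, maxlen):
    """Toy exact XTRIM MAXLEN: evict the oldest entries beyond `maxlen`."""
    evicted = max(0, len(entries) - maxlen)
    del entries[:evicted]  # oldest entries (lowest IDs) sit at the front
    return evicted


stream = [(f"{i}-0", {"n": str(i)}) for i in range(1, 6)]  # IDs 1-0 .. 5-0
n_deleted = xdel(stream, "2-0", "9-0")   # 1, because "9-0" does not exist
n_evicted = xtrim_maxlen(stream, 2)      # 2: drops 1-0 and 3-0
remaining = [eid for eid, _ in stream]   # ["4-0", "5-0"]
```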

For a detailed overview, I highly recommend reading [“Introduction to Redis Streams”](https://redis.io/topics/streams-intro) (from the official Redis docs).
