Juanita  Apio

Juanita Apio

1622613720

Real-Time Search and Analytics with Confluent, Azure, Redis, and Spring Cloud

Self-managing a distributed system like Apache Kafka ®, along with building and operating Kafka connectors, is complex and resource intensive. It requires significant Kafka skills and expertise in the development and operations teams of your organization. Additionally, the higher the volumes of real-time data that you work with, the more challenging it becomes to ensure that all of the infrastructure scales efficiently and runs reliably.

Confluent and Microsoft are working together to make the process of adopting event streaming easier than ever by alleviating the typical infrastructure management needs that often pull developers away from building critical applications. With Azure and Confluent seamlessly integrated, you can collect, store, process event streams in real-time and feed them to multiple Azure data services. The integration helps reduce the burden of managing resources across Azure and Confluent.

The unified integration with Confluent enables you to:

  • Provision a new Confluent Cloud resource from Azure client interfaces like Azure Portal/CLI/SDKs with fully managed infrastructure
  • Streamline single sign-on (SSO) from Azure to Confluent Cloud with your existing Azure Active Directory (AAD) identities
  • Get unified billing of your Confluent Cloud service usage through Azure subscription invoicing with the option to draw down on Azure commits; Confluent Cloud consumption charges simply appear as a line item on monthly Azure bills
  • Manage Confluent Cloud resources from the Azure portal and track them in the “All Resources” page, alongside your Azure resources

Confluent has developed an extensive library of pre-built connectors that seamlessly integrate data from many different environments. With Confluent, Azure customers access fully managed connectors that stream data for low-latency, real-time analytics into Azure and Microsoft services like Azure FunctionsAzure Blob StorageAzure Event HubsAzure Data Lake Storage (ADLS) Gen2, and Microsoft SQL Server. More real-time data can now easily flow to applications for smarter analytics and more context-rich experiences.

Real-time search use case

In today’s rapidly evolving business ecosystem, organizations must create new business models, provide great customer experiences, and improve operational efficiencies to stay relevant and competitive. Technology plays a critical role in this journey with the new imperative being to build scalable, reliable, persistent real-time systems. Real-time infrastructure for processing large volumes of data with lower costs and reduced risk plays a key role in this evolution.

Apache Kafka often plays a key role in the modern data architecture with other systems producing/consuming data to/from it. These could be customer orders, financial transactions, clickstream events, logs, sensor data, and database change events. As you might imagine, there is a lot of data in Kafka (topics), but it’s useful only when processed (e.g., with Azure Spring Cloud or ksqlDB) or when ingested into other systems.

Let’s investigate an architecture pattern that transforms an existing traditional transaction system into a real-time data processing system. We’ll describe a data pipeline that synchronizes data between MySQL and RediSearch, powered by Confluent Cloud on Azure. This scenario is applicable to many use cases, but we’ll specifically cover the scenario where batch data must be available to downstream systems in near real time to fulfill search requirements. The data can be further streamed to an ADLS store for correlation of real-time and historic data, analytics, and visualizations. This provides a foundation for other services through APIs to drive important parts of the business, such as a customer-facing website that can provide fresh, up-to-date information on products, availability, and more.

Below are the key elements and capabilities of the above-mentioned architecture:

  • Infrastructure components that form the bedrock of the architecture
  • ADLS Gen2 sink connector for exporting data from Kafka topics to ADLS
  • ADLS datastore for correlation of real-time and historical data and further analytics and visualizations
  • Application components: These are services running on Azure Spring Cloud
  • Java Spring Boot application that uses the Spring for Apache Kafka integration is a consumer application that processes events from Kafka topics to Redis by creating the required index definition; it adds records as RediSearch documents by creating the required index definition and adding new product information as RediSearch documents (currently represented as a Redis hash)
  • A search application is another Spring Boot application that makes the RediSearch data available as a REST API; it allows you to execute queries per the RediSearch query syntax

The above-mentioned services use the JRediSearch library to interface with RediSearch in order to create indexes, add documents, and query.

Thanks to the JDBC source connector, data in MySQL (the products table) is sent to a Kafka topic. Here is what the JSON payload looks like:

Objectives

The data can be uploaded into a relational database on Azure Database for MySQL, in this case, through an application or a batch process. This data will be synchronised from Confluent Cloud on Azure to the RediSearch module available in the Azure Cache for Redis Enterprise service. This will enable you to perform real-time search with your data in a flexible way. The real-time data is also streamed to an ADLS store. All the service components can be deployed to one Azure region for low latency and performance. Additionally, these service components are deployed in a single Azure subscription to enable unified billing of your Confluent Cloud usage through Azure subscription invoicing.

#kafka #redis #mysql #azure

Real-Time Search and Analytics with Confluent, Azure, Redis, and Spring Cloud