Azure Cosmos DB Cassandra API is a fully managed cloud service that is compatible with the Cassandra Query Language (CQL) v3.11 API. It imposes no operational overhead, and you benefit from all the underlying Azure Cosmos DB capabilities such as global distribution, automatic scale-out partitioning, availability and latency guarantees, encryption at rest, and backups.
At the time of writing, the Azure Cosmos DB Cassandra API serverless offering is available in preview!
Your existing Cassandra applications can work with the Azure Cosmos DB Cassandra API, since it works with CQLv4 compliant drivers (see examples for Java, .Net Core, Node.js, Python, etc.). But you also need to think about integrating with other systems that hold existing data and bringing that data into Azure Cosmos DB. One such system is Apache Kafka, a distributed streaming platform. It is used across industries and organizations to solve a wide variety of problems, ranging from traditional asynchronous messaging and website activity tracking to log aggregation, real-time fraud detection, and much more! It has a rich ecosystem of technologies, such as Kafka Streams for stream processing and Kafka Connect for real-time data integration.
Thanks to its scalable design, Apache Kafka often serves as a central component in the overall data architecture, with other systems pumping data into it. These could be clickstream events, logs, sensor data, orders, database change-events, etc. You name it! So, as you can imagine, there is a lot of data in Apache Kafka (topics), but it's only useful when consumed or ingested into other systems. You could achieve this by writing good old plumbing code with the Kafka Producer/Consumer APIs, in a language and client SDK of your choice. But you can do better!
This blog post demonstrates how you can use an open-source, connector-based solution to ingest data from Kafka into Azure Cosmos DB Cassandra API. It uses a simple yet practical scenario, along with a reusable setup based on Docker Compose to help with iterative development and testing. By the end of this blog post, you should have a working end-to-end integration and be able to validate it.
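As a preview of the validation step, once records are flowing you could query the target table with any CQL client (such as `cqlsh`). The keyspace and table names below are illustrative; use the ones created for your setup.

```sql
-- Check that weather records ingested by the connector have landed
-- (keyspace/table names here are placeholders for illustration)
SELECT * FROM weather.data_by_state LIMIT 10;
```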
The code and configuration associated with this blog post are available in this GitHub repository.
Kafka Connect is a platform to stream data between Apache Kafka and other systems in a scalable and reliable manner. It depends only on Kafka itself, and the great thing about it is that it provides a suite of ready-to-use connectors. This means that you do not need to write custom integration code to glue systems together: no code, just configuration! In case an existing connector is not available, you can leverage the powerful Kafka Connect framework to build your own connectors.
Kafka Connect offers two broad categories of connectors: source connectors, which import data from external systems into Kafka topics, and sink connectors, which export data from Kafka topics into external systems.
In this blog post, we will be using the open-source DataStax Apache Kafka connector, a sink connector that works on top of the Kafka Connect framework to ingest records from a Kafka topic into rows of one or more Cassandra tables.
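To give a feel for how this works, here is a sketch of what the connector configuration might look like when targeting Azure Cosmos DB Cassandra API. The topic, keyspace, table, and column names are placeholders for illustration; the exact configuration (including the column mapping) is in the GitHub repository, and the DataStax connector documentation covers the full set of properties.

```json
{
  "name": "cosmosdb-cassandra-sink",
  "config": {
    "connector.class": "com.datastax.oss.kafka.sink.CassandraSinkConnector",
    "tasks.max": "1",
    "topics": "weather-data",
    "contactPoints": "<cosmosdb-account>.cassandra.cosmos.azure.com",
    "port": "10350",
    "auth.username": "<cosmosdb-account>",
    "auth.password": "<cosmosdb-access-key>",
    "ssl.provider": "JDK",
    "loadBalancing.localDc": "<cosmosdb-region>",
    "topic.weather-data.weather.data_by_state.mapping": "station_id=value.station_id, temp_c=value.temp_c, ts=value.ts"
  }
}
```

A configuration like this is typically submitted to the Kafka Connect worker's REST endpoint, e.g. `curl -X POST -H "Content-Type: application/json" --data @connector.json http://localhost:8083/connectors`.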
At a high level, the solution is quite simple! But a diagram should be helpful nonetheless.
Sample weather data is continuously generated and sent to a Kafka topic. The connector picks up these records and sends them to Azure Cosmos DB, where they can be queried using any Cassandra client driver.
Except for Azure Cosmos DB, the rest of the components of the solution run as Docker containers (using Docker Compose). This includes Kafka (and Zookeeper) and the Kafka Connect worker (running the Cassandra connector), along with the sample data generator application written in Go. Having said that, the instructions would work with any Kafka cluster and Kafka Connect workers, provided all the components are configured to access and communicate with each other as required. For example, you could have a Kafka cluster on Azure HDInsight or Confluent Cloud on Azure Marketplace.
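The Compose setup might look roughly like the trimmed fragment below. The image names, versions, and environment variables are illustrative (the Confluent community images are one common choice); the complete, working file is in the GitHub repository.

```yaml
version: "2"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:6.0.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:6.0.0
    depends_on:
      - zookeeper
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  connect:
    image: confluentinc/cp-kafka-connect:6.0.0
    depends_on:
      - kafka
    ports:
      - "8083:8083"
    environment:
      CONNECT_BOOTSTRAP_SERVERS: kafka:9092
      # ...plus group id, converter, and storage-topic settings,
      # and a plugin path containing the DataStax connector jar
```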
#cloud #nosql #azure #kafka #databases