Data liberation tries to answer the question: “How do you get data out of your existing systems and use it in an event-driven way?” Most enterprises have multiple applications that were not designed with event-driven architecture (EDA) in mind. Nevertheless, many of these companies are embracing an event-driven architecture to provide more real-time customer experiences, and they need to incorporate data from their existing business-critical systems.

There are different approaches to getting this data out of a system in an event-driven way: querying the legacy databases on a schedule, setting up a Change Data Capture (CDC) mechanism on the databases, refactoring existing systems to publish events from the application layer, etc. In all of these cases, the liberated events need to be made available on an event broker so other services can be triggered by them.
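To make the first approach concrete, here is a minimal sketch of scheduled querying with a "last modified" watermark. Everything in it is an assumption for illustration: the `Row` type, its `updatedAt` column, and the in-memory list that stands in for a query such as `SELECT ... WHERE updated_at > ?` against the legacy database.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: polls a (simulated) legacy table for rows changed since the last poll. */
final class PollingExtractor {

    /** Hypothetical row of a legacy table that carries a last-modified timestamp. */
    static final class Row {
        final long id;
        final String status;
        final long updatedAt;

        Row(long id, String status, long updatedAt) {
            this.id = id;
            this.status = status;
            this.updatedAt = updatedAt;
        }
    }

    // Highest updatedAt value seen so far; rows at or below it are considered already published.
    private long watermark = Long.MIN_VALUE;

    /** Returns the rows modified after the current watermark and then advances the watermark. */
    List<Row> poll(List<Row> table) {
        List<Row> changed = new ArrayList<>();
        long newWatermark = watermark;
        for (Row row : table) {
            if (row.updatedAt > watermark) {
                changed.add(row);
                newWatermark = Math.max(newWatermark, row.updatedAt);
            }
        }
        watermark = newWatermark;
        return changed;
    }
}
```

In a real setup, `poll` would run on a timer and each returned row would be published to the broker. Note that this style of polling cannot see deleted rows and only observes the latest state of a row that changed several times between two polls, which is part of what makes log-based CDC attractive.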


The upcoming book ‘Building Event-driven Microservices’ by Adam Bellemare provides some interesting insights and guidance on these principles.

Debezium engine


In this post I’ll look at one way to implement this data liberation pattern: implementing CDC using Debezium. It’s particularly interesting for systems in maintenance mode where refactoring is not desirable or simply not possible.

Debezium is an open-source CDC platform for common databases like MySQL, PostgreSQL, SQL Server, DB2, … and is part of the Red Hat Integration suite. It converts the changes in your database into an event stream. Initially, Debezium was built specifically for the Kafka platform, in the form of a Kafka connector that monitors specific databases.

The community is putting a lot of effort into making it a more universal tool for data liberation. To that end, they provide the stand-alone Debezium engine, which you can run inside a self-managed application with an event handler you code yourself. This allows you to publish the CDC events to alternative broker solutions. There are already a number of examples with brokers like AWS Kinesis, Apache Pulsar, and NATS. See the examples provided by the Debezium community.

Another important effort is the support for multiple output formats. From version 1.2.x (currently in beta), the embedded engine supports additional formats besides the Kafka format: JSON, Avro, and … drum roll … CloudEvents! This makes it even easier to integrate into event-driven architectures. More info on the formats.

Why is the addition of CloudEvents so exciting? Check out this talk by Doug Davis at the first online AsyncAPI Conference. A side note is in order here, however: the (Cloud)Events emitted by the Debezium engine should, at least in most cases, still be transformed to hide database-internal semantics and only emit the fields relevant to the outer context.
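As a rough illustration of that transformation step, the sketch below maps the payload of a Debezium-style change event (with its `op`, `before`, `after`, and `ts_ms` fields) to a small domain event. The table columns (`cust_id`, `email_addr`), the event type names, and the `CustomerEventMapper` class are all hypothetical, and the payload is assumed to be already parsed into a `Map`. Treat it as one possible shape under those assumptions, not as part of Debezium itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: turns a parsed change-event payload into a domain event. */
final class CustomerEventMapper {

    /** Maps the change event's "op" code to a hypothetical domain event type. */
    static String eventType(String op) {
        switch (op) {
            case "c": // row created
            case "r": // row read during an initial snapshot
                return "CustomerRegistered";
            case "u":
                return "CustomerChanged";
            case "d":
                return "CustomerRemoved";
            default:
                throw new IllegalArgumentException("Unknown op: " + op);
        }
    }

    /** Keeps only the fields relevant to consumers and renames them to domain terms. */
    @SuppressWarnings("unchecked")
    static Map<String, Object> toDomainEvent(Map<String, Object> payload) {
        String op = (String) payload.get("op");
        // Deletes carry the last known row state in "before"; other operations use "after".
        Map<String, Object> row =
                (Map<String, Object>) payload.get("d".equals(op) ? "before" : "after");
        if (row == null) {
            throw new IllegalStateException("Change event carries no row state for op " + op);
        }
        Map<String, Object> event = new LinkedHashMap<>();
        event.put("type", eventType(op));
        event.put("customerId", row.get("cust_id"));    // hypothetical column name
        event.put("email", row.get("email_addr"));      // hypothetical column name
        event.put("occurredAt", payload.get("ts_ms"));
        return event;
    }
}
```

Calling `CustomerEventMapper.toDomainEvent(payload)` inside the engine's event handler, before publishing to the broker, would keep column names, source metadata, and any internal columns out of the events that other services consume.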
