My previous blog post, Change Data Capture Architecture Using Debezium, Postgres, and Kafka, was a tutorial on how to use Debezium for change data capture from Azure PostgreSQL and send the change events to Azure Event Hubs for Kafka - it used the wal2json output plugin.
What About the pgoutput Plugin?

This blog will provide a quick walkthrough of how to use the pgoutput plugin. I will not be repeating a lot of details, and will use containerized versions (via Docker Compose) of Kafka Connect, Kafka, and Zookeeper to keep things simple. So, the only thing you need is Azure PostgreSQL, which you can set up using a variety of options including the Azure Portal, Azure CLI, Azure PowerShell, or an ARM template.
The resources are available on GitHub - https://github.com/abhirockzz/debezium-postgres-pgoutput
publication.autocreate.mode
With the pgoutput plugin, it's important that you use the appropriate value for publication.autocreate.mode. If you're using all_tables (which is the default), you need to ensure that a publication is created up-front for the specific table(s) you want to configure for change data capture. If the publication is not found, the connector will try to create one using CREATE PUBLICATION <publication_name> FOR ALL TABLES;, which will fail due to lack of permissions.
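If you do stick with all_tables, you can pre-create the publication yourself. A sketch (dbz_publication is Debezium's default publication name, and public.inventory is the table used later in this post - adjust both for your setup):

```sql
-- create the publication up-front, so the connector does not
-- attempt (and fail) to run CREATE PUBLICATION ... FOR ALL TABLES
CREATE PUBLICATION dbz_publication FOR TABLE public.inventory;

-- verify
SELECT * FROM pg_publication;
```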
The other two options (filtered and disabled) work as expected.
This is highlighted in the Debezium docs: https://debezium.io/documentation/reference/1.3/connectors/postgresql.html#postgresql-on-azure
To start off, clone the GitHub repo:
```shell
git clone https://github.com/abhirockzz/debezium-postgres-pgoutput && cd debezium-postgres-pgoutput
```
Start Kafka, Zookeeper and Kafka Connect containers:
```shell
export DEBEZIUM_VERSION=1.2
docker-compose up
```
It might take a while to pull the container images the first time.
Once all the containers are up and running, connect to Azure PostgreSQL, create a table and insert some data:
```shell
psql -h <DBNAME>.postgres.database.azure.com -p 5432 -U <DBUSER>@<DBNAME> -W -d postgres --set=sslmode=require

# for example:
psql -h abhishgu-pg.postgres.database.azure.com -p 5432 -U abhishgu@abhishgu-pg -W -d postgres --set=sslmode=require
```

Once connected, create the table:

```sql
CREATE TABLE inventory (id SERIAL, item VARCHAR(30), qty INT, PRIMARY KEY(id));
```
When publication.autocreate.mode is set to filtered
This works well with Azure PostgreSQL - it does not require superuser permissions, because the connector creates the publication only for the specific table(s) defined by the filter/include list configuration.
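As a sketch of what pg-source-connector.json might look like (this is not necessarily the exact file from the repo - the connection details are placeholders, and the properties relevant to this post are plugin.name, publication.name, publication.autocreate.mode, and table.include.list):

```json
{
    "name": "pg-source-connector",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "<DBNAME>.postgres.database.azure.com",
        "database.port": "5432",
        "database.user": "<DBUSER>@<DBNAME>",
        "database.password": "<password>",
        "database.dbname": "postgres",
        "database.server.name": "myserver",
        "plugin.name": "pgoutput",
        "publication.name": "mytestpub",
        "publication.autocreate.mode": "filtered",
        "table.include.list": "public.inventory"
    }
}
```

The publication name (mytestpub) and server name (myserver) match the logs and Kafka topic shown later in this post.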
Update the connector config file (pg-source-connector.json) with details of your Azure PostgreSQL instance, and then create the connector:
```shell
curl -X POST -H "Content-Type: application/json" --data @pg-source-connector.json http://localhost:8083/connectors
```
Notice the logs (in the Docker Compose terminal):
```
Creating new publication 'mytestpub' for plugin 'PGOUTPUT' [io.debezium.connector.postgresql.connection.PostgresReplicationConnection]
```
Once the connector starts, check the publications in PostgreSQL (select * from pg_publication_tables;):
```
  pubname  | schemaname | tablename
-----------+------------+-----------
 mytestpub | public     | inventory
```
Does it work?
Insert a couple of records into the inventory table:
```shell
psql -h <DBNAME>.postgres.database.azure.com -p 5432 -U <DBUSER>@<DBNAME> -W -d postgres --set=sslmode=require
```

```sql
INSERT INTO inventory (item, qty) VALUES ('apples', 100);
INSERT INTO inventory (item, qty) VALUES ('oranges', 42);

select * from inventory;
```
The connector should push the change events from the PostgreSQL WAL (write-ahead log) to Kafka. Check the messages in the corresponding Kafka topic:
```shell
# exec into the kafka docker container
docker exec -it debezium-postgres-pgoutput_kafka_1 bash

cd bin && ./kafka-console-consumer.sh --topic myserver.public.inventory --bootstrap-server kafka:9092 --from-beginning
```
You should see a couple of change log event payloads (corresponding to the two INSERTs) - yes, they are verbose, since the schema is included in each payload.
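As an aside: if the verbosity bothers you, Kafka Connect's JSON converter can be told not to embed the schema in every message. A sketch of the relevant connector/worker properties (not something this post's compose file necessarily sets):

```json
{
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false"
}
```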
#tutorial #sql #azure #postgresql #kafka #databases #debezium