Since Confluent Platform version 5.5, Avro is no longer the only schema in town. Protobuf and JSON schemas are now supported as first-class citizens in the Confluent universe. But before I go on explaining how to use Protobuf with Kafka, let's answer one often-asked question: why do we need schemas in the first place?
When applications communicate through a pub-sub system, they exchange messages, and those messages need to be understood and agreed upon by all the participants in the communication. Additionally, you would like to detect and prevent changes to the message format that would make messages unreadable for some of the participants.
That's where a schema comes in: it represents a contract between the participants in the communication, just like an API represents a contract between a service and its consumers. And just as REST APIs can be described using OpenAPI (Swagger), the messages in Kafka can be described using Avro, Protobuf, or JSON schemas.
Schemas describe the structure of the data by specifying which fields are in the message and defining, for each field, its data type and whether it is mandatory or optional.
In addition, together with Schema Registry, schemas prevent a producer from sending poison messages: malformed data that consumers cannot interpret. Schema Registry will detect when the producer is about to introduce a breaking change and can be configured to reject it. An example of a breaking change would be deleting a mandatory field from the schema.
Similar to Apache Avro, Protobuf is a method of serializing structured data. A message format is defined in a .proto file, and you can generate code from it in many languages, including Java, Python, C++, C#, Go, and Ruby. Unlike Avro, Protobuf does not serialize the schema together with the message, so in order to deserialize the message, the consumer needs access to the schema.
Here’s an example of a Protobuf schema containing one message type:
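A minimal sketch of what such a schema could look like, assuming proto3 syntax and illustrative file, package, and field names:

```proto
// SimpleMessage.proto: a schema defining a single message type
syntax = "proto3";

package com.example.protobuf;  // illustrative package name

message SimpleMessage {
  string content = 1;    // message payload
  string date_time = 2;  // when the message was produced
}
```

Each field is assigned a unique field number (the = 1 and = 2 tags); Protobuf uses these numbers, rather than the field names, to identify fields in the serialized binary format.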