Apache Kafka is a fast, scalable, high-throughput, fault-tolerant distributed messaging system. A party called the Producer sends a stream of messages to Kafka, and another party called the Consumer reads them. Kafka offers high throughput, built-in partitioning, data replication, and fault tolerance, which makes it suitable for large-scale message processing.
Here we introduce the basic knowledge of Kafka and how to use Docker to build a cluster.
Kafka is a distributed publish-subscribe messaging system with high performance and high throughput, and it is widely used for big data transmission. It was developed at LinkedIn, written in Scala and Java, and later became a top-level project of the Apache Software Foundation.
Kafka is the central nervous system of LinkedIn: it aggregates data from various applications, and the data is processed before being distributed elsewhere. Unlike traditional enterprise message queuing systems, Kafka processes all the data flowing through a company in near real time. Companies such as LinkedIn, Netflix, Uber, and Verizon have built real-time information processing platforms on it.
Publishing and subscribing in Kafka are organized by topic. For example, if there is a topic called “La Liga”, producers send “La Liga” messages to that topic, and all consumers subscribed to it pull the messages published there. A topic is like an inbox for a specific subject: the producer drops messages in and the consumer takes them out.
A partition is a physical subdivision of a topic. A topic can be divided into multiple partitions, and each partition is an ordered, append-only sequence of messages, so the messages of a single topic are spread across several partitions. Ordering is guaranteed within a partition, not across the topic as a whole. Partitioning lets consumers read in parallel, which greatly improves throughput. At the same time, to achieve high availability, each partition has several replicas, so that data is not lost if a broker fails.
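How a message ends up in a particular partition can be illustrated with a small sketch. It assumes keyed messages and a hash-then-modulo rule; the function names and the use of CRC32 are illustrative choices, not Kafka's own partitioner, which uses a different hash function.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index (toy stand-in for a partitioner)."""
    # CRC32 is used only because it is in the standard library and is
    # deterministic across runs; it is an assumption, not Kafka's hash.
    return zlib.crc32(key) % num_partitions


def route(messages, num_partitions):
    """Group (key, value) pairs into per-partition lists, preserving arrival order."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in messages:
        partitions[choose_partition(key, num_partitions)].append(value)
    return partitions


messages = [
    (b"match-1", "kickoff"),
    (b"match-2", "kickoff"),
    (b"match-1", "goal"),
    (b"match-1", "full time"),
]
partitions = route(messages, 3)
```

Because the same key always maps to the same partition, all events for `match-1` stay in one partition and keep their relative order, while different keys can be consumed in parallel from different partitions.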
A broker is a single Kafka server instance, the service unit of a cluster. Multiple brokers connected to the same Zookeeper ensemble form a Kafka cluster. One broker in the cluster acts as the controller, elected when the cluster starts; it handles cluster-level tasks such as assigning a leader for each partition. Clients themselves talk to whichever brokers host the leaders of the partitions they need. When the controller dies, the remaining brokers elect a new one so that the cluster keeps operating normally.
To ensure high availability, Kafka keeps multiple replicas of the data in each partition. A topic has multiple partitions, and each partition has its own set of replicas. Only one of them is the leader replica; the rest are follower replicas.
When a message comes in, it is written to the leader replica first and then copied from the leader to the follower replicas. Consumers can read the message only after it has been copied to all in-sync replicas (the followers that are keeping up with the leader), which ensures the data can be recovered if a failure occurs. By default, consumers also read from the leader replica. Therefore, if the leader replicas of different partitions are unevenly distributed across the brokers of the Kafka cluster, the load will be uneven.
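The rule that a message becomes visible only after it has been copied can be shown with a toy model. This is a simplified sketch, not Kafka's replication protocol: the class and method names are hypothetical, and every follower is assumed to count toward visibility.

```python
class ReplicatedPartition:
    """Toy model of one partition with a leader log and follower logs."""

    def __init__(self, num_followers: int):
        self.leader_log = []
        self.follower_logs = [[] for _ in range(num_followers)]

    def produce(self, message) -> int:
        """Writes land on the leader first; returns the message's offset."""
        self.leader_log.append(message)
        return len(self.leader_log) - 1

    def replicate(self, follower_index: int, max_messages: int = 1) -> None:
        """Copy up to max_messages missing entries from the leader to one follower."""
        log = self.follower_logs[follower_index]
        start = len(log)
        log.extend(self.leader_log[start:start + max_messages])

    def high_watermark(self) -> int:
        """Number of messages present on the leader and on every follower."""
        return min([len(self.leader_log)] + [len(f) for f in self.follower_logs])

    def consume(self):
        """Consumers read from the leader, but only up to the high watermark."""
        return self.leader_log[:self.high_watermark()]


partition = ReplicatedPartition(num_followers=2)
partition.produce("a")
partition.produce("b")
partition.replicate(0, max_messages=2)  # follower 0 has both messages
partition.replicate(1, max_messages=1)  # follower 1 has only the first
visible = partition.consume()
```

In this run only the first message is visible to consumers, because the second has not yet reached the slower follower. Once that follower catches up, the second message becomes readable as well.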
Kafka uses Zookeeper to manage the cluster. Zookeeper is used to implement leader election for the partitions of Kafka topics. It also notifies Kafka of topology changes, so every node in the cluster learns when a new broker joins, a broker dies, or topics are added or deleted.
Producers are those client applications that publish event messages to Kafka, and consumers are those client applications that subscribe to these event messages.
In Kafka, producers and consumers are completely decoupled and agnostic to each other, which is a key design element for achieving Kafka’s well-known high scalability.
A Kafka consumer subscribes to Kafka topics and actively pulls messages from the brokers for processing. Each consumer keeps track of the offset of the last message it read and requests messages starting from that offset the next time. This differs from other message queues such as ZeroMQ and RabbitMQ. Because consumers pull at their own pace, the pressure on the broker is greatly reduced, which helps keep the broker's throughput high.
Kafka allows multiple consumers to form a consumer group and read a topic together to improve read efficiency. Within one consumer group, each message of a topic is delivered to only one consumer, so consumption is not repeated. Kafka automatically balances the load among the consumers in a group so that messages are read concurrently, and when a consumer fails, the partitions it was handling are automatically transferred to other consumers in the same group.
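The load balancing and failover behavior described above can be sketched as a partition assignment function. Kafka ships several assignment strategies; the version below is a plain round-robin recomputed from scratch, with hypothetical names, and is not a reproduction of any of them.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over consumers round-robin; each partition gets one owner."""
    assignment = {consumer: [] for consumer in consumers}
    if not consumers:
        return assignment
    ordered = sorted(consumers)
    for index, partition in enumerate(sorted(partitions)):
        owner = ordered[index % len(ordered)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3, 4, 5]

# Three healthy consumers share six partitions.
before = assign_partitions(partitions, ["c1", "c2", "c3"])

# Consumer c2 fails; its partitions are redistributed to the survivors.
after = assign_partitions(partitions, ["c1", "c3"])
```

Each partition has exactly one owner within the group at any time, which is why messages are not consumed twice by the same group. It also means that a group with more consumers than partitions leaves some consumers idle.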
Each partition consists of a series of ordered, immutable messages that are appended to the partition one after another. Each message in a partition is assigned a consecutive sequence number called the “offset”, which uniquely identifies the message within that partition. Offsets in each partition start at 0.
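Offsets and pull-based consumption fit together as in the following minimal sketch. It is an in-memory toy with hypothetical class names, not the Kafka client API: the log hands out consecutive offsets on append, and the consumer remembers where it stopped.

```python
class PartitionLog:
    """Append-only list of messages; a message's index is its offset."""

    def __init__(self):
        self._messages = []

    def append(self, message) -> int:
        """Append a message and return the offset it was stored at."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset: int, max_messages: int):
        """Return up to max_messages starting at the given offset."""
        return self._messages[offset:offset + max_messages]


class PullConsumer:
    """Consumer that pulls batches and tracks its own position in the log."""

    def __init__(self, log: PartitionLog):
        self.log = log
        self.offset = 0  # next offset to request

    def poll(self, max_messages: int = 2):
        batch = self.log.read(self.offset, max_messages)
        self.offset += len(batch)  # advance past what was just read
        return batch


log = PartitionLog()
offsets = [log.append(message) for message in ["a", "b", "c"]]

consumer = PullConsumer(log)
first = consumer.poll()
second = consumer.poll()
third = consumer.poll()  # nothing new yet, so the batch is empty
```

Since the position lives with the consumer, a second consumer could read the same log from offset 0 without affecting the first one.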