With the massive adoption of Apache Kafka, enterprises are looking for ways to replicate data across different sites. Kafka's built-in replication and self-healing mechanisms apply only to the local cluster and cannot tolerate the failure of an entire site. The solution is the "Mirror Maker" feature: with it, your local Kafka cluster can be replicated asynchronously to an external or central Kafka cluster in an entirely different location, preserving your data pipelines, log collection, and metrics-gathering processes.

Mirror Maker connects two clusters, consuming from one (the source) and producing to the other (the target). Topics are replicated as logical entities, together with everything they hold, into the target cluster, where an application can consume the transferred data. Mirror Maker is horizontally scalable, which means it can be scaled out if it becomes a bottleneck.

In this article, we will use the AMQ Streams operator to deploy Kafka on a stretched OpenShift cluster (where the nodes are located in different sites), and we'll mirror all the messages written to the source cluster into the target cluster using the Mirror Maker feature. In addition, we'll use OCS RBD to persist the Kafka logDirs, demonstrating that OCS is topology agnostic and can serve nodes from different zones in the same cluster.
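In AMQ Streams, Mirror Maker itself is configured declaratively through a KafkaMirrorMaker custom resource. As a minimal sketch for the two clusters we're about to create (the bootstrap addresses follow Strimzi's `<cluster>-kafka-bootstrap` service naming; the group ID and whitelist here are illustrative choices, not values mandated by the article):

```yaml
apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaMirrorMaker
metadata:
  name: my-mirror-maker
spec:
  version: 2.4.0
  replicas: 1
  consumer:
    # read from the source cluster
    bootstrapServers: europe-cluster-kafka-bootstrap:9092
    groupId: mirror-maker-group        # consumer group, illustrative name
  producer:
    # write to the target cluster
    bootstrapServers: us-cluster-kafka-bootstrap:9092
  whitelist: ".*"                      # regex of topics to mirror; ".*" mirrors everything
```

Because it is just a Deployment under the hood, raising `replicas` scales Mirror Maker out horizontally.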

Finally, we'll trace the response time of the whole pipeline using Jaeger, where we can see the response time of each component in the replication pipeline.

Game On!

Prerequisites

  • A stretched OpenShift 4.4.6 cluster
  • An OCS 4.4 cluster serving as the storage platform
  • OCS RBD as default storage class

Let’s start by creating a new project for this demo:

$ oc new-project amq-streams

After we have the project set up, let's install the AMQ Streams operator in the amq-streams project and the Jaeger operator, configured to watch all of the cluster's namespaces:

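If you prefer the CLI over the OperatorHub console, the same installation can be sketched with Subscription resources. The package and channel names below are assumptions that may vary by catalog version, and the namespace-scoped AMQ Streams install also assumes an OperatorGroup already exists in amq-streams:

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: amq-streams
  namespace: amq-streams           # watches only this project
spec:
  name: amq-streams                # package name, assumed
  channel: stable                  # channel name, assumed
  source: redhat-operators
  sourceNamespace: openshift-marketplace
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: jaeger-product
  namespace: openshift-operators   # cluster-wide, watches all namespaces
spec:
  name: jaeger-product             # package name, assumed
  channel: stable                  # channel name, assumed
  source: redhat-operators
  sourceNamespace: openshift-marketplace
```

Applying these with `oc create -f` yields the same end state as the console flow.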

Now that we have our operators installed, we can start creating the custom resources that will deploy our environment. First, let's create our two clusters, where europe-cluster is the source cluster and us-cluster is the target one. Each cluster will use OCS RBD to persist its written data.

$ oc create -f - <<EOF
apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: europe-cluster
spec:
  kafka:
    version: 2.4.0
    replicas: 3
    listeners:
      plain: {}
      tls: {}
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      log.message.format.version: "2.4"
    storage:
      type: persistent-claim
      size: 20Gi
      deleteClaim: true
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 10Gi
      deleteClaim: true
  entityOperator:
    topicOperator: {}
    userOperator: {}
---
apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: us-cluster
spec:
  kafka:
    version: 2.4.0
    replicas: 3
    listeners:
      plain: {}
      tls: {}
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      log.message.format.version: "2.4"
    storage:
      type: persistent-claim
      size: 20Gi
      deleteClaim: true
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 10Gi
      deleteClaim: true
  entityOperator:
    topicOperator: {}
    userOperator: {}
EOF

Now let's verify that our clusters were indeed created and that they have claimed the requested storage from our OCS cluster:

$ oc get pods

NAME                                                  READY   STATUS    RESTARTS   AGE
amq-streams-cluster-operator-v1.5.0-f9dc58f75-bqbm8   1/1     Running   0          3m23s
europe-cluster-entity-operator-5b5f7d44f7-57dbj       3/3     Running   0          37s
europe-cluster-kafka-0                                2/2     Running   0          87s
europe-cluster-kafka-1                                2/2     Running   0          87s
europe-cluster-kafka-2                                2/2     Running   0          87s
europe-cluster-zookeeper-0                            1/1     Running   0          2m29s
europe-cluster-zookeeper-1                            1/1     Running   0          2m29s
europe-cluster-zookeeper-2                            1/1     Running   0          2m29s
us-cluster-entity-operator-84fbbf445f-k5kjz           3/3     Running   0          35s
us-cluster-kafka-0                                    2/2     Running   0          95s
us-cluster-kafka-1                                    2/2     Running   0          95s
us-cluster-kafka-2                                    2/2     Running   0          95s
us-cluster-zookeeper-0                                1/1     Running   0          2m29s
us-cluster-zookeeper-1                                1/1     Running   0          2m29s
us-cluster-zookeeper-2                                1/1     Running   0          2m29s
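Strimzi names each broker's persistent volume claim `data-<cluster>-kafka-<n>` (and similarly for ZooKeeper), so we can also confirm the storage side directly. One way to sketch the check, filtering by the cluster label (the exact storage class name depends on your OCS installation; `ocs-storagecluster-ceph-rbd` is the usual RBD default):

```shell
# list the PVCs created for the source cluster's brokers
$ oc get pvc -l strimzi.io/cluster=europe-cluster
```

Every claim should show STATUS `Bound`, with the 20Gi Kafka and 10Gi ZooKeeper sizes we requested, against the OCS RBD storage class.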
