Welcome to part four of this blog series! So far, we have a single-node Kafka cluster with TLS encryption, on top of which we configured different authentication modes (TLS and SASL SCRAM-SHA-512), defined users with the User Operator, connected to the cluster using CLI and Go clients, and saw how easy it is to manage Kafka topics with the Topic Operator. Until now, our cluster has used ephemeral persistence, which in the case of a single-node cluster means that we will lose data if the Kafka or Zookeeper nodes (`Pod`s) are restarted for any reason.
Let's march on! In this part we will cover `PersistentVolume` and `PersistentVolumeClaim`, and how Strimzi uses them to provide persistent storage for our Kafka cluster.
To follow along, you will need `kubectl`: https://kubernetes.io/docs/tasks/tools/install-kubectl/
I will be using Azure Kubernetes Service (AKS) to demonstrate the concepts, but by and large the steps are independent of the Kubernetes provider. If you want to use AKS, all you need is a Microsoft Azure account, which you can get for FREE if you don't have one already.
I will not be repeating some of the common sections (such as installation/setup of Helm, Strimzi, and Azure Kubernetes Service, or the Strimzi overview) in this or subsequent parts of this series; please refer to part one for those details.
We will start off by creating a persistent cluster. Here is a snippet of the specification (you can access the complete YAML on GitHub):

```yaml
apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: my-kafka-cluster
spec:
  kafka:
    version: 2.4.0
    replicas: 1
    storage:
      type: persistent-claim
      size: 2Gi
      deleteClaim: true
    ....
  zookeeper:
    replicas: 1
    storage:
      type: persistent-claim
      size: 1Gi
      deleteClaim: true
```
The key things to notice:
* `storage.type` is `persistent-claim`, as opposed to the `ephemeral` storage used in the [previous examples](https://github.com/abhirockzz/kafka-kubernetes-strimzi/blob/master/part-2/kafka.yaml#L20).
* `storage.size` for Kafka and Zookeeper nodes is `2Gi` and `1Gi` respectively.
* `deleteClaim: true` means that the corresponding `PersistentVolumeClaim`s will be deleted when the cluster is deleted/un-deployed.
> You can take a look at the reference for `storage` [https://strimzi.io/docs/operators/master/using.html#type-PersistentClaimStorage-reference](https://strimzi.io/docs/operators/master/using.html#type-PersistentClaimStorage-reference)
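Since the manifest does not specify a storage class, the `PersistentVolumeClaim`s will be provisioned using your cluster's default storage class (the `storage` reference linked above also documents a `class` field if you want to pin a specific one). To see which classes are available in your cluster, and which one is marked as the default:

```shell
# list the storage classes in the cluster; the one marked (default) will back the PVCs
kubectl get storageclass
```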
To create the cluster:
```shell
kubectl apply -f https://raw.githubusercontent.com/abhirockzz/kafka-kubernetes-strimzi/master/part-4/kafka-persistent.yaml
```
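It can take a few minutes for the operator to spin everything up. One way (among others) to wait for the cluster is to watch the `Ready` condition that Strimzi reports on the `Kafka` resource, for example:

```shell
# block until the Strimzi operator marks the Kafka custom resource as Ready (or the timeout expires)
kubectl wait kafka/my-kafka-cluster --for=condition=Ready --timeout=300s
```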
Let's see what happens in response to the cluster creation.
### Strimzi Kubernetes Magic...
[Strimzi](https://strimzi.io/) does all the heavy lifting of creating the required Kubernetes resources in order to operate the cluster. We covered most of these in [part 1](https://dev.to/azure/kafka-on-kubernetes-the-strimzi-way-part-1-57g7) - `StatefulSet` (and `Pods`), `LoadBalancer` Service, `ConfigMap`, `Secret`, etc. In this blog, we will just focus on the persistence-related components: `PersistentVolume` and `PersistentVolumeClaim`.
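If you want a quick overview of everything Strimzi created for this cluster, one handy trick is to filter by the `strimzi.io/cluster` label that Strimzi puts on the resources it manages:

```shell
# list the core resources Strimzi created for our cluster, filtered by its label
kubectl get statefulset,pod,svc,configmap,secret -l strimzi.io/cluster=my-kafka-cluster
```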
To check the `PersistentVolumeClaim`s:

```shell
kubectl get pvc

NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-my-kafka-cluster-kafka-0       Bound    pvc-b4ece32b-a46c-4fbc-9b58-9413eee9c779   2Gi        RWO            default        94s
data-my-kafka-cluster-zookeeper-0   Bound    pvc-d705fea9-c443-461c-8d18-acf8e219eab0   1Gi        RWO            default        3m20s
```

... and the `PersistentVolume`s they are `Bound` to:

```shell
kubectl get pv

NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                       STORAGECLASS   REASON   AGE
pvc-b4ece32b-a46c-4fbc-9b58-9413eee9c779   2Gi        RWO            Delete           Bound    default/data-my-kafka-cluster-kafka-0       default                 107s
pvc-d705fea9-c443-461c-8d18-acf8e219eab0   1Gi        RWO            Delete           Bound    default/data-my-kafka-cluster-zookeeper-0   default                 3m35s
```
> Notice that the disk sizes are as specified in the manifest, i.e. `2Gi` and `1Gi` for Kafka and Zookeeper respectively.
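If you're curious about what actually backs these volumes, you can `describe` one of them. On AKS, each `PersistentVolume` maps to an Azure Managed Disk; the exact fields in the output (e.g. `azureDisk` vs. CSI volume attributes) depend on the provisioner behind your storage class, so use the PV names from your own `kubectl get pv` output:

```shell
# inspect the PersistentVolume bound to the Kafka broker's PVC (use the PV name from your cluster)
kubectl describe pv pvc-b4ece32b-a46c-4fbc-9b58-9413eee9c779
```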
#### Where Is the Data?
If we want to see the data itself, let's first check the `ConfigMap` which stores the Kafka server config:

```shell
export CLUSTER_NAME=my-kafka-cluster

kubectl get configmap/${CLUSTER_NAME}-kafka-config -o yaml
```

In the `server.config` section, you will find an entry like this:

```
##########
## Kafka message logs configuration
##########
log.dirs=/var/lib/kafka/data/kafka-log${STRIMZI_BROKER_ID}
```
This tells us that the Kafka data is stored in `/var/lib/kafka/data/kafka-log${STRIMZI_BROKER_ID}`. In this case, `STRIMZI_BROKER_ID` is `0` since all we have is a single node.
With this info, let's look at the Kafka `Pod`:

```shell
export CLUSTER_NAME=my-kafka-cluster

kubectl get pod/${CLUSTER_NAME}-kafka-0 -o yaml
```

If you look into the `kafka` container section, you will notice the following:
One of the entries in the `volumes` configuration:

```yaml
volumes:
- name: data
  persistentVolumeClaim:
    claimName: data-my-kafka-cluster-kafka-0
```
The `volume` named `data` is associated with the `data-my-kafka-cluster-kafka-0` PVC, and the corresponding `volumeMounts` entry uses this volume to ensure that Kafka data is stored in `/var/lib/kafka/data`:

```yaml
volumeMounts:
- mountPath: /var/lib/kafka/data
  name: data
```
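Instead of scanning through the full `Pod` YAML, you can also pull out just the mounts of the `kafka` container with a JSONPath query (a minimal sketch; adjust the `Pod` and container names if your setup differs):

```shell
# show only the volume mounts of the kafka container in the broker Pod
kubectl get pod my-kafka-cluster-kafka-0 \
  -o jsonpath='{.spec.containers[?(@.name=="kafka")].volumeMounts}'
```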
To see the contents:

```shell
export STRIMZI_BROKER_ID=0

kubectl exec -it my-kafka-cluster-kafka-0 -- ls -lrt /var/lib/kafka/data/kafka-log${STRIMZI_BROKER_ID}
```
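To convince yourself that the data really is persistent, you could (for example) delete the Kafka `Pod` and let the `StatefulSet` re-create it. The new `Pod` re-attaches the same `PersistentVolumeClaim`, so the log directories survive the restart:

```shell
# delete the broker Pod; the StatefulSet brings it back with the same PVC attached
kubectl delete pod my-kafka-cluster-kafka-0

# once the Pod is Running again, the data directory should still be there
kubectl exec -it my-kafka-cluster-kafka-0 -- ls -lrt /var/lib/kafka/data/kafka-log0
```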