Welcome to part four of this blog series! So far, we have a single-node Kafka cluster with TLS encryption, on top of which we configured different authentication modes (TLS and SASL SCRAM-SHA-512), defined users with the User Operator, connected to the cluster using CLI and Go clients, and saw how easy it is to manage Kafka topics with the Topic Operator. Until now, our cluster has used ephemeral persistence, which in the case of a single-node cluster means that we will lose data if the Kafka or Zookeeper nodes (`Pod`s) are restarted for any reason.
Let's march on! In this part we will cover `PersistentVolume` and `PersistentVolumeClaim`, and how Strimzi uses them to provide persistent storage for our Kafka cluster.
To follow along, you will need `kubectl`: https://kubernetes.io/docs/tasks/tools/install-kubectl/
I will be using Azure Kubernetes Service (AKS) to demonstrate the concepts, but by and large the steps are independent of the Kubernetes provider. If you want to use AKS, all you need is a Microsoft Azure account, which you can get for FREE if you don't have one already.
I will not be repeating some of the common sections (such as installation/setup of Helm, Strimzi, and Azure Kubernetes Service, or the Strimzi overview) in this or subsequent parts of this series; please refer to part one for those details.
We will start off by creating a persistent cluster. Here is a snippet of the specification (you can access the complete YAML on GitHub):

```yaml
apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: my-kafka-cluster
spec:
  kafka:
    version: 2.4.0
    replicas: 1
    storage:
      type: persistent-claim
      size: 2Gi
      deleteClaim: true
    ....
  zookeeper:
    replicas: 1
    storage:
      type: persistent-claim
      size: 1Gi
      deleteClaim: true
```
The key things to notice:
* `storage.type` is `persistent-claim`, as opposed to the `ephemeral` storage used in the [previous examples](https://github.com/abhirockzz/kafka-kubernetes-strimzi/blob/master/part-2/kafka.yaml#L20).
* `storage.size` for Kafka and Zookeeper nodes is `2Gi` and `1Gi` respectively.
* `deleteClaim: true` means that the corresponding `PersistentVolumeClaim`s will be deleted when the cluster is deleted/un-deployed.
> You can take a look at the reference for `storage` [https://strimzi.io/docs/operators/master/using.html#type-PersistentClaimStorage-reference](https://strimzi.io/docs/operators/master/using.html#type-PersistentClaimStorage-reference)
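Since the manifest does not specify a storage class, the `PersistentVolumeClaim`s will be provisioned using your cluster's default storage class (the `storage` reference linked above also documents a `class` field if you want to pin a specific one). To see which classes are available in your cluster, and which one is marked as the default:

```shell
# list the storage classes in the cluster; the one marked (default) will back the PVCs
kubectl get storageclass
```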
To create the cluster:
```shell
kubectl apply -f https://raw.githubusercontent.com/abhirockzz/kafka-kubernetes-strimzi/master/part-4/kafka-persistent.yaml
```
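It can take a few minutes for the operator to spin everything up. One way (among others) to wait for the cluster is to watch the `Ready` condition that Strimzi reports on the `Kafka` resource, for example:

```shell
# block until the Strimzi operator marks the Kafka custom resource as Ready (or the timeout expires)
kubectl wait kafka/my-kafka-cluster --for=condition=Ready --timeout=300s
```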
Let's see what happens in response to the cluster creation.
### Strimzi Kubernetes Magic...
[Strimzi](https://strimzi.io/) does all the heavy lifting of creating the required Kubernetes resources in order to operate the cluster. We covered most of these in [part 1](https://dev.to/azure/kafka-on-kubernetes-the-strimzi-way-part-1-57g7) - `StatefulSet` (and `Pods`), `LoadBalancer` Service, `ConfigMap`, `Secret`, etc. In this blog, we will just focus on the persistence-related components: `PersistentVolume` and `PersistentVolumeClaim`.
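If you want a quick overview of everything Strimzi created for this cluster, one handy trick is to filter by the `strimzi.io/cluster` label that Strimzi puts on the resources it manages:

```shell
# list the core resources Strimzi created for our cluster, filtered by its label
kubectl get statefulset,pod,svc,configmap,secret -l strimzi.io/cluster=my-kafka-cluster
```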
To check the `PersistentVolumeClaim`s:

```shell
kubectl get pvc

NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-my-kafka-cluster-kafka-0       Bound    pvc-b4ece32b-a46c-4fbc-9b58-9413eee9c779   2Gi        RWO            default        94s
data-my-kafka-cluster-zookeeper-0   Bound    pvc-d705fea9-c443-461c-8d18-acf8e219eab0   1Gi        RWO            default        3m20s
```

... and the `PersistentVolume`s they are `Bound` to:

```shell
kubectl get pv

NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                       STORAGECLASS   REASON   AGE
pvc-b4ece32b-a46c-4fbc-9b58-9413eee9c779   2Gi        RWO            Delete           Bound    default/data-my-kafka-cluster-kafka-0       default                 107s
pvc-d705fea9-c443-461c-8d18-acf8e219eab0   1Gi        RWO            Delete           Bound    default/data-my-kafka-cluster-zookeeper-0   default                 3m35s
```
> Notice that the disk sizes are as specified in the manifest, i.e. `2Gi` and `1Gi` for Kafka and Zookeeper respectively.
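If you're curious about what actually backs these volumes, you can `describe` one of them. On AKS, each `PersistentVolume` maps to an Azure Managed Disk; the exact fields in the output (e.g. `azureDisk` vs. CSI volume attributes) depend on the provisioner behind your storage class, so use the PV names from your own `kubectl get pv` output:

```shell
# inspect the PersistentVolume bound to the Kafka broker's PVC (use the PV name from your cluster)
kubectl describe pv pvc-b4ece32b-a46c-4fbc-9b58-9413eee9c779
```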
#### Where Is the Data?
If we want to see the data itself, let's first check the `ConfigMap` which stores the Kafka server config:

```shell
export CLUSTER_NAME=my-kafka-cluster

kubectl get configmap/${CLUSTER_NAME}-kafka-config -o yaml
```

In the `server.config` section, you will find an entry like this:

```
##########
## Kafka message logs configuration
##########
log.dirs=/var/lib/kafka/data/kafka-log${STRIMZI_BROKER_ID}
```
This tells us that the Kafka data is stored in `/var/lib/kafka/data/kafka-log${STRIMZI_BROKER_ID}`. In this case, `STRIMZI_BROKER_ID` is `0` since all we have is a single node.
With this info, let's look at the Kafka `Pod`:

```shell
export CLUSTER_NAME=my-kafka-cluster

kubectl get pod/${CLUSTER_NAME}-kafka-0 -o yaml
```

If you look into the `kafka` container section, you will notice the following:
One of the entries in the `volumes` configuration:

```yaml
volumes:
- name: data
  persistentVolumeClaim:
    claimName: data-my-kafka-cluster-kafka-0
```
The `volume` named `data` is associated with the `data-my-kafka-cluster-kafka-0` PVC, and the corresponding `volumeMounts` entry uses this volume to ensure that Kafka data is stored in `/var/lib/kafka/data`:

```yaml
volumeMounts:
- mountPath: /var/lib/kafka/data
  name: data
```
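Instead of scanning through the full `Pod` YAML, you can also pull out just the mounts of the `kafka` container with a JSONPath query (a minimal sketch; adjust the `Pod` and container names if your setup differs):

```shell
# show only the volume mounts of the kafka container in the broker Pod
kubectl get pod my-kafka-cluster-kafka-0 \
  -o jsonpath='{.spec.containers[?(@.name=="kafka")].volumeMounts}'
```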
To see the contents:

```shell
export STRIMZI_BROKER_ID=0

kubectl exec -it my-kafka-cluster-kafka-0 -- ls -lrt /var/lib/kafka/data/kafka-log${STRIMZI_BROKER_ID}
```
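To convince yourself that the data really is persistent, you could (for example) delete the Kafka `Pod` and let the `StatefulSet` re-create it. The new `Pod` re-attaches the same `PersistentVolumeClaim`, so the log directories survive the restart:

```shell
# delete the broker Pod; the StatefulSet brings it back with the same PVC attached
kubectl delete pod my-kafka-cluster-kafka-0

# once the Pod is Running again, the data directory should still be there
kubectl exec -it my-kafka-cluster-kafka-0 -- ls -lrt /var/lib/kafka/data/kafka-log0
```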