In the writing today I want to measure if there is any performance overhead of running MySQL in Kubernetes, and show what challenges I faced trying to measure it.
I will use a very simple CPU-bound benchmark to measure MySQL performance in OLTP read-only workload:
sysbench oltp_read_only --report-interval=1 --time=1800 --threads=56 --tables=10 --table-size=10000000 --mysql-user=sbtest --mysql-password=sbtest --mysql-socket=/var/lib/mysql/mysql.sock run
The hardware is as follows:
Supermicro server
The most interesting number there is 28 cores / 56 threads. Please keep this in mind; we will need this later.
So let’s see the MySQL performance in the bare metal setup:
[ 607s ] thds: 56 tps: 22154.20 qps: 354451.12 (r/w/o: 310143.73/0.00/44307.39) lat (ms,95%): 2.61 err/s: 0.00 reconn/s: 0.00
[ 608s ] thds: 56 tps: 22247.80 qps: 355955.88 (r/w/o: 311461.27/0.00/44494.61) lat (ms,95%): 2.61 err/s: 0.00 reconn/s: 0.00
[ 609s ] thds: 56 tps: 21984.01 qps: 351641.13 (r/w/o: 307672.12/0.00/43969.02) lat (ms,95%): 2.66 err/s: 0.00 reconn/s: 0.00
So we can get about 22000 qps on this server.
Now, let’s see what we can get if the same server runs a Kubernetes node and we deploy the Percona server image on this node. I will use a modified image of Percona Server 8, which already includes sysbench inside.
You can find my image here.
And I use the following deployment yaml:
apiVersion: v1
kind: Service
metadata:
name: mysql
spec:
selector:
app: mysql
ports:
- name: mysql
port: 3306
protocol: TCP
targetPort: 3306
---
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
name: mysql
spec:
selector:
matchLabels:
app: mysql
strategy:
type: Recreate
template:
metadata:
labels:
app: mysql
spec:
nodeSelector:
kubernetes.io/hostname: smblade01
volumes:
- name: mysql-persistent-storage
hostPath:
path: /mnt/data/mysql
type: Directory
- name: config-volume
configMap:
name: mysql-config
optional: true
containers:
- image: vadimtk/ps-8-vadim
imagePullPolicy: Always
name: mysql
env:
# Use secret in real usage
- name: MYSQL_ROOT_PASSWORD
value: password
ports:
- containerPort: 3306
name: mysql
volumeMounts:
- name: mysql-persistent-storage
mountPath: /var/lib/mysql
- name: config-volume
mountPath: /etc/my.cnf.d
The most important part here is that we deploy our image on smblade01 node (the same one I ran the bare metal benchmark).
Let’s see what kind of performance we get using this setup. The number I’ve got:
[ 605s ] thds: 56 tps: 10561.88 qps: 169045.04 (r/w/o: 147921.29/0.00/21123.76) lat (ms,95%): 12.98 err/s: 0.00 reconn/s: 0.00
[ 606s ] thds: 56 tps: 10552.00 qps: 168790.98 (r/w/o: 147685.98/0.00/21105.00) lat (ms,95%): 15.83 err/s: 0.00 reconn/s: 0.00
[ 607s ] thds: 56 tps: 10566.00 qps: 169073.97 (r/w/o: 147942.97/0.00/21131.00) lat (ms,95%): 5.77 err/s: 0.00 reconn/s: 0.00
[ 608s ] thds: 56 tps: 10581.08 qps: 169359.21 (r/w/o: 148195.06/0.00/21164.15) lat (ms,95%): 5.47 err/s: 0.00 reconn/s: 0.00
[ 609s ] thds: 56 tps: 12873.80 qps: 205861.77 (r/w/o: 180116.17/0.00/25745.60) lat (ms,95%): 5.37 err/s: 0.00 reconn/s: 0.00
[ 610s ] thds: 56 tps: 20196.89 qps: 323184.24 (r/w/o: 282789.46/0.00/40394.78) lat (ms,95%): 3.02 err/s: 0.00 reconn/s: 0.00
[ 611s ] thds: 56 tps: 18033.21 qps: 288487.30 (r/w/o: 252421.88/0.00/36065.41) lat (ms,95%): 5.28 err/s: 0.00 reconn/s: 0.00
[ 612s ] thds: 56 tps: 11444.08 qps: 183129.22 (r/w/o: 160241.06/0.00/22888.15) lat (ms,95%): 5.37 err/s: 0.00 reconn/s: 0.00
[ 613s ] thds: 56 tps: 10597.96 qps: 169511.35 (r/w/o: 148316.43/0.00/21194.92) lat (ms,95%): 5.57 err/s: 0.00 reconn/s: 0.00
[ 614s ] thds: 56 tps: 10566.00 qps: 169103.93 (r/w/o: 147969.94/0.00/21133.99) lat (ms,95%): 5.67 err/s: 0.00 reconn/s: 0.00
[ 615s ] thds: 56 tps: 10640.07 qps: 170227.13 (r/w/o: 148948.99/0.00/21278.14) lat (ms,95%): 5.47 err/s: 0.00 reconn/s: 0.00
[ 616s ] thds: 56 tps: 10579.04 qps: 169264.66 (r/w/o: 148106.58/0.00/21158.08) lat (ms,95%): 5.47 err/s: 0.00 reconn/s: 0.00
You can see the numbers vary a lot, from 10550 tps to 20196 tps, with the most time being in the 10000tps range.
That’s quite disappointing. Basically, we lost half of the throughput by moving to the Kubernetes node.
But don’t panic, we can improve this. But first, we need to understand why this happens.
The answer lies in how Kubernetes applies Quality of Service for Pods. By default (if CPU or Memory limits are not defined), the QoS is BestEffort, which leads to the results we see above. To allocate all CPU resources, we need to make sure QoS Guaranteed. For this, we add the following to the image definition:
resources:
requests:
cpu: "55500m"
memory: "150Gi"
limits:
cpu: "55500m"
memory: "150Gi"
These are somewhat funny lines to define CPU limits. As you remember we have 56 threads, so initially I tried to set limits:cpu: "56"
but it did not work as Kubernetes was not able to start the pod with the error Insufficient CPU. I guess Kubernetes allocates a few CPU percentages for the internal needs.
So the line cpu: "55500m"
works, which means we allocate 55.5 CPU for Percona Server.
Let’s see what results we can have with Guaranteed QoS:
[ 883s ] thds: 56 tps: 20320.06 qps: 325145.96 (r/w/o: 284504.84/0.00/40641.12) lat (ms,95%): 2.81 err/s: 0.00 reconn/s: 0.00
[ 884s ] thds: 56 tps: 20908.89 qps: 334587.21 (r/w/o: 292769.43/0.00/41817.78) lat (ms,95%): 2.81 err/s: 0.00 reconn/s: 0.00
[ 885s ] thds: 56 tps: 20529.03 qps: 328459.46 (r/w/o: 287402.40/0.00/41057.06) lat (ms,95%): 2.81 err/s: 0.00 reconn/s: 0.00
[ 886s ] thds: 56 tps: 17567.75 qps: 281051.03 (r/w/o: 245914.53/0.00/35136.50) lat (ms,95%): 5.47 err/s: 0.00 reconn/s: 0.00
[ 887s ] thds: 56 tps: 18036.82 qps: 288509.07 (r/w/o: 252437.44/0.00/36071.63) lat (ms,95%): 5.47 err/s: 0.00 reconn/s: 0.00
[ 888s ] thds: 56 tps: 18398.23 qps: 294399.67 (r/w/o: 257603.21/0.00/36796.46) lat (ms,95%): 5.47 err/s: 0.00 reconn/s: 0.00
[ 889s ] thds: 56 tps: 18402.90 qps: 294484.45 (r/w/o: 257677.65/0.00/36806.81) lat (ms,95%): 5.47 err/s: 0.00 reconn/s: 0.00
[ 890s ] thds: 56 tps: 19428.12 qps: 310787.86 (r/w/o: 271934.63/0.00/38853.23) lat (ms,95%): 5.37 err/s: 0.00 reconn/s: 0.00
[ 891s ] thds: 56 tps: 19848.69 qps: 317646.11 (r/w/o: 277947.73/0.00/39698.39) lat (ms,95%): 5.28 err/s: 0.00 reconn/s: 0.00
[ 892s ] thds: 56 tps: 20457.28 qps: 327333.49 (r/w/o: 286417.93/0.00/40915.56) lat (ms,95%): 2.86 err/s: 0.00 reconn/s: 0.00
This is much better (mostly ranging in 20000 tps), but we still do not get to 22000 tps.
I do not have the full explanation of why there is still a 10% performance loss, but it might be related to this issue. And I see there is a work in progress to improve Guaranteed QoS performance but it was not merged into the mainstream releases yet. Hopefully, it will be in one of the next releases.
#mysql #Kubernetes