Over the past few weeks, I have been deploying a Flink cluster on Kubernetes. In this article, I'd like to share the challenges, the architecture, the Kubernetes deployment, the solution details, and the journey.

Challenges

At Empathy, all code running in production must be cloud-agnostic, yet the data streaming jobs had a cloud dependency: Dataflow in the GCP scenario and Kinesis Data Analytics in the AWS scenario.

The data streaming job code is developed in Apache Beam, so it can also run on Apache Flink. The default way to deploy a job to Apache Flink is to upload a JAR containing the job and its dependencies to a running Flink cluster. This approach isn't viable even in the mid term, for reasons such as traceability of the JAR files, how to distribute them, how continuous deployment should be done, and localhost execution, to mention a few.
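For context, this is roughly what that default JAR-upload workflow looks like with the Flink CLI. The JAR name and JobManager address below are illustrative placeholders, not Empathy's actual setup:

```sh
# Submit a job JAR to an already-running Flink cluster:
# -m points at the JobManager, -d submits the job in detached mode.
./bin/flink run \
  -m jobmanager-host:8081 \
  -d \
  streaming-job.jar
```

Every submission depends on someone (or something) having the right JAR at hand and a reachable cluster, which is exactly where the traceability and distribution problems come from.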

  • Encapsulating everything as a Docker image lets Empathy achieve better traceability of the Apache Flink jobs, distribute them like any other Docker image, and use the same continuous deployment model as the rest of the applications (see the sketch after this list).
  • In addition, Kubernetes has already been adopted at Empathy as the orchestration solution for a wide range of applications, so the data streaming stack should be just one more app in this orchestration solution, enjoying the same benefits as the rest of the apps living on a K8s cluster and remaining portable from localhost to production.
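As a rough illustration of the Docker-image approach, a job image can be built on top of the official Flink image with the job JAR baked in. The Flink version, JAR name, and paths here are assumptions for the sketch, not Empathy's actual build:

```dockerfile
# Sketch: bundle the job JAR into an image based on the official Flink image.
FROM flink:1.13

# Flink's application mode picks up user JARs from $FLINK_HOME/usrlib.
RUN mkdir -p $FLINK_HOME/usrlib
COPY target/streaming-job.jar $FLINK_HOME/usrlib/streaming-job.jar
```

The resulting image can then be versioned, scanned, and rolled out through the same continuous deployment pipeline as any other application image.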

