Over the past few weeks, I have been deploying a Flink cluster on Kubernetes. In this article, I’d like to share the challenges, the architecture, the Kubernetes deployment, the solution details, and the journey.
At Empathy, all code running in Production must be cloud-agnostic. However, Empathy had cloud dependencies for data streaming: Dataflow on GCP and Kinesis Data Analytics on AWS.
The data streaming job code is developed in Apache Beam; therefore, it can also run on Apache Flink. The default way to deploy a job to Apache Flink is to upload a JAR containing the job and its dependencies to a running Flink cluster. This approach is not viable as a mid-term solution for several reasons: traceability of the JAR files, how to distribute them, how Continuous Deployment should work, and localhost execution, to mention a few.
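For context, a minimal sketch of that default deployment model, using the Flink CLI against an already running cluster (the JobManager address and JAR path are placeholders, not values from this setup):

```sh
# Submit a job JAR to a running Flink cluster via its JobManager REST endpoint.
# "localhost:8081" and the JAR path are illustrative; replace them with your own.
./bin/flink run \
  -m localhost:8081 \
  path/to/my-beam-job-bundled.jar

# List the jobs currently running on that cluster.
./bin/flink list -m localhost:8081
```

Every submission depends on someone (or something) holding the right JAR and pointing it at the right cluster, which is exactly where the traceability, distribution, and Continuous Deployment concerns above come from.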