The Story of a Migration from EMR to Spark on Kubernetes

In this article, the co-founder of Lingk tells the story of the company's migration from EMR to the Spark-on-Kubernetes platform managed by Data Mechanics: their goals, the architecture of the solution and the challenges they had to address, and the results they obtained.

Goals of this migration

Lingk.io is a data loading, data pipeline, and integration platform built on top of Apache Spark. It serves commercial customers and has particular expertise in the education sector. With a few clicks in its visual interface, customers can load, deduplicate, and enrich data from dozens of sources.

Under the hood, Lingk used Amazon EMR (Elastic MapReduce) to power its product, but the team was facing a few issues:

  • EMR required too much infrastructure management from their DevOps team, which had limited Spark experience: picking the right cluster instance types, memory settings, Spark configurations, and so on.
  • Their total AWS costs were high. They suspected that EMR's autoscaling policies were inefficient and that a lot of compute resources were being wasted.
  • Spark apps took 40 seconds to start on average. That is a long wait for Lingk's end users, particularly when they are building a new data pipeline or integration.
  • The core Spark application was stuck on an earlier version, because upgrading to Spark 3.0+ caused unexplained performance regressions.
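Much of the "memory settings and instance types" work in the first bullet comes down to arithmetic like the following. This is a minimal sketch, not Lingk's or Data Mechanics' actual tuning logic. It assumes Spark's documented default executor memory overhead (10% of executor memory for JVM jobs, with a 384 MiB floor) and ignores any other pod-level reservations. The node sizes in the usage note are hypothetical.

```python
# Assumed Spark defaults for JVM jobs: overhead = max(384 MiB, 10% of executor memory).
MIN_OVERHEAD_MIB = 384
DEFAULT_OVERHEAD_FACTOR = 0.10


def executor_pod_memory_mib(executor_memory_mib: int,
                            overhead_factor: float = DEFAULT_OVERHEAD_FACTOR) -> int:
    """Total memory one executor needs: JVM heap plus off-heap overhead."""
    overhead = max(MIN_OVERHEAD_MIB, int(executor_memory_mib * overhead_factor))
    return executor_memory_mib + overhead


def executors_per_node(node_memory_mib: int, node_cores: int,
                       executor_memory_mib: int, executor_cores: int) -> int:
    """How many executors of a given shape fit on one node.

    The result is bounded by whichever resource runs out first:
    memory (heap + overhead) or CPU cores.
    """
    by_memory = node_memory_mib // executor_pod_memory_mib(executor_memory_mib)
    by_cpu = node_cores // executor_cores
    return min(by_memory, by_cpu)


if __name__ == "__main__":
    # Hypothetical node: 30 GiB allocatable memory, 8 cores.
    print(executors_per_node(30720, 8, 6144, 2))  # 6 GiB / 2-core executors
    print(executors_per_node(30720, 8, 4096, 1))  # 4 GiB / 1-core executors
```

On this hypothetical node, 6 GiB executors with 2 cores are limited to 4 per node. With 4 GiB and 1 core, memory becomes the bottleneck at 6 per node, leaving two cores idle. Mismatches of this kind are one way compute ends up wasted.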


Christa Stehr

1602964260

50+ Useful Kubernetes Tools for 2020 - Part 2

Introduction

Last year, we published a list of Kubernetes tools that proved so popular that we decided to curate another list of useful additions for working with the platform, many of which we personally use here at Caylent. Check out the original tools list here in case you missed it.

According to a recent StackRox survey, Kubernetes continues to strengthen its dominance of the market, with 86% of respondents using it for container orchestration.

(State of Kubernetes and Container Security, 2020)

And as you can see below, more and more companies are jumping into containerization for their apps. If you’re among them, here are some tools to aid you going forward as Kubernetes continues its rapid growth.

(State of Kubernetes and Container Security, 2020)


Maud Rosenbaum

1601051854

Kubernetes in the Cloud: Strategies for Effective Multi Cloud Implementations

Kubernetes is a highly popular container orchestration platform. Multi cloud is a strategy that leverages cloud resources from multiple vendors. Multi cloud strategies have become popular because they help prevent vendor lock-in and enable you to leverage a wide variety of cloud resources. However, multi cloud ecosystems are notoriously difficult to configure and maintain.

This article explains how you can leverage Kubernetes to reduce multi cloud complexities and improve stability, scalability, and velocity.

Kubernetes: Your Multi Cloud Strategy

Maintaining standardized application deployments becomes more challenging as the number of your applications, and of the technologies they are built on, increases. Because environments, operating systems, and dependencies differ, management and operations require more effort and extensive documentation.

In the past, teams tried to get around these difficulties by creating isolated projects in the data center. Each project, including its configurations and requirements, was managed independently. This required accurately predicting performance and the number of users before deployment, and taking applications down to update operating systems or the applications themselves. There were many chances for error.

Kubernetes provides an alternative to this approach, enabling teams to deploy applications in containers, independent of the underlying environment. This eliminates the need to create resource partitions and lets teams operate their infrastructure as a unified whole.

In particular, Kubernetes makes it easier to deploy a multi cloud strategy since it enables you to abstract away service differences. With Kubernetes deployments you can work from a consistent platform and optimize services and applications according to your business needs.

The Compelling Attributes of Multi Cloud Kubernetes

Multi cloud Kubernetes can provide multiple benefits beyond a single cloud deployment. Below are some of the most notable advantages.

Stability

In addition to the built-in scalability, fault tolerance, and auto-healing features of Kubernetes, multi cloud deployments can provide service redundancy. For example, you can mirror applications or split microservices across vendors. This reduces the risk of a vendor-related outage and enables you to create failovers.
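The failover idea can be sketched in a few lines. This is a hypothetical illustration, not a method the article prescribes. The `ClusterEndpoint` type, the cluster names, and the assumption that each cluster's health has already been determined by some external check are all invented for the example. Actually routing traffic to the chosen cluster, for instance through DNS or a global load balancer, is out of scope here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ClusterEndpoint:
    """Hypothetical record for one deployment of the same app in one cloud."""
    name: str       # illustrative label, e.g. a kubeconfig context name
    provider: str   # cloud vendor label
    healthy: bool   # outcome of an external health check (assumed to exist)
    priority: int   # lower value = preferred cluster


def pick_active_cluster(clusters: Sequence[ClusterEndpoint]) -> Optional[ClusterEndpoint]:
    """Return the most preferred healthy cluster, or None if every cluster is down."""
    healthy = [c for c in clusters if c.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda c: c.priority)


if __name__ == "__main__":
    clusters = [
        ClusterEndpoint("eks-primary", "aws", healthy=False, priority=0),
        ClusterEndpoint("gke-secondary", "gcp", healthy=True, priority=1),
        ClusterEndpoint("aks-tertiary", "azure", healthy=True, priority=2),
    ]
    active = pick_active_cluster(clusters)
    print(active.name if active else "no healthy cluster")
```

In this example, the primary AWS cluster is marked unhealthy, so selection falls through to the next healthy cluster by priority, hosted on a different vendor.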


Adaline Kulas

1594166040

What are the benefits of cloud migration? Reasons you should migrate

Cloud migration is the process of moving applications, databases, and other business elements from local servers to cloud servers, typically in a public cloud. This article covers migration techniques, requirements, and the benefits of cloud migration.

Gartner projects 17.5% revenue growth for cloud services and has published a forecast through 2022, as shown in the following image.


How to Configure and Monitor Apache Spark on Kubernetes

Earlier this year at Spark + AI Summit, we had the pleasure of presenting our session on the best practices and pitfalls of running Apache Spark on Kubernetes (K8s).

In this post we’d like to expand on that presentation and talk to you about:

  1. What is Kubernetes?
  2. Why run Spark on Kubernetes?
  3. Getting started with Spark on Kubernetes
  4. Optimizing performance and cost
  5. Monitoring your Spark applications on Kubernetes
  6. The future of Spark on Kubernetes

If you’re already familiar with K8s and why Spark on Kubernetes might be a fit for you, feel free to skip the first couple of sections and get straight to the meat of the post!
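As a preview of the "Getting started" step in the outline, the following sketch assembles a `spark-submit` invocation in Kubernetes cluster mode. It is not taken from the Spark + AI Summit session. The `--master k8s://...` form, `--deploy-mode cluster`, and the `spark.kubernetes.container.image` and `spark.executor.*` keys come from Spark's Kubernetes documentation. The API server address, image name, and application path are hypothetical placeholders.

```python
import shlex
from typing import Dict, List, Optional


def build_spark_submit(api_server: str, image: str, app_file: str, app_name: str,
                       executors: int, executor_memory: str = "4g",
                       executor_cores: int = 2,
                       extra_conf: Optional[Dict[str, str]] = None) -> List[str]:
    """Build a spark-submit argv that runs the driver and executors as K8s pods."""
    conf = {
        "spark.kubernetes.container.image": image,  # image holding Spark + the app
        "spark.executor.instances": str(executors),
        "spark.executor.memory": executor_memory,
        "spark.executor.cores": str(executor_cores),
    }
    conf.update(extra_conf or {})  # caller-supplied settings override defaults

    cmd = ["spark-submit",
           "--master", f"k8s://{api_server}",  # Kubernetes API server URL
           "--deploy-mode", "cluster",         # driver runs in a pod, not locally
           "--name", app_name]
    for key, value in conf.items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app_file)  # local:// means the path is inside the container image
    return cmd


if __name__ == "__main__":
    cmd = build_spark_submit(
        api_server="https://k8s.example.com:6443",       # hypothetical
        image="registry.example.com/spark-py:3.0.1",     # hypothetical
        app_file="local:///opt/app/main.py",             # hypothetical
        app_name="demo",
        executors=3,
        # Spark 3.0+ can scale executors on K8s without an external shuffle service.
        extra_conf={"spark.dynamicAllocation.enabled": "true",
                    "spark.dynamicAllocation.shuffleTracking.enabled": "true"},
    )
    print(shlex.join(cmd))
```

The function returns an argument list rather than a single string so it can be passed directly to `subprocess.run` without shell-quoting problems. `shlex.join` is used only to print a copy-pasteable version.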
