How to Deploy Spark on a Kubernetes Cluster

This post details how to deploy Spark on a Kubernetes cluster.

Dependencies:

  • Docker v20.10.10
  • Minikube v1.24.0
  • Spark v3.2.0
  • Hadoop v3.3.1

Minikube

Minikube is a tool used to run a single-node Kubernetes cluster locally.

Follow the official Minikube installation guide to install it along with a hypervisor (such as VirtualBox or HyperKit) to manage virtual machines, and kubectl to deploy and manage applications on Kubernetes.
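
If you're on macOS and use Homebrew, for instance, both tools can be installed with the commands below (one convenient option, assuming Homebrew is already set up; any method from the official docs works just as well):

$ brew install minikube
$ brew install kubectl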

By default, the Minikube VM is configured to use 1 GB of memory and 2 CPU cores. This is not sufficient for Spark jobs, so be sure to increase the memory in your Docker client (for HyperKit) or directly in VirtualBox. Then, when you start Minikube, pass the memory and CPU options to it:

$ minikube start --vm-driver=hyperkit --memory 8192 --cpus 4

or

$ minikube start --memory 8192 --cpus 4

Docker

Next, let's build a custom Docker image for Spark 3.2.0, designed for Spark standalone mode.

Dockerfile:

# base image
FROM openjdk:11

# define spark and hadoop versions
ENV SPARK_VERSION=3.2.0
ENV HADOOP_VERSION=3.3.1

# download and install hadoop
RUN mkdir -p /opt && \
    cd /opt && \
    curl http://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz | \
        tar -zx hadoop-${HADOOP_VERSION}/lib/native && \
    ln -s hadoop-${HADOOP_VERSION} hadoop && \
    echo Hadoop ${HADOOP_VERSION} native libraries installed in /opt/hadoop/lib/native

# download and install spark
RUN mkdir -p /opt && \
    cd /opt && \
    curl http://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop2.7.tgz | \
        tar -zx && \
    ln -s spark-${SPARK_VERSION}-bin-hadoop2.7 spark && \
    echo Spark ${SPARK_VERSION} installed in /opt

# add scripts and update spark default config
ADD common.sh spark-master spark-worker /
ADD spark-defaults.conf /opt/spark/conf/spark-defaults.conf
ENV PATH $PATH:/opt/spark/bin

You can find the above Dockerfile along with the Spark config file and scripts in the spark-kubernetes repo on GitHub.
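
To give a sense of what those scripts do, here is a rough sketch of a standalone-mode master startup script. It is illustrative only; the real common.sh, spark-master, and spark-worker scripts in the repo differ in the details (the worker variant launches org.apache.spark.deploy.worker.Worker and points it at the master instead):

#!/bin/bash
# Illustrative sketch; see the spark-kubernetes repo for the actual script.
# Launch a standalone Spark master bound to this pod's hostname.
/opt/spark/bin/spark-class org.apache.spark.deploy.master.Master \
    --host "$(hostname)" \
    --port 7077 \
    --webui-port 8080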

Build the image:

$ eval $(minikube docker-env)
$ docker build -f docker/Dockerfile -t spark-hadoop:3.2.0 ./docker

If you don't want to spend the time building the image locally, feel free to use my pre-built Spark image from Docker Hub: mjhea0/spark-hadoop:3.2.0.

View:

$ docker image ls spark-hadoop

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
spark-hadoop        3.2.0               8f3ccdadd795        11 minutes ago      1.12GB

Spark Master

spark-master-deployment.yaml:

kind: Deployment
apiVersion: apps/v1
metadata:
  name: spark-master
spec:
  replicas: 1
  selector:
    matchLabels:
      component: spark-master
  template:
    metadata:
      labels:
        component: spark-master
    spec:
      containers:
        - name: spark-master
          image: spark-hadoop:3.2.0
          command: ["/spark-master"]
          ports:
            - containerPort: 7077
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m

spark-master-service.yaml:

kind: Service
apiVersion: v1
metadata:
  name: spark-master
spec:
  ports:
    - name: webui
      port: 8080
      targetPort: 8080
    - name: spark
      port: 7077
      targetPort: 7077
  selector:
    component: spark-master
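
The Service gives the master a stable DNS name inside the cluster, so the workers and any submitted jobs can reach it at spark://spark-master:7077. For illustration, a minimal spark-defaults.conf pointing jobs at the master could look like the following (just a sketch; the actual file shipped in the repo may set other options as well):

spark.master    spark://spark-master:7077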

Create the Spark master Deployment and start the Service:

$ kubectl create -f ./kubernetes/spark-master-deployment.yaml
$ kubectl create -f ./kubernetes/spark-master-service.yaml

Verify:

$ kubectl get deployments

NAME           READY   UP-TO-DATE   AVAILABLE   AGE
spark-master   1/1     1            1           2m55s


$ kubectl get pods

NAME                          READY   STATUS    RESTARTS   AGE
spark-master-dbc47bc9-tlgfs   1/1     Running   0          3m8s
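
If the pod is stuck or you simply want to confirm that the standalone master actually came up, check the container logs (substituting your own pod name):

$ kubectl logs spark-master-dbc47bc9-tlgfs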

Spark Workers

spark-worker-deployment.yaml:

kind: Deployment
apiVersion: apps/v1
metadata:
  name: spark-worker
spec:
  replicas: 2
  selector:
    matchLabels:
      component: spark-worker
  template:
    metadata:
      labels:
        component: spark-worker
    spec:
      containers:
        - name: spark-worker
          image: spark-hadoop:3.2.0
          command: ["/spark-worker"]
          ports:
            - containerPort: 8081
          resources:
            requests:
              cpu: 100m

Create the Spark worker Deployment:

$ kubectl create -f ./kubernetes/spark-worker-deployment.yaml

Verify:

$ kubectl get deployments

NAME           READY   UP-TO-DATE   AVAILABLE   AGE
spark-master   1/1     1            1           6m35s
spark-worker   2/2     2            2           7s


$ kubectl get pods

NAME                            READY   STATUS    RESTARTS   AGE
spark-master-dbc47bc9-tlgfs     1/1     Running   0          6m53s
spark-worker-795dc47587-fjkjt   1/1     Running   0          25s
spark-worker-795dc47587-g9n64   1/1     Running   0          25s
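
Since the workers are managed by an ordinary Deployment, you can scale them up or down at any time. For example, assuming your Minikube VM has spare CPU and memory for the extra pod:

$ kubectl scale deployment spark-worker --replicas=3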

Ingress

Did you notice that we exposed the Spark web UI on port 8080? To access it from outside the cluster, let's configure an Ingress object.

minikube-ingress.yaml:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: minikube-ingress
  annotations:
spec:
  rules:
  - host: spark-kubernetes
    http:
      paths:
        - pathType: Prefix
          path: /
          backend:
            service:
              name: spark-master
              port:
                number: 8080

Enable the Ingress addon:

$ minikube addons enable ingress

Create the Ingress object:

$ kubectl apply -f ./kubernetes/minikube-ingress.yaml

Next, you need to update your /etc/hosts file to route requests from the host we defined, spark-kubernetes, to the Minikube instance.

Add an entry to /etc/hosts:

$ echo "$(minikube ip) spark-kubernetes" | sudo tee -a /etc/hosts

Test it out in the browser at http://spark-kubernetes/:

Spark web UI
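
If you would rather not edit /etc/hosts, a quick alternative for ad-hoc access is kubectl port-forward, which tunnels the master's web UI to localhost and bypasses the Ingress entirely:

$ kubectl port-forward svc/spark-master 8080:8080

The UI is then available at http://localhost:8080.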

Test

To test, run the PySpark shell from the master container:

$ kubectl get pods -o wide

NAME                            READY   STATUS    RESTARTS   AGE     IP           NODE       NOMINATED NODE   READINESS GATES
spark-master-dbc47bc9-t6v84     1/1     Running   0          7m35s   172.17.0.6   minikube   <none>           <none>
spark-worker-795dc47587-5ch8f   1/1     Running   0          7m24s   172.17.0.9   minikube   <none>           <none>
spark-worker-795dc47587-fvcf6   1/1     Running   0          7m24s   172.17.0.7   minikube   <none>           <none>

$ kubectl exec spark-master-dbc47bc9-t6v84 -it -- \
    pyspark --conf spark.driver.bindAddress=172.17.0.6 --conf spark.driver.host=172.17.0.6

Then run the following code once the PySpark prompt appears:

words = 'the quick brown fox jumps over the\
        lazy dog the quick brown fox jumps over the lazy dog'
sc = SparkContext.getOrCreate()
seq = words.split()
data = sc.parallelize(seq)  # distribute the list of words across the cluster
counts = data.map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b).collect()  # simple word count
dict(counts)
sc.stop()

You should see:

{'brown': 2, 'lazy': 2, 'over': 2, 'fox': 2, 'dog': 2, 'quick': 2, 'the': 4, 'jumps': 2}
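
For a slightly more involved check, you can also submit one of the example applications bundled with Spark against the standalone master. The jar path below is where the examples normally live in a Spark 3.2.0 distribution, but it is an assumption about this particular image, so verify it inside the container first:

$ kubectl exec spark-master-dbc47bc9-t6v84 -it -- \
    spark-submit \
        --master spark://spark-master:7077 \
        --class org.apache.spark.examples.SparkPi \
        /opt/spark/examples/jars/spark-examples_2.12-3.2.0.jar 100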

That's it!


You can find the scripts in the spark-kubernetes repo on GitHub. Cheers!

Source: https://testdriven.io
