This post details how to deploy Spark on a Kubernetes cluster.
Dependencies:
Minikube is a tool used to run a single-node Kubernetes cluster locally.
Follow the official Minikube installation guide to install it along with a hypervisor (such as VirtualBox or HyperKit) to manage virtual machines, and kubectl to deploy and manage apps on Kubernetes.
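If you're on macOS with Homebrew, for example, all three can be installed with the commands below (a sketch under that assumption; on other platforms, follow the official guides instead):
$ brew install minikube kubectl hyperkit
# confirm the installs
$ minikube version
$ kubectl version --client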
By default, the Minikube VM is configured to use 1 GB of memory and 2 CPU cores. This is not sufficient for Spark jobs, so be sure to increase the memory in your Docker client (for HyperKit) or directly in VirtualBox. Then, when you start Minikube, pass the memory and CPU options to it:
$ minikube start --vm-driver=hyperkit --memory 8192 --cpus 4
or
$ minikube start --memory 8192 --cpus 4
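If you'd rather not pass these flags every time, Minikube can also persist them so that subsequent starts pick the values up automatically (a small optional sketch):
# persist the resource settings for future starts
$ minikube config set memory 8192
$ minikube config set cpus 4
$ minikube start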
Next, let's build a custom Docker image for Spark 3.2.0, designed for Spark standalone mode.
Dockerfile:
# base image
FROM openjdk:11
# define spark and hadoop versions
ENV SPARK_VERSION=3.2.0
ENV HADOOP_VERSION=3.3.1
# download and install hadoop
RUN mkdir -p /opt && \
    cd /opt && \
    curl http://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz | \
        tar -zx hadoop-${HADOOP_VERSION}/lib/native && \
    ln -s hadoop-${HADOOP_VERSION} hadoop && \
    echo Hadoop ${HADOOP_VERSION} native libraries installed in /opt/hadoop/lib/native
# download and install spark
RUN mkdir -p /opt && \
    cd /opt && \
    curl http://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop2.7.tgz | \
        tar -zx && \
    ln -s spark-${SPARK_VERSION}-bin-hadoop2.7 spark && \
    echo Spark ${SPARK_VERSION} installed in /opt
# add scripts and update spark default config
ADD common.sh spark-master spark-worker /
ADD spark-defaults.conf /opt/spark/conf/spark-defaults.conf
ENV PATH $PATH:/opt/spark/bin
You can find the above Dockerfile, along with the Spark config file and scripts, in the spark-kubernetes repo on GitHub.
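The scripts themselves are short wrappers around Spark's standalone classes. As a rough, illustrative sketch (not the exact contents of the repo's /spark-master script), the master entrypoint boils down to something like this:
#!/bin/bash
# illustrative sketch only -- see the spark-kubernetes repo for the real script
. /common.sh
# start the standalone master and expose the web UI on port 8080
/opt/spark/bin/spark-class org.apache.spark.deploy.master.Master \
    --host spark-master \
    --port 7077 \
    --webui-port 8080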
Build the image:
$ eval $(minikube docker-env)
$ docker build -f docker/Dockerfile -t spark-hadoop:3.2.0 ./docker
If you don't want to spend time building the image locally, feel free to use my pre-built Spark image from Docker Hub: mjhea0/spark-hadoop:3.2.0.
View:
$ docker image ls spark-hadoop
REPOSITORY TAG IMAGE ID CREATED SIZE
spark-hadoop 3.2.0 8f3ccdadd795 11 minutes ago 1.12GB
spark-master-deployment.yaml:
kind: Deployment
apiVersion: apps/v1
metadata:
  name: spark-master
spec:
  replicas: 1
  selector:
    matchLabels:
      component: spark-master
  template:
    metadata:
      labels:
        component: spark-master
    spec:
      containers:
        - name: spark-master
          image: spark-hadoop:3.2.0
          command: ["/spark-master"]
          ports:
            - containerPort: 7077
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m
spark-master-service.yaml:
kind: Service
apiVersion: v1
metadata:
  name: spark-master
spec:
  ports:
    - name: webui
      port: 8080
      targetPort: 8080
    - name: spark
      port: 7077
      targetPort: 7077
  selector:
    component: spark-master
Create the Spark master Deployment and start the services:
$ kubectl create -f ./kubernetes/spark-master-deployment.yaml
$ kubectl create -f ./kubernetes/spark-master-service.yaml
Verify:
$ kubectl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
spark-master 1/1 1 1 2m55s
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
spark-master-dbc47bc9-tlgfs 1/1 Running 0 3m8s
spark-worker-deployment.yaml:
kind: Deployment
apiVersion: apps/v1
metadata:
  name: spark-worker
spec:
  replicas: 2
  selector:
    matchLabels:
      component: spark-worker
  template:
    metadata:
      labels:
        component: spark-worker
    spec:
      containers:
        - name: spark-worker
          image: spark-hadoop:3.2.0
          command: ["/spark-worker"]
          ports:
            - containerPort: 8081
          resources:
            requests:
              cpu: 100m
Create the Spark worker Deployment:
$ kubectl create -f ./kubernetes/spark-worker-deployment.yaml
Verify:
$ kubectl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
spark-master 1/1 1 1 6m35s
spark-worker 2/2 2 2 7s
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
spark-master-dbc47bc9-tlgfs 1/1 Running 0 6m53s
spark-worker-795dc47587-fjkjt 1/1 Running 0 25s
spark-worker-795dc47587-g9n64 1/1 Running 0 25s
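Because the workers run as an ordinary Deployment, you can resize the pool at any time with kubectl scale, for example:
# scale the worker pool from two replicas to three
$ kubectl scale deployment spark-worker --replicas=3
$ kubectl get pods -l component=spark-worker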
Did you notice that we exposed the Spark web UI on port 8080? To access it from outside the cluster, let's configure an Ingress object.
minikube-ingress.yaml:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: minikube-ingress
  annotations:
spec:
  rules:
    - host: spark-kubernetes
      http:
        paths:
          - pathType: Prefix
            path: /
            backend:
              service:
                name: spark-master
                port:
                  number: 8080
Enable the Ingress addon:
$ minikube addons enable ingress
Create the Ingress object:
$ kubectl apply -f ./kubernetes/minikube-ingress.yaml
Next, you need to update your /etc/hosts file to route requests from the host we defined, spark-kubernetes, to the Minikube instance.
Add an entry to /etc/hosts:
$ echo "$(minikube ip) spark-kubernetes" | sudo tee -a /etc/hosts
Test it out in the browser at http://spark-kubernetes/:
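You can also sanity-check the Ingress from the command line by sending the Host header straight to the Minikube IP (assuming the NGINX ingress controller is listening on port 80):
# request the Spark master web UI through the Ingress
$ curl -H "Host: spark-kubernetes" http://$(minikube ip)/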
To test, run the PySpark shell from the master container:
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
spark-master-dbc47bc9-t6v84 1/1 Running 0 7m35s 172.17.0.6 minikube <none> <none>
spark-worker-795dc47587-5ch8f 1/1 Running 0 7m24s 172.17.0.9 minikube <none> <none>
spark-worker-795dc47587-fvcf6 1/1 Running 0 7m24s 172.17.0.7 minikube <none> <none>
$ kubectl exec spark-master-dbc47bc9-t6v84 -it -- \
pyspark --conf spark.driver.bindAddress=172.17.0.6 --conf spark.driver.host=172.17.0.6
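Your pod name and IP will differ from the ones above. Rather than copying them by hand, a small sketch like the following looks them up with a label selector and jsonpath output:
# grab the master pod's name and IP, then open PySpark inside it
$ MASTER_POD=$(kubectl get pods -l component=spark-master -o jsonpath='{.items[0].metadata.name}')
$ MASTER_IP=$(kubectl get pod "$MASTER_POD" -o jsonpath='{.status.podIP}')
$ kubectl exec "$MASTER_POD" -it -- \
    pyspark --conf spark.driver.bindAddress="$MASTER_IP" --conf spark.driver.host="$MASTER_IP"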
Then, run the following code once the PySpark prompt appears:
words = 'the quick brown fox jumps over the \
lazy dog the quick brown fox jumps over the lazy dog'
sc = SparkContext.getOrCreate()
seq = words.split()
data = sc.parallelize(seq)
counts = data.map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b).collect()
dict(counts)
sc.stop()
You should see:
{'brown': 2, 'lazy': 2, 'over': 2, 'fox': 2, 'dog': 2, 'quick': 2, 'the': 4, 'jumps': 2}
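The same word count can also be submitted as a batch job instead of being typed into the shell. A minimal sketch, assuming you save the snippet above (plus a "from pyspark import SparkContext" line) as a hypothetical wordcount.py and reuse the $MASTER_POD variable from the earlier snippet:
# copy the script into the master pod and submit it to the standalone master
$ kubectl cp wordcount.py "$MASTER_POD":/tmp/wordcount.py
$ kubectl exec "$MASTER_POD" -it -- \
    spark-submit --master spark://spark-master:7077 /tmp/wordcount.py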
That's it!
You can find the scripts in the spark-kubernetes repo on GitHub. Cheers!
Source: https://testdriven.io
#spark #apache-spark #kubernetes
Last year, we provided a list of Kubernetes tools that proved so popular we have decided to curate another list of some useful additions for working with the platform—among which are many tools that we personally use here at Caylent. Check out the original tools list here in case you missed it.
According to a recent survey done by Stackrox, the dominance Kubernetes enjoys in the market continues to be reinforced, with 86% of respondents using it for container orchestration.
(State of Kubernetes and Container Security, 2020)
And as you can see below, more and more companies are jumping into containerization for their apps. If you’re among them, here are some tools to aid you going forward as Kubernetes continues its rapid growth.
(State of Kubernetes and Container Security, 2020)
#blog #tools #amazon elastic kubernetes service #application security #aws kms #botkube #caylent #cli #container monitoring #container orchestration tools #container security #containers #continuous delivery #continuous deployment #continuous integration #contour #developers #development #developments #draft #eksctl #firewall #gcp #github #harbor #helm #helm charts #helm-2to3 #helm-aws-secret-plugin #helm-docs #helm-operator-get-started #helm-secrets #iam #json #k-rail #k3s #k3sup #k8s #keel.sh #keycloak #kiali #kiam #klum #knative #krew #ksniff #kube #kube-prod-runtime #kube-ps1 #kube-scan #kube-state-metrics #kube2iam #kubeapps #kubebuilder #kubeconfig #kubectl #kubectl-aws-secrets #kubefwd #kubernetes #kubernetes command line tool #kubernetes configuration #kubernetes deployment #kubernetes in development #kubernetes in production #kubernetes ingress #kubernetes interfaces #kubernetes monitoring #kubernetes networking #kubernetes observability #kubernetes plugins #kubernetes secrets #kubernetes security #kubernetes security best practices #kubernetes security vendors #kubernetes service discovery #kubernetic #kubesec #kubeterminal #kubeval #kudo #kuma #microsoft azure key vault #mozilla sops #octant #octarine #open source #palo alto kubernetes security #permission-manager #pgp #rafay #rakess #rancher #rook #secrets operations #serverless function #service mesh #shell-operator #snyk #snyk container #sonobuoy #strongdm #tcpdump #tenkai #testing #tigera #tilt #vert.x #wireshark #yaml
Kubernetes is a highly popular container orchestration platform. Multi cloud is a strategy that leverages cloud resources from multiple vendors. Multi cloud strategies have become popular because they help prevent vendor lock-in and enable you to leverage a wide variety of cloud resources. However, multi cloud ecosystems are notoriously difficult to configure and maintain.
This article explains how you can leverage Kubernetes to reduce multi cloud complexities and improve stability, scalability, and velocity.
Maintaining standardized application deployments becomes more challenging as the number of applications, and the technologies they are based on, increases. As environments, operating systems, and dependencies differ, management and operations require more effort and extensive documentation.
In the past, teams tried to get around these difficulties by creating isolated projects in the data center. Each project, including its configurations and requirements, was managed independently. This required accurately predicting performance and the number of users before deployment, and taking applications down to update operating systems or software. There were many chances for error.
Kubernetes can provide an alternative to the old method, enabling teams to deploy applications independent of the environment in containers. This eliminates the need to create resource partitions and enables teams to operate infrastructure as a unified whole.
In particular, Kubernetes makes it easier to deploy a multi cloud strategy since it enables you to abstract away service differences. With Kubernetes deployments you can work from a consistent platform and optimize services and applications according to your business needs.
The Compelling Attributes of Multi Cloud Kubernetes
Multi cloud Kubernetes can provide multiple benefits beyond a single cloud deployment. Below are some of the most notable advantages.
Stability
In addition to the built-in scalability, fault tolerance, and auto-healing features of Kubernetes, multi cloud deployments can provide service redundancy. For example, you can mirror applications or split microservices across vendors. This reduces the risk of a vendor-related outage and enables you to create failovers.
#kubernetes #multicloud-strategy #kubernetes-cluster #kubernetes-top-story #kubernetes-cluster-install #kubernetes-explained #kubernetes-infrastructure #cloud
Earlier this year at Spark + AI Summit, we had the pleasure of presenting our session on the best practices and pitfalls of running Apache Spark on Kubernetes (K8s).
In this post we’d like to expand on that presentation and talk to you about:
If you’re already familiar with k8s and why Spark on Kubernetes might be a fit for you, feel free to skip the first couple of sections and get straight to the meat of the post!
#apache-spark #spark-on-kubernetes #docker #kubernetes #spark-on-k8s #k8s #good-company #devops
An extensively researched list of top Apache Spark developers, with ratings and reviews, to help you find the best Spark development companies around the world.
Our thorough research into the qualities of the best Big Data Spark consulting and development service providers produced this list of companies. In scenarios where businesses need prompt, fast data processing to predict and analyze outcomes, Spark applications can be highly effective for a range of industry-specific management needs. The companies listed here have been skillfully boosting businesses through effective Spark consulting and customized Big Data solutions.
Check out this list of the best Spark development companies and their Spark developers.
#spark development service providers #top spark development companies #best big data spark development #spark consulting #spark developers #spark application