Apache Spark

Apache Spark is an open-source distributed data processing engine written in Scala that provides a unified API and distributed datasets to users. Common use cases for Apache Spark include machine learning, deep learning, and graph processing.


Mastering Query Plans in Spark 3.0

Spark query plans in a nutshell. We will take a tour of some of the most frequently used operators and explain what information they provide and how it can be interpreted.

Spark Delight — We’re building a better Spark UI

“The Spark UI is my favorite monitoring tool” — said no one ever. The Apache Spark UI, the open source monitoring tool shipped with Apache® Spark is the main interface Spark developers use to understand their application performance.

Performance of Apache Spark on Kubernetes has caught up with YARN

Performance of Apache Spark on Kubernetes has caught up with YARN. Learn our benchmark setup, results, as well as critical tips to make shuffles up to 10x faster when running on Kubernetes!

How to Create a Spark DataFrame the Fast Way

There are several different ways to create a DataFrame in Apache Spark — which one should you use? What is the most efficient way from a performance perspective? In this post, we will look at a few different options using the programming language Apache Spark is written in: Scala.

Spark SQL: Adaptive Query Execution

Altering the physical execution plan at runtime.

Apache Spark — Fast and Furious.

Apache Spark is a data processing framework that can quickly perform processing tasks on very large data sets, and can also distribute…

Spark Streaming with HTTP REST endpoint serving JSON data

Speed up development and testing of Spark Structured Streaming pipelines by using an HTTP REST endpoint as a streaming source.

Should I repartition?

About Data Distribution in Spark SQL.

Be in charge of Query Execution in Spark SQL

Querying data in Spark has been a pleasure since Spark 2.x, thanks to SQL and the declarative DataFrame API. Using just a few lines of high level…

An Oversimplified Introduction to PySpark for Programmers

PySpark is one of the hottest new trends in the technology domain.

Apache Spark performance recipe — Explicitly cache RDD

The word count example below illustrates the importance of caching the RDD when the RDD lineage breaks/branches out.

Spark Streaming with HTTP REST endpoint serving JSON data

Writing distributed applications can be a time-consuming process. While running simple `spark.range(0, 10).reduce(_ + _)` code (a "Hello World" example of Spark) on your local machine is easy enough, it eventually gets complicated as you come across more complex real-world use cases, especially in the Structured Streaming world where you want to do streaming aggregations and join with other streams or with static datasets.

Using the native Apache Spark Operator for Kubernetes

Give your big workloads smooth sailing using the native Apache Spark Operator for Kubernetes.

How to Run Spark with Kubernetes

Use Spark in a simple and portable way, on premises and in the cloud. In this blog, I will explain how to run Spark with Kubernetes using the Spark on Kubernetes Operator. I will also describe the configurations for fast S3 data access using the S3A Connector and S3A Committers. This architecture works for both cloud object storage and on-premises S3-compatible object storage like FlashBlade S3.

Learn Apache Spark - Spark Tutorial for Beginners - Full Course 2020

In this Apache Spark Tutorial for Beginners, you will comprehensively learn all the major concepts of Spark, such as Spark RDDs, DataFrames, Spark SQL, and Spark Streaming. Topics covered: Spark fundamentals, Spark vs. Hadoop, Spark transformations, actions, and operations, Spark SQL, DataFrame basics, Spark SQL Hive integration, and Sqoop on Spark.

PySpark Tutorial For Beginners | Apache Spark With Python Tutorial

PySpark Tutorial For Beginners | Apache Spark With Python Tutorial will help you understand what PySpark is, the different features of PySpark, and how Spark compares when used from Python and Scala. Learn the core PySpark components: SparkConf, SparkContext, SparkFiles, RDD, StorageLevel, DataFrames, Broadcast, and Accumulator. You will get an idea of the various subpackages in PySpark, and you will look at a demo using PySpark SQL to analyze Walmart stock data.

Apache Spark Tutorial For Beginners - Apache Spark Full Course

This video on Apache Spark Tutorial For Beginners - Apache Spark Full Course will help you learn the basics of Big Data, what Apache Spark is, and the architecture of Apache Spark. You will understand how to install Apache Spark on Windows and Ubuntu. You will look at the important components of Spark, such as Spark Streaming, Spark MLlib, and Spark SQL. You will get an idea of how to implement Spark with Python in the PySpark tutorial and look at some of the important Apache Spark interview questions.

Why learn Apache Spark in 2020?

This video on "Apache Spark in 2020" will provide you with detailed and comprehensive knowledge of current IT job trends around Apache Spark and why you should learn it in 2020. It covers: what is new in Apache Spark, what Apache Spark is, the top 5 reasons to learn Spark, salary trends for Spark developers, the components of Spark, the skills a Spark developer needs, and companies using Apache Spark.

A beginner's guide to Spark in Python

In this post we'll discuss how to set up Spark to start easily performing analytics, either simply on your local machine or in a cluster on EC2. We'll start to interact with Spark on the command line and then demo how to write a Spark application in Python and submit it to the cluster as a Spark job

Spark MLlib tutorial | Machine Learning On Spark | Apache Spark Tutorial

This video on Spark MLlib Tutorial will help you learn about Spark's machine learning library. You will understand the different types of machine learning algorithms - supervised, unsupervised, and reinforcement learning.