Spark SQL Tutorial For Beginners | Apache Spark Tutorial For Beginners

Spark SQL is a module for structured data processing. This Spark SQL tutorial video will help you understand what Spark SQL is and what its features are. You will learn Spark SQL's architecture and get an idea of the DataFrame API, the Data Source API, and the Catalyst optimizer. You will also see how to run SQL queries in a Spark SQL demo.
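As a minimal sketch of the kind of query the demo runs, the snippet below registers a DataFrame as a temporary view and queries it with SQL (it assumes a local PySpark installation; the data and view name are illustrative only):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-sql-demo").getOrCreate()

# Build a small DataFrame and expose it to SQL as a temporary view.
df = spark.createDataFrame(
    [("Alice", 34), ("Bob", 45), ("Carol", 29)], ["name", "age"]
)
df.createOrReplaceTempView("people")

# Run a plain SQL query through the Spark SQL engine.
spark.sql("SELECT name FROM people WHERE age > 30").show()
```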

Throttle Spark-Kafka Streaming Volume

Learn how to avoid streaming bottlenecks in your Apache Spark loads. This article will help any new developer who wants to control the volume of Spark-Kafka streaming. Using Kafka data loads as an example, here's how to tweak your settings.
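One common knob for this in Structured Streaming is `maxOffsetsPerTrigger`, which caps how many Kafka records each micro-batch may pull. The sketch below shows where it fits (broker address, topic name, and the chosen cap are placeholders, not values from the article):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-throttle").getOrCreate()

stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    # Cap how many Kafka records each micro-batch may pull.
    .option("maxOffsetsPerTrigger", 10000)
    .load()
)
```

For the older DStream API, the analogous settings are `spark.streaming.kafka.maxRatePerPartition` and `spark.streaming.backpressure.enabled`.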

Top 8 Alternatives To Apache Spark

Top 8 Alternatives To Apache Spark: Apache Hadoop, Google BigQuery, Apache Storm, Apache Flink, Lumify, Apache Sqoop, Elasticsearch, and Presto. Apache Spark is an open-source unified analytics engine for large-scale data processing. Its features include the ease of quickly writing applications in various languages, such as Java, Scala, Python, R, and SQL, and access to diverse data sources.

Apache Spark Internals: Tips and Optimizations

The goal of this post is to dig a bit deeper into the internals of Apache Spark to better understand how Spark works under the hood, so we can write optimal code that maximizes parallelism and minimizes data shuffles.
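A classic illustration of minimizing shuffles, in the spirit of the post, is preferring `reduceByKey` over `groupByKey` for aggregations: the former combines values map-side before the shuffle, while the latter ships every record across the network. A sketch (assumes a local PySpark installation; the data is illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("shuffle-demo").getOrCreate()
rdd = spark.sparkContext.parallelize([("a", 1), ("b", 2), ("a", 3)])

# Preferred: partial aggregation happens on each partition before data moves.
sums = rdd.reduceByKey(lambda x, y: x + y)

# Avoid for aggregations: shuffles all values, then sums on the reducer.
sums_slow = rdd.groupByKey().mapValues(sum)
```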

How to Build a Real-Time Twitter Analysis using Big Data Tools

Learn how to build a real-time Twitter analysis using Big Data tools and a cloud platform: crack into the Apache Spark and AWS ecosystems. Data science and machine learning applications are everywhere now, radically changing our lives and businesses.

Functions and OOP in Python | Functions in Python | Spark Training

Functions and OOP in Python will help you understand, in depth, the functions and object-oriented concepts required for Spark. It also includes an example and explains what Python and Apache Spark are.
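As a tiny, self-contained taste of the two ideas the video covers, here is a plain function and a class with a method, combined the way a Spark job folds per-record results into an aggregate (all names are illustrative):

```python
def word_count(line):
    """Count words in a line -- the classic first Spark transformation."""
    return len(line.split())

class Counter:
    """Accumulate counts, the way a Spark aggregation folds values."""
    def __init__(self):
        self.total = 0

    def add(self, n):
        self.total += n
        return self.total

counter = Counter()
for line in ["hello spark", "functions and oop in python"]:
    counter.add(word_count(line))

print(counter.total)  # 7
```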

What is Apache Spark? | Apache Spark Python | Spark Training

This Edureka "What is Apache Spark?" video will help you understand the architecture of Spark in depth. It also includes an example and explains what Python and Apache Spark are.

Continue Big Data With Microsoft Azure | Spark SQL Demo | Azure Databricks Tutorials | Azure Storage

Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis. But it's not the amount of data that's important; it's what organizations do with the data that matters.

Using Spark as a Database

Learn how Apache Spark can be leveraged as a database by creating tables in it and querying them. Did you know that Spark (with the help of Hive) can also act as a database?
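A minimal sketch of the idea, assuming a PySpark installation with Hive support enabled (the table name and data are illustrative; `saveAsTable` needs a Hive metastore for the table to survive across sessions):

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("spark-as-db")
    .enableHiveSupport()
    .getOrCreate()
)

df = spark.createDataFrame([(1, "widget"), (2, "gadget")], ["id", "name"])

# Persist a managed table through the catalog, then query it with SQL.
df.write.mode("overwrite").saveAsTable("products")
spark.sql("SELECT name FROM products WHERE id = 2").show()
```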

Statistics in Spark SQL Explained

The Spark SQL optimizer uses two types of optimizations: rule-based and cost-based. This is a closer look at the cost-based optimizer in Spark. Most of the optimizations Spark performs are based on heuristic rules that do not take into account the properties of the data being processed.
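The cost-based optimizer only helps once it has statistics to work with. The configuration sketch below shows how to switch it on and collect table- and column-level statistics, assuming an active `SparkSession` named `spark` (the table and column names are placeholders):

```python
# Enable the cost-based optimizer.
spark.conf.set("spark.sql.cbo.enabled", "true")

# Table-level statistics: row count and size in bytes.
spark.sql("ANALYZE TABLE sales COMPUTE STATISTICS")

# Column-level statistics: min, max, distinct count, null count.
spark.sql("ANALYZE TABLE sales COMPUTE STATISTICS FOR COLUMNS amount, region")
```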

Using Neo4j with PySpark on Databricks

The Neo4j Connector for Apache Spark leverages the Spark DataSource API. We will first set up a Neo4j cloud instance using an Azure virtual machine, then set up an Azure Databricks instance running Spark, before finally establishing a connection between the two resources using the new Neo4j Connector for Apache Spark. If you already have an up-and-running instance of Neo4j or Databricks, you may of course want to skip the respective steps.
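The connection step looks roughly like the sketch below, reading nodes by label through the connector's DataSource (the Bolt URL, credentials, and label are placeholders, and the connector jar must be attached to the cluster):

```python
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("url", "bolt://<vm-ip>:7687")
    .option("authentication.basic.username", "neo4j")
    .option("authentication.basic.password", "<password>")
    .option("labels", "Person")          # read all nodes labeled :Person
    .load()
)
df.show()
```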

The Missing Piece - Diving into the World of Big Data with .NET for Apache Spark

Data is growing at an unprecedented rate, with both human-generated and machine-generated sources. Come learn about the open-source .NET for Apache Spark project, the same technology that teams such as Office, Dynamics, and Azure use widely to process hundreds of terabytes of data inside Microsoft.

Spark Fundamentals for Python Programmers

Everything you need for your first Spark program.

Connecting the Dots (Python, Spark, and Kafka)

Learn how to connect these dots: Python, Apache Spark, and Apache Kafka. Python, Spark, and Kafka are vital frameworks in data scientists' day-to-day activities.
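End to end, the three connect roughly as in this sketch: Python drives Spark, and Spark consumes from Kafka and writes results to a sink (broker address, topic, and checkpoint path are placeholders):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("py-spark-kafka").getOrCreate()

# Read the Kafka topic as a stream and decode the message payload.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "clicks")
    .load()
    .select(col("value").cast("string"))
)

# Write to the console sink; any sink needs a checkpoint location.
query = (
    events.writeStream.format("console")
    .option("checkpointLocation", "/tmp/ckpt")
    .start()
)
```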

8 Non-obvious Features in Spark SQL that Are Worth Knowing

The DataFrame API of Spark SQL is user-friendly. Here are 8 non-obvious features in Spark SQL that are worth knowing: what the difference is between array_sort and sort_array; the concat function is null-intolerant; collect_list is not a deterministic function; sorting the window will change the frame; writing to a table invalidates the cache; why calling show() runs multiple jobs; how to make sure a user-defined function is executed only once; and how a UDF can destroy your data distribution.
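To illustrate the first item on the list: `array_sort` places null elements at the end of the array, while `sort_array` in ascending order places them at the beginning. A sketch (assumes a local PySpark installation):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import array_sort, sort_array

spark = SparkSession.builder.appName("array-sort-demo").getOrCreate()
df = spark.sql("SELECT array(3, null, 1) AS xs")

df.select(
    array_sort("xs").alias("array_sort"),   # nulls last:  [1, 3, null]
    sort_array("xs").alias("sort_array"),   # nulls first: [null, 1, 3]
).show()
```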

Machine Learning with Apache Spark

Big data is part of our lives now, and most companies collecting data have to deal with big data in order to gain meaningful insights from it. While we know that complex neural networks work beautifully and accurately on big data sets, at times they are not ideal: even when the prediction problem is complex, the predictions still need to be fast and efficient. We therefore need a scalable machine learning solution.
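Spark's answer to that is MLlib. As a minimal sketch of what a scalable pipeline looks like, the snippet below assembles feature columns and fits a logistic regression (the columns and training data are illustrative, not from the article):

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("mllib-demo").getOrCreate()
train = spark.createDataFrame(
    [(0.0, 1.0, 0.0), (1.0, 0.0, 1.0), (0.5, 0.5, 1.0)],
    ["f1", "f2", "label"],
)

# Two stages: pack raw columns into a feature vector, then fit a model.
pipeline = Pipeline(stages=[
    VectorAssembler(inputCols=["f1", "f2"], outputCol="features"),
    LogisticRegression(featuresCol="features", labelCol="label"),
])
model = pipeline.fit(train)
```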

Incremental window functions using AWS Glue Bookmarks

This article discusses an efficient approach, building on the AWS Glue predicate-pushdown technique described in my previous article. This approach reprocesses only the data affected by out-of-order records that have landed.


A rendezvous of Python, SQL, Spark, and distributed computing, making machine learning on big data possible. In this article, I will take you through the step-by-step process of using PySpark on a cluster of computers.

SparkSession vs SparkContext vs SQLContext vs HiveContext

What is the difference between SparkSession, SparkContext, SQLContext, and HiveContext? In this article, I am going to cover the various entry points for Spark applications and how they have evolved across releases.
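The short version, sketched in code: since Spark 2.x, SparkSession subsumes the older entry points, which remain reachable from it (assumes a local PySpark installation):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("entry-points").getOrCreate()

# The classic SparkContext (RDD API) still exists, hanging off the session.
sc = spark.sparkContext

# The SQLContext / HiveContext functionality now lives on SparkSession
# itself: spark.sql(...), spark.catalog, spark.table(...)
spark.sql("SELECT 1 AS one").show()
```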

NYC Home buyers — Here’s your Neighborhood score!

Three public datasets were used in this analysis. A) GreatSchools School Ratings: the dataset contains ratings of all public schools in NYC.