Learn how to use Apache Spark with SQL Server to process data efficiently from several different types of data files.

Spark has emerged as a favorite for analytics, especially for workloads that involve massive volumes of data, where it delivers high performance compared to conventional database engines. Spark SQL lets users express complex business requirements to Spark in the familiar language of SQL.
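As a minimal sketch of what that looks like, the snippet below loads a CSV file into a DataFrame, registers it as a temporary view, and queries it with plain SQL. The file path, table name, and column names are hypothetical, chosen just for illustration:

```scala
import org.apache.spark.sql.SparkSession

object SparkSqlExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SparkSqlExample")
      .getOrCreate()

    // Load a CSV file into a DataFrame (path and schema are hypothetical)
    val orders = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/data/orders.csv")

    // Expose the DataFrame to Spark SQL as a temporary view
    orders.createOrReplaceTempView("orders")

    // Formulate the business question in plain SQL
    val revenueByRegion = spark.sql(
      """SELECT region, SUM(amount) AS total_revenue
        |FROM orders
        |GROUP BY region
        |ORDER BY total_revenue DESC""".stripMargin)

    revenueByRegion.show()
    spark.stop()
  }
}
```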

So, in this blog, we will see how you can process data with Apache Spark. And what better way to establish Spark's capabilities than to put it through its paces with the Hadoop-DS benchmark, comparing performance, throughput, and SQL compatibility against SQL Server?

Before we begin, ensure that the following test environment is available:

  • SQL Server: 32 GB RAM, running Windows Server 2012 R2.
  • Hadoop cluster: two machines with 8 GB RAM each, running Ubuntu.
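With that environment in place, Spark can pull data straight out of SQL Server over JDBC. The sketch below is an assumption-laden example, not the benchmark setup itself: the host, database, table, and credentials are hypothetical, and it requires the Microsoft JDBC driver on the Spark classpath:

```scala
import org.apache.spark.sql.SparkSession

object SqlServerToSpark {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SqlServerToSpark")
      .getOrCreate()

    // Read a SQL Server table into a DataFrame over JDBC.
    // Host, database, table, and credentials below are hypothetical.
    val sales = spark.read
      .format("jdbc")
      .option("url", "jdbc:sqlserver://sqlserver-host:1433;databaseName=SalesDB")
      .option("dbtable", "dbo.Sales")
      .option("user", "spark_reader")
      .option("password", sys.env("SQLSERVER_PASSWORD"))
      .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
      .load()

    sales.printSchema()
    println(s"Row count: ${sales.count()}")
    spark.stop()
  }
}
```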

