Apache Spark is a large-scale distributed computing framework used for analytics and big data processing.

Distributed computing programs are tricky to test locally or against smaller datasets. Testing Spark code is made much easier by a few base classes provided by Holden Karau's **Spark Testing Base** library.
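The library is pulled in as an ordinary sbt test dependency. A minimal wiring might look like the following sketch (the version numbers are illustrative assumptions and must match your Spark build; the forking and memory settings follow the project's general advice for Spark tests):

```scala
// build.sbt -- versions are illustrative; pick ones matching your Spark version
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"          % "3.3.0"       % Provided,
  "com.holdenkarau"  %% "spark-testing-base" % "3.3.0_1.4.7" % Test
)

// Spark tests are memory-hungry and dislike parallel SparkContexts
Test / fork := true
Test / parallelExecution := false
javaOptions ++= Seq("-Xms512M", "-Xmx2048M")
```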


Spark Testing Base

The library follows a property-based testing philosophy, providing generators that produce fuzzed Spark RDDs, DataFrames and Datasets, alongside other common base classes.
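As a sketch of that philosophy: combining the library's DataFrame generator with ScalaCheck's `forAll` lets a property be checked against many randomly generated DataFrames instead of one hand-picked fixture (the schema and the property below are my own illustrative choices, not from the article):

```scala
import com.holdenkarau.spark.testing.{DataframeGenerator, SharedSparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types._
import org.scalacheck.Prop.forAll
import org.scalatest.funsuite.AnyFunSuite
import org.scalatestplus.scalacheck.Checkers

class FuzzedDataFrameTest extends AnyFunSuite with SharedSparkContext with Checkers {
  test("selecting a column never changes the row count") {
    val sqlContext = new SQLContext(sc)
    val schema = StructType(List(
      StructField("name", StringType),
      StructField("age", IntegerType)))
    // Produces an Arbitrary of random DataFrames conforming to the schema
    val dfGen = DataframeGenerator.arbitraryDataFrame(sqlContext, schema)
    check(forAll(dfGen.arbitrary) { df =>
      df.select("age").count() == df.count()
    })
  }
}
```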

Base classes which come in handy:

  • **SharedSparkContext**: Provides a SparkContext for each test case
  • **RDDComparisons**: Base class providing functions to compare RDDs
  • **RDDGenerator**: Generator for RDD objects
  • **DataFrameGenerator**: Generator for DataFrame objects
  • **DataSetGenerator**: Generator for DataSet objects
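For example, the first two of these can be mixed into an ordinary ScalaTest suite: SharedSparkContext exposes the context as `sc`, and RDDComparisons compares two RDDs without collecting them by hand (the word-count logic is my own illustrative example):

```scala
import com.holdenkarau.spark.testing.{RDDComparisons, SharedSparkContext}
import org.scalatest.funsuite.AnyFunSuite

class WordCountTest extends AnyFunSuite with SharedSparkContext with RDDComparisons {
  test("word count produces the expected pairs") {
    val input    = sc.parallelize(Seq("spark test", "spark"))
    val counts   = input.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    val expected = sc.parallelize(Seq(("spark", 2), ("test", 1)))
    // compareRDD returns None when both RDDs contain the same elements
    assert(compareRDD(expected, counts).isEmpty)
  }
}
```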

#testing #scalatest #spark #scala #big-data-analytics

Spark Testing Base: ScalaTest + ScalaCheck