Spark provides us with a high-level API called the Dataset, which gives us type safety and lets us perform manipulations safely in both distributed and local environments without code changes. In addition, Spark Structured Streaming, the high-level API for stream processing, allows us to stream a Dataset, giving us a type-safe structured stream. In this blog, we will see how we can create type-safe structured streams using Spark.
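
To recall what type safety buys us, here is a minimal, non-streaming sketch. The Car case class and its sample values are hypothetical, and the snippet assumes a SparkSession named spark (like the one we create below) is already in scope:

import org.apache.spark.sql.Dataset

case class Car(name: String, horsepower: Long)   // hypothetical schema, for illustration only

import spark.implicits._                         // brings in the Encoder for Car

// A static, type-safe Dataset: field names and lambdas are checked at compile time
val cars: Dataset[Car] = Seq(Car("Ford Torino", 140L), Car("Chevrolet Impala", 165L)).toDS()
val powerful: Dataset[Car] = cars.filter(_.horsepower > 150)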

To create a type-safe structured stream, we first need to read a Dataset. We will read it from a socket, specifically from the NetCat utility, and paste some JSON data into the NetCat session to feed the streaming Dataset (a sketch of this read follows the session setup below). Let's first create the entry point for our structured Dataset, i.e., the SparkSession.

import org.apache.spark.sql.SparkSession

// Entry point: a local SparkSession running with two threads
val spark = SparkSession.builder()
    .appName("Streaming Datasets")
    .master("local[2]")
    .getOrCreate()
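
With the session in place, here is a hedged sketch of turning the socket lines into a typed stream. The host, the port, and the Car case class are assumptions for illustration, not fixed by this blog; each line pasted into NetCat is expected to be one JSON object matching the case class:

import org.apache.spark.sql.{Dataset, Encoders}
import org.apache.spark.sql.functions.from_json

case class Car(name: String, horsepower: Long)   // same hypothetical payload schema as above

import spark.implicits._

val carSchema = Encoders.product[Car].schema     // derive the JSON schema from the case class

val carsDS: Dataset[Car] = spark.readStream
    .format("socket")                            // read raw text lines from a TCP socket
    .option("host", "localhost")
    .option("port", 12345)                       // assumed port, must match the nc session
    .load()                                      // untyped DataFrame with a single "value" column
    .select(from_json($"value", carSchema).as("car"))  // parse each JSON line
    .select("car.*")
    .as[Car]                                     // back to a compile-time checked Dataset[Car]

carsDS.writeStream
    .format("console")
    .outputMode("append")
    .start()
    .awaitTermination()

To feed the stream, we can start NetCat in another terminal with nc -lk 12345 and paste JSON lines such as {"name": "Ford Torino", "horsepower": 140}; each batch then appears on the console as typed Car records.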
