Writing distributed applications can be a time-consuming process. While running a simple spark.range(0, 10).reduce(_ + _) (the "Hello World" of Spark) on your local machine is easy enough, things get complicated as you come across more complex real-world use cases, especially in the Structured Streaming world, where you want to do streaming aggregations and join streams with other streams or with static datasets.
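To make that concrete, here is a minimal sketch of both the batch "Hello World" and the kind of streaming aggregation with a stream-static join described above. The rate source, the /tmp/users.parquet path, and the derived userId key are illustrative assumptions for local experimentation, not part of the original article:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object StreamingJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("streaming-join-sketch")
      .master("local[*]") // local development only
      .getOrCreate()
    import spark.implicits._

    // The "Hello World" of Spark: sum the numbers 0..9
    val sum = spark.range(0, 10).reduce(_ + _)
    println(s"sum = $sum")

    // A static dimension table; the path is a placeholder for this sketch.
    val users = spark.read.parquet("/tmp/users.parquet")

    // Built-in "rate" test source: emits (timestamp, value) rows locally,
    // so you don't need a real Kafka/Kinesis cluster while developing.
    val events = spark.readStream
      .format("rate")
      .option("rowsPerSecond", 10)
      .load()

    // Derive a join key (hypothetical), join the stream with the static
    // dataset, then run a windowed streaming aggregation.
    val counts = events
      .withColumn("userId", $"value" % 100)
      .join(users, Seq("userId"))
      .groupBy(window($"timestamp", "1 minute"), $"userId")
      .count()

    counts.writeStream
      .outputMode("update")
      .format("console")
      .start()
      .awaitTermination()
  }
}
```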
Processing streaming data from Kinesis, Kafka, or S3 on your local machine as you write code might not be feasible for a number of reasons: 1) you don't have enough compute power available; 2) you have to process data from the "earliest" offset on your message queue, but there is too much data to replay. Even if you rate-limit the read, it could take hours to get through it.
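For reference, this is what such a rate-limited replay from the earliest offset looks like with Spark's Kafka source; startingOffsets and maxOffsetsPerTrigger are real options of that source, while the bootstrap server, topic name, and checkpoint path below are placeholders:

```scala
import org.apache.spark.sql.SparkSession

object KafkaEarliestSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-earliest-sketch")
      .master("local[*]")
      .getOrCreate()

    // "localhost:9092" and "events" stand in for your own cluster and topic.
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .option("startingOffsets", "earliest")   // replay the full topic history
      .option("maxOffsetsPerTrigger", 10000L)  // rate limit: cap records per micro-batch
      .load()

    stream.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("console")
      .option("checkpointLocation", "/tmp/checkpoints/kafka-earliest")
      .start()
      .awaitTermination()
  }
}
```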

#structured-streaming #development #big-data #apache-spark #scala
