Spark behavior on the native file system

We are experimenting with running Spark in our project without Hadoop and without distributed storage such as HDFS. Spark is installed on a single node with 10 cores and 16 GB of RAM, and this node is not part of any cluster. Assume the Spark driver takes 2 cores and the remaining 8 are consumed by executors (2 cores each) at execution time, i.e. four executors.
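For concreteness, here is a rough sketch of how we imagine that resource split being expressed; the standalone master URL `spark://localhost:7077` and the exact config values are our assumptions about the setup, not something already decided:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical session for the setup above: a single-node standalone master,
// 2-core executors, and a cap of 8 cores so four such executors can start
// (the remaining 2 of the node's 10 cores are left for the driver).
val spark = SparkSession.builder()
  .appName("NoHdfsExperiment")
  .master("spark://localhost:7077") // assumed standalone master on this node
  .config("spark.executor.cores", "2")
  .config("spark.cores.max", "8")
  .getOrCreate()
```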

If we process a big CSV file (1 GB in size) stored on the local disk in Spark as an RDD and repartition it into 4 partitions, will the executors process each partition in parallel? What would the executors do if we don't repartition the RDD into 4 partitions? Do we lose the power of distributed computing and parallelism if we don't use HDFS? Roughly, what we are doing is shown in the sketch below.
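Continuing from the session above, this is approximately our code; `file:///data/input.csv` is a placeholder path for the 1 GB file:

```scala
val sc = spark.sparkContext

// file:// forces the local file system (no HDFS involved);
// the path is a placeholder for our 1 GB CSV.
val rdd = sc.textFile("file:///data/input.csv")
println(s"Partitions after read: ${rdd.getNumPartitions}")

// Explicitly repartition to 4, matching the four 2-core executors.
val rdd4 = rdd.repartition(4)

// A simple action so Spark actually schedules the partitions.
println(s"Line count: ${rdd4.count()}")
```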

#hadoop #apache-spark
