In this tutorial, we'll look at four common reasons for the FetchFailed Exception in Apache Spark.
Shuffle operations are the backbone of almost all Spark jobs that aggregate, join, or restructure data. During a shuffle operation (without the support of the External Shuffle Service), data is moved across the nodes of the cluster in a two-step process:
a) Shuffle Write: Shuffle map tasks write the data to be shuffled to a disk file, arranged according to the shuffle reduce tasks that will consume it. The chunk of shuffle data that one shuffle map task writes for one shuffle reduce task is called a shuffle block. Each shuffle map task also informs the driver about the shuffle data it has written.
b) Shuffle Read: Shuffle reduce tasks query the driver for the locations of their shuffle blocks. They then establish connections with the executors hosting those blocks and start fetching them. Once a block is fetched, it is available for further computation in the reduce task.
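The two steps above can be pictured with a small in-memory model. This is only an illustrative sketch in plain Python, not Spark's actual implementation: the names `partition_for`, `shuffle_write`, `shuffle_read`, `block_store`, and `driver_registry` are hypothetical, the dictionaries stand in for disk files and the driver's bookkeeping, and the partitioner is a toy one chosen to keep the example deterministic.

```python
from collections import defaultdict

NUM_REDUCERS = 2  # assumed number of shuffle reduce tasks


def partition_for(key, num_reducers=NUM_REDUCERS):
    """Pick the reduce task for a key (toy deterministic partitioner)."""
    return sum(key.encode("utf-8")) % num_reducers


def shuffle_write(map_id, executor_id, records, block_store, driver_registry):
    """Step a): group one map task's records into one block per reduce task."""
    blocks = defaultdict(list)
    for key, value in records:
        blocks[partition_for(key)].append((key, value))
    for reduce_id, block in blocks.items():
        # Stand-in for the shuffle file written to the executor's disk.
        block_store[executor_id][(map_id, reduce_id)] = block
        # Stand-in for informing the driver about the written shuffle data.
        driver_registry[reduce_id].append((executor_id, map_id))


def shuffle_read(reduce_id, block_store, driver_registry):
    """Step b): ask the driver for block locations, then fetch each block."""
    fetched = []
    for executor_id, map_id in driver_registry[reduce_id]:
        # Stand-in for a network fetch from the executor hosting the block.
        fetched.extend(block_store[executor_id][(map_id, reduce_id)])
    return fetched


block_store = defaultdict(dict)      # executor id -> {(map id, reduce id): block}
driver_registry = defaultdict(list)  # reduce id -> [(executor id, map id)]

# Two map tasks on two executors write their shuffle blocks.
shuffle_write(0, "exec-1", [("a", 1), ("b", 2)], block_store, driver_registry)
shuffle_write(1, "exec-2", [("a", 3), ("c", 4)], block_store, driver_registry)

# Each reduce task fetches its blocks and sums the values per key.
totals = {}
for reduce_id in range(NUM_REDUCERS):
    for key, value in shuffle_read(reduce_id, block_store, driver_registry):
        totals[key] = totals.get(key, 0) + value
```

In this model every reduce task depends on blocks held by every executor that ran a map task, which is why a single unreachable executor can stall or fail the read side of a shuffle.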
Although the two-step shuffle process sounds simple, it is operationally intensive because it involves data sorting, disk writes/reads, and network transfers. The reliability of a shuffle operation is therefore always in question, and the evidence of this unreliability is the commonly encountered ‘FetchFailed Exception’. Most Spark developers spend considerable time troubleshooting this exception: first they try to find its root cause, and then they apply the appropriate fix.
Having troubleshot hundreds of Spark jobs recently, I have found that the FetchFailed Exception mainly occurs for the following reasons:
Let's understand each of these reasons in detail: