4 Common Reasons for FetchFailed Exception in Apache Spark

This article lists the four most common reasons for a FetchFailed exception in Apache Spark.

Shuffle operations are the backbone of almost all Spark jobs that aggregate, join, or restructure data. During a shuffle operation (without the support of the external shuffle service), data is moved across the nodes of the cluster in a two-step process:

a) Shuffle Write: Shuffle map tasks write the data to be shuffled to a disk file, arranged according to the shuffle reduce tasks that will consume it. The chunk of shuffle data that one shuffle map task writes for one shuffle reduce task is called a shuffle block. Each shuffle map task also informs the driver about the shuffle data it has written.

b) Shuffle Read: Shuffle reduce tasks query the driver for the locations of their shuffle blocks. They then establish connections with the executors hosting those blocks and start fetching them. Once a block is fetched, it is available for further computation in the reduce task.
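The two steps above can be sketched in plain Python. This is a toy model under stated assumptions, not Spark's implementation: the names `shuffle_write`, `shuffle_read`, `disk`, and `driver` are hypothetical, a single dictionary stands in for the local disks of all executors, and a `RuntimeError` stands in for the real exception.

```python
from collections import defaultdict


def shuffle_write(map_id, records, num_reducers, disk, driver):
    """Map side: group records into one block per reduce task and register them."""
    blocks = defaultdict(list)
    for key, value in records:
        reduce_id = hash(key) % num_reducers  # simple hash partitioning
        blocks[reduce_id].append((key, value))
    for reduce_id, block in blocks.items():
        block_id = (map_id, reduce_id)
        disk[block_id] = block              # stand-in for the shuffle file on disk
        driver[reduce_id].append(block_id)  # inform the driver about the written block


def shuffle_read(reduce_id, disk, driver):
    """Reduce side: ask the driver for block locations, then fetch each block."""
    fetched = []
    for block_id in driver[reduce_id]:
        if block_id not in disk:
            # Stand-in for a failed fetch; not Spark's actual exception class.
            raise RuntimeError(f"FetchFailed: block {block_id} is unavailable")
        fetched.extend(disk[block_id])
    return fetched


disk, driver = {}, defaultdict(list)
shuffle_write(0, [("a", 1), ("b", 2)], 2, disk, driver)
shuffle_write(1, [("a", 3), ("c", 4)], 2, disk, driver)
result = sorted(shuffle_read(0, disk, driver) + shuffle_read(1, disk, driver))
print(result)
```

In this model a fetch fails as soon as a registered block can no longer be served, for example because the executor that wrote it has died. That is the situation the rest of the article is concerned with.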

Although the two-step shuffle process sounds simple, it is operationally intensive because it involves data sorting, disk writes and reads, and network transfers. The reliability of a shuffle operation is therefore always in question, and the evidence of this unreliability is the commonly encountered ‘FetchFailed Exception’. Most Spark developers spend considerable time troubleshooting this exception: first they find its root cause, and then they apply the appropriate fix.

Having troubleshot hundreds of Spark jobs in recent times, I have realized that the FetchFailed exception mainly occurs for the following reasons:

  1. Out of Heap memory on Executors
  2. Low Memory Overhead on Executors
  3. Shuffle block greater than 2 GB
  4. Network timeout
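As a rough orientation, each of these reasons can be associated with standard Spark configuration properties. The pairing below is an assumption made for illustration rather than something stated in this article, the values are placeholders and not recommendations, and the helper `to_spark_submit_args` is hypothetical.

```python
# Illustrative values only; the right numbers depend on the cluster and workload.
FETCH_FAILED_KNOBS = {
    "Out of heap memory on executors": {"spark.executor.memory": "8g"},
    "Low memory overhead on executors": {"spark.executor.memoryOverhead": "2g"},
    # More shuffle partitions generally mean smaller shuffle blocks.
    "Shuffle block greater than 2 GB": {"spark.sql.shuffle.partitions": "800"},
    "Network timeout": {
        "spark.network.timeout": "300s",
        "spark.shuffle.io.maxRetries": "6",
    },
}


def to_spark_submit_args(knobs):
    """Flatten the per-reason settings into `--conf key=value` arguments."""
    args = []
    for settings in knobs.values():
        for key, value in settings.items():
            args += ["--conf", f"{key}={value}"]
    return args


print(" ".join(to_spark_submit_args(FETCH_FAILED_KNOBS)))
```

The printed string can be appended to a `spark-submit` command line. Which property actually helps depends on the root cause, so it is worth confirming the cause in the executor logs before changing anything.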

Let's understand each of these reasons in detail:
