Understanding persistence in Apache Spark

In this blog, we will try to understand the concept of persistence in Apache Spark in layman's terms, using scenario-based examples.

Note: The scenarios are simplified and are only meant to make the concept easy to understand.

Spark Architecture

Note: Cache memory can be shared between Executors.

What does it mean by persisting/caching an RDD?

Spark RDD persistence is an optimization technique that saves the result of an RDD evaluation in cache memory. Saving this intermediate result lets later operations reuse it instead of recomputing it, which reduces computation overhead.

When we persist an RDD, each node stores in memory the partitions of it that it computes, and reuses them in other actions on that RDD (or on RDDs derived from it). This allows future actions to be much faster (often by more than 10x). Caching is a key tool for iterative algorithms and fast interactive use.

You can mark an RDD to be persisted using the persist() or cache() methods on it. Marking alone does not compute anything: the first time the RDD is computed in an action, it is kept in cache memory on the nodes. Spark’s cache is fault-tolerant: if any partition of an RDD is lost, it will automatically be recomputed using the transformations that originally created it.
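To make the "mark now, cache on first action, recompute on loss" behaviour concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's API: the class ToyPersistedRDD and its methods lose_partition() and collect() are hypothetical names invented for this illustration, and the data is made up. Real Spark does expose persist() and cache() on RDDs, but its internals are far more involved than this.

```python
# Toy model of a persisted dataset (NOT Spark's API; all names are hypothetical).
# Each partition is cached the first time it is computed and is rebuilt from
# its lineage (the transform function) if the cached copy is lost.

class ToyPersistedRDD:
    def __init__(self, source_partitions, transform):
        self.source_partitions = source_partitions  # stands in for the input data
        self.transform = transform                  # lineage: how to rebuild a row
        self.persisted = False                      # set by persist(); computes nothing
        self.cache = {}                             # partition index -> computed rows
        self.compute_count = 0                      # number of partition computations

    def persist(self):
        # Only marks the dataset; the cache is filled by the first action.
        self.persisted = True
        return self

    def _partition(self, index):
        if self.persisted and index in self.cache:
            return self.cache[index]                # reuse the cached partition
        self.compute_count += 1
        rows = [self.transform(x) for x in self.source_partitions[index]]
        if self.persisted:
            self.cache[index] = rows
        return rows

    def collect(self):
        # An "action": needs every partition, cached or freshly computed.
        result = []
        for index in range(len(self.source_partitions)):
            result.extend(self._partition(index))
        return result

    def lose_partition(self, index):
        # Simulates a failure that drops one cached partition.
        self.cache.pop(index, None)


rdd = ToyPersistedRDD([[1, 2], [3, 4], [5, 6]], lambda x: x * 10).persist()
print(rdd.compute_count)   # 0: persist() alone computes nothing

first = rdd.collect()      # first action computes and caches all 3 partitions
print(rdd.compute_count)   # 3

second = rdd.collect()     # served entirely from the cache
print(rdd.compute_count)   # 3

rdd.lose_partition(1)
third = rdd.collect()      # only the lost partition is recomputed
print(rdd.compute_count)   # 4
```

The counter is the point of the sketch: repeated actions cost nothing extra once the partitions are cached, and losing one partition costs exactly one recomputation rather than a full rerun.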

Let’s say I have this chain of transformations, where each arrow points from an RDD to the one it is derived from:

RDD3 => RDD2 => RDD1 => Text File

RDD4 => RDD3
RDD5 => RDD3
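In the lineage above, RDD4 and RDD5 both depend on RDD3. Without persistence, an action on RDD4 and a later action on RDD5 each walk the lineage all the way back to the text file, so RDD1, RDD2 and RDD3 are computed twice. The following plain-Python sketch counts how often the source is read in both cases. It is a toy model under stated assumptions, not Spark code: ToyRDD, read_text_file, build and run_scenario are hypothetical names, and the file content and transformations are invented for the example.

```python
# Toy model of lazy lineage (NOT Spark's API; all names are hypothetical).
# Every action walks the lineage back to the source unless an intermediate
# dataset has been persisted.

reads = {"text_file": 0}  # counts how often the "text file" is read


class ToyRDD:
    def __init__(self, compute):
        self._compute = compute   # function that produces this dataset's rows
        self._persisted = False
        self._cached = None

    def map(self, fn):
        # Lazy: the child pulls rows from this dataset only when an action runs.
        return ToyRDD(lambda: [fn(row) for row in self._rows()])

    def persist(self):
        self._persisted = True
        return self

    def _rows(self):
        if self._cached is not None:
            return self._cached
        rows = self._compute()
        if self._persisted:
            self._cached = rows
        return rows

    def collect(self):
        # An "action": triggers the actual computation.
        return list(self._rows())


def read_text_file():
    reads["text_file"] += 1
    return ["a b", "c d e"]  # made-up file content


def build(persist_rdd3):
    rdd1 = ToyRDD(read_text_file)          # RDD1 <= Text File
    rdd2 = rdd1.map(str.split)             # RDD2 <= RDD1
    rdd3 = rdd2.map(len)                   # RDD3 <= RDD2
    if persist_rdd3:
        rdd3.persist()
    rdd4 = rdd3.map(lambda n: n + 1)       # RDD4 <= RDD3
    rdd5 = rdd3.map(lambda n: n * 2)       # RDD5 <= RDD3
    return rdd4, rdd5


def run_scenario(persist_rdd3):
    reads["text_file"] = 0
    rdd4, rdd5 = build(persist_rdd3)
    out4 = rdd4.collect()
    out5 = rdd5.collect()
    return out4, out5, reads["text_file"]


print(run_scenario(persist_rdd3=False))  # ([3, 4], [4, 6], 2): file read twice
print(run_scenario(persist_rdd3=True))   # ([3, 4], [4, 6], 1): file read once
```

The results are identical in both runs; only the amount of repeated work differs, which is why persisting the shared parent (RDD3 here) is the natural choice when several RDDs branch off it.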
