Scaling Iterative Algorithms in Spark

Scaling Iterative Algorithms in Spark

Iterative algorithms are widely implemented in machine learning, connected components, page rank, etc. These algorithms increase in complexity with iterations, size of data at each iteration and making it fault-tolerant at every iteration is tricky. In this article, I would elaborate few of considerations in spark to work with these challenges.

Iterative algorithms are widely implemented in machine learning, connected components, page rank, etc. These algorithms increase in complexity with iterations, size of data at each iteration and making it fault-tolerant at every iteration is tricky. In this article, I would elaborate few of considerations in spark to work with these challenges. We have used Spark in implementing few iterative algorithms like building connected components, traversing large connected components etc. Below are from my experience while working at Walmart labs building connected components for 60 Billion nodes of customer identities.

Image for post

Number of iterations is never predetermined, always have terminate condition 😀

Types of Iterative algorithms

*Converging data: *Here we see the amount of data decreases with every iteration, i.e. we start 1st iteration with huge datasets and the data size decreases with an increase in iterations. The main challenge would be handling the huge datasets in the first few iterations and once a dataset has reduced significantly it would be easy to handle further iterations until end condition.

*Diverging data: *Amount of data increases at every iteration and sometimes it could explode faster and make it impossible to proceed further. We need to have constraints like limitations in the number of iterations, starting data size, size of compute power etc. to make these algorithms work.

machine-learning algorithms big-data spark

Bootstrap 5 Complete Course with Examples

Bootstrap 5 Tutorial - Bootstrap 5 Crash Course for Beginners

Nest.JS Tutorial for Beginners

Hello Vue 3: A First Look at Vue 3 and the Composition API

Building a simple Applications with Vue 3

Deno Crash Course: Explore Deno and Create a full REST API with Deno

How to Build a Real-time Chat App with Deno and WebSockets

Convert HTML to Markdown Online

HTML entity encoder decoder Online

Top Microsoft big data solutions Companies | Best Microsoft big data Developers

An extensively researched list of top microsoft big data analytics and solution with ratings & reviews to help find the best Microsoft big data solutions development companies around the world.

Silly mistakes that can cost ‘Big’ in Big Data Analytics

‘Data is the new science. Big Data holds the key answers’ - Pat Gelsinger The biggest advantage that the enhancement of modern technology has brought

Big Data can be The ‘Big’ boon for The Modern Age Businesses

We need no rocket science in understanding that every business, irrespective of their size in the modern-day business world, needs data insights for its expansion. Big data analytics is essential when it comes to understanding the needs and wants of a significant section of the audience.

Role of Big Data in Healthcare - DZone Big Data

In this article, see the role of big data in healthcare and look at the new healthcare dynamics. Big Data is creating a revolution in healthcare, providing better outcomes while eliminating fraud and abuse, which contributes to a large percentage of healthcare costs.

How you’re losing money by not opting for Big Data Services?

Big Data Analytics is the next big thing in business, and it is a reality that is slowly dawning amongst companies. With this article, we have tried to show you the importance of Big Data in business and urge you to take advantage of this immense...