Data Algorithms: Hadoop and Spark Recipes

Data Algorithms Book

  • Author: Mahmoud Parsian
  • Title: Data Algorithms: Recipes for Scaling up with Hadoop and Spark
  • This GitHub repository hosts all source code and scripts for the Data Algorithms book.
  • Publisher: O'Reilly Media
  • Published date: July 2015

Git Repository

The book's codebase can also be downloaded from the git repository at:

git clone

2nd Edition! Coming Out @ the End of 2021

Upgraded to Spark-3.1.2

Production Version is Available NOW!

Java 8's LAMBDA Expressions to Spark...

Scala Spark Solutions

How To Build using Apache's Ant

How To Build using Apache's Maven

Machine Learning Algorithms using Spark

Spark for Cancer Outlier Profile Analysis

Webinars and Presentations on Data Algorithms

Introduction to MapReduce

Bonus Chapters

Author Book Signing

How To Run Spark/Hadoop Programs

Submit a Spark Job from Java Code

How To Run Python Programs

To run Python programs, pass the script to spark-submit, followed by the arguments the program expects.
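As a rough illustration of that calling convention, the sketch below assembles a spark-submit command line from Python. The helper names (build_spark_submit_command, run_spark_submit), the script name word_count.py, and its arguments are hypothetical and do not come from the book's repository. The only spark-submit behavior assumed is the usual ordering: options such as --master first, then the script, then the script's own arguments.

```python
import shlex
import subprocess


def build_spark_submit_command(script, script_args, master=None):
    """Return the argv list for running `script` through spark-submit."""
    command = ["spark-submit"]
    if master is not None:
        # --master is a standard spark-submit option, e.g. "local[*]" or "yarn".
        command += ["--master", master]
    # Options come first, then the script, then the script's own arguments.
    command.append(script)
    command += [str(arg) for arg in script_args]
    return command


def run_spark_submit(script, script_args, master=None):
    """Launch the job; raises CalledProcessError if spark-submit exits non-zero."""
    command = build_spark_submit_command(script, script_args, master)
    return subprocess.run(command, check=True)


# Hypothetical script and paths, used only to show the resulting command line.
example = build_spark_submit_command(
    "word_count.py", ["input.txt", "output_dir"], master="local[*]"
)
print(shlex.join(example))
```

Building the command as a list and only joining it for display avoids shell-quoting problems when an argument contains spaces or characters such as brackets. Calling run_spark_submit requires spark-submit to be on the PATH.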

My favorite quotes...


  • View Mahmoud Parsian's profile on LinkedIn
  • Please send me an email:
  • Twitter: @mahmoudparsian

Thank you!

Best regards,
Mahmoud Parsian

Download Details:

Author: mahmoudparsian
