As has become tradition on KDnuggets, let’s start a new week with a new eBook. This time we check out a survey-style text covering a variety of topics, Foundations of Data Science.

We’re back with a new free eBook this week. This time we will be covering a text with a name that speaks for itself, Foundations of Data Science, written by Avrim Blum, John Hopcroft, and Ravindran Kannan. A book with such a name is making a pretty big statement. Luckily, its content backs it up.

First off, it should be noted that this book is not structured like a typical data science book. In my view, neither its chapters nor their progression fit the mold of a standard contemporary data science text. You can see from the table of contents below that the text surveys a wide array of disparate topics, rather than simply equating data science with machine learning and progressing accordingly:

  1. Introduction
  2. High-Dimensional Space
  3. Best-Fit Subspaces and Singular Value Decomposition (SVD)
  4. Random Walks and Markov Chains
  5. Machine Learning
  6. Algorithms for Massive Data Problems: Streaming, Sketching, and Sampling
  7. Clustering
  8. Random Graphs
  9. Topic Models, Nonnegative Matrix Factorization, Hidden Markov Models, and Graphical Models
  10. Other Topics
  11. Wavelets
  12. Appendix

The varied high-level topics, and the early inclusion of chapters on high-dimensional space, subspaces, and random walks and Markov chains, reinforce this survey style. The book also reminds me of another classic in data science with which you may be familiar, Mining of Massive Datasets. Because this text focuses on “foundations,” you won’t find the latest neural network architectures covered herein. However, if you want to eventually understand the whys and hows of some of these more complex approaches to data science problem solving, you should find Foundations of Data Science useful.

Matrix factorization, graph theory, kernel methods, clustering theory, streaming, gradient descent, data sampling: these are all concepts that will serve you well later when it comes to solving data science problems, and they are all essential building blocks for implementing more complex approaches as well. You won’t be able to understand neural networks without gradient descent. You can’t analyze social media networks without graph theory. The models you build won’t be of value if you can’t understand when and why you would sample from data.
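
To make one of those building blocks concrete, here is a minimal Python sketch of gradient descent. It is my own illustration and is not taken from the book. The function name `gradient_descent`, its parameters, and the toy objective f(x) = (x - 3)^2 are all assumptions chosen for the example.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0.

    grad: function returning the gradient of the objective at x
    (hypothetical helper signature, chosen for this illustration).
    """
    x = x0
    for _ in range(steps):
        # Move a small distance in the direction that decreases the objective.
        x = x - learning_rate * grad(x)
    return x


# Toy objective (an assumption for this sketch): f(x) = (x - 3)^2,
# whose gradient is 2 * (x - 3) and whose minimum sits at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(minimum)
```

With these settings, each step shrinks the distance to the minimum by a constant factor, so the result lands very close to 3. Training a neural network applies the same update rule, just over many parameters at once.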

Similar to some other books we have recently profiled (such as The Elements of Statistical Learning and Understanding Machine Learning), this book is unabashedly theoretical. There is no code. There are no Python libraries being leaned on. There is no hand-waviness. There are only thorough explanations leading to understanding of these varied topics, should you spend the necessary time reading.

Foundations of Data Science: The Free eBook - KDnuggets