I think you will not argue with me when I say that data science is becoming one of the most popular fields to work in, especially since Harvard Business Review named “data scientist” the sexiest job of the 21st century. As a field, we have come a long way since the days when terms like data science and machine learning were still unknown and everything was gathered under the umbrella of statistics. However, we are far from the end of the journey.

This can also be a divisive aspect of data science: the field is developing so rapidly that it can be difficult to keep up with all the new algorithms, techniques, and approaches. So working in data science, much like software engineering, often requires constant learning and development. Don’t get me wrong, some people (myself included) like that a lot. Others prefer to learn intensively for a few years and then live off that knowledge. Both approaches are perfectly fine; it is a matter of personal preference.

As I mentioned, working in data science can be a journey. That is why in this article I want to share my 10 favorite online data science resources, which I frequently use to learn and to keep up with current developments. This list focuses on online resources (blogs, videos, podcasts) and does not cover MOOCs or books, as there is more than enough content there for a separate article. Let’s start!

1. Towards Data Science

This should come as no surprise, given that you are reading this article on Towards Data Science. TDS is Medium’s biggest publication, covering all data science-related topics. Here is what you can find there:

  • beginner-friendly tutorials with code (in the most popular languages, such as Python, R, Julia, SQL, and more),
  • in-depth descriptions of particular ML algorithms or techniques,
  • summaries of influential papers,
  • descriptions of personal pet projects,
  • the latest news from the field,
  • and more!

TDS has built a really nice community in which everyone is encouraged to share and participate. I also highly recommend subscribing to the newsletter and following TDS on Twitter to keep up with the latest and most popular articles.

Lastly, I can also recommend the Towards Data Science podcast, which can be especially helpful for people wondering how to break into data science and find their perfect role.

2. PyData (conference + videos)

PyData is the educational program of NumFOCUS, a nonprofit charity promoting open practices in research, data, and scientific computing. It organizes conferences all over the world that encourage researchers and practitioners to share insights from their work. The talks offer a mix of general Python best practices, real-life case studies from data scientists (for example, how they model churn or which tools they use to generate uplift in their marketing campaigns), and introductions to new libraries.

Speaking from experience, it is a lot of fun to attend a conference in person, as you can actively take part in the presentations, ask questions, and network with people who share your interests. However, since this is not always possible, and there are simply too many conferences to attend them all, you can find the recordings on the PyData YouTube channel. Typically, the recordings are published a few months after each conference.

The PyData talks are a great source of inspiration: you can see how other companies approached a particular problem and perhaps apply a similar method at your own company.

3. Machine Learning Mastery

Jason Brownlee’s website/blog is a gold mine of content for data scientists, especially more junior ones. You can find a plethora of tutorials, ranging from classic statistical modeling approaches (linear regression, ARIMA) to the latest and greatest machine learning and deep learning solutions. The articles are always very hands-on and contain Python code applying the particular concept to a toy dataset. What is really great about the website is that Jason explains the concepts clearly and also points to further reading for those who want to dive deeper into the theoretical background. You can also filter the articles by topic, in case you are only interested in imbalanced learning or in how to code your first LSTM network.

4. Distill

Distill aims to provide clear and intuitive explanations of machine learning concepts. Its editors argue that research papers are often restricted to PDF files, which cannot always show the full picture. And at a time when ML has an ever-growing impact, it is crucial to have a good understanding of how the tools we use actually work.

Distill uses impressive, interactive visualizations to clearly explain what is actually happening behind the scenes of machine learning algorithms. One of my favorite articles there described t-SNE (t-distributed stochastic neighbor embedding) and showed how the generated plots, while visually pleasing, can be misleading. It also highlighted the importance of the hyperparameters by providing an interactive tool for seeing their impact first-hand.
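If you want to reproduce the hyperparameter effect on your own machine, here is a minimal sketch, assuming scikit-learn is installed. It embeds the same synthetic dataset (three Gaussian blobs from `make_blobs`, chosen purely for illustration and not taken from the Distill article) at several perplexity values, one of the main t-SNE hyperparameters. Plotting each result side by side usually shows how much the apparent cluster shapes and distances shift with this single setting.

```python
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

# Synthetic toy data (an assumption for illustration):
# 150 points in 10 dimensions, drawn from 3 Gaussian clusters.
X, y = make_blobs(n_samples=150, centers=3, n_features=10, random_state=42)

# Embed the same data at several perplexity values.
# Perplexity must be smaller than the number of samples.
embeddings = {}
for perplexity in (2, 30, 50):
    tsne = TSNE(
        n_components=2,         # 2-D output for plotting
        perplexity=perplexity,  # roughly, the effective number of neighbors per point
        init="random",          # explicit init, so results do not depend on the library default
        random_state=0,         # fixed seed, so only perplexity differs between runs
    )
    embeddings[perplexity] = tsne.fit_transform(X)
```

To compare the runs, you could scatter-plot each `embeddings[p]` colored by `y` (for example, with matplotlib). Keep in mind that cluster sizes and the gaps between clusters in a t-SNE plot are not reliable measures of the true distances in the original space.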

If you need any extra assurance about the quality of the content, consider that Distill’s steering committee included names such as Yoshua Bengio, Ian Goodfellow, Michael Nielsen, and Andrej Karpathy.


My 10 favorite resources for learning data science online