Lessons from a Data Scientist

Python has become the dominant language for data science, and most new data scientists and programmers still learn it as their first language. This is for good reason: Python has a gentle learning curve, a strong community, and a rich ecosystem of data science libraries.

I started my data science journey with Python, and it remains my tool of choice for solving data science problems. I wanted a better understanding of what Python abstracts away, and of the costs and benefits of writing faster code in a more performant language such as C++.

To get a representative introduction to C++, I needed an application for which C++ would be a suitable choice. Implementing a categorical decision tree classifier from scratch seemed an appropriate challenge. It turned out to be a testing but rewarding journey, and I would like to share some of my main learnings along the way.

Key learnings:

  1. C++ provides little guidance or protection
  2. Make good architectural decisions early
  3. Writing tests will save you in the long run
  4. The online community of a language is worth a lot
  5. Portability is an important consideration

#python #data-science #cplusplus
