TL;DR: This article compares XGBoost and PyTorch, two of the most popular ML libraries, and discusses which one is better. You can find out what the community thinks in a live survey.


Disclaimer: All opinions are mine; they don’t represent my employer’s. I use “algorithm”, “model”, and “library” interchangeably.


A Choice

During a virtual coffee (the new way of staying in touch during COVID), my youngest cousin Shawn, who’s completing his Master’s in Data Science, asked: “Between XGBoost and PyTorch, which one should I learn?”

In fact, I come across the same question frequently, even when working with clients who have many years of industry experience.

On one hand, XGBoost has helped win many Kaggle competitions. On the other hand, PyTorch has been recognized as the go-to library by leading tech and research firms. The two libraries are on par in build quality and active community support.

The answer is obviously “learn both” if you have all the time, resources, and mental energy in the world. Most of us don’t have that luxury. In most cases, I’d recommend starting with XGBoost, then moving on to PyTorch.

Let’s look at it from three simple angles: the supply, the demand, and your situation and aspiration.

1. The Supply: What is the library good at?

By design, XGBoost and PyTorch are effective at solving different types of ML use cases.

XGBoost is very effective for **“traditional” ML use cases** using structured data (e.g. classification or regression with good old tabular data). PyTorch is built specifically for “innovative” use cases with unstructured data (e.g. generative models using images, or Natural Language Processing with text) that require a neural network architecture. The tutorial page of each library reflects this difference and the library’s intended “positioning”.

XGBoost Tutorial on Awesome XGBoost, Captured on May 22, 2020

Official PyTorch Tutorial Page, Captured on May 24, 2020

Given the different strengths of XGBoost and PyTorch, we must pick and choose according to the actual ML use case.

In the context of learning, I believe a new skill should provide us with maximum immediate applicability (can I use it immediately and frequently?), which usually leads to more career options.


A Battle of XGBoost & PyTorch