There’s power in numbers, even for machine learning algorithms. It has been shown repeatedly that the best way to arrive at an estimate is to ask lots of people from diverse backgrounds.
One of the most fascinating historical examples of the power of crowds appears in James Surowiecki’s “The Wisdom of Crowds.” A team of engineers, oceanographers, salvage crew members, and mathematicians was asked to give its best estimate of where a particular sunken submarine, the Scorpion, could be found (the Navy did not have the manpower to search the entire area and wanted a more specific guess). Individually, these guessers were wildly inaccurate, but the group’s averaged guess was only 220 yards from the actual position of the sunken submarine!
The more diverse the group, the better the combined estimate tends to be. How can we apply this sociological concept to machine learning?
Ensemble models are simply collections of models whose predictions are combined, often by averaging, to provide a “crowd’s guess.” Just as humans have biases, models carry inherent assumptions and biases of their own. Averaging across a few models tends to decrease error, especially when the models make different kinds of mistakes.
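To make the averaging idea concrete, here is a minimal sketch on synthetic data. It is not the fraud model from this article: the five "models," their biases, and their noise levels are all made-up assumptions, chosen only to show that the averaged prediction has a lower mean squared error than the typical individual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 5 "models" predict the same 1,000 targets.
# Each model has its own systematic bias plus independent noise.
n_models, n_samples = 5, 1000
y_true = rng.normal(size=n_samples)
biases = rng.normal(scale=0.5, size=(n_models, 1))
noise = rng.normal(scale=1.0, size=(n_models, n_samples))
predictions = y_true + biases + noise  # shape: (n_models, n_samples)


def mse(pred, truth):
    """Mean squared error between predictions and the true values."""
    return float(np.mean((pred - truth) ** 2))


# Error of each model on its own
individual_mse = [mse(p, y_true) for p in predictions]

# The "crowd's guess": average the models' predictions sample by sample
ensemble_pred = predictions.mean(axis=0)
ensemble_mse = mse(ensemble_pred, y_true)

print("individual MSEs:", [round(m, 3) for m in individual_mse])
print("mean individual MSE:", round(float(np.mean(individual_mse)), 3))
print("ensemble MSE:", round(ensemble_mse, 3))
```

The averaged prediction can never have a higher MSE than the average of the individual MSEs. How much lower it gets depends on how independent the models' errors are. If every model makes the same mistakes, averaging gains nothing.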
For this example, I used some credit card fraud data from Kaggle. The first order of business was to pick an evaluation metric. There was a huge class imbalance (99.8% of the data was marked as normal transactions; the other 0.2% were fraudulent), so accuracy was out of the picture: a model that labels every transaction as normal would score 99.8% accuracy without catching a single fraud. For this type of problem, it’s better to choose precision, recall, or the F1 score. For simplicity, I chose precision.
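The problem with accuracy on imbalanced data can be sketched with a few lines of NumPy. The labels below are synthetic stand-ins with the same 0.2% fraud rate, not the Kaggle data. Both "classifiers" and the helper `precision_recall_accuracy` are hypothetical, written only for this illustration.

```python
import numpy as np


def precision_recall_accuracy(y_true, y_pred):
    """Return (precision, recall, accuracy) for binary labels (1 = fraud)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_true & y_pred))    # frauds correctly flagged
    fp = int(np.sum(~y_true & y_pred))   # normal transactions flagged as fraud
    fn = int(np.sum(y_true & ~y_pred))   # frauds that were missed
    accuracy = float(np.mean(y_true == y_pred))
    # Convention assumed here: report 0.0 when a ratio is undefined (0/0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, accuracy


# Synthetic labels: 10,000 transactions, 20 of them fraudulent (0.2%)
y_true = np.zeros(10_000, dtype=int)
y_true[:20] = 1

# Hypothetical classifier A: calls every transaction normal
always_normal = np.zeros_like(y_true)

# Hypothetical classifier B: catches 15 of the 20 frauds, with 5 false alarms
some_model = np.zeros_like(y_true)
some_model[:15] = 1
some_model[20:25] = 1

print("always normal:", precision_recall_accuracy(y_true, always_normal))
print("some model:   ", precision_recall_accuracy(y_true, some_model))
```

Classifier A reaches 99.8% accuracy while catching no fraud at all, and classifier B's accuracy is only marginally higher at 99.9%. Precision and recall (0.75 each for B, 0 for A) are what separate the two.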