Machine Learning

Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to progressively improve their performance on a specific task. Machine learning algorithms build a mathematical model of sample data...


BYOL: Bring Your Own Loss

How we improve delivery time estimation with a custom loss function. Dear connoisseurs, I invite you to take a look inside Careem’s food delivery platform. Specifically, we are going to look at how we use machine learning to improve the customer experience for delivery time tracking.

Scaling Iterative Algorithms in Spark

Iterative algorithms are widely used in machine learning, connected components, PageRank, and similar workloads. Their complexity grows with the number of iterations and the size of the data at each iteration, and making them fault-tolerant at every iteration is tricky. In this article, I elaborate on a few considerations in Spark for working with these challenges.

Why Is Spark Quicker than MapReduce?

MapReduce: a technique for processing large data sets that consists of a Map step followed by a Reduce step. The technique can be used even when the framework is not Hadoop. The MapReduce logic for coordinating all cluster nodes could be written by hand in Java, but Hadoop already provides an API that handles the cluster nodes.

Urban Sound Classification with Librosa — tricky cross-validation

Featuring the Leave One Group Out strategy using scikit-learn. I’ll show an example of implementing the results of an interesting research paper on classifying audio clips based on their sonic content. This will include applications of the librosa library, which is a Python package for music and audio analysis. The clips are short recordings of city sounds, and the classification task is to predict the appropriate category label.
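The Leave One Group Out idea mentioned above can be sketched in a few lines of plain Python. This is only an illustration, not the article's code: the `groups` list and the function name `leave_one_group_out` are hypothetical, and in practice scikit-learn's `LeaveOneGroupOut` splitter serves the same purpose. Each group (for example, a predefined fold of clips) is held out once while the model trains on all the others, so samples from the same group never appear on both sides of a split.

```python
def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) for each distinct group."""
    for held_out in sorted(set(groups)):
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical group labels: six samples belonging to three groups.
groups = [1, 1, 2, 2, 3, 3]
splits = list(leave_one_group_out(groups))

for held_out, train_idx, test_idx in splits:
    print(f"held-out group {held_out}: train={train_idx} test={test_idx}")
```

The number of splits equals the number of distinct groups, which is why this strategy suits datasets that ship with predefined folds.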

How To Make Your Python Code Run Faster — 1st Installment

Optimize the utilization of your system hardware. In the last tutorial, we introduced you to line_profiler, a package that can help you time-profile your code. It is now time to take a step forward. In this tutorial, we will learn about implementing multi-threading and multi-processing approaches using Python. These approaches guide the operating system to make optimum use of your system hardware and hence make code execution more efficient.

GitHub is the best AutoML you will ever need

A step-by-step tutorial to build AutoML using PyCaret 2.0. You may be wondering since when GitHub has been in the business of Automated Machine Learning. Well, it isn’t, but you can use it like one. In this tutorial, we will show you how to build your personalized Automated Machine Learning software and host it on GitHub so that others can use it for free or for a paid subscription, if you wish.

NLP 101 — Data Preprocessing & Representation Using NLTK.

An insight into how vital a role data pre-processing and representation play in Natural Language Processing and how to go about it. NLP, or Natural Language Processing, primarily deals with how machines understand and perceive textual data in human-readable languages and convert it into formats that they can perform computations on. Modern companies often work with huge amounts of data.

Did Liverpool Deserve to Win the Premier League?

Premier League 2019/20 Review Using Python, R, and Expected Goals. I returned, and saw under the sun, that the race is not to the swift, nor the battle to the strong, neither yet bread to the wise, nor yet riches to men of understanding, nor yet favour to men of skill; but time and chance happeneth to them all.

All vectors are UNequal, but some are more UNequal than the others…

Searching for meaning in Trump’s tweets. Part II. Continuing on from where we left off in Part I and diving into vectorised text. Discussing the problems of using text vector representations and setting the scene for the final part, where we are going to try and fix them. In Part I, we set up the problem and went through some basic exploratory analysis of the dataset.

Closed-form and Gradient Descent Regression Explained with Python

Regression is a kind of supervised learning algorithm within machine learning. It is an approach to modelling the relationship between the dependent variable (or target, response), y, and explanatory variables (or inputs, predictors), X. Its objective is to predict a quantity of the target variable, for example the stock price. This differs from a classification problem, where we want to predict the label of the target, for example the direction of the stock (up or down).
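To make the two approaches named in the title concrete, here is a minimal sketch for a single predictor, assuming the model y = slope * x + intercept and a mean squared error objective. It is not the article's implementation: the function names, the toy data, and the learning rate and step count are all illustrative assumptions. The closed-form fit uses the standard least-squares formulas, while gradient descent reaches approximately the same parameters by repeatedly stepping against the gradient of the error.

```python
def fit_closed_form(xs, ys):
    """Least-squares slope and intercept from the standard closed-form formulas."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Minimize mean squared error with plain batch gradient descent."""
    n = len(xs)
    slope, intercept = 0.0, 0.0
    for _ in range(steps):
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        grad_slope = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2.0 / n * sum(errors)
        slope -= lr * grad_slope
        intercept -= lr * grad_intercept
    return slope, intercept


# Toy data generated from y = 2x + 1 (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

closed = fit_closed_form(xs, ys)
descent = fit_gradient_descent(xs, ys)
print("closed form:     ", closed)
print("gradient descent:", descent)
```

The closed-form route is exact in one pass but requires solving the normal equations, which becomes expensive with many predictors; gradient descent trades exactness for an iterative procedure whose step size has to be chosen with care.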

Linear Regression in Python: Sklearn vs Excel

Is Microsoft Excel a good alternative for a quick and approximate linear regression prediction business case? I think yes, but let’s do a… Around 13 years ago, Scikit-learn development started as a Google Summer of Code project by David Cournapeau. As time passed, Scikit-learn became one of the most famous machine learning libraries in Python. It offers several classification, regression, and clustering algorithms, and its key strength, in my opinion, is seamless integration with NumPy, pandas, and SciPy.

Confusion Matrix and Object Detection

A metric to evaluate performance of instance segmentation models. After training a machine learning classifier, the next step is to evaluate its performance using relevant metric(s). The confusion matrix is one of the evaluation metrics. A confusion matrix is a table showing the performance of a classifier given some ground-truth values or instances (that is, in a supervised learning setting).
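As a small illustration of the table described above, the following sketch builds a confusion matrix for a plain multi-class classifier. It does not cover the object detection case the article addresses (which also has to match predicted and ground-truth boxes); the labels, data, and the function name `confusion_matrix` are hypothetical. Rows correspond to true labels and columns to predicted labels.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Return a matrix where entry [i][j] counts samples of true class i predicted as class j."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Hypothetical ground truth and predictions.
y_true = ["cat", "cat", "dog", "dog", "dog", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "cat", "bird"]
labels = ["bird", "cat", "dog"]

matrix = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>5}: {row}")

# Correct predictions sit on the diagonal, so accuracy is the trace over the total.
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(y_true)
print("accuracy:", accuracy)
```

Off-diagonal entries show which classes are confused with each other, information that a single accuracy number hides.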

Apache Spark MLlib & Ease-of Prototyping With Docker

The core operational capabilities and how to launch a cluster instantly using just one command. Apache Spark is a highly developed library that you can use for many of your Machine Learning applications. It gives users the ease of developing ML-based algorithms in a data scientist’s favorite scientific prototyping environment, Jupyter Notebooks.

Machine Learning - Visualized

A visual approach to understanding machine learning. In this article, we will learn how this happens. Dataset: to visualize the dataset, let's make our synthetic dataset where each data point (input x) is ...

How To Perform Feature Selection for Regression Problems

In this article I explain what feature selection is and how to perform it before training a regression model in Python. Feature selection is the procedure of selecting a subset (some out of all available) of the input variables that are most relevant to the target ...
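One simple way to pick "some out of all available" inputs is a filter method: score each feature by the absolute value of its Pearson correlation with the target and keep the top k. The sketch below assumes that approach purely for illustration; the article may use a different technique, and the feature names, data, and the helper `select_k_best` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def select_k_best(features, target, k):
    """Rank features by |correlation with target| and return the k best names plus all scores."""
    scores = {name: abs(pearson(column, target)) for name, column in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical input variables and regression target.
features = {
    "size": [1.0, 2.0, 3.0, 4.0, 5.0],
    "noise": [3.0, 1.0, 4.0, 1.0, 5.0],
    "age": [5.0, 4.0, 3.5, 2.0, 1.0],
}
target = [2.1, 3.9, 6.2, 8.1, 9.8]

selected, scores = select_k_best(features, target, k=2)
print("selected:", selected)
```

Correlation only captures linear, one-feature-at-a-time relevance, so a filter like this can miss interactions between inputs.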

3 Best Books for Beginner Data Scientists

Improve your data analysis skills by getting these three key books. There are loads of resources on personal blogs, YouTube, and my favorite site: Towards Data Science! However, I find that books are still an ...

How to Track Coronavirus Spreading using Real Data

Step-by-step guide to timeline data visualization with Plotly. In this post, I would like to show you how to create an interactive map plot using the Coronavirus cases data.

Forecasting Tesla’s Stock Price using Autoregression

Learn how to apply a fundamental time series modelling technique to Tesla’s stock price using Python.

Recommender System — singular value decomposition (SVD) & truncated SVD

In this article, you will learn about singular value decomposition and truncated SVD as applied to recommender systems.

Predicting Hazardous Seismic Bumps Part I : EDA, Feature Engineering

This article demonstrates exploratory data analysis (EDA), feature engineering, and splitting strategies for unbalanced data using the seismic bumps dataset from the UCI Data Archive.