How we improve delivery time estimation with a custom loss function. Dear connoisseurs, I invite you to take a look inside Careem’s food delivery platform. Specifically, we are going to look at how we use machine learning to improve the customer experience for delivery time tracking.
Iterative algorithms are widely used in machine learning and graph processing, for example connected components and PageRank. These algorithms grow in complexity with the number of iterations and the size of the data at each iteration, and making them fault-tolerant at every iteration is tricky. In this article, I elaborate on a few considerations in Spark for working through these challenges.
MapReduce — MapReduce is a technique for processing large datasets that consists of a Map step followed by a Reduce step. The pattern can be applied even when the framework is not Hadoop, and an implementation could be written in Java by hand. Hadoop, however, already provides a Java API that implements this pattern and coordinates the work across all cluster nodes.
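The Map + Reduce pattern described above can be sketched in plain Python. This is a single-process toy word count to illustrate the idea, not Hadoop's Java API:

```python
# Toy word count illustrating the Map + Reduce pattern in plain Python.
from functools import reduce
from itertools import chain

def mapper(line):
    # Map: emit a (word, 1) pair for each word in a line.
    return [(word.lower(), 1) for word in line.split()]

def reducer(counts, pair):
    # Reduce: sum the 1s for each word key.
    word, n = pair
    counts[word] = counts.get(word, 0) + n
    return counts

lines = ["the quick brown fox", "the lazy dog", "the fox"]
pairs = chain.from_iterable(mapper(line) for line in lines)  # map phase
word_counts = reduce(reducer, pairs, {})                     # reduce phase
print(word_counts["the"])  # 3
```

In a real cluster, the framework shuffles the mapper output so that all pairs with the same key land on the same reducer; here a single dictionary stands in for that step.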
Featuring the Leave One Group Out strategy using scikit-learn. I’ll show an example of implementing the results of an interesting research paper on classifying audio clips based on their sonic content. This will include applications of the librosa library, a Python package for music and audio analysis. The clips are short audio recordings of city sounds, and the classification task is to predict the appropriate category label.
Optimize the utilization of your system hardware. In the last tutorial, we introduced you to line_profiler, a package that can help you time-profile your code. It is now time to take a step forward. In this tutorial, we will learn about implementing multi-threading and multi-processing approaches in Python. These approaches guide the operating system to make optimal use of your system hardware and hence make code execution more efficient.
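As a minimal sketch of the two approaches, the standard library's concurrent.futures exposes thread and process pools behind the same interface (the workload here is illustrative):

```python
# Contrast thread and process pools using concurrent.futures.
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def square(n):
    return n * n

def run(pool_cls):
    # Both pool classes share the same map-style interface.
    with pool_cls(max_workers=4) as pool:
        return list(pool.map(square, range(8)))

if __name__ == "__main__":
    # Threads: best for I/O-bound work (the GIL limits CPU parallelism).
    print(run(ThreadPoolExecutor))
    # Processes: best for CPU-bound work; each worker has its own interpreter.
    print(run(ProcessPoolExecutor))
```

Because the interface is identical, you can switch between the two executors and measure which one actually helps your workload.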
A step-by-step tutorial to build AutoML using PyCaret 2.0. You may be wondering when GitHub got into the business of Automated Machine Learning. Well, it didn’t, but you can use it like one. In this tutorial, we will show you how to build your personalized Automated Machine Learning software and host it on GitHub so that others can use it for free, or for a paid subscription if you wish.
An insight into how vital a role data pre-processing and representation play in Natural Language Processing, and how to go about it. NLP, or Natural Language Processing, primarily deals with how machines convert textual data written in human-readable languages into formats they can perform computations on. Contemporary corporations often work with huge amounts of such data.
Premier League 2019/20 Review Using Python, R, and Expected Goals. I returned, and saw under the sun, that the race is not to the swift, nor the battle to the strong, neither yet bread to the wise, nor yet riches to men of understanding, nor yet favour to men of skill; but time and chance happeneth to them all.
Searching for meaning in Trump’s tweets, Part II. Continuing from where we left off in Part I and diving into vectorised text: we discuss the problems of using text vector representations and set the scene for the final part, where we are going to try to fix them. In Part I, we set up the problem and went through some basic exploratory analysis of the dataset.
Regression is a kind of supervised learning algorithm within machine learning. It is an approach to modelling the relationship between a dependent variable (or target, response), y, and explanatory variables (or inputs, predictors), X. Its objective is to predict a quantity for the target variable, for example predicting a stock price. This differs from a classification problem, where we want to predict the label of the target, for example predicting the direction of a stock (up or down).
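The distinction above can be made concrete with a minimal sketch: fitting y = w·x + b by ordinary least squares with NumPy on synthetic, illustrative data:

```python
# Minimal regression sketch: fit y = w*x + b by ordinary least squares.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
# Synthetic target with true slope 2.5 and intercept 1.0, plus small noise.
y = 2.5 * x + 1.0 + rng.normal(0, 0.1, size=100)

# Design matrix with a bias column, solved via least squares.
X = np.column_stack([x, np.ones_like(x)])
(w, b), *_ = np.linalg.lstsq(X, y, rcond=None)

print(round(float(w), 2), round(float(b), 2))  # close to 2.5 and 1.0
```

The model outputs a continuous quantity; a classifier over the same data would instead output a discrete label such as "up" or "down".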
Is Microsoft Excel a good alternative for a quick and approximate linear regression prediction business case? I think yes, but let’s do a… Around 13 years ago, Scikit-learn development started as part of a Google Summer of Code project by David Cournapeau. As time passed, Scikit-learn became one of the most famous machine learning libraries in Python. It offers several classification, regression, and clustering algorithms, and its key strength, in my opinion, is seamless integration with Numpy, Pandas and Scipy.
A metric to evaluate the performance of instance segmentation models. After training a machine learning classifier, the next step is to evaluate its performance using relevant metrics. One such metric is the confusion matrix: a table showing the performance of a classifier against ground-truth values/instances (in the supervised learning setting).
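A confusion matrix can be built from scratch in a few lines; this sketch uses a made-up 3-class example with rows as truth and columns as predictions:

```python
# From-scratch confusion matrix: rows = true class, columns = predicted class.
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)

# Diagonal entries are correct predictions; off-diagonal entries are confusions.
accuracy = np.trace(cm) / cm.sum()
print(cm)
```

Many other metrics (precision, recall, per-class accuracy) fall straight out of this one table.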
The core operational capabilities, and how to launch a cluster instantly using just one command. Apache Spark is one of the most mature frameworks you can utilize for many of your Machine Learning applications. It lets users develop ML-based algorithms in Jupyter Notebooks, a data scientist’s favorite scientific prototyping environment.
A visual approach to understanding machine learning. In this article, we will see how this works. Dataset: to visualize it, let's build a synthetic dataset where each data point (input x) is ...
In this article, I explain what feature selection is and how to perform it before training a regression model in Python. Feature selection is the procedure of selecting a subset (some out of all available) of the input variables that are most relevant to the target ...
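One simple filter-style version of the procedure above is to rank input variables by their absolute correlation with the target and keep the top k. The dataset and the choice of k here are illustrative:

```python
# Filter-style feature selection: rank features by |Pearson correlation|
# with the target and keep the k highest-ranked ones.
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 4))  # 4 candidate features
# Only features 0 and 2 actually drive the synthetic target.
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(0, 0.5, size=n)

corrs = np.array(
    [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
)
k = 2
selected = np.argsort(corrs)[::-1][:k]  # indices of the k best features
print(sorted(selected.tolist()))
```

Correlation only captures linear, one-at-a-time relevance; wrapper and embedded methods (e.g. recursive feature elimination, lasso) address its blind spots.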
Improve your data analysis skills by getting these three key books. There are loads of resources on personal blogs, YouTube, and my favorite site: Towards Data Science! However, I find that books are still an ...
Step-by-step guide to timeline data visualization with Plotly. In this post, I would like to show you how to create an interactive map plot using the coronavirus cases data.
Forecasting Tesla’s Stock Price using Autoregression. Learn how to apply a fundamental time series modelling technique to Tesla’s stock price using Python.
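The core of the technique can be sketched without any finance data: fit an AR(1) model, y_t = c + φ·y_{t−1} + ε_t, by least squares on lagged values. The series below is synthetic (not Tesla prices):

```python
# Minimal autoregression sketch: fit AR(1) by least squares on lagged values.
import numpy as np

rng = np.random.default_rng(42)
phi_true, c_true = 0.8, 2.0
y = np.zeros(500)
for t in range(1, len(y)):
    # Simulate y_t = c + phi * y_{t-1} + noise.
    y[t] = c_true + phi_true * y[t - 1] + rng.normal(0, 0.5)

# Regress y_t on y_{t-1} with an intercept column.
X = np.column_stack([y[:-1], np.ones(len(y) - 1)])
(phi_hat, c_hat), *_ = np.linalg.lstsq(X, y[1:], rcond=None)

# One-step-ahead forecast from the last observed value.
forecast = c_hat + phi_hat * y[-1]
```

In practice, higher-order AR(p) models and packages such as statsmodels handle lag selection and diagnostics, but the lag-regression idea is the same.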
In this article, you will learn about singular value decomposition (SVD) and truncated SVD in the context of recommender systems.
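As a toy sketch of the idea, we can factor a small user-item rating matrix with NumPy and rebuild a rank-k approximation; the ratings below are made up for illustration:

```python
# Truncated SVD on a toy user-item rating matrix (0 = unobserved rating).
import numpy as np

R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

# Full SVD: R = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(R, full_matrices=False)

k = 2  # keep only the top-2 singular values (the "truncation")
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# R_hat is the best rank-k approximation of R and fills the zero
# entries with values implied by the latent user/item factors.
print(R_hat.shape)
```

Keeping only k singular values compresses users and items into a k-dimensional latent space, which is the basis of SVD-style collaborative filtering.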
This article demonstrates exploratory data analysis (EDA), feature engineering, and splitting strategies for unbalanced data using the seismic bumps dataset from the UCI Machine Learning Repository.