Data Science

Data science is a broad field that refers to the collective processes, theories, concepts, tools and technologies that enable the review, analysis and extraction of valuable knowledge and information from raw data.


Solving Combinatorial Problems with PySpark

Partitioning combinatorial problems using binary representation. Let us consider the problem statement. Given n real numbers x1, x2, ..., xn, choose any set of distinct numbers such that a function f applied to the chosen numbers gives the maximum value. The function f can take any number of inputs, so one can choose any number of numbers.
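A minimal sketch of the binary-representation idea, written in plain Python rather than PySpark. Each subset of the n numbers corresponds to an n-bit mask, so the integers 1 to 2^n - 1 enumerate every non-empty subset. The objective `example_f` and the sample values are hypothetical placeholders, not taken from the article.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def subset_from_mask(xs: Sequence[float], mask: int) -> List[float]:
    """Return the elements of xs whose bit is set in mask."""
    return [x for i, x in enumerate(xs) if (mask >> i) & 1]


def best_subset(
    xs: Sequence[float], f: Callable[[List[float]], float]
) -> Tuple[Optional[float], List[float]]:
    """Brute-force search over all non-empty subsets of xs."""
    best_value: Optional[float] = None
    best_choice: List[float] = []
    for mask in range(1, 1 << len(xs)):
        chosen = subset_from_mask(xs, mask)
        value = f(chosen)
        if best_value is None or value > best_value:
            best_value, best_choice = value, chosen
    return best_value, best_choice


def example_f(chosen: List[float]) -> float:
    """Hypothetical objective: sum of the picks minus a fixed cost per pick."""
    return sum(chosen) - 1.5 * len(chosen)


value, choice = best_subset([3.0, -1.0, 2.0, 0.5], example_f)
print(value, choice)
```

Because the masks form a plain range of integers, that range can be cut into contiguous chunks and each chunk searched independently, which is one plausible way to partition the work across workers.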

Deep Dive Into the Apache Spark Driver on a Yarn Cluster

The Spark driver hosted against a Spark application is solely responsible for driving and supervising the parallel execution of the latter in a… While running a Spark application on a cluster, the driver container, running the application master, is the first one to be launched by the cluster resource manager. The application master, after initializing its components, launches the primary driver thread in the same container.

Urban Sound Classification with Librosa — tricky cross-validation

Featuring the Leave One Group Out strategy using scikit-learn. I’ll show an example of implementing the results of an interesting research paper on classifying audio clips based on their sonic content. This will include applications of the librosa library, which is a Python package for music and audio analysis. The clips are short audio recordings from the city, and the classification task is to predict the appropriate category label.

Robust 2 DataFrames verification with Pandas 1.1.0

Use the recently added DataFrame.compare() for robust qualification checking. Pandas is one of the most used Python libraries among both data scientists and data engineers. Today, I want to share some Python tips to help us do qualification checks between two DataFrames. Note that I have used the word qualification rather than identical.

How To Make Your Python Code Run Faster — 1st Installment

Optimize the utilization of your system hardware. In the last tutorial, we introduced you to line_profiler, a package that can help you time-profile your code. It is now time to take a step forward. In this tutorial, we will learn about implementing multi-threading and multi-processing approaches using Python. These approaches guide the operating system to make optimum use of one’s system hardware and hence make code execution more efficient.
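A minimal sketch of the two approaches using the standard-library `concurrent.futures` module. The task functions `io_task` and `cpu_task` are hypothetical stand-ins, and the worker counts are arbitrary; the tutorial itself may use different tools.

```python
import math
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import Iterable, List


def io_task(delay: float) -> float:
    """Stand-in for an I/O-bound call (network, disk): it only sleeps."""
    time.sleep(delay)
    return delay


def cpu_task(n: int) -> int:
    """Stand-in for CPU-bound work: sum of integer square roots below n."""
    return sum(math.isqrt(i) for i in range(n))


def run_threaded(delays: Iterable[float]) -> List[float]:
    # Threads suit I/O-bound work: a sleeping or blocked thread releases
    # the GIL, so the waits overlap.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(io_task, delays))


def run_multiprocess(sizes: Iterable[int]) -> List[int]:
    # Processes suit CPU-bound work: each worker has its own interpreter,
    # so the computations can run on separate cores.
    with ProcessPoolExecutor(max_workers=2) as pool:
        return list(pool.map(cpu_task, sizes))
```

When running this as a script, call `run_multiprocess` from under an `if __name__ == "__main__":` guard, since some platforms start worker processes by re-importing the main module.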

GitHub is the best AutoML you will ever need

A step-by-step tutorial to build AutoML using PyCaret 2.0. You may be wondering when GitHub got into the business of Automated Machine Learning. Well, it didn’t, but you can use it like one. In this tutorial, we will show you how to build your personalized Automated Machine Learning software and host it on GitHub so that others can use it for free or for a paid subscription, if you wish.

Geobinning Starbucks

Using Python, Geobinning and Matplotlib to Generate Choropleth Maps. I see a lot of choropleth maps out there that look very nice and really draw a lot of attention. Creating these is super easy if you have a clean dataset; however, a lot of the data I want to explore is not bucketed in the way I would like. More often than not it comes in the form of coordinates (lat/lng), which leaves it up to me to figure out which areas of the map they should go in.

NLP 101 — Data Preprocessing & Representation Using NLTK.

An insight into how vital a role data pre-processing and representation play in Natural Language Processing and how to go about it. NLP, or Natural Language Processing, primarily deals with how machines understand textual data written in human-readable languages and convert it into formats that they can perform computations on. Companies today often work with huge amounts of data.

Did Liverpool Deserve to Win the Premier League?

Premier League 2019/20 Review Using Python, R, and Expected Goals. I returned, and saw under the sun, that the race is not to the swift, nor the battle to the strong, neither yet bread to the wise, nor yet riches to men of understanding, nor yet favour to men of skill; but time and chance happeneth to them all.

Finding Relevant Job Skills via API in Python

Working with JSON and Python on data from a rich labor market API. So you want to figure out where your skills fit into today’s job market. Maybe you’re just curious to see a comprehensive constellation of job skills, clean and standardized. Or you need a taxonomy of skills for a resume-parsing project. Well, the EMSI skills API is one possible tool for the job!

All vectors are UNequal, but some are more UNequal than the others…

Searching for meaning in Trump’s tweets. Part II. Continuing on from where we left off in Part I and diving into vectorised text. Discussing the problems of using text vector representations and setting the scene for the final part, where we are going to try and fix them. In Part I, we set up the problem and went through some basic exploratory analysis of the dataset.

The Right Way to Access a Dictionary

Be Careful! You might have been doing it wrong. The dictionary is one of the data structures that are ready to use when programming in Python. A dictionary is a mutable Python collection that maps unique keys to values. In Python, dictionaries are written using curly brackets {}. Each key is separated from its value by a colon :, and every key-value pair is separated by a comma ,. Here’s how dictionaries are declared in Python.
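A minimal sketch of declaring a dictionary with curly brackets, followed by a few common ways to read from it. The `prices` data is a made-up example, and which access style the article ends up recommending is not stated here.

```python
# Hypothetical example data: keys and values separated by colons,
# key-value pairs separated by commas.
prices = {"apple": 0.5, "banana": 0.25, "cherry": 3.0}

# Square-bracket access raises KeyError when the key is missing.
apple_price = prices["apple"]

# get() returns a default instead of raising.
mango_price = prices.get("mango", 0.0)

# A membership test checks the keys, not the values.
has_banana = "banana" in prices

# setdefault() inserts the key only if it is absent, then returns its value.
stored_price = prices.setdefault("mango", 1.2)

print(apple_price, mango_price, has_banana, stored_price)
```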

Quickly Develop Highly Performant APIs With FastAPI and Python

And why you should consider FastAPI for your next API project. If you have read some of my previous Python articles, you know I’m a Flask fan. It is my go-to for building APIs in Python. However, recently I started to hear a lot about a new API framework for Python called FastAPI. After building some APIs with it, I can say it is amazing!

Closed-form and Gradient Descent Regression Explained with Python

Regression is a kind of supervised learning algorithm within machine learning. It is an approach to modelling the relationship between the dependent variable (or target, response), y, and the explanatory variables (or inputs, predictors), X. Its objective is to predict a quantity of the target variable, for example the stock price. This differs from a classification problem, where we want to predict the label of the target, for example the direction of the stock (up or down).
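A minimal sketch of the two fitting strategies named in the title, for the single-variable model y = w * x + b. The data, learning rate and step count are assumptions chosen for illustration; the article may use a matrix formulation and different settings.

```python
from typing import List, Tuple


def fit_closed_form(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = w * x + b with one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


def fit_gradient_descent(
    xs: List[float], ys: List[float], lr: float = 0.05, steps: int = 5000
) -> Tuple[float, float]:
    """Minimise the mean squared error by repeated gradient steps."""
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w_cf, b_cf = fit_closed_form(xs, ys)
w_gd, b_gd = fit_gradient_descent(xs, ys)
print(w_cf, b_cf, w_gd, b_gd)
```

The closed-form fit needs a single pass over the data, while gradient descent approaches the same coefficients iteratively and depends on a suitable learning rate.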

Automated Browsers, Scraping and Crawling 

All you need is introductory knowledge of Python and some trial and error. I noticed there aren’t too many resources on web crawling that are geared toward total beginners, so I decided to make one. Part 1 will be for complete beginners, then we’ll take a more object-oriented approach in Part 2 (which will be a separate article).

Confusion Matrix and Object Detection

A metric to evaluate performance of instance segmentation models. After training a machine learning classifier, the next step is to evaluate its performance using relevant metric(s). The confusion matrix is one of the evaluation metrics. A confusion matrix is a table showing the performance of a classifier given some ground-truth values/instances (as in supervised learning).
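A minimal sketch of building a confusion matrix from true and predicted labels in plain Python. The helper name `confusion_matrix` and the sample labels are hypothetical, and the rows-as-truth layout is a chosen convention. It covers plain classification only; matching detections to ground-truth boxes for object detection is not shown.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def confusion_matrix(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable]
) -> Tuple[list, List[List[int]]]:
    """Return (labels, matrix), where matrix[i][j] counts the samples whose
    true label is labels[i] and whose predicted label is labels[j]."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    counts = Counter(zip(y_true, y_pred))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


# Hypothetical ground truth and predictions.
y_true = ["cat", "cat", "dog", "dog", "dog", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "cat", "bird"]
labels, matrix = confusion_matrix(y_true, y_pred)

for label, row in zip(labels, matrix):
    print(label, row)
```

Correct predictions land on the diagonal, so any off-diagonal count shows which class was mistaken for which.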

Count Items in Python With the Help of Counter Objects

The easy way to count objects in a data container. When we deal with data containers, such as tuples and lists, in Python, we often need to count particular elements. One common way to do this is to use the count() method: you specify the element you want to count and the method returns the count.
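A short sketch contrasting `count()` with `collections.Counter` from the standard library. The `letters` list is made-up sample data.

```python
from collections import Counter

# Hypothetical sample data.
letters = ["a", "b", "a", "c", "b", "a"]

# list.count() scans the list once per call and counts a single element.
a_count = letters.count("a")

# Counter tallies every distinct element in a single pass.
tally = Counter(letters)

# most_common(k) returns the k most frequent (element, count) pairs.
top_two = tally.most_common(2)

# Absent keys count as 0 instead of raising KeyError.
missing = tally["z"]

print(a_count, top_two, missing)
```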

How to Track Coronavirus Spreading using Real Data

Step-by-step guide to timeline data visualization with Plotly. In this post, I would like to show you how to create an interactive map plot using the Coronavirus cases data.

Assumptions of Linear Regression

And how to test them using Python. Linear Regression is the bicycle of regression models. It’s simple yet incredibly useful. It can be used in a variety of domains. It has a nice closed-form solution, which makes model training a super-fast, non-iterative process.

How to make a wordcloud of your blog, programmatically?

Recently, I was in need of an image for our blog and wanted it to have some wow effect or at least a better fit than anything typical we’ve been using. Pondering over ideas for a while, a word cloud flashed in my mind.