Thus, categorical feature encoding becomes a necessary step for any automated machine learning approach. It not only improves model quality but also helps with feature engineering.

Principal Component Analysis (PCA)

A dimensionality reduction technique.

Naive Bayes Algorithm from scratch

Naive Bayes is a classification algorithm for binary (two-class) and multiclass classification problems. It is called Naive Bayes or idiot Bayes because the calculation of the probabilities for each class is simplified to make it tractable.
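As a rough illustration of that simplification, here is a minimal from-scratch sketch. It assumes the Gaussian variant of Naive Bayes, where each feature is treated as independent given the class and modelled with its own mean and variance. The function names fit_gaussian_nb and predict_gaussian_nb and the toy data are hypothetical, chosen only for this example.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a prior and per-feature (mean, variance) for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    model = {}
    for label, members in by_class.items():
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            # A small constant keeps the variance strictly positive.
            var = sum((x - mean) ** 2 for x in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (len(members) / len(rows), stats)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        # The "naive" step: per-feature log-likelihoods are simply summed,
        # which treats the features as independent given the class.
        score = math.log(prior)
        for x, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data (made up for illustration): two well-separated classes.
rows = [[1.0, 2.1], [1.2, 1.9], [0.9, 2.0], [5.0, 6.1], [5.2, 5.9], [4.8, 6.0]]
labels = ["a", "a", "a", "b", "b", "b"]

model = fit_gaussian_nb(rows, labels)
prediction_near_a = predict_gaussian_nb(model, [1.1, 2.0])
prediction_near_b = predict_gaussian_nb(model, [5.1, 6.0])
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.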

Spelling Correction: How to make an accurate and fast corrector

We need an automatic spelling corrector that can fix words with typos without breaking correct spellings. But how can we achieve this?

Data Preparation: Missing Data

How to identify and deal with missing data using Python. In this blog, I'm attempting to discuss the different types of missing data and how to deal with them.

Data Preprocessing using Pandas drop() and drop_duplicates() functions

I already discussed the dropna() and fillna() functions in Pandas, which can be used to deal with missing data or NaN values.
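For readers who have not used the two functions named in the title, a small sketch of drop() and drop_duplicates() follows. The DataFrame and its column names are made up for illustration and are not taken from the article.

```python
import pandas as pd

# Hypothetical data: rows 1 and 2 are exact duplicates.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Bob", "Cy"],
        "city": ["Oslo", "Rome", "Rome", "Lima"],
        "temp_id": [1, 2, 2, 3],
    }
)

# drop() removes rows or columns by label; here, one column.
without_column = df.drop(columns=["temp_id"])

# drop() with an index label removes a row (label 0 is the first row here).
without_first_row = df.drop(index=0)

# drop_duplicates() keeps the first occurrence of each repeated row by default.
deduplicated = df.drop_duplicates()

# subset= restricts the comparison to the chosen columns;
# keep="last" keeps the final occurrence instead of the first.
last_per_city = df.drop_duplicates(subset=["city"], keep="last")
```

Both functions return a new DataFrame by default, so df itself is left unchanged.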

The Ultimate Python Package to Pre-Process Data for Machine Learning

Exploring and pre-processing the dataset is probably the most important step in building an efficient Machine Learning model.

Upsample with an average in Pandas

Align inconsistently reported data for your machine learning. In this article, we’ll explore how to upsample with an average, which requires a little bit of extra coding.

Handling Missing Values : the exclusive pythonic guide

In this article, we will review three of the most successful short, open-source Python snippets that can be combined to handle missing values.

Preprocess FIFA World Cup data with Python

The next FIFA World Cup is coming soon and will begin in June, so I wanted to make some Python visualizations to practice using matplotlib and seaborn.

Data Preparation: The Case for Using Automated, ML-Based Tools

We’ll specifically talk about data preparation as the most critical challenge and how an ML-based data preparation tool or software can make it easier to process data in the data lake.

4 techniques to enhance your Research in Machine Learning projects

In this post, I will delve into some techniques and tools that will help you master your Research. While working in this stage, you should strive for simplicity and focus.

Pre-Process Data Like a Pro: Intro to Scikit-Learn Pipelines

Keep data clean. You wrote all your queries, gathered all the data and are all fired up to implement the latest machine learning algorithm you read about on Medium. Wait! You soon realize you need to deal with missing data, imputation, categorical data, standardization, etc.
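A minimal sketch of how those steps can be chained with scikit-learn is shown below. The column names, the toy data, and the choice of median imputation are assumptions made for this example, not details from the article.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data with missing numeric values and one categorical column.
df = pd.DataFrame(
    {
        "age": [25.0, np.nan, 40.0, 31.0],
        "income": [30000.0, 52000.0, np.nan, 41000.0],
        "city": ["Oslo", "Rome", "Rome", "Oslo"],
    }
)

# Numeric columns: fill missing values with the median, then standardize.
numeric = Pipeline(
    [
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]
)

# Route each group of columns to its own preprocessing step.
preprocess = ColumnTransformer(
    [
        ("num", numeric, ["age", "income"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ]
)

# Two scaled numeric columns plus one indicator column per city.
features = preprocess.fit_transform(df)
```

Because the imputer and scaler are fitted inside the pipeline, the same preprocess object can later be applied to new data with transform(), which reuses the statistics learned from the training data.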

Classification Framework for Imbalanced Data.

Understanding and utilizing imbalanced data. This blog covers the steps involved in tackling a classification problem on an imbalanced dataset. The GitHub repository containing all the code is available here.

Data Cleaning and Preprocessing — Modelling Subscription for Bank Deposits

The exploration of data has always fascinated me. The kind of insights and information that can be hidden in raw data is invigorating to discover and communicate.

Word2Vec, GLOVE, FastText and Baseline Word Embeddings step

In our previous discussion, we covered the basics of tokenizers step by step. If you have not gone through my previous post, I highly recommend having a look at it, because to understand embeddings we first need to understand tokenizers, and this post is a continuation of the previous one. I am providing the link to my post on tokenizers below. I explained the concepts step by step with a simple example Understanding N

Why “1.5” in IQR Method of Outlier Detection?

The idea for this post came when I was helping one of my juniors with an assignment on outlier detection. It wasn’t a very complicated one, just an application of the IQR method of outlier detection to a dataset.
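For context, the IQR method flags any value that lies more than 1.5 times the interquartile range below the first quartile or above the third quartile. Below is a minimal sketch using only the standard library. The function name iqr_outliers and the sample data are hypothetical, and quartile conventions vary between tools, so the exact fences may differ slightly from what other libraries report.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] and the fences."""
    # method="inclusive" treats the data as the full population; other
    # quartile conventions would give slightly different fences.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    flagged = [v for v in values if v < lower or v > upper]
    return flagged, (lower, upper)


# Made-up sample with one obviously extreme value.
data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
outliers, bounds = iqr_outliers(data)
```

For normally distributed data, fences with k = 1.5 fall at roughly 2.7 standard deviations from the mean.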

What to Keep and What to Remove

Feature engineering on the data. Suppose you want to predict sales of ice cream, gloves, or umbrellas. What do these items have in common?

NLP-Preprocessing Clinical data to find Sections

In this post, we will use healthcare chart notes data (doctors’ scribbled notes) to model the topics that exist in clinical notes. Keep in mind, there is no set structure for writing these notes