Difference Between Standardization & Normalization

This blog explains two of the most confusing concepts in feature engineering: Standardization and Normalization. The two look very similar, and many people fail to understand the difference between them or the use case for each. No worries: this blog will act as a helping hand, making the difference between them and their use cases clear.
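To make the distinction concrete, here is a minimal scikit-learn sketch (an illustration, not code from the article): standardization rescales a feature to zero mean and unit variance, while normalization (min-max scaling) rescales it into a fixed range such as [0, 1].

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Toy feature matrix with two columns on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Standardization: (x - mean) / std, giving zero mean and unit variance.
X_std = StandardScaler().fit_transform(X)

# Normalization (min-max scaling): (x - min) / (max - min), giving [0, 1].
X_norm = MinMaxScaler().fit_transform(X)

print(X_std)   # each column now has mean 0 and standard deviation 1
print(X_norm)  # each column now lies in the range [0, 1]
```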

Using neural networks with embedding layers to encode high-cardinality categorical features

We will build a neural network that uses embeddings to encode the categorical features, and benchmark it against a very naive linear model without categorical variables as well as a more sophisticated regularized linear model with one-hot-encoded features.
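For a flavour of what an embedding-based model looks like, here is a minimal Keras sketch; the feature names, layer sizes, and embedding dimension are illustrative, not the article's actual architecture.

```python
import tensorflow as tf

# Hypothetical setup: one categorical feature with 10,000 levels
# plus one numeric feature.
n_levels, emb_dim = 10_000, 8

cat_in = tf.keras.Input(shape=(1,), name="category")
num_in = tf.keras.Input(shape=(1,), name="numeric")

# The embedding maps each category id to a dense 8-dimensional vector,
# learned jointly with the rest of the network.
emb = tf.keras.layers.Embedding(input_dim=n_levels, output_dim=emb_dim)(cat_in)
emb = tf.keras.layers.Flatten()(emb)

x = tf.keras.layers.Concatenate()([emb, num_in])
x = tf.keras.layers.Dense(32, activation="relu")(x)
out = tf.keras.layers.Dense(1)(x)

model = tf.keras.Model([cat_in, num_in], out)
model.compile(optimizer="adam", loss="mse")
model.summary()
```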

All About Table Joining In PySpark

Some thoughts on how to design table joins and quickly quality-check their results. I will use a dummy example to illustrate my typical routine of writing a table join and running a quick quality check on the output.
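As a taste of that routine, here is a minimal sketch (with made-up tables) of a left join followed by two quick quality checks, in the spirit of what the article describes:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Dummy tables; names and columns are illustrative.
orders = spark.createDataFrame(
    [(1, "a"), (2, "b"), (3, "c")], ["order_id", "customer_id"])
customers = spark.createDataFrame(
    [("a", "US"), ("b", "DE")], ["customer_id", "country"])

joined = orders.join(customers, on="customer_id", how="left")

# Quality check 1: a left join on a unique key should preserve the
# left table's row count (no fan-out from duplicate keys).
assert joined.count() == orders.count()

# Quality check 2: inspect rows that found no match on the right side.
joined.filter(joined.country.isNull()).show()
```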

Extending Target Encoding

Leveraging target encoding when your categorical variables have a hierarchical structure.
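The hierarchical extension is the article's contribution, but for reference, plain (smoothed) target encoding looks roughly like this minimal pandas sketch; the smoothing prior m and the column names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["NY", "NY", "LA", "LA", "SF"],
    "y":    [1, 0, 1, 1, 0],
})

# Smoothed target encoding: blend each category's mean target with the
# global mean, weighted by the category's count; m is a smoothing prior.
m = 5.0
global_mean = df["y"].mean()
stats = df.groupby("city")["y"].agg(["mean", "count"])
encoding = (stats["count"] * stats["mean"] + m * global_mean) / (stats["count"] + m)
df["city_te"] = df["city"].map(encoding)
print(df)
```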

A New Way to BOW Analysis & Feature Engineering — Part 1

Compare the frequency distributions across labels without building an ML model.
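As an illustration of the idea (not the article's exact code), here is a small sketch that builds a bag-of-words matrix and compares per-label relative word frequencies directly, with no classifier involved:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

docs   = ["great movie", "bad movie", "great acting", "bad plot"]
labels = ["pos", "neg", "pos", "neg"]

# Bag-of-words: one count column per word in the vocabulary.
vec = CountVectorizer()
bow = pd.DataFrame(vec.fit_transform(docs).toarray(),
                   columns=vec.get_feature_names_out())
bow["label"] = labels

# Per-label relative word frequencies; differences between the rows
# reveal label-discriminating words without training any model.
freq = bow.groupby("label").sum()
freq = freq.div(freq.sum(axis=1), axis=0)
print(freq)
```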

One-Hot Encoding, Standardization, PCA: Data Preparation Steps for Segmentation in Python

Getting the right data for the perfect segmentation! We will go through all the steps necessary for transforming our raw dataset into the format we need for training and testing our segmentation algorithms.
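Those three steps chain together naturally in scikit-learn; here is a minimal sketch (the column names and data are made up for illustration):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "income":  [40_000, 85_000, 62_000],
    "age":     [25, 47, 33],
    "channel": ["web", "store", "web"],
})

prep = Pipeline([
    # Scale numeric columns, one-hot encode the categorical one;
    # sparse_threshold=0 forces dense output so PCA can consume it.
    ("encode_and_scale", ColumnTransformer([
        ("num", StandardScaler(), ["income", "age"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["channel"]),
    ], sparse_threshold=0)),
    ("pca", PCA(n_components=2)),
])

X = prep.fit_transform(df)
print(X.shape)  # ready for a clustering / segmentation algorithm
```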

How to Do Feature Selection in Machine Learning

An important question for your first data science-related job.
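As one concrete starting point (an illustration, not necessarily the article's method), a simple univariate filter is often the first feature-selection technique people reach for:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_breast_cancer(return_X_y=True)

# Univariate filter method: keep the 10 features with the strongest
# ANOVA F-statistic against the target.
selector = SelectKBest(score_func=f_classif, k=10)
X_selected = selector.fit_transform(X, y)
print(X.shape, "->", X_selected.shape)
```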

Feature Selection for the Kaggle Caravan Insurance Challenge in R

This post explains feature selection for the Kaggle Caravan Insurance Challenge, applied before we feed the features into machine learning algorithms.

Cost Prediction for a Marketing Campaign

A data science approach to predicting the best candidates to target in a marketing campaign. This article focuses on the second section only: Cleaning & Feature Selection.

Are you dropping too many correlated features?

An analysis of current methods and a proposed solution. In this article, I demonstrate the shortcomings of the current methods for dropping correlated features and propose a possible solution.
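For context, the widely used recipe the article examines looks roughly like this: compute the absolute correlation matrix and drop one feature from each pair above a threshold. A minimal sketch (not the article's code) — note that which member of a pair survives is arbitrary:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 5)), columns=list("abcde"))
df["b"] = df["a"] + rng.normal(scale=0.1, size=200)  # b is nearly a

# Upper triangle of the absolute correlation matrix, so each pair
# is considered only once.
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Drop any column correlated above the threshold with an earlier one.
to_drop = [c for c in upper.columns if (upper[c] > 0.95).any()]
print(to_drop)  # ['b'] here; keeping 'a' instead of 'b' was arbitrary
```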

Demystifying Feature Engineering and Selection for Driver-Based Forecasting

In this article, we will explore the different types of features which are commonly engineered during forecasting projects and the rationale for using them.
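For example, lag features, rolling statistics, and calendar attributes are among the most commonly engineered features in forecasting; a minimal pandas sketch with made-up data:

```python
import pandas as pd

sales = pd.DataFrame({"y": [10, 12, 13, 15, 14, 18]},
                     index=pd.date_range("2020-01-01", periods=6, freq="W"))

# Lag: last week's value; shift before rolling to avoid target leakage.
sales["lag_1"]  = sales["y"].shift(1)
sales["roll_3"] = sales["y"].shift(1).rolling(3).mean()

# Calendar attribute derived from the timestamp itself.
sales["month"] = sales.index.month
print(sales)
```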

Predicting Poetic Movements

In this article, I’ll explore some features of poetry that make it unique as a style of writing and investigate differences between four umbrella genres I’ll be referring to as “movements”.

What is Feature Engineering?

Feature Engineering is the process of extracting useful features from existing raw data using math, statistics, and domain knowledge. It is one of the most important steps to complete before starting a machine learning analysis.
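A tiny example of the idea: deriving new, more informative columns from raw timestamps (the column names here are hypothetical):

```python
import pandas as pd

raw = pd.DataFrame({
    "signup":    pd.to_datetime(["2021-01-04", "2021-06-19"]),
    "last_seen": pd.to_datetime(["2021-02-01", "2021-06-20"]),
})

# Two engineered features: day of week uses calendar domain knowledge,
# tenure combines two raw columns into one meaningful quantity.
raw["signup_dow"]  = raw["signup"].dt.dayofweek
raw["tenure_days"] = (raw["last_seen"] - raw["signup"]).dt.days
print(raw)
```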

Don’t Overfit

We will also see how, by using simple machine learning models like KNeighborsClassifier and LogisticRegression, we can reduce overfitting and help our model generalize better to unseen data, even with the small amount of training data that we have.
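A minimal sketch of the idea: on a small, wide dataset, simple and strongly regularized models often cross-validate better (the dataset and hyperparameters here are illustrative, not the article's):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Small sample, many features: an easy setting in which to overfit.
X, y = make_classification(n_samples=100, n_features=50, random_state=0)

# C < 1 strengthens L2 regularization; more neighbors smooth KNN's
# decision boundary. Both choices push toward simpler hypotheses.
for model in (LogisticRegression(C=0.1, max_iter=1000),
              KNeighborsClassifier(n_neighbors=15)):
    score = cross_val_score(model, X, y, cv=5).mean()
    print(type(model).__name__, round(score, 3))
```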

A Guide: Turning OpenStreetMap Location Data into ML Features

How to pull shops, restaurants, public transport modes and other local amenities into your ML models.
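One common pattern for such features is counting amenities within a radius of each sample point. Here is a self-contained haversine-based sketch with made-up coordinates; the article itself pulls the amenity table from OpenStreetMap:

```python
import numpy as np
import pandas as pd

# Hypothetical amenity table as you might export it from OpenStreetMap.
pois = pd.DataFrame({"lat": [52.520, 52.521, 52.530],
                     "lon": [13.405, 13.406, 13.420],
                     "amenity": ["cafe", "restaurant", "cafe"]})

def count_nearby(lat, lon, kind, radius_m=500):
    """Count amenities of a given kind within radius_m of a point."""
    r = 6_371_000  # Earth radius in metres
    p1, p2 = np.radians(lat), np.radians(pois["lat"])
    dphi = p2 - p1
    dlmb = np.radians(pois["lon"] - lon)
    # Haversine great-circle distance to every amenity.
    a = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlmb / 2) ** 2
    dist = 2 * r * np.arcsin(np.sqrt(a))
    return int(((dist <= radius_m) & (pois["amenity"] == kind)).sum())

print(count_nearby(52.5205, 13.4055, "cafe"))  # an ML feature for one sample
```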

Ensemble Feature Selection in Machine Learning by OptimalFlow

Use OptimalFlow’s autoFS module to implement ensemble feature selection, which greatly simplifies the process. Why use OptimalFlow? You can read its introduction in another story: “An Omni-ensemble Automated Machine Learning — OptimalFlow”.
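OptimalFlow's own API is covered in the article; as a rough illustration of what "ensemble feature selection" means, here is a generic majority-vote sketch built from plain scikit-learn selectors (this is not OptimalFlow's autoFS API):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)
k = 10

# Three different selectors each vote for their top-k features.
votes = np.zeros(X.shape[1])
votes += SelectKBest(f_classif, k=k).fit(X, y).get_support()
votes += RFE(LogisticRegression(max_iter=2000),
             n_features_to_select=k).fit(X, y).get_support()
imp = RandomForestClassifier(random_state=0).fit(X, y).feature_importances_
votes += imp >= np.sort(imp)[-k]

# Keep features chosen by a majority of the selectors.
keep = votes >= 2
print(int(keep.sum()), "features selected by majority vote")
```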

Handling Categorical Features using Encoding Techniques in Python

In this post, we are going to discuss categorical features in machine learning and how to handle them using two of the most effective methods.
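The teaser doesn't name the two methods, but as an illustration, one-hot encoding (for nominal features) and ordinal encoding (for ordered ones) are the usual starting points:

```python
import pandas as pd

df = pd.DataFrame({"size":  ["S", "M", "L", "M"],
                   "color": ["red", "blue", "red", "green"]})

# One-hot encoding: one binary column per category; no order implied.
onehot = pd.get_dummies(df["color"], prefix="color")

# Ordinal encoding: map categories to integers that respect their order.
df["size_ord"] = df["size"].map({"S": 0, "M": 1, "L": 2})

print(pd.concat([df, onehot], axis=1))
```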

Explained: Kaggle Housing Prices’ Feature Engineering and Ridge Regression

This blog is based on the notebook I used to submit predictions for the Kaggle In-Class Housing Prices Competition. My submission ranked 293rd on the leaderboard, although the focus of this blog is not how to get a high score but to help beginners develop intuition for machine learning regression techniques and feature engineering.
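For reference, the core of a Ridge workflow is short; here is a minimal sketch with synthetic stand-in data rather than the Kaggle housing dataset:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

# Stand-in data; the article works with the Kaggle housing features.
X, y = make_regression(n_samples=300, n_features=20, noise=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# RidgeCV picks the L2 penalty strength alpha by cross-validation;
# the penalty shrinks coefficients and tames correlated features.
model = RidgeCV(alphas=[0.1, 1.0, 10.0]).fit(X_tr, y_tr)
print(model.alpha_, round(model.score(X_te, y_te), 3))
```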

Comprehensive Guide to Machine Learning (Part 2 of 3)

Welcome to the 2nd part of the “Comprehensive Guide to Machine Learning” series. In the first part of this series, we explored a number of foundational machine learning concepts.

Developing Trust in Machine Learning Models Predictions

Interpreting and explaining the predictions made by machine learning models using LIME. What if I told you to invest $100,000 in a particular stock today because my machine learning model is predicting a high return?
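A minimal sketch of what a LIME tabular explanation looks like, using a stand-in dataset and model rather than the article's stock example:

```python
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
clf = RandomForestClassifier(random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    mode="classification")

# Explain a single prediction: which features pushed it up or down?
exp = explainer.explain_instance(data.data[0], clf.predict_proba,
                                 num_features=5)
print(exp.as_list())  # (feature condition, weight) pairs
```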