Data Preparation Cheatsheet

Common feature engineering/EDA tasks, compiled

The most time-consuming part of any data analysis or data science task is preparing and configuring your data properly. A model only performs as well as the data it is fed, and the data may have to undergo many transformations before it is ready for model training. Over the years I have compiled a Notion page that highlights many of the common tasks Data Scientists need to perform for data preparation. I’ve listed a few of the examples below, and the full set can be found at the following link. As I continue my learning journey, I will keep expanding that page with other common functions that I’ve used repeatedly during EDA or feature engineering.

Note: All of these examples are in Python and mainly use the pandas, NumPy, and scikit-learn libraries. Matplotlib or Seaborn is used for visualization.

Checking for Missing Values in a DataFrame
Dropping a Column
Applying a Function to a Column
Plot Value Counts of a Column
Sort DataFrame by Column Values
Dropping Rows based off a Column Value
Ordinal Encoding
Encoding DataFrame with all Categorical Variables
Additional Resources
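
As a rough preview of the tasks listed above, here is a minimal sketch that runs each one on a small toy DataFrame. The data, column names, and category order are hypothetical and chosen only for illustration, and one-hot encoding via pd.get_dummies is just one possible way to encode all categorical columns; the approach used in the full examples may differ.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "city": ["Austin", "Boston", "Austin", "Denver"],
    "size": ["small", "large", "medium", "small"],
    "price": [10.0, np.nan, 30.0, 20.0],
    "notes": ["a", "b", "c", "d"],
})

# Checking for missing values: number of NaNs in each column.
missing_per_column = df.isnull().sum()

# Dropping a column.
df = df.drop(columns=["notes"])

# Applying a function to a column (NaN inputs stay NaN).
df["price_with_tax"] = df["price"].apply(lambda p: p * 1.1)

# Value counts of a column. Calling .plot(kind="bar") on the result
# would draw the counts as a bar chart with Matplotlib.
city_counts = df["city"].value_counts()

# Sorting the DataFrame by column values (NaN rows go last by default).
df_sorted = df.sort_values(by="price", ascending=False)

# Dropping rows based on a column value: keep everything except Denver.
df_no_denver = df[df["city"] != "Denver"]

# Ordinal encoding with an explicit (assumed) category order.
encoder = OrdinalEncoder(categories=[["small", "medium", "large"]])
df["size_encoded"] = encoder.fit_transform(df[["size"]]).ravel()

# Encoding the categorical columns, here as one-hot indicator columns.
df_encoded = pd.get_dummies(df, columns=["city", "size"])
```

Passing the category order explicitly to OrdinalEncoder keeps "small" < "medium" < "large" instead of relying on alphabetical order, which would rank "large" first.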

#data-analysis #data-science #machine-learning #pandas #notion #data preparation cheatsheet


towardsdatascience.com
