Common feature engineering/EDA tasks, compiled

Often the longest part of any data analysis or data science task is preparing and configuring your data properly. A model only performs as well as the data it is fed, and the data may have to undergo many transformations before it is ready for model training. Over the years I have compiled a Notion page that highlights many of the common tasks data scientists need to perform for data preparation. I’ve listed a few of the examples below, but the full set can be found in the following link. I will keep expanding that page with other common functions I use repeatedly during EDA or feature engineering as I continue my learning journey.

Note: All these examples are in Python and mainly use the pandas, NumPy, and scikit-learn libraries. Matplotlib or Seaborn was used for visualization.

Table of Contents

  1. Checking for Missing Values in a DataFrame
  2. Dropping a Column
  3. Applying a Function to a Column
  4. Plot Value Counts of a Column
  5. Sort DataFrame by Column Values
  6. Dropping Rows Based on a Column Value
  7. Ordinal Encoding
  8. Encoding DataFrame with all Categorical Variables
  9. Additional Resources
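
To give a flavor of the tasks listed above, here is a minimal sketch of items 1 through 7 using pandas and NumPy only. The DataFrame, its column names (name, age, size, notes), and the size ordering are hypothetical and made up for illustration; they are not taken from the Notion page, and the page itself may approach these tasks differently (for example with scikit-learn encoders).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, invented purely for illustration.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cara", "Dan"],
    "age": [34.0, np.nan, 29.0, 41.0],
    "size": ["small", "large", "medium", "small"],
    "notes": ["a", "b", "c", "d"],
})

# 1. Checking for missing values: count NaNs per column.
missing_per_column = df.isna().sum()

# 2. Dropping a column.
df = df.drop(columns=["notes"])

# 3. Applying a function to a column.
df["name_upper"] = df["name"].apply(str.upper)

# 4. Value counts of a column. With Matplotlib installed,
#    size_counts.plot(kind="bar") would draw them as a bar chart.
size_counts = df["size"].value_counts()

# 5. Sorting the DataFrame by column values (NaN rows go last by default).
df_sorted = df.sort_values(by="age", ascending=False)

# 6. Dropping rows based on a column value, via a boolean mask.
df_no_small = df[df["size"] != "small"]

# 7. Ordinal encoding with an explicit, assumed category order.
size_order = {"small": 0, "medium": 1, "large": 2}
df["size_encoded"] = df["size"].map(size_order)
```

The dictionary mapping in step 7 is one simple way to do ordinal encoding when the category order is known in advance; scikit-learn's OrdinalEncoder is an alternative when the encoding needs to live inside a modeling pipeline.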

#data-analysis #data-science #machine-learning #pandas #notion #data preparation cheatsheet
