The longest part of any data analysis or data science task is preparing and configuring your data properly. A model only performs as well as the data it is fed, and the data may have to undergo many transformations before it is ready for model training. Over the years I have compiled a Notion page that highlights many of the common tasks data scientists need to perform for data preparation. I’ve listed a few of the examples below, but the full set can be found at the following link. As I continue my learning journey, I will keep expanding that page with other common functions that I’ve used repeatedly during EDA or feature engineering.
Note: All these examples are in Python and mainly use the pandas, NumPy, and scikit-learn libraries. For visualization, Matplotlib or Seaborn was used.
#data-analysis #data-science #machine-learning #pandas #notion #data preparation cheatsheet