This tutorial explains how to preprocess data using the pandas library. Preprocessing is a preliminary pass over the data that transforms it into a standard, normalized format. Preprocessing involves several aspects, including handling missing values and formatting data.
In this tutorial we deal only with data formatting. My previous tutorial covered missing values.
Data formatting is the process of transforming data into a common format, which makes values directly comparable. A typical example of unformatted data is a column in which the same entity is referred to by different values, such as New York and NY.
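To make the New York / NY case concrete, here is a minimal sketch of one way to bring such a column into a common format. The DataFrame, the column names `city` and `city_clean`, and the `city_aliases` mapping are all made up for illustration; they are assumptions, not part of the tutorial's dataset.

```python
import pandas as pd

# Hypothetical sample data: the same city appears under several spellings.
df = pd.DataFrame({"city": ["New York", "NY", "Boston", "ny ", "Boston"]})

# Hypothetical alias table: lowercase variant -> canonical name.
city_aliases = {"ny": "New York", "new york": "New York"}

# Remove surrounding whitespace first, so "ny " and "ny" are treated alike.
cleaned = df["city"].str.strip()

# Look up the lowercase form in the alias table; values with no alias
# come back as NaN, so fall back to the stripped original for those.
df["city_clean"] = cleaned.str.lower().map(city_aliases).fillna(cleaned)

print(df)
```

After this step both spellings of New York collapse into one value, so grouping or counting by `city_clean` treats them as the same entity. A mapping table like this works for a handful of known variants; a larger or messier column would likely need a different approach.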
You can download the source code of this tutorial as a Jupyter notebook from my GitHub Data Science Repository.