This tutorial explains how to preprocess data using the Pandas library. Preprocessing is a preliminary pass over the data, performed before analysis, that transforms it into a standard and normalized format. Preprocessing involves the following aspects:

  • missing values
  • data formatting
  • data normalization
  • data standardization
  • data binning

In this tutorial we deal only with data formatting. In my previous tutorial I dealt with missing values.

Data formatting is the process of transforming data into a common format, which makes values easier to compare. An example of unformatted data is the following: the same entity is referred to in the same column with different values, such as New York and NY.
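As a minimal sketch of the New York / NY case, the snippet below maps the different spellings to one label. The DataFrame, the `city` column name, and the `canonical` mapping are all hypothetical and serve only as an illustration; real data will usually need a longer mapping.

```python
import pandas as pd

# Hypothetical data: the same city appears under several spellings.
df = pd.DataFrame({"city": ["New York", "NY", "new york ", "Boston"]})

# Trim surrounding whitespace first so that "new york " matches "new york".
cleaned = df["city"].str.strip()

# Hypothetical mapping from each known variant to one canonical label.
canonical = {"NY": "New York", "new york": "New York"}

# Series.replace with a dict swaps exact matches and leaves other values as-is.
df["city"] = cleaned.replace(canonical)

print(df["city"].unique())
```

After this step every variant of the city shares the same value, so grouping or counting by `city` treats them as one entity.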

You can download the source code of this tutorial as a Jupyter notebook from my GitHub Data Science Repository.


Data Processing with Python Pandas — Part 2 Data Formatting