Data preparation is the process of cleaning and transforming raw data before it is processed and analyzed. It often involves reformatting data, correcting errors, and combining data sets to enrich the data.

Data preparation is often a lengthy undertaking for data professionals and business users, but it is an essential prerequisite: it puts data in context so that it can be turned into insights, and it helps eliminate bias that results from poor data quality.

The data preparation process usually includes standardizing data formats, enriching source data, and/or removing outliers.
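The three steps named above can be sketched in a few lines of Python. This is only an illustration using the standard library. The records, the field names (`order_date`, `store_id`, `amount`), the store-to-region lookup, the two accepted date formats, and the 1.5 x IQR outlier rule are all assumptions made for the example, not part of any particular tool or method.

```python
from datetime import datetime
from statistics import quantiles

# Hypothetical raw records: dates arrive in two different formats,
# and one amount is far outside the usual range.
RAW_ORDERS = [
    {"order_date": "2023-01-05", "store_id": "S1", "amount": 20.0},
    {"order_date": "05/01/2023", "store_id": "S2", "amount": 22.0},
    {"order_date": "2023-01-06", "store_id": "S1", "amount": 21.0},
    {"order_date": "06/01/2023", "store_id": "S2", "amount": 19.0},
    {"order_date": "2023-01-07", "store_id": "S1", "amount": 23.0},
    {"order_date": "2023-01-08", "store_id": "S2", "amount": 24.0},
    {"order_date": "2023-01-09", "store_id": "S1", "amount": 18.0},
    {"order_date": "2023-01-10", "store_id": "S3", "amount": 500.0},
]

# Hypothetical lookup table used to enrich each record.
STORE_REGIONS = {"S1": "North", "S2": "South"}

# Assumed input formats: ISO dates and day/month/year.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def standardize_date(text):
    """Return the date as YYYY-MM-DD, trying each assumed format in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def enrich(record, regions):
    """Add a 'region' field looked up from the store id."""
    return {**record, "region": regions.get(record["store_id"], "unknown")}


def remove_outliers(records, key, k=1.5):
    """Keep records whose value lies within k * IQR of the quartiles."""
    values = [r[key] for r in records]
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in records if low <= r[key] <= high]


def prepare(records, regions):
    """Standardize formats, enrich, then remove outliers."""
    standardized = [
        {**r, "order_date": standardize_date(r["order_date"])} for r in records
    ]
    enriched = [enrich(r, regions) for r in standardized]
    return remove_outliers(enriched, "amount")


PREPARED_ORDERS = prepare(RAW_ORDERS, STORE_REGIONS)
```

The order of the steps is a design choice. Here outliers are removed last so that the decision is made on fully standardized and enriched records. Whether an extreme value should be dropped at all depends on the analysis.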

Data preparation can also be described as the process of collecting, cleaning, and consolidating data into one file or data table, primarily for use in analysis.
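As a rough sketch of collecting, cleaning, and consolidating, the Python example below reads two small CSV sources, cleans the rows, and writes them out as a single table. The sources (`CRM_CSV`, `SHOP_CSV`), the column names, and the cleaning rules (trim whitespace, lowercase the email, drop rows with no email, drop duplicates) are hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical sources, held in memory so the example is self-contained.
CRM_CSV = "email,name\n Ana@Example.com ,Ana\nbob@example.com,Bob\n"
SHOP_CSV = "email,name\nbob@example.com,Bob\n,Unknown\ncara@example.com,Cara\n"

FIELDS = ["email", "name"]


def collect(*sources):
    """Read every CSV source and return all rows as dictionaries."""
    rows = []
    for text in sources:
        rows.extend(csv.DictReader(io.StringIO(text)))
    return rows


def clean(rows):
    """Normalize fields, then drop rows with no email and duplicate emails."""
    seen = set()
    cleaned = []
    for row in rows:
        email = row["email"].strip().lower()
        name = row["name"].strip()
        if not email or email in seen:
            continue
        seen.add(email)
        cleaned.append({"email": email, "name": name})
    return cleaned


def consolidate(rows):
    """Write the cleaned rows as one CSV table and return it as text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


CUSTOMERS = clean(collect(CRM_CSV, SHOP_CSV))
CUSTOMER_TABLE = consolidate(CUSTOMERS)
```

In practice the sources would be files, databases, or APIs rather than strings, and the rule for deciding which rows count as duplicates would depend on the data.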

Why Prepare Data?

There are several reasons to prepare data:

· Preparing data also prepares the miner: when working with prepared data, the miner produces better models, faster.

· Good data is essential for producing efficient models of any type.

· Data should be formatted according to the requirements of the software tool being used.

· Data needs to be made suitable for the chosen method.

· Real-world data is dirty: it is often incomplete, noisy, or inconsistent.

Data Preparation in Data Science
