The next step after exploring the patterns in the data is feature engineering. Any operation performed on the features (columns) that could help us make a prediction from the data can be termed feature engineering. At a high level, this includes the following:

  1. adding new features
  2. eliminating some of the features which tell the same story
  3. combining several features together
  4. breaking down a feature into multiple features

Adding new features

Suppose you want to predict sales of ice cream, gloves, or umbrellas. What do these items have in common? Sales of all of them depend on “weather” and “location”. Ice cream sells more in summer or in hotter areas, gloves sell more in colder weather (winter) or colder regions, and we definitely need an umbrella when it rains. So if you have historical sales data for these items, adding the weather and the selling area to each record would help your model learn the patterns better.
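To make this concrete, here is a minimal sketch in Python with pandas that attaches weather features to sales records. The column names (`date`, `city`, `item`, `units_sold`, `avg_temp_c`, `rainfall_mm`) and all the values are made up for illustration; a real dataset would have its own keys and weather source.

```python
import pandas as pd

# Hypothetical sales records; every value here is invented for illustration.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-05", "2021-01-05", "2021-07-10", "2021-07-10"]),
    "city": ["Oslo", "Mumbai", "Oslo", "Mumbai"],
    "item": ["gloves", "umbrella", "ice-cream", "umbrella"],
    "units_sold": [120, 15, 80, 210],
})

# Hypothetical weather observations for the same dates and cities.
weather = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-05", "2021-01-05", "2021-07-10", "2021-07-10"]),
    "city": ["Oslo", "Mumbai", "Oslo", "Mumbai"],
    "avg_temp_c": [-4.0, 25.0, 18.0, 28.0],
    "rainfall_mm": [0.0, 0.0, 2.0, 35.0],
})

# A left join keeps every sales row and adds the weather columns as new features.
sales_with_weather = sales.merge(weather, on=["date", "city"], how="left")
print(sales_with_weather)
```

A left join is used here so that no sales row is lost when a weather observation is missing; those rows would simply get empty weather values that you can handle later.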

Eliminating some of the features which tell the same story

For the purpose of explanation, I made up a sample dataset containing data on different phone brands, something like the one below. Let us analyze this data and figure out why we should remove some of its columns:

[Image: sample dataset of phone brands. Image by Author]

  1. If we look carefully at this dataset, there is a column for the brand name, a column for the model name, and another column called Phone, which simply contains both the brand and the model name. We don’t need the Phone column because its data is already present in the other columns, and split data is better than aggregated data in this case.
  2. Another column adds no value to the dataset: Memory scale. All the memory values are in “GB”, so there is no need to keep an extra column that shows no variation, because it is not going to help our model learn different patterns.
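The two checks above can be sketched in pandas roughly as follows. Only the `Phone` and `Memory scale` column names come from the dataset described here; the `Brand`, `Model`, and `Memory` names and all the rows are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical stand-in for the sample phone dataset; the rows are invented.
phones = pd.DataFrame({
    "Brand": ["Apple", "Samsung", "OnePlus"],
    "Model": ["iPhone 11", "Galaxy S10", "8T"],
    "Phone": ["Apple iPhone 11", "Samsung Galaxy S10", "OnePlus 8T"],
    "Memory": [64, 128, 256],
    "Memory scale": ["GB", "GB", "GB"],
})

# "Phone" tells the same story if it can be rebuilt from Brand and Model.
rebuilt = phones["Brand"] + " " + phones["Model"]
redundant = ["Phone"] if (rebuilt == phones["Phone"]).all() else []

# Columns with a single distinct value show no variation at all.
constant = [col for col in phones.columns if phones[col].nunique() == 1]

# Drop both kinds of columns in one step.
reduced = phones.drop(columns=redundant + constant)
print(reduced.columns.tolist())
```

Checking for a single distinct value is only one simple way to spot a column without variation; on a real dataset you would also want to look at how missing values are counted before dropping anything.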

