Feature Selection for the Kaggle Caravan Insurance Challenge in R

Picking up from the previous post, this post explains feature selection for the Kaggle Caravan Insurance Challenge, the step before we feed the features into machine learning algorithms (probably in the next post). The challenge aims to identify the customers who are most likely to purchase a caravan policy, based on 85 historical socio-demographic and product-ownership attributes.

Nominal data attributes

The 85 historical attributes are a mix of nominal and ordinal attributes. Nominal attributes take the form of names or labels with no clear order between them. Some examples are male vs. female, blood type O vs. A vs. B, or even zip code. Ordinal attributes have a clear order between their levels, such as cholesterol level (low vs. medium vs. high) or happiness on a scale of 1 to 10. When nominal values are stored as numeric codes (as zip codes are), an algorithm would otherwise treat them as ordered quantities, so we have to convert the nominal attributes to factors before we can feed our training data into any ML algorithm.
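As a rough illustration of that conversion, here is a minimal sketch in base R. The data frame `train` and its column names (`customer_type`, `region_code`, `income_level`, `purchased`) are made up for this example and are not the actual columns of the caravan dataset; only the pattern of listing the nominal columns and wrapping them in `factor()` is the point.

```r
# Toy data standing in for the training set; all column names are hypothetical.
train <- data.frame(
  customer_type = c(3, 1, 2, 1),  # nominal: numeric codes with no inherent order
  region_code   = c(10, 20, 10, 30),  # nominal: numeric codes with no inherent order
  income_level  = c(1, 3, 2, 3),  # ordinal: higher code means higher level
  purchased     = c(0, 1, 0, 0)   # target variable
)

# Columns we have judged to be nominal (an assumption made for this sketch)
nominal_cols <- c("customer_type", "region_code")

# Convert each nominal column to an unordered factor;
# the ordinal column is left numeric so its order is preserved
train[nominal_cols] <- lapply(train[nominal_cols], factor)

# Inspect the resulting column types
str(train)
```

After this step the nominal columns are factors, so R's modelling functions would treat each code as a separate category instead of a number on a scale.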
