# Easy to use Correlation Feature Selection with Kydavra

Almost everyone in data science or machine learning knows that one of the easiest ways to find relevant features for a predicted value y is to find the features that are most correlated with y. However, few (unless they are mathematicians) know that there are many types of correlation. In this article, I will briefly describe the 3 most popular types of correlation and show how you can easily apply them with Kydavra for feature selection.

Pearson correlation.

Pearson’s correlation coefficient is the covariance of two variables divided by the product of their standard deviations.

Figure 1. The formula to calculate the Pearson correlation between 2 features.
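For reference, the definition above can be written out as follows. This is the standard textbook form; the symbols (x and y for the two features, bars for their means, n for the number of samples) are notation chosen here, not taken from the original figure.

```latex
r_{xy}
  = \frac{\operatorname{cov}(x, y)}{\sigma_x \, \sigma_y}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
          \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```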

Its value lies between -1 and 1: negative values indicate an inverse relation and positive values a direct one. Often we just take the absolute value. If the absolute value is above 0.5, the two series can have (yes, can have) a relation. However, we also set an upper limit, such as 0.7 or 0.8, because if two series are too strongly correlated, one is possibly derived from the other (like age in months from age in years), or the feature can simply drive our model to overfitting.
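To make the definition and the two limits concrete, here is a minimal sketch in plain Python. The helper name `pearson` and the sample values are made up for illustration; they do not come from Kydavra or from the Heart Disease dataset.

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of the covariance and the variances cancel out,
    # so plain sums are enough here.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up sample values, for illustration only.
age_years = [29, 41, 53, 62, 70]
age_months = [a * 12 for a in age_years]  # a feature derived from another one
cholesterol = [210, 190, 250, 230, 260]

r_derived = pearson(age_years, age_months)  # essentially 1: too correlated to be useful
r_chol = pearson(age_years, cholesterol)    # between the lower and the upper limit

print(round(r_derived, 3), round(r_chol, 3))
```

The derived feature lands at the very top of the range, which is exactly the case the upper limit is meant to filter out.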

Using Kydavra PearsonCorrelationSelector.

First, install kydavra if you don’t already have it.

``pip install kydavra``

Next, we create a selector object and apply it to the Heart Disease UCI dataset.

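```python
from kydavra import PearsonCorrelationSelector

# df is assumed to be a pandas DataFrame that already holds the
# Heart Disease UCI dataset, with the label stored in the 'target' column.
selector = PearsonCorrelationSelector()
selected_cols = selector.select(df, 'target')
```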

Applying the selector with its default settings to the Heart Disease UCI dataset gives us an empty list, because no feature has a correlation with the target higher than 0.5. That’s why we highly recommend that you play around with the selector’s parameters:

• *min_corr* (float, between 0 and 1, default=0.5) the minimal value of the correlation coefficient for a feature to be selected as important.
• *max_corr* (float, between 0 and 1, default=0.8) the maximal value of the correlation coefficient for a feature to be selected as important.
• *erase_corr* (boolean, default=False) if set to True, the algorithm erases columns that are correlated with each other, keeping just one; if False, it keeps all columns.
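The effect of *min_corr* and *max_corr* can be sketched with plain pandas. This is not Kydavra's actual implementation: the function `select_by_correlation`, the inclusive bounds, and the toy data are assumptions made for illustration, and the *erase_corr* behaviour is left out.

```python
import pandas as pd

def select_by_correlation(df, target, min_corr=0.5, max_corr=0.8):
    """Return the columns whose absolute Pearson correlation with `target`
    lies between min_corr and max_corr (inclusive bounds are an assumption)."""
    corr = df.corr(method="pearson")[target].drop(target).abs()
    mask = (corr >= min_corr) & (corr <= max_corr)
    return list(corr[mask].index)

# Toy data, made up for illustration only.
df = pd.DataFrame({
    "weak":      [1, 2, 1, 2, 1, 2],   # |r| is about 0.33 with the target
    "moderate":  [1, 2, 2, 2, 3, 3],   # |r| is about 0.73 with the target
    "duplicate": [0, 0, 0, 2, 2, 2],   # derived from the target, |r| = 1
    "target":    [0, 0, 0, 1, 1, 1],
})

print(select_by_correlation(df, "target"))                # default thresholds
print(select_by_correlation(df, "target", min_corr=0.3))  # looser lower limit
```

Lowering *min_corr* lets weaker features through, while *max_corr* keeps out columns that are suspiciously close to the target.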
