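Machine learning algorithms can take a long time to train on large, high-dimensional datasets. One way to reduce this cost is dimensionality reduction, and Principal Component Analysis (PCA) is one of the most widely used techniques for it. When the input dimension is high, PCA can speed up training by letting the model work with fewer features. It is a projection method: it maps the data onto a smaller number of new, orthogonal directions (the principal components), chosen to retain as much of the original data's variance as possible.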
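The projection idea can be made concrete with a short NumPy sketch of PCA on a matrix. It centers each column, computes the covariance matrix, takes its eigenvectors, and projects onto those with the largest eigenvalues. The function name `pca` and the toy data are illustrative and not part of the original article.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto the top n_components principal components.

    Returns (projected data, components, explained variance ratio).
    """
    X = np.asarray(X, dtype=float)
    # 1. Center each column (feature) at zero mean.
    X_centered = X - X.mean(axis=0)
    # 2. Sample covariance matrix of the features (columns).
    cov = np.cov(X_centered, rowvar=False)
    # 3. Eigendecomposition; eigh handles symmetric matrices and
    #    returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Reorder from largest to smallest variance.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]
    # 5. Project the centered data onto the leading components.
    projected = X_centered @ components
    explained_ratio = eigvals[:n_components] / eigvals.sum()
    return projected, components, explained_ratio

# Toy 2-D data stretched along the line y = x: most variance lies in one direction.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t + 0.1 * rng.normal(size=200),
                     t + 0.1 * rng.normal(size=200)])

Z, W, ratio = pca(X, n_components=1)
print(Z.shape)  # (200, 1)
print(ratio)    # close to 1: one component captures almost all the variance
```

Here two correlated features collapse onto a single component with almost no loss of variance, which is exactly the kind of reduction that makes PCA useful on high-dimensional data.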

In this article, we cover the basics of Principal Component Analysis (PCA) on matrices, with an implementation in Python. We then apply PCA as a preprocessing step before one of the classification techniques.
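One way this workflow might look is sketched below with scikit-learn, chaining standardization, PCA and a classifier. It makes several assumptions that the article does not state: it uses scikit-learn's bundled copy of the breast cancer data instead of the CSV, keeps 10 components, and uses logistic regression as the classifier.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# scikit-learn ships a copy of the Wisconsin breast cancer data (569 samples, 30 features).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Standardize, reduce 30 features to 10 principal components, then classify.
# n_components=10 and LogisticRegression are illustrative choices, not the article's.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=10),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy with 10 components: {accuracy:.3f}")
```

Putting the scaler and PCA inside the pipeline means both are fitted on the training split only, so no information from the test set leaks into the projection.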

Dataset

The dataset can be downloaded from the following link. It contains the details of breast cancer patients and has 32 features and 569 rows.
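scikit-learn also bundles what appears to be the same data (the Wisconsin Diagnostic Breast Cancer set: 569 samples with 30 numeric features), which is handy for experimenting without the download. The sketch below builds a comparable table, assuming that correspondence holds. Note that this copy has no `id` column, and that scikit-learn's label encoding (0 = malignant, 1 = benign) is the reverse of the one used in this article, so it is flipped here.

```python
from sklearn.datasets import load_breast_cancer

bunch = load_breast_cancer(as_frame=True)
frame = bunch.frame  # 30 feature columns plus a "target" column

# Flip scikit-learn's encoding so that malignant -> 1 and benign -> 0,
# matching the M -> 1, B -> 0 mapping used in this article.
frame["diagnosis"] = 1 - frame.pop("target")
print(frame.shape)  # (569, 31)
```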

Let’s get started. Import all the libraries required for this project.

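import matplotlib.pyplot as plt  # plotting
import pandas as pd              # loading and handling tabular data
import numpy as np               # numerical arrays and linear algebra
import seaborn as sns            # statistical plots built on matplotlib
# Jupyter magic: render plots inside the notebook (omit in a plain .py script)
%matplotlib inline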

Loading the dataset

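# Load the CSV (adjust the path to wherever you saved the file)
dataset = pd.read_csv('cancerdataset.csv')
# Encode the target: malignant (M) -> 1, benign (B) -> 0
dataset["diagnosis"] = dataset["diagnosis"].map({'M': 1, 'B': 0})
# Keep every column except the last one (presumably an empty trailing column in the CSV)
data = dataset.iloc[:, 0:-1]
data.head()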
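PCA can then be applied to the feature columns. The following is a minimal sketch, assuming the CSV names its identifier and label columns `id` and `diagnosis` (adjust `exclude` if your file differs). It standardizes the features first, since they are measured on very different scales, and uses the singular value decomposition, which yields the same principal directions as the covariance eigendecomposition. The helper `pca_from_frame` is hypothetical, and the demo runs on a small synthetic DataFrame so it works without the CSV.

```python
import numpy as np
import pandas as pd

def pca_from_frame(df, n_components, exclude=("id", "diagnosis")):
    """Standardize the feature columns of df and project them onto the leading
    principal components. The names in `exclude` are assumed column names."""
    features = df.drop(columns=[c for c in exclude if c in df.columns])
    X = features.to_numpy(dtype=float)
    # Standardize each column (assumes no constant columns, which would divide by 0).
    X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal directions; S**2 / (n - 1) are the variances along them.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    variances = S**2 / (X.shape[0] - 1)
    scores = X @ Vt[:n_components].T
    columns = [f"PC{i + 1}" for i in range(n_components)]
    pcs = pd.DataFrame(scores, index=df.index, columns=columns)
    return pcs, variances[:n_components] / variances.sum()

# Synthetic stand-in for the real data: "radius" and "area" are strongly
# correlated, "smoothness" is independent noise.
rng = np.random.default_rng(1)
base = rng.normal(size=100)
demo = pd.DataFrame({
    "id": np.arange(100),
    "diagnosis": rng.integers(0, 2, size=100),
    "radius": 10 + base + 0.1 * rng.normal(size=100),
    "area": 300 + 60 * base + 5 * rng.normal(size=100),
    "smoothness": 0.1 + 0.01 * rng.normal(size=100),
})

pcs, ratio = pca_from_frame(demo, n_components=2)
print(pcs.head())
print(ratio)  # share of total variance captured by PC1 and PC2
```

With the loaded data, the call would be `pcs, ratio = pca_from_frame(data, n_components=2)`; the two columns of `pcs` can then be plotted and colored by the `diagnosis` labels.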
