In this tutorial, we’re going to create a model to predict whether a patient has a positive breast cancer diagnosis based on several tumor features.

Problem Statement

The breast cancer database is a publicly available dataset from the UCI Machine Learning Repository. It gives information on tumor features such as tumor size, density, and texture.

**Goal:** To create a classification model that predicts whether the cancer diagnosis is benign or malignant based on several features.

Data used: Breast Cancer Prediction Dataset (Kaggle)


Step 1: Exploring the Dataset

First, let’s understand our dataset:

#import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
#import models from scikit learn module:
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.svm import SVC
#import Data
df_cancer = pd.read_csv('Breast_cancer_data.csv')
df_cancer.head()
#get some information about our Data-Set
df_cancer.info()
df_cancer.describe()
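#visualizing data
#pairwise relationships between all features, colored by diagnosis
sns.pairplot(df_cancer, hue = 'diagnosis')
#correlation heatmap for a subset of the features plus the target
plt.figure(figsize=(7,7))
corr_columns = ['mean_radius', 'mean_texture', 'mean_perimeter', 'mean_area', 'mean_smoothness', 'diagnosis']
sns.heatmap(df_cancer[corr_columns].corr(), annot=True)
#start a new figure so the scatter plot is not drawn on the heatmap axes when run in one cell
plt.figure()
sns.scatterplot(x = 'mean_texture', y = 'mean_perimeter', hue = 'diagnosis', data = df_cancer)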
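Two quick checks round out the exploration before any modeling: how balanced the two diagnosis classes are, and whether any column has missing values. The sketch below is only illustrative. It uses a small made-up frame, `df_demo` (a hypothetical stand-in for `df_cancer`), and assumes `diagnosis` is encoded as 0/1, so the numbers it prints say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for df_cancer: a few made-up rows that reuse
# column names from the tutorial. The values are illustrative only.
df_demo = pd.DataFrame({
    "mean_radius": [17.99, 20.57, 13.54, 12.45, None, 9.50],
    "mean_texture": [10.38, 17.77, 14.36, 15.70, 18.20, 12.44],
    "diagnosis": [0, 0, 1, 1, 1, 1],  # 0/1 encoding is an assumption
})

# Count rows per class to see how balanced the labels are.
class_counts = df_demo["diagnosis"].value_counts()
print(class_counts)

# Share of each class as a fraction of all rows.
class_share = df_demo["diagnosis"].value_counts(normalize=True)
print(class_share)

# Count missing values per column; scikit-learn's SVC rejects NaN inputs,
# so gaps need to be handled before training.
missing_per_column = df_demo.isnull().sum()
print(missing_per_column)
```

The same three calls can be run on `df_cancer` directly. A heavily skewed class share would suggest looking beyond plain accuracy when the model is evaluated later.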


Case Study: Breast Cancer Classification Using a Support Vector Machine