In this analysis I'll build a model that predicts whether a tumor is malignant or benign, based on data from a breast cancer study. Classification algorithms are used in the modelling process.

The dataset

The data for this analysis refer to 569 patients from a study on breast cancer. The actual data can be found at the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic). The variables were computed from a digitized image of a breast mass and describe characteristics of the cell nuclei present in the image. In particular, the variables are the following:

  1. **radius** (mean of distances from center to points on the perimeter)
  2. **texture** (standard deviation of gray-scale values)
  3. **perimeter**
  4. **area**
  5. **smoothness** (local variation in radius lengths)
  6. **compactness** (perimeter² / area - 1.0)
  7. **concavity** (severity of concave portions of the contour)
  8. **concave points** (number of concave portions of the contour)
  9. **symmetry**
  10. **fractal dimension** ("coastline approximation" - 1)
  11. **type** (the tumor is either malignant, M, or benign, B)
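
To make the setup concrete, here is a minimal sketch of how a classifier could be fitted on variables like these in R. It is not the model from this analysis: the data are simulated stand-ins (not the UCI values), the choice of `radius` and `concavity` as predictors, the 70/30 split and the 0.5 cut-off are all assumptions made only for illustration, and logistic regression is just one of several classification methods that could be used.

```r
# Simulated stand-in data: these values are NOT taken from the UCI dataset.
set.seed(42)
n <- 200
type <- factor(rep(c("B", "M"), each = n / 2), levels = c("B", "M"))

# Assumption for illustration only: malignant cases have a larger
# radius and concavity on average.
radius    <- rnorm(n, mean = ifelse(type == "M", 17, 12), sd = 3)
concavity <- rnorm(n, mean = ifelse(type == "M", 0.16, 0.05), sd = 0.05)
tumors <- data.frame(radius, concavity, type)

# Hold out 30% of the rows as a test set.
test_idx <- sample(n, size = 0.3 * n)
train <- tumors[-test_idx, ]
test  <- tumors[test_idx, ]

# Logistic regression: with a factor response, glm() models the
# probability of the second level, here "M" (malignant).
fit <- glm(type ~ radius + concavity, data = train, family = binomial)

# Predicted probability of "M" on the held-out rows, cut at 0.5.
prob_m <- predict(fit, newdata = test, type = "response")
pred <- factor(ifelse(prob_m > 0.5, "M", "B"), levels = c("B", "M"))

# Confusion matrix and overall accuracy on the test set.
confusion <- table(predicted = pred, actual = test$type)
accuracy <- mean(pred == test$type)
print(confusion)
print(accuracy)
```

With the real data, `tumors` would be replaced by the data frame read from the repository file, and the formula could include any of the ten variables listed above.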


Predict using classification methods in R