Data Science Capstone Project Using Python

Problem Statement

  • NIDDK (National Institute of Diabetes and Digestive and Kidney Diseases) research creates knowledge
    about, and treatments for, diseases that are among the most chronic, costly, and consequential.

  • The dataset used in this project is originally from NIDDK. The objective is to predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset.

  • Build a model to accurately predict whether the patients in the dataset have diabetes or not.

Approach:

  • Explore the data to check for missing values and erroneous entries, comment on redundant features,
    and add new features if needed.

  • Some of the column names contain typos, so correct them before going further; that way you don't
    have to use the misspelled names every time you need a variable.

  • For convenience, convert the AppointmentRegistration and Appointment columns into datetime64 format
    and the AwaitingTime column into absolute values.

  • Create a new feature called HourOfTheDay, which will indicate the hour of the day at which the
    appointment was booked.

  • Identify and remove outliers from Age. Explain using an appropriate plot.

  • Analyse the probability of showing up with respect to different features. Create scatter plots
    with trend lines to analyse the relationship between the probability of showing up and Age,
    HourOfTheDay, and AwaitingTime. Describe your findings.

  • Create a bar graph depicting the probability of showing up for Diabetes, Alcoholism, Hypertension,
    Tuberculosis (TB), Smokes, and Scholarship.

  • Create separate bar graphs showing the probability of showing up by gender (male and female), by
    day of the week, and by SMS reminder. Describe your interpretation.

  • Predict the Show-Up/No-Show status based on the features that show the most variation in the
    probability of showing up: Age, Diabetes, Alcoholism, Hypertension, Smokes, Scholarship, and
    Tuberculosis.
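The renaming, datetime conversion, absolute-value and HourOfTheDay steps in the approach above can be sketched in pandas as follows. This is a minimal sketch under stated assumptions: the toy DataFrame, the two misspelled column names (`Alcoolism`, `HiperTension`) and the timestamp strings are illustrative inventions, not values from the project dataset, so check `df.columns` for the real typos.

```python
import pandas as pd

# Toy data (assumed, not from the real dataset). Two column names are
# deliberately misspelled to illustrate the renaming step.
df = pd.DataFrame({
    "AppointmentRegistration": ["2015-01-05 08:30:00", "2015-01-06 14:10:00", "2015-01-07 17:45:00"],
    "Appointment": ["2015-01-12", "2015-01-06", "2015-01-20"],
    "AwaitingTime": [-7, 0, -13],
    "Alcoolism": [0, 1, 0],
    "HiperTension": [1, 0, 0],
})

# Hypothetical typo -> corrected-name mapping; adjust it to the real column names.
COLUMN_FIXES = {"Alcoolism": "Alcoholism", "HiperTension": "Hypertension"}
df = df.rename(columns=COLUMN_FIXES)

# Parse both date columns into datetime64 values.
for col in ["AppointmentRegistration", "Appointment"]:
    df[col] = pd.to_datetime(df[col])

# In this toy frame the waiting times are negative day counts, so take absolute values.
df["AwaitingTime"] = df["AwaitingTime"].abs()

# New feature: the hour of the day at which the appointment was booked.
df["HourOfTheDay"] = df["AppointmentRegistration"].dt.hour

print(df[["AppointmentRegistration", "AwaitingTime", "HourOfTheDay"]])
```

Keeping the corrections in one mapping makes them easy to review and reuse if the raw file is reloaded.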
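One common way to identify the Age outliers mentioned above is Tukey's IQR rule, sketched below. The brief does not prescribe a method, so the rule, the multiplier `k=1.5`, and treating negative ages as erroneous entries are all assumptions. A box plot of Age (for example `df.boxplot(column="Age")`, which needs matplotlib) is a natural "appropriate plot" here, because by default it marks points beyond roughly the same 1.5 * IQR fences as fliers.

```python
import pandas as pd

def remove_age_outliers(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose Age lies within Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    The lower fence is clipped at 0 so impossible negative ages are dropped too
    (an assumption about what counts as an erroneous entry).
    """
    q1, q3 = df["Age"].quantile([0.25, 0.75])
    iqr = q3 - q1
    lower = max(q1 - k * iqr, 0)
    upper = q3 + k * iqr
    return df[df["Age"].between(lower, upper)]

# Toy ages (made up): one negative entry and one implausibly large value.
ages = pd.DataFrame({"Age": [-1, 5, 20, 30, 35, 40, 45, 50, 60, 200]})
cleaned = remove_age_outliers(ages)
print(cleaned["Age"].tolist())  # [5, 20, 30, 35, 40, 45, 50, 60]
```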
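For the bar graphs described above, the probability of showing up within each group can be estimated as the mean of a 0/1 show-up indicator in that group. In this sketch the target column name `Status`, its labels `"Show-Up"`/`"No-Show"`, and the toy rows are assumptions.

```python
import pandas as pd

# Toy data (made up). "Status" and its labels are assumed names.
df = pd.DataFrame({
    "Status":   ["Show-Up", "No-Show", "Show-Up", "Show-Up", "No-Show", "Show-Up"],
    "Diabetes": [1, 1, 0, 0, 0, 0],
    "Smokes":   [0, 1, 0, 0, 1, 0],
})

# 1 if the patient showed up, 0 otherwise.
df["ShowedUp"] = (df["Status"] == "Show-Up").astype(int)

def show_up_probability(data: pd.DataFrame, feature: str) -> pd.Series:
    """Empirical P(show up | feature value) for every value of `feature`."""
    return data.groupby(feature)["ShowedUp"].mean()

probs = {f: show_up_probability(df, f) for f in ["Diabetes", "Smokes"]}
print(probs["Diabetes"])
print(probs["Smokes"])
```

Calling `probs["Diabetes"].plot.bar()` (requires matplotlib) draws one of the requested bars; grouping by the gender, day-of-week, or SMS-reminder column works the same way.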
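For the scatter plots with trend lines, one option is to compute the show-up rate at each distinct value of a numeric feature (Age, HourOfTheDay, AwaitingTime) and fit a least-squares line with `numpy.polyfit`. The degree-1 fit, the `ShowedUp` indicator column name, and the toy rows are assumptions; the returned points and coefficients could then be passed to matplotlib's `scatter` and `plot`.

```python
import numpy as np
import pandas as pd

def show_up_trend(data: pd.DataFrame, feature: str, deg: int = 1):
    """Return the per-value show-up rate and least-squares polynomial coefficients.

    Coefficients are ordered from the highest power down, as numpy.polyfit returns them.
    """
    rate = data.groupby(feature)["ShowedUp"].mean()
    x = rate.index.to_numpy(dtype=float)
    y = rate.to_numpy(dtype=float)
    coeffs = np.polyfit(x, y, deg)
    return x, y, coeffs

# Toy data (made up): 1 = showed up, 0 = no-show.
df = pd.DataFrame({
    "Age":      [10, 10, 20, 20, 30, 30, 40, 40],
    "ShowedUp": [0, 1, 1, 0, 1, 1, 1, 1],
})
x, y, (slope, intercept) = show_up_trend(df, "Age")
print(f"slope={slope:.3f}, intercept={intercept:.3f}")  # slope=0.020, intercept=0.250
```

A positive slope would suggest the show-up rate rises with the feature; with sparse groups (few patients at some values) the per-value rates are noisy, so binning the feature first may give a clearer trend.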
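The final prediction step can be sketched with scikit-learn's `LogisticRegression` on the feature list given above. The brief does not name a model, so logistic regression, the stratified 75/25 split, `class_weight="balanced"` (useful if show-ups heavily outnumber no-shows), and the random stand-in data are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

FEATURES = ["Age", "Diabetes", "Alcoholism", "Hypertension",
            "Smokes", "Scholarship", "Tuberculosis"]

def fit_show_up_model(data: pd.DataFrame, seed: int = 0):
    """Fit a logistic-regression classifier on FEATURES; return it with held-out accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        data[FEATURES], data["ShowedUp"],
        test_size=0.25, stratify=data["ShowedUp"], random_state=seed,
    )
    # class_weight="balanced" reweights classes inversely to their frequency.
    model = LogisticRegression(max_iter=1000, class_weight="balanced")
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)

# Synthetic stand-in data (random, not the real dataset).
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "Age": rng.integers(0, 90, n),
    **{f: rng.integers(0, 2, n) for f in FEATURES[1:]},
})
demo["ShowedUp"] = rng.integers(0, 2, n)

model, accuracy = fit_show_up_model(demo)
print(f"held-out accuracy: {accuracy:.2f}")
```

Because the stand-in labels are random, the accuracy here should hover around chance; on the real data, compare the score against the accuracy of always predicting the majority class.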
