Diabetes is a major public health problem in India. Between 1971 and 2000, the prevalence of diabetes rose tenfold, from 1.2% to 12.1%. An estimated 61.3 million people aged 20–79 in India were living with diabetes (2011 estimates), and this number is expected to rise to 101.2 million by 2030. A further 77.2 million people in India reportedly have prediabetes. In 2012, nearly 1 million people in India died of diabetes. One in four people living in Chennai’s urban slums has diabetes, roughly three times the national average of about 7 percent. One third of deaths from non-communicable diseases in India involve people under sixty years old. On average, Indians develop diabetes 10 years earlier than their Western counterparts. Lifestyle changes have led to decreased physical activity and increased intake of fat, sugar and calories, along with higher insulin and cortisol levels, which raise obesity and vulnerability to diabetes. In 2011, diabetes cost India around $38 billion annually.

Dataset:

Pima Indians Diabetes Database (Predict the onset of diabetes based on diagnostic measures)

Dataset Source:

UCI Machine Learning Repository: https://www.kaggle.com/uciml/pima-indians-diabetes-database

Context

This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. Its purpose is to predict, on the basis of the diagnostic measurements it contains, whether or not a patient has diabetes. The instances were selected from a larger database under several constraints: in particular, all patients are women of Pima Indian heritage who are at least 21 years old.

Content

The dataset consists of several medical predictor variables and one target variable, Outcome. The predictor variables include the patient’s number of pregnancies, BMI, insulin level, age, and so on.
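As a minimal sketch of that layout, the snippet below separates the predictor columns from the Outcome target. The column names follow the Kaggle CSV header; the two rows are invented placeholder values for illustration only, not records from the dataset.

```python
import pandas as pd

# Column names as they appear in the Kaggle CSV header.
columns = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
           "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]

# Placeholder rows (invented for illustration, not real patient records).
toy_df = pd.DataFrame(
    [[2, 120, 70, 25, 90, 28.5, 0.40, 30, 0],
     [5, 160, 80, 35, 150, 35.1, 0.65, 45, 1]],
    columns=columns,
)

# Predictors: every column except the target.
X = toy_df.drop(columns="Outcome")
# Target: 1 = diabetic, 0 = not diabetic.
y = toy_df["Outcome"]

print(X.shape, y.shape)
```

The same two lines for `X` and `y` should apply unchanged to the full frame once it is loaded.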

Acknowledgements

Smith, J.W., Everhart, J.E., Dickson, W.C., Knowler, W.C., & Johannes, R.S. (1988). Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Symposium on Computer Applications and Medical Care (pp. 261–265). IEEE Computer Society Press.

Inspiration

Can you build a machine learning model that accurately predicts whether or not the patients in the dataset have diabetes?
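One simple candidate for this task is k-nearest neighbours (KNN), which this post's tags point to. The sketch below is not the notebook's code: it is a from-scratch illustration on invented two-feature points, and the names `knn_predict`, `train_points` and `train_labels` are hypothetical.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Invented (glucose, BMI) pairs for illustration; 1 = diabetic, 0 = not.
train_points = [(90, 22.0), (100, 24.5), (95, 26.0),
                (160, 34.0), (170, 36.5), (150, 33.0)]
train_labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_points, train_labels, query=(155, 35.0)))
print(knn_predict(train_points, train_labels, query=(98, 23.0)))
```

Because KNN relies on raw distances, features on larger numeric scales (glucose here) dominate the vote, so in practice the predictors are usually scaled first.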

1 Imports and Loading Dataset

Imports

import numpy as np
import pandas as pd

# Visualization imports
import matplotlib.pyplot as plt
import seaborn as sns
# plotly setup for Colab: inject the plotly.js loader into the cell output
def configure_plotly_browser_state():
  from IPython.display import display, HTML
  display(HTML('''
        <script src="/static/components/requirejs/require.js"></script>
        <script>
          requirejs.config({
            paths: {
              base: '/static/base',
              plotly: 'https://cdn.plot.ly/plotly-latest.min.js?noext',
            },
          });
        </script>
        '''))

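# plotly and cufflinks setup for offline, in-notebook plotting
import plotly.express as px
from plotly import __version__
import cufflinks as cf
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)  # load plotly.js from the CDN
cf.go_offline()  # let cufflinks render plots offline

# Re-run the plotly.js loader before every cell so plots display in Colab
import IPython
IPython.get_ipython().events.register('pre_run_cell', configure_plotly_browser_state)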

Dataset

# Loading Dataset
df = pd.read_csv('/content/drive/My Drive/dataset/knn/datasets_228_482_diabetes.csv')
df.head()
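Before going further, it can help to check that the loaded frame matches the dataset description. The sketch below runs on a small in-memory CSV with invented rows, so it works without the Drive path; `check_frame` is a hypothetical helper, and the checks assume the Kaggle column names.

```python
import io
import pandas as pd

def check_frame(frame):
    """Basic checks derived from the dataset description."""
    # The target column must be present and binary.
    assert "Outcome" in frame.columns
    assert set(frame["Outcome"].unique()) <= {0, 1}
    # All patients are described as at least 21 years old.
    assert frame["Age"].min() >= 21
    # Return row/column counts and the class balance for a quick look.
    return frame.shape, frame["Outcome"].value_counts().to_dict()

# Invented rows standing in for the real CSV (illustration only).
toy_csv = io.StringIO(
    "Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,"
    "DiabetesPedigreeFunction,Age,Outcome\n"
    "2,120,70,25,90,28.5,0.40,30,0\n"
    "5,160,80,35,150,35.1,0.65,45,1\n"
    "1,95,66,20,80,24.0,0.30,22,0\n"
)
toy_df = pd.read_csv(toy_csv)
shape, balance = check_frame(toy_df)
print(shape, balance)
```

With the real data, `check_frame(df)` could be called right after `pd.read_csv`.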

