Diabetes is a major health problem in India. Between 1971 and 2000, its incidence rose tenfold, from 1.2% to 12.1%. An estimated 61.3 million people aged 20–79 in India were living with diabetes (2011 estimates), and this number is expected to rise to 101.2 million by 2030. A further 77.2 million people in India reportedly have prediabetes. In 2012, nearly 1 million people in India died of diabetes. About 1 in 4 people living in Chennai’s urban slums has diabetes, roughly three times the national average of about 7 per cent. One third of deaths from non-communicable diseases in India involve people under sixty years old. On average, Indians develop diabetes 10 years earlier than their Western counterparts. Lifestyle changes have led to less physical activity, higher intake of fat, sugar and calories, and higher insulin and cortisol levels, which in turn increase obesity and vulnerability to diabetes. As of 2011, diabetes cost India around $38 billion annually.


Pima Indians Diabetes Database (Predict the onset of diabetes based on diagnostic measures)

Dataset Source:

UCI Machine Learning Repository


This dataset comes from the National Institute of Diabetes and Digestive and Kidney Diseases. Its purpose is to predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients are women of Pima Indian heritage who are at least 21 years old.


The dataset consists of several medical predictor variables and one target variable, Outcome. The predictor variables include the patient’s number of pregnancies, BMI, insulin level, age, and so on.
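
To make the split between predictors and target concrete, here is a small sketch using pandas. The column names follow the commonly distributed CSV of this dataset and are an assumption here, since the text above names only a few of them. The two rows are made-up placeholder values, not real patient records.

```python
import pandas as pd

# Assumed column layout: eight predictors plus the Outcome target
columns = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
           "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]

# Placeholder rows for illustration only (not taken from the dataset)
toy = pd.DataFrame(
    [[2, 120, 70, 25, 80, 28.5, 0.40, 33, 0],
     [5, 165, 82, 30, 150, 34.1, 0.65, 47, 1]],
    columns=columns,
)

# Predictors: every column except the target
X = toy.drop(columns="Outcome")
# Target: 1 means diabetic, 0 means not diabetic
y = toy["Outcome"]

print(X.shape, y.tolist())
```

The same two lines that build `X` and `y` should carry over to the full data once it is loaded, provided the file uses these column names.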


Smith, J.W., Everhart, J.E., Dickson, W.C., Knowler, W.C., & Johannes, R.S. (1988). Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Symposium on Computer Applications and Medical Care (pp. 261–265). IEEE Computer Society Press.


Can you build a machine learning model that accurately predicts whether or not the patients in the dataset have diabetes?
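
One candidate for this task is k-nearest neighbours (KNN), the method named in this post's title. The following is a minimal from-scratch sketch of the idea, not the model a library would fit. The function name `knn_predict`, the choice of `k=3`, and the toy (glucose, BMI) rows are assumptions made up for illustration; none of them come from the dataset.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training rows closest to query."""
    # Euclidean distance from the query to every training row
    distances = [
        (math.dist(row, query), label)
        for row, label in zip(train_X, train_y)
    ]
    # Keep the k nearest neighbours and vote on their labels
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy rows: (glucose, BMI); 1 = diabetic, 0 = not diabetic
train_X = [(85, 22.0), (90, 24.5), (95, 26.0),
           (160, 33.0), (170, 35.5), (180, 38.0)]
train_y = [0, 0, 0, 1, 1, 1]

low_risk = knn_predict(train_X, train_y, (92, 25.0))
high_risk = knn_predict(train_X, train_y, (165, 34.0))
print(low_risk, high_risk)
```

In practice the features should be scaled first, because KNN distances are dominated by whichever column has the largest numeric range.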

1 Imports and Loading Dataset


import numpy as np
import pandas as pd

# Visualization imports
import matplotlib.pyplot as plt
import seaborn as sns
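# plotly setup for Colab: inject plotly.js via requirejs before each cell runs
def configure_plotly_browser_state():
  import IPython
  from IPython.display import HTML, display
  # NOTE: the plotly path is empty in the source (the URL was lost); left as-is
  display(HTML('''
        <script src="/static/components/requirejs/require.js"></script>
        <script>
          requirejs.config({
            paths: {
              base: '/static/base',
              plotly: '',
            },
          });
        </script>
        '''))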

# plotly import
import plotly.express as px
from plotly import __version__
import cufflinks as cf
from plotly.offline import download_plotlyjs,init_notebook_mode,plot,iplot
import IPython
IPython.get_ipython().events.register('pre_run_cell', configure_plotly_browser_state)


# Loading Dataset
df = pd.read_csv('/content/drive/My Drive/dataset/knn/datasets_228_482_diabetes.csv')



Pima Indians Diabetes - Prediction & KNN Visualization