The algorithm I choose to implement for this project was K-means clustering. The data generated attempted to model a water distribution scenario based on the distance each point is from a proposed water-well. Essentially, the algorithm is ideal for a customer segmentation scenario that clusters customers around a particular water-well based on their location and _k _numbers of wells.

In this imaginary scenario, several local municipalities have struggled for years with its aging water infrastructure. Supply, storage, movement of water from one point to another are a challenge. A data scientist was called in to build some models about how customers will be affected and to shed some light on improving water delivery supply.

The goal was to understand the algorithm; however, when variables have real meaning it helps us to get under the hood to see how it works and the math behind it.

Example Problem: How will customers cluster with respect to their location? Create a visual model for the placement of 3 and 7 water-wells, each well serving as the centroid.

The location of customers were fixed equidistance points, generated using the built-in range function. There were a total of 1392 points.

Image for post

Image by Author

After plotting the fixed location of each customer, I found what I considered to be the geographical center of the dataset. This I assumed would be the ideal place for the first water-well.

#water #data-science #python #lambda-school #k-means-clustering

K-means Clustering Algorithm
2.35 GEEK