Contact tracing is the process of identifying people who have come into contact with someone who has tested positive for a contagious disease such as measles, HIV, or COVID-19. During a pandemic, performing contact tracing correctly can help reduce the number of people who get infected and speed up the treatment of those who are. Doing so can help save many lives.

Technology can help automate contact tracing, producing more efficient and accurate results than performing the procedure manually. One technology that can help is Machine Learning, and more precisely, clustering. **Clustering** is a subclass of Machine Learning algorithms used to divide data into clusters so that the points in each cluster share some characteristics.

There are various clustering algorithms, such as K-means, Mean-Shift, Spectral Clustering, BIRCH, DBSCAN, and many more. These algorithms can be divided into **_three_** categories:

  1. Density-based Clustering: Clusters are formed based on the density of the region. Examples of this type: _DBSCAN_ (Density-Based Spatial Clustering of Applications with Noise) and _OPTICS_ (Ordering Points To Identify the Clustering Structure).
  2. Hierarchical Clustering: Clusters are formed using a tree-type structure, in which existing clusters are merged or split to create new ones. Examples of this type: _CURE_ (Clustering Using Representatives) and _BIRCH_ (Balanced Iterative Reducing and Clustering using Hierarchies).
  3. Partitioning-based Clustering: Clusters are formed by partitioning the input data into k clusters. Examples of this type: _K-means_ and _CLARANS_ (Clustering Large Applications based upon Randomized Search).

For contact tracing, we need a density-based clustering algorithm. The reason is that diseases are transmitted when an infected person comes into contact with others, so more crowded (denser) areas will have more cases than less crowded ones.
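To make the idea of density-based clustering concrete, here is a minimal sketch of the DBSCAN procedure in plain Python. It is an illustration only, not a production implementation: the function names, the toy points, and the `eps` and `min_samples` values are all assumptions chosen for the example, and distances are plain Euclidean distances on a flat plane.

```python
from math import dist

NOISE = -1  # label for points that do not belong to any dense region


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one cluster label per point; NOISE marks isolated points."""
    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # not dense enough to start a cluster
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
    return labels


# Toy data (assumed): two crowded spots and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=2)
print(labels)  # [0, 0, 0, 1, 1, -1]
```

The two crowded spots become clusters 0 and 1, while the isolated point is labeled as noise. In a contact-tracing setting, people who fall into the same cluster as an infected person would be the candidates for follow-up.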

To trace the movement of infected people, scientists often use GPS datasets that record where a person was at a given time. The location data is usually represented as latitude and longitude coordinates.
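As a rough sketch of what such data might look like in code, the snippet below defines a hypothetical `GpsPing` record and computes the great-circle distance between two readings with the haversine formula. The record layout, the field names, and the sample coordinates are assumptions for illustration; real GPS datasets will have their own schema.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


@dataclass
class GpsPing:
    """Hypothetical record: one location reading for one person."""
    person_id: str
    timestamp: datetime
    latitude: float   # degrees
    longitude: float  # degrees


def haversine_m(a: GpsPing, b: GpsPing) -> float:
    """Great-circle distance between two pings, in meters."""
    lat1, lon1, lat2, lon2 = map(
        radians, (a.latitude, a.longitude, b.latitude, b.longitude)
    )
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))


# Two made-up readings taken at the same moment, 0.00001 degrees of latitude apart.
when = datetime(2020, 7, 1, 12, 0)
alice = GpsPing("alice", when, 13.14800, 77.59300)
bob = GpsPing("bob", when, 13.14801, 77.59300)
print(f"{haversine_m(alice, bob):.2f} m")  # roughly one meter
```

Using a great-circle distance rather than a plain Euclidean one matters here because latitude and longitude are angles on a sphere, not coordinates on a flat plane.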


Contact Tracing Using Less Than 30 Lines of Python Code