Introduction

Association analysis, also known as **market basket analysis**, is a key technique for uncovering associations between items, and was initially used by large supermarkets and retailers. It analyses the combinations of items that occur together and measures how frequently those combinations appear across transactions, which helps to understand the relationships between the items that people buy. The applications are many: placing products in aisles, recommending items on e-commerce websites, and recommending songs on Spotify.
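To make the idea of "items that occur together" concrete, here is a minimal sketch that counts how often each pair of items shares a basket. The baskets and the `pair_support` helper are made up for illustration and are not taken from any real data set.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets, invented for this illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"butter"},
]


def pair_support(baskets):
    """Return, for each item pair, the fraction of baskets containing both items."""
    counts = Counter()
    for basket in baskets:
        # Sort so that every pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: count / total for pair, count in counts.items()}


support = pair_support(transactions)
print(support[("bread", "milk")])  # share of baskets with both bread and milk
```

This fraction is usually called the support of the pair; a higher value means the two items are bought together more often.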

With the current COVID-19 outbreak, many data sets have been made public for researchers to use. I came across one such data set published by Wolfram [1]. It includes details of the symptoms the patients experienced, and I decided to dig a bit deeper into these symptoms.

This article discusses the insights from the data, as well as the approach used to obtain them.

Approach

This data set contains 13179 patient records in total, but the majority of the columns are sparse. Since we are focusing only on the symptoms of COVID-19, symptom data is available for just 1631 patients out of the entire data set. One might argue that this is not much information, but let's be optimistic, shall we?

After loading the data, it needs to be cleaned and transformed into a suitable format. The image below shows the format of the symptom data.

Next, with the help of the regex library, the symptoms need to be extracted. The code segment below helps with the extraction.
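As a minimal sketch of what such an extraction could look like, assume each raw cell is a string holding double-quoted symptom names inside braces. That format, the `extract_symptoms` helper, and the sample cell are all assumptions made for illustration; the real column may be laid out differently, in which case the pattern would need adjusting.

```python
import re

# Assumed pattern: every symptom name sits between a pair of double quotes.
QUOTED_NAME = re.compile(r'"([^"]+)"')


def extract_symptoms(raw_cell):
    """Return the lower-cased symptom names found in one raw cell."""
    if not raw_cell:
        # Empty or missing cells yield no symptoms.
        return []
    return [name.strip().lower() for name in QUOTED_NAME.findall(raw_cell)]


# Hypothetical raw value, not copied from the data set.
sample_cell = '{"Fever", "Dry cough", "Fatigue"}'
symptoms = extract_symptoms(sample_cell)
print(symptoms)
```

Lower-casing the names keeps variants such as "Fever" and "fever" from being counted as different symptoms later on.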

The next step towards association analysis is to one-hot encode the extracted data. This data set contains 95 unique symptoms in total. The image below lists all 95 unique symptoms.

One might prefer a library for this encoding, but I chose to write the code from scratch.
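A from-scratch encoder can be quite short. The sketch below is one possible way to do it, assuming each patient is represented as a list of symptom names; the `one_hot_encode` helper and the three sample patients are hypothetical and only illustrate the idea.

```python
def one_hot_encode(records):
    """Turn lists of symptom names into 0/1 rows, one column per unique symptom."""
    # Collect every distinct symptom and fix a stable column order.
    vocabulary = sorted({symptom for record in records for symptom in record})
    column = {symptom: i for i, symptom in enumerate(vocabulary)}

    rows = []
    for record in records:
        row = [0] * len(vocabulary)
        for symptom in record:
            row[column[symptom]] = 1  # mark the symptom as present
        rows.append(row)
    return vocabulary, rows


# Hypothetical patients, invented for this illustration.
patients = [["fever", "cough"], ["cough"], ["fatigue", "fever"]]
vocabulary, encoded = one_hot_encode(patients)
print(vocabulary)
print(encoded)
```

Each row then has one entry per unique symptom, so on the full data set every patient would be described by 95 zeros and ones.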

Now our data is prepared for association analysis.


Associative analysis and COVID-19 symptoms