We are currently in the seventh month of a global pandemic. I think I can safely say no one is enjoying it. The quicker this ends, the better. One of the most important tools the medical field has for mitigating the spread of a virus is vaccination. However, vaccines don’t work if no one gets vaccinated. I realize that sounds obvious, and in many ways it is, but it is nonetheless a hugely important fact. It is the underlying idea of “herd immunity”: once enough of a community is immune, the virus struggles to spread, which also protects those who are not immune. That makes it vital to be able to identify members of the community who are unlikely to get vaccinated.

Fortunately for us aspiring data scientists, this is not our first pandemic. In response to the H1N1 flu in 2009, the Centers for Disease Control and Prevention (CDC) conducted a survey “in order to monitor and evaluate flu vaccination efforts among adults and children”. This phone survey asked people whether they had received the H1N1 and seasonal flu vaccines, along with questions about their lives, opinions, and behaviors. DrivenData provided a large chunk of this dataset and posed the question: using the survey results, can you make a model that predicts who will get either vaccine? I figured I would give it a shot (pun intended).

For my last project, I created a linear regression model to predict a baseball team’s total wins. The target there (wins) is a _continuous value_, so it was perfectly suited to a regression model. This problem is not so simple (or perhaps it’s simpler?). Here our target value is binary: whether or not a participant got the vaccine. So for this problem, we will be looking at classification models rather than regression models. These are models that predict the likelihood of one result or the other, rather than trying to predict a continuous variable.
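To make the distinction concrete, here is a minimal sketch of what “predicting the likelihood of one result or the other” can look like. It uses a logistic function to squash a weighted sum of features into a probability between 0 and 1, then applies a cutoff to get a yes/no prediction. The feature meanings, weights, bias, and the 0.5 threshold are all made up for illustration; they are not fitted to the survey data, and this is not the model built in this post.

```python
import math


def predict_proba(features, weights, bias):
    """Map a weighted sum of features to a probability in (0, 1)."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


def predict_label(features, weights, bias, threshold=0.5):
    """Turn the probability into a binary prediction (1 = vaccinated, 0 = not)."""
    return int(predict_proba(features, weights, bias) >= threshold)


# Hypothetical, hand-picked values for illustration only.
# Imagined features: "doctor recommended the vaccine", "concerned about H1N1".
weights = [1.2, 0.8]
bias = -1.5
respondent = [1, 0]  # doctor recommended it, respondent is not concerned

probability = predict_proba(respondent, weights, bias)
label = predict_label(respondent, weights, bias)
print(f"P(vaccinated) = {probability:.2f}, predicted label = {label}")
```

A regression model would stop at the raw score; a classifier like this one cares about which side of the threshold the probability lands on.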

In this post, I will walk through the process I followed to build my model and explain some of the classification models I ended up not using.

#data-science #machine-learning #python #pandas #classification

Using Classification Models To Predict Vaccinations