Hierarchical Clustering of Countries Based on Eurovision Votes

Description

This dataset contains the votes that each country (From Country) gave to every other country (To Country) in Eurovision 2016. It includes both the Jury Votes and the Televote. Since we want to see how the public voted, we will consider only the Televote. Our ultimate goal is to create a dendrogram that shows the relationships between countries, using hierarchical clustering.

Data Processing

We will load the data and use only three columns: From Country, To Country and Televote Rank. Then we will reshape the data so that the rows are the From Country, the columns are the To Country and the values are the Televote Rank. Notice that a country cannot vote for itself, so those cells will be NA. We will impute the NAs with Televote Rank = 1, assuming that each country would have ranked itself first if that were allowed. Bear in mind that we want to cluster the countries based on their voting preferences.

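import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.cluster.vq import whiten

# Jupyter magic: render plots inside the notebook
%matplotlib inline

# Load the votes (one row per From country / To country pair)
eurovision = pd.read_csv("eurovision-2016.csv")

# Reshape: rows = From country, columns = To country, values = Televote Rank
televote_Rank = eurovision.pivot(index='From country', columns='To country', values='Televote Rank')

# A country cannot vote for itself, so the diagonal is NA; fill it with rank 1
televote_Rank = televote_Rank.fillna(1)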
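
To make the reshape and the clustering step more concrete, here is a minimal sketch on a toy table. The four countries A to D and their ranks are made up for illustration, and the Ward linkage method is an assumption on our side, not something fixed by the steps above. The two pairs of countries that rank the others identically should be merged first in the tree.

```python
import pandas as pd
from scipy.cluster.hierarchy import linkage, dendrogram

# Hypothetical toy votes in the same long format as the CSV (values are made up).
# Each row: voting country, receiving country, rank given (1 = best).
toy = pd.DataFrame({
    "From country":  ["A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D"],
    "To country":    ["B", "C", "D", "A", "C", "D", "A", "B", "D", "A", "B", "C"],
    "Televote Rank": [1, 2, 3, 1, 2, 3, 3, 2, 1, 3, 2, 1],
})

# Rows = voters, columns = recipients; the diagonal is NaN (no self-votes).
ranks = toy.pivot(index="From country", columns="To country", values="Televote Rank")

# Same imputation as in the text: a country is assumed to rank itself first.
ranks = ranks.fillna(1)

# Agglomerative clustering of the voters. method="ward" is an assumption;
# other linkage methods ("average", "complete", ...) are equally possible.
Z = linkage(ranks.values, method="ward")

# no_plot=True only computes the tree layout; drop it to draw the dendrogram
# with matplotlib.
tree = dendrogram(Z, labels=ranks.index.tolist(), no_plot=True)
print(tree["ivl"])  # leaf labels in the order they appear along the axis
```

Each row of `Z` describes one merge: the two clusters joined, the distance between them and the size of the new cluster. On the real data, `televote_Rank` would take the place of `ranks`.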

