I once got judgmental looks from a friend because I knew every Starbucks store in the city. Far from being a Starbucks fan, I had valid reasons to know a number of coffee shops, Starbucks included, such as my extreme unproductivity when working from home. Of course, these reasons were not enough to win over my European friend, given his dislike of big corporations.

Apart from that, I remember a time when Starbucks was a way to show status, at least where I come from. People tried to show how cool they were either by carrying a Starbucks cup or by criticizing whoever carried one. I don't know what the new Starbucks is, but that is not the topic here.

I played around with some datasets that seemed so unrelated that I felt I had to combine them somehow. I merged the data, drew some fancy graphs and convincing regression lines, and finally fell right into one of the **most common statistical mistakes**. Let's go step by step.

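1. Both datasets are from Kaggle: Starbucks store locations (`starbucks_locations.csv`) and 2019 happiness scores by country (`happiness_2019.csv`).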

2. Libraries and Importing the Data

```r
library(tidyverse)   # attaches readr, dplyr, tidyr and ggplot2
library(countrycode) # converts country names to ISO codes and continents
library(ggthemes)    # extra ggplot2 themes for the plots

# Importing the data
starbucks <- read_csv("Medium/starbucks_locations.csv")
happiness <- read_csv("Medium/happiness_2019.csv")
```

3. Preprocessing

```r
# Convert country names to ISO-2 country codes and add a continent column
happiness <- happiness %>%
  mutate(Country   = countrycode(`Country or region`, origin = "country.name", destination = "iso2c"),
         Continent = countrycode(`Country or region`, origin = "country.name", destination = "continent"))

# Count the Starbucks stores in each country
starbucks_total <- starbucks %>%
  group_by(Country) %>%
  summarize(Number_of_store = n())

# Min-max normalization: rescales the happiness score to the [0, 1] range
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}
```
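To see what `normalize` does, here is a minimal base-R sketch on made-up scores (the numbers are illustrative, not taken from the dataset). The lowest value maps to 0, the highest to 1, and everything else lands proportionally in between. `normalize_safe` is a hypothetical, more defensive variant that is not part of the original analysis: it ignores missing values when finding the range and avoids the 0/0 that the plain formula produces when all values are equal.

```r
# Same min-max formula as the normalize() function above
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Hypothetical defensive variant (not used in the original analysis):
# skips NAs when computing the range and returns 0 for a constant vector
normalize_safe <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) return(ifelse(is.na(x), NA_real_, 0))
  (x - rng[1]) / (rng[2] - rng[1])
}

scores <- c(3, 5, 7, 4)      # made-up happiness scores
normalize(scores)            # 0.00 0.50 1.00 0.25
normalize(c(5, 5, 5))        # NaN NaN NaN, because max - min is 0
normalize_safe(c(5, NA, 7))  # 0 NA 1
```

For the 2019 scores the plain version is enough, since the scores are not all equal; the defensive variant only matters if the function is reused on other columns.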
```r
# Combine the happiness and Starbucks data; countries with no stores get 0
df <- happiness %>%
  left_join(starbucks_total, by = "Country") %>%
  mutate(Number_of_store  = replace_na(Number_of_store, 0),
         Normalized_score = normalize(Score))

# Keep the relevant columns: country name, country code, continent, number of stores, normalized score
starbucks_vs_happiness <- df %>%
  select(`Country or region`, Country, Continent, Number_of_store, Normalized_score)
starbucks_vs_happiness
```
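The counting and joining steps can be checked on a toy example. The sketch below uses base R (`table()` and `merge()` with `all.x = TRUE`) instead of dplyr so it runs without extra packages; the store rows and scores are made up for illustration. It mirrors the logic above: count stores per country, keep every country from the happiness side, and turn a missing count into 0. Filling with 0 is only correct because a country absent from the store list is assumed to have no stores; it would be wrong if the store data were merely incomplete.

```r
# Toy store list: one row per store, with an ISO-2 country code (made-up data)
stores <- data.frame(Country = c("US", "US", "US", "GB", "JP", "JP"))

# Toy happiness table: every country we want to keep (made-up scores)
happy <- data.frame(Country = c("US", "GB", "JP", "FI"),
                    Score   = c(6, 7, 5, 8))

# Count stores per country (base-R analogue of group_by() + summarize(n()))
counts <- as.data.frame(table(stores$Country), stringsAsFactors = FALSE)
names(counts) <- c("Country", "Number_of_store")

# Left join: keep all happiness rows, including countries with no stores
joined <- merge(happy, counts, by = "Country", all.x = TRUE)

# Unmatched countries get NA; treat them as having zero stores
joined$Number_of_store[is.na(joined$Number_of_store)] <- 0
joined
```

Dropping `all.x = TRUE` would turn this into an inner join and silently remove the store-less country (FI here), which is exactly the kind of row loss worth checking for before fitting any regression line.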


Happy cities have more Starbucks: or do they?