A friend once gave me a judgmental look because I knew every Starbucks store in the city. Far from being a Starbucks fan, I had valid reasons to know a number of coffee shops, Starbucks included, such as being extremely unproductive when working at home. Of course, these reasons were not enough to convince my European friend, given his dislike of big corporations.
Apart from that, I remember a time when Starbucks was a way to show status, at least where I came from. People tried to show how cool they were either by carrying a Starbucks cup or by criticizing those who carried one. I don't know what the new Starbucks is, but that is not the topic here.
I played around with some datasets that seemed so unrelated that I felt compelled to combine them somehow. I combined the data, drew some fancy graphs and convincing regression lines, and finally enjoyed one of the **most common statistical mistakes**. Let's go step by step.
2. Libraries and Importing the Data
library(ggthemes)
library(countrycode)
library(readr)
library(ggplot2)
library(dplyr)
library(tidyverse)
#Importing the data
starbucks <- read_csv("Medium/starbucks_locations.csv")
happiness <- read_csv("Medium/happiness_2019.csv")
3. Preprocessing
# Convert country names to ISO 2-letter country codes and add a continent column.
happiness <- happiness %>%
  mutate(
    Country = countrycode(`Country or region`, origin = "country.name", destination = "iso2c"),
    Continent = countrycode(`Country or region`, origin = "country.name", destination = "continent")
  )
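To see what the conversion does, here is a small sketch on three hand-picked country names (my own examples, not rows taken from the dataset). Names that countrycode cannot match come back as NA with a warning, so it is worth counting the missing codes before joining on them.

```r
library(countrycode)

# Hypothetical sample names, chosen only for illustration
sample_names <- c("Finland", "Japan", "United States")

codes <- countrycode(sample_names, origin = "country.name", destination = "iso2c")
continents <- countrycode(sample_names, origin = "country.name", destination = "continent")

print(codes)       # "FI" "JP" "US"
print(continents)  # "Europe" "Asia" "Americas"

# Count how many names failed to convert (0 for this sample)
sum(is.na(codes))
```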
# Count the Starbucks stores in each country.
starbucks_total <- starbucks %>%
  group_by(Country) %>%
  summarize(Number_of_store = n())
# Min-max normalization for the happiness score: rescales values to the range [0, 1].
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}
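As a quick sanity check, the function can be tried on a made-up vector. The scores below are hypothetical and only illustrate the arithmetic: the minimum maps to 0, the maximum to 1, and everything else falls in between.

```r
# Same min-max formula as the normalize() function in this post
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Hypothetical scores, made up for illustration
toy_scores <- c(3, 5, 7)
toy_normalized <- normalize(toy_scores)
print(toy_normalized)  # 0.0 0.5 1.0

# A single NA makes min() and max() return NA, so every value becomes NA
toy_with_na <- normalize(c(3, NA, 7))
print(toy_with_na)  # NA NA NA
```

Because this is a linear rescaling, it changes the units of the score but not its correlation with any other variable.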
# Combine the happiness and Starbucks data. The left join keeps every country in the
# happiness dataset; countries with no Starbucks stores get a count of 0.
df <- happiness %>%
  left_join(starbucks_total, by = "Country") %>%
  mutate(Number_of_store = replace_na(Number_of_store, 0)) %>%
  mutate(Normalized_score = normalize(Score))
# Select the relevant columns by name: country name, country code, continent, number of stores, normalized score.
starbucks_vs_happiness <- df %>%
  select(`Country or region`, Country, Continent, Number_of_store, Normalized_score)
starbucks_vs_happiness
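The counting, joining, and zero-filling steps above can be checked on a toy example. The two small tables below are made up for illustration and are not part of the real data. The sketch shows that a country without stores ends up with 0, and that a country present only in the store table is dropped by the left join.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy tables, made up for illustration
toy_happiness <- tibble(Country = c("FI", "JP", "XX"), Score = c(7.8, 5.9, 4.0))
toy_stores <- tibble(Country = c("JP", "JP", "US"))

# One row per country with its store count
toy_counts <- toy_stores %>%
  group_by(Country) %>%
  summarize(Number_of_store = n())

# Keep every row of toy_happiness; fill missing counts with 0
toy_joined <- toy_happiness %>%
  left_join(toy_counts, by = "Country") %>%
  mutate(Number_of_store = replace_na(Number_of_store, 0))

print(toy_joined)
```

Here "JP" gets 2 stores, "FI" and "XX" get 0, and "US" does not appear because it has no row in the toy happiness table.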