Authorities in New York spent some of their time and a lot of public money collecting data about squirrels. This dataset is called the NYC Squirrel Census data.
Dataset
The dataset contains information about individual squirrels. Some of the columns are:
Age category
Squirrel ID
Activities it does
Sounds it makes
Fur color
Accessing the data
library(tidyverse)
data_url <- 'https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-10-29/nyc_squirrels.csv'
d_raw <- read_csv(data_url)
head(d_raw)
Cleaning and filtering
Let’s remove some of the columns and analyse the rest.
%>% is the pipe operator: it passes the result on its left as the first argument to the function on its right.
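The cleaning code itself is not reproduced in this post. A minimal sketch of what such a step might look like is given here. The names identifiers, activities and sounds are the ones used later in the post, but their contents are an assumption based on the dataset's column names, and the tiny d_raw below is a made-up stand-in so the snippet runs without a download.

```r
library(dplyr)

# Assumed column groups: the column names exist in the dataset, but this
# grouping is a guess at what the original cleaning step did.
identifiers <- c("unique_squirrel_id", "shift", "age")
activities  <- c("running", "chasing", "climbing", "eating", "foraging",
                 "other_activities")
sounds      <- c("kuks", "quaas", "moans")

# Made-up stand-in for d_raw (two invented squirrels, one extra column).
d_raw <- tibble(
  unique_squirrel_id = c("toy-1", "toy-2"),
  shift = c("AM", "PM"),
  age = c("Adult", "Juvenile"),
  hectare = c("01A", "02B"),
  running = c(TRUE, FALSE),
  chasing = c(FALSE, FALSE),
  climbing = c(FALSE, TRUE),
  eating = c(FALSE, TRUE),
  foraging = c(TRUE, TRUE),
  other_activities = c(NA, "digging"),
  kuks = c(TRUE, FALSE),
  quaas = c(FALSE, FALSE),
  moans = c(FALSE, FALSE)
)

# Keep only the columns used in the rest of the analysis.
data <- d_raw %>%
  select(all_of(c(identifiers, activities, sounds)))

print(names(data))
```

Using all_of() makes it explicit that the column names come from character vectors rather than from columns of the data frame.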
Plotting
Now let’s fetch some information for plotting.
d_activity <- data %>%
  # select only the identifier and activity columns
  select(all_of(c(identifiers, activities))) %>%
  # drop the free-text activity column
  select(-other_activities) %>%
  pivot_longer(-all_of(identifiers), names_to = 'Activity', values_to = 'Value') %>%
  # keep only the rows where the activity was observed
  filter(Value == TRUE)
This new dataframe looks cleaner.
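To make the reshaping concrete, here is the same idea on a tiny made-up data frame. The names wide and long and all the values are invented for illustration and are not taken from the census.

```r
library(dplyr)
library(tidyr)

# Made-up wide data: one row per squirrel, one logical column per activity.
wide <- tibble(
  unique_squirrel_id = c("toy-1", "toy-2"),
  running = c(TRUE, FALSE),
  eating = c(FALSE, TRUE),
  foraging = c(TRUE, TRUE)
)

long <- wide %>%
  # every column except the identifier becomes an (Activity, Value) pair
  pivot_longer(-unique_squirrel_id, names_to = "Activity", values_to = "Value") %>%
  # keep only the activities that were actually observed
  filter(Value)

print(long)
```

Each squirrel now contributes one row per observed activity, which is the shape geom_bar needs in order to count activities.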
Activities By Time
Let’s see how the squirrels spend their time during the day and during the evening.
p <- ggplot(data = d_activity, aes(x = Activity, fill = shift)) +
  geom_bar(position = 'dodge')
print(p)
Foraging and eating are noticeably more common in the evening, whereas the other activities are spread more evenly across the day.
Categorising by Age
Let’s see how the squirrels are distributed by age.
p <- ggplot(data = data, aes(x = age, fill = age)) +
  geom_bar(width = 1)
print(p)
We can see that some entries are NA and some are labelled with a question mark.
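If those entries get in the way, one possible approach is to drop them before plotting. The sketch below uses a small made-up data frame. The names toy_ages and known_ages are hypothetical, and the post itself does not perform this step.

```r
library(dplyr)

# Made-up ages, including a missing value and a question mark.
toy_ages <- tibble(age = c("Adult", "Juvenile", "?", NA, "Adult"))

known_ages <- toy_ages %>%
  # drop missing ages and the "?" placeholder
  filter(!is.na(age), age != "?")

print(count(known_ages, age))
```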
Sounds Analysis
Let’s analyse the sounds that these cute little jumpy creatures make. Note the use of pivoting (pivot_longer) in the code below: it turns the sound columns into one row per squirrel and sound.
sound_activity <- data %>%
  select(unique_squirrel_id, age, all_of(sounds)) %>%
  pivot_longer(-c(unique_squirrel_id, age), names_to = 'Sound',
               values_to = 'Value') %>%
  drop_na(age) %>%
  filter(Value == TRUE)
The resulting dataframe has one row per squirrel and sound, with the columns unique_squirrel_id, age, Sound and Value.
Let’s plot it and see for ourselves.
p <- ggplot(sound_activity, aes(x = age, fill = Sound)) +
  geom_bar(stat = "count")
print(p)
We can see that a large number of adult squirrels make the kuks sound. However, adults far outnumber juveniles in the raw data, so these raw counts do not give a fair comparison of the share of each age group that makes each sound.
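One way to make the comparison fairer is to divide the number of squirrels making a sound by the total number of squirrels in that age group. Here is a minimal sketch on made-up numbers. The data frame toy_squirrels and its values are invented for illustration and are not taken from the census.

```r
library(dplyr)

# Made-up data: 8 adults (2 making kuks) and 2 juveniles (1 making kuks).
toy_squirrels <- tibble(
  age  = c(rep("Adult", 8), rep("Juvenile", 2)),
  kuks = c(TRUE, TRUE, rep(FALSE, 6), TRUE, FALSE)
)

kuks_share <- toy_squirrels %>%
  group_by(age) %>%
  summarise(
    n_squirrels = n(),        # squirrels in the age group
    n_kuks      = sum(kuks),  # how many of them make the sound
    share_kuks  = mean(kuks)  # proportion within the age group
  )

print(kuks_share)
```

In this toy data more adults than juveniles make the sound (2 versus 1), yet the share is lower among adults (0.25 versus 0.5), which is exactly the kind of difference a plot of raw counts hides.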