A good deal of time and effort in New York has gone into collecting data about squirrels. The resulting dataset is called the NYC Squirrel Census.

Dataset

The dataset contains one row per squirrel sighting. Some of the columns are:

Age category

Squirrel ID

Activities it does

Sounds it makes

Fur color

Accessing the data

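# tidyverse provides read_csv, the pipe, dplyr, tidyr and ggplot2
library(tidyverse)

data_url <- "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-10-29/nyc_squirrels.csv"

# read the CSV straight from GitHub into a tibble
d_raw <- read_csv(data_url)

head(d_raw)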

Cleaning and filtering

Let’s drop the columns we don’t need and analyse the rest.

%>% is the pipe operator: it passes the result on its left as the first argument to the function on its right.
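
A minimal sketch of this cleaning step, assuming the later code expects objects named data, identifiers, activities and sounds. The column groupings below are an assumption based on how those names are used further down, not the author's confirmed code, and the small stand-in table is invented so the sketch can run without the download.

```r
library(dplyr)
library(tibble)

# Tiny stand-in used only if the real d_raw has not been loaded;
# the rows are invented for illustration.
if (!exists("d_raw")) {
  d_raw <- tibble(
    unique_squirrel_id = c("S-001", "S-002", "S-003"),
    shift              = c("AM", "PM", "PM"),
    age                = c("Adult", "Juvenile", NA),
    running            = c(TRUE, FALSE, FALSE),
    chasing            = c(FALSE, FALSE, TRUE),
    climbing           = c(FALSE, TRUE, FALSE),
    eating             = c(FALSE, TRUE, TRUE),
    foraging           = c(TRUE, TRUE, FALSE),
    other_activities   = c(NA, "digging", NA),
    kuks               = c(FALSE, TRUE, FALSE),
    quaas              = c(FALSE, FALSE, FALSE),
    moans              = c(FALSE, FALSE, FALSE),
    hectare            = c("01A", "02B", "03C")
  )
}

# Assumed column groups (hypothetical, inferred from the later code)
identifiers <- c("unique_squirrel_id", "shift", "age")
activities  <- c("running", "chasing", "climbing", "eating", "foraging",
                 "other_activities")
sounds      <- c("kuks", "quaas", "moans")

# Keep only the columns used in the rest of the analysis
data <- d_raw %>%
  select(all_of(c(identifiers, activities, sounds)))
```

Using all_of() makes it explicit that identifiers, activities and sounds are character vectors of column names rather than columns themselves.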

Plotting

Now let’s reshape the activity columns into a form that is easy to plot.

d_activity <- data %>%
  # keep only the identifier and activity columns
  select(all_of(c(identifiers, activities))) %>%
  # drop other_activities, which is not a TRUE/FALSE flag
  select(-other_activities) %>%
  # one row per squirrel and activity
  pivot_longer(-all_of(identifiers), names_to = "Activity", values_to = "Value") %>%
  # keep only the activities that were actually observed
  filter(Value == TRUE)

This new dataframe looks cleaner.
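
To see what the reshaping does, here is a small sketch on an invented three-column table. The names wide and long and the rows themselves are made up for illustration; only pivot_longer and filter come from the code above.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Invented wide table: one row per squirrel, one logical column per activity
wide <- tibble(
  id      = c("a", "b"),
  running = c(TRUE, FALSE),
  eating  = c(TRUE, TRUE)
)

# Long table: one row per squirrel and activity, keeping only observed ones
long <- wide %>%
  pivot_longer(-id, names_to = "Activity", values_to = "Value") %>%
  filter(Value)

print(long)
```

Each remaining row is one observed activity, which is the shape geom_bar needs in order to count activities directly.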

Activities By Time

Let’s see how the squirrels spend their time during the day and during the evening.

# count sightings per activity, with AM and PM bars side by side
p <- ggplot(data = d_activity, aes(x = Activity, fill = shift)) +
  geom_bar(position = "dodge")

print(p)

The plot shows noticeably more foraging and eating in the evening, while the other activities are spread more evenly across the day.

Categorising by Age

Let’s see how the squirrels are distributed by age.

# one bar per age category, including missing values
p <- ggplot(data = data, aes(x = age, fill = age)) +
  geom_bar(width = 1)

print(p)

We can see that some entries are NA and some are labelled with a question mark (?).
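
If those unknown ages get in the way, they can be filtered out before plotting. A minimal sketch on an invented age column (the names ages and ages_clean and the values are assumptions for illustration):

```r
library(dplyr)
library(tibble)

# Invented age values, including the two kinds of unknown seen in the plot
ages <- tibble(age = c("Adult", "Juvenile", NA, "?", "Adult"))

# Keep only rows with a known age category
ages_clean <- ages %>%
  filter(!is.na(age), age != "?")

print(ages_clean)
```

The same filter call could be applied to data before building the age plot.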

Sounds Analysis

Let’s analyse the sounds that these cute little jumpy creatures make. Note the use of pivot_longer in the code below, which reshapes the sound columns from wide to long format.

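# Analysis of the squirrel sounds
sound_activity <- data %>%
  # keep the squirrel ID, age and the sound columns
  select(unique_squirrel_id, age, all_of(sounds)) %>%
  # one row per squirrel and sound
  pivot_longer(-c(unique_squirrel_id, age), names_to = "Sound",
               values_to = "Value") %>%
  # drop squirrels with a missing age
  drop_na(age) %>%
  # keep only the sounds that were actually heard
  filter(Value == TRUE)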

The resulting dataframe has one row for each sound a squirrel was heard making.

Let’s plot it and see for ourselves.

# stacked counts of each sound within every age category
p <- ggplot(sound_activity, aes(x = age, fill = Sound)) +
  geom_bar(stat = "count")

print(p)

We can see that a large number of adult squirrels make the kuks sound. However, the raw data contains far more adults than juveniles, so these raw counts do not give a fair comparison of the share of each age group that makes each sound.
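
One way to make the comparison fairer is to look at shares within each age group rather than counts. A minimal sketch on a small invented table (the rows and the names toy and kuks_share are assumptions for illustration, not values from the census):

```r
library(dplyr)
library(tibble)

# Invented data: four adults and two juveniles, one of each heard making kuks
toy <- tibble(
  age  = c("Adult", "Adult", "Adult", "Adult", "Juvenile", "Juvenile"),
  kuks = c(TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)
)

# Share of each age group that made the kuks sound
kuks_share <- toy %>%
  group_by(age) %>%
  summarise(
    n_squirrels = n(),        # size of the age group
    n_kuks      = sum(kuks),  # squirrels heard making kuks
    share_kuks  = mean(kuks)  # fraction of the group
  )

print(kuks_share)
```

In this toy table both groups have one kuks-making squirrel, yet the share is twice as high for juveniles because there are fewer of them. The same summarise call could be tried on data after dropping rows with a missing age.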

Having Fun Plotting Data Squirrels in R