My dataset consists of scores and total respondents for questions asked in a survey, over a number of fiscal years (FY13, FY14 & FY15) and in different regions.
My objective is to loop through the FY column, identify in which fiscal years each question was asked in each region, and store this information in a new column.
This is what a reproducible sample looks like -
testdf = data.frame(
  FY = c("FY13","FY14","FY15","FY14","FY15","FY13","FY14","FY15",
         "FY13","FY15","FY13","FY14","FY15","FY13","FY14","FY15"),
  Region = c(rep("AFRICA",5), rep("ASIA",5), rep("AMERICA",6)),
  QST = c(rep("Q2",3), rep("Q5",2), rep("Q2",3), rep("Q5",2), rep("Q2",3), rep("Q5",3)),
  Very.Satisfied = runif(16, min = 0, max = 1),
  Total.Very.Satisfied = floor(runif(16, min = 10, max = 120)),
  Satisfied = runif(16, min = 0, max = 1),
  Total.Satisfied = floor(runif(16, min = 10, max = 120)),
  Dissatisfied = runif(16, min = 0, max = 1),
  Total.Dissatisfied = floor(runif(16, min = 10, max = 120)),
  Very.Dissatisfied = runif(16, min = 0, max = 1),
  Total.Very.Dissatisfied = floor(runif(16, min = 10, max = 120))
)
I start by creating an ID column that concatenates Region and QST:
library(tidyr)
testdf = testdf %>% unite(ID, c('Region','QST'), sep = "", remove = FALSE)
My Objective
1) For each unique ID, identify whether the given question was asked:
a) In one year only (either FY13, FY14 or FY15)
b) Over the past two years (FY15 & FY14 only)
c) Over the past three years (FY15, FY14 & FY13)
d) In FY13 & FY15 only
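To make the cases above concrete, here is a minimal sketch that lists, for each ID, the distinct fiscal years in which its question was asked. The `ID` and `FY` vectors are a small stand-in for the corresponding columns of `testdf`, and `years_by_id` is a name made up for illustration.

```r
# Stand-in for the ID and FY columns of testdf (AFRICA and ASIA rows only).
ID <- c(rep("AFRICAQ2", 3), rep("AFRICAQ5", 2), rep("ASIAQ2", 3), rep("ASIAQ5", 2))
FY <- c("FY13", "FY14", "FY15", "FY14", "FY15",
        "FY13", "FY14", "FY15", "FY13", "FY15")

# For each ID, collapse the distinct fiscal years into a single string.
years_by_id <- tapply(FY, ID, function(x) paste(sort(unique(x)), collapse = " & "))

print(years_by_id)
```

Each resulting string corresponds to one of the cases a) to d), so it can serve as a quick check on whatever ends up in the new column.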
My Attempt
For this problem, I tried to write a for loop. For each unique ID, I first store the unique occurrences of each FY in which the question was asked in a vector v. Then, using an if/else statement, I assign a comment to a newly created column called Tally based on these occurrences.
for (i in unique(testdf$ID)) {
  v = unique(testdf$FY)
  if (('FY15' %in% v) & ('FY14' %in% v)) {
    testdf$Tally == 'Asked Over The Past Two Years'
  } else if (('FY15' %in% v) & ('FY14' %in% v) & ('FY13' %in% v)) {
    testdf$Tally == 'Asked Over The Past Three Years'
  } else if (('FY13' %in% v) & ('FY15' %in% v)) {
    testdf$Tally == 'Question Asked in FY13 & FY15 Only'
  } else {
    testdf$Tally == 'Question Asked Once Only'
  }
}
The loop seems to run without throwing an error message, but it doesn’t seem to create the new Tally column.
Any help with this will be greatly appreciated.
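Regarding the loop above, a few things look likely to matter: `==` tests equality rather than assigning, so nothing is ever written to `testdf$Tally`; `v` is taken from the whole FY column rather than from the rows of the current ID; and the three-year test comes after the two-year test, so it can never be reached. Below is a minimal sketch of a loop that addresses these points. The small `df` data frame is a made-up stand-in for `testdf`, and treating every remaining case as "Question Asked Once Only" is an assumption carried over from the original `else` branch.

```r
# Made-up stand-in for testdf with only the columns the loop needs.
df <- data.frame(
  ID = c("AFRICAQ2", "AFRICAQ2", "AFRICAQ2", "AFRICAQ5", "AFRICAQ5",
         "ASIAQ5", "ASIAQ5", "ASIAQ2"),
  FY = c("FY13", "FY14", "FY15", "FY14", "FY15", "FY13", "FY15", "FY14"),
  stringsAsFactors = FALSE
)

df$Tally <- NA_character_              # create the column before filling it

for (i in unique(df$ID)) {
  rows <- df$ID == i                   # rows belonging to the current ID
  v <- unique(df$FY[rows])             # fiscal years for this ID only

  if (all(c("FY13", "FY14", "FY15") %in% v)) {   # most specific case first
    label <- "Asked Over The Past Three Years"
  } else if (all(c("FY14", "FY15") %in% v)) {
    label <- "Asked Over The Past Two Years"
  } else if (all(c("FY13", "FY15") %in% v)) {
    label <- "Question Asked in FY13 & FY15 Only"
  } else {
    # Assumption: anything else counts as "once only"; note this would also
    # catch FY13 & FY14 only, which the stated cases do not cover.
    label <- "Question Asked Once Only"
  }

  df$Tally[rows] <- label              # assign with <-, not ==
}

print(df)
```

The same loop should work on `testdf` itself by replacing `df` with `testdf`, provided the ID column has already been created.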
#r #loops