My dataset consists of scores and total respondents for questions asked in a survey, over a number of fiscal years (FY13, FY14 & FY15) and in different regions.
My objective is to loop through the FY column, identify the fiscal years in which each question was asked in each region, and store this information in a new column.
This is what a reproducible sample looks like -
testdf = data.frame(
  FY = c("FY13","FY14","FY15","FY14","FY15","FY13","FY14","FY15","FY13","FY15","FY13","FY14","FY15","FY13","FY14","FY15"),
  Region = c(rep("AFRICA",5), rep("ASIA",5), rep("AMERICA",6)),
  QST = c(rep("Q2",3), rep("Q5",2), rep("Q2",3), rep("Q5",2), rep("Q2",3), rep("Q5",3)),
  Very.Satisfied = runif(16, min = 0, max = 1),
  Total.Very.Satisfied = floor(runif(16, min = 10, max = 120)),
  Satisfied = runif(16, min = 0, max = 1),
  Total.Satisfied = floor(runif(16, min = 10, max = 120)),
  Dissatisfied = runif(16, min = 0, max = 1),
  Total.Dissatisfied = floor(runif(16, min = 10, max = 120)),
  Very.Dissatisfied = runif(16, min = 0, max = 1),
  Total.Very.Dissatisfied = floor(runif(16, min = 10, max = 120))
)
I start by creating an ID column, concatenating Region and QST:
library(tidyr)
testdf = testdf %>% unite(ID, c('Region','QST'), sep = "", remove = FALSE)
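For comparison, the same kind of ID can be built in base R without tidyr. This is only a minimal sketch: `demo` is a made-up miniature of testdf, and it assumes the two columns are simply pasted together with no separator.

```r
# Hypothetical miniature of testdf, for illustration only
demo <- data.frame(
  Region = c("AFRICA", "ASIA", "AMERICA"),
  QST    = c("Q2", "Q5", "Q2"),
  stringsAsFactors = FALSE
)

# paste0() concatenates element-wise with no separator,
# which mirrors unite(..., sep = "") while keeping Region and QST
demo$ID <- paste0(demo$Region, demo$QST)
print(demo$ID)
```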
My Objective
1) For each unique ID, identify whether the given question was asked:
a) In one year only (FY13, FY14 or FY15)
b) Over the Past Two Years (FY15 & FY14 only)
c) Over the Past Three Years (FY15 & FY14 & FY13)
d) On FY13 & FY15 Only
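The four cases above can be sketched as a small function of the set of fiscal years seen for one ID. `classify_years` is a hypothetical helper name, not something from an existing package, and the labels are the ones used in my attempt further down. Note that the list does not cover FY13 & FY14 only, so the sketch returns NA for that combination.

```r
# Hypothetical helper: map the fiscal years recorded for one ID to a label.
classify_years <- function(years) {
  v <- unique(as.character(years))  # as.character() also handles factor input
  if (all(c("FY13", "FY14", "FY15") %in% v)) {
    "Asked Over The Past Three Years"
  } else if (all(c("FY14", "FY15") %in% v)) {
    "Asked Over The Past Two Years"
  } else if (all(c("FY13", "FY15") %in% v)) {
    "Question Asked in FY13 & FY15 Only"
  } else if (length(v) == 1) {
    "Question Asked Once Only"
  } else {
    NA_character_  # assumption: FY13 & FY14 only is not in the list, so no label
  }
}

print(classify_years(c("FY14", "FY15")))
print(classify_years(c("FY13", "FY15")))
print(classify_years("FY14"))
```

The three-year case is tested first because a question asked in all three years also satisfies the two-year and the FY13 & FY15 tests.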
My Attempt
For this problem, I tried to write a for loop. For each unique ID, I first store the unique occurrences of each FY in which the question was asked in a vector v. Then, using an if/else statement, I assign a comment to a newly created column called Tally based on these occurrences.
for (i in unique(testdf$ID)) {
  v = unique(testdf$FY)
  if (('FY15' %in% v) & ('FY14' %in% v)) {
    testdf$Tally == 'Asked Over The Past Two Years'
  }
  else if (('FY15' %in% v) & ('FY14' %in% v) & ('FY13' %in% v)) {
    testdf$Tally == 'Asked Over The Past Three Years'
  }
  else if (('FY13' %in% v) & ('FY15' %in% v)) {
    testdf$Tally == 'Question Asked in FY13 & FY15 Only'
  }
  else {
    testdf$Tally == 'Question Asked Once Only'
  }
}
The loop seems to run without throwing an error message, but it doesn’t seem to create the new Tally column.
Any help with this will be greatly appreciated.
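A likely explanation, offered as a sketch rather than a definitive answer: `==` compares instead of assigning, so nothing is ever written to Tally; v is computed from the whole FY column rather than from the rows of the current ID; and the two-year test comes before the three-year test, so the three-year branch can never be reached. The version below addresses those three points on `demo`, a made-up stand-in for testdf that holds only the ID and FY columns.

```r
# Made-up stand-in for testdf: only the two columns the loop needs
demo <- data.frame(
  ID = c(rep("AFRICAQ2", 3), rep("AFRICAQ5", 2), rep("ASIAQ5", 2), "ASIAQ9"),
  FY = c("FY13", "FY14", "FY15", "FY14", "FY15", "FY13", "FY15", "FY14"),
  stringsAsFactors = FALSE
)

demo$Tally <- NA_character_            # create the column before filling it
for (i in unique(demo$ID)) {
  rows <- demo$ID == i                 # rows belonging to the current ID
  v <- unique(demo$FY[rows])           # fiscal years for this ID only
  if (all(c("FY13", "FY14", "FY15") %in% v)) {
    # most specific case first, otherwise the two-year test would catch it
    demo$Tally[rows] <- "Asked Over The Past Three Years"
  } else if (all(c("FY14", "FY15") %in% v)) {
    demo$Tally[rows] <- "Asked Over The Past Two Years"
  } else if (all(c("FY13", "FY15") %in% v)) {
    demo$Tally[rows] <- "Question Asked in FY13 & FY15 Only"
  } else {
    # as in the original attempt, everything else lands here,
    # including FY13 & FY14 only
    demo$Tally[rows] <- "Question Asked Once Only"
  }
}
print(unique(demo[, c("ID", "Tally")]))
```

Assigning with `demo$Tally[rows] <- ...` writes only to the rows of the current ID, so each ID keeps its own label instead of the last iteration overwriting the whole column.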
#r #loops