My data
dsL<-readRDS("./Data/Derived/dsL.rds")
# color palette for the outcome
attcol8<-c("Never"="#4575b4",
"Once or Twice"="#74add1",
"Less than once/month"="#abd9e9",
"About once/month"="#e0f3f8",
"About twice/month"="#fee090",
"About once/week"="#fdae61",
"Several times/week"="#f46d43",
"Everyday"="#d73027")
# view for one respondent
print (dsL[dsL$id==1,c("id","year","attend","attendF")])
id year attend attendF
1 1 1997 NA <NA>
2 1 1998 NA <NA>
3 1 1999 NA <NA>
4 1 2000 1 Never
5 1 2001 6 About once/week
6 1 2002 2 Once or Twice
7 1 2003 1 Never
8 1 2004 1 Never
9 1 2005 1 Never
10 1 2006 1 Never
11 1 2007 1 Never
12 1 2008 1 Never
13 1 2009 1 Never
14 1 2010 1 Never
15 1 2011 1 Never
Creating frequency distributions for each of the measurement wave we have:
ds<- dsL
p<-ggplot(ds, aes(x=yearF, fill=attendF))
p<-p+ geom_bar(position="fill")
p<-p+ scale_fill_manual(values = attcol8,
name="Response category" )
p<-p+ scale_y_continuous("Prevalence: proportion of total",
limits=c(0, 1),
breaks=c(.1,.2,.3,.4,.5,.6,.7,.8,.9,1))
p<-p+ scale_x_discrete("Waves of measurement",
limits=as.character(c(2000:2011)))
p<-p+ labs(title=paste0("In the past year, how often have you attended a worship service?"))
p
Missing values are used in the calculation of total responses to show the natural attrition in the study. Assumming that attrition is not significantly associated with the outcome measure, we can remove missing values from the calculation of the total of responses and look at percentages that each response was endorsed at each time point.
what can be done produce the graph i just described? and do it most efficiently? I tried na.rm=TRUE in various places, but it didn't go the trick. Any ideas?
ds<- dsL
### ???
p<-ggplot(ds, aes(x=yearF, fill=attendF))
p<-p+ geom_bar(position="fill")
p<-p+ scale_fill_manual(values = attcol8,
name="Response category" )
p<-p+ scale_y_continuous("Prevalence: proportion of total",
limits=c(0, 1),
breaks=c(.1,.2,.3,.4,.5,.6,.7,.8,.9,1))
p<-p+ scale_x_discrete("Waves of measurement",
limits=as.character(c(2000:2011)))
p<-p+ labs(title=paste0("In the past year, how often have you attended a worship service?"))
#p
After @MrFlick solution:
ds<- dsL
p<-ggplot(subset(ds, !is.na(attendF)), aes(x=yearF, fill=attendF))
p<-p+ geom_bar(position="fill")
p<-p+ scale_fill_manual(values = attcol8,
name="Response category" )
p<-p+ scale_y_continuous("Prevalence: proportion of total",
limits=c(0, 1),
breaks=c(.1,.2,.3,.4,.5,.6,.7,.8,.9,1))
p<-p+ scale_x_discrete("Waves of measurement",
limits=as.character(c(2000:2011)))
p<-p+ labs(title=paste0("In the past year, how often have you attended a worship service?"))
#p
The easiest place to drop them is when you set the data set for the plot
p <- ggplot(subset(ds, !is.na(attendF)), aes(x=yearF, fill=attendF))
Here i've created some sample data (which would have been helpful in the initial question) and re-ran your plotting commands after subsetting
ds<-data.frame(
id=rep(1:100, each=4),
yearF=factor(rep(2001:2004, 100)),
attendF=sample(1:8, 400, T, c(.2,.2,.15,.10,.10, .20, .15, .02))
)
ds[sample(which(ds$year==2002), 5), "attendF"]<-NA
ds[sample(which(ds$year==2003), 15), "attendF"]<-NA
ds[sample(which(ds$year==2004), 40), "attendF"]<-NA
attcol8<-c("Never"="#4575b4",
"Once or Twice"="#74add1",
"Less than once/month"="#abd9e9",
"About once/month"="#e0f3f8",
"About twice/month"="#fee090",
"About once/week"="#fdae61",
"Several times/week"="#f46d43",
"Everyday"="#d73027")
ds$attendF<-factor(ds$attendF, levels=1:8, labels=names(attcol8))
library(ggplot2)
p<-ggplot(subset(ds, !is.na(attendF)), aes(x=yearF, fill=attendF))
p<-p+ geom_bar(position="fill")
p<-p+ scale_fill_manual(values = attcol8,
name="Response category" )
p<-p+ scale_y_continuous("Prevalence: proportion of total",
limits=c(0, 1),
breaks=c(.1,.2,.3,.4,.5,.6,.7,.8,.9,1))
p<-p+ scale_x_discrete("Waves of measurement",
limits=as.character(c(2001:2004)))
p<-p+ labs(title=paste0("In the past year, how often have you attended a worship service?"))
p
This gave the following plot