I have a set of data that is tough to visualize, but I think an ECDF with a couple of points and lines added to it will do the trick. I am able to plot things the way that I want; my problem is coloring things correctly.
I have the following code, which puts all of the right lines and points on the plot, but now I would like to properly color and label everything. I've pored over multiple articles and tried a hundred things, but can't get it right. Do I need to format my data differently?
My vision for the legend is something like this:
Code for generating an example plot is here:
require(ggplot2)
require(reshape2)
s.a = rnorm(100)*100
s.b = rnorm(100)*100+50
d.a = -35
d.b = 20
sdata = data.frame(cbind(s.a,s.b))
ddata = data.frame(cbind(d.a,d.b))
sdata.m = melt(sdata)
ddata.m = melt(ddata)
ggplot(sdata.m, aes(x=value, color=variable)) +
  geom_vline(data=ddata.m,
             aes(xintercept = value,
                 color=variable),
             linetype = 2,
             size=2) +
  stat_ecdf(size=1)+
  labs(title = 'plotTitle',
       color='colorLegendTitle') +
  xlab('xLabel') +
  ylab('yLabel')+
  theme_bw(30) +
  theme(
    legend.position=c(.8, .2),
    legend.box="horizontal",
    text=element_text(family="Times"),
    legend.key.size = unit(1,"cm")) +
  geom_point(x=mean(sdata.m$value[sdata.m$variable=="s.a"]),y=.5,
             size = 5) +
  geom_point(x=mean(sdata.m$value[sdata.m$variable=="s.b"]),y=.5,
             size = 5)
Some context on the data I'm plotting: I have stochastic datasets (s) and deterministic sets (d); each stochastic set will have hundreds of values, while the deterministic sets only have a single value. So in my plot, I'm comparing the distribution of stochastic data (solid lines), and the mean of stochastic data (dots) with the deterministic values (dashed lines). For both the stochastic and deterministic datasets, there are two 'cases' (a) and (b). I would like all (a) and (b) data to share the same color.
This seems like it should be easy with aes and color/linetype/geom mappings, but I can't figure it out.
Thanks in advance.
To get a better legend, place color=variable and linetype=variable inside aes() for both ggplot() and geom_vline(), so that there is a single legend. Then, for geom_point(), place x and y inside aes() as well, together with color="s.mean" and linetype="s.mean". This ensures that a new level is added to the legend. Now, with scale_color_manual() and scale_linetype_manual(), you can set the desired colors and linetypes. With guides() and override.aes=, you can remove the points from the first four entries.
ggplot(sdata.m, aes(x=value, color=variable,linetype=variable))+
  stat_ecdf(size=1)+
  geom_vline(data=ddata.m,
             aes(xintercept = value,color=variable,linetype=variable),
             size=2) +
  geom_point(aes(x=mean(sdata.m$value[sdata.m$variable=="s.a"]),
                 color="s.mean",linetype="s.mean",y=.5),size = 5) +
  geom_point(aes(x=mean(sdata.m$value[sdata.m$variable=="s.b"]),
                 color="s.mean",linetype="s.mean",y=.5),size = 5)+
  scale_color_manual(breaks=c("d.a","d.b","s.a","s.b","s.mean"),
                     values=c("blue","blue","red","red","green"))+
  scale_linetype_manual(breaks=c("d.a","d.b","s.a","s.b","s.mean"),
                        values=c(1,2,1,2,0))+
  guides(color=guide_legend(override.aes=list(shape=c(NA,NA,NA,NA,16))))
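A possible variation, if the goal is for each case, (a) and (b), to keep one color across both its stochastic and deterministic data as the question describes, is to reshape the data so that case and type are separate columns. Color can then map to the case and linetype to the type. This is only a sketch under assumptions: the column names `case` and `type`, the data frame names `sdata.l`, `ddata.l` and `smean.l`, the seed, and the chosen colors are all made up for illustration and are not part of the original code.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed only so the example is reproducible

# Long-format data with explicit `case` and `type` columns (hypothetical names)
sdata.l <- data.frame(
  value = c(rnorm(100) * 100, rnorm(100) * 100 + 50),
  case  = rep(c("a", "b"), each = 100),
  type  = "stochastic"
)
ddata.l <- data.frame(
  value = c(-35, 20),
  case  = c("a", "b"),
  type  = "deterministic"
)

# One mean per case, drawn at y = 0.5 as in the question
smean.l <- aggregate(value ~ case, data = sdata.l, FUN = mean)

# `size` sets the line width here, as in the code above;
# newer ggplot2 releases prefer `linewidth` for lines and may warn.
p <- ggplot(sdata.l, aes(x = value, color = case)) +
  stat_ecdf(aes(linetype = type), size = 1) +
  geom_vline(data = ddata.l,
             aes(xintercept = value, color = case, linetype = type),
             size = 2) +
  geom_point(data = smean.l,
             aes(y = 0.5, shape = "stochastic mean"), size = 5) +
  # Named vectors tie each value to a level regardless of level order
  scale_color_manual(values = c(a = "red", b = "blue")) +
  scale_linetype_manual(values = c(stochastic = "solid",
                                   deterministic = "dashed")) +
  scale_shape_manual(name = NULL, values = c("stochastic mean" = 16))

print(p)
```

With this layout the legend splits into a color key for the case, a linetype key for the type, and a shape key for the mean, so the manual scales no longer depend on the ordering of a single combined variable.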