line graph with 2 categorical variables and 1 continuous in R

Val picture Val · Mar 16, 2015 · Viewed 9k times · Source

I'm quite new to R and statistics in general. I am trying to plot in a line graph 2 categorical variables (part of speech "pos", condition "trcond") and a numerical one (score "totacc") in ggplot2.

> df1<-df[, c("trcond", "subtitle", "pos", "totacc")]
> head(df1)
   trcond     subtitle pos totacc
7       L New Scene_16 lex  0.250
29      N New Scene_16 lex  0.500
8       L New Scene_25 lex  0.875
30      N New Scene_25 lex  0.666
9       L New Scene_29 lex  1.000
31      N New Scene_29 lex  0.833

I have used this ggplot2 command:

>ggplot(data=summdfo, aes(x=pos, y=totacc, group=trcond, colour=trcond))
+ geom_line() + geom_point()

But it is not working, the graph has coloured (blue and red) dots all over the place and more than just two lines linking them. I would like to post the graph I get as I lack words to explain but this is my first post and I don't seem to be able to upload pictures.

I would like to get a standard simple 2-line graph such as the blue and red ones in this page (where y=total bill, by x=time (lunch,dinner) grouped by gender): http://www.cookbook-r.com/Graphs/Bar_and_line_graphs_%28ggplot2%29/

Is this possible with my data set at all? If so, what am I doing wrong with the code?

Answer

arman picture arman · Mar 17, 2015

Here I tried to create a data frame based on limited sample from your data.

df1 <- data.frame(trcond=rep(c('L', 'N'), 3), 
                  subtitle=rep('New Scene_29', 6),  # Not in use, just a dummy
                  pos=c('lex', 'lex', 'lex', 'noLex', 'noLex', 'noLex'), 
                  totacc=c(0.250, 0.5, 0.875, 0.666, 1.000, 0.833))

Because trcond by pos is not balanced in this data frame, the plot is going to be jumbled up like this:

ggplot(data=df1, aes(x=pos, y=totacc, group=trcond, color=trcond))+ 
  geom_line() + 
  geom_point()

enter image description here However, if you apply a summary function which will compute means for each condition, a correct plot will appear:

ggplot(data=df1, aes(x=pos, y=totacc, group=trcond, color=trcond))+ 
  geom_line(stat='summary', fun.y='mean') + 
  geom_point(stat='summary', fun.y='mean')

enter image description here Again, this is trying to figure out what's in your data. The best is that you provide here a sample of your data using dput(head(df1, 50)) to give you a better answer.