Order categorical data in a stacked bar plot with ggplot2

Dominik · Aug 22, 2011 · Viewed 10.8k times

I have a data frame with the following entries:

dput(MilDis[1:200,])
structure(list(hhDomMil = c("HED", "ETB", "HED", "ETB", "PER", 
"BUM", "EXP", "TRA", "TRA", "PMA", "MAT", "MAT", "KON", "ETB", 
"PMA", "PMA", "HED", "BUM", "BUM", "HED", "PMA", "PMA", "HED", 
"TRA", "BUM", "EXP", "BUM", "PMA", "ETB", "MAT", "ETB", "ETB", 
"KON", "MAT", "TRA", "BUM", "BUM", "TRA", "TRA", "PMA", "PMA", 
"PMA", "MAT", "ETB", "TRA", "BUM", "TRA", "MAT", "BUM", "ETB", 
"TRA", "TRA", "BUM", "KON", "ETB", "ETB", "ETB", "BUM", "KON", 
"ETB", "ETB", "PMA", "TRA", "PER", "PER", "MAT", "HED", "KON", 
"TRA", "TRA", "TRA", "EXP", "TRA", "BUM", "MAT", "MAT", "TRA", 
"PMA", "HED", "PER", "TRA", "PER", "EXP", "PER", "BUM", "KON", 
"BUM", "ETB", "ETB", "TRA", "PER", "ETB", "KON", "KON", "BUM", 
"ETB", "BUM", "MAT", "BUM", "KON", "KON", "ETB", "MAT", "KON", 
"PER", "ETB", "ETB", "KON", "PMA", "PER", "HED", "HED", "PMA", 
"MAT", "PMA", "PER", "PMA", "TRA", "TRA", "MAT", "BUM", "BUM", 
"KON", "ETB", "ETB", "ETB", "PMA", "TRA", "TRA", "PMA", "PER", 
"KON", "PER", "BUM", "KON", "ETB", "ETB", "BUM", "TRA", "ETB", 
"PMA", "HED", "MAT", "TRA", "BUM", "PMA", "BUM", "ETB", "TRA", 
"TRA", "TRA", "PER", "EXP", "HED", "BUM", "EXP", "HED", "BUM", 
"MAT", "DDR", "BUM", "MAT", "KON", "HED", "HED", "TRA", "BUM", 
"PMA", "PMA", "PMA", "KON", "KON", "MAT", "ETB", "MAT", "TRA", 
"MAT", "ETB", "ETB", "TRA", "MAT", "ETB", "TRA", "HED", "BUM", 
"MAT", "TRA", "PMA", "BUM", "BUM", "EXP", "ETB", "EXP", "EXP", 
"MAT", "TRA", "KON", "BUM", "BUM", "HED"), kclust = c(1L, 2L, 
15L, 4L, 5L, 6L, 5L, 7L, 8L, 5L, 6L, 5L, 11L, 6L, 5L, 1L, 9L, 
10L, 2L, 1L, 9L, 8L, 4L, 11L, 14L, 5L, 8L, 11L, 12L, 5L, 5L, 
14L, 15L, 2L, 10L, 6L, 8L, 4L, 6L, 8L, 14L, 14L, 16L, 10L, 5L, 
1L, 12L, 17L, 12L, 16L, 16L, 5L, 10L, 14L, 8L, 19L, 5L, 4L, 4L, 
14L, 2L, 14L, 9L, 7L, 1L, 14L, 4L, 15L, 18L, 16L, 9L, 14L, 6L, 
14L, 12L, 11L, 4L, 7L, 8L, 12L, 9L, 16L, 2L, 6L, 15L, 1L, 1L, 
3L, 14L, 5L, 5L, 9L, 14L, 6L, 5L, 14L, 15L, 2L, 14L, 2L, 1L, 
8L, 5L, 10L, 1L, 1L, 16L, 5L, 2L, 9L, 9L, 1L, 12L, 10L, 1L, 4L, 
1L, 9L, 8L, 8L, 5L, 10L, 1L, 10L, 2L, 6L, 15L, 2L, 2L, 10L, 5L, 
6L, 10L, 19L, 19L, 6L, 5L, 6L, 7L, 7L, 8L, 5L, 16L, 5L, 6L, 6L, 
1L, 10L, 12L, 4L, 7L, 19L, 7L, 8L, 16L, 10L, 5L, 16L, 12L, 7L, 
7L, 19L, 4L, 6L, 1L, 15L, 7L, 8L, 16L, 4L, 10L, 15L, 11L, 10L, 
1L, 10L, 17L, 1L, 2L, 1L, 14L, 8L, 8L, 14L, 10L, 8L, 6L, 6L, 
8L, 5L, 7L, 5L, 1L, 5L, 7L, 9L, 2L, 1L, 9L, 14L), order = c(9, 
1, 9, 1, 3, 7, 10, 5, 5, 2, 8, 8, 4, 1, 2, 2, 9, 7, 7, 9, 2, 
2, 9, 5, 7, 10, 7, 2, 1, 8, 1, 1, 4, 8, 5, 7, 7, 5, 5, 2, 2, 
2, 8, 1, 5, 7, 5, 8, 7, 1, 5, 5, 7, 4, 1, 1, 1, 7, 4, 1, 1, 2, 
5, 3, 3, 8, 9, 4, 5, 5, 5, 10, 5, 7, 8, 8, 5, 2, 9, 3, 5, 3, 
10, 3, 7, 4, 7, 1, 1, 5, 3, 1, 4, 4, 7, 1, 7, 8, 7, 4, 4, 1, 
8, 4, 3, 1, 1, 4, 2, 3, 9, 9, 2, 8, 2, 3, 2, 5, 5, 8, 7, 7, 4, 
1, 1, 1, 2, 5, 5, 2, 3, 4, 3, 7, 4, 1, 1, 7, 5, 1, 2, 9, 8, 5, 
7, 2, 7, 1, 5, 5, 5, 3, 10, 9, 7, 10, 9, 7, 8, 6, 7, 8, 4, 9, 
9, 5, 7, 2, 2, 2, 4, 4, 8, 1, 8, 5, 8, 1, 1, 5, 8, 1, 5, 9, 7, 
8, 5, 2, 7, 7, 10, 1, 10, 10, 8, 5, 4, 7, 7, 9)), .Names = c("hhDomMil", 
"kclust", "order"), row.names = c(NA, 200L), class = "data.frame")

I want to create a stacked bar plot like this one: Barplot.

The only problem is that I would like the stacks to appear in this order (ETB, PMA, PER, KON, TRA, DDR, BUM, MAT, HED, EXP), which matches the numbers in the order column of the data. I also have some aesthetic problems. I searched for a solution here, but none of the ordering suggestions worked for me... :-\

  1. How do I plot the stacks in this order?
  2. How do I set up x so that each bar is "on" one number?
  3. How do I separate the bars? Here I tried that with a white border.
  4. How do I print all kclust numbers on the x axis?

Thanks a lot for your help! Dominik


UPDATE

Here is the code I used to draw my plot:

mycols <- c('#FFFD00', '#97CB00', '#3168FF', '#FF0200', '#FB02FE',
            '#CCFCCC', '#FE9900', '#98CBF8', '#00CCFF', '#00FD03') # Set milieu colors

ggplot(MilDis) +
  geom_bar(aes(kclust, fill=factor(hhDomMil),
               colour=mycols), position='fill', binwidth=1, colour='white') +
  scale_fill_manual(values = mycols)

UPDATE 2:

That's how I did it now:

    mycols <- c('#3168FF', '#00CCFF', '#98CBF8', '#CCFCCC', '#00FD03',
                '#97CB00', '#FFFD00', '#FE9900', '#FB02FE', '#FF0200') # Set milieu colors

    ggplot(MilDis) +
      geom_bar(aes(factor(kclust), fill=reorder(hhDomMil,order)),
               position='fill') +
      scale_fill_manual(values = mycols)

With this result:

Image

Thank you all for your help!

Answer

Gavin Simpson · Aug 22, 2011

This is easily solved by getting your data formatted correctly before passing it to ggplot(). The key is to explicitly set the levels of the hhDomMil factor. Assuming your data are in dat:

dat <- transform(dat, hhDomMil = factor(hhDomMil,
                                        levels = c("ETB", "PMA", "PER", "KON",
                                                   "TRA", "DDR", "BUM", "MAT",
                                                   "HED", "EXP")))

That converts hhDomMil to a factor in place inside dat and sets the levels in the order you wanted:

> head(dat$hhDomMil)
[1] HED ETB HED ETB PER BUM
Levels: ETB PMA PER KON TRA DDR BUM MAT HED EXP

Notice what happens when R coerces hhDomMil to a factor without explicit levels:

> head(factor(as.character(dat$hhDomMil)))
[1] HED ETB HED ETB PER BUM
Levels: BUM DDR ETB EXP HED KON MAT PER PMA TRA

The default is to sort the levels alphabetically, which is why the plot is coming out as you show.
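To make the difference concrete, here is a minimal base R sketch on a made-up toy vector (the names x, toy, wanted and lev are illustrative, not from the question). It also shows one way to derive the level order from a numeric rank column such as order, assuming each category always carries the same rank, as it does in the posted data.

```r
# Toy vector using a few of the category codes from the question
x <- c("HED", "ETB", "PER", "BUM", "ETB")

# Default: the levels are the sorted unique values
levels(factor(x))

# Explicit: the levels follow the order you give
wanted <- c("ETB", "PER", "BUM", "HED")
f <- factor(x, levels = wanted)
levels(f)

# Alternative: derive the level order from a rank column instead of typing it
toy <- data.frame(hhDomMil = x,
                  order    = c(9, 1, 3, 7, 1),
                  stringsAsFactors = FALSE)
lev <- unique(toy$hhDomMil[order(toy$order)])
toy$hhDomMil <- factor(toy$hhDomMil, levels = lev)
levels(toy$hhDomMil)
```

Both the hand-typed and the derived variant give the same levels here; the derived one saves typing when the rank is already stored in the data.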

The best advice I can give is to get your data correctly formatted first and only then try to plot it. Don't rely on automatic or on-the-fly conversion to get this right for you; inevitably it won't be what you want.
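As a rough sketch of that workflow, the following formats a small made-up data frame first and only then plots it. The sample rows are invented for illustration and are not the asker's data. The colour-to-category pairing assumes the colours from UPDATE 2 are listed in the same order as the desired levels, and naming the colour vector is a choice made here so the mapping does not depend on which categories happen to be present.

```r
library(ggplot2)

# Small made-up sample in the same shape as MilDis (illustrative only)
dat <- data.frame(
  hhDomMil = c("HED", "ETB", "PER", "BUM", "ETB", "TRA", "PMA", "ETB"),
  kclust   = c(1L, 2L, 2L, 1L, 4L, 4L, 2L, 1L),
  stringsAsFactors = FALSE
)

milieus <- c("ETB", "PMA", "PER", "KON", "TRA",
             "DDR", "BUM", "MAT", "HED", "EXP")

# 1. Fix the category order before plotting
dat$hhDomMil <- factor(dat$hhDomMil, levels = milieus)

# 2. Treat the cluster id as discrete, so every cluster gets its own
#    labelled position on the x axis
dat$kclust <- factor(dat$kclust)

# Named colours (assumed pairing): each category keeps its colour even if
# some categories are absent from the data
mycols <- setNames(c("#3168FF", "#00CCFF", "#98CBF8", "#CCFCCC", "#00FD03",
                     "#97CB00", "#FFFD00", "#FE9900", "#FB02FE", "#FF0200"),
                   milieus)

p <- ggplot(dat, aes(x = kclust, fill = hhDomMil)) +
  geom_bar(position = "fill", colour = "white") +  # white outline around segments
  scale_fill_manual(values = mycols)

print(p)
```

With the factor conversion done up front, the plotting call itself needs no factor() or reorder() wrappers.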