I am trying to create a dendrogram, were my samples have 5 group codes (act as sample name/species/etc but its repetitive).
Therefore, I have two issues that a help will be great:
How can I show the group codes in leaf label (instead of the sample number)?
I wish to assign a color to each code group and colored the leaf label according to it (it might happen that they will not be in the same clade and by that I can find more information)?
Is it possible to do so with my script to do so (ape or ggdendro):
sample<-read.table("C:/.../DOutput.txt", header=F, sep="")
groupCodes <- sample[,1]
sample2<-sample[,2:100]
d <- dist(sample2, method = "euclidean")
fit <- hclust(d, method="ward")
plot(as.phylo(fit), type="fan")
ggdendrogram(fit, theme_dendro=FALSE)
A random dataframe to replace my read.table:
sample = data.frame(matrix(floor(abs(rnorm(20000)*100)),ncol=200))
groupCodes <- c(rep("A",25), rep("B",25), rep("C",25), rep("D",25)) # fixed error
sample2 <- data.frame(cbind(groupCodes), sample)
Here is a solution for this question using a new package called "dendextend", built exactly for this sort of thing.
You can see many examples in the presentations and vignettes of the package, in the "usage" section in the following URL: https://github.com/talgalili/dendextend
Here is the solution for this question: (notice the importance of how to re-order the colors to first fit the data, and then to fit the new order of the dendrogram)
####################
## Getting the data:
sample = data.frame(matrix(floor(abs(rnorm(20000)*100)),ncol=200))
groupCodes <- c(rep("Cont",25), rep("Tre1",25), rep("Tre2",25), rep("Tre3",25))
rownames(sample) <- make.unique(groupCodes)
colorCodes <- c(Cont="red", Tre1="green", Tre2="blue", Tre3="yellow")
distSamples <- dist(sample)
hc <- hclust(distSamples)
dend <- as.dendrogram(hc)
####################
## installing dendextend for the first time:
install.packages('dendextend')
####################
## Solving the question:
# loading the package
library(dendextend)
# Assigning the labels of dendrogram object with new colors:
labels_colors(dend) <- colorCodes[groupCodes][order.dendrogram(dend)]
# Plotting the new dendrogram
plot(dend)
####################
## A sub tree - so we can see better what we got:
par(cex = 1)
plot(dend[[1]], horiz = TRUE)