While duplicated row (and column) names are allowed in a matrix, they are not allowed in a data.frame. Trying to rbind() data frames that share some row names highlights this problem. Consider the two data frames below:
foo = data.frame(a=1:3, b=5:7)
rownames(foo)=c("w","x","y")
bar = data.frame(a=c(2,4), b=c(6,8))
rownames(bar)=c("x","z")
#   foo          bar
#   a b          a b
# w 1 5        x 2 6
# x 2 6        z 4 8
# y 3 7
Now try to rbind() them (pay attention to the row names):
rbind(foo, bar)
#    a b
# w  1 5
# x  2 6
# y  3 7
# x1 2 6
# z  4 8
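The renamed row `x1` matches what base R's `make.unique()` produces with an empty separator. A quick check, assuming that is how `rbind()` disambiguates the clashing names (the default `sep = "."` would give `x.1` instead):

```r
foo <- data.frame(a = 1:3, b = 5:7, row.names = c("w", "x", "y"))
bar <- data.frame(a = c(2, 4), b = c(6, 8), row.names = c("x", "z"))

combined <- rbind(foo, bar)
# expected row names: w, x, y, x1, z
print(rownames(combined))

# Compare with make.unique() applied to the concatenated row names
same <- identical(rownames(combined),
                  make.unique(c(rownames(foo), rownames(bar)), sep = ""))
print(same)
```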
But for a matrix:
rbind(as.matrix(foo), as.matrix(bar))
#   a b
# w 1 5
# x 2 6
# y 3 7
# x 2 6
# z 4 8
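Since the matrix keeps both `x` rows, one possible workaround (not from the original post) is to flag repeated names with `duplicated()`, drop them, and convert back. This is a sketch under the assumption that coercing every column to one common type is acceptable, because a matrix can hold only a single type:

```r
foo <- data.frame(a = 1:3, b = 5:7, row.names = c("w", "x", "y"))
bar <- data.frame(a = c(2, 4), b = c(6, 8), row.names = c("x", "z"))

m <- rbind(as.matrix(foo), as.matrix(bar))

# duplicated() is TRUE for every repeat after the first occurrence
dup <- duplicated(rownames(m))
print(dup)

# Keep only first occurrences, then turn the matrix back into a data frame;
# drop = FALSE keeps the result a matrix even if one row remained
dedup <- as.data.frame(m[!dup, , drop = FALSE])
print(dedup)
```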
Here is the problem: how can I rbind() two data frames while dropping the duplicated rows (those with the same row name)?
How about this?
duprows <- which(!is.na(match(rownames(bar), rownames(foo))))
rbind(foo, bar[-duprows, ])
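One caveat with negative numeric indexing: when no row names match, `duprows` is `integer(0)`, and `bar[-integer(0), ]` selects no rows at all, so the whole second data frame silently disappears. The sketch below uses a hypothetical data frame `baz` that shares no row names with `foo` to show this, plus a simple guard:

```r
foo <- data.frame(a = 1:3, b = 5:7, row.names = c("w", "x", "y"))
# Hypothetical frame with no row names in common with foo
baz <- data.frame(a = c(10, 11), b = c(20, 21), row.names = c("p", "q"))

duprows <- which(!is.na(match(rownames(baz), rownames(foo))))
print(length(duprows))        # nothing matched

# -integer(0) is still integer(0), which selects zero rows
print(nrow(baz[-duprows, ]))

# Guard the empty case so every row of baz is kept
keep <- if (length(duprows) > 0) baz[-duprows, ] else baz
result <- rbind(foo, keep)
print(result)
```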
Or, using %in% (as suggested in the comments), which also works when no row names match:
duprows <- rownames(bar) %in% rownames(foo)
rbind(foo, bar[!duprows,])
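The logical-index approach can be wrapped in a small function. `rbind_new_rows` is a hypothetical name, not part of base R; `drop = FALSE` keeps the subset a data frame even if `y` has a single column:

```r
# Hypothetical helper: append only the rows of y whose row names are not in x
rbind_new_rows <- function(x, y) {
  is_new <- !(rownames(y) %in% rownames(x))   # TRUE = row name not yet in x
  rbind(x, y[is_new, , drop = FALSE])
}

foo <- data.frame(a = 1:3, b = 5:7, row.names = c("w", "x", "y"))
bar <- data.frame(a = c(2, 4), b = c(6, 8), row.names = c("x", "z"))

res <- rbind_new_rows(foo, bar)
print(res)   # rows w, x, y, z; the "x" row comes from foo
```

When every name in `y` is already in `x`, the subset has zero rows and `x` is returned unchanged.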
Several variations are possible, depending on (1) whether you select the matched or the unmatched rows, and (2) whether you index with numeric positions or with a logical vector.
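A sketch of those variations side by side, using the same `foo` and `bar`. Deriving the numeric positions with `which(!matched_lgl)` rather than negating `which(matched_lgl)` avoids the empty-match pitfall of negative indexing:

```r
foo <- data.frame(a = 1:3, b = 5:7, row.names = c("w", "x", "y"))
bar <- data.frame(a = c(2, 4), b = c(6, 8), row.names = c("x", "z"))

# (2) the same match expressed as logical values and as numeric positions
matched_lgl   <- rownames(bar) %in% rownames(foo)  # TRUE where name is in foo
matched_num   <- which(matched_lgl)                # positions of matched rows
unmatched_num <- which(!matched_lgl)               # positions of unmatched rows

# (1) unmatched rows (the ones to append), by logical or numeric index
new_by_lgl <- bar[!matched_lgl, , drop = FALSE]
new_by_num <- bar[unmatched_num, , drop = FALSE]

# (1) matched rows, e.g. to inspect what would be dropped
dup_rows <- bar[matched_lgl, , drop = FALSE]

print(rownames(new_by_lgl))
print(rownames(dup_rows))
```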