I am trying to figure out why the rbind function is not working as intended when joining data.frames without names.
Here is my test code:
test <- data.frame(
  id = rep(c("a", "b"), each = 3),
  time = rep(1:3, 2),
  black = 1:6,
  white = 1:6,
  stringsAsFactors = FALSE
)
# take some subsets with different names
pt1 <- test[,c(1,2,3)]
pt2 <- test[,c(1,2,4)]
# method 1 - rename to same names - works
names(pt2) <- names(pt1)
rbind(pt1,pt2)
# method 2 - works - even with duplicate names
names(pt1) <- letters[c(1,1,1)]
names(pt2) <- letters[c(1,1,1)]
rbind(pt1,pt2)
# method 3 - works - with a vector of NA's as names
names(pt1) <- rep(NA,ncol(pt1))
names(pt2) <- rep(NA,ncol(pt2))
rbind(pt1,pt2)
# method 4 - but... does not work without names at all?
pt1 <- unname(pt1)
pt2 <- unname(pt2)
rbind(pt1,pt2)
This seems a bit odd to me. Am I missing a good reason why this shouldn't work out of the box?
Edit: additional info
Using @JoshO'Brien's suggestion to debug, I can identify the error as occurring in this if statement inside the rbind.data.frame function:
if (is.null(pi) || is.na(jj <- pi[[j]]))
(The source is online at http://svn.r-project.org/R/trunk/src/library/base/R/dataframe.R; the relevant code starts at the comment "### Here are the methods for rbind and cbind.")
From stepping through the program, the local variable pi does not appear to have been set at this point, so the program ends up indexing the built-in constant pi instead, as in pi[[3]]. Since the built-in pi has length 1, this errors out with "subscript out of bounds".
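The fallback to the built-in constant follows from R's lexical scoping: a variable only exists in a function's frame once it has been assigned, so an unassigned pi is looked up in the enclosing environments and found in base. A minimal sketch of that behaviour, using a hypothetical function get_pi() that is not part of the rbind.data.frame source:

```r
# Hypothetical function: `pi` only becomes a local variable once it is assigned
get_pi <- function(assign_local) {
  if (assign_local) pi <- c(10, 20, 30)
  pi  # with no local assignment, lookup falls through to base::pi
}

get_pi(TRUE)            # 10 20 30
get_pi(FALSE)           # 3.141593, the built-in constant
is.null(get_pi(FALSE))  # FALSE, so an is.null(pi) guard does not catch this

# Indexing the length-1 constant past its end fails
msg <- tryCatch(pi[[3]], error = function(e) conditionMessage(e))
msg                     # "subscript out of bounds"
```

This also explains why the is.null(pi) check in the quoted if statement does not protect against the error.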
From what I can figure, the internal pi object is never set because of this earlier line, where clabs has been initialized as NULL:
if (is.null(clabs)) clabs <- names(xi) else { #pi gets set here
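Assuming the quoted line runs once per data frame argument, a heavily simplified model of that branch shows why the else branch is never reached: names(xi) is NULL for an unnamed data frame, so clabs is still NULL when the next data frame arrives. The function reaches_matching_branch() below is hypothetical and is not the real base R code; it only mirrors the branch structure of the quoted line.

```r
# Hypothetical, simplified model of the column-label bookkeeping in
# rbind.data.frame; NOT the real base R source.
reaches_matching_branch <- function(...) {
  clabs <- NULL
  reached <- FALSE
  for (xi in list(...)) {
    if (is.null(clabs)) {
      # Taken for the first data frame, and for every data frame
      # if names(xi) is NULL, because clabs then stays NULL
      clabs <- names(xi)
    } else {
      # In the real code, this is where pi (the column matching) is set
      reached <- TRUE
    }
  }
  reached
}

named   <- data.frame(a = 1:2, b = 3:4)
unnamed <- unname(named)

reaches_matching_branch(named, named)      # TRUE
reaches_matching_branch(unnamed, unnamed)  # FALSE
```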
I am in a tangle trying to figure this out, but will update as it comes together.
This happens because unname() and explicitly assigning NA as column headers are not the same operation. When the column names are all NA, rbind() is possible. But rbind() matches the columns of data frames by their names, and after unname() there are no names to match at all (names() returns NULL), so rbind() fails.
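To illustrate that rbind() on data frames lines columns up by name rather than by position, here is a small example with two made-up data frames whose columns are in a different order:

```r
a <- data.frame(x = 1, y = 2)
b <- data.frame(y = 20, x = 10)  # same names, different column order

r <- rbind(a, b)
r$x  # 1 10: the x values line up despite the different order
r$y  # 2 20
```

With no names at all there is nothing for this matching step to work with.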
Here is some code to help see what I mean:
> c1 <- c(1,2,3)
> c2 <- c('A','B','C')
> df1 <- data.frame(c1,c2)
> df1
c1 c2
1 1 A
2 2 B
3 3 C
> df2 <- data.frame(c1,c2) # df1 & df2 are identical
>
> #Let's perform unname on one data frame &
> #replacement with NA on the other
>
> unname(df1)
NA NA
1 1 A
2 2 B
3 3 C
> tem1 <- names(unname(df1))
> tem1
NULL
>
> #Note above: although the column headers print as NA, names() is NULL
>
> names(df2) <- rep(NA,ncol(df2))
> df2
NA NA
1 1 A
2 2 B
3 3 C
> tem2 <- names(df2)
> tem2
[1] NA NA
>
> #Though unname(df1) & df2 look identical, they aren't
> #Also note difference in tem1 & tem2
>
> identical(unname(df1),df2)
[1] FALSE
>
I hope this helps. The names print as NA in both cases, but the two operations are different.
Hence, two data frames whose column names have been set to NA can be "rbound", but two data frames with no column names at all (as produced by unname()) cannot.
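If the goal is simply to stack unnamed data frames by position, one workaround in the spirit of method 1 from the question is to give every data frame the same temporary names, rbind, and strip the names again. This is a minimal sketch: rbind_by_position() is a hypothetical helper, and it assumes all inputs have the same number of columns in the same order.

```r
# Hypothetical helper: stack data frames by column position
rbind_by_position <- function(...) {
  dfs <- lapply(list(...), function(d) {
    names(d) <- paste0("V", seq_along(d))  # same placeholder names everywhere
    d
  })
  unname(do.call(rbind, dfs))  # drop the placeholder names afterwards
}

df1 <- data.frame(c1 = c(1, 2, 3), c2 = c("A", "B", "C"))
out <- rbind_by_position(unname(df1), unname(df1))
nrow(out)          # 6
is.null(names(out))  # TRUE
```

Because the placeholder names are identical across inputs, rbind()'s name matching reduces to matching by position.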