Merging data frames with different number of rows and different columns

rar · Feb 4, 2016 · Viewed 41.3k times

I have two data frames with different numbers of columns and rows. I want to combine them side by side into one data frame.

> month.saf
   Name NCDC    Year    Month   Day HrMn    Temp    Q
244 AP  99999   2014    2       1   0      12       1
245 AP  99999   2014    2       1   300    12.2     1
246 AP  99999   2014    2       1   600    14.4     1
247 AP  99999   2014    2       1   900    18.6     1
248 AP  99999   2014    2       1   1200   18       1
249 AP  99999   2014    2       1   1500   13.6     1
250 AP  99999   2014    2       1   1800   11.8     1
251 AP  99999   2014    2       1   2100   10.8     1
252 AP  99999   2014    2       2   0      8.4      1
253 AP  99999   2014    2       2   300    8.6      1
254 AP  99999   2014    2       2   600    19.8     2
255 AP  99999   2014    2       2   900    22.8     1
256 AP  99999   2014    2       2   1200   20.8     1
257 AP  99999   2014    2       2   1500   16.4     1
258 AP  99999   2014    2       2   1800   13.4     1
259 AP  99999   2014    2       2   2100   12.4     1
> T2Mdf
                    V1               V2
0     293.494262695312 291.642639160156
300   294.003479003906 292.375091552734
600   296.809997558594 295.207885742188
900   298.287811279297 297.181549072266
1200  298.317565917969 297.725708007813
1500  298.134002685547 296.226165771484
1800  296.006805419922 293.354248046875
2100  293.785491943359 293.547210693359
0.1   294.638732910156 293.019866943359
300.1 292.179992675781 291.256958007812

The output that I want is like this:

    Name    NCDC    Year    Month   Day HrMn    Temp    Q   V1          V2
244 AP  99999   2014        2       1   0       12      1   293.4942627 291.6426392
245 AP  99999   2014        2       1   300     12.2    1   294.003479  292.3750916
246 AP  99999   2014        2       1   600     14.4    1   296.8099976 295.2078857
247 AP  99999   2014        2       1   900     18.6    1   298.2878113 297.1815491
248 AP  99999   2014        2       1   1200    18      1   298.3175659 297.725708
249 AP  99999   2014        2       1   1500    13.6    1   298.1340027 296.2261658
250 AP  99999   2014        2       1   1800    11.8    1   296.0068054 293.354248
251 AP  99999   2014        2       1   2100    10.8    1   293.7854919 293.5472107
252 AP  99999   2014        2       2   0       8.4     1   294.6387329 293.0198669
253 AP  99999   2014        2       2   300     8.6     1   292.1799927 291.256958
254 AP  99999   2014        2       2   600     19.8    2   292.2477417 291.3471069
255 AP  99999   2014        2       2   900     22.8    1   294.2276306 294.2766418
256 AP  99999   2014        2       2   1200    20.8    1   NA          NA
257 AP  99999   2014        2       2   1500    16.4    1   NA          NA
258 AP  99999   2014        2       2   1800    13.4    1   NA          NA
259 AP  99999   2014        2       2   2100    12.4    1   NA          NA

I tried cbind, but it gives me an error:

Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 216, 220

I also tried rbind.fill(), but it gives me something like:

V1               V2                     Name        USAF  NCDC Year Month Day HrMn  I   Type QCP Temp  Q
    1  293.494262695312 291.642639160156       <NA>     NA    NA   NA    NA  NA   NA    NA  <NA>  NA <NA> NA
    2  294.003479003906 292.375091552734       <NA>     NA    NA   NA    NA  NA   NA    NA  <NA>  NA <NA> NA
    3  296.809997558594 295.207885742188       <NA>     NA    NA   NA    NA  NA   NA    NA  <NA>  NA <NA> NA
    4  298.287811279297 297.181549072266       <NA>     NA    NA   NA    NA  NA   NA    NA  <NA>  NA <NA> NA
    5  298.317565917969 297.725708007813       <NA>     NA    NA   NA    NA  NA   NA    NA  <NA>  NA <NA> NA
    6              <NA>             <NA>        AP  421820 99999 2014     2   1    0    4   FM-12 NA   12  1
    7              <NA>             <NA>        AP  421820 99999 2014     2   1  300    4   FM-12 NA 12.2  1
    8              <NA>             <NA>        AP  421820 99999 2014     2   1  600    4   FM-12 NA 14.4  1
    9              <NA>             <NA>        AP  421820 99999 2014     2   1  900    4   FM-12 NA 18.6  1
    10             <NA>             <NA>        AP  421820 99999 2014     2   1 1200    4   FM-12 NA   18  1

How is it possible to do this in R?

Answer

G. Grothendieck · Feb 4, 2016

If A and B are the two input data frames, here are some solutions:

1) merge This solution works regardless of whether A or B has more rows.

merge(data.frame(A, row.names=NULL), data.frame(B, row.names=NULL), 
  by = 0, all = TRUE)[-1]

The first two arguments could be replaced with just A and B respectively if A and B have default rownames, i.e. 1, 2, ..., or if they have consistent rownames. In that case the call reduces to merge(A, B, by = 0, all = TRUE)[-1].

For example, if we have this input:

# test inputs
A <- data.frame(BOD, row.names = letters[1:6])
B <- setNames(2 * BOD[1:2, ], c("X", "Y"))

then:

merge(data.frame(A, row.names=NULL), data.frame(B, row.names=NULL), 
  by = 0, all = TRUE)[-1]

gives:

  Time demand  X    Y
1    1    8.3  2 16.6
2    2   10.3  4 20.6
3    3   19.0 NA   NA
4    4   16.0 NA   NA
5    5   15.6 NA   NA
6    7   19.8 NA   NA
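
One caveat that a six-row example cannot show: as far as I can tell, by = 0 treats the row names as character strings, so with ten or more rows the merged result may come back in text order (1, 10, 11, ..., 2) rather than in the original row order. A minimal sketch of one way around this, assuming you are free to add a temporary integer column, is to merge on an explicit position instead of on the row names. The inputs and the column name pos below are made up for illustration.

```r
# Hypothetical inputs: 12 and 3 rows, with row names that do not line up
A <- data.frame(a = 1:12, row.names = 244:255)
B <- data.frame(b = c(10, 20, 30), row.names = c("0", "300", "600"))

# Add an integer position column to each input and drop the row names.
# "pos" is an illustrative name, not something merge() requires.
A2 <- data.frame(pos = seq_len(nrow(A)), A, row.names = NULL)
B2 <- data.frame(pos = seq_len(nrow(B)), B, row.names = NULL)

# Merge on the integer key (sorted numerically), then drop the key column
res <- merge(A2, B2, by = "pos", all = TRUE)[-1]
res
```

Because pos is an integer, the sort that merge applies keeps the rows in their original order, and the shorter input is padded with NA as before.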

1a) An equivalent variation is:

do.call("merge", c(lapply(list(A, B), data.frame, row.names=NULL), 
  by = 0, all = TRUE))[-1]
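
As a quick sanity check, the two forms can be compared on the test inputs from (1). This is only a sketch; the names res1 and res1a are chosen here for illustration.

```r
# Test inputs from (1); BOD ships with R's datasets package
A <- data.frame(BOD, row.names = letters[1:6])
B <- setNames(2 * BOD[1:2, ], c("X", "Y"))

# Solution 1: reset the row names of both inputs, then merge on row names
res1 <- merge(data.frame(A, row.names = NULL), data.frame(B, row.names = NULL),
  by = 0, all = TRUE)[-1]

# Solution 1a: build the same call from a list of inputs
res1a <- do.call("merge", c(lapply(list(A, B), data.frame, row.names = NULL),
  by = 0, all = TRUE))[-1]

# Compare the two results
identical(res1, res1a)
```

Note that merge takes exactly two data frames, so the list passed to lapply should have two elements.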

2) cbind.zoo This solution assumes that A has more rows and that B's entries are all of the same type, e.g. all numeric. A is not restricted. These conditions hold in the data of the question.

library(zoo)
data.frame(A, cbind(zoo(, 1:nrow(A)), as.zoo(B)))
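
If zoo is not available, a base R sketch under the same assumption (A has at least as many rows as B) is to pad B with NA rows by indexing past its last row and then use cbind. This is a different technique from cbind.zoo, offered only as an alternative; B_padded is a name made up for illustration. Unlike the zoo approach, it does not need B's columns to share a type.

```r
# Test inputs from (1); BOD ships with R's datasets package
A <- data.frame(BOD, row.names = letters[1:6])
B <- setNames(2 * BOD[1:2, ], c("X", "Y"))

# This approach would silently truncate B if A were shorter, so guard against it
stopifnot(nrow(A) >= nrow(B))

# Indexing rows beyond nrow(B) yields rows filled with NA
B_padded <- B[seq_len(nrow(A)), , drop = FALSE]
rownames(B_padded) <- NULL

# Both pieces now have the same number of rows, so cbind no longer errors
res <- cbind(A, B_padded)
res
```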