R: merge two irregular time series

Benjamin Allévius picture Benjamin Allévius · Aug 17, 2011 · Viewed 28.1k times · Source

I have two multivariate time series x and y, both covering approximately the same range in time (one starts two years before the other, but they end on the same date). Both series have missing observations in the form of empty columns next to the date column, and also in the sense that one of the series has several dates that are not found in the other, and vice versa.

I would like to create a data frame (or similar) with a column that lists all the dates found in x OR y, without duplicate dates. For each date (row), I would like to horizontally stack the observations from x next to the observations from y, with NA's filling the missing cells. Example:

>x
"1987-01-01"   7.1    NA   3
"1987-01-02"   5.2    5    2
"1987-01-06"   2.3    NA   9

>y
"1987-01-01"   55.3   66   45
"1987-01-03"   77.3   87   34

# result I would like
"1987-01-01"   7.1    NA   3   55.3   66   45
"1987-01-02"   5.2    5    2   NA     NA   NA
"1987-01-03"   NA     NA   NA  77.3   87   34
"1987-01-06"   2.3    NA   9   NA     NA   NA

What I have tried: with the zoo package, I've tried the merge.zoo method, but this seems to just stack the two series next to each other, with the dates (as numbers, e.g. "1987-01-02" shown as 6210) from each series appearing in two separate columns.

I've sat for hours getting almost nowhere, so all help is appreciated.

EDIT: some code included below as per suggestion from Soumendra

atcoa <- read.csv(file = "ATCOA_full_adj.csv", header = TRUE)
atcob <- read.csv(file = "ATCOB_full_adj.csv", header = TRUE)
atcoa$date <- as.Date(atcoa$date)
atcob$date <- as.Date(atcob$date)

# only number of observations and the observations themselves differ 
>str(atcoa)
'data.frame':   6151 obs. of  8 variables:
 $ date        :Class 'Date'  num [1:6151] 6210 6213 6215 6216 6217 ...
 $ max         : num  4.31 4.33 4.38 4.18 4.13 4.05 4.08 4.05 4.08 4.1 ...
 $ min         : num  4.28 4.31 4.28 4.13 4.05 3.95 3.97 3.95 4 4.02 ...
 $ close       : num  4.31 4.33 4.31 4.15 4.1 3.97 4 3.97 4.08 4.02 ...
 $ avg         : num  NA NA NA NA NA NA NA NA NA NA ...
 $ tot.vol     : int  877733 89724 889437 1927113 3050611 846525 1782774 1497998 2504466 5636999 ...
 $ turnover    : num  3762300 388900 3835900 8015900 12468100 ...
 $ transactions: int  12 9 24 17 31 26 34 35 37 33 ...

>atcoa[1:1, ]
date a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions
1 1987-01-02  4.31  4.28    4.31    NA    877733    3762300             12

# using timeSeries package
ts.atcoa <- timeSeries::as.timeSeries(atcoa, format = "%Y-%m-%d")
ts.atcob <- timeSeries::as.timeSeries(atcob, format = "%Y-%m-%d")

>str(ts.atcoa)
Time Series:          
 Name:               object
Data Matrix:        
 Dimension:          6151 7
 Column Names:       a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions
 Row Names:          1970-01-01 01:43:30  ...  1970-01-01 04:12:35
Positions:          
 Start:              1970-01-01 01:43:30
 End:                1970-01-01 04:12:35
With:               
 Format:             %Y-%m-%d %H:%M:%S
 FinCenter:          GMT
 Units:              a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions
 Title:              Time Series Object
 Documentation:      Wed Aug 17 13:00:50 2011

>ts.atcoa[1:1, ]
GMT
 a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions
 1970-01-01 01:43:30  4.31  4.28    4.31    NA    877733    3762300             12

# The following will create an object of class "data frame" and mode "list", which contains observations for the days mutual for the two series
>ts.atco <- timeSeries::merge(atcoa, atcob)  # produces same result as base::merge, apparently
>ts.atco[1:1, ]
date a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions b.max b.min b.close b.avg b.tot.vol b.turnover b.transactions
1 1989-08-25  7.92  7.77    7.79    NA    269172    2119400             19  7.69  7.56    7.64    NA  81176693  593858000             12

EDIT: problem solved by (using zoo package)

atcoa <- read.zoo(read.csv(file = "ATCOA_full_adj.csv", header = TRUE))
atcob <- read.zoo(read.csv(file = "ATCOB_full_adj.csv", header = TRUE))

names(atcoa) <- c("a.max", "a.min", "a.close",
                   "a.avg", "a.tot.vol", "a.turnover", "a.transactions")
names(atcob) <- c("b.max", "b.min", "b.close",
                   "b.avg", "b.tot.vol", "b.turnover", "b.transactions")

atco <- merge.zoo(atcoa, atcob)

Thank you all for your help.

Answer

G. Grothendieck picture G. Grothendieck · Aug 17, 2011

Try this:

Lines.x <- '"1987-01-01"   7.1    NA   3
"1987-01-02"   5.2    5    2
"1987-01-06"   2.3    NA   9'

Lines.y <- '"1987-01-01"   55.3   66   45
"1987-01-03"   77.3   87   34'

library(zoo)
# in reality x might be in a file and might be read via: x <- read.zoo("x.dat")
# ditto for y. See ?read.zoo and the zoo-read vignette if you need other args too
x <- read.zoo(text = Lines.x)
y <- read.zoo(text = Lines.y)
merge(x,  y)

giving:

           V2.x V3.x V4.x V2.y V3.y V4.y
1987-01-01  7.1   NA    3 55.3   66   45
1987-01-02  5.2    5    2   NA   NA   NA
1987-01-03   NA   NA   NA 77.3   87   34
1987-01-06  2.3   NA    9   NA   NA   NA