Is there a more elegant way to convert two-digit years to four-digit years with lubridate?

eipi10 picture eipi10 · Sep 7, 2012 · Viewed 12.7k times · Source

If a date vector has two-digit years, mdy() turns years between 00 and 68 into 21st Century years and years between 69 and 99 into 20th Century years. For example:

library(lubridate)    
mdy(c("1/2/54","1/2/68","1/2/69","1/2/99","1/2/04"))

gives the following output:

Multiple format matches with 5 successes: %m/%d/%y, %m/%d/%Y.
Using date format %m/%d/%y.
[1] "2054-01-02 UTC" "2068-01-02 UTC" "1969-01-02 UTC" "1999-01-02 UTC" "2004-01-02 UTC"

I can fix this after the fact by subtracting 100 from the incorrect dates to turn 2054 and 2068 into 1954 and 1968. But is there a more elegant and less error-prone method of parsing two-digit dates so that they get handled correctly in the parsing process itself?

Update: After @JoshuaUlrich pointed me to strptime I found this question, which deals with an issue similar to mine, but using base R.

It seems like a nice addition to date handling in R would be some way to handle century selection cutoffs for two-digit dates within the date parsing functions.

Answer

Andrie picture Andrie · Oct 18, 2012

Here is a function that allows you to do this:

library(lubridate)
x <- mdy(c("1/2/54","1/2/68","1/2/69","1/2/99","1/2/04"))


foo <- function(x, year=1968){
  m <- year(x) %% 100
  year(x) <- ifelse(m > year %% 100, 1900+m, 2000+m)
  x
}

Try it out:

x
[1] "2054-01-02 UTC" "2068-01-02 UTC" "1969-01-02 UTC" "1999-01-02 UTC"
[5] "2004-01-02 UTC"

foo(x)
[1] "2054-01-02 UTC" "2068-01-02 UTC" "1969-01-02 UTC" "1999-01-02 UTC"
[5] "2004-01-02 UTC"

foo(x, 1950)
[1] "1954-01-02 UTC" "1968-01-02 UTC" "1969-01-02 UTC" "1999-01-02 UTC"
[5] "2004-01-02 UTC"

The bit of magic here is to use the modulus operator %% to return the fraction part of a division. So 1968 %% 100 yields 68.