I have followed a number of questions here that ask how to convert character vectors to date-time classes. I often see two methods: strptime and as.POSIXct/as.POSIXlt. I looked at the functions but am unclear what the difference is.
> strptime
function (x, format, tz = "")
{
    y <- .Internal(strptime(as.character(x), format, tz))
    names(y$year) <- names(x)
    y
}
<bytecode: 0x045fcea8>
<environment: namespace:base>
> as.POSIXct
function (x, tz = "", ...)
UseMethod("as.POSIXct")
<bytecode: 0x069efeb8>
<environment: namespace:base>
> as.POSIXlt
function (x, tz = "", ...)
UseMethod("as.POSIXlt")
<bytecode: 0x03ac029c>
<environment: namespace:base>
Doing a microbenchmark to see if there are performance differences:
library(microbenchmark)
Dates <- sample(c(dates = format(seq(ISOdate(2010,1,1), by='day', length=365), format='%d-%m-%Y')), 5000, replace = TRUE)
df <- microbenchmark(strptime(Dates, "%d-%m-%Y"), as.POSIXlt(Dates, format = "%d-%m-%Y"), times = 1000)
Unit: milliseconds
                                    expr      min       lq   median       uq      max
1 as.POSIXlt(Dates, format = "%d-%m-%Y") 32.38596 33.81324 34.78487 35.52183 61.80171
2            strptime(Dates, "%d-%m-%Y") 31.73224 33.22964 34.20407 34.88167 52.12422
strptime seems slightly faster. So what gives? Why are there two such similar functions, or are there differences between them that I missed?
Well, the functions do different things.
First, there are two internal implementations of date/time: POSIXct, which stores seconds since the UNIX epoch (plus some other data), and POSIXlt, which stores a list of day, month, year, hour, minute, second, etc.
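The two representations can be inspected directly. This is a minimal sketch; the timestamp and the variable names lt and ct are made up for illustration, and UTC is used so the result does not depend on the local time zone.

```r
# Parse one timestamp into both classes.
lt <- as.POSIXlt("2010-01-02 03:04:05", tz = "UTC")
ct <- as.POSIXct(lt)

# POSIXct is a single number: seconds since 1970-01-01 00:00:00 UTC
# (plus a tzone attribute).
unclass(ct)

# POSIXlt is a list of broken-down fields.
lt$year + 1900   # years are stored as an offset from 1900
lt$mon + 1       # months are stored as 0-11
lt$mday          # day of the month
lt$hour
```

Because of this, POSIXct is usually the more compact choice for storing many timestamps, while POSIXlt is convenient when individual fields are needed.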
strptime is a function to directly convert character vectors (of a variety of formats) to POSIXlt format.
as.POSIXlt converts a variety of data types to POSIXlt. It tries to be intelligent and do the sensible thing - in the case of character, it acts as a wrapper to strptime.
as.POSIXct converts a variety of data types to POSIXct. It also tries to be intelligent and do the sensible thing - in the case of character, it runs strptime first, then does the conversion from POSIXlt to POSIXct.
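A small check of how the three functions relate for character input. This is only a sketch: the sample strings and the variable names are invented, and the time zone is fixed to UTC so the comparison is reproducible.

```r
x   <- c("01-01-2010", "15-06-2010")
fmt <- "%d-%m-%Y"

via_strptime <- strptime(x, fmt, tz = "UTC")
via_lt       <- as.POSIXlt(x, format = fmt, tz = "UTC")
via_ct       <- as.POSIXct(x, format = fmt, tz = "UTC")

class(via_strptime)  # a POSIXlt object
class(via_lt)        # also POSIXlt
class(via_ct)        # a POSIXct object

# Converting the POSIXlt results to POSIXct gives the same instants
# that as.POSIXct produced directly.
all(as.POSIXct(via_strptime) == via_ct)
all(as.POSIXct(via_lt) == via_ct)
```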
It makes sense that strptime is faster, because strptime always treats its input as character (it calls as.character on it, as the source above shows), whilst the others first have to dispatch on the input type to pick a method. It should also be a bit safer: unexpected data generally just fails to parse and comes back as NA, instead of being interpreted in some intelligent way that might not be what you want.
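To see how strptime treats input that does not match the given format, here is a small sketch (the vector bad is made up for illustration). In this case the elements that cannot be parsed come back as NA.

```r
bad <- c("31-12-2010", "2010/12/31", "not a date")

# Only the first element matches "%d-%m-%Y".
parsed <- strptime(bad, "%d-%m-%Y", tz = "UTC")

# Flag the elements that failed to parse.
is.na(parsed)
```

Checking is.na() after parsing is a cheap way to catch malformed input before it propagates.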