Fastest way to extract hour from time (HH:MM)

Gopalakrishna Palem · Apr 2, 2014

I wish fastPOSIXct worked here, but it does not handle this case.

Here is my time data, which has no dates. I need to extract the hour part from each value.

times <- c("9:46","11:06", "14:17", "19:53", "0:03", "3:56")

Here is the wrong output from fastPOSIXct:

library(fasttime)
fastPOSIXct(times, "GMT")
[1] "1970-01-01 00:00:00 GMT" "1970-01-01 00:00:00 GMT"
[3] "1970-01-01 00:00:00 GMT" "1970-01-01 00:00:00 GMT"
[5] "1970-01-01 00:00:00 GMT" "1970-01-01 00:00:00 GMT"

It does not parse the times correctly when no date is present.

hour() from data.table combined with as.ITime does the job, but it seems slow on large vectors of times.

library(data.table)
hour(as.ITime(times))
# [1]  9 11 14 19  0  3

Is there a faster way, something like fastPOSIXct that works without needing a date?

fastPOSIXct is very fast, but its result here is simply wrong.

Answer

Henrik · Apr 2, 2014

You may also try substr: as.integer(substr(vals, start = 1, stop = nchar(vals) - 3))
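As a quick illustration of the substr idea on the sample values from the question, here is a minimal sketch in base R. It assumes every input has the form H:MM or HH:MM, so the last three characters are always ":MM"; values with seconds (such as "9:46:30") would need a different cut-off.

```r
# Sample values from the question; each string ends in ":MM".
vals <- c("9:46", "11:06", "14:17", "19:53", "0:03", "3:56")

# Drop the last three characters (":MM") and convert the rest to integer.
hrs <- as.integer(substr(vals, start = 1, stop = nchar(vals) - 3))
print(hrs)
# [1]  9 11 14 19  0  3
```

This matches the output of hour(as.ITime(times)) shown in the question.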


In a benchmark on a vector with 1e6 elements, stringi::stri_sub is the fastest, and substr comes second at roughly twice the time.

vals <- sample(c("9:46", "11:06", "14:17", "19:53", "0:03", "3:56"), 1e6, replace = TRUE)

fun_substr <- function(vals) as.integer(substr(vals, start = 1, stop = nchar(vals) - 3))

grab.hrs <- function(vals) as.integer(sub(pattern = ":.*", replacement = "", x = vals))

fun_strtrim <- function(vals) as.integer(strtrim(vals, nchar(vals) - 3))

library(chron)
fun_chron <- function(vals) hours(times(paste0(vals, ":00")))

fun_lt <- function(vals) as.POSIXlt(vals, format="%H:%M")$hour

library(stringi)
fun_stri_sub <- function(vals) as.integer(stri_sub(vals, from = 1, to = -4))

library(microbenchmark)
microbenchmark(fun_substr(vals),
               fun_stri_sub(vals),      
               grab.hrs(vals),
               fun_strtrim(vals),
               fun_lt(vals),
               fun_chron(vals),
               unit = "relative", times = 5)
# Unit: relative
#               expr       min        lq      mean    median        uq       max neval
#   fun_substr(vals)  2.186714  1.902074  2.015082  1.968542  1.945007  2.090236     5
# fun_stri_sub(vals)  1.000000  1.000000  1.000000  1.000000  1.000000  1.000000     5
#     grab.hrs(vals)  2.656630  2.397918  2.687133  2.426223  2.446902  3.263962     5
#  fun_strtrim(vals) 31.177869 27.601380 26.009818 27.423562 17.902507 29.426989     5
#       fun_lt(vals) 47.296929 41.122287 42.266556 40.647465 30.539030 52.710992     5
#    fun_chron(vals)  5.594931  5.159192  5.961775  7.746242  5.286944  6.189742     5
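
Before comparing timings, it may be worth confirming that the candidates return the same hours. The sketch below checks only the three base R string approaches on the sample values, so it runs without extra packages; the variable names (by_substr, by_sub, by_strtrim, all_agree) are made up for this illustration.

```r
vals <- c("9:46", "11:06", "14:17", "19:53", "0:03", "3:56")

# Same expressions as fun_substr, grab.hrs and fun_strtrim above.
by_substr  <- as.integer(substr(vals, start = 1, stop = nchar(vals) - 3))
by_sub     <- as.integer(sub(pattern = ":.*", replacement = "", x = vals))
by_strtrim <- as.integer(strtrim(vals, nchar(vals) - 3))

# TRUE only if all three integer vectors are exactly equal.
all_agree <- identical(by_substr, by_sub) && identical(by_substr, by_strtrim)
print(all_agree)
# [1] TRUE
```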