I am working on "Localization Data for Person Activity Data Set" dataset from UCI and in this data set there is a column of date and time(both in one column) with following format:
27.05.2009 14:03:25:777
27.05.2009 14:03:25:183
27.05.2009 14:03:25:210
27.05.2009 14:03:25:237
...
I am wondering if there is anyway to convert this column to timestamp using R.
First of all, we need to substitute the colon separating the milliseconds from the seconds to a dot, otherwise the final step won't work (thanks to Dirk Eddelbuettel for this one). Since in the end R will use the separators it wants, to be quicker, I'll just go ahead and substitute all the colons for dots:
x <- "27.05.2009 14:03:25:777" # this is a simplified version of your data
y <- gsub(":", ".", x) # this is your vector with the aforementioned substitution
By the way, this is how your vector should look after gsub
:
> y
[1] "27.05.2009 14.03.25.777"
Now, in order to have it show the milliseconds, you first need to adjust an R option and then use a function called strptime
, which will convert your date vector to POSIXlt (an R-friendly) format. Just do the following:
> options(digits.secs = 3) # this tells R you want it to consider 3 digits for seconds.
> strptime(y, "%d.%m.%Y %H:%M:%OS") # this finally formats your vector
[1] "2009-05-27 14:03:25.777"
I've learned this nice trick here. This other answer also says you can skip the options
setting and use, for example, strptime(y, "%d.%m.%Y %H:%M:%OS3")
, but it doesn't work for me. Henrik noted that the function's help page, ?strptime
states that the %OS3
bit is OS-dependent. I'm using an updated Ubuntu 13.04 and using %OS3
yields NA
.
When using strptime
(or other POSIX-related functions such as as.Date
), keep in mind some of the most common conversions used (edited for brevity, as suggested by DWin. Complete list at strptime
):
%a
Abbreviated weekday name in the current locale.%A
Full weekday name in the current locale.%b
Abbreviated month name in the current locale.%B
Full month name in the current locale.%d
Day of the month as decimal number (01–31).%H
Hours as decimal number (00–23). Times such as 24:00:00 are accepted for input.%I
Hours as decimal number (01–12).%j
Day of year as decimal number (001–366).%m
Month as decimal number (01–12).%M
Minute as decimal number (00–59).%p
AM/PM indicator in the locale. Used in conjunction with %I
and not with %H
.%U
Week of the year as decimal number (00–53) using Sunday as the first day 1 of the week (and typically with the first Sunday of the year as day 1 of week 1). The US convention.%w
Weekday as decimal number (0–6, Sunday is 0).%W
Week of the year as decimal number (00–53) using Monday as the first day of week (and typically with the first Monday of the year as day 1 of week 1). The UK convention.%y
Year without century (00–99). On input, values 00 to 68 are prefixed by 20 and 69 to 99 by 19%Y
Year with century. Note that whereas there was no zero in the original Gregorian calendar, ISO 8601:2004 defines it to be valid (interpreted as 1BC)