Suppose I have the following data.frame
foo
start.time duration
1 2012-02-06 15:47:00 1
2 2012-02-06 15:02:00 2
3 2012-02-22 10:08:00 3
4 2012-02-22 09:32:00 4
5 2012-03-21 13:47:00 5
And class(foo$start.time)
returns
[1] "POSIXct" "POSIXt"
I'd like to create a plot of foo$duration
v. foo$start.time
. In my scenario, I'm only interested in the time of day rather than the actual day of the year. How does one go about extracting the time of day as hours:seconds from POSIXct
class of vector?
This is a good question, and highlights some of the difficulty in dealing with dates in R. The lubridate package is very handy, so below I present two approaches, one using base (as suggested by @RJ-) and the other using lubridate.
Recreate the (first two rows of) the dataframe in the original post:
foo <- data.frame(start.time = c("2012-02-06 15:47:00",
"2012-02-06 15:02:00",
"2012-02-22 10:08:00"),
duration = c(1,2,3))
Convert to POSIXct and POSIXt class (two ways to do this)
# using base::strptime
t.str <- strptime(foo$start.time, "%Y-%m-%d %H:%M:%S")
# using lubridate::ymd_hms
library(lubridate)
t.lub <- ymd_hms(foo$start.time)
Now, extract time as decimal hours
# using base::format
h.str <- as.numeric(format(t.str, "%H")) +
as.numeric(format(t.str, "%M"))/60
# using lubridate::hour and lubridate::minute
h.lub <- hour(t.lub) + minute(t.lub)/60
Demonstrate that these approaches are equal:
identical(h.str, h.lub)
Then choose one of above approaches to assign decimal hour to foo$hr
:
foo$hr <- h.str
# If you prefer, the choice can be made at random:
foo$hr <- if(runif(1) > 0.5){ h.str } else { h.lub }
then plot using the ggplot2 package:
library(ggplot2)
qplot(foo$hr, foo$duration) +
scale_x_datetime(labels = "%S:00")