I want to create a data.frame of different variables, including S4 classes. For a built-in class like "POSIXlt" (for dates) this works fine:
as.data.frame(list(id=c(1,2),
                   date=c(as.POSIXlt('2013-01-01'), as.POSIXlt('2013-01-02'))))
But now I have a user-defined class, let's say a "person" class with name and age:
setClass("person", representation(name="character", age="numeric"))
But the following fails:
as.data.frame(list(id=c(1,2), pers=c(new("person", name="John", age=20),
new("person", name="Tom", age=30))))
I also tried to overload the "[" operator for the person class using
setMethod(
    f = "[",
    signature = "person",
    definition = function(x, i, j, ..., drop=TRUE) {
        initialize(x, name=x@name[i], age=x@age[i])
    }
)
This allows for vector-like behavior:
persons = new("person", name=c("John","Tom"), age=c(20,30))
p1 = persons[1]
But still the following fails:
as.data.frame(list(id=c(1,2), pers=persons))
Perhaps I have to overload more operators to get the user-defined class into a data.frame? I am sure there must be a way to do this, as POSIXlt is an S4 class and it works! Any solution using the new R5 reference classes would also be fine!
I do not want to put all my data into the person class (you could ask why "id" is not simply a member of person, so that I would not need a data.frame at all)! The idea is that my data.frame represents a table from a database with many columns of different types, e.g., strings and numbers, but also dates, intervals, geo-objects, etc. While I already have a solution for dates (POSIXlt), for intervals, geo-objects, etc. I probably need to specify my own S4/R5 classes.
Thanks a lot in advance.
Here's your class, with a "column" interpretation of its definition rather than a "row" one (this will be important for performance), together with a date vector for reference:
setClass("person", representation(name="character", age="numeric"))
pers <- new("person", name=c("John", "Tom"), age=c(20, 30))
date <- as.POSIXct(c('2013-01-01', '2013-01-02'))
Some experimenting, including looking at methods(class="POSIXct") and paying attention to error messages, led me to implement as.data.frame.person and format.person (the latter is used for display in a data.frame) as
as.data.frame.person <-
    function(x, row.names=NULL, optional=FALSE, ...)
{
    if (is.null(row.names))
        row.names <- x@name
    value <- list(x)
    attr(value, "row.names") <- row.names
    class(value) <- "data.frame"
    value
}
format.person <- function(x, ...) paste0(x@name, ", ", x@age)
This gets me my objects in a data.frame:
> lst <- list(id=1:2, date=date, pers=pers)
> as.data.frame(lst)
     id       date     pers
John  1 2013-01-01 John, 20
Tom   2 2013-01-02  Tom, 30
If I want to subset, then I need
setMethod("[", "person", function(x, i, j, ..., drop=TRUE) {
initialize(x, name=x@name[i], age=x@age[i])
})
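As a quick sanity check of the subsetting method, the sketch below repeats the class and "[" definitions so that it runs on its own, then indexes the object directly. The variable names second, swapped and kept are just illustrative. It only exercises the S4 object itself, not any data.frame operation.

```r
library(methods)

# Same class and "[" method as above, repeated so the snippet is self-contained.
setClass("person", representation(name="character", age="numeric"))
setMethod("[", "person", function(x, i, j, ..., drop=TRUE) {
    initialize(x, name=x@name[i], age=x@age[i])
})

pers <- new("person", name=c("John", "Tom"), age=c(20, 30))

second  <- pers[2]              # a "person" holding only Tom
swapped <- pers[c(2, 1)]        # reordering keeps name and age aligned
kept    <- pers[pers@age > 25]  # logical indices are passed straight through
```

Because both slots are indexed with the same i, the two vectors cannot get out of step, which is the main benefit of the "column" layout here.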
I'm not sure what other methods might be required as more data.frame operations are encountered; there is no formal "data.frame interface".
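Two methods that seem like plausible candidates are length and c, since they make the object behave more like an ordinary vector. The following is only a sketch under that assumption; which data.frame operations (if any) actually rely on them has not been checked here. The objects john, tom and both are made up for illustration.

```r
library(methods)

setClass("person", representation(name="character", age="numeric"))

# Report how many people are stored, so the object has a vector-like length.
setMethod("length", "person", function(x) length(x@name))

# Concatenate several person objects slot by slot.
# Assumes every argument is a "person"; nothing else is handled.
setMethod("c", "person", function(x, ...) {
    parts <- c(list(x), list(...))
    new("person",
        name = unlist(lapply(parts, function(p) p@name)),
        age  = unlist(lapply(parts, function(p) p@age)))
})

john <- new("person", name="John", age=20)
tom  <- new("person", name="Tom", age=30)
both <- c(john, tom)   # one "person" object holding both entries
```

With a c method like this, combining single-person objects yields one vectorized person object rather than a plain list, which fits the "column" interpretation used above.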
Using the vectorized class in data.table seems to require a length method for construction.
> library(data.table)
> data.table(id=1:2, pers=pers)
Error in data.table(id = 1:2, pers = pers) :
problem recycling column 2, try a simpler type
> setMethod(length, "person", function(x) length(x@name))
[1] "length"
> data.table(id=1:2, pers=pers)
   id     pers
1:  1 John, 20
2:  2  Tom, 30
Maybe there's a data.table interface?