How to create a dataframe of user defined S4 classes in R

r s4
Patrick Roocks · Jan 30, 2013

I want to create a data.frame of different variables, including S4 classes. For a built-in class like "POSIXlt" (for dates) this works fine:

as.data.frame(list(id=c(1,2), 
                   date=c(as.POSIXlt('2013-01-01'),as.POSIXlt('2013-01-02'))))

But now I have a user-defined class, let's say a "person" class with name and age:

setClass("person", representation(name="character", age="numeric"))

But the following fails:

as.data.frame(list(id=c(1,2), pers=c(new("person", name="John", age=20),
                                     new("person", name="Tom", age=30))))

I also tried to overload the "[" operator for the person class using

setMethod(
  f = "[",
  signature="person",
  definition=function(x,i,j,...,drop=TRUE){ 
    initialize(x, name=x@name[i], age = x@age[i])
  }
)

This allows for vector-like behavior:

persons = new("person", name=c("John","Tom"), age=c(20,30))
p1 = persons[1]

But still the following fails:

as.data.frame(list(id=c(1,2), pers=persons))

Perhaps I have to overload more operators to get the user-defined class into a data frame? I am sure there must be a way to do this, since POSIXlt is an S4 class and it works! A solution using the new R5 reference classes would also be fine!

I do not want to put all my data into the person class (you could ask why "id" is not a member of person, and why I do not simply do without data frames). The idea is that my data.frame represents a table from a database with many columns of different types, e.g., strings and numbers, but also dates, intervals, geo-objects, etc. For dates I already have a solution (POSIXlt), but for intervals, geo-objects, etc. I probably need to specify my own S4/R5 classes.

Thanks a lot in advance.

Answer

Martin Morgan · Jan 30, 2013

Here's your class, with a "column" interpretation of its definition rather than a "row" one: each slot holds a vector with one element per person. This will be important for performance. I also define date for reference.

setClass("person", representation(name="character", age="numeric"))
pers <- new("person", name=c("John", "Tom"), age=c(20, 30))
date <- as.POSIXct(c('2013-01-01', '2013-01-02'))

Some experimenting, including looking at methods(class="POSIXct") and paying attention to error messages, led me to implement as.data.frame.person and format.person (the latter is used for display in a data.frame) as

as.data.frame.person <-
    function(x, row.names=NULL, optional=FALSE, ...)
{
    if (is.null(row.names))
        row.names <- x@name
    value <- list(x)
    attr(value, "row.names") <- row.names
    class(value) <- "data.frame"
    value
}

format.person <- function(x, ...) paste0(x@name, ", ", x@age)

This gets me my objects in a data.frame:

> lst <- list(id=1:2, date=date, pers=pers)
> as.data.frame(lst)
     id       date     pers
John  1 2013-01-01 John, 20
Tom   2 2013-01-02  Tom, 30
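
As a quick check of what actually ends up in the data frame, here is a minimal sketch that repeats the definitions above and inspects the pers column. The variable name df is only an illustration; the point is that the column still holds the S4 object, while format.person merely supplies the strings used for display.

```r
library(methods)  # attached by default in recent R; explicit for older Rscript

setClass("person", representation(name="character", age="numeric"))
pers <- new("person", name=c("John", "Tom"), age=c(20, 30))

# Same S3 methods as above: wrap the object as a one-column data.frame
as.data.frame.person <- function(x, row.names=NULL, optional=FALSE, ...) {
    if (is.null(row.names))
        row.names <- x@name
    value <- list(x)
    attr(value, "row.names") <- row.names
    class(value) <- "data.frame"
    value
}
format.person <- function(x, ...) paste0(x@name, ", ", x@age)

df <- as.data.frame(list(id=1:2, pers=pers))  # 'df' is a hypothetical name

is(df$pers, "person")   # the column is still the S4 object
df$pers@age             # so its slots remain accessible
format(df$pers)         # character vector used when printing
```

So the data frame does not store strings; it stores the object itself and asks format() for a display version only when printing.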

If I want to subset, then I need

setMethod("[", "person", function(x, i, j, ..., drop=TRUE) {
    initialize(x, name=x@name[i], age=x@age[i])
})
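
A small sketch of what this method gives you on the object itself, assuming the class definition from above. The third person ("Ann") is made up for illustration. Row subsetting of a data frame applies "[" to each column in turn, which is why the method is needed there.

```r
library(methods)  # attached by default in recent R; explicit for older Rscript

setClass("person", representation(name="character", age="numeric"))
setMethod("[", "person", function(x, i, j, ..., drop=TRUE) {
    # apply the same index to every slot so they stay aligned
    initialize(x, name=x@name[i], age=x@age[i])
})

# "Ann" is a hypothetical extra entry, not part of the original example
pers <- new("person", name=c("John", "Tom", "Ann"), age=c(20, 30, 40))

second <- pers[2]               # integer index
older  <- pers[pers@age > 25]   # logical index
rest   <- pers[-1]              # negative index

second@name
older@name
rest@age
```

Because the index is passed straight to the slot vectors, any index type that works on an ordinary vector works here too.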

I'm not sure what other methods might be required as more data.frame operations are encountered; there is no "data.frame interface".

Using the vectorized class in data.table seems to require a length method for construction.

> library(data.table)
> data.table(id=1:2, pers=pers)
Error in data.table(id = 1:2, pers = pers) : 
  problem recycling column 2, try a simpler type
> setMethod(length, "person", function(x) length(x@name))
[1] "length"
> data.table(id=1:2, pers=pers)
   id     pers
1:  1 John, 20
2:  2  Tom, 30
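
The effect of the length method can be checked in base R, without data.table. This is only a sketch under the class definition used above; it shows that, once the method is set, length() reports the number of persons rather than treating the object as a single item.

```r
library(methods)  # attached by default in recent R; explicit for older Rscript

setClass("person", representation(name="character", age="numeric"))
pers <- new("person", name=c("John", "Tom"), age=c(20, 30))

# Report the number of persons stored in the object
setMethod("length", "person", function(x) length(x@name))

length(pers)
```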

Maybe there's a data.table interface?