What is the difference between as.tibble(), as_data_frame(), and tbl_df()?

Chill2Macht picture Chill2Macht · May 12, 2017 · Viewed 13.8k times · Source

I remember reading somewhere that as.tibble() is an alias for as_data_frame(), but I don't know what exactly an alias is in programming terminology. Is it similar to a wrapper?

So I guess my question probably comes down to the difference in possible usages between tbl_df() and as_data_frame(): what are the differences between them, if any?

More specifically, given a (non-tibble) data frame df, I often turn it into a tibble by using:

df <- tbl_df(df)

Wouldn't

df <- as_data_frame(df)

do the same thing? If so, are there other cases where the two functions tbl_df() and as_data_frame() can not be used interchangeably to get the same result?

The R documentation says that

tbl_df() forwards the argument to as_data_frame()

does that mean that tbl_df() is a wrapper or alias for as_data_frame()? R documentation doesn't seem to say anything about as.tibble() and I forgot where I read that it was an alias for as_data_frame(). Also, apparently as_tibble() is another alias for as_data_frame().

If these four functions really are all the same function, what is the sense in giving one function four different names? Isn't that more confusing than helpful?

Answer

rsmith54 picture rsmith54 · May 12, 2017

To answer your question of "whether it is confusing", I think so :) .

as.tibble and as_tibble are the same; both simply call the S3 method as_tibble:

> as.tibble
function (x, ...) 
{
    UseMethod("as_tibble")
}
<environment: namespace:tibble>

as_data_frame and tbl_df are not exactly the same; tbl_df calls as_data_frame:

> tbl_df
function (data) 
{
    as_data_frame(data)
}
<environment: namespace:dplyr>

Note tbl_df is in dplyr while as_data_frame is in the tibble package:

> as_data_frame
function (x, ...) 
{
    UseMethod("as_data_frame")
}
<environment: namespace:tibble>

but of course it calls the same function, so they are "the same", or aliases as you say.

Now, we can look at the differences between the generic methods as_tibble and as_data_frame. First, we look at the methods of each:

> methods(as_tibble)
[1] as_tibble.data.frame* as_tibble.default*    as_tibble.list* as_tibble.matrix*     as_tibble.NULL*      
[6] as_tibble.poly*       as_tibble.table*      as_tibble.tbl_df* as_tibble.ts*        
see '?methods' for accessing help and source code
> methods(as_data_frame)
[1] as_data_frame.data.frame* as_data_frame.default*  as_data_frame.grouped_df* as_data_frame.list*      
[5] as_data_frame.matrix*     as_data_frame.NULL*       as_data_frame.table*      as_data_frame.tbl_cube*  
[9] as_data_frame.tbl_df*    
see '?methods' for accessing help and source code

If you check out the code for as_tibble, you can see that the definitions for many of the as_data_frame methods as well. as_tibble defines two additional methods which aren't defined for as_data_frame, as_tibble.ts and as_tibble.poly. I'm not really sure why they couldn't be also defined for as_data_frame.

as_data_frame has two additional methods, which are both defined in dplyr: as_data_frame.tbl_cube and as_data_frame.grouped_df.

as_data_frame.tbl_cube use the weaker checking of as.data.frame (yes, bear with me) to then call as_data_frame:

> getAnywhere(as_data_frame.tbl_cube)
function (x, ...) 
{
    as_data_frame(as.data.frame(x, ..., stringsAsFactors = FALSE))
}
<environment: namespace:dplyr>

while as_data_frame.grouped_df ungroups the passed dataframe.

Overall, it seems that as_data_frame should be seen as providing additional functionality over as_tibble, unless you are dealing with ts or poly objects.