R, deep vs. shallow copies, pass by reference

Alex · May 18, 2012 · Viewed 11.2k times

I would like to understand the logic R uses when passing arguments to functions, creating copies of variables, etc. with respect to the memory usage. When does it actually create a copy of the variable vs. just passing a reference to that variable? In particular the situations I am curious about are:

f <- function(x) {x+1}
a <- 1
f(a)

Is a being passed literally or is a reference to a being passed?

x <- 1
y <- x

Reference or copy? When is this not the case?

If someone could explain this to me, I would highly appreciate it.

Answer

IRTFM · May 18, 2012

When R passes arguments to a function, it always behaves as pass-by-copy rather than pass-by-reference. However, the copy is often deferred: you will not get a copy made until the object is actually modified. The more accurate description of the process is pass-by-promise, because arguments are evaluated lazily. Take a look at the documentation:

?force
?delayedAssign
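
To make the pass-by-promise idea concrete, here is a minimal sketch using only base R. The function and variable names (ignores_arg, uses_arg, lazy_value, counter) are made up for illustration and are not part of the original answer.

```r
# An argument is wrapped in a promise and only evaluated when it is first used.
ignores_arg <- function(x) "x was never evaluated"
res_lazy <- ignores_arg(stop("this error is never raised"))

# force() evaluates the promise immediately, so the error surfaces here.
uses_arg <- function(x) {
  force(x)
  "x was evaluated"
}
res_forced <- tryCatch(
  uses_arg(stop("boom")),
  error = function(e) conditionMessage(e)
)

# delayedAssign() binds a promise to a name; the expression runs on first access.
counter <- 0
delayedAssign("lazy_value", {
  counter <- counter + 1
  42
})
before <- counter    # the expression has not run yet
value <- lazy_value  # first access evaluates the promise
after <- counter     # the side effect has now happened once
```

The point of the sketch is that passing an argument does not by itself evaluate it, let alone copy it; the work only happens when the value is needed.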

One practical implication is that it is very difficult if not impossible to avoid needing at least twice as much RAM as your objects nominally occupy. Modifying a large object will generally require making a temporary copy.
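
A small sketch of the copy-on-modify behaviour, applied to the two situations in the question. The names modify, big and alias are illustrative only, and the tracemem part assumes an R build with memory profiling enabled, hence the guard.

```r
# Assignment: y initially refers to the same vector as x.
x <- c(1, 2, 3)
y <- x
y[1] <- 100   # the copy is made here; x is left untouched

# Function call: the function works on its own (lazily made) copy.
modify <- function(v) {
  v[1] <- -1
  v
}
a <- c(1, 2, 3)
b <- modify(a)  # a is unchanged in the caller

# tracemem() reports duplications, if this R build supports it.
if (capabilities("profmem")) {
  big <- numeric(1e6)
  tracemem(big)
  alias <- big     # no duplication reported: just a second name
  alias[1] <- 1    # duplication is reported here
  untracemem(big)
  untracemem(alias)
}
```

So in both cases the caller sees copy semantics, but the actual duplication (and the extra RAM) only shows up at the moment of modification.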

Update (2015): I do (and did) agree with Matt Dowle that his data.table package provides an alternate route to assignment that avoids the copy-duplication problem. If that was the update requested, then I didn't understand it at the time the suggestion was made.
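
For readers who have not used it, a rough sketch of what assignment by reference looks like in data.table. This assumes the data.table package is installed; the table and column names are invented for the example.

```r
library(data.table)  # assumption: data.table is installed

dt <- data.table(a = 1:3)

# := adds or updates a column in place instead of rebuilding the table.
dt[, b := a * 2L]

# Plain assignment only binds a second name to the same table, so an
# in-place change made through one name is visible through the other.
alias <- dt
alias[, c := 0L]
shared <- "c" %in% names(dt)

# copy() is needed to get an independent table.
independent <- copy(dt)
independent[, d := 1L]
leaked <- "d" %in% names(dt)
```

This is the opposite of the usual R behaviour, which is why data.table provides copy() for the cases where an independent object is wanted.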

There was a change in R 3.2.0 in the evaluation rules for the apply functions and Reduce: they now force the arguments to the functions they apply. It was announced on SO with reference to the NEWS file here: Returning anonymous functions from lapply - what is going wrong?
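
The interaction between lazy evaluation and closures that this change addressed can still be reproduced with an ordinary for loop. The sketch below uses hypothetical helper names (make_adder, make_adder_forced); the lapply result assumes a current version of R.

```r
# Closure factory that never touches its argument before returning.
make_adder <- function(i) function(x) x + i

adders <- list()
for (k in 1:3) adders[[k]] <- make_adder(k)
# The promise for i is only evaluated on the first call, when k is already 3.
lazy_result <- adders[[1]](10)

# Forcing the argument captures the value at creation time.
make_adder_forced <- function(i) {
  force(i)
  function(x) x + i
}
forced <- list()
for (k in 1:3) forced[[k]] <- make_adder_forced(k)
forced_result <- forced[[1]](10)

# In current R, lapply forces the argument for you.
lapply_result <- lapply(1:3, make_adder)[[1]](10)
```

Here lazy_result comes out as 13 rather than 11, while the forced and lapply versions give 11.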

And the interesting paper cited by jhetzel in the comments is now here: