I would like to understand the logic R uses when passing arguments to functions, creating copies of variables, etc., with respect to memory usage. When does it actually create a copy of a variable, and when does it just pass a reference to that variable? In particular, the situations I am curious about are:
f <- function(x) {x+1}
a <- 1
f(a)
Is a being passed literally, or is a reference to a being passed?
x <- 1
y <- x
Reference or copy? When is this not the case?
If someone could explain this to me, I would highly appreciate it.
When R passes variables, the semantics are always by copy rather than by reference. Often, however, no copy is actually made until an assignment to the object occurs. The real description of the process is pass-by-promise. Take a look at the documentation:
?force
?delayedAssign
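A minimal sketch of what pass-by-promise means in practice may help here. The function names `ignore_arg` and `use_arg` and the variable `lazy_value` are made up for illustration; only `force`, `delayedAssign`, `tryCatch`, `stop` and `conditionMessage` are base R.

```r
# `ignore_arg` never touches `x`, so the promise for `x` is never evaluated
# and the stop() call in the argument never runs.
ignore_arg <- function(x) {
  "x was never evaluated"
}
r1 <- ignore_arg(stop("this error is never raised"))

# `use_arg` forces the promise, so the argument expression is evaluated
# and the error surfaces.
use_arg <- function(x) {
  force(x)
  "x was evaluated"
}
r2 <- tryCatch(use_arg(stop("boom")), error = function(e) conditionMessage(e))

# delayedAssign() creates the same kind of promise explicitly.
evaluated <- FALSE
delayedAssign("lazy_value", {evaluated <- TRUE; 42})
before <- evaluated   # the expression has not run yet
value  <- lazy_value  # first access evaluates the promise
after  <- evaluated   # the side effect has now happened
```

In this sketch `r1` is the string returned by `ignore_arg`, `r2` is `"boom"`, and `before`/`after` go from `FALSE` to `TRUE` around the first access to `lazy_value`.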
One practical implication is that it is very difficult if not impossible to avoid needing at least twice as much RAM as your objects nominally occupy. Modifying a large object will generally require making a temporary copy.
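To illustrate the copy behaviour for the two cases in the question, here is a small sketch. The names `set_first_to_zero`, `y`, `z` and `traced` are hypothetical. The `tracemem` part is optional and only runs if your R build has memory profiling enabled, which is why it is guarded by `capabilities("profmem")`.

```r
x <- c(1, 2, 3)
y <- x          # no copy is needed yet; nothing has been modified
y[1] <- 99      # modifying y is what triggers the copy
# x is untouched by the change to y

# Hypothetical helper: modifies its argument locally and returns the result.
set_first_to_zero <- function(v) {
  v[1] <- 0     # the copy (if any) happens here, inside the function
  v
}
z <- set_first_to_zero(x)
# x in the caller is still unchanged; only z carries the modification

# Optional: let R report when a duplicate is actually made.
if (capabilities("profmem")) {
  traced <- c(1, 2, 3)
  tracemem(traced)
  alias <- traced    # typically no tracemem message here
  alias[1] <- 5      # typically a tracemem message here, when the copy is made
  untracemem(traced)
}
```

The semantics are by-value from the caller's point of view (`x` never changes), while the actual duplication is deferred until a modification makes it necessary.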
Update (2015): I do (and did) agree with Matt Dowle that his data.table package provides an alternate route to assignment that avoids the copy-duplication problem. If that was the update requested, then I didn't understand it at the time the suggestion was made.
There was a recent change in R 3.2.1 in the evaluation rules for apply and Reduce. It was SO-announced with reference to the News here: Returning anonymous functions from lapply - what is going wrong?
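The interaction between lazy evaluation and closures that this change addresses can be sketched as follows. The names `make_adder_lazy` and `make_adder_forced` are made up for illustration. The `for` loop shows the underlying promise behaviour; the `lapply` result assumes a version of R that includes the changed rules.

```r
# Without force(), `n` stays an unevaluated promise pointing at `i`
# until the returned function is first called.
make_adder_lazy <- function(n) function(x) x + n

# With force(), `n` is evaluated when the closure is created.
make_adder_forced <- function(n) {
  force(n)
  function(x) x + n
}

lazy_adders <- list()
forced_adders <- list()
for (i in 1:3) {
  lazy_adders[[i]] <- make_adder_lazy(i)
  forced_adders[[i]] <- make_adder_forced(i)
}

# By now i is 3, so the lazy promise resolves to 3 rather than 1.
lazy_result <- lazy_adders[[1]](10)      # 13
forced_result <- forced_adders[[1]](10)  # 11

# Assumption: under the changed rules lapply forces the argument itself,
# so even the lazy version captures 1 here; older versions behaved like
# the for loop above.
lapply_result <- lapply(1:3, make_adder_lazy)[[1]](10)
```

Calling `force()` explicitly in functions that return closures keeps the behaviour the same regardless of which looping construct or R version is used.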
And the interesting paper cited by jhetzel in the comments is now here: