What is the difference between with and within in R?

Feb 17, 2014 · Viewed 16.4k times

I always use "with" rather than "within" in my research, and I had assumed the two were the same. Just now I mistyped "within" in place of "with", and the results returned were quite different. Why is that?

I am using the baseball data from the plyr package, so I first load the package:

 require(plyr)

Then I want to select all rows with the id "ansonca01". At first, as I said, I used "within" and ran the call as follows:

within(baseball, baseball[id=="ansonca01", ])

I got strange results that basically include every row:

       id year stint team lg   g  ab   r   h X2b X3b hr rbi  sb cs  bb  so ibb hbp sh sf gidp
4     ansonca01 1871     1  RC1     25 120  29  39  11   3  0  16   6  2   2   1  NA  NA NA NA   NA
44    forceda01 1871     1  WS3     32 162  45  45   9   4  0  29   8  0   4   0  NA  NA NA NA   NA
68    mathebo01 1871     1  FW1     19  89  15  24   3   1  0  10   2  1   2   0  NA  NA NA NA   NA
99    startjo01 1871     1  NY2     33 161  35  58   5   1  1  34   4  2   3   0  NA  NA NA NA   NA
102   suttoez01 1871     1  CL1     29 128  35  45   3   7  3  23   3  1   1   0  NA  NA NA NA   NA
106   whitede01 1871     1  CL1     29 146  40  47   6   5  1  21   2  2   4   1  NA  NA NA NA   NA
113    yorkto01 1871     1  TRO     29 145  36  37   5   7  2  23   2  2   9   1  NA  NA NA NA   NA
.........

Then I used "with" instead of "within",

 with(baseball, baseball[id=="ansonca01",])

and got the results that I expected:

      id year stint team lg   g  ab   r   h X2b X3b hr rbi sb cs  bb so ibb hbp sh sf gidp
4    ansonca01 1871     1  RC1     25 120  29  39  11   3  0  16  6  2   2  1  NA  NA NA NA   NA
121  ansonca01 1872     1  PH1     46 217  60  90  10   7  0  50  6  6  16  3  NA  NA NA NA   NA
276  ansonca01 1873     1  PH1     52 254  53 101   9   2  0  36  0  2   5  1  NA  NA NA NA   NA
398  ansonca01 1874     1  PH1     55 259  51  87   8   3  0  37  6  0   4  1  NA  NA NA NA   NA
525  ansonca01 1875     1  PH1     69 326  84 106  15   3  0  58 11  6   4  2  NA  NA NA NA   NA

I checked the documentation for with and within by typing help(with) in R, and found the following:

with is a generic function that evaluates expr in a local environment constructed from data. The environment has the caller's environment as its parent. This is useful for simplifying calls to modeling functions. (Note: if data is already an environment then this is used with its existing parent.)

Note that assignments within expr take place in the constructed environment and not in the user's workspace.

within is similar, except that it examines the environment after the evaluation of expr and makes the corresponding modifications to data (this may fail in the data frame case if objects are created which cannot be stored in a data frame), and returns it. within can be used as an alternative to transform.

From this explanation of the differences, I don't see why I got different results from such a simple operation. Does anyone have any ideas?

Answer

thelatemail · Feb 17, 2014

I find simple examples often work to highlight the difference. Something like:

df <- data.frame(a=1:5,b=2:6)
df
  a b
1 1 2
2 2 3
3 3 4
4 4 5
5 5 6

with(df, {c <- a + b; df;} )
  a b
1 1 2
2 2 3
3 3 4
4 4 5
5 5 6

within(df, {c <- a + b; df;} )
# equivalent to: within(df, c <- a + b)
# I've just made the return of df explicit
# for comparison's sake
  a b  c
1 1 2  3
2 2 3  5
3 3 4  7
4 4 5  9
5 5 6 11
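
The same distinction seems to explain the filtering call in the question. with returns the value of the expression, while within discards that value and returns the data with any variables assigned inside the expression added to it. The sketch below uses a small made-up data frame, players, as a stand-in for the plyr baseball data, so its names and values are assumptions for illustration only.

```r
# Hypothetical toy data standing in for plyr's baseball data frame.
players <- data.frame(
  id   = c("ansonca01", "forceda01", "ansonca01"),
  year = c(1871, 1871, 1872),
  stringsAsFactors = FALSE
)

# with() returns the value of the expression: here, the filtered rows.
from_with <- with(players, players[id == "ansonca01", ])

# within() evaluates the same expression but throws its value away.
# Nothing was assigned inside the expression, so all rows come back.
from_within <- within(players, players[id == "ansonca01", ])

nrow(from_with)    # 2
nrow(from_within)  # 3

# To filter rows, index directly (or use subset()); within() is better
# kept for adding or modifying columns.
filtered <- players[players$id == "ansonca01", ]
nrow(filtered)     # 2
```

In other words, within(baseball, baseball[id=="ansonca01", ]) computes the subset but never stores it, so the data frame is returned with every row still in place.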