I always use "with" rather than "within" in my research, and I originally thought the two were the same. Just now I mistyped "within" for "with", and the results returned were quite different. I am wondering why.
I am using the baseball data in the plyr package, so I first load the library with
require(plyr)
Then I want to select all rows with the id "ansonca01". At first, as I said, I used "within" and ran the function as follows:
within(baseball, baseball[id=="ansonca01", ])
I got very strange results, which basically include everything:
id year stint team lg g ab r h X2b X3b hr rbi sb cs bb so ibb hbp sh sf gidp
4 ansonca01 1871 1 RC1 25 120 29 39 11 3 0 16 6 2 2 1 NA NA NA NA NA
44 forceda01 1871 1 WS3 32 162 45 45 9 4 0 29 8 0 4 0 NA NA NA NA NA
68 mathebo01 1871 1 FW1 19 89 15 24 3 1 0 10 2 1 2 0 NA NA NA NA NA
99 startjo01 1871 1 NY2 33 161 35 58 5 1 1 34 4 2 3 0 NA NA NA NA NA
102 suttoez01 1871 1 CL1 29 128 35 45 3 7 3 23 3 1 1 0 NA NA NA NA NA
106 whitede01 1871 1 CL1 29 146 40 47 6 5 1 21 2 2 4 1 NA NA NA NA NA
113 yorkto01 1871 1 TRO 29 145 36 37 5 7 2 23 2 2 9 1 NA NA NA NA NA
.........
Then I used "with" instead of "within",
with(baseball, baseball[id=="ansonca01",])
and got the results that I expected:
id year stint team lg g ab r h X2b X3b hr rbi sb cs bb so ibb hbp sh sf gidp
4 ansonca01 1871 1 RC1 25 120 29 39 11 3 0 16 6 2 2 1 NA NA NA NA NA
121 ansonca01 1872 1 PH1 46 217 60 90 10 7 0 50 6 6 16 3 NA NA NA NA NA
276 ansonca01 1873 1 PH1 52 254 53 101 9 2 0 36 0 2 5 1 NA NA NA NA NA
398 ansonca01 1874 1 PH1 55 259 51 87 8 3 0 37 6 0 4 1 NA NA NA NA NA
525 ansonca01 1875 1 PH1 69 326 84 106 15 3 0 58 11 6 4 2 NA NA NA NA NA
I checked the documentation of with and within by typing help(with) in R, and got the following:
with is a generic function that evaluates expr in a local environment constructed from data. The environment has the caller's environment as its parent. This is useful for simplifying calls to modeling functions. (Note: if data is already an environment then this is used with its existing parent.)
Note that assignments within expr take place in the constructed environment and not in the user's workspace.
within is similar, except that it examines the environment after the evaluation of expr and makes the corresponding modifications to data (this may fail in the data frame case if objects are created which cannot be stored in a data frame), and returns it. within can be used as an alternative to transform.
From this explanation of the differences, I don't see why I obtained different results from such a simple operation. Does anyone have any ideas?
I find that simple examples often highlight the difference well. Something like:
df <- data.frame(a=1:5,b=2:6)
df
a b
1 1 2
2 2 3
3 3 4
4 4 5
5 5 6
with(df, {c <- a + b; df;} )
a b
1 1 2
2 2 3
3 3 4
4 4 5
5 5 6
within(df, {c <- a + b; df;} )
# equivalent to: within(df, c <- a + b)
# I've just made the return of df explicit
# for comparison's sake
a b c
1 1 2 3
2 2 3 5
3 3 4 7
4 4 5 9
5 5 6 11
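The same reasoning seems to explain the result in the question. with returns the value of the expression, so a subsetting expression gives back the subset. within evaluates the expression, throws its value away, and returns the data frame with whatever variables were assigned inside the expression. A bare subsetting expression assigns nothing, so the original data comes back untouched. Here is a minimal sketch on the toy df from above (the names res_with and res_within are just illustrative, and the filter a == 3 stands in for id == "ansonca01"):

```r
# Toy data frame, same as in the example above
df <- data.frame(a = 1:5, b = 2:6)

# with(): the value of the expression is returned, i.e. the subset
res_with <- with(df, df[a == 3, ])

# within(): the expression is evaluated but its value is discarded;
# nothing is assigned inside it, so df is returned without changes
res_within <- within(df, df[a == 3, ])

nrow(res_with)    # 1 row: the subset
nrow(res_within)  # 5 rows: the whole data frame
```

For the original problem, with(baseball, baseball[id == "ansonca01", ]) does the job, although plain indexing such as baseball[baseball$id == "ansonca01", ] or subset(baseball, id == "ansonca01") selects the same rows without needing with at all.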