I created a list and I stored one data frame in each component. Now I would like to filter those data frames keeping only the rows that have NA in a specific column. I would like the result of this operation to be another list containing data frames with only those rows having NA in that column.
Here is some code to clarify what I am saying. Assume d1 and d2 are my data frames:
set.seed(1)
d1<-data.frame(a=rnorm(5), b=c(rep(2006, times=4),NA))
d2<-data.frame(a=1:5, b=c(2007, 2007, NA, NA, 2007))
print(d1)
a b
1.3011543 2006
0.3780023 2006
-0.3101449 2006
-1.3927445 2006
-1.0726218 NA
print(d2)
a b
1 2007
2 2007
3 NA
4 NA
5 2007
which I place in a list with a for loop:
ls<-list()
for (i in 1:2) {
  str <- paste("d", i, sep="")
  dat <- get(str)
  ls[[str]] <- dat
}
Now I would like to filter each list component so as to keep only the rows where column b is NA. To do this I tried the following command, knowing from the beginning that it would fail. My problem is that I don't know whether subset() is the right function to use and, if it is, I don't know how to refer to each data frame (that is, the first argument of subset()):
lsNA<-lapply(ls, subset(ls, is.na(b)))
Can you please help me get past my severe limitations?
lapply's second argument is a function (subset), and extra arguments to subset are passed as the ... arguments to lapply. Hence:
my.ls <- list(d1 = d1, d2 = d2)
my.lsNA <- lapply(my.ls, subset, is.na(b))
(I am also showing you how to easily create the list of data.frames without using get, and recommend you don't use ls as a variable name since it is also the name of a rather common function.)
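If passing is.na(b) through lapply's ... feels opaque (subset evaluates that expression inside each data frame), a minimal sketch of an equivalent approach uses an explicit helper with bracket indexing. The helper name keep_na_rows is my own and not from the question; the data frames are rebuilt here only so the snippet runs on its own.

```r
# Rebuild the example data from the question so the snippet is self-contained
set.seed(1)
d1 <- data.frame(a = rnorm(5), b = c(rep(2006, times = 4), NA))
d2 <- data.frame(a = 1:5, b = c(2007, 2007, NA, NA, 2007))
my.ls <- list(d1 = d1, d2 = d2)

# Hypothetical helper: keep only the rows of df where column b is NA.
# drop = FALSE guarantees the result is still a data frame.
keep_na_rows <- function(df) {
  df[is.na(df$b), , drop = FALSE]
}

# Apply the helper to every data frame; the result is a list with the same names
my.lsNA <- lapply(my.ls, keep_na_rows)

# Number of rows kept in each component: 1 for d1, 2 for d2
sapply(my.lsNA, nrow)
```

For this data it should give the same rows as lapply(my.ls, subset, is.na(b)). The explicit function can be easier to adapt, for example if the column name has to be supplied as a string.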