I am trying to turn a nested list structure into a dataframe. The list looks similar to the following (it is serialized data from parsed JSON read in using the httr package).
myList <- list(object1 = list(w=1, x=list(y=0.1, z="cat")), object2 = list(w=NULL, x=list(z="dog")))
EDIT: my original example data was too simple. The actual data are ragged, meaning that not all variables exist for every object, and some of the list elements are NULL. I edited the data to reflect this.
unlist(myList) does a great job of recursively flattening the list, and I can then use lapply to flatten all the objects nicely.
flatList <- lapply(myList, FUN= function(object) {return(as.data.frame(rbind(unlist(object))))})
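To see what each flattened object looks like before it goes into the data frame, here is a small sketch using the first object from myList (the variable names obj and flat are only for illustration). It shows that unlist() builds dotted names from the nesting, returns a single atomic vector, and drops NULL elements.

```r
# One object from the example data
obj <- list(w = 1, x = list(y = 0.1, z = "cat"))

# unlist() flattens recursively and joins nested names with "."
flat <- unlist(obj)
print(names(flat))   # "w" "x.y" "x.z"

# An atomic vector has a single type, so mixing numbers and strings
# yields a character vector
print(class(flat))   # "character"

# NULL elements disappear entirely, which is why the objects end up ragged
ragged <- unlist(list(w = NULL, x = list(z = "dog")))
print(names(ragged)) # "x.z"
```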
And finally, I can button it up using plyr::rbind.fill
myDF <- do.call(plyr::rbind.fill, flatList)
str(myDF)
#'data.frame': 2 obs. of 3 variables:
#$ w : Factor w/ 2 levels "1","2": 1 2
#$ x.y: Factor w/ 2 levels "0.1","0.2": 1 2
#$ x.z: Factor w/ 2 levels "cat","dog": 1 2
The problem is that w and x.y are now character vectors, which by default get converted to factors in the data frame. I believe that unlist() is the culprit, since it coerces every element to a single common type, but I can't figure out another way to recursively flatten the list structure. A workaround would be to post-process the data frame and assign data types then. What is the best way to determine whether a vector is a valid numeric or integer vector?
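One possible way to flatten recursively without going through unlist() is a small helper that walks the list and keeps each leaf in a list rather than an atomic vector, so the types survive. The following is only a sketch under stated assumptions: flatten_object is a hypothetical helper (not from any package), every element is assumed to be named, and leaves are assumed to be length-one values. Missing columns are filled with NA in base R instead of plyr::rbind.fill so the example has no dependencies.

```r
# Hypothetical helper: flatten a nested, named list into a single-level list,
# joining names with "." and skipping NULL leaves. Leaf types are preserved.
flatten_object <- function(x, prefix = NULL) {
  out <- list()
  for (name in names(x)) {
    key <- if (is.null(prefix)) name else paste(prefix, name, sep = ".")
    value <- x[[name]]
    if (is.list(value)) {
      out <- c(out, flatten_object(value, key))
    } else if (!is.null(value)) {
      out[[key]] <- value
    }
  }
  out
}

myList <- list(object1 = list(w = 1, x = list(y = 0.1, z = "cat")),
               object2 = list(w = NULL, x = list(z = "dog")))

# One single-row data frame per object
rows <- lapply(myList, function(object) {
  as.data.frame(flatten_object(object), stringsAsFactors = FALSE)
})

# Add any missing columns as NA so every row has the same columns
all_cols <- unique(unlist(lapply(rows, names)))
rows <- lapply(rows, function(df) {
  for (col in setdiff(all_cols, names(df))) df[[col]] <- NA
  df[all_cols]
})

myDF <- do.call(rbind, rows)
str(myDF)
```

Because the values never pass through a character vector, w and x.y stay numeric and no post-processing of types is needed.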
As discussed here, checking whether as.numeric returns NA values is a simple way to check whether a character string contains numeric data. Now you can do something like:
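myDF2 <- lapply(myDF, function(col) {
  # Convert via character so factor columns yield their labels, not level codes
  converted <- suppressWarnings(as.numeric(as.character(col)))
  # Treat the column as numeric only if conversion introduced no new NAs,
  # so values that were already missing do not block the conversion
  if (!any(is.na(converted) & !is.na(col))) {
    converted
  } else {
    col
  }
})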
str(myDF2)
# List of 3
# $ w : num [1:2] 1 2
# $ x.y: num [1:2] 0.1 0.2
# $ x.z: Factor w/ 2 levels "cat","dog": 1 2
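As an alternative to a hand-written check, utils::type.convert() can guess the type of each column. The sketch below assumes a data frame shaped like the ragged example from the question (rebuilt here by hand for illustration, with NA where a value was missing) and assigns back with myDF[] so the result stays a data frame rather than a list.

```r
# Illustrative stand-in for the flattened, ragged data frame
myDF <- data.frame(w   = c("1", NA),
                   x.y = c("0.1", NA),
                   x.z = c("cat", "dog"),
                   stringsAsFactors = TRUE)

# Convert each column from its character form; as.is = TRUE keeps
# non-numeric columns as character instead of turning them into factors
myDF[] <- lapply(myDF, function(col) {
  utils::type.convert(as.character(col), as.is = TRUE)
})

str(myDF)
```

Here type.convert() leaves missing values as NA, so a column such as w is still recognised as numeric even though one object had no value for it.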