I would like to merge polygon data and raster data into one dataframe for purposes of then using randomForests package in R.
This involves first extracting the mean raster value per polygon.
So far, I have the following:
#load libraries
library(raster)
library(rgdal)
library(sp)
library(maptools)
#import raster data
r <- raster("myRasterdata.tif")
#import polygon data
p <- readShapePoly("myPolydata.shp")
#extract mean raster value for each polygon
ExtractMyData <- extract(r, p, small=TRUE, fun=mean, na.rm=TRUE, df=FALSE, nl=1, sp=TRUE)
# note I have also tried this with df=TRUE and sp=FALSE
The output is a matrix, which I can write to a dataframe. But it does not have the spatial coordinates or the original polygon IDs, so I don't know how to join the output into the same database.I thought the sp=TRUE argument would do this, but it doesn't seem to work.
Note that I will actually have to convert the polygons to points (using a centroid method?) for purposes of RandomForests so I could guess what I really want is to join the mean raster values joined to points, not polygons.
Any suggestions would be greatly appreciated. Thank you!!
This works:
library(raster)
library(sp)
library(maptools)
#import polygon data
data(wrld_simpl)
p <- wrld_simpl
#create raster data
r <- raster(extent(p))
r[] <- seq_len(ncell(r))
## this does it directly, adding columns "names(r)" to "p"
p <- extract(brick(r, r * 2), p, fun = mean, na.rm = TRUE, sp = TRUE)
You can also do it more manually, see how extract with an aggregating function gives a single column vector:
p$ExtractData <- extract(r, p, fun = mean, na.rm = TRUE)
Or you could work on a multi-layer raster, column by column like this:
b <- brick(r, r * 2)
extr <- extract(b, p, fun = mean, na.rm = TRUE)
for (i in seq_len(ncol(extr))) p[[colnames(extr)[i]]] <- extr[,i]