I have a large number of polygons (~100000) and try to find a smart way of calculating their intersecting area with a regular grid cells.
Currently, I am creating the polygons and the grid cells using shapely (based on their corner coordinates). Then, using a simple for-loop I go through each polygon and compare it to nearby grid cells.
Just a small example to illustrate the polygons/grid cells.
from shapely.geometry import box, Polygon
# Example polygon
xy = [[130.21001, 27.200001], [129.52, 27.34], [129.45, 27.1], [130.13, 26.950001]]
polygon_shape = Polygon(xy)
# Example grid cell
gridcell_shape = box(129.5, -27.0, 129.75, 27.25)
# The intersection
polygon_shape.intersection(gridcell_shape).area
(BTW: the grid cells have the dimensions 0.25x0.25 and the polygons 1x1 at max)
Actually this is quite fast for an individual polygon/grid cell combo with around 0.003 seconds. However, running this code on a huge amount of polygons (each one could intersect dozens of grid cells) takes around 15+ minutes (up to 30+ min depending on the number of intersecting grid cells) on my machine which is not acceptable. Unfortunately, I have no idea how it is possible to write a code for polygon intersection to get the area of overlap. Do you have any tips? Is there an alternative to shapely?
Consider using Rtree to help identify which grid cells that a polygon may intersect. This way, you can remove the for loop used with the array of lat/lons, which is probably the slow part.
Structure your code something like this:
from shapely.ops import cascaded_union
from rtree import index
idx = index.Index()
# Populate R-tree index with bounds of grid cells
for pos, cell in enumerate(grid_cells):
# assuming cell is a shapely object
idx.insert(pos, cell.bounds)
# Loop through each Shapely polygon
for poly in polygons:
# Merge cells that have overlapping bounding boxes
merged_cells = cascaded_union([grid_cells[pos] for pos in idx.intersection(poly.bounds)])
# Now do actual intersection
print(poly.intersection(merged_cells).area)