The answers to spatial problems tend to converge towards certain techniques
that everyone uses. Here are a few.
Is this point in any of these rectangles? Indexes
If you’ve used PostGIS, it’s likely you’ve run into
the all-important index problem. Adding a spatial index to a table means
the difference between multi- and sub-section queries on a table. In software,
the same principle goes: indexes are typically required for performance if
you’re asking anything about spatial intersections.
Typically spatial indexes are R-trees: the data structure provides a
nice balance of space and performance, and is fairly simple to implement.
Vladimir Agafonkin’s library based on
recent research. Closer to the metal, spatialindex is essential.
Is this point in any of these polygons? Indexes + Point in Polygon
R-Trees contain rectangles and points, not polygons or lines. What if you need
a more specific answer? This leads us to an important technique.
A useful and simple technique for asking questions like ‘is this point in any
of these 2 million polygons?’ is that you can break it down into steps
based on assumptions:
- If a point is in a polygon, it must be in the bounding box for that polygon.
- Therefore, we can answer this question in two steps:
- Which polygon bounding boxes is this point within?
- Of those polygon bounding boxes, how many of the actual polygons does this point fall into?
This takes advantage of the fact that the first question can be extremely
fast to answer, thanks to R-Tree, whereas the second question, whether a
point is in a specific polygon, usually cannot be fast. The canonical
algorithm for point-in-polygon intersections looks at every point in the
polygon, even in highly optimized implementations like GEOS.
and my leaflet-pip expose this algorithm.
It’s an extremely simple technique: barring naive implementation mistakes,
the performance is pretty predictable and scales relative to the number
of vertices in the shape.
What is the area of this polygon? Ring Area
There are many ways to calculate the area of a polygon on the earth. The most
complex and most accurate is rarely used - taking into account the minute
surface variations and elevation differences of the geoid.
A much more common approach is to calculate the area of a ring using some simple
trigonometry popularized by Robert Chamberlain at NASA.
There are many implementations of that approach, like geojson-area
One neat side-effect of the algorithm to calculate polygon area is that it
also answers the question of the polygon’s winding number:
a clockwise loop will yield area greater than or equal to zero,
while a counterclockwise loop has negative area.
What are the closest points to this one? KNN
What about finding the closest points to a certain point? The old-fashioned
answer from my MySQL days would be something like a bounding box query
<: you can do better if you’re writing an application.
Using a k-d tree, we can solve
the nearest neighbor problem
a tested and awesome implementation of this by Dark Sky.
Many questions distill into these: for instance, which shapes interact
with a bounding box, a common question in tiled maps, is some combination
of indexing and point in polygon checking.
An interesting new approach that we’ve been exploring at Mapbox
and that has been implemented in Google’s S2 Geometry Library
and Morgan Herlocker’s tile-cover
library, is to use cell indexes rather than trees to answer these questions.
It’s a promising approach because it flattens the index structure, making
it possible to distribute the index across a network and express spatial queries
as range queries. Since the derivation of cells from a shape is very similar
to rasterization, we can steal tricks from graphics programming and use
ray-tracing techniques to make it super-fast.