Clustering Lat/Longs in a Database

Pure.Krome picture Pure.Krome · Dec 1, 2008 · Viewed 11.2k times · Source

I'm trying to see if anyone knows how to cluster some Lat/Long results, using a database, to reduce the number of results sent over the wire to the application.

There are a number of resources about how to cluster, either on the client side OR in the server (application) side .. but not in the database side :(

This is a similar question, asked by a fellow S.O. member. The solutions are server side based (ie. C# code behind).

Has anyone had any luck or experience with solving this, but in a database? Are there any database guru's out there who are after a hawt and sexy DB challenge?

please help :)

EDIT 1: Clarification - by clustering, i'm hoping to group x number of points into a single point, for an area. So, if i say cluster everything in a 1 mile / 1 km square, then all the results in that 'square' are GROUP'D into a single result (say ... the middle of the square).

EDIT 2: I'm using MS Sql 2008, but i'm open to hearing if there are other solutions in other DB's.

Answer

Drew Hall picture Drew Hall · Dec 1, 2008

I'd probably use a modified* version of k-means clustering using the cartesian (e.g. WGS-84 ECF) coordinates for your points. It's easy to implement & converges quickly, and adapts to your data no matter what it looks like. Plus, you can pick k to suit your bandwidth requirements, and each cluster will have the same number of associated points (mod k).

I'd make a table of cluster centroids, and add a field to the original data table to indicate what cluster it belonged too. You'd obviously want to update the clustering periodically if your data is at all dynamic. I don't know if you could do that with a stored procedure & trigger, but perhaps.

*The "modification" would be to adjust the length of the computed centroid vectors so they'd be on the surface of the earth. Otherwise you'd end up with a bunch of points with negative altitude (when converted back to LLH).