Geo-Search (Distance) in PHP/MySQL (Performance)

Dexter picture Dexter · Mar 8, 2011 · Viewed 11.9k times · Source

I have a MySQL-table (MyISAM) containing about 200k entries of lat/long pairs that I select from, based on the pairs distance (great circle formula) from another lat/long pair. (e.g. all entries that are within a 10km radius around 50.281852, 2.504883)

My problem is that this query takes about 0,28 sec. to run just for those 200k entries (which continue to get more every day). While 0,28 sec. would be fine normally, this query runs very often as it powers the main feature of my web-app, and often times it's part of a larger query.

Is there any way to speed this up? Obviously MySQL has to run through all 200k entries every time and perform the great circle formula for every entry. I read something about geohashing, R-Trees and the like here on Stack Overflow but I don't think that's the way I want to go. Partly because I've never been a big fan of maths, but mostly because I think that this problem has already been solved by someone smarter than me in a library/extension/etc. that has been tested extensively and is being updated regularly.

MySQL seems to have a spatial extension but that one doesn't provide a distance function. Should I be looking at another database to put this coordinate-pairs in? PostgreSQL seems to have a fairly mature Spatial extension. Do you know anything about it? Or would PostgreSQL too simply just use the great circle formula to get all entries within a certain region?

Is there maybe a specialized stand-alone product or mysql-extension that already does what I'm looking for?

Or is there maybe A PHP library I could use to do the calculations? Using APC I could easily fit the lat-long pairs into memory (those 200k entries take about 5MB) and then run the query inside of PHP. The problem with this approach however is that then I'd have a MySQL query like SELECT .. FROM .. WHERE id in (id1, id2, ..) for all the results which can be up to a few thousand. How well does MySQL handle Queries like these? And then (since this is a number-crunching task) would doing this in PHP be fast enough?

Any other Ideas what I should/shouldn't do?

For completeness, here is the sample query, stripped of any irrelevant parts (as I said, usually this is part of a bigger query where I join multiple tables):

SELECT id,
       6371 * acos( sin( radians( 52.4042924 ) ) * sin( radians( lat ) ) + cos( radians( 50.281852 ) ) * cos( radians( lat ) ) * cos( radians( 2.504883 ) - radians( lon ) ) ) AS dst
FROM geoloc
HAVING dst <10
ORDER BY dst ASC

Answer

Patrick picture Patrick · Mar 8, 2011

What if you approach the problem from a different angle?

10 km in a straight line is:

  1. on the latitude is equal to ~1'(minute)
  2. on the longitude is equal to ~6'(minutes)

Using this as a basis, do some quick math and in your query add to the WHERE clause removing any locations that are outside the 'box' that is created by adding the buffer zone with the assumption of 1' lat & 6' long

gps buffer zone circle

Working from this image:

  1. GPS location you are searching for (34° 12' 34.0", -85° 1' 1.0") [34.2094444444, -85.0169444444]

  2. You find the min/max latitude/longitude

    2a. Min Latitude - 34.1927777778, -85.0169444444

    2b. Min Longitude - 34.2094444444, -85.1169444444

    2c. Max Latitude - 34.2261111111, -85.0169444444

    2d. Max Longitude - 34.2094444444, -84.9169444444

  3. Run your query with the min and max of each direction

    SELECT *
    
    FROM geoloc
    
    WHERE
    
    lat >= 34.1927777 AND
    
    lat <= 34.2261111 AND
    
    long >= -85.1169444 AND
    
    long <= -84.9169444;
    

You can either integrate the distance calculation with the SQL query or you can use a PHP library/class to run the distance check after pulling the data. Either way you have reduced the number of calculations by a large percentage.

I use the following function to calculate the distance between two US84 GPS locations. Two parameters are passed, each parameter is an array with the first element being the latitude and the second element being the longitude. I believe it has an accuracy to a few feet, which should be enough for all but the hardest core GPS-ophiles. Also, I believe this uses the Haversine distance formula.

$distance = calculateGPSDistance(array(34.32343, -86.342343), array(34.433223, -96.0032344));

function calculateGPSDistance($site1, $site2)
{
    $distance = 0;
    $earthMeanRadius = 2.0891 * pow(10, 7);

    $deltaLatitude = deg2rad($site2[0] - $site1[0]);
    $deltaLongitude = deg2rad($site2[1] - $site1[1]);
    $a = sin($deltaLatitude / 2) * sin($deltaLatitude / 2) + cos(deg2rad($site1[0])) * 
        cos(deg2rad($site2[0])) * sin($deltaLongitude / 2) * sin($deltaLongitude / 2);
    $c = 2 * atan2(sqrt($a), sqrt(1-$a));
    $distance = $earthMeanRadius * $c;

    return $distance;
}

UPDATE

I forgot to mention, my distance function will return distance in feet.