My current code is pretty quick, but I need to make it even faster so we can accommodate even more markers. Any suggestions?
Notes:
The code:
$singleMarkers = array();
$clusterMarkers = array();
while (count($markers)) {
$marker = array_pop($markers);
$cluster = array();
// Compare marker against all remaining markers.
foreach ($markers as $key => $compareMarker) {
// This function returns the distance between two markers, at a defined zoom level.
$pixels = pixelDistance($marker['lat'], $marker['lng'], $compareMarker['lat'], $compareMarker['lng'], $zoomLevel);
// If two markers are closer than defined distance, remove compareMarker from array and add to cluster.
if ($pixels < $distance) {
unset($markers[$key]);
$cluster[] = $compareMarker;
}
}
// If a marker was added to cluster, also add the marker we were comparing to.
if (count($cluster) > 0) {
$cluster[] = $marker;
$clusterMarkers[] = $cluster;
} else {
$singleMarkers[] = $marker;
}
}
function pixelDistance($lat1, $lon1, $lat2, $lon2, $zoom) {
$x1 = $lon1*10000000; //This is what I did to compensate for using lat/lon values instead of pixels.
$y1 = $lat1*10000000;
$x2 = $lon2*10000000;
$y2 = $lat2*10000000;
return sqrt(pow(($x1-$x2),2) + pow(($y1-$y2),2)) >> (21 - $zoom); //21 is the max zoom level
}
UPDATE
Here's the current code:
$singleMarkers = array();
$clusterMarkers = array();
// Minimum distance between markers to be included in a cluster, at diff. zoom levels
$DISTANCE = (10000000 >> $ZOOM) / 100000;
// Loop until all markers have been compared.
while (count($markers)) {
$marker = array_pop($markers);
$cluster = array();
// Compare against all markers which are left.
foreach ($markers as $key => $target) {
$pixels = abs($marker['lat']-$target['lat']) + abs($marker['lng']-$target['lng']);
// If the two markers are closer than given distance remove target marker from array and add it to cluster.
if ($pixels < $DISTANCE) {
unset($markers[$key]);
$cluster[] = $target;
}
}
// If a marker has been added to cluster, add also the one we were comparing to.
if (count($cluster) > 0) {
$cluster[] = $marker;
$clusterMarkers[] = $cluster;
} else {
$singleMarkers[] = $marker;
}
}
Do you actually need to calculate the Euclidean distance? If you are just comparing relative magnitudes of distances, you can probably get away with using the Manhattan distance, which will save you two calls to pow()
and one to sqrt()
:
function pixelDistance($lat1, $lon1, $lat2, $lon2, $zoom) {
$x1 = $lon1*10000000; //This is what I did to compensate for using lat/lon values instead of pixels.
$y1 = $lat1*10000000;
$x2 = $lon2*10000000;
$y2 = $lat2*10000000;
return ($x1-$x2) + ($y1-$y2) >> (21 - $zoom);
}
Not sure if you need the >> (21 - $zoom)
bit for your calculations, so I left it in. But unless you actually need to use the calculated distance values elsewhere, you can probably get away with just using the latitude/longitude directly (no need to multiply by anything) and taking the Manhattan distance, assuming you pre-calculate $distance
to fit in with that measure, which will be a lot cheaper in computational terms than coercing all the distances to fit in with the units and magnitude of $distance
.
EDIT: When I was researching this problem last year, I found some useful stuff on Wikipedia - yes, it can happen ;-)
I can also highly recommend the book Programming Collective Intelligence: Building Smart Web 2.0 Applications which goes into clustering in great depth, as applied not only to geographical data but also to other areas of data analysis.