Using the following code to cluster geolocation coordinates results in 3 clusters:
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.vq import kmeans2, whiten
coordinates= np.array([
[lat, long],
[lat, long],
...
[lat, long]
])
x, y = kmeans2(whiten(coordinates), 3, iter = 20)
plt.scatter(coordinates[:,0], coordinates[:,1], c=y);
plt.show()
Is it right to use Kmeans for location clustering, as it uses Euclidean distance and not Haversine formula as a distance function?
k-means is not a good algorithm to use for spatial clustering, for the reasons you meantioned. Instead, you could do this clustering job using scikit-learn's DBSCAN with the haversine metric and ball-tree algorithm.
This tutorial demonstrates clustering latitude-longitude spatial data with DBSCAN/haversine and avoids all those Euclidean-distance problems:
df = pd.read_csv('gps.csv')
coords = df.as_matrix(columns=['lat', 'lon'])
db = DBSCAN(eps=eps, min_samples=ms, algorithm='ball_tree', metric='haversine').fit(np.radians(coords))
Note that this specifically uses scikit-learn v0.15, as some earlier/later versions seem to require a full distance matrix to be computed. Also notice that the eps value is in radians and that .fit() takes the coordinates in radian units for the haversine metric.