how to plot a k-distance graph in python

Mauro Gentile picture Mauro Gentile · Apr 1, 2017 · Viewed 10.3k times · Source

How do I plot (in python) the distance graph for a given value of min-points in DBSCAN???

I am looking for the knee and corresponding epsilon value.

In the sklearn I do not see any method that return such distances.... Am I missing something?

Answer

Haritz Laboa picture Haritz Laboa · Apr 11, 2019

First you can define a function to calculate the distance of each point to its k-th nearest neighbor:

def calculate_kn_distance(X,k):

    kn_distance = []
    for i in range(len(X)):
        eucl_dist = []
        for j in range(len(X)):
            eucl_dist.append(
                math.sqrt(
                    ((X[i,0] - X[j,0]) ** 2) +
                    ((X[i,1] - X[j,1]) ** 2)))

        eucl_dist.sort()
        kn_distance.append(eucl_dist[k])

    return kn_distance

Then, once you have defined your function, you can choose a k value and plot the histogram to find a knee to define an appropriate epsilon value.

eps_dist = calculate_kn_distance(X[1],4)
plt.hist(eps_dist,bins=30)
plt.ylabel('n');
plt.xlabel('Epsilon distance');

enter image description here

In the example above, a vast majority of points lie within 0.12 units from their 4th nearest neighbor. So, a heuristic approach, could be choosing 0.12 as epsilon parameter.