The fastest minimum spanning tree algorithm

toto · Feb 7, 2011 · Viewed 8.5k times · Source

http://en.wikipedia.org/wiki/Minimum_spanning_tree

I'm looking to benchmark my minimum spanning tree algorithm against the best of the best. Does someone know where I can find a C++ implementation of these algorithms? I binged and googled but didn't find anything. If these algorithms are the best, surely there must be a C++ implementation somewhere?

The fastest minimum spanning tree algorithm to date was developed by David Karger, Philip Klein, and Robert Tarjan, who found a linear time randomized algorithm based on a combination of Borůvka's algorithm and the reverse-delete algorithm.[2][3] The fastest non-randomized algorithm, by Bernard Chazelle, is based on the soft heap, an approximate priority queue.[4][5] Its running time is O(m α(m,n)), where m is the number of edges, n is the number of vertices and α is the classical functional inverse of the Ackermann function. The function α grows extremely slowly, so that for all practical purposes it may be considered a constant no greater than 4; thus Chazelle's algorithm takes very close to linear time. If the edge weights are integers with a bounded bit length, then deterministic algorithms are known with linear running time.[6] Whether there exists a deterministic algorithm with linear running time for general weights is still an open question. However, Seth Pettie and Vijaya Ramachandran have found a provably optimal deterministic minimum spanning tree algorithm, the computational complexity of which is unknown.[7]

I already test against the graph algorithms in the Boost Graph Library.

Answer

Edward Loper · Feb 7, 2011

When the Wikipedia page says "the fastest minimum spanning tree algorithm," what they mean is the algorithm with the lowest asymptotic bounds -- in this case, O(m α(m,n)), with m, n, and α defined as in the quote you pasted. Put simply, this means that for very large input values, the amount of time taken will always be bounded by C*m*α(m,n), for some value of C. But note that C might be very large -- i.e., this algorithm might be slower than others when applied to smaller input values, even though it's faster than others for very large input values.

When comparing the asymptotic bounds of two algorithms, there's no "testing" to see which is faster -- you just prove the asymptotic bounds of each algorithm, and see which one is lower. ("Asymptotic" refers to the behavior as the input size goes to infinity -- and it's hard to run tests with infinite-sized input values.)

But note that there are cases where you should not use the asymptotically fastest algorithm. If the "C" is very large, then you might be better off using a simpler algorithm for smaller data values.

My guess is that your algorithm may improve on the C, but not on the asymptotic bounds. But if I'm wrong on that, then you should submit it to ACM!

For more info, see: http://en.wikipedia.org/wiki/Big_O_notation