For many problems I see the solution recommended is to use a union-find data structure. I tried to read about it and think about how it is implemented (using C++). My current understanding is that it is nothing but a list of sets. So to find which set an element belongs we require n*log n
operations. And when we have to perform union, then we have to find the two sets which needs to be merged and do a set_union
on them. This doesn't look terribly efficient to me. Is my understanding of this data structure correct or am I missing something?
This is quite late reply, but this has probably not been answered elsewhere on stackoverflow, and since this is top most page for someone searching for union-find, here is the detailed solution.
Find-Union is a very fast operation, performing in near constant time. It follows Jeremie's insights of path compression, and tracking set sizes. Path compression is performed on each find operation itself, thereby taking amortized lg*(n) time. lg* is like the inverse Ackerman function, growing so very slow that it is rarely beyond 5 (at least till n< 2^65535). Union/Merge sets is performed lazy, by just pointing 1 root to another, specifically smaller set's root to larger set's root, which is completed in constant time.
Refer the below code from https://github.com/kartikkukreja/blog-codes/blob/master/src/Union%20Find%20%28Disjoint%20Set%29%20Data%20Structure.cpp
class UF {
int *id, cnt, *sz;
public:
// Create an empty union find data structure with N isolated sets.
UF(int N) {
cnt = N; id = new int[N]; sz = new int[N];
for (int i = 0; i<N; i++) id[i] = i, sz[i] = 1; }
~UF() { delete[] id; delete[] sz; }
// Return the id of component corresponding to object p.
int find(int p) {
int root = p;
while (root != id[root]) root = id[root];
while (p != root) { int newp = id[p]; id[p] = root; p = newp; }
return root;
}
// Replace sets containing x and y with their union.
void merge(int x, int y) {
int i = find(x); int j = find(y); if (i == j) return;
// make smaller root point to larger one
if (sz[i] < sz[j]) { id[i] = j, sz[j] += sz[i]; }
else { id[j] = i, sz[i] += sz[j]; }
cnt--;
}
// Are objects x and y in the same set?
bool connected(int x, int y) { return find(x) == find(y); }
// Return the number of disjoint sets.
int count() { return cnt; }
};