I am trying to implement Disjoint Sets for use in Kruskal's algorithm, but I am having trouble understanding exactly how it should be done and in particular, how to manage the forest of trees. After reading the Wikipedia description of Disjoint Sets and after reading the description in Introduction to Algorithms (Cormen et al) I have come up with the following:
class DisjointSet
{
public:
class Node
{
public:
int data;
int rank;
Node* parent;
Node() : data(0),
rank(0),
parent(this) { } // the constructor does MakeSet
};
Node* find(Node*);
Node* merge(Node*, Node*); // Union
};
DisjointSet::Node* DisjointSet::find(DisjointSet::Node* n)
{
if (n != n->parent) {
n->parent = find(n->parent);
}
return n->parent;
}
DisjointSet::Node* DisjointSet::merge(DisjointSet::Node* x,
DisjointSet::Node* y)
{
x = find(x);
y = find(y);
if (x->rank > y->rank) {
y->parent = x;
} else {
x->parent = y;
if (x->rank == y->rank) {
++(y->rank);
}
}
}
I am pretty sure this is incomplete though and that I am missing something.
Introduction to Algorithms mentions that there should be a forest of trees, but it does not give any explanation for a practical implementation of this forest. I watched CS 61B Lecture 31: Disjoint Sets ( http://www.youtube.com/watch?v=wSPAjGfDl7Q ) and here the lecturer uses only an array to store both the forest and all its trees and values. There is no explicit 'Node' type of class as I have, mentioned. I have also found many other sources (I cannot post more than one link), which also use this technique. I would be happy to do this, except that this relies on the indices of the array for lookup and since I want to store values of type other than int, I need to use something else (std::map comes to mind).
Another issue that I am unsure about is memory allocation because I am using C++. I am storing trees of pointers and my MakeSet operation will be: new DisjointSet::Node; . Now, these Nodes only have pointers to their parents, so I'm not sure how to find the bottom of a tree. How will I be able to traverse my trees to deallocate them all?
I understand the basic concept of this data structure, but I'm just a bit confused about the implementation. Any advice and suggestions would be most welcome, thank you.
Not a perfect implementation by any means (I did write it after all!), but does this help?
/***
* millipede: DisjointSetForest.h
* Copyright Stuart Golodetz, 2009. All rights reserved.
***/
#ifndef H_MILLIPEDE_DISJOINTSETFOREST
#define H_MILLIPEDE_DISJOINTSETFOREST
#include <map>
#include <common/exceptions/Exception.h>
#include <common/io/util/OSSWrapper.h>
#include <common/util/NullType.h>
namespace mp {
/**
@brief A disjoint set forest is a fairly standard data structure used to represent the partition of
a set of elements into disjoint sets in such a way that common operations such as merging two
sets together are computationally efficient.
This implementation uses the well-known union-by-rank and path compression optimizations, which together
yield an amortised complexity for key operations of O(a(n)), where a is the (extremely slow-growing)
inverse of the Ackermann function.
The implementation also allows clients to attach arbitrary data to each element, which can be useful for
some algorithms.
@tparam T The type of data to attach to each element (arbitrary)
*/
template <typename T = NullType>
class DisjointSetForest
{
//#################### NESTED CLASSES ####################
private:
struct Element
{
T m_value;
int m_parent;
int m_rank;
Element(const T& value, int parent)
: m_value(value), m_parent(parent), m_rank(0)
{}
};
//#################### PRIVATE VARIABLES ####################
private:
mutable std::map<int,Element> m_elements;
int m_setCount;
//#################### CONSTRUCTORS ####################
public:
/**
@brief Constructs an empty disjoint set forest.
*/
DisjointSetForest()
: m_setCount(0)
{}
/**
@brief Constructs a disjoint set forest from an initial set of elements and their associated values.
@param[in] initialElements A map from the initial elements to their associated values
*/
explicit DisjointSetForest(const std::map<int,T>& initialElements)
: m_setCount(0)
{
add_elements(initialElements);
}
//#################### PUBLIC METHODS ####################
public:
/**
@brief Adds a single element x (and its associated value) to the disjoint set forest.
@param[in] x The index of the element
@param[in] value The value to initially associate with the element
@pre
- x must not already be in the disjoint set forest
*/
void add_element(int x, const T& value = T())
{
m_elements.insert(std::make_pair(x, Element(value, x)));
++m_setCount;
}
/**
@brief Adds multiple elements (and their associated values) to the disjoint set forest.
@param[in] elements A map from the elements to add to their associated values
@pre
- None of the elements to be added must already be in the disjoint set forest
*/
void add_elements(const std::map<int,T>& elements)
{
for(typename std::map<int,T>::const_iterator it=elements.begin(), iend=elements.end(); it!=iend; ++it)
{
m_elements.insert(std::make_pair(it->first, Element(it->second, it->first)));
}
m_setCount += elements.size();
}
/**
@brief Returns the number of elements in the disjoint set forest.
@return As described
*/
int element_count() const
{
return static_cast<int>(m_elements.size());
}
/**
@brief Finds the index of the root element of the tree containing x in the disjoint set forest.
@param[in] x The element whose set to determine
@pre
- x must be an element in the disjoint set forest
@throw Exception
- If the precondition is violated
@return As described
*/
int find_set(int x) const
{
Element& element = get_element(x);
int& parent = element.m_parent;
if(parent != x)
{
parent = find_set(parent);
}
return parent;
}
/**
@brief Returns the current number of disjoint sets in the forest (i.e. the current number of trees).
@return As described
*/
int set_count() const
{
return m_setCount;
}
/**
@brief Merges the disjoint sets containing elements x and y.
If both elements are already in the same disjoint set, this is a no-op.
@param[in] x The first element
@param[in] y The second element
@pre
- Both x and y must be elements in the disjoint set forest
@throw Exception
- If the precondition is violated
*/
void union_sets(int x, int y)
{
int setX = find_set(x);
int setY = find_set(y);
if(setX != setY) link(setX, setY);
}
/**
@brief Returns the value associated with element x.
@param[in] x The element whose value to return
@pre
- x must be an element in the disjoint set forest
@throw Exception
- If the precondition is violated
@return As described
*/
T& value_of(int x)
{
return get_element(x).m_value;
}
/**
@brief Returns the value associated with element x.
@param[in] x The element whose value to return
@pre
- x must be an element in the disjoint set forest
@throw Exception
- If the precondition is violated
@return As described
*/
const T& value_of(int x) const
{
return get_element(x).m_value;
}
//#################### PRIVATE METHODS ####################
private:
Element& get_element(int x) const
{
typename std::map<int,Element>::iterator it = m_elements.find(x);
if(it != m_elements.end()) return it->second;
else throw Exception(OSSWrapper() << "No such element: " << x);
}
void link(int x, int y)
{
Element& elementX = get_element(x);
Element& elementY = get_element(y);
int& rankX = elementX.m_rank;
int& rankY = elementY.m_rank;
if(rankX > rankY)
{
elementY.m_parent = x;
}
else
{
elementX.m_parent = y;
if(rankX == rankY) ++rankY;
}
--m_setCount;
}
};
}
#endif