How can a B-tree node be represented?

chronodekar picture chronodekar · Feb 3, 2012 · Viewed 11k times · Source

We're learning B-trees in class and have been asked to implement them in code. The teacher has left choice of programming language to us and I want to try and do it in C#. My problem is that the following structure is illegal in C#,

unsafe struct BtreeNode
        {
            int key_num;        // The number of keys in a node
            int[] key;          // Array of keys
            bool leaf;          // Is it a leaf node or not?
            BtreeNode*[] c;     // Pointers to next nodes
        }

Specifically, one is not allowed to create a pointer to point to the structure itself. Is there some work-around or alternate approach I could use? I'm fairly certain that there MUST be a way to do this within the managed code, but I can't figure it out.

EDIT: Eric's answer pointed me in the right direction. Here's what I ended up using,

class BtreeNode
{
        public List<BtreeNode> children;       // The child nodes
        public static int MinDeg;               // The Minimum Degree of the tree
        public bool IsLeaf { get; set; }        // Is the current node a leaf or not?
        public List<int> key;                   // The list of keys 
...
}

Answer

Eric Lippert picture Eric Lippert · Feb 3, 2012

Coincidentally I actually just did implement a btree in C#, for a personal project. It was fun. I built a btree of lexicographically ordered variable size (up to 64 byte) keys which presented a number of challenges, particularly around figuring out when a page of storage was too full or too empty.

My advice, having just done that, is to build an abstraction layer that captures just the btree algorithms in their most abstract form, as an abstract base class. Once I got all the btree rules captured in that form, I specialized the base class in several different ways: as a regular fixed-key-size 2-3 btree, as one of my fancy variable-size-key btrees, and so on.

To start with, under no circumstances should you be doing this with pointers. Unsafe code is seldom necessary and never easy. Only the most advanced C# programmers should be turning off the safety system; when you do that, you are taking responsibility for the type and memory safety of the program. If you're not willing to do that, leave the safety system turned on.

Second, there's no reason to make this a struct. Structs are copied by value in C#; a btree node is not a value.

Third, you don't need to keep the number of keys in a node; the array of keys knows how many keys are in it.

Fourth, I would use a List<T> rather than an array; they are more flexible.

Fifth, you need to decide whether the key lives in the node or in the parent. Either way can work; my preference is for the key to live in the node, because I see the key as being associated with the node.

Sixth, it is helpful to know whether a btree node is the root or not; you might consider having two bools, one "is this a leaf?" and one "is this the root?" Of course a btree with a single item in it has a single node that is both leaf and root.

Seventh, you are probably going to build this thing to be mutable; normally one does not make public mutable fields on a C# class. You might consider making them properties. Also, the list of children can be grown and shrunk, but its identity does not change, so make it referentially read-only:

So I would probably structure my basic node as:

class Node
{
    public int Key { get; set; }
    public bool IsRoot { get; set; }
    public bool IsLeaf { get; set; }
    private List<Node> children = new List<Node>();
    public List<Node> Children { get { return this.children; } }
}

Make sense?