C/C++: How to store data in a file in B tree

user203405 picture user203405 · Feb 2, 2010 · Viewed 9.1k times · Source

It appears to me that one way of storing data in a B-tree as a file can be done efficiently with C using binary file with a sequence (array) of structs, with each struct representing a node. One can thus connect the individual nodes with approach that will be similar to creating linked lists using arrays. But then the problem that props up would be deletion of a node, as erasing only a few bytes in the middle in a huge file is not possible.

One way of deleting could be to keep track of 'empty' nodes until a threshold cutoff is reached and then make another file that will discard the empty nodes. But this is tedious.

Is there a better approach from the simplicity/efficiency point of view for deleting, or even representing a B-tree in a file?

TIA, -Sviiya

Answer

Thomas Matthews picture Thomas Matthews · Feb 3, 2010

For implementing B-Trees in a file, you can use the file offset instead of pointers. Also, you can implement a "file memory manager", so that you can re-use deleted items in the file.

In order to fully recover the deleted blocks in a B-Tree file, you will have to recreate the B-Tree in a new file. Also remember the most OSes have no methods for truncating files. A portable method for truncating a file is to write a new file and destroy the old.

Another suggestion is to partition the file into B-Tree partition and data (item) partition. A B-Tree partition will contain the pages. The leaf pages will contain offsets to the data items. The data partition will be a section in the file containing data items. You may end up creating more than one of each partition and the partitions may be interleaved.

I spent much time playing with a file based B-Tree, until I gave up and decided to let a database program (or server) handle the data for me.