I have a doubt about R-Tree data structure. What is fan-out in R-Tree. Is it a Maximum number of entries?
How can we determine the minimum and maximum number of entries in R-Tree? Let say if i have 10000 points and my page size i 8kb.
Thanks
Fan-out, in any tree, is number of pointers to child nodes in a node.
Different trees have different fan-out.
A binary tree has fanout 2.
A B-tree has a fan-out B, with all nodes except leaves having between B/2 and B children. External (on-disk) implementation often relax the minimal number of children restriction to save some updates.
In databases, B-trees or their variant called B+-trees is often used so that each node has size of 1 page and the fan out determined by number of sort keys and pointers that fit in that space.
An R-tree is a search tree where indices are multi-dimensional intervals. These may possibly overlap. It may have any fan-out. Usual is number of 2 to number of dimensions (so 4 for 2-dimensional, 8 for 3-dimensional etc.). But it may have higher fanout too and organizing it similar to B-tree is certainly possible.
How can we determine the minimum and maximum number of entries in R-Tree? Let say if I have 10000 points and my page size is 8KiB.
The size of the tree node does not have to match page size. If it does (usually used for external, i.e. on disk, implementations), you still need to know how large the sort key is and how large the pointer is. An R-tree needs 2 coordinate values, minimum and maximum, per dimension. So a 2-dimensional R-tree with double precision coordinates (the common case appearing in mapping applications) will have four 64 bit values describing the rectangle plus a child pointer, for which an external implementation probably wants to use 64 bits as well. That is 20 B per child and you can squeeze 409 of these in an 8 KiB page. The number of points does not matter. Dimension and precision of coordinate system does.
In memory, trees with low fanout are more efficient, because though they are deeper, they need fewer comparisons per search. However on disk (in databases) the slow operation is reading and since that can only be done in blocks, it is faster to reduce number of nodes by having each node fill whole block and have correspondingly higher fanout.