B+-Tree Index Files
• Indexing mechanisms are used to speed up access to desired data.
• E.g., author catalog in a library
• Search key: an attribute or set of attributes used to look up records in a file.
• An index file consists of records (called index entries) of the form (search-key, pointer).
• Index files are typically much smaller than the original file.
• Ordered indices: search keys are stored in sorted order.
B+-Tree Index Files (Cont.)
A B+-tree is a rooted tree satisfying the following properties:
• All paths from root to leaf are of the same length.
• Each node that is not a root or a leaf has between ⌈n/2⌉ and n children.
• A leaf node has between ⌈(n–1)/2⌉ and n–1 values.
• Special cases:
• If the root is not a leaf, it has at least 2 children.
• If the root is a leaf (that is, there are no other nodes in the tree), it can have between 0 and (n–1) values.
B+-Tree Node Structure
• Typical node: P1 | K1 | P2 | … | Pn–1 | Kn–1 | Pn
• Ki are the search-key values.
• Pi are pointers to children (for non-leaf nodes) or pointers to records (for leaf nodes).
• The search-keys in a node are ordered: K1 < K2 < K3 < … < Kn–1
Leaf Nodes in B+-Trees
Properties of a leaf node:
• For i = 1, 2, …, n–1, pointer Pi points to a file record with search-key value Ki.
• If Li, Lj are leaf nodes and i < j, Li’s search-key values are less than Lj’s search-key values.
• Pn points to the next leaf node in search-key order.
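The leaf chain formed by the Pn pointers is what makes range scans cheap. A minimal sketch, assuming integer search keys and a hypothetical `Leaf` class and `scan_from` helper (neither name comes from the slides):

```python
from dataclasses import dataclass
from typing import Any, Iterator, List, Optional, Tuple


@dataclass
class Leaf:
    """Hypothetical leaf node: keys[i] is Ki, records[i] stands in for Pi."""
    keys: List[int]                 # sorted, at most n-1 entries
    records: List[Any]              # one record (or record pointer) per key
    next: Optional["Leaf"] = None   # plays the role of Pn: next leaf in key order


def scan_from(leaf: Optional[Leaf], low: int, high: int) -> Iterator[Tuple[int, Any]]:
    """Yield (key, record) pairs with low <= key <= high by walking the leaf chain."""
    while leaf is not None:
        for key, rec in zip(leaf.keys, leaf.records):
            if key > high:
                return              # keys only grow from here on, so stop early
            if key >= low:
                yield key, rec
        leaf = leaf.next            # follow Pn to the next leaf
```

In a full index the starting leaf would be located by a root-to-leaf lookup; here the caller passes it in directly.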
Non-Leaf Nodes in B+-Trees
• Non-leaf nodes form a multi-level sparse index on the leaf nodes.
For a non-leaf node with m pointers:
• All the search-keys in the subtree to which P1 points are less than K1.
• For 2 ≤ i ≤ m – 1, all the search-keys in the subtree to which Pi points have values greater than or equal to Ki–1 and less than Ki.
• All the search-keys in the subtree to which Pm points have values greater than or equal to Km–1.
Example of a B+-tree
[Figure: B+-tree for account file (n = 3)]
Example of B+-tree
[Figure: B+-tree for account file (n = 5)]
• Leaf nodes must have between 2 and 4 values (⌈(n–1)/2⌉ and n–1, with n = 5).
• Non-leaf nodes other than the root must have between 3 and 5 children (⌈n/2⌉ and n, with n = 5).
• The root must have at least 2 children.
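The occupancy limits above follow directly from the ceiling formulas. A small sketch that computes them for any n; the function name `occupancy_bounds` and the dictionary keys are illustrative choices, not part of the slides:

```python
import math


def occupancy_bounds(n: int) -> dict:
    """Return (min, max) occupancy for a B+-tree whose nodes hold up to n pointers."""
    return {
        # leaves hold between ceil((n-1)/2) and n-1 search-key values
        "leaf_values": (math.ceil((n - 1) / 2), n - 1),
        # non-root internal nodes have between ceil(n/2) and n children
        "internal_children": (math.ceil(n / 2), n),
        # a root that is not a leaf has at least 2 children
        "root_min_children": 2,
    }


bounds = occupancy_bounds(5)
print(bounds["leaf_values"])        # (2, 4)
print(bounds["internal_children"])  # (3, 5)
```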
Queries on B+-Trees
Find all records with a search-key value of k.
• Start with the root node.
• Examine the node for the smallest search-key value > k.
• If such a value exists, assume it is Ki. Then follow Pi to the child node.
• Otherwise k ≥ Km–1, where there are m pointers in the node. Then follow Pm to the child node.
• If the node reached by following the pointer above is not a leaf node, repeat the above procedure on that node, and follow the corresponding pointer.
• Eventually a leaf node is reached. If for some i, key Ki = k, follow pointer Pi to the desired record. Else no record with search-key value k exists.
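The lookup procedure above can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the `Node` class and `find` function are hypothetical names, keys are integers and unique, and a leaf's `pointers` list holds only record pointers (the next-leaf pointer Pn is left out for brevity).

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Node:
    """Hypothetical B+-tree node with 0-based lists: keys[i] is K(i+1), pointers[i] is P(i+1)."""
    keys: List[int]
    pointers: List[Any]     # children for a non-leaf node, records for a leaf
    is_leaf: bool = False


def find(root: Node, k: int) -> Optional[Any]:
    """Return the record stored under key k, or None if no such key exists."""
    node = root
    while not node.is_leaf:
        # Index of the smallest key > k; equals len(keys) when no such key
        # exists, which selects the last pointer Pm.
        i = bisect_right(node.keys, k)
        node = node.pointers[i]
    for key, record in zip(node.keys, node.pointers):
        if key == k:
            return record
    return None


# Tiny hand-built tree: one root over three leaves.
leaf1 = Node([10, 20], ["r10", "r20"], is_leaf=True)
leaf2 = Node([30, 40], ["r30", "r40"], is_leaf=True)
leaf3 = Node([50, 60], ["r50", "r60"], is_leaf=True)
root = Node([30, 50], [leaf1, leaf2, leaf3])

print(find(root, 30))  # r30
print(find(root, 35))  # None
```

`bisect_right` is used because it returns the position of the first key strictly greater than k, which matches the "smallest search-key value > k" rule; a linear scan over the keys would work equally well for small nodes.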
Queries on B+-Trees (Cont.)
• In processing a query, a path is traversed in the tree from the root to some leaf node.
• If there are K search-key values in the file, the path is no longer than ⌈log⌈n/2⌉(K)⌉, that is, the logarithm of K to base ⌈n/2⌉, rounded up.
• A node is generally the same size as a disk block, typically 4 kilobytes, and n is typically around 100 (40 bytes per index entry).
• With 1 million search-key values and n = 100, at most ⌈log50(1,000,000)⌉ = 4 nodes are accessed in a lookup.
• Contrast this with a balanced binary tree with 1 million search-key values: around 20 nodes are accessed in a lookup.
• The above difference is significant since every node access may need a disk I/O, costing around 20 milliseconds!
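The arithmetic behind these figures can be checked with a short sketch. The helper name `levels_needed` is hypothetical, and the 20 ms per-access cost is the rough figure quoted above rather than a measured value; integer multiplication is used instead of a floating-point logarithm to avoid rounding surprises at exact powers.

```python
def levels_needed(num_keys: int, fanout: int) -> int:
    """Smallest h with fanout**h >= num_keys, i.e. ceil(log_fanout(num_keys))."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    levels, reach = 0, 1
    while reach < num_keys:
        reach *= fanout
        levels += 1
    return levels


n = 100
min_fanout = -(-n // 2)          # ceil(n/2) = 50, worst-case children per node
bplus = levels_needed(1_000_000, min_fanout)
binary = levels_needed(1_000_000, 2)

IO_MS = 20                        # assumed cost of one disk access, in milliseconds
print(bplus, bplus * IO_MS)       # 4 80
print(binary, binary * IO_MS)     # 20 400
```

Under these assumptions a worst-case lookup costs about 80 ms in the B+-tree against about 400 ms in the balanced binary tree.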