CS632 – Algorithm Professor: G. Gibson Group Project B-Tree Student: Yongsheng Ma
B-Tree • Introduction • Operations • Complexities • Applications • Summary
B-Tree Properties • An m-way search tree • The root node may have as few as two children, or none if the tree is empty • The root may be a leaf • Internal nodes have at least ceiling(m/2) and at most m non-null sub-trees
B-Tree Properties • All leaf nodes are at the same level; that is, the tree is perfectly balanced. • A leaf node has at least ceiling(m/2)-1 entries (keys) and at most m-1 entries (keys).
B-Tree Properties • The branching factor can be quite large. • Each node may have many children, from a handful to thousands. • The keys in each node are in non-decreasing order.
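The properties listed on these slides can be turned into a small checker. This is a minimal sketch that is not part of the original slides: the `Node` class, the `check` function and the choice of Python are assumptions made for illustration, and `m` is the order used in the properties above.

```python
import math


class Node:
    """Hypothetical in-memory B-tree node (not from the slides)."""

    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = children or []   # an empty list marks a leaf


def check(node, m, is_root=True):
    """Check the slide properties for a B-tree of order m.

    Returns the height of the subtree (0 for a leaf) and raises
    AssertionError on the first violated property. Only the key order
    inside each node is checked, not the ordering between a node and
    its sub-trees.
    """
    if is_root and not node.keys and not node.children:
        return 0                                     # empty tree
    min_keys = 1 if is_root else math.ceil(m / 2) - 1
    assert min_keys <= len(node.keys) <= m - 1, "wrong number of keys"
    assert node.keys == sorted(node.keys), "keys out of order"
    if not node.children:
        return 0
    assert len(node.children) == len(node.keys) + 1, "wrong number of children"
    heights = {check(child, m, is_root=False) for child in node.children}
    assert len(heights) == 1, "leaves are not all at the same level"
    return heights.pop() + 1


# The example tree from the "Searching a key" slide, checked as order m = 4.
example = Node("M", [
    Node("DH", [Node("BC"), Node("FG"), Node("JKL")]),
    Node("QTX", [Node("NP"), Node("RS"), Node("VW"), Node("YZ")]),
])
print(check(example, 4))   # prints 2
```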
Operations • Searching for a key • Inserting a key • Splitting a node • Deleting a key
Searching a key • Much like searching a binary tree. • Make a multi-way branching decision at each node • The nodes encountered form a path downward from the root.
Searching a key • The number of pages accessed is O(h) = O(log_t n), where h is the height and n is the number of keys. • CPU time is O(th) = O(t log_t n). • Note • t is the minimum degree of the B-tree. • So each node has at most 2t children and at most 2t-1 entries (keys).
Searching a key (example B-tree) • Root: [M] • Internal nodes: [D H] [Q T X] • Leaves: [B C] [F G] [J K L] [N P] [R S] [V W] [Y Z]
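The multi-way branching described on the search slides can be sketched as follows. This is an illustrative in-memory version, not the presenter's code: the `Node` class and `search` function are hypothetical names, and the place where a real implementation would read a disk page is only marked by a comment.

```python
class Node:
    """Hypothetical in-memory B-tree node (not from the slides)."""

    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = children or []   # an empty list marks a leaf


def search(node, key):
    """Return (node, index) where key is stored, or None if it is absent."""
    i = 0
    # Linear scan inside the node: at most 2t-1 comparisons.
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    if i < len(node.keys) and key == node.keys[i]:
        return node, i
    if not node.children:
        return None                      # reached a leaf without a match
    # A disk-based implementation would read the child page here.
    return search(node.children[i], key)


# The example tree from the slide above.
root = Node("M", [
    Node("DH", [Node("BC"), Node("FG"), Node("JKL")]),
    Node("QTX", [Node("NP"), Node("RS"), Node("VW"), Node("YZ")]),
])
found = search(root, "R")    # follows M -> [Q T X] -> [R S]
missing = search(root, "A")  # None: A is not in the tree
```

Each recursive call moves one level down, so the number of nodes visited is at most the height plus one.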
Creating an empty tree • No disk read is needed. • One disk page is allocated for use as a new node, in O(1) time.
Splitting a node • A fundamental operation used during insertion • The median key moves up into its parent node, which must be non-full. • If it has no parent, then the tree grows in height by one
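Splitting a node (figure, t=4) • Before: parent [… N W …] with full child [P Q R S T U V] • After: parent [… N S W …] with children [P Q R] and [T U V]; the median key S moves up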
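Splitting a node (figure, t=4: splitting the root) • Before: root [A D F H L N P] • After: new root [H] with children [A D F] and [L N P]; the tree grows in height by one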
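The two split figures can be reproduced with a short sketch. It assumes the minimum-degree convention used on the search slides (a full node has 2t-1 keys); `Node` and `split_child` are hypothetical names, and the empty sibling nodes stand in for the parts of the tree the figure elides.

```python
class Node:
    """Hypothetical in-memory B-tree node (not from the slides)."""

    def __init__(self, keys=(), children=None):
        self.keys = list(keys)
        self.children = children or []   # an empty list marks a leaf


def split_child(parent, i, t):
    """Split the full child parent.children[i] around its median key.

    The parent must not be full. The median moves up into the parent,
    and the upper half of the child becomes a new right sibling.
    """
    full = parent.children[i]
    assert len(full.keys) == 2 * t - 1, "only full nodes are split"
    right = Node(full.keys[t:], full.children[t:])
    median = full.keys[t - 1]
    full.keys = full.keys[:t - 1]
    full.children = full.children[:t]
    parent.keys.insert(i, median)
    parent.children.insert(i + 1, right)


# First figure: child [P Q R S T U V] under parent [N W], t = 4.
# The two empty nodes are placeholders for the elided subtrees.
parent = Node("NW", [Node(), Node("PQRSTUV"), Node()])
split_child(parent, 1, t=4)

# Second figure: a full root gets a new, empty parent and is then split.
old_root = Node("ADFHLNP")
new_root = Node(children=[old_root])
split_child(new_root, 0, t=4)
```

Splitting the root this way is what makes the tree grow at the top rather than at the bottom.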
Inserting a key • Requires O(h) disk accesses. • CPU time is O(th) = O(t log_t n).
Inserting a key • Splitting the root is the only way to increase the height of a B-tree. • Unlike a binary tree, a B-tree increases in height at the top instead of the bottom.
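Inserting a key (a) initial tree, t=3 • Root: [G M P X] • Leaves: [A C D E] [J K] [N O] [R S T U V] [Y Z]
Inserting a key (b) B inserted, t=3 • Root: [G M P X] • Leaves: [A B C D E] [J K] [N O] [R S T U V] [Y Z]
Inserting a key (c) Q inserted, t=3 • Root: [G M P T X] • Leaves: [A B C D E] [J K] [N O] [Q R S] [U V] [Y Z]
Inserting a key (d) L inserted, t=3 • Root: [P] • Internal nodes: [G M] [T X] • Leaves: [A B C D E] [J K L] [N O] [Q R S] [U V] [Y Z]
Inserting a key (e) F inserted, t=3 • Root: [P] • Internal nodes: [C G M] [T X] • Leaves: [A B] [D E F] [J K L] [N O] [Q R S] [U V] [Y Z]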
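The insertion sequence (a) to (e) can be replayed with a single-pass insert that splits every full node it meets on the way down. This is a sketch under the minimum-degree convention (t = 3, so a full node has 5 keys), not the presenter's code; `Node`, `split_child` and `insert` are hypothetical names, and keys are assumed to be distinct.

```python
from bisect import bisect_left, insort


class Node:
    """Hypothetical in-memory B-tree node (not from the slides)."""

    def __init__(self, keys=(), children=None):
        self.keys = list(keys)
        self.children = children or []   # an empty list marks a leaf


def split_child(parent, i, t):
    """Split the full child parent.children[i]; its median moves up."""
    full = parent.children[i]
    right = Node(full.keys[t:], full.children[t:])
    median = full.keys[t - 1]
    full.keys, full.children = full.keys[:t - 1], full.children[:t]
    parent.keys.insert(i, median)
    parent.children.insert(i + 1, right)


def insert(root, key, t):
    """Insert key in one downward pass and return the (possibly new) root."""
    if len(root.keys) == 2 * t - 1:
        # A full root is the only place where the tree gains height.
        root = Node(children=[root])
        split_child(root, 0, t)
    node = root
    while node.children:
        i = bisect_left(node.keys, key)
        if len(node.children[i].keys) == 2 * t - 1:
            split_child(node, i, t)
            if key > node.keys[i]:       # the median just moved up to keys[i]
                i += 1
        node = node.children[i]
    insort(node.keys, key)               # node is a non-full leaf here
    return root


# Tree (a) from the slides, followed by the four insertions B, Q, L, F.
leaves = [Node(k) for k in ("ACDE", "JK", "NO", "RSTUV", "YZ")]
root = Node("GMPX", leaves)
for key in "BQLF":
    root = insert(root, key, t=3)
```

Because full nodes are split before the descent continues, the parent of any node being split is never full, so no second upward pass is needed.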
Deleting a key • Deletion is analogous to insertion but a little more complicated. • There are several cases to handle when deleting a key from a B-tree.
Deleting a key • Which case applies depends on where the key is found (leaf or internal node) and on how full the neighbouring nodes are. • In practice, deletion operations are most often used to delete keys from leaves.
Deleting a key • When deleting a key from an internal node, however, the procedure makes a downward pass through the tree but may have to return to the node from which the key was deleted to replace the key with its predecessor or successor.
Deleting a key • Although this procedure seems complicated, it involves only O(h) disk operations for a B-tree with height h. • The CPU time required is O(th) = O(t log_t n).
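Deleting a key (a) initial tree, t=3 • Root: [P] • Internal nodes: [C G M] [T X] • Leaves: [A B] [D E F] [J K L] [N O] [Q R S] [U V] [Y Z]
Deleting a key (b) F deleted: case 1, t=3 • Root: [P] • Internal nodes: [C G M] [T X] • Leaves: [A B] [D E] [J K L] [N O] [Q R S] [U V] [Y Z]
Deleting a key (c) M deleted: case 2a, t=3 • Root: [P] • Internal nodes: [C G L] [T X] • Leaves: [A B] [D E] [J K] [N O] [Q R S] [U V] [Y Z]
Deleting a key (d) G deleted: case 2c, t=3 • Root: [P] • Internal nodes: [C L] [T X] • Leaves: [A B] [D E J K] [N O] [Q R S] [U V] [Y Z]
Deleting a key (e) D deleted: case 3b, t=3 • Merged node: [C L P T X] • Leaves: [A B] [E J K] [N O] [Q R S] [U V] [Y Z]
Deleting a key (e’) tree shrinks in height, t=3 • Root: [C L P T X] • Leaves: [A B] [E J K] [N O] [Q R S] [U V] [Y Z]
Deleting a key (f) B deleted: case 3a, t=3 • Root: [E L P T X] • Leaves: [A C] [J K] [N O] [Q R S] [U V] [Y Z]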
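The deletion cases named in the figures (1, 2a, 2c, 3a, 3b, plus the symmetric 2b) can be sketched as below. The case labels follow the slides; everything else (`Node`, `merge`, `delete`, the two-step predecessor lookup) is an assumption made for illustration, with distinct keys and minimum degree t.

```python
from bisect import bisect_left


class Node:
    """Hypothetical in-memory B-tree node (not from the slides)."""

    def __init__(self, keys=(), children=None):
        self.keys = list(keys)
        self.children = children or []   # an empty list marks a leaf


def merge(parent, i):
    """Fold parent.keys[i] and children[i+1] into children[i]."""
    left = parent.children[i]
    right = parent.children.pop(i + 1)
    left.keys += [parent.keys.pop(i)] + right.keys
    left.children += right.children
    return left


def _delete(node, key, t):
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        if not node.children:                        # case 1: key in a leaf
            node.keys.pop(i)
        elif len(node.children[i].keys) >= t:        # case 2a: predecessor
            pred = node.children[i]
            while pred.children:
                pred = pred.children[-1]
            node.keys[i] = pred.keys[-1]
            _delete(node.children[i], node.keys[i], t)
        elif len(node.children[i + 1].keys) >= t:    # case 2b: successor
            succ = node.children[i + 1]
            while succ.children:
                succ = succ.children[0]
            node.keys[i] = succ.keys[0]
            _delete(node.children[i + 1], node.keys[i], t)
        else:                                        # case 2c: merge, recurse
            _delete(merge(node, i), key, t)
        return
    if not node.children:
        return                                       # key is not in the tree
    child = node.children[i]
    if len(child.keys) == t - 1:                     # child too small to enter
        left = node.children[i - 1] if i > 0 else None
        right = node.children[i + 1] if i < len(node.keys) else None
        if left is not None and len(left.keys) >= t:     # case 3a, from left
            child.keys.insert(0, node.keys[i - 1])
            node.keys[i - 1] = left.keys.pop()
            if left.children:
                child.children.insert(0, left.children.pop())
        elif right is not None and len(right.keys) >= t:  # case 3a, from right
            child.keys.append(node.keys[i])
            node.keys[i] = right.keys.pop(0)
            if right.children:
                child.children.append(right.children.pop(0))
        else:                                        # case 3b: merge siblings
            child = merge(node, i if right is not None else i - 1)
    _delete(child, key, t)


def delete(root, key, t):
    """Delete key and return the (possibly new) root."""
    _delete(root, key, t)
    if not root.keys and root.children:
        return root.children[0]                      # tree shrinks in height
    return root


def example_tree():
    """Tree (a) from the deletion slides, t = 3."""
    leaves = [Node(k) for k in ("AB", "DEF", "JKL", "NO", "QRS", "UV", "YZ")]
    return Node("P", [Node("CGM", leaves[:4]), Node("TX", leaves[4:])])


root = example_tree()
for key in "FMGDB":                                  # same order as the slides
    root = delete(root, key, t=3)
```

In this sketch case 2a and 2b look up the predecessor or successor first and then delete it recursively, which is simpler to read than a strict single pass but touches the same O(h) nodes.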
Complexities • A large branching factor reduces the number of disk accesses required to find a key. • When the root node resides in memory, a tree of height 1 requires at most 2 disk accesses to find any key in the tree. For a tree of fixed height, the number of disk accesses is therefore bounded by a constant, O(1).
Complexities • Running time consists of the number of disk accesses and the CPU time. • During a disk read or write, an entire page of information is accessed. • The number of disk accesses is measured in terms of pages that have to be read from or written to the disk.
Complexities • The number of disk pages accessed is O(h) = O(log_t n). • The CPU time spent within each node is O(t). • The total CPU time is O(th) = O(t log_t n), which is O(log n) when t is treated as a constant. • These bounds are the same for every basic operation (search, insert, delete).
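The O(log_t n) bound on the height follows from counting the fewest keys a B-tree of height h can hold. The derivation below is the standard textbook argument, added here as a sketch; it assumes the minimum-degree convention (the root has at least one key, every other node at least t-1 keys) and counts the root as depth 0, so there are at least 2t^(i-1) nodes at depth i.

```latex
n \;\ge\; 1 + (t-1)\sum_{i=1}^{h} 2\,t^{\,i-1}
  \;=\; 1 + 2(t-1)\,\frac{t^{h}-1}{t-1}
  \;=\; 2t^{h} - 1
\quad\Longrightarrow\quad
h \;\le\; \log_t \frac{n+1}{2}.
```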
Applications • Databases cannot typically be maintained entirely in memory. • Secondary storage is usually used. • B-tree is often used to index the data and to provide fast access.
Applications • Searching an un-indexed and unsorted database containing n key values will have a worst case running time of O(n) • Indexed with a B-tree, the same search operation will run in O(log n)
Applications – an example • To perform a search for a single key on a set of one million keys (1,000,000), a linear search will require at most 1,000,000 comparisons. • If the same data is indexed with a B-tree of minimum degree t = 10, the height is at most 5, so a search visits at most 6 nodes of at most 19 keys each: at most 114 comparisons in the worst case, even with a linear scan inside each node.
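A rough way to bound the B-tree side of this comparison is sketched below. It assumes the minimum-degree convention (t) from the earlier slides, the height bound n ≥ 2t^h - 1, and a linear scan over at most 2t-1 keys in each visited node; the function names are made up for illustration.

```python
def max_height(n, t):
    """Largest h with 2*t**h - 1 <= n, i.e. the tallest possible B-tree
    of minimum degree t holding n >= 1 keys (root at depth 0)."""
    h = 0
    while 2 * t ** (h + 1) - 1 <= n:
        h += 1
    return h


def worst_case_comparisons(n, t):
    """Upper bound on key comparisons for one search, assuming a linear
    scan over at most 2t-1 keys in every node on the search path."""
    nodes_on_path = max_height(n, t) + 1
    return nodes_on_path * (2 * t - 1)


print(max_height(1_000_000, 10))              # prints 5
print(worst_case_comparisons(1_000_000, 10))  # prints 114
```

Integer arithmetic is used instead of a floating-point logarithm so that the result is exact at the boundaries.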
Summary • A B-tree is a balanced, multi-way file organization. • Search, insert, and delete operations retain desirable logarithmic costs. • B-tree schemes guarantee at least 50% storage utilization, since every node except the root is at least half full.
Extra • B-tree variants • B+ and B* tree • Branching factors are improved
Extra • B+ tree • Combine features of ISAM and B tree • Contain Index pages and Data pages • Data pages always appear as leaf nodes • Root and intermediate nodes are index pages
Extra • B+ tree • Saves more space (but who cares) • Non-leaf and leaf nodes contain different numbers of entries • Deletion is more complicated • Faster lookup than a B-tree, because items are stored more compactly and the tree is therefore shorter