BST Data Structure • A BST node contains: • A key (used to search) • The data associated with that key • Pointers to children, parent • Leaf nodes have NULL pointers for children • A BST contains • A pointer to the root of the tree.
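The node and tree described above can be sketched in C++. This is a minimal sketch that assumes `int` keys and `std::string` data; both types are hypothetical choices. The slides' NULL pointers are written as `nullptr`.

```cpp
#include <string>

// A minimal BST node, assuming int keys and std::string data (hypothetical choices).
struct Node {
    int key;                 // used to search
    std::string data;        // the data associated with that key
    Node *left = nullptr;    // children: nullptr when absent (both nullptr in a leaf)
    Node *right = nullptr;
    Node *parent = nullptr;  // nullptr for the root

    Node(int k, const std::string &d) : key(k), data(d) {}
};

// The tree itself only needs a pointer to the root (nullptr for an empty tree).
struct BST {
    Node *root = nullptr;
};
```

Keeping the tree as a separate `BST` struct means an empty tree is simply `root == nullptr`.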
BST Operations: Insert • BST property must be maintained • Algorithm sketch: • To insert data with key k • Start at the root and compare k to the current node's key • If k < key, go left • If k > key, go right • Repeat until you reach an empty (NULL) child pointer. That's where the new node is inserted, as a new leaf. • Note: keep track of the prospective parent along the way, since the new node must be linked to it.
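The algorithm sketch can be written out as follows, tracking the prospective parent during the descent. This is a minimal sketch under stated assumptions: `bst_insert` is a hypothetical name, and keys equal to an existing key are sent right. Nodes are allocated with `new` and never freed here, so the caller owns them.

```cpp
#include <string>

struct Node {
    int key;
    std::string data;
    Node *left = nullptr, *right = nullptr, *parent = nullptr;
    Node(int k, const std::string &d) : key(k), data(d) {}
};

// Inserts (k, d) as a new leaf and returns it. Equal keys go right in this sketch.
Node *bst_insert(Node *&root, int k, const std::string &d) {
    Node *parent = nullptr;   // prospective parent
    Node *cur = root;
    while (cur != nullptr) {  // descend until we fall off the tree
        parent = cur;
        cur = (k < cur->key) ? cur->left : cur->right;
    }
    Node *z = new Node(k, d);
    z->parent = parent;
    if (parent == nullptr)    root = z;            // tree was empty
    else if (k < parent->key) parent->left = z;
    else                      parent->right = z;
    return z;
}
```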
BST Operations: Insert • Running time: • The new node is inserted at a leaf position, so this depends on the height of the tree. • Worst case: • Inserting keys 1, 2, 3, ... in this order will result in a tree that looks like a chain (1 at the root, 2 as its right child, 3 as the right child of 2, and so on): • Tree has degenerated to a list • Height: linear • Note also that such a tree is worse than a linked list since it takes up more space (more pointers)
BST Operations: Insert • Running time: • The new node is inserted at a leaf position, so this depends on the height of the tree. • Best case • The top levels of the tree are filled up completely (e.g. root 12, children 4 and 14, then 2 and 8 under 4 and 16 under 14) • The height is then $\lfloor \log_2 n \rfloor$, where n is the number of nodes in the tree.
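The worst and best cases can be compared by building trees and measuring their height. This is a sketch that assumes height counts edges on the longest root-to-leaf path, so a single node has height 0 and an empty tree has height -1. The helper names `insert`, `height`, `destroy`, and `height_after_inserting` are hypothetical.

```cpp
#include <algorithm>
#include <vector>

struct Node {
    int key;
    Node *left = nullptr, *right = nullptr;
    explicit Node(int k) : key(k) {}
};

// Walk down to the NULL child pointer where k belongs and hang a new leaf there.
void insert(Node *&root, int k) {
    Node **link = &root;
    while (*link != nullptr)
        link = (k < (*link)->key) ? &(*link)->left : &(*link)->right;
    *link = new Node(k);
}

// Height = number of edges on the longest root-to-leaf path; -1 for an empty tree.
int height(const Node *t) {
    if (t == nullptr) return -1;
    return 1 + std::max(height(t->left), height(t->right));
}

void destroy(Node *t) {
    if (t == nullptr) return;
    destroy(t->left);
    destroy(t->right);
    delete t;
}

// Builds a BST by inserting keys in the given order and reports its height.
int height_after_inserting(const std::vector<int> &keys) {
    Node *root = nullptr;
    for (int k : keys) insert(root, k);
    int h = height(root);
    destroy(root);
    return h;
}
```

With this sketch, `height_after_inserting({1, 2, 3, 4, 5, 6, 7, 8})` returns 7 (a chain), while `height_after_inserting({12, 4, 14, 2, 8, 16})` returns 2.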
BST Operations: Insert • The height of a complete (i.e. all levels filled up) BST with n nodes is logarithmic. Why? • Level $i$ has $2^i$ nodes, for $i = 0$ (top level) through $h$ (= height) • The total number of nodes, $n$, is then: $n = 2^0 + 2^1 + \dots + 2^h = \frac{2^{h+1} - 1}{2 - 1} = 2^{h+1} - 1$. Solving for $h$ gives us $h = \log_2(n + 1) - 1 \approx \log_2 n$
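The geometric-series argument can be checked numerically: summing $2^i$ level by level should match the closed form, and inverting the closed form should recover $h$. The function names below are hypothetical. `height_of_complete` assumes its argument has the form $2^{h+1} - 1$.

```cpp
// Nodes in a complete tree of height h, summed level by level (level i holds 2^i nodes).
long long nodes_by_levels(int h) {
    long long n = 0;
    for (int i = 0; i <= h; ++i) n += 1LL << i;
    return n;
}

// Closed form from the geometric series: n = 2^(h+1) - 1.
long long nodes_closed_form(int h) { return (1LL << (h + 1)) - 1; }

// Inverse: h = log2(n + 1) - 1, computed by counting halvings of n + 1.
int height_of_complete(long long n) {
    int h = -1;
    for (long long m = n + 1; m > 1; m >>= 1) ++h;
    return h;
}
```

For example, a complete tree of height 2 has 7 nodes, and one with 1023 nodes has height 9.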
BST Operations: Insert • Analysis conclusion • An insert operation consists of two parts: • Search for the position • best case logarithmic • worst case linear • Physically insert the node • constant
BST Operations: Insert • What if we allow duplicate keys? • Idea #1: Always insert in the right subtree • Can result in a very unbalanced tree (many copies of one key form a chain) • Idea #2: Insert in alternate subtrees • Makes it difficult to search for all occurrences • Idea #3: All elements with the same key are stored in a single node • Good idea! • Easy to search, and does not affect balance any more than non-duplicate insertion.
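Idea #3 can be sketched with a node that holds a list of items instead of a single data field. In this sketch a duplicate key appends to an existing node and never creates a new one. The names `DupNode`, `insert_dup`, and `find_all` are hypothetical, and `int` keys with `std::string` items are assumptions.

```cpp
#include <string>
#include <vector>

// One node per distinct key; all data with that key share the node.
struct DupNode {
    int key;
    std::vector<std::string> items;   // every item stored under this key
    DupNode *left = nullptr, *right = nullptr;
    explicit DupNode(int k) : key(k) {}
};

void insert_dup(DupNode *&root, int k, const std::string &d) {
    DupNode **link = &root;
    while (*link != nullptr && (*link)->key != k)
        link = (k < (*link)->key) ? &(*link)->left : &(*link)->right;
    if (*link == nullptr) *link = new DupNode(k);  // first occurrence: new leaf
    (*link)->items.push_back(d);                   // duplicate: append, tree shape unchanged
}

// Returns all items stored under k (empty if k is absent): one search finds every occurrence.
std::vector<std::string> find_all(const DupNode *t, int k) {
    while (t != nullptr && t->key != k)
        t = (k < t->key) ? t->left : t->right;
    if (t == nullptr) return {};
    return t->items;
}
```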
BST Operations: Insert • What if we allow a variable number of children? (n-ary tree) • Idea: Use a vector/list of pointers to children.
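A node whose number of children varies can keep them in a vector. This is a sketch only. Unlike the raw pointers used elsewhere in these slides, it uses `std::unique_ptr` so that children are freed automatically. `NaryNode`, `add_child`, and `count_nodes` are hypothetical names.

```cpp
#include <memory>
#include <vector>

// Each node owns a vector of pointers to its children, so nodes may differ in fan-out.
struct NaryNode {
    int key;
    std::vector<std::unique_ptr<NaryNode>> children;
    explicit NaryNode(int k) : key(k) {}

    // Appends a new child and returns a non-owning pointer to it.
    NaryNode *add_child(int k) {
        children.push_back(std::make_unique<NaryNode>(k));
        return children.back().get();
    }
};

// Counts the nodes in the subtree rooted at n.
int count_nodes(const NaryNode &n) {
    int total = 1;
    for (const auto &c : n.children) total += count_nodes(*c);
    return total;
}
```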
BST Operations: Search • Take advantage of the BST property. • Algorithm sketch: • Compare target to root • If equal, return success • If target < root, search left • If target > root, search right • Running time: • Similar to insert: proportional to the height of the tree, so logarithmic in the best case and linear in the worst case
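The search sketch translates into a short loop. This version is iterative rather than recursive. It returns the matching node, or `nullptr` to signal failure. `bst_search` is a hypothetical name, and `int` keys are assumed.

```cpp
struct Node {
    int key;
    Node *left = nullptr, *right = nullptr;
    explicit Node(int k) : key(k) {}
};

// Returns the node holding target, or nullptr if target is not in the tree.
Node *bst_search(Node *root, int target) {
    Node *cur = root;
    while (cur != nullptr && cur->key != target)
        cur = (target < cur->key) ? cur->left : cur->right;  // go left or right
    return cur;
}
```

Each iteration moves one level down, so the loop runs at most height + 1 times.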
BST Operations: Delete • The Delete operation consists of two parts: • Search for the node to be deleted • best case constant (deleting the root) • worst case linear • Delete the node • best case? • worst case?
BST Operations: Delete • CASE #1 • The node to be deleted is a leaf node. • Easy! • Physically remove the node. • Constant time • We are just resetting its parent's child pointer and deallocating memory
BST Operations: Delete • CASE #2 • The node to be deleted has exactly one child • Easy! • Physically remove the node. • Constant time • We are just resetting its parent's child pointer, its child's parent pointer and deallocating memory
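Cases #1 and #2 can share one constant-time routine, because a leaf is simply a node whose only "child" is NULL. This is a minimal sketch. `splice_out` is a hypothetical name, and it assumes its argument has at most one child.

```cpp
struct Node {
    int key;
    Node *left = nullptr, *right = nullptr, *parent = nullptr;
    explicit Node(int k) : key(k) {}
};

// Removes z, which must have at most one child (cases #1 and #2), in constant time.
void splice_out(Node *&root, Node *z) {
    Node *child = (z->left != nullptr) ? z->left : z->right;  // nullptr if z is a leaf
    if (child != nullptr) child->parent = z->parent;          // reset child's parent pointer
    if (z->parent == nullptr)      root = child;              // z was the root
    else if (z == z->parent->left) z->parent->left = child;   // reset parent's child pointer
    else                           z->parent->right = child;
    delete z;                                                  // deallocate memory
}
```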
BST Operations: Delete • CASE #3 • The node to be deleted has two children • Not so easy • If we physically delete the node, we'll have to place its two children somewhere. This seems to require too much tree restructuring. • But we know it's easy to delete a node that has at most one child. What if we find such a node whose contents can be copied over without violating the BST property and then physically delete that node?
BST Operations: Delete • CASE #3, continued • The node to be deleted, x, has two children • Idea: • Find x's immediate successor, y. It is guaranteed to have at most one child (it has no left child) • Copy y's contents over to x • Physically delete y.
BST Operations: Delete • Finding the immediate successor: • We know that the node has two children. Due to the BST property, the immediate successor will be in the right subtree. • In particular, the immediate successor will be the smallest element in the right subtree. • The smallest element in a BST is always the leftmost node. It has no left child, but it may have a right child, so it is not necessarily a leaf.
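Finding the successor of a node with two children amounts to "step right once, then go left as far as possible". This is a sketch with hypothetical names `tree_minimum` and `successor_two_children`. In the example tree with root 12, the successor of 12 is 14. That node has a right child (16), which shows the successor need not be a leaf.

```cpp
struct Node {
    int key;
    Node *left = nullptr, *right = nullptr;
    explicit Node(int k) : key(k) {}
};

// Leftmost node of a non-empty subtree, i.e. its smallest key.
// It has no left child, but it may have a right child.
Node *tree_minimum(Node *t) {
    while (t->left != nullptr) t = t->left;
    return t;
}

// Immediate successor of a node x that has two children:
// the smallest element of x's right subtree.
Node *successor_two_children(Node *x) {
    return tree_minimum(x->right);
}
```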
BST Operations: Delete • Finding the immediate successor: • Since it requires traveling down the tree from the current node to the leftmost node of its right subtree, it may take up to linear time in the worst case. • In the best case (a balanced tree), it takes at most logarithmic time. • The time to perform the copy and delete the successor is constant.
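The three delete cases can be combined into one routine. Case #3 copies the successor's contents into x and then falls through to the one-child splice on the successor. This is a minimal sketch. The nodes here carry only a key, so "copying the contents" means copying the key, and a node with a data field would copy that too. `insert`, `bst_delete`, and `inorder` are hypothetical names.

```cpp
#include <vector>

struct Node {
    int key;
    Node *left = nullptr, *right = nullptr, *parent = nullptr;
    explicit Node(int k) : key(k) {}
};

// Standard leaf insertion, recording each new node's parent.
void insert(Node *&root, int k) {
    Node *parent = nullptr;
    Node **link = &root;
    while (*link != nullptr) {
        parent = *link;
        link = (k < parent->key) ? &parent->left : &parent->right;
    }
    *link = new Node(k);
    (*link)->parent = parent;
}

void bst_delete(Node *&root, Node *x) {
    if (x->left != nullptr && x->right != nullptr) {  // case #3: two children
        Node *y = x->right;
        while (y->left != nullptr) y = y->left;       // immediate successor
        x->key = y->key;                              // copy y's contents over to x
        x = y;                                        // now physically delete y instead
    }
    // Cases #1 and #2: x has at most one child.
    Node *child = (x->left != nullptr) ? x->left : x->right;
    if (child != nullptr) child->parent = x->parent;
    if (x->parent == nullptr)      root = child;
    else if (x == x->parent->left) x->parent->left = child;
    else                           x->parent->right = child;
    delete x;
}

void inorder(const Node *t, std::vector<int> &out) {
    if (t == nullptr) return;
    inorder(t->left, out);
    out.push_back(t->key);
    inorder(t->right, out);
}
```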
Binary Search Trees • Traversing a tree = visiting its nodes • Three major ways to traverse a binary tree: • preorder • visit root • visit left subtree • visit right subtree • postorder • visit left subtree • visit right subtree • visit root • inorder • visit left subtree • visit root • visit right subtree • When applied on a BST, inorder visits the nodes in order from smaller to larger
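Binary Search Trees
    #include <iostream>
    using std::cout;

    // Prints each node's data in key order.
    void print_inorder(Node *subroot) {
        if (subroot != nullptr) {
            print_inorder(subroot->left);    // visit left subtree
            cout << subroot->data << ' ';    // visit root
            print_inorder(subroot->right);   // visit right subtree
        }
    }
How long does this take? print_inorder() is called exactly once for each of the n nodes, plus once for each of the n + 1 NULL child pointers, and each call does a constant amount of work. So the running time of this operation is $\Theta(n)$.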
Binary Search Trees • A tree may also be traversed one "level" at a time (top to bottom, left to right). This is usually called a level-order traversal. • It requires the use of a temporary queue:
    enqueue root
    while (queue is not empty) {
        get the front element, f
        print f
        enqueue f's children
        dequeue
    }
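The queue-based pseudocode maps directly onto `std::queue`. This sketch follows its steps one to one. The one change is that keys are collected into a vector rather than printed, so the result can be checked. `level_order` is a hypothetical name.

```cpp
#include <queue>
#include <vector>

struct Node {
    int key;
    Node *left = nullptr, *right = nullptr;
    explicit Node(int k) : key(k) {}
};

std::vector<int> level_order(Node *root) {
    std::vector<int> out;
    if (root == nullptr) return out;
    std::queue<Node *> q;
    q.push(root);                          // enqueue root
    while (!q.empty()) {
        Node *f = q.front();               // get the front element, f
        out.push_back(f->key);             // "print" f
        if (f->left)  q.push(f->left);     // enqueue f's children, left first
        if (f->right) q.push(f->right);
        q.pop();                           // dequeue
    }
    return out;
}
```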
Binary Search Trees • Example tree: root 12; children of 12: 4 and 14; children of 4: 2 and 8; children of 8: 6 and 10; right child of 14: 16
in-order: 2 - 4 - 6 - 8 - 10 - 12 - 14 - 16
pre-order: 12 - 4 - 2 - 8 - 6 - 10 - 14 - 16
post-order: 2 - 6 - 10 - 8 - 4 - 16 - 14 - 12
level-order: 12 - 4 - 14 - 2 - 8 - 16 - 6 - 10
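The three depth-first traversals differ only in where "visit root" happens. This sketch rebuilds the example tree by inserting 12, 4, 14, 2, 8, 16, 6, 10 in that order, which yields the tree described above. It then collects each traversal into a vector. `insert`, `preorder`, `inorder`, and `postorder` are hypothetical names.

```cpp
#include <vector>

struct Node {
    int key;
    Node *left = nullptr, *right = nullptr;
    explicit Node(int k) : key(k) {}
};

void insert(Node *&root, int k) {
    Node **link = &root;
    while (*link != nullptr)
        link = (k < (*link)->key) ? &(*link)->left : &(*link)->right;
    *link = new Node(k);
}

void preorder(const Node *t, std::vector<int> &out) {
    if (t == nullptr) return;
    out.push_back(t->key);      // visit root
    preorder(t->left, out);     // visit left subtree
    preorder(t->right, out);    // visit right subtree
}

void inorder(const Node *t, std::vector<int> &out) {
    if (t == nullptr) return;
    inorder(t->left, out);      // visit left subtree
    out.push_back(t->key);      // visit root
    inorder(t->right, out);     // visit right subtree
}

void postorder(const Node *t, std::vector<int> &out) {
    if (t == nullptr) return;
    postorder(t->left, out);    // visit left subtree
    postorder(t->right, out);   // visit right subtree
    out.push_back(t->key);      // visit root
}
```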
Binary Search Trees • Idea for sorting algorithm: • Given a sequence of integers, insert each one in a BST • Perform an inorder traversal. The elements will be accessed in sorted order. • Running time: • In the worst case (e.g. already sorted input), the tree will degenerate to a list. Creation will take quadratic time and traversal will be linear. Total: $O(n^2)$ • On average (for a random insertion order), the tree will be mostly balanced, with expected height $O(\log n)$. Creation will take $O(n \log n)$ and traversal will again be linear. Total: $O(n \log n)$
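The sorting idea (often called tree sort) fits in a few functions. In this sketch equal keys are sent right, so duplicates are kept. The tree is freed after the traversal. `tree_sort` and its helpers are hypothetical names.

```cpp
#include <vector>

struct Node {
    int key;
    Node *left = nullptr, *right = nullptr;
    explicit Node(int k) : key(k) {}
};

void insert(Node *&root, int k) {              // equal keys go right
    Node **link = &root;
    while (*link != nullptr)
        link = (k < (*link)->key) ? &(*link)->left : &(*link)->right;
    *link = new Node(k);
}

void collect_inorder(const Node *t, std::vector<int> &out) {
    if (t == nullptr) return;
    collect_inorder(t->left, out);
    out.push_back(t->key);
    collect_inorder(t->right, out);
}

void destroy(Node *t) {
    if (t == nullptr) return;
    destroy(t->left);
    destroy(t->right);
    delete t;
}

std::vector<int> tree_sort(const std::vector<int> &input) {
    Node *root = nullptr;
    for (int x : input) insert(root, x);       // creation: O(n log n) on average, O(n^2) worst case
    std::vector<int> out;
    collect_inorder(root, out);                // traversal: linear
    destroy(root);
    return out;
}
```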
BSTs vs. Lists • Time • In the worst case, all dictionary operations are linear. • On average, BSTs are expected to do better. • Space • BSTs store an additional pointer per node. • The BST seemed like a good idea, but in the end it doesn't offer much improvement. • We must find a way to keep the tree balanced and guarantee logarithmic height.
Balanced Trees • There are several ways to define balance • Examples: • Force the subtrees of each node to have almost equal heights • Place upper and lower bounds on the heights of the subtrees of each node. • Force the subtrees of each node to have similar sizes (=number of nodes)
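The first definition of balance can be made concrete. This sketch assumes "almost equal heights" means the two subtree heights differ by at most one, which is the condition AVL trees use. `checked_height` and `is_height_balanced` are hypothetical names. The check computes heights bottom-up and returns the sentinel -2 as soon as some node violates the condition.

```cpp
#include <algorithm>
#include <cstdlib>

struct Node {
    int key;
    Node *left = nullptr, *right = nullptr;
    explicit Node(int k) : key(k) {}
};

// Height of t (-1 if empty), or -2 if some node's subtree heights differ by more than 1.
int checked_height(const Node *t) {
    if (t == nullptr) return -1;
    int hl = checked_height(t->left);
    if (hl == -2) return -2;
    int hr = checked_height(t->right);
    if (hr == -2) return -2;
    if (std::abs(hl - hr) > 1) return -2;
    return 1 + std::max(hl, hr);
}

bool is_height_balanced(const Node *t) { return checked_height(t) != -2; }
```

Under this check, the chain 1, 2, 3 from the insertion worst case is not balanced, while the 8-node example tree used for the traversals is.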