590 likes | 602 Views
Data Structures & Algorithms Binary Search Trees. Search. Long studied problem Intrinsic problem in much of CS Business applications Information retrieval Caching Matching problems. Search. Goal: given a key, find an item(s) that is associated with that key
E N D
Data Structures & Algorithms Binary Search Trees
Search Long studied problem Intrinsic problem in much of CS Business applications Information retrieval Caching Matching problems
Search Goal: given a key, find an item(s) that is associated with that key If search is successful, called a hit. If not, called a miss. May also extend to approximate search, range queries, etc.
Search Defn. 12.1: A symbol table is a data structure of items with keys that supports two basic operations: • insert an new item, and • return an item with a given key. Symbol table is otherwise known as: • Dictionary • Associative memory (contrast with retrieving the item stored at a particular address)
Search Examples of symbol tables: • Telephone book • Encyclopedia • Index (vs. table of contents) • Compiler tables of identifiers Huge advantage of computer-based symbol tables: • Easy to be dynamic
Search Symbol table ADT interface: • Insert <new item> • Search <key> • Remove <item> • Select <int> • Sort • Join <Symbol table> • May also want construct, destroy, • empty test, copy, search&insert
Search Symbol table ADT template <class Item, class Key> class ST { Private: // impl dependent stuff Public: ST(int); int count(); Item search(Key); void insert(Item); void remove(Item); Item select(int); void show(ostream&); }
Search Obligations imposed by symbol table ADT on Key: • key() • null() • operator== • operator< • Mods to rand, scan, show Also – missing join() Interpretations – e.g., remove()
Key-indexed Search Key values distinct, small numbers Can use an array implementation Simply put item with key k in st[k] To remove, replace with null item Time = ? • Constant Select, sort, count all linear • Skip over null items Client has to deal with duplicate keys
Key-indexed Search If there are no items, just keys Use table of bits • Existence table • Bit map Applications: • Allocation tables for memory • Blum filters
Key-indexed Search Prop 12.1: If key values are positive integers less than M and items have distinct keys, then the symbol-table data type can be implemented with key-indexed arrays of items such that insert, search, and remove require constant time, and initialize, select, and sort require time proportional to M, whenever any of the operations are performed on an N-item table.
Sequential Search If keys are over too large a range to used key-indexed approach, can still use a simple approach: Store contiguously in an array, either • In order • Unordered Trade-offs? • Insertion time • Search time • Removal time • Sort time
Sequential Search If keys are over too large a range to used key-indexed approach, can still use a simple approach: Could also store in linked list, either • In order • Unordered Trade-offs? • Aside from those depending on ordering, those from arrays vs. linked lists: insert/remove/join times vs. size
Sequential Search Prop 12.2: Sequential search in a symbol table with N items uses about N/2 comparisons for search hits (on average). True for any of the four forms (array or linked list, ordered or unordered).
Sequential Search Prop 12.2: Sequential search in a symbol table with N unordered items uses a constant number of steps for inserts and N comparisons for search misses (always). True for array or linked list.
Sequential Search Prop 12.3: Sequential search in a symbol table with N ordered items uses about N/2 comparisons for insertions, search hits, and search misses (on average). True for array or linked list.
Costs of Insertion and Search N = number of items, M = size of container
Costs of Insertion and Search N = number of items, M = size of container
Binary Search Binary search – compare to middle item in sorted search table, then search appropriate half. k? a c f j k m n p q r s t v x y a c f j k m n p q r s t v x y a c f j k m n p q r s t v x y a c f j k m n p q r s t v x y hit a c f j k m n p q r s t v x y
Binary Search Prop 12.4: Binary search in a symbol table with N ordered items never uses more than [lg N] + 1 comparisons for a search (hit or miss). Note that this is the same as the number of bits in the representation of N. Why?
Binary Search Keeping table sorted • Expensive using insertion sort • May not matter if table is mostly static, • or if the number of searches is huge (relative to insertions). Improvements: • Interpolation search (try to guess where key will fall in ST) • assumes key values are evenly distributed
Binary Search Trees Use an explicit tree structure Defn 12.2: A binary search tree (BST) is a binary tree that has a key associated with each internal node, and the key in a node is >= the keys in its left subtree and <= the keys in its right subtree
Binary Search Trees Defn 12.2: A binary search tree (BST) is a binary tree that has a key associated with each internal node, and the key in a node is >= the keys in its left subtree and <= the keys in its right subtree k <=k >=k
Binary Search Trees BST search algorithm sketch To search a BST T with root h for key k: If h is null (T is empty) – return miss If h.key == k – return hit If k < h.key search left subtree h.left else search right subtree h.right
Binary Search Trees BST insert algorithm To insert a new item X into a BST: Essentially search, then attach new node (and just as efficient as search): If tree empty set X as root and return If X.key < root.key insert X into left subtree Else insert X into right subtree
Binary Search Trees BST sort algorithm To sort a BST (i.e., output items stored in BST in order of keys): Inorder traversal of BST If tree empty return Show left subtree Show root Show right subtree
Binary Search Trees Prop. 12.6: Search hits require about 2 ln N ≈ 1.39 lg N comparisons on average, in a BST built from random keys. (Treating successive < and = operations as one comparison) Prf: Search hit follows a path from root to the node with the key; average is internal path length over all nodes, + 1.
Binary Search Trees Prop. 12.7: Insertions and search misses require about 2 ln N ≈ 1.39 lg N comparisons on average, in a BST built from random keys. (Treating successive < and = operations as one comparison) Prf: Search hit follows a path from root to an external node; average is external path length over all nodes.
Binary Search Trees Prop. 12.8: In the worst case, a search in a BST with N keys can take N comparisons. Prf: The tree may only have one non-empty subtree per internal node.
Index Implementations May not want to move actual items around, and may want flexibility in item contents. Here, may use items stored in an array (or parallel arrays), with key extraction, then use parallel arrays for link (i.e., have left[i] and right[i] give indexes for left and right child, resp.) Appl: large or overlapping items (e.g., text strings in corpus)
BST Insertion Usually, insertion into BST at bottom Replace external node What if need to insert at root, so new nodes are near the top? Examine this now, as it will come in handy soon…..
BST Insertion at Root Wlog, assume new item has a key larger than that of the current root. L H D M F N B J E O C I K
BST Insertion at Root Tempting to make new node root, make old root left child, and old right child the right child of new root: L H D M F N B J E O C I K
BST Insertion at Root But if there are items in right subtree with smaller keys, then this violates the BST requirements! L H D M F N B J E O C I K
BST Insertion at Root Must somehow rearrange tree to re-establish BST property… moving all nodes with smaller key to left subtree looks costly! L H D M F N B J E O C I K
BST Insertion at Root Solution: ROTATION L H D M F N B J E O C I K
BST Insertion at Root Solution: ROTATION (here in BST) A N L R >=N K M <=L >=L & <=N
BST Insertion at Root Right Rotation in BST: make left child new root, make old root right child A N L R >=N K M <=L >=L & <=N
BST Insertion at Root Right Rotation in BST: patch up links from A, to “middle” subtree, betw/L & N A L N R K M <=L >=L & <=N >=N
BST Insertion at Root Result is still BST, and requires a constant number of local changes. A L N R K M <=L >=L & <=N >=N
BST Insertion at Root Left rotation is mirror image of right rotation (reverse the Rrot steps!) A L N R K M <=L >=L & <=N >=N
BST Insertion at Root Right rotation is mirror image of left rotation (reverse the Lrot steps!) A N L R K M <=L >=L & <=N >=N
BST Insertion at Root Now we can insert at root by inserting at leaf, … L H D M F N B J E O C I K
BST Insertion at Root Now we can insert at root by inserting at leaf, then rotating up H D M F N B J E O C I K L
BST Insertion at Root Now we can insert at root by inserting at leaf, then rotating up, and up H D M F N B J L E O C I K
BST Insertion at Root Now we can insert at root by inserting at leaf, then rotating up, and up, and up H D M F N B L E O C J K I
BST Insertion at Root Now we can insert at root by inserting at leaf, then rotating up, and up, and up, and up H L D M J F B K I N E C O
BST Insertion at Root Q: How much does it cost? A: path length Q: Why do it? L H M J D N K I F B O E C
BST Insertion at Root Q: Why do it? A: Often do searches for more recent items – faster if near the root! L H M J D N K I F B O E C
Other BST Functions Select k – like quickSort version: Keep count field with size of subtree Recursively select in subtrees, adjusting k if right subtree L 13 H 9 M 3 J 3 D 5 N 2 K 1 I 1 F 2 B 2 O 1 E 1 C 1