Search Structures
Overview
• Trie
• Balanced BST
  • AVL
  • Red Black Tree
Trie
• Special case of a tree
• Applicable when
  • Key C can be decomposed into a sequence of subkeys C1, C2, …, Cn
  • Redundancy exists between subkeys
• Approach
  • Store a subkey at each node
  • The path through the trie yields the full key
[Figure: trie with nodes labeled C1, C2, C3, C3, C4]
• Useful for searching strings
  • A string decomposes into a sequence of letters
  • Example: “ART” → “A”, “R”, “T”
• Can be very fast
  • Less overhead than hashing
• May reduce memory
  • Exploiting redundancy
• May require more memory
  • Explicitly storing substrings
[Figure: trie with nodes A, R, S, E, T, illustrating the key “ART”]
Types of Trie
• Standard
  • Single character per node
• Compressed
  • Eliminates chains of nodes
• Compact
  • Stores indices into the original string(s)
• Suffix
  • Stores all suffixes of a string
Standard Trie
• Approach
  • Each node (except the root) is labeled with a character
  • Children of a node are ordered (alphabetically)
  • Paths from root to leaves yield all input strings
Standard Trie Example
• For strings
  • { a, an, and, any, at }
Standard Trie Example
• For strings
  • { bear, bell, bid, bull, buy, sell, stock, stop }
Standard Tries
• Node structure
  • Value between 1…m
  • References to m children
    • Array or linked list
• Example
  class Node {
      Letter value;     // Letter V = { V1, V2, …, Vm }
      Node child[ m ];  // one reference per letter of the alphabet
  }
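The node structure above can be sketched in Python as follows. This is a minimal illustration, not the slides' implementation: it assumes a lowercase alphabet a to z (m = 26), and the names `TrieNode`, `insert`, `contains`, and the `is_word` flag are hypothetical. The flag is an added assumption that marks where a stored string ends, since a string such as "a" can be a prefix of "an".

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumed alphabet, so m = 26


class TrieNode:
    def __init__(self, value=None):
        self.value = value                   # letter labeling this node (None for the root)
        self.child = [None] * len(ALPHABET)  # m child references, in alphabetical order
        self.is_word = False                 # assumption: marks the end of a stored string


def insert(root, word):
    node = root
    for letter in word:
        i = ALPHABET.index(letter)           # linear scan of the alphabet: O(m) per letter
        if node.child[i] is None:
            node.child[i] = TrieNode(letter)
        node = node.child[i]
    node.is_word = True


def contains(root, word):
    node = root
    for letter in word:
        i = ALPHABET.find(letter)            # -1 if the letter is outside the alphabet
        if i < 0 or node.child[i] is None:
            return False
        node = node.child[i]
    return node.is_word


root = TrieNode()
for w in ["a", "an", "and", "any", "at"]:
    insert(root, w)
print(contains(root, "any"), contains(root, "ant"))  # True False
```

A fixed-size child array keeps the children in alphabetical order for free, at the cost of m references per node even when most are unused.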
Standard Tries
• Efficiency
  • Uses O(n) space
  • Supports search / insert / delete in O(dm) time
• where
  • n is the total size of the strings indexed by the trie
  • d is the length of the parameter string
  • m is the size of the alphabet
• Insert the words of a text into a trie
• Each leaf stores the occurrences of its word in the text
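A rough sketch of such a word index, using nested dictionaries instead of explicit node objects. The function names, the `"$"` end-of-word key, and the choice to record word positions (rather than character offsets) are all assumptions made for illustration. Because a word may be a prefix of another, the occurrence list is stored under the end-of-word key rather than strictly at a leaf.

```python
def build_index(text):
    """Insert every word of `text` into a dict-based trie, recording word positions."""
    root = {}
    for position, word in enumerate(text.split()):
        node = root
        for letter in word:
            node = node.setdefault(letter, {})
        # "$" is an assumed end-of-word key; it must not occur inside a word.
        node.setdefault("$", []).append(position)
    return root


def occurrences(root, word):
    """Return the positions at which `word` occurs, or [] if it is absent."""
    node = root
    for letter in word:
        if letter not in node:
            return []
        node = node[letter]
    return node.get("$", [])


index = build_index("see a bear sell stock see a bull buy stock")
print(occurrences(index, "stock"))  # [4, 9]
```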
AVL
• An AVL tree is a binary search tree such that, for every internal node v, the heights of the children of v differ by at most 1.
[Figure: an example of an AVL tree, with the heights shown next to the nodes]
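The balance condition can be checked with a single traversal. The sketch below assumes external (empty) nodes have height 0, and it checks only the height condition, not the search-tree ordering; `Node` and `check_avl` are illustrative names.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def check_avl(node):
    """Return (is_balanced, height); an empty subtree has height 0."""
    if node is None:
        return True, 0
    left_ok, left_h = check_avl(node.left)
    right_ok, right_h = check_avl(node.right)
    balanced = left_ok and right_ok and abs(left_h - right_h) <= 1
    return balanced, 1 + max(left_h, right_h)


tree = Node(44, Node(17, None, Node(32)),
            Node(78, Node(50, Node(48), Node(62)), Node(88)))
print(check_avl(tree))  # (True, 4)
```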
Height of an AVL Tree
• Property: The height of an AVL tree storing n keys is O(log n).
• Proof: Let us bound n(h), the minimum number of internal nodes of an AVL tree of height h.
  • n(1) = 1 and n(2) = 2
  • For h > 2, an AVL tree of height h contains at least a root node, one AVL subtree of height h-1, and one AVL subtree of height h-2, so n(h) = 1 + n(h-1) + n(h-2)
  • Since n(h-1) > n(h-2), we have n(h) > 2n(h-2), and so n(h) > 4n(h-4), n(h) > 8n(h-6), … and, by induction, $n(h) > 2^i \, n(h-2i)$
  • Choosing i so that h-2i is 1 or 2 (the base case), we get $n(h) > 2^{h/2-1}$
  • Taking logarithms: $h < 2\log n(h) + 2$
• Thus the height of an AVL tree is O(log n)
[Figure: AVL tree with subtrees labeled n(1) and n(2)]
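As a sanity check on the proof, the recurrence can be evaluated directly and compared with both inequalities for small heights. This is only a numeric illustration; `min_nodes` is a hypothetical helper name, and the logarithm is taken base 2.

```python
import math


def min_nodes(h):
    """n(h): minimum number of internal nodes in an AVL tree of height h >= 1."""
    a, b = 1, 2                      # n(1), n(2)
    if h == 1:
        return a
    for _ in range(h - 2):
        a, b = b, 1 + b + a          # n(h) = 1 + n(h-1) + n(h-2)
    return b


for h in range(1, 11):
    n = min_nodes(h)
    assert n > 2 ** (h / 2 - 1)      # n(h) > 2^(h/2 - 1)
    assert h < 2 * math.log2(n) + 2  # h < 2 log n(h) + 2
    print(h, n)
```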
Insertion in an AVL Tree
• Insertion is as in a binary search tree: always done by expanding an external node
• Example: insert 54
[Figure: the tree with keys 44, 17, 78, 32, 50, 88, 48, 62 before insertion, and the same tree with 54 added after insertion; the imbalance is marked]
Imbalance after Insertion
• Let w be the inserted node,
• z be the first unbalanced ancestor of w,
• y be the child of z with higher height (must be an ancestor of w),
• x be the child of y with higher height (must be an ancestor of w; can be equal to w).
[Figure: the tree after inserting 54, with nodes z, y, x, w labeled]
Trinode Restructuring
• Assign names a, b, c to nodes x, y, z according to inorder traversal.
• Perform the rotations needed to make b the topmost node of the three.
• Case 1: single rotation
[Figure: a=z, b=y, c=x with subtrees T0, T1, T2, T3, before and after the rotation]
Trinode Restructuring
• Case 1, symmetric case
[Figure: c=z, b=y, a=x with subtrees T0, T1, T2, T3, before and after the rotation]
Trinode Restructuring
• Case 2: double rotation
[Figure: a=z, c=y, b=x with subtrees T0, T1, T2, T3, before and after the rotation]
Trinode Restructuring
• Case 2, symmetric case
[Figure: c=z, a=y, b=x with subtrees T0, T1, T2, T3, before and after the rotation]
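The four cases can be handled by one routine that names a, b, c and the subtrees T0 to T3 in inorder and then relinks them with b on top. The sketch below is one possible way to write it, not necessarily how the slides' author implements it; `Node`, `height`, and `restructure` are illustrative names, heights are recomputed recursively for brevity, and the caller is assumed to reattach the returned node to z's former parent.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def height(node):
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


def restructure(z):
    """Restructure the subtree rooted at unbalanced node z; return its new root b."""
    y = z.left if height(z.left) > height(z.right) else z.right
    hl, hr = height(y.left), height(y.right)
    if hl != hr:
        x = y.left if hl > hr else y.right
    else:                                   # tie: take the child on the same side as y
        x = y.left if y is z.left else y.right
    if y is z.right and x is y.right:       # single rotation: a=z, b=y, c=x
        a, b, c = z, y, x
        t0, t1, t2, t3 = z.left, y.left, x.left, x.right
    elif y is z.left and x is y.left:       # single rotation, symmetric: a=x, b=y, c=z
        a, b, c = x, y, z
        t0, t1, t2, t3 = x.left, x.right, y.right, z.right
    elif y is z.right and x is y.left:      # double rotation: a=z, b=x, c=y
        a, b, c = z, x, y
        t0, t1, t2, t3 = z.left, x.left, x.right, y.right
    else:                                   # double rotation, symmetric: a=y, b=x, c=z
        a, b, c = y, x, z
        t0, t1, t2, t3 = y.left, x.left, x.right, z.right
    a.left, a.right = t0, t1
    c.left, c.right = t2, t3
    b.left, b.right = a, c
    return b


# Subtree rooted at 78 after inserting 54: z = 78, y = 50, x = 62.
z = Node(78, Node(50, Node(48), Node(62, Node(54))), Node(88))
b = restructure(z)
print(b.key, b.left.key, b.right.key)  # 62 50 78
```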
Insertion Example
[Figure: the example tree, unbalanced after inserting 54 and balanced after restructuring, with heights shown next to the nodes and x, y, z and subtrees T0, T1, T2, T3 labeled]
Removal in an AVL Tree
• Removal begins as in a binary search tree, which means the node removed will become an empty external node. Its parent w may cause an imbalance.
• Example: delete 32
[Figure: the tree with keys 44, 17, 62, 32, 50, 78, 88, 48, 54 before removal, and the same tree without 32 after removal, with w labeled]
Imbalance after Removal
• Let w be the parent of the removed node,
• z be the first unbalanced ancestor of w,
• y be the child of z with higher height (is now not an ancestor of w),
• x be
  • the child of y with higher height, if heights are different, or
  • the child of y on the same side as y, if heights are equal.
[Figure: the tree after removing 32, with nodes z, w, y, x labeled]
Rebalancing after a Removal
• Assign names a, b, c to nodes x, y, z according to inorder traversal.
• Perform rotations to make b the topmost of the three.
• As this restructuring may upset the balance of another node higher in the tree, we must continue checking for balance until the root of T is reached.
[Figure: the tree before restructuring (a=z, b=y, c=x, with w labeled) and after restructuring, with 62 at the root]
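A minimal recursive sketch of removal with rebalancing on the way back up to the root. It uses rotations rather than an explicit a, b, c relinking, stores heights in the nodes, and replaces a node with two children by its inorder successor; all names here are illustrative assumptions rather than the slides' code.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.height = 1 + max(h(left), h(right))


def h(node):
    return node.height if node else 0


def update(node):
    node.height = 1 + max(h(node.left), h(node.right))
    return node


def rotate_left(z):
    y = z.right
    z.right, y.left = y.left, z
    update(z)
    return update(y)


def rotate_right(z):
    y = z.left
    z.left, y.right = y.right, z
    update(z)
    return update(y)


def rebalance(z):
    update(z)
    if h(z.left) - h(z.right) > 1:              # y = z.left is the taller child
        if h(z.left.left) < h(z.left.right):    # x = y.right: double rotation
            z.left = rotate_left(z.left)
        return rotate_right(z)                  # on a tie, x is on the same side as y
    if h(z.right) - h(z.left) > 1:              # y = z.right is the taller child
        if h(z.right.right) < h(z.right.left):  # x = y.left: double rotation
            z.right = rotate_right(z.right)
        return rotate_left(z)
    return z


def remove(node, key):
    """Remove key from the subtree at node, rebalancing every node on the way back up."""
    if node is None:
        return None
    if key < node.key:
        node.left = remove(node.left, key)
    elif key > node.key:
        node.right = remove(node.right, key)
    elif node.left is None or node.right is None:
        return node.left or node.right
    else:
        succ = node.right
        while succ.left:
            succ = succ.left
        node.key = succ.key                     # copy the inorder successor, then remove it
        node.right = remove(node.right, succ.key)
    return rebalance(node)


tree = Node(44, Node(17, None, Node(32)),
            Node(62, Node(50, Node(48), Node(54)), Node(78, None, Node(88))))
root = remove(tree, 32)
print(root.key, root.left.key, root.right.key)  # 62 44 78
```

Because `rebalance` runs at every node on the return path, the check for imbalance continues all the way to the root, which is the repeated rebalancing the slide describes.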
Repeated Rebalancing
[Figure: a tree before and after a removal, with nodes w=z, y, x labeled]
Repeated Rebalancing
[Figure: continuation of the repeated rebalancing example]
Running Times for AVL Trees
• Finding a value takes O(log n)
  • height of tree is O(log n), no restructures needed
• Insertion takes O(log n)
  • initial find is O(log n)
  • single restructuring up the tree, maintaining heights is O(log n)
• Removal takes O(log n)
  • initial find is O(log n)
  • (repeated) restructuring up the tree, maintaining heights is O(log n)
Red-Black Trees
• Red-black trees:
  • Binary search trees augmented with node color
  • Operations designed to guarantee that the height h = O(lg n)
• First: describe the properties of red-black trees
• Then: prove that these guarantee h = O(lg n)
• Finally: describe operations on red-black trees
Red-Black Properties
• The red-black properties:
  1. Every node is either red or black
  2. Every leaf (NULL pointer) is black
     • Note: this means every “real” node has 2 children
  3. If a node is red, both children are black
     • Note: can’t have 2 consecutive reds on a path
  4. Every path from node to descendant leaf contains the same number of black nodes
  5. The root is always black
Red-Black Trees
• Put example on board and verify properties:
  1. Every node is either red or black
  2. Every leaf (NULL pointer) is black
  3. If a node is red, both children are black
  4. Every path from node to descendant leaf contains the same number of black nodes
  5. The root is always black
• black-height: # black nodes on path to leaf
  • Label example with h and bh values
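The same verification can be sketched in code. The example below assumes `None` stands for a black NULL leaf with black-height 0, and that the black-height of a node counts the black nodes below it on a path to a leaf, not the node itself. The tree, its keys, and the function names are made up for illustration.

```python
RED, BLACK = "red", "black"


class Node:
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color, self.left, self.right = key, color, left, right


def black_height(node):
    """Return bh(node); raise ValueError if property 1, 3, or 4 fails below it."""
    if node is None:                       # property 2: NULL leaves are black
        return 0
    if node.color not in (RED, BLACK):
        raise ValueError("property 1 violated at %r" % node.key)
    counts = []
    for child in (node.left, node.right):
        child_is_black = child is None or child.color == BLACK
        if node.color == RED and not child_is_black:
            raise ValueError("property 3 violated at %r" % node.key)
        counts.append(black_height(child) + (1 if child_is_black else 0))
    if counts[0] != counts[1]:
        raise ValueError("property 4 violated at %r" % node.key)
    return counts[0]


def is_red_black(root):
    if root is not None and root.color != BLACK:   # property 5
        return False
    try:
        black_height(root)
    except ValueError:
        return False
    return True


tree = Node(10, BLACK,
            Node(5, RED, Node(3, BLACK), Node(7, BLACK)),
            Node(15, BLACK))
print(is_red_black(tree), black_height(tree))  # True 2
```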
A Red-Black Tree with NULLs shown
• Black-height of the tree (the root) = 3
• Black-height of node “X” = 2
[Figure: red-black tree with NULL leaves drawn and one node labeled X]
A Red-Black Tree with Black-Height = 3
• Black-height of the tree?
• Black-height of X?
[Figure: red-black tree with one node labeled X]
Height of Red-Black Trees
• What is the minimum black-height of a node with height h?
  • A: a height-h node has black-height $\ge h/2$
• Theorem: A red-black tree with n internal nodes has height $h \le 2\lg(n + 1)$
• How do you suppose we’ll prove this?
RB Trees: Proving Height Bound
• Prove: n-node RB tree has height $h \le 2\lg(n+1)$
• Claim: A subtree rooted at a node x contains at least $2^{bh(x)} - 1$ internal nodes
  • Proof by induction on height h
  • Base step: x has height 0 (i.e., NULL leaf node)
    • What is bh(x)?
    • A: 0
    • So the subtree contains $2^{bh(x)} - 1 = 2^0 - 1 = 0$ internal nodes (TRUE)
RB Trees: Proving Height Bound
• Inductive proof that the subtree at node x contains at least $2^{bh(x)} - 1$ internal nodes
• Inductive step: x has positive height and 2 children
  • Each child has black-height of bh(x) or bh(x)-1 (Why?)
  • The height of a child is at most (height of x) - 1, so the inductive hypothesis applies to it
  • So the subtrees rooted at each child contain at least $2^{bh(x)-1} - 1$ internal nodes
  • Thus the subtree at x contains at least $(2^{bh(x)-1} - 1) + (2^{bh(x)-1} - 1) + 1 = 2 \cdot 2^{bh(x)-1} - 1 = 2^{bh(x)} - 1$ internal nodes
RB Trees: Proving Height Bound
• Thus at the root of the red-black tree:
  • $n \ge 2^{bh(root)} - 1$ (Why?)
  • $n \ge 2^{h/2} - 1$ (Why?)
  • $\lg(n+1) \ge h/2$ (Why?)
  • $h \le 2\lg(n + 1)$ (Why?)
• Thus h = O(lg n)
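The chain of inequalities can be checked numerically on a small hand-built tree. The tuple encoding, the example tree, and the helper names below are assumptions for illustration only; keys are omitted because the bound depends only on shape and colors.

```python
import math

# A node is (color, left, right); None is a black NULL leaf.
B, R = "black", "red"
red_leaf = (R, None, None)
tree = (B,
        (R, (B, red_leaf, None), (B, None, None)),
        (B, None, None))


def height(t):
    return 0 if t is None else 1 + max(height(t[1]), height(t[2]))


def size(t):
    """Number of internal (non-NULL) nodes."""
    return 0 if t is None else 1 + size(t[1]) + size(t[2])


def bh(t):
    """Black-height along the leftmost path (any path works when property 4 holds)."""
    if t is None:
        return 0
    child = t[1]
    return bh(child) + (1 if child is None or child[0] == B else 0)


h, n, b = height(tree), size(tree), bh(tree)
assert n >= 2 ** b - 1              # n >= 2^bh(root) - 1
assert b >= h / 2                   # bh(root) >= h/2
assert h <= 2 * math.log2(n + 1)    # h <= 2 lg(n + 1)
print(h, n, b)  # 4 6 2
```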
RB Trees: Worst-Case Time
• So we’ve proved that a red-black tree has O(lg n) height
• Corollary: These operations take O(lg n) time:
  • Minimum(), Maximum()
  • Successor(), Predecessor()
  • Search()
• Insert() and Delete():
  • Will also take O(lg n) time
  • But will need special care since they modify the tree
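The query operations never look at node colors, so they are ordinary binary-search-tree routines whose cost is one root-to-leaf walk, which is O(lg n) once the height bound holds. A brief sketch, with illustrative names and an example tree that is not taken from the slides:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def search(node, key):
    """Return the node holding key, or None."""
    while node is not None and node.key != key:
        node = node.left if key < node.key else node.right
    return node


def minimum(node):
    while node.left is not None:
        node = node.left
    return node


def successor(root, key):
    """Smallest key greater than `key`, found in one root-to-leaf descent."""
    best = None
    node = root
    while node is not None:
        if key < node.key:
            best, node = node.key, node.left
        else:
            node = node.right
    return best


root = Node(44, Node(17, None, Node(32)),
            Node(78, Node(50, Node(48), Node(62)), Node(88)))
print(minimum(root).key, successor(root, 32), search(root, 60))  # 17 44 None
```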