430 likes | 663 Views
2IL50 Data Structures. Spring 2014 Lecture 7: Binary Search Trees. Dynamic Sets. Dynamic sets. Dynamic sets Sets that can grow, shrink, or otherwise change over time. Two types of operations: queries return information about the set modifying operations change the set Common queries
E N D
2IL50 Data Structures Spring 2014Lecture 7: Binary Search Trees
Dynamic sets Dynamic setsSets that can grow, shrink, or otherwise change over time. • Two types of operations: • queries return information about the set • modifying operations change the set Common queries Search, Minimum, Maximum, Successor, Predecessor Common modifying operations Insert, Delete
Dictionary DictionaryStores a set S of elements, each with an associated key (integer value). OperationsSearch(S, k): return a pointer to an element x in S with key[x] = k, or NIL if such an element does not exist. Insert(S, x): inserts element x into S, that is, S← S⋃ {x} Delete(S, x): remove element x from S
Implementing a dictionary Hash tableRunning times are average times and assume (simple) uniform hashing and a large enough table (for example, of size 2n) Today Binary search trees Θ(n) Θ(1) Θ(1) Θ(log n) Θ(n) Θ(n) Θ(1) Θ(1) Θ(1)
Binary search trees • Binary search trees are an important data structure for dynamic sets. • They can be used as both a dictionary and a priority queue. • They accomplish many dynamic-set operations in O(h) time, where h = height of tree.
= NIL Binary search trees • root of T denoted by root[T] • internal nodes have four fields: key (and possible other satellite data) left: points to left child right: points to right child p: points to parent. p[root[T]] = NIL root[T] Binary tree T 5 3 7 4 8 1
= NIL Binary search trees Keys are stored only in internal nodes! There are binary search trees which store keys only in the leaves … root[T] Binary tree T 5 3 7 4 8 1
5 3 7 4 8 1 Binary search trees • a binary tree is • a leaf or • a root node x with a binary tree as its left and/or right child Binary-search-tree property • if y is in the left subtree of x, then key[y] ≤ key[x] • if y is in the right subtree of x, then key[y] ≥ key[x] Keys don’t have to be unique … can use stricter property, take care when balancing
5 1 3 7 3 4 8 1 7 5 8 4 Binary search trees Binary-search-tree property • if y is in the left subtree of x, then key[y] ≤ key[x] • if y is in the right subtree of x, then key[y] ≥ key[x] height h = 2 height h = 4
5 3 7 4 8 1 Inorder tree walk • binary search trees are recursive structures ➨ recursive algorithms often work well Example: print all keys in order using an inorder tree walk InorderTreeWalk(x) • if x ≠ NIL • then InorderTreeWalk(left[x]) • print key[x] • InorderTreeWalk(right[x]) Correctness: follows by induction from the binary search tree property Running time? Intuitively, O(n) time for a tree with n nodes, since we visit and print each node once.
Inorder tree walk InorderTreeWalk(x) • if x ≠ NIL • then InorderTreeWalk(left[x]) • print key[x] • InorderTreeWalk(right[x]) TheoremIf x is the root of an n-node subtree, then the call InorderTreeWalk(x) takes Θ(n) time. Proof: • T(n) takes small, constant amount of time on empty subtree ➨ T(0) = c for some positive constant c • for n > 0 assume that left subtree has k nodes, right subtree n-k-1 ➨ T(n) = T(k) + T(n-k-1) + d for some positive constant d use substitution method …to show: T(n) = (c+d)n + c ■
Binary-search-tree property 5 k 3 7 4 8 1 ≤ k ≥ k Querying a binary search tree TreeSearch(x, k) • if x = NIL or k = key[x] • then return x • if k < key[x] • then returnTreeSearch(left[x], k) • else returnTreeSearch(right[x], k) Initial call: TreeSearch(root[T], k) • TreeSearch(root[T], 4) • TreeSearch (root[T], 2) Running time: Θ(length of search path) worst caseΘ(h)
Binary-search-tree property k ≤ k ≥ k Querying a binary search tree – iteratively TreeSearch(x, k) • if x = NIL or k = key[x] • then return x • if k < key[x] • then returnTreeSearch(left[x], k) • else returnTreeSearch(right[x], k) IterativeTreeSearch(x, k) • while x ≠ NIL and k ≠ key[x] • do if k < key[x] • then x = left[x] • else x = right[x] • return x • iterative version more efficient on most computers
Binary-search-tree property 5 k 3 7 4 8 1 ≤ k ≥ k Minimum and maximum • binary-search-tree property guarantees that • the minimum key is located in the leftmost node • the maximum key is located in the rightmost node TreeMinimum(x) • while left[x] ≠ NIL • do x = left[x] • return x TreeMaximum(x) • while right[x] ≠ NIL • do x = right[x] • return x
Binary-search-tree property k ≤ k ≥ k Minimum and maximum • binary-search-tree property guarantees that • the minimum key is located in the leftmost node • the maximum key is located in the rightmost node TreeMinimum(x) • while left[x] ≠ NIL • do x = left[x] • return x TreeMaximum(x) • while right[x] ≠ NIL • do x = right[x] • return x Running time? Both procedures visit nodes on a downward path from the root➨ O(h) time
Successor and predecessor • Assume that all keys are distinct Successor of a node xnode y such that key[y] is the smallest key > key[x] We can find y based entirely on the tree structure, no key comparisons are necessary … if x has the largest key, then we say x’s successor is NIL
Successor and predecessor Successor of a node xnode y such that key[y] is the smallest key > key[x] Two cases: • x has a non-empty right subtree➨ x’s successor is the minimum in x’s right subtree • x has an empty right subtree➨ x’s successor y is the node that x is the predecessor of (x is the maximum in y’s left subtree) as long as we move to the left up the tree (move up through right children), we’re visiting smaller keys …
15 6 18 7 17 20 3 2 4 13 9 Successor and predecessor TreeSucessor(x) • if right[x] ≠ NIL • then returnTreeMinimum(right[x]) • y = p[x] • while y ≠ NIL and x = right[y] • do x = y • y = p[x] • return y • TreePredecessor is symmetric • Successor of 15? • Successor of 6? • Successor of 4? • Predecessor of 6?
15 6 18 7 17 20 3 2 4 13 9 Successor and predecessor TreeSucessor(x) • if right[x] ≠ NIL • then returnTreeMinimum(right[x]) • y = p[x] • while y ≠ NIL and x = right[y] • do x = y • y = p[x] • return y • TreePredecessor is symmetric Running time? O(h)
TreeInsert(T, z) y = NIL x = root[T] while x ≠ NIL do y = x if key[z] < key[x] then x = left[x] else x = right[x] p[z] = y if y == NIL then root[T] = z else if key[z] < key[y] then left[y] = z else right[y] = z to insert value v, insert node z with key[z] = v, left[z] = NIL, and right[z] = NIL traverse tree down to find correct position for z Insertion
TreeInsert(T, z) y = NIL x = root[T] while x ≠ NIL do y = x if key[z] < key[x] then x = left[x] else x = right[x] p[z] = y if y == NIL then root[T] = z else if key[z] < key[y] then left[y] = z else right[y] = z to insert value v, insert node z with key[z] = v, left[z] = NIL, and right[z] = NIL traverse tree down to find correct position for z 14 Insertion y z = NIL x y 15 x y 6 18 x y 7 17 20 3 x y 2 4 13 x = NIL 9
TreeInsert(T, z) y = NIL x = root[T] while x ≠ NIL do y = x if key[z] < key[x] then x = left[x] else x = right[x] p[z] = y if y == NIL then root[T] = z else if key[z] < key[y] then left[y] = z else right[y] = z Running time? to insert value v, insert node z with key[z] = v, left[z] = NIL, and right[z] = NIL traverse tree down to find correct position for z 14 Insertion 15 6 18 7 17 20 3 y 2 4 13 O(h) z 9
Deletion • we want to delete node z TreeDelete has three cases: • z has no children ➨ delete z by having z’s parent point to NIL, instead of to z 15 6 18 9 20 3 7
Deletion • we want to delete node z TreeDelete has three cases: • z has no children ➨ delete z by having z’s parent point to NIL, instead of to z • z has one child ➨ delete z by having z’s parent point to z’s child, instead of to z 15 6 18 9 20 3 7
Deletion • we want to delete node z TreeDelete has three cases: • z has no children ➨ delete z by having z’s parent point to NIL, instead of to z • z has one child ➨ delete z by having z’s parent point to z’s child, instead of to z • z has two children ➨ z’s successor y has either no or one child ➨ delete y from the tree and replace z’s key and satellite data with y’s 15 6 18 9 20 3 7
Deletion • we want to delete node z TreeDelete has three cases: • z has no children ➨ delete z by having z’s parent point to NIL, instead of to z • z has one child ➨ delete z by having z’s parent point to z’s child, instead of to z • z has two children ➨ z’s successor y has either no or one child ➨ delete y from the tree and replace z’s key and satellite data with y’s Running time? 15 7 18 9 20 3 O(h)
Minimizing the running time • all operations can be executed in time proportional to the height h of the tree (instead of proportional to the number n of nodes in the tree) Worst case: Solution: guarantee small height (balance the tree) ➨ h = Θ(log n) Θ(n)
Balanced Search trees • There are many methods to balance a search tree. by weightfor each node the number of nodes in the left and the right subtree is approximately equal by heightfor each node the height of the left and the right subtree is approximately equal by degreeall leaves have the same height, but the degree of the nodes differs (hence not a binary search tree)
number of leaves in subtree rooted at y α≤ ≤ 1-α number of leaves in subtree rooted at x x y Weight-balanced search trees BB[α]-treebinary search tree where for each pair of nodes x, y, with y being a child of x we have where α is a positive constant with α≤ 1/3 • For the height h(n) holds h(n) ≤ h((1-α)n) + 1 Master theorem ➨ h(n) = Θ(log n) • Ideally: α as close as possible to 1/3 But: α = 1/3 gives too little flexibility for updates α just smaller than 1/3 works fine
h-1, h, or h+1 h Height-balanced search trees AVL-treebinary search tree where for each node | height left subtree – height right subtree | ≤ 1 TheoremAn AVL-tree with n nodes has height Θ(log n).
Height-balanced search trees TheoremAn AVL-tree with n nodes has height Θ(log n). Proof Let n(h) = minimum number of nodes in an AVL-tree of height h Claim: n(h) ≥ 2h/2 – 1 Proof of Claim: induction on h • h = 1 • h = 2 n(1) = 1 ✔ n(2) = 2 ✔
h-2 h-1 Height-balanced search trees TheoremAn AVL-tree with n nodes has height Θ(log n). Proof Let n(h) = minimum number of nodes in an AVL-tree of height h Claim: n(h) ≥ 2h/2 – 1 Proof of Claim: induction on h • h > 2 n(h) = 1 + n(h-1) + n(h-2) ≥ 2n(h-2) ≥ 2 ∙ 2(h-2)/2 - 1 ≥ 2h/2 – 1 ■
7,14 4 10 17,40 5 12 2 9 15 23 50 Degree-balanced trees (2, 3)-treesearch tree where all leaves have the same height and internal nodes have degree 2 or 3 TheoremA (2, 3)-tree with n nodes has height Θ(log n).
Red-black Trees Another height-balanced search tree …
10 18 2 12 50 17 NIL NIL NIL NIL NIL NIL NIL Red-black trees Red-black treebinary search tree where each node has a color attribute which is either red or black Red-black properties • Every node is either red or black. • The root is black • Every leaf (nil[T]) is black. • If a node is red, then both its children are black.(Hence no two reds in a row on a simple path from the root to a leaf) • For each node, all paths from the node to descendant leaves contain the same number of black nodes.
10 18 2 12 50 17 NIL NIL NIL NIL NIL NIL NIL Red-black trees: height height of a nodenumber of edges on a longest path to a leaf black-height of a node xbh(x) is the number of black nodes (including nil[T]) on the path from x to leaf, not counting x. Red-black properties • Every node is either red or black. • The root is black • Every leaf (nil[T]) is black. • If a node is red, then both its children are black.(Hence no two reds in a row on a simple path from the root to a leaf) • For each node, all paths from the node to descendant leaves contain the same number of black nodes. h = 4bh = 2 h = 3bh = 2 NIL h = 1bh = 1
12 50 10 10 17 18 18 2 2 NIL 12 50 17 NIL NIL NIL NIL NIL NIL NIL Red-black trees: implementation detail • It is useful to replace each NIL by a single sentinel nil[T] which is always black. The root’s parent is also the sentinel.
10 10 18 18 2 2 Red-black trees: implementation detail • It is useful to replace each NIL by a single sentinel nil[T] which is always black. The root’s parent is also the sentinel. • NIL will not (always) be drawn on the following slides 12 50 12 50 17 17
x “smallest” subtree with black-height bh(x) bh(x) Red-black trees LemmaA red-black tree with n nodes has height ≤ 2 log(n+1). Proof • the subtree rooted at any node x containsat least2bh(x) - 1 internal nodes • the complete tree has n internal nodes➨2bh(root[T]) - 1 ≤ n ➨bh(root[T]) ≤ log (n+1) ➨ height(T) = number of edges on a longest path to a leaf ≤ 2 ∙ bh(root[T]) ≤ 2 log(n+1)■
Balanced binary search trees Advantages of balanced binary search trees over linked lists efficient search in Θ(log n) time oversorted arrays efficient insertion and deletion in Θ(log n) time overhash tables can find successor and predecessor efficiently in Θ(log n) time