210 likes | 218 Views
This article discusses static dictionaries and search trees, including hashing algorithms and binary search tree structures. It explores the concepts of perfect and minimal perfect hashing and compares different search types and operations.
E N D
Static Dictionaries • Collection of items. • Each item is a pair. • (key, element) • Pairs have different keys. • Operations are: • initialize/create • get (search)
Hashing • Perfect hashing (no collisions). • Minimal perfect hashing (space = n). • CHD (compress, hash, and displace) algorithm. • O(n) time to construct the perfect or minimal perfect hash function. • O(1) search time. • Bothelo, Belazzougui & Dietzfelbinger. Compress, hash, and displace. 17th European Symposium on Algorithms, 2009.
Search Tree • Hashing not efficient for extended operations such as range search and nearest match. • Will examine a binary search tree structure for static dictionaries. • Each item/key/element has an estimated access frequency (or probability).
a b b c a c Example a < b < c Cost = 0.8 * 1 + 0.1 * 2 + 0.1 * 3 = 1.3 Cost = 0.8 * 2 + 0.1 * 1 + 0.1 * 2 = 1.9
f0 f1 f2 f3 b c a Search Types • Successful. • Search for a key that is in the dictionary. • Terminates at an internal node. • Unsuccessful. • Search for a key that is not in the dictionary. • Terminates at an external/failure node.
f0 f1 f2 f3 b c a Internal And External Nodes • A binary tree with n internal nodes has n + 1 external nodes. • Let s1, s2, …, sn be the internal nodes, in inorder. • key(s1) < key(s2) < … < key(sn). • Let key(s0) = –infinity and key(sn+1) = infinity. • Let f0, f1, …, fn be the external nodes, in inorder. • fi is reached iff key(si)< search key <key(si+1).
f0 f1 f2 f3 b c a Cost Of Binary Search Tree • Let pi = probability for key(si). • Let qi = probability for key(si)< search key <key(si+1). • Sum of ps and qs = 1. • Cost of tree = S0 <= i <= n qi (level(fi) – 1) + S1<= i <= n pi *level(si) • Cost = weighted path length.
Brute Force Algorithm • Generate all binary search trees with n internal nodes. • Compute the weighted path length of each. • Determine tree with minimum weighted path length. • Number of trees to examine is O(4n/n1.5). • Brute force approach is impractical for large n.
a5 a6 a3 a7 a4 f5 f2 f7 f6 f4 f3 Dynamic Programming • Keys are a1< a2 < …<an. • Let Ti j= least cost tree for ai+1, ai+2, …, aj. • T0n= least cost tree for a1, a2, …, an. T2,7 • Ti j includes pi+1, pi+2, …, pj and qi, qi+1, …, qj.
a5 a6 a3 a7 a4 f5 f2 f7 f6 f4 f3 Terminology T2,7 • Ti j= least cost tree for ai+1, ai+2, …, aj. • ci j= cost of Ti j =Si <= u <= j qu (level(fu) – 1) + Si < u <= j pu *level(su). • ri j= root of Ti j. • wi j= weight of Ti j = sum of ps and qs in Ti j = pi+1+ pi+2+ …+ pj + qi + qi+1 + … + qj
Tii fi i = j • Ti j includes pi+1, pi+2, …, pj and qi, qi+1, …, qj. • Ti i includes qi only. • ci i = cost of Ti i = 0. • ri i = root of Ti i = 0. • wi i = weight of Ti i = sum of ps and qs in Ti i = qi
a5 ak a6 a3 Ti j a7 L R a4 f5 f2 f7 f6 f4 f3 i < j • Ti j= least cost tree for ai+1, ai+2, …, aj. • Ti j includes pi+1, pi+2, …, pj and qi, qi+1, …, qj. • Let ak, i < k <= j, be in the root of Ti j. • L includes pi+1, pi+2, …, pk-1 and qi, qi+1, …, qk-1. • R includes pk+1, pk+2, …, pj and qk, qk+1, …, qj.
a5 a6 a3 a7 L a4 f5 f2 f7 f6 f4 f3 cost(L) • L includes pi+1, pi+2, …, pk-1 and qi, qi+1, …, qk-1. • cost(L) = weighted path length ofL when viewed as a stand alone binary search tree.
ak Ti j L L R Contribution To cij • ci j =Si <= u <= j qu (level(fu) – 1) + Si < u <= j pu *level(su). • When L is viewed as a subtree of Ti j , the level of each node is 1 more than when L is viewed as a stand alone tree. • So, contribution of L to cij is cost(L) + wi k-1.
a5 ak a6 a3 Ti j a7 L R a4 f5 f2 f7 f6 f4 f3 cij • Contribution of L to cij is cost(L) + wi k-1. • Contribution of R to cij is cost(R) + wkj. • cij = cost(L) + wi k-1 +cost(R) + wkj + pk = cost(L) + cost(R) + wij
a5 ak a6 a3 Ti j a7 L R a4 f5 f2 f7 f6 f4 f3 cij • cij = cost(L) + cost(R) + wij • cost(L) = cik-1 • cost(R) = ckj • cij = cik-1 + ckj + wij • Don’tknow k. • cij = mini < k <= j{cik-1 + ckj}+ wij
ak Ti j L R cij • cij = mini < k <= j{cik-1 + ckj}+ wij • rij = k that minimizes right side.
Computation Of c0n Andr0n • Start with ci i =0, ri i =0, wi i =qi, 0 <= i <= n (zero-key trees). • Use cij = mini < k <= j{cik-1 + ckj}+ wij to compute cii+1, ri i+1, 0 <= i <= n – 1 (one-key trees). • Now use the equation to compute cii+2, ri i+2, 0 <= i <= n – 2 (two-key trees). • Now use the equation to compute cii+3, ri i+3, 0 <= i <= n – 3 (three-key trees). • Continue until c0n andr0n(n-key tree) have been computed.
1 2 3 4 0 0 1 2 3 4 cij, rij, i <= j Computation Of c0n Andr0n
Complexity • cij = mini < k <= j{cik-1 + ckj}+ wij • O(n) time to compute one cij. • O(n2)cijs to compute. • Total time is O(n3). • May be reduced to O(n2) by using cij = min ri,j-1 < k <= ri+1,j {cik-1 + ckj}+ wij
T0n a10 T09 T10,n Construct T0n • Root is r0n. • Suppose that r0n = 10. • Construct T09 and T10,n recursively. • Time is O(n).