Data Structures and Algorithms for Information Processing. Lecture 10: Searching II. Outline. One more O/A scheme – Ordered Hashing (Tough Schoolboy problem) Analysis of hashing algorithms Some practical considerations Radix searching. Open vs. Chained Hashing. How big should the table be?
Data Structures and Algorithms for Information Processing Lecture 10: Searching II Lecture 10: Searching
Outline
• One more open-addressing (O/A) scheme: ordered hashing (the "tough schoolboy" problem)
• Analysis of hashing algorithms
• Some practical considerations
• Radix searching
Open vs. Chained Hashing
• How big should the table be?
• Open addressing can be inconvenient when the number of insertions and deletions is unpredictable: the table can overflow.
• Simple solution to overflow: resize (double) the table and rehash everything into the new table.
• Use Knuth's approach and double hashing to avoid clustering.
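The resize-and-rehash idea can be sketched as follows. This is a minimal illustration, not the lecture's own code: the class name `DoubleHashSet`, the 0.5 load threshold, and the way the second hash is derived are all assumptions made for the example. The table size is kept a power of two and the probe step is forced to be odd, so the step is always coprime with the table size and the probe sequence visits every slot.

```python
class DoubleHashSet:
    """Open-addressing set with double hashing that doubles when too full.

    Hypothetical sketch: keys must be hashable and must not be None,
    and deletion is not supported.
    """

    def __init__(self, capacity=8, max_load=0.5):
        self.slots = [None] * capacity   # capacity is assumed to be a power of two
        self.count = 0
        self.max_load = max_load         # assumed threshold, not from the slides

    def _probes(self, key):
        m = len(self.slots)
        h = hash(key)
        pos = h % m                      # first hash: home slot
        step = ((h // m) % m) | 1        # second hash: odd, so coprime with m
        while True:
            yield pos
            pos = (pos + step) % m

    def __contains__(self, key):
        for pos in self._probes(key):
            if self.slots[pos] is None:  # empty slot ends an unsuccessful search
                return False
            if self.slots[pos] == key:
                return True

    def add(self, key):
        if key in self:
            return
        if self.count + 1 > self.max_load * len(self.slots):
            self._resize()
        for pos in self._probes(key):
            if self.slots[pos] is None:
                self.slots[pos] = key
                self.count += 1
                return

    def _resize(self):
        # Double the table and rehash every stored key into it.
        old = self.slots
        self.slots = [None] * (2 * len(old))
        self.count = 0
        for key in old:
            if key is not None:
                self.add(key)


s = DoubleHashSet()
for k in range(0, 300, 3):
    s.add(k)
```

Keeping the load factor at or below one half bounds the expected probe counts, at the cost of rehashing every key each time the table doubles.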
Variant: Ordered Hashing
• In linear probing, we stop the search when we find an empty cell or a record with a key equal to the search key.
• In ordered hashing, we stop when we find an empty cell or a key less than or equal to the search key ("tough schoolboy" hashing).
Tough Schoolboy hashing
• 13 chairs in the classroom
• Each boy has a preferred seat
• Each boy has a jump value
• Boys later in the alphabet are bigger
Class in the morning
• Inserts, in order:
  Don prefers seat 3, jumps 2
  Bill prefers seat 5, jumps 4
  Al prefers seat 3, jumps 6
  Joe prefers seat 3, jumps 4
Searching the classroom
• Search for Don, Bill, Al, and Joe
• Search for Ken, who prefers seat 3 and jumps 1
Variant: Ordered Hashing
• This reduces the time of an unsuccessful search to about the same as a successful search
• Useful for applications where we expect a large number of unsuccessful searches
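The classroom example can be traced with a short sketch. It assumes the rules stated on the slides (13 chairs, a bigger boy takes the seat and the smaller one jumps on by his own jump value) and uses string comparison to model "later in the alphabet is bigger". The names `PREFS`, `insert`, and `search` are hypothetical, chosen for this illustration only.

```python
M = 13  # number of chairs (table size), from the slides

# name -> (preferred seat, jump value), from the slides
PREFS = {"Don": (3, 2), "Bill": (5, 4), "Al": (3, 6), "Joe": (3, 4), "Ken": (3, 1)}


def insert(table, name):
    """Seat `name`; a bigger boy evicts a smaller one, who then jumps on."""
    pos = PREFS[name][0]
    while table[pos] is not None:
        if table[pos] == name:
            return                                   # already seated
        if table[pos] < name:                        # occupant is smaller
            table[pos], name = name, table[pos]      # evict him and carry on with him
        pos = (pos + PREFS[name][1]) % M             # the boy still standing jumps
    table[pos] = name


def search(table, name):
    """Return (seat or None, number of probes)."""
    pos, jump = PREFS[name]
    probes = 0
    while True:
        probes += 1
        occupant = table[pos]
        if occupant == name:
            return pos, probes
        # An empty chair or a smaller boy means `name` cannot be further along.
        if occupant is None or occupant < name:
            return None, probes
        pos = (pos + jump) % M


chairs = [None] * M
for boy in ["Don", "Bill", "Al", "Joe"]:
    insert(chairs, boy)

seated = {i: b for i, b in enumerate(chairs) if b is not None}
# seated is {2: 'Al', 3: 'Joe', 5: 'Don', 9: 'Bill'}
```

The search for Ken stops after a single probe: seat 3 holds Joe, who is smaller than Ken, so Ken cannot be anywhere later on his probe path.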
Summary of Basic Searching
• Hashing is generally preferred to binary tree methods, since it is faster on average.
• But binary search trees are truly dynamic (no advance information on size is needed).
• BSTs also give worst-case guarantees (a hash function could be lousy).
• BSTs support more operations, such as sorting.
Time Analysis
• Open address hashing methods store N records in a table of size M, with M > N
• The performance of the operations depends on the load factor alpha = N/M
• For chained hashing, alpha may be greater than 1.
Linear Probing
• Open address hashing with linear probing requires, on average:
  1/2 (1 + 1/(1-alpha)^2) operations for an unsuccessful search
  1/2 (1 + 1/(1-alpha)) operations for a successful search
• E.g., for alpha = 2/3 we'll make 5 probes for an average unsuccessful search, and 2 for a successful search
Double Hashing
• Open address hashing with double hashing requires, on average:
  1/(1-alpha) operations for an unsuccessful search
  -ln(1-alpha)/alpha operations for a successful search (ln is the natural logarithm)
• E.g., for alpha = 2/3 we'll make 3 probes for an average unsuccessful search, and 1.65 for a successful search
Chained Hashing
• Chained hashing requires, on average:
  1 + alpha operations for an unsuccessful search
  1 + alpha/2 operations for a successful search
• E.g., for alpha = 2/3 we'll make 1.67 probes for an average unsuccessful search, and 1.33 for a successful search
Time Analysis
• These formulas require significant mathematical analysis, which we won't go into.
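Although the derivations are omitted, the formulas themselves are easy to evaluate. The sketch below simply plugs a load factor into the expressions from the preceding slides; the function names are made up for this example, and each function returns a pair (unsuccessful, successful).

```python
import math


def linear_probing(alpha):
    """Expected probes for open addressing with linear probing."""
    unsuccessful = 0.5 * (1 + 1 / (1 - alpha) ** 2)
    successful = 0.5 * (1 + 1 / (1 - alpha))
    return unsuccessful, successful


def double_hashing(alpha):
    """Expected probes for open addressing with double hashing."""
    unsuccessful = 1 / (1 - alpha)
    successful = -math.log(1 - alpha) / alpha   # math.log is the natural log
    return unsuccessful, successful


def chaining(alpha):
    """Expected probes for chained hashing (alpha may exceed 1)."""
    return 1 + alpha, 1 + alpha / 2


if __name__ == "__main__":
    for alpha in (0.25, 0.5, 2 / 3, 0.9):
        print(f"alpha={alpha:.2f}",
              "linear=(%.2f, %.2f)" % linear_probing(alpha),
              "double=(%.2f, %.2f)" % double_hashing(alpha),
              "chained=(%.2f, %.2f)" % chaining(alpha))
```

Trying a few values of alpha shows how quickly linear probing degrades as the table fills, compared with double hashing and chaining.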
[Figure: average number of probes for a successful search]
Radix Searching
• For many applications, keys can be thought of as numbers
• Searching methods that take advantage of the digital properties of these keys are called radix searches
• Radix searches treat keys as numbers in base M (the radix) and work with individual digits
Radix Searching
• Provides reasonable worst-case performance without the complication of balanced trees.
• Provides a way to handle variable-length keys.
• Biased data can lead to degenerate data structures with bad performance.
The Simplest Radix Search
• Digital search trees: like BSTs, but they branch according to the key's bits.
• Key comparison is replaced by a function that accesses the key's next bit.
Digital Search Example
Key  Binary code
A    00001
S    10011
E    00101
R    10010
C    00011
H    01000
[Figure: digital search tree built from these keys; node labels A, E, S, C, H, R]
Digital Search
• Requires O(log N) comparisons on average
• Requires b comparisons in the worst case for a tree built with N random b-bit keys
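A digital search tree can be sketched in a few lines. The code below is an illustration under stated assumptions: keys are the 5-bit codes from the example, inserted in the order A, S, E, R, C, H, and `Node`, `bit`, `dst_insert`, and `dst_search` are hypothetical names. Each node stores a full key; the bit at the current depth decides whether to go left (0) or right (1).

```python
BITS = 5  # key width used in the example


def bit(key, i):
    """Return bit i of key, counting from the most significant of BITS bits."""
    return (key >> (BITS - 1 - i)) & 1


class Node:
    def __init__(self, key):
        self.key = key
        self.children = [None, None]   # [0-branch, 1-branch]


def dst_insert(root, key):
    """Insert key and return the (possibly new) root."""
    if root is None:
        return Node(key)
    node, depth = root, 0
    while node.key != key:             # full key comparison at every node
        b = bit(key, depth)
        if node.children[b] is None:
            node.children[b] = Node(key)
            break
        node, depth = node.children[b], depth + 1
    return root


def dst_search(root, key):
    """Return the number of key comparisons made, or None if key is absent."""
    node, depth = root, 0
    while node is not None:
        if node.key == key:
            return depth + 1
        node, depth = node.children[bit(key, depth)], depth + 1
    return None


def code(letter):
    return ord(letter) - ord("A") + 1  # A -> 00001, S -> 10011, ...


root = None
for letter in "ASERCH":
    root = dst_insert(root, code(letter))
```

With this insertion order A becomes the root, E and S its children, and C, H, and R sit on the third level.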
Digital Search
• Problem: at each node we make a full key comparison, which may be expensive, e.g. for very long keys
• Solution: store keys only at the leaves, and use radix expansion to do intermediate key comparisons
Radix Tries
• The name "trie" comes from reTRIEval
• Internal nodes are used for branching; external nodes are used for the final key comparison and to store data
Radix Trie Example
Key  Binary code
A    00001
S    10011
E    00101
R    10010
C    00011
H    01000
[Figure: radix trie built from these keys, with the keys stored at the leaves; leaf labels H, E, A, C, S, R]
Radix Tries
• The left subtree holds all keys with 0 as the leading bit; the right subtree holds all keys with 1 as the leading bit
• An insert or search requires O(log N) bit comparisons in the average case, and b bit comparisons in the worst case
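The same six keys can be placed in a binary radix trie. This is a hedged sketch, not the lecture's implementation: `Leaf`, `Internal`, `trie_insert`, `trie_search`, and `leaf_depths` are names invented here, and keys are assumed to be distinct 5-bit integers. Internal nodes only branch on a bit; a key is compared in full only once, at the leaf.

```python
BITS = 5  # key width used in the example


def bit(key, i):
    """Return bit i of key, counting from the most significant of BITS bits."""
    return (key >> (BITS - 1 - i)) & 1


class Leaf:
    def __init__(self, key):
        self.key = key


class Internal:
    def __init__(self):
        self.child = [None, None]      # [0-branch, 1-branch]


def trie_insert(node, key, depth=0):
    """Insert key below node and return the node that replaces it."""
    if node is None:
        return Leaf(key)
    if isinstance(node, Leaf):
        if node.key == key:
            return node                # already present
        split = Internal()             # push the old leaf one level down
        split.child[bit(node.key, depth)] = node
        node = split
    b = bit(key, depth)
    node.child[b] = trie_insert(node.child[b], key, depth + 1)
    return node


def trie_search(node, key):
    depth = 0
    while isinstance(node, Internal):
        node, depth = node.child[bit(key, depth)], depth + 1
    return node is not None and node.key == key   # single full comparison


def leaf_depths(node, depth=0, out=None):
    """Map each stored key to the depth of its leaf."""
    out = {} if out is None else out
    if isinstance(node, Leaf):
        out[node.key] = depth
    elif node is not None:
        for child in node.child:
            leaf_depths(child, depth + 1, out)
    return out


def code(letter):
    return ord(letter) - ord("A") + 1


root = None
for letter in "ASERCH":
    root = trie_insert(root, code(letter))
depths = {chr(k + ord("A") - 1): d for k, d in leaf_depths(root).items()}
```

R (10010) and S (10011) differ only in their last bit, so both leaves end up at depth 5, below a chain of one-way internal nodes.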
Radix Tries
• Problem: lots of extra nodes for keys that differ only in low-order bits (see the R and S nodes in the example above)
• This is addressed by Patricia trees, which allow "lookahead" to the next relevant bit
• Patricia: Practical Algorithm To Retrieve Information Coded In Alphanumeric
• In the slides that follow, the entire alphabet would be included in the indexes
// Insert word K (see Drozdek and Simon; needs work)
// Assumes K is not already in the trie.
i = 0; p = root;
while not inserted
    if (K[i] == '\0')
        set end-of-word marker in p to true
    else if (p.ptrs[K[i]] == null)
        create a leaf containing K and put its address in p.ptrs[K[i]]
    else if (p.ptrs[K[i]] refers to a leaf)
        K_L = key in leaf p.ptrs[K[i]];
        do
            create a non-leaf and put its address in p.ptrs[K[i]];
            p = the new non-leaf;
            i++;
        while (K[i] == K_L[i]);
        // i is now the first position at which K and K_L differ
        if (K[i] == '\0') set end-of-word marker in p to true
        else create a leaf containing K and put its address in p.ptrs[K[i]]
        if (K_L[i] == '\0') set end-of-word marker in p to true
        else create a leaf containing K_L and put its address in p.ptrs[K_L[i]]
    else
        p = p.ptrs[K[i++]]
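One way to make the insertion procedure concrete is the Python sketch below. It follows the same idea (non-leaf nodes with a pointer per letter and an end-of-word marker, leaves holding whole words) but is an interpretation rather than the textbook's code: the class `NonLeaf`, the use of a dict for `ptrs`, and plain strings as leaves are all assumptions made for this example.

```python
class NonLeaf:
    def __init__(self):
        self.ptrs = {}               # letter -> NonLeaf, or str (a leaf holding a word)
        self.end_of_word = False     # plays the role of the '#' slot


def trie_insert(root, word):
    p, i = root, 0
    while True:
        if i == len(word):           # ran out of letters at a non-leaf
            p.end_of_word = True
            return
        child = p.ptrs.get(word[i])
        if child is None:            # free slot: store the whole word as a leaf
            p.ptrs[word[i]] = word
            return
        if isinstance(child, str):   # slot holds a leaf: split it
            other = child
            if other == word:
                return               # word already present
            while True:
                node = NonLeaf()     # one new non-leaf per shared letter
                p.ptrs[word[i]] = node
                p = node
                i += 1
                if not (i < len(word) and i < len(other) and word[i] == other[i]):
                    break
            for w in (word, other):  # hang both words off the last non-leaf
                if i == len(w):
                    p.end_of_word = True
                else:
                    p.ptrs[w[i]] = w
            return
        p, i = child, i + 1          # slot holds a non-leaf: descend


def trie_contains(root, word):
    p = root
    for ch in word:
        child = p.ptrs.get(ch)
        if child is None:
            return False
        if isinstance(child, str):
            return child == word
        p = child
    return p.end_of_word


root = NonLeaf()
for w in ["ARA", "AREA", "A"]:
    trie_insert(root, w)
```

Inserting ARA, AREA, and A in that order reproduces the sequence shown on the following slides: ARA starts as a leaf under A, AREA forces two new non-leaf nodes for the shared prefix AR, and A only sets an end-of-word marker.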
Empty Radix Trie: Insert "ARA"
[Figure: root node with slots # A E I P R; the leaf ARA is attached to the root]
Insert "AREA"
[Figure: non-leaf nodes with slots # A E I P R are created along the prefix shared by K_L = ARA and K = AREA; P marks the current node; the leaves ARA and AREA hang off the last non-leaf]
Insert "A"
[Figure: A is recorded with the end-of-word marker (#) in a non-leaf node; the leaves ARA and AREA remain]
[Figure: complete trie over the slots # A E I P R containing the words A, ARA, ARE, AREA, EERIE, EIRE, ERA, ERE, ERIE, IPA, IRE, PEAR, PEER, PER, PIER]
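Radix Trie
[Figure: radix trie storing ADAM, LOGGED, LOGGERHEAD, LOGGIA, LOGGING; the root branches on A and L, and the L words pass through a chain of nodes for O, G, G before branching on E/I and then on D/R and A/N]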
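Patricia Tree
[Figure: Patricia tree for the same keys (ADAM, LOGGED, LOGGERHEAD, LOGGIA, LOGGING); each internal node stores the index of the next character to test (0, 4, 5), so the chain of nodes for the shared prefix is skipped]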