80 likes | 169 Views
n elements. CHAPTER 7 HASHING. Search by Formula. There’s a theorem saying that “Any algorithm that sorts by comparisons only must have a worst case computing time of ( n log 2 n ).”. But we already have binary search in O( ln n ) time after sorting.
E N D
n elements CHAPTER 7 HASHING Search by Formula There’s a theorem saying that “Any algorithm that sorts by comparisons only must have a worst case computing time of ( n log2n ).” But we already have binary search in O( ln n ) time after sorting. And we already have algorithms for sorting in optimal time O( n ln n )... Interpolation Search : Find key from a sorted list f [ l ].key, f [ l+1 ].key, ... , f [ u ].key. What is hashing for? Wait a minute — who said that O( n ln n ) is the optimal time? Then we just do something else besides comparison. f[u].key key For searching f[l].key If ( f [ i ].key < key ) l = i ; l i ? u Else u = i ; 1/8
M[0] = after a date, event, etc. M[1] = seeing that (expressing reason) …… …… §1 General Idea Symbol Table ( == Dictionary) ::= { < name, attribute > } 〖Example〗 In Oxford English dictionary name = since attribute = a list of meanings This is the worst disaster in California since I was elected. California Governor Pat Brown, discussing a local flood 〖Example〗 In a symbol table for a compiler name = identifier (e.g. int) attribute = a list of lines that use the identifier, and some other fields 2/8
§1 General Idea Symbol Table ADT: Objects: A set of name-attribute pairs, where the names are unique Operations: SymTabCreate(TableSize) BooleanIsIn(symtab, name) Attribute Find(symtab, name) SymTab Insert(symtab, name, attr) SymTab Delete(symtab, name) 3/8
§1 General Idea … … [0] [1] [s1] … … … … … … … … ht [ 1 ] ht [ 0 ] ht [b1] b buckets … … s slots Hash Tables For each identifier x, we define a hash function f ( x ) = position of x in ht[ ] (i.e. the index of the bucket that contains x ) T ::= total number of distinct possible values for x n ::= total number of identifiers in ht[ ] identifier density ::= n / T loading density ::= n / (s b) 4/8
§1 General Idea Slot 0 Slot 1 0 1 2 3 4 5 6 …… 25 A collision occurs when we hash two nonidentical identifiers into the same bucket, i.e. f ( i1 ) = f ( i2 ) when i1 i2 . An overflow occurs when we hash a new identifier into a full bucket. Collision and overflow happen simultaneously if s = 1. 〖Example〗 Mapping n = 10 C library functions into a hash table ht[ ] with b = 26 buckets and s = 2. Loading density = ? 10 / 52 = 0.19 acos atan To map the letters a ~ z to 0 ~ 25, we may define f ( x ) = ? x [ 0 ] ‘a’ char ceil define acos define float exp char exp float floor atan ceil floor clock ctime Without overflow, Tsearch = Tinsert = Tdelete = O( 1 ) 5/8
§2 Hash Function Properties of f : f ( x ) must be easy to compute and minimizes the number of collisions. f ( x ) should be unbiased. That is, for any x and any i, we have that Probability( f ( x ) = i ) = 1 / b. Such kind of a hash function is called a uniform hash function. f ( x ) = x % TableSize ;/* if x is an integer */ What if TableSize = 10 and x’s are all end in zero? TableSize = prime number ---- good for random integer keys 6/8
§2 Hash Function f ( x ) = ( x[i]) % TableSize ;/* if x is a string */ 〖Example〗 TableSize = 10,007 and string length of x 8. If x[i] [0, 127], then f ( x ) [0, ? ] 1016 f ( x ) = (x[0]+ x[1]*27 + x[2]*272) % TableSize ; Total number of combinations = 263 = 17,576 Actual number of combinations < 3000 7/8
§2 Hash Function ………… x[0] x[1] x[N-2] x[N-1] f ( x ) = ( x[N i 1] * 32i ) % TableSize ; Can you make it faster? Faster than *27 Index Hash3( constchar *x, int TableSize ) { unsigned int HashVal = 0; /* 1*/while( *x != '\0' ) /* 2*/ HashVal = ( HashVal << 5 ) + *x++; /* 3*/return HashVal % TableSize; } Carefully select some characters from x. If x is too long (e.g. street address), the early characters will be left-shifted out of place. 8/8