CSC 213 – Large Scale Programming Lecture 11: Why I Like Hash
Today’s Goal • Consider what will be important when searching • Why search in the first place? What is its purpose? • What should we expect & handle when searching? • What factors matter to our users (and ourselves)? • (Besides a source of bad jokes) What is hashing? • Why is it important for searching? How can it help? • What are the critical factors of a good hash function? • A commonly-used hash function example examined
Keys To Map & Dictionary • Used to convert the key into its value • Values cannot share a key and be in the same Map • In searching, failure is normal, not exceptional
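To make the last two points concrete, here is a small sketch using java.util.HashMap (a real JDK class; the phone-book data is made up for illustration). A second put() with an existing key replaces the old value, and get() on a missing key simply returns null rather than throwing.

```java
import java.util.HashMap;
import java.util.Map;

// A Map stores at most one value per key, and a failed search is a normal result.
class MapSearchDemo {
    static Map<String, String> buildDirectory() {
        Map<String, String> phoneBook = new HashMap<>();
        phoneBook.put("Alice", "555-0100");
        // Same key again: the old value is replaced, not stored alongside it
        phoneBook.put("Alice", "555-0199");
        phoneBook.put("Bob", "555-0123");
        return phoneBook;
    }

    public static void main(String[] args) {
        Map<String, String> phoneBook = buildDirectory();
        System.out.println(phoneBook.size());        // 2
        System.out.println(phoneBook.get("Alice"));  // 555-0199
        // A missing key is not an error: get() just returns null
        System.out.println(phoneBook.get("Carol"));  // null
    }
}
```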
Entry ADT • Needs 2 pieces: what we have & what we want • First part is the key: the data used in the search • Item we want is the value: the second part of an Entry • Implementations must define 2 methods • key() & value() return the appropriate item • Usually includes setValue() but NOT setKey()
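A minimal sketch of the Entry ADT in Java, assuming the method names on the slide (key(), value(), setValue()); a textbook library may spell them differently, e.g. getKey()/getValue(). Having setValue() return the previous value mirrors java.util.Map.Entry but is an assumption here, and SimpleEntry is a hypothetical implementation.

```java
// Entry ADT as described on the slide (method names assumed from the slide).
interface Entry<K, V> {
    K key();                  // the data used in the search
    V value();                // the item we actually want
    V setValue(V newValue);   // replace the value, returning the old one (assumed)
    // Deliberately no setKey(): changing a stored key would invalidate
    // wherever the Map placed the Entry based on that key.
}

// Hypothetical simple implementation of the interface.
class SimpleEntry<K, V> implements Entry<K, V> {
    private final K key;  // final: the key never changes after construction
    private V value;

    SimpleEntry(K key, V value) {
        this.key = key;
        this.value = value;
    }

    public K key() {
        return key;
    }

    public V value() {
        return value;
    }

    public V setValue(V newValue) {
        V old = value;
        value = newValue;
        return old;
    }
}
```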
Sequence-Based Map • Sequence’s perspective of the Map that it holds • [Figure: Positions holding elements]
Sequence-Based Map • Outside view of the Map and how it is stored • [Figure: Positions holding Entrys]
Sequence-Based Map • Map implementation’s view of the data and storage • [Figure: Positions holding elements/Entrys]
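One way to build the sequence-based Map pictured above, sketched with an ArrayList of key/value pairs. SequenceMap and its helpers are hypothetical names. The key point is find(): every operation may scan the whole sequence, so get, put, and remove are all O(n).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sequence-based Map: entries kept in a list, found by linear search.
class SequenceMap<K, V> {
    private static final class Pair<K, V> {
        final K key;
        V value;

        Pair(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private final List<Pair<K, V>> entries = new ArrayList<>();

    // O(n): may examine every entry before giving up
    private int find(K key) {
        for (int i = 0; i < entries.size(); i++) {
            if (entries.get(i).key.equals(key)) {
                return i;
            }
        }
        return -1;  // not found is a normal outcome
    }

    V get(K key) {
        int i = find(key);
        return (i < 0) ? null : entries.get(i).value;
    }

    V put(K key, V value) {
        int i = find(key);
        if (i >= 0) {  // key already present: replace its value
            V old = entries.get(i).value;
            entries.get(i).value = value;
            return old;
        }
        entries.add(new Pair<>(key, value));
        return null;
    }

    V remove(K key) {
        int i = find(key);
        if (i < 0) {
            return null;
        }
        return entries.remove(i).value;  // shifting later entries is also O(n)
    }

    int size() {
        return entries.size();
    }
}
```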
Please hold while the machine searches 1,000,000 records for your location
Map Performance • In all seriousness, can be a matter of life-or-death • 911 operators need addresses immediately • Google’s search performance measured in TB/s • O(log n) time is too slow for these uses • Would love to use arrays • Get O(1) time to add, remove, or look up data • But this HUGE array needs a massive RAM purchase
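The "just use an array" wish can be sketched as a direct-address table, where an int key is itself the index. This is a hypothetical class for illustration: every operation is O(1), but the array needs one slot per possible key, which is exactly the RAM problem that follows.

```java
// Hypothetical direct-address table: an int key in [0, capacity) is the index.
class DirectAddressTable<V> {
    private final Object[] slots;  // Object[] because Java cannot create a V[]

    DirectAddressTable(int capacity) {
        slots = new Object[capacity];  // one slot per POSSIBLE key
    }

    @SuppressWarnings("unchecked")
    V get(int key) {
        return (V) slots[key];  // O(1) lookup
    }

    void put(int key, V value) {
        slots[key] = value;  // O(1) add
    }

    void remove(int key) {
        slots[key] = null;  // O(1) remove
    }
}
```

With 10-digit phone numbers as keys, the capacity would have to be 10,000,000,000 slots.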
Monster Amounts of RAM • Java requires an int as the array index • Limited by the int range and the RAM available in a machine • Integer.MAX_VALUE = 2,147,483,647 • 8,200,000,000 pages in Google’s index (2005) • In the US, possible phone numbers = 10,000,000,000 • Must do more to get O(1) array access time • As with all of life’s problems, we turn to hash
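A quick arithmetic check of those limits in Java (IndexLimits is a hypothetical helper). Because an array length is itself an int, every usable index is below Integer.MAX_VALUE, and casting a larger long to int silently wraps around instead of failing.

```java
// Hypothetical helper checking whether a count could serve as an array index.
class IndexLimits {
    static final long US_PHONE_NUMBERS = 10_000_000_000L;
    static final long GOOGLE_PAGES_2005 = 8_200_000_000L;

    // An array length is an int, so the largest conceivable index is MAX_VALUE - 1
    static boolean fitsAsIndex(long n) {
        return n >= 0 && n < Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE);                  // 2147483647
        System.out.println(fitsAsIndex(GOOGLE_PAGES_2005));     // false
        System.out.println(fitsAsIndex(US_PHONE_NUMBERS - 1));  // false
        // Casting does not fail; it silently wraps around
        System.out.println((int) (US_PHONE_NUMBERS - 1));       // 1410065407
    }
}
```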
Hashing To The Rescue • Hash function turns the key into an int from 0 to N-1 • Result is usable as an index into an array • Specific to the key’s type; cannot be reused for other types • Store the Entrys in the array (“hash table”) • (Great name for a shop in Amsterdam, too) • Begin by computing the key’s hash value • Result is the array index for that Entry • Now it is possible to use an array for O(1) time!
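A sketch of that two-step computation: Java puts the type-specific part in each class's hashCode(), and Math.floorMod then maps the int into 0 .. N-1. indexFor is a hypothetical helper, and using floorMod rather than plain % is a choice made here so that negative hash codes still land in range.

```java
// Hypothetical helper: key -> int (via hashCode) -> index in 0 .. tableLength-1.
class HashIndex {
    static int indexFor(Object key, int tableLength) {
        int code = key.hashCode();                // type-specific: key -> int (may be negative)
        return Math.floorMod(code, tableLength);  // int -> 0 .. tableLength-1
    }

    public static void main(String[] args) {
        int n = 13;
        for (String k : new String[] {"spot", "pots", "stop"}) {
            System.out.println(k + " -> " + indexFor(k, n));
        }
    }
}
```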
Hash Table Example • Example shows a table of Entry<Long,String> • Simple hash function is h(x) = x mod 10,000 • x is (or is derived from) the Entry’s key • h(x) computes the index to use • Always mod by the array length • Not all locations used • Holes will appear in the array • Empties: set to null -or- use a sentinel value
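The slide's example table, sketched with a minimal nested Entry class standing in for Entry<Long,String> (the class names and phone-number data are hypothetical). Here h(x) = x mod 10,000 keeps only the last four digits, and most of the 10,000 slots stay null. For brevity, buildTable ignores collisions: a later key with the same index overwrites the earlier one.

```java
// Hypothetical version of the slide's table of length 10,000 with h(x) = x mod 10,000.
class ModTableExample {
    static final int LENGTH = 10_000;

    // Stands in for Entry<Long,String>
    static final class Entry {
        final Long key;
        String value;

        Entry(Long key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // h(x) = x mod table length (floorMod keeps negative keys in range too)
    static int h(long x) {
        return (int) Math.floorMod(x, (long) LENGTH);
    }

    // Ignores collisions: a later key with the same index overwrites the earlier one
    static Entry[] buildTable(long[] keys, String[] values) {
        Entry[] table = new Entry[LENGTH];  // every slot starts as null
        for (int i = 0; i < keys.length; i++) {
            table[h(keys[i])] = new Entry(keys[i], values[i]);
        }
        return table;
    }

    static int countUsed(Entry[] table) {
        int used = 0;
        for (Entry e : table) {
            if (e != null) {
                used++;
            }
        }
        return used;
    }

    public static void main(String[] args) {
        Entry[] table = buildTable(
                new long[] {5_551_234_567L, 5_559_870_042L},
                new String[] {"Jane", "Raj"});
        System.out.println(h(5_551_234_567L));  // 4567
        System.out.println(countUsed(table));   // 2: the other 9,998 slots are holes
    }
}
```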
When We Use Hash • Hash the key to find its index • First step for most calls • get() needs the index to check • put() adds at that index • remove() needs the index to set to null • Then check the key at that index • Many keys possible at one index • Still a Map, so the required results are known • If the keys found are not the same, cannot treat them as the same!
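The hypothetical SimpleHashMap below sketches that pattern: each of get(), put(), and remove() hashes first, then compares the key stored at that index. It deliberately does not handle collisions; a different key at the index counts as "not found," and put() simply throws as a placeholder for real collision handling.

```java
// Hypothetical hash table with NO collision handling, to show the hash-then-check pattern.
class SimpleHashMap<K, V> {
    private final Object[] keys;
    private final Object[] values;

    SimpleHashMap(int length) {
        keys = new Object[length];
        values = new Object[length];
    }

    // Step 1 of every operation: hash the key to find its index
    private int index(K key) {
        return Math.floorMod(key.hashCode(), keys.length);
    }

    @SuppressWarnings("unchecked")
    V get(K key) {
        int i = index(key);
        // Step 2: only the SAME key counts; a different key here is not a match
        return key.equals(keys[i]) ? (V) values[i] : null;
    }

    @SuppressWarnings("unchecked")
    V put(K key, V value) {
        int i = index(key);
        if (keys[i] != null && !key.equals(keys[i])) {
            // Placeholder: a real table must resolve this collision
            throw new IllegalStateException("collision at index " + i);
        }
        V old = (V) values[i];
        keys[i] = key;
        values[i] = value;
        return old;
    }

    @SuppressWarnings("unchecked")
    V remove(K key) {
        int i = index(key);
        if (!key.equals(keys[i])) {
            return null;  // empty slot or a different key: nothing removed
        }
        V old = (V) values[i];
        keys[i] = null;  // set the slot back to null
        values[i] = null;
        return old;
    }
}
```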
Properties of Good Hash • To really be useful, a hash must have these properties: • Reliable • Fast • Use entire table • Make good brownies
Reliability of Hash Function • Implement Map with a hash table • To use an Entry, get its key to easily look up its index • Must always compute the same index for that key
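In Java, reliability comes from the equals()/hashCode() contract: equal objects must return the same hash code. The hypothetical PhoneNumber key below builds hashCode() from exactly the fields equals() compares, using java.util.Objects.hash.

```java
import java.util.Objects;

// Hypothetical key class whose hash depends only on the fields equals() compares.
final class PhoneNumber {
    private final int areaCode;
    private final int exchange;
    private final int line;

    PhoneNumber(int areaCode, int exchange, int line) {
        this.areaCode = areaCode;
        this.exchange = exchange;
        this.line = line;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PhoneNumber)) {
            return false;
        }
        PhoneNumber p = (PhoneNumber) o;
        return areaCode == p.areaCode && exchange == p.exchange && line == p.line;
    }

    @Override
    public int hashCode() {
        // Same fields as equals(), so equal keys always hash equally
        return Objects.hash(areaCode, exchange, line);
    }
}
```

Two separately constructed new PhoneNumber(555, 123, 4567) objects are equal and hash to the same index. Without the hashCode() override, Object's identity-based hash would generally differ for the two objects, so a lookup with the second could miss the entry stored with the first.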
Speed of Hash Function • Hash must be computed on each access • Goal: O(1) efficiency by using an array • Efficiency of the array is wasted if the hash is slow • If the hash function performs an O(1) computation • It is possible to perform get in O(1) time • O(1) time for put & remove could also occur • None of this is guaranteed; many problems can occur
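Caching is one common way to keep the per-access cost low when a key's hash is expensive to compute (java.lang.String, for one, caches its hash code after the first call). CachedKey is a hypothetical immutable key: its first hashCode() call is O(length), and every later call is O(1). The loop uses the same 31-based formula documented for String.hashCode().

```java
// Hypothetical immutable key that computes its hash once and caches it.
final class CachedKey {
    private final String text;
    private int hash;           // cached result
    private boolean hashKnown;  // distinguishes a real 0 from "not computed yet"
    int computations;           // counts real computations, for illustration only

    CachedKey(String text) {
        this.text = text;
    }

    @Override
    public int hashCode() {
        if (!hashKnown) {
            int h = 0;
            for (int i = 0; i < text.length(); i++) {
                h = 31 * h + text.charAt(i);  // O(length) work, done only once
            }
            hash = h;
            hashKnown = true;
            computations++;
        }
        return hash;  // O(1) on every later call
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CachedKey && text.equals(((CachedKey) o).text);
    }
}
```

Caching is only safe because the key is immutable; if text could change, the cached value would go stale.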
Using Entire Table Is Important • Hashing takes lots of space because an array is used • When creating, make the array big enough to hold all the data • Can copy to a larger array, but this is not an O(1) operation • Use prime-number lengths, but these quickly get large • Spreads Entrys out equally across the entire table • The further apart they are spread, the easier it is to find an opening
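A sketch of both ideas, with hypothetical helper names: nextPrime() picks a prime table length, and grow() shows why copying to a larger array is not O(1): every old slot is visited and every key re-hashed, because its index depends on the table length. Doubling before rounding up to a prime is an assumed growth policy, and grow() ignores collisions for brevity.

```java
// Hypothetical helpers for prime table lengths and for growing a table.
class TableSizing {
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    // Smallest prime >= n
    static int nextPrime(int n) {
        int candidate = Math.max(n, 2);
        while (!isPrime(candidate)) {
            candidate++;
        }
        return candidate;
    }

    // Copy every stored key into a larger prime-length table (keys only; ignores collisions)
    static Integer[] grow(Integer[] old) {
        Integer[] bigger = new Integer[nextPrime(2 * old.length)];
        for (Integer key : old) {  // visits every old slot: O(n)
            if (key != null) {
                // Must re-hash: the index depends on the table length
                bigger[Math.floorMod(key.hashCode(), bigger.length)] = key;
            }
        }
        return bigger;
    }
}
```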
Hash Function Analogy • [Figure: analogy with parts labeled “Hash function” and “Hash table”]
Examples of Bad Hash • h(x) = 0 • Reliable, fast, little use of table • h(x) = random.nextInt() • Unreliable, fast, uses entire table • h(x) = current index -or- free index • Reliable, slow, uses entire table • h(x) = x^34 + 2x^33 + 24x^32 + 10x^31 + … • Reliable, moderate, too large
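A sketch contrasting the first two bad hashes on a 10-slot table (BadHashes is hypothetical). The random version uses nextInt(LENGTH) rather than the unbounded nextInt() so that its result is at least a valid index; it still moves the same key between calls, which is what makes it unreliable.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical demo of h(x) = 0 versus a random "hash".
class BadHashes {
    static final int LENGTH = 10;
    static final Random RANDOM = new Random();

    // h(x) = 0: reliable and fast, but every key lands in slot 0
    static int constantHash(int key) {
        return 0;
    }

    // Random "hash": fast and spreads over the table, but not reliable
    static int randomHash(int key) {
        return RANDOM.nextInt(LENGTH);
    }

    // Number of distinct slots used when keys 0..99 go through constantHash
    static int slotsUsedByConstant() {
        Set<Integer> slots = new HashSet<>();
        for (int key = 0; key < 100; key++) {
            slots.add(constantHash(key));
        }
        return slots.size();
    }

    // Number of distinct indices produced by hashing the SAME key 100 times
    static int indicesForOneKeyByRandom() {
        Set<Integer> slots = new HashSet<>();
        for (int trial = 0; trial < 100; trial++) {
            slots.add(randomHash(42));
        }
        return slots.size();
    }

    public static void main(String[] args) {
        System.out.println(slotsUsedByConstant());       // 1
        System.out.println(indicesForOneKeyByRandom());  // almost always more than 1
    }
}
```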
Incredibly Bad Hash • Using only part of the key & not the whole thing • No matter what, inevitably, you will guess wrong • [Figure: key with the “Part used for hash” differing from the “Part that matters”]
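A sketch of the problem, assuming String keys that are URLs (a hypothetical choice): prefixHash looks only at the first 7 characters, which are "http://" for every key here, so all three keys land on the same index, while the whole Strings have different hash codes.

```java
// Hypothetical hashes: one uses only a prefix of the key, one uses the whole key.
class PartialKeyHash {
    static final int PREFIX = 7;

    // Looks only at the first PREFIX characters: the "part used for hash"
    static int prefixHash(String key, int tableLength) {
        String part = key.substring(0, Math.min(PREFIX, key.length()));
        return Math.floorMod(part.hashCode(), tableLength);
    }

    // Uses every character of the key
    static int wholeHash(String key, int tableLength) {
        return Math.floorMod(key.hashCode(), tableLength);
    }

    public static void main(String[] args) {
        String[] urls = {"http://a.example", "http://b.example", "http://c.example"};
        for (String u : urls) {
            // The prefix hash is identical for all three, since they share "http://"
            System.out.println(prefixHash(u, 101) + " vs " + wholeHash(u, 101));
        }
    }
}
```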
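Censored Good Hash • Hash must first turn the key into an int • Easy for numbers, but rarely that simple in real life • For a String, could add the value of each character • But then “spot”, “pots”, and “stop” would all hash to the same index • Instead we usually use a polynomial code: h = x_0 + x_1*a + x_2*a^2 + … + x_(k-1)*a^(k-1), where x_i is the i-th character of a k-character String and a is a constant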
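A sketch of both String-to-int ideas (StringHashes is hypothetical). sumHash adds character codes, so "spot", "pots", and "stop" collide. polyHash weights character i by a^i, with a = 33 as an assumed constant, so character order changes the result. It recomputes each power from scratch on purpose, which makes the cost of the straightforward approach visible.

```java
// Hypothetical String hashes: character sum versus polynomial code.
class StringHashes {
    // Adds character codes: anagrams always collide
    static long sumHash(String s) {
        long sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    // x_0 + x_1*a + x_2*a^2 + ..., recomputing each power from scratch
    static long polyHash(String s, long a) {
        long h = 0;
        for (int i = 0; i < s.length(); i++) {
            long power = 1;
            for (int j = 0; j < i; j++) {
                power *= a;  // i multiplications just to get a^i
            }
            h += s.charAt(i) * power;
        }
        return h;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"spot", "pots", "stop"}) {
            System.out.println(w + ": sum=" + sumHash(w) + " poly=" + polyHash(w, 33));
        }
    }
}
```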
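Good, Fast Hash • Polynomial codes are good, but very slow • Major bummer, since we use hash for its speed • Cause of slowdown: computing a^n takes n operations, so computing each term separately costs O(k^2) for a k-character key • Horner’s method is better: it piggybacks on earlier work, needing just one multiplication per character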
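A sketch of Horner's method for the polynomial code x_0 + x_1*a + … + x_(k-1)*a^(k-1) (class and method names are hypothetical). Working from the last character backward, each step multiplies the running total by a once and adds the next character, so a k-character key takes k multiplications instead of the roughly k^2/2 needed when each power is recomputed. long arithmetic silently wraps for long keys like "triskaidekaphobia", but both versions wrap identically, so they still agree.

```java
// Hypothetical comparison of Horner's method with naive polynomial evaluation.
class HornerHash {
    // One multiply and one add per character: O(k)
    static long horner(String s, long a) {
        long h = 0;
        for (int i = s.length() - 1; i >= 0; i--) {
            h = h * a + s.charAt(i);  // reuses all the work done so far
        }
        return h;
    }

    // Naive version for comparison: recomputes a^i for every term, O(k^2)
    static long naive(String s, long a) {
        long h = 0;
        for (int i = 0; i < s.length(); i++) {
            long power = 1;
            for (int j = 0; j < i; j++) {
                power *= a;
            }
            h += s.charAt(i) * power;
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(horner("spot", 33));  // 4293382
        System.out.println(horner("triskaidekaphobia", 33) == naive("triskaidekaphobia", 33));  // true
    }
}
```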
Compression • A hash’s only use is computing array indices • Useless if larger than the table’s length: no such index exists! • When a=33, “spot” hashed to 4,293,383 • Some hashes are too large to fit even in a long (like “triskaidekaphobia”) • To compress the result, work like an array-based queue: hash = ((result % length) + length) % length • % returns the modulus (the remainder from division); adding length fixes negative results from overflow • Serves the exact same purpose: keeps the index within limits
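A sketch of compression (Compress is hypothetical). A formula of the form (result + length) % length only repairs results greater than -length, because Java's % keeps the sign of the dividend; taking % first and then adding length handles any int result, as does Math.floorMod(result, length).

```java
// Hypothetical compression step: any int hash -> index in 0 .. length-1.
class Compress {
    // Assumes 0 < length <= 2^30, so the addition below cannot overflow
    static int compress(int result, int length) {
        return ((result % length) + length) % length;
    }

    public static void main(String[] args) {
        int result = -25;
        int length = 10;
        System.out.println((result + length) % length);     // -5: not a valid index
        System.out.println(compress(result, length));       // 5
        System.out.println(Math.floorMod(result, length));  // 5
    }
}
```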
Before Next Lecture… • Continue working on the week #4 assignment • Due at the usual time Tues., so you may want to get cracking • Start thinking of designs & CRC cards for the project • Due in 10 days, as projects are completed in stages • Read sections 9.2.1 & 9.2.5 – 9.2.7 of the book • Consider better ways of handling this situation: