Hashing Algorithm 9042635 羅正鴻 9142610 林彥廷 9142621 戴嘉宏
Introduction • Hashing is a ubiquitous information retrieval strategy that provides efficient access to information based on a key • Information can usually be accessed in constant time • Hashing’s drawbacks
Concept of hashing • The problem at hand is to define and implement a mapping from a domain of keys to a domain of locations • From the performance standpoint, the goal is to avoid collisions (A collision occurs when two or more keys map to the same location) • From the compactness standpoint, no application ever stores all keys in a domain simultaneously unless the size of the domain is small
Concept of hashing (cont’d) • The information to be retrieved is stored in a hash table, which is best thought of as an array of m locations called buckets • The mapping between a key and a bucket is called the hash function • The time to store and retrieve data is proportional to the time to compute the hash function
Hashing function • The ideal function, termed a perfect hash function, would distribute all elements across the buckets such that no collisions ever occur • h(v) = f(v) mod m, where f(v) maps the key v to an integer and m is the number of buckets • Knuth (1973) suggests choosing a prime number as the value of m
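To illustrate why a prime m is suggested, here is a minimal sketch that counts how many buckets are actually reached by h(v) = v mod m. It assumes f is the identity and uses a hypothetical key set of multiples of 4; the name `buckets_used` is illustrative only.

```python
def buckets_used(keys, m):
    """Return the set of buckets hit by h(v) = v mod m (f is the identity here)."""
    return {v % m for v in keys}

# Hypothetical key set: 100 multiples of 4 (e.g. word-aligned addresses).
keys = range(0, 400, 4)

print(len(buckets_used(keys, 8)))  # 2: m shares a factor with the stride
print(len(buckets_used(keys, 7)))  # 7: a prime m spreads the keys over all buckets
```

With m = 8 only buckets 0 and 4 are ever used, so most of the table is wasted; with the prime m = 7 every bucket receives keys.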
Hashing function (cont’d) • It is usually better to treat v as a sequence of bytes and do one of the following for f(v): (1) Sum or multiply all the bytes; overflow can be ignored (2) Use the last (or middle) byte instead of the first (3) Use the square of a few of the middle bytes
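The three options for f(v) might be sketched as follows. This is an illustration under assumptions, not a reference implementation: the 32-bit mask used to "ignore overflow", the default of two middle bytes, and all function names are choices made here.

```python
def f_sum(v: bytes) -> int:
    """Option (1): sum all bytes; overflow is ignored by masking to 32 bits."""
    total = 0
    for b in v:
        total = (total + b) & 0xFFFFFFFF
    return total

def f_last(v: bytes) -> int:
    """Option (2): use the last byte instead of the first."""
    return v[-1] if v else 0

def f_mid_square(v: bytes, width: int = 2) -> int:
    """Option (3): square the integer formed by a few middle bytes."""
    start = max(0, (len(v) - width) // 2)
    mid = int.from_bytes(v[start:start + width], "big")
    return mid * mid

def h(v: bytes, m: int, f=f_sum) -> int:
    """h(v) = f(v) mod m."""
    return f(v) % m

print(h(b"abc", 13))          # 294 mod 13 = 8
print(h(b"abc", 13, f_last))  # 99 mod 13 = 8
```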
Implementing hashing • The following operations are usually provided by an implementation of hashing: (1) Initialization (2) Insertion (3) Retrieval (4) Deletion
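The four operations could look like this in a minimal sketch. It assumes collisions are resolved by chaining (one list per bucket) and uses Python's built-in `hash` as f; the class and method names are hypothetical.

```python
class ChainedHashTable:
    def __init__(self, m=11):
        """Initialization: create m empty buckets (chains)."""
        self.m = m
        self.buckets = [[] for _ in range(m)]

    def _index(self, key):
        # h(v) = f(v) mod m, with the built-in hash() playing the role of f.
        return hash(key) % self.m

    def insert(self, key, value):
        """Insertion: overwrite an existing key or append to the chain."""
        chain = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def retrieve(self, key):
        """Retrieval: scan the chain of the key's bucket."""
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

    def delete(self, key):
        """Deletion: remove the key from its chain."""
        chain = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return
        raise KeyError(key)

table = ChainedHashTable(m=7)
table.insert(42, "answer")
print(table.retrieve(42))  # answer
table.delete(42)
```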
Chained hashing (cont’d) • In the worst case (where all n keys map to a single location), the average time to locate an element will be proportional to n/2 • In the best case (where all chains are of equal length), the time will be proportional to n/m
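The two cases can be checked with a small calculation. This sketch assumes a successful search in which every stored key is equally likely and the i-th element of a chain costs i comparisons; `average_probes` is a name chosen here.

```python
def average_probes(chain_lengths):
    """Average comparisons for a successful search over all stored keys."""
    n = sum(chain_lengths)
    # A chain of length L costs 1 + 2 + ... + L comparisons in total.
    total = sum(length * (length + 1) // 2 for length in chain_lengths)
    return total / n

n, m = 1000, 10
worst = average_probes([n] + [0] * (m - 1))  # all keys in one chain
best = average_probes([n // m] * m)          # chains of equal length

print(worst)  # 500.5, about n/2
print(best)   # 50.5, about n/(2m), i.e. proportional to n/m
```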
Minimal perfect hash functions • A minimal perfect hash function (MPHF) is a perfect hash function that hashes m keys to m buckets with no collisions • Cichelli (1980) and Cercone et al. (1983) proposed two important concepts: (1) using tables of values as the parameters (2) using a mapping, ordering, and searching (MOS) approach
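The definition can be stated as a small check. This is a sketch with hypothetical helper names: a function is perfect on a key set if it is collision-free within m buckets, and minimal if m equals the number of keys.

```python
def is_perfect(h, keys, m):
    """True if h maps every key to a distinct bucket in 0..m-1."""
    values = [h(k) for k in keys]
    return all(0 <= v < m for v in values) and len(set(values)) == len(values)

def is_minimal_perfect(h, keys):
    """True if h is perfect with exactly as many buckets as keys."""
    keys = list(keys)
    return is_perfect(h, keys, len(keys))

keys = ["a", "b", "c"]
dense = lambda k: ord(k) - ord("a")          # values 0, 1, 2
sparse = lambda k: 2 * (ord(k) - ord("a"))   # values 0, 2, 4

print(is_minimal_perfect(dense, keys))   # True
print(is_perfect(sparse, keys, 6))       # True: no collisions in 6 buckets
print(is_minimal_perfect(sparse, keys))  # False: needs more than 3 buckets
```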
Minimal perfect hash functions (cont’d) • Mapping: transform the key set from an original to a new universe • Ordering: place the keys in a sequence that determines the order in which hash values are assigned to keys • Searching: assign hash values to the keys of each level • Mapping → Ordering → Searching
Sager’s method and improvement • Sager (1984, 1985) formalizes and extends Cichelli’s approach • In the mapping step, three auxiliary (hash) functions are defined on the original universe of keys U: h0: U → {0, …, m − 1}; h1: U → {0, …, r − 1}; h2: U → {r, …, 2r − 1}
Sager’s method and improvement (cont’d) • The class of functions searched is h(k) = (h0(k) + g(h1(k)) + g(h2(k))) mod m • Sager uses a graph that represents the constraints among keys • The mapping step goes from keys to triples (h0(k), h1(k), h2(k)) to a special bipartite graph, the dependency graph, whose vertices are the h1(k) and h2(k) values and whose edges represent the keys (words)
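The mapping from keys to triples to the dependency graph, and the evaluation of h(k), can be sketched as below. The concrete h0, h1, and h2 are hypothetical byte-based stand-ins chosen only for illustration (the slides do not specify them), keys are assumed to be non-empty strings, and g is a table with 2r entries.

```python
def triples(keys, m, r):
    """Mapping step: each key becomes a triple (h0, h1, h2)."""
    out = {}
    for k in keys:
        b = k.encode()
        h0 = sum(b) % m      # hypothetical h0: U -> {0, ..., m-1}
        h1 = b[0] % r        # hypothetical h1: U -> {0, ..., r-1}
        h2 = r + b[-1] % r   # hypothetical h2: U -> {r, ..., 2r-1}
        out[k] = (h0, h1, h2)
    return out

def dependency_graph(trip):
    """Bipartite graph: vertices are h1/h2 values, one edge per key."""
    return [(h1, h2, k) for k, (_, h1, h2) in trip.items()]

def h(key, trip, g, m):
    """h(k) = (h0(k) + g(h1(k)) + g(h2(k))) mod m."""
    h0, h1, h2 = trip[key]
    return (h0 + g[h1] + g[h2]) % m

m, r = 2, 3
trip = triples(["ab", "cd"], m, r)
print(dependency_graph(trip))  # [(1, 5, 'ab'), (0, 4, 'cd')]

g = [1, 0, 0, 0, 0, 0]         # one entry per vertex 0..2r-1
print(h("ab", trip, g, m), h("cd", trip, g, m))  # 1 0
```

Because h1 values lie in {0, …, r − 1} and h2 values in {r, …, 2r − 1}, every edge joins the two sides of the graph, which is what makes it bipartite.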
The algorithm • The mapping step
The algorithm (cont’d) • The ordering step
The algorithm (cont’d) • The searching step
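Since the slides list the three steps without detail, the following is only a toy sketch of a mapping, ordering, and searching pipeline, not Sager's actual algorithm. It assumes hypothetical byte-based h0/h1/h2, orders vertices by decreasing degree, and finds the g values by plain backtracking; all names are illustrative.

```python
from collections import defaultdict

def mapping(keys, m, r):
    """Mapping: key -> (h0, h1, h2), using hypothetical byte-based functions."""
    trip = {}
    for k in keys:
        b = k.encode()
        trip[k] = (sum(b) % m, b[0] % r, r + b[-1] % r)
    return trip

def ordering(trip):
    """Ordering: sort vertices by degree (highest first) and place each key
    in the level of whichever of its two vertices comes later."""
    degree = defaultdict(int)
    for _, h1, h2 in trip.values():
        degree[h1] += 1
        degree[h2] += 1
    order = sorted(degree, key=lambda v: (-degree[v], v))
    pos = {v: i for i, v in enumerate(order)}
    levels = [[] for _ in order]
    for k, (_, h1, h2) in trip.items():
        levels[max(pos[h1], pos[h2])].append(k)
    return order, levels

def searching(trip, order, levels, m):
    """Searching: assign g(v) level by level, backtracking on collisions."""
    g, used = {}, set()

    def place(i):
        if i == len(order):
            return True
        v = order[i]
        for value in range(m):
            g[v] = value
            # Both vertices of every key in this level now have a g value.
            slots = [(trip[k][0] + g[trip[k][1]] + g[trip[k][2]]) % m
                     for k in levels[i]]
            if len(set(slots)) == len(slots) and not used & set(slots):
                used.update(slots)
                if place(i + 1):
                    return True
                used.difference_update(slots)
        del g[v]
        return False

    return g if place(0) else None

keys = ["ab", "cd", "ef", "ad"]
m, r = len(keys), 3
trip = mapping(keys, m, r)
order, levels = ordering(trip)
g = searching(trip, order, levels, m)
print({k: (trip[k][0] + g[trip[k][1]] + g[trip[k][2]]) % m for k in keys})
```

If two keys map to the same triple, no choice of g can separate them and `searching` returns `None`; in that situation the mapping step would have to be repeated with different auxiliary functions.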
Discussion • Hashing is a constant-time algorithm, and there are always advantages to being able to predict the time needed to locate a key • The MPHF uses a large amount of space