Dynamic Hashing
• Good for a database that grows and shrinks in size
• Allows the hash function to be modified dynamically
• When the hash function takes modulo 10, as in the previous example, the number of buckets is fixed at 10
• The trick is to change the hash function so that the number of buckets can change
• And, at the same time, to avoid rehashing the existing records!
• Imagine changing modulo 10 to modulo 13: every existing record then has to be rehashed, which is not a good idea
Department of Computer Science and Engineering, HKUST
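To make the cost of changing the modulus concrete, here is a minimal sketch that counts how many keys land in a different bucket when modulo 10 becomes modulo 13. The key set (the integers 0 to 129) and the helper name `bucket_mod` are assumptions chosen for illustration only.

```python
def bucket_mod(key: int, n_buckets: int) -> int:
    """Static hashing: the bucket number is the key modulo the bucket count."""
    return key % n_buckets


# Hypothetical key set, used only to illustrate the effect.
keys = range(130)

# A key "moves" if its bucket under modulo 13 differs from its bucket under modulo 10.
moved = [k for k in keys if bucket_mod(k, 10) != bucket_mod(k, 13)]

print(f"{len(moved)} of {len(keys)} keys change bucket")
```

Only the keys 0 to 9 keep their bucket in this range; every other key has to be relocated, which is what dynamic hashing tries to avoid.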
Extendable Hashing
• Extendable hashing is one form of dynamic hashing
• The hash function generates values over a large range, typically b-bit integers with b = 32
• At any time, use only a prefix of the b-bit integer to index into a table of bucket addresses. Let the length of the prefix be i bits, 0 ≤ i ≤ 32
• Initially i = 1, meaning that the table can index at most 2 buckets
• When the 2 buckets are full, we can use 2 bits (i = 2), meaning that we can now index at most 4 buckets, and so on
• i grows and shrinks as the size of the database grows and shrinks
• The actual number of buckets is ≤ 2^i, and it may change due to bucket merging and splitting
[Figure: a b-bit hash value, bit positions b-1 down to 0, with the leading i bits marked]
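The prefix selection can be sketched as a right shift that keeps only the i high-order bits. The function name `prefix` is hypothetical; the sample value is the hash listed for Downtown in the example slides.

```python
B = 32  # width of the hash value in bits (b in the slides)


def prefix(hash_value: int, i: int) -> int:
    """Return the i high-order bits of a B-bit hash value as an integer."""
    if not 0 <= i <= B:
        raise ValueError("i must be between 0 and B")
    return hash_value >> (B - i)


# Hash value listed for Downtown in the example slides.
downtown = 0b1010_0011_1010_0000_1100_0110_1001_1111

for i in range(4):
    # With i = 0 no bit is used, so every key maps to table entry 0.
    print(i, format(prefix(downtown, i), "b"))
```

Each extra bit doubles the number of table entries that can be addressed, while the hash values themselves never change.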
Extendable Hash Structure: General Ideas
• Initially i = 1: use 1 bit of the hash key, resulting in two entries in the hash address table
• Suppose we start with only 1 or 2 records; we need only 1 bucket initially
• Both entries in the hash address table point to the same bucket
• i0 = 0 means that no bit has been used to separate records in the bucket (i.e., records are all hashed into bucket 0 irrespective of any bit setting in the hash key values)
[Figure label: New record]
Extendable Hash Structure: Bucket Expansion
• Suppose bucket 0 is full and a new record arrives
• Create a new bucket and rehash the three records (the two existing ones and the new record) into buckets 0 and 1, according to the first bit of their hash keys
[Figure label: New record]
General Extendable Hash Structure
• Note: why do we need to keep i, i0 and i1?
• i is the maximum number of bits used in hashing so far; i0 and i1 are the numbers of bits used for these particular buckets
[Figure labels: Record originally in bucket 0; New record]
General Extendable Hash Structure
1) Upon insertion of a new (red) record, bucket 0 is full again
2) Bucket 2 is created, and the three records (the two existing ones and the new one) are rehashed among buckets 0 and 2 based on the second bit
[Figure label: A new record]
General Extendable Hash Structure
[Figure annotations]
• Hash address table: use the first 2 bits of the hash key to address the 4 entries in the table
• Where 2 bits of the hash key have been used to hash the records: use the 3rd bit in the next split
• Where 1 bit of the hash key has been used to hash the records: use the 2nd bit in the next split
• Bucket 1 is not changed; Bucket 2 is new
Extendable Hash Structure: Properties
• Every expansion doubles the number of entries in the table
• Multiple entries in the bucket address table may point to the same bucket. This means that the bucket has not been split while other buckets have been split, possibly multiple times
• Each bucket j stores a value ij; all entries in the same bucket have the same values on the first ij bits of their hash keys
• To locate the bucket containing search-key Kj:
  1. Compute h(Kj) = X
  2. Use the first i high-order bits of X to look up the hash address table, and follow the pointer to the appropriate bucket
• To insert a record with search-key value Kj, look up the bucket where it should belong, say j. If there is room in bucket j, insert the record in the bucket; otherwise the bucket must be split and the insertion re-attempted
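The lookup procedure can be sketched as follows. The `Bucket` class, the `lookup` function, and the variable names are hypothetical; the directory reproduces the state that the example slides reach once Brighton, the two Downtown records, and Mianus have been inserted (i = 2).

```python
from dataclasses import dataclass, field

B = 32  # width of the hash value in bits (b in the slides)


@dataclass
class Bucket:
    local_depth: int                       # i_j: bits used for this bucket
    records: list = field(default_factory=list)


def lookup(directory, global_depth, hash_value):
    """Follow the table entry selected by the first global_depth bits."""
    index = hash_value >> (B - global_depth)
    return directory[index]


# Assumed state with i = 2: entries 00 and 01 share one bucket.
b0 = Bucket(1, ["Brighton"])
b10 = Bucket(2, ["Downtown", "Downtown"])
b11 = Bucket(2, ["Mianus"])
directory = [b0, b0, b10, b11]

h_brighton = 0b0010_1101_1111_1011_0010_1100_0011_0000
h_downtown = 0b1010_0011_1010_0000_1100_0110_1001_1111
h_mianus = 0b1100_0111_1110_1101_1011_1111_0011_1010

print(lookup(directory, 2, h_brighton).records)
print(lookup(directory, 2, h_mianus).records)
```

Sharing one bucket object between several table entries is what lets the table double in size without touching buckets that have not overflowed.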
Split in Extendable Hash Structure
To split a bucket j when inserting a record with search-key value Kj:
• If i > ij (more than one pointer to bucket j)
  • allocate a new bucket z, and set ij and iz to the old ij + 1
  • make the second half of the bucket address table entries that point to j point to z
  • remove and reinsert each record in bucket j
  • recompute the new bucket for Kj and insert the record in that bucket (further splitting is required if the bucket is still full)
[Figure annotation: could be any number > 1]
Split in Extendable Hash Structure
To split a bucket j when inserting a record with search-key value Kj:
• If i = ij (only one pointer to bucket j)
  • increment i and double the size of the bucket address table
  • replace each entry in the table by two entries that point to the same bucket
  • recompute the bucket address table entry for Kj; now i > ij, so use the first case above
[Figure labels: i0 = 2, i1 = 2, i2 = 2, i3 = 2; new record]
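The two split cases can be combined into one insertion routine. The following is a sketch under stated assumptions, not a definitive implementation: the class and method names are hypothetical, `MAX_DEPTH` is an arbitrary limit on i chosen for the sketch, and an overflow bucket is imitated by letting the record list grow past `CAPACITY`. The records and hash values are those of the example slides.

```python
B = 32          # hash width in bits (b in the slides)
CAPACITY = 2    # records per bucket, as in the slides' example
MAX_DEPTH = 16  # hypothetical limit on i, chosen only for this sketch


class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth  # i_j: bits used for this bucket
        self.records = []               # (hash_value, record) pairs


class ExtendableHash:
    def __init__(self):
        self.global_depth = 0           # i: bits used to index the table
        self.directory = [Bucket(0)]    # always 2**i entries

    def _index(self, h):
        return h >> (B - self.global_depth)

    def insert(self, h, record):
        bucket = self.directory[self._index(h)]
        while len(bucket.records) >= CAPACITY:
            same_hash = all(rh == h for rh, _ in bucket.records)
            if same_hash or bucket.local_depth >= MAX_DEPTH:
                break  # splitting cannot help: overflow (list grows past CAPACITY)
            self._split(bucket)
            bucket = self.directory[self._index(h)]
        bucket.records.append((h, record))

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # Case i = i_j: double the table, each entry becomes two entries.
            self.directory = [b for b in self.directory for _ in range(2)]
            self.global_depth += 1
        # Case i > i_j: allocate bucket z and raise both local depths.
        bucket.local_depth += 1
        new_bucket = Bucket(bucket.local_depth)
        slots = [k for k, b in enumerate(self.directory) if b is bucket]
        for k in slots[len(slots) // 2:]:
            self.directory[k] = new_bucket  # second half now points to z
        old_records, bucket.records = bucket.records, []
        for h, record in old_records:       # remove and reinsert each record
            self.directory[self._index(h)].records.append((h, record))


def bits(s):
    """Parse a spaced bit string such as '1010 0011' into an integer."""
    return int(s.replace(" ", ""), 2)


DOWNTOWN = bits("1010 0011 1010 0000 1100 0110 1001 1111")
PERRYRIDGE = bits("1111 0001 0010 0100 1001 0011 0110 1101")

table = ExtendableHash()
for h, rec in [
    (bits("0010 1101 1111 1011 0010 1100 0011 0000"), "Brighton A-217"),
    (DOWNTOWN, "Downtown A-101 (500)"),
    (DOWNTOWN, "Downtown A-101 (600)"),
    (bits("1100 0111 1110 1101 1011 1111 0011 1010"), "Mianus A-215"),
    (PERRYRIDGE, "Perryridge A-102"),
    (PERRYRIDGE, "Perryridge A-201"),
    (PERRYRIDGE, "Perryridge A-218"),
    (bits("0011 0101 1010 0110 1100 1001 1110 1011"), "Redwood A-222"),
    (bits("1101 1000 0011 1111 1001 1100 0000 0001"), "Round Hill A-305"),
]:
    table.insert(h, rec)

print(table.global_depth, [b.local_depth for b in table.directory])
```

Under these assumptions the run ends with i = 3 and buckets of local depth 1, 2, 3 and 3. The third Perryridge record shares its hash value with the other two, so no amount of splitting can separate them and the sketch lets that bucket overflow.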
Example: Use of Extendable Hash Structure
Branch-name    h(branch-name)
Brighton       0010 1101 1111 1011 0010 1100 0011 0000
Downtown       1010 0011 1010 0000 1100 0110 1001 1111
Mianus         1100 0111 1110 1101 1011 1111 0011 1010
Perryridge     1111 0001 0010 0100 1001 0011 0110 1101
Redwood        0011 0101 1010 0110 1100 1001 1110 1011
Round Hill     1101 1000 0011 1111 1001 1100 0000 0001
Initial hash structure, bucket size = 2
[Figure labels: hash address table; 0; Bucket 0]
Example
Insert: Brighton, A-217, 750 (hash 0010 1101 1111 1011 0010 1100 0011 0000)
• No bit is needed from the hash value (i = 0)
[Figure: bucket with local level 0 containing Brighton, A-217, 750]
Example
Insert: Downtown, A-101, 500 (hash 1010 0011 1010 0000 1100 0110 1001 1111)
• No bit is needed from the hash value (i = 0)
Insert: Downtown, A-101, 600
• Bucket full; split the records according to the first bit (i = 1)
[Figure: bucket with local level 0 containing Brighton, A-217, 750 and Downtown, A-101, 500]
Example
Brighton   0010 1101 1111 1011 0010 1100 0011 0000
Downtown   1010 0011 1010 0000 1100 0110 1001 1111
Insert: Mianus, A-215, 700 (hash 1100 0111 1110 1101 1011 1111 0011 1010)
• Hashes into bucket 1, which is full
[Figure: local level 1: Brighton, A-217, 750; local level 1: Downtown, A-101, 500 and Downtown, A-101, 600]
Example
Mianus     1100 0111 1110 1101 1011 1111 0011 1010
Downtown   1010 0011 1010 0000 1100 0110 1001 1111
• Note: the directory size is doubled
[Figure: local level 1 (1 bit has been used to allocate records): Brighton, A-217, 750; local level 2 (2 bits have been used to allocate records): Downtown, A-101, 500 and Downtown, A-101, 600; local level 2: Mianus, A-215, 700]
Example
Insert: Perryridge, A-102, 400 (hash 1111 0001 0010 0100 1001 0011 0110 1101)
Insert: Perryridge, A-201, 900 (hash 1111 0001 0010 0100 1001 0011 0110 1101)
• Bucket 2 overflows again!
[Figure: local level 1: Brighton, A-217, 750; local level 2: Downtown, A-101, 500 and Downtown, A-101, 600; local level 2: Mianus, A-215, 700 and Perryridge, A-102, 400]
Example
• Redistribute the overflowing records and the new record
• Expand the hash address table and renumber the entries by appending one more bit on the right
• Bucket 10 becomes bucket 10x, where x could be 0 or 1
• Bucket 11 becomes buckets 110 and 111, because it is split; a new bucket is added and the local level is updated
[Figure: the 2-bit table (00, 01, 10, 11) expanded to 3 bits; local level 1: Brighton, A-217, 750; local level 2: Downtown, A-101, 500 and Downtown, A-101, 600; the bucket holding Mianus, A-215, 700, Perryridge, A-102, 400 and the new record Perryridge, A-201, 900 is split into two buckets with local level 3]
Example
Insert: Perryridge, A-218, 700 (hash 1111 0001 0010 0100 1001 0011 0110 1101)
• Bucket 3 overflows again!
[Figure: local level 1: Brighton, A-217, 750; local level 2: Downtown, A-101, 500 and Downtown, A-101, 600; local level 3: Mianus, A-215, 700; local level 3: Perryridge, A-201, 900, Perryridge, A-102, 400 and Perryridge, A-218, 700]
Example
Done!
[Figure: final structure. Local level 1: Brighton, A-217, 750 and Redwood, A-222, 700; local level 2: Downtown, A-101, 500 and Downtown, A-101, 600; local level 3: Mianus, A-215, 700 and Round Hill, A-305, 350; local level 3: Perryridge, A-201, 900, Perryridge, A-102, 400 and Perryridge, A-218, 700]
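The final placement can be checked directly from the hash values in the example table. This sketch assumes the four final buckets are identified by the bit prefixes 0, 10, 110 and 111 (local levels 1, 2, 3 and 3); the names `HASHES`, `BUCKET_PREFIXES` and `final_bucket` are hypothetical.

```python
# Hash values copied from the example table.
HASHES = {
    "Brighton":   "0010 1101 1111 1011 0010 1100 0011 0000",
    "Downtown":   "1010 0011 1010 0000 1100 0110 1001 1111",
    "Mianus":     "1100 0111 1110 1101 1011 1111 0011 1010",
    "Perryridge": "1111 0001 0010 0100 1001 0011 0110 1101",
    "Redwood":    "0011 0101 1010 0110 1100 1001 1110 1011",
    "Round Hill": "1101 1000 0011 1111 1001 1100 0000 0001",
}

# Assumed prefixes of the four final buckets (local levels 1, 2, 3, 3).
BUCKET_PREFIXES = ["0", "10", "110", "111"]


def final_bucket(branch: str) -> str:
    """Return the prefix of the bucket whose leading bits match the hash."""
    bit_string = HASHES[branch].replace(" ", "")
    return next(p for p in BUCKET_PREFIXES if bit_string.startswith(p))


for branch in HASHES:
    print(f"{branch:<11} -> bucket {final_bucket(branch)}")
```

Because the prefixes are mutually exclusive and cover every bit pattern, each hash value matches exactly one bucket.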
Updates in Extendable Hash Structure
• When inserting a value, if the bucket is still full after several splits (that is, i reaches some limit b), create an overflow bucket instead of splitting the bucket address table further
• To delete a key value, locate it in its bucket and remove it. The bucket itself can be removed if it becomes empty (with appropriate updates to the bucket address table). Coalescing of buckets and decreasing the bucket address table size are also possible
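One possible way to decrease the table size is sketched below, assuming the prefix-indexed table described earlier: once no bucket uses all i bits, every pair of adjacent entries points to the same bucket, so every second entry can be dropped. The function name `try_halve` and the bucket labels are hypothetical, and coalescing the buckets themselves is not shown.

```python
def try_halve(directory, local_depth, global_depth):
    """Halve the table when no bucket needs all global_depth bits.

    directory    -- list of bucket labels, one per table entry
    local_depth  -- dict mapping each bucket label to its i_j
    global_depth -- current value of i
    Returns the (possibly unchanged) directory and global depth.
    """
    if global_depth == 0:
        return directory, global_depth
    if any(local_depth[b] == global_depth for b in directory):
        return directory, global_depth  # some bucket still needs every bit
    # Entries 2k and 2k+1 point to the same bucket, so keep only the even ones.
    return directory[::2], global_depth - 1


# Bucket B still uses 2 bits, so the table cannot shrink.
print(try_halve(["A", "A", "B", "C"], {"A": 1, "B": 2, "C": 2}, 2))

# After B and C have been coalesced into one bucket of depth 1, it can.
print(try_halve(["A", "A", "BC", "BC"], {"A": 1, "BC": 1}, 2))
```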
Extendible Hashing is not Pure Hashing
• Pure hashing maps a key value directly to the bucket where the record containing the key value can be found
• Extendible hashing maps a key value to the entry in the hash-prefix table, which contains a pointer to the bucket where the record containing the key value can be found
• The hash-prefix table can be considered as a complete binary tree; that is, extendible hashing is a combination of tree and hashing!
[Figure: hash-prefix table with entries 00, 01, 10, 11]