Hashing: Collision Resolution Schemes

• Collision Resolution Techniques
• Introduction to Separate Chaining
• Collision Resolution using Separate Chaining
• Introduction to Collision Resolution using Open Addressing

Collision Resolution Techniques
• There are three broad ways of resolving collisions:
  1. Separate chaining: a linked-list-based implementation.
  2. Open addressing: an array-based implementation.
     (i) Linear probing (linear search)
     (ii) Quadratic probing (nonlinear search)
     (iii) Random increments/decrements
     (iv) Rehashing (double hashing)
  3. Bucket methods: usually a combination of (1) and (2).

Introduction to Separate Chaining
• The hash table is implemented as an array of linked lists.
• Inserting an item, r, at index i is simply insertion into the linked list at position i.
• Synonyms (items with the same hash address) are chained in the same linked list.
• Retrieval of an item, r, with hash address, i, is simply retrieval from the linked list at position i.
• Deletion of an item, r, with hash address, i, is simply deleting r from the linked list at position i.
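
The four operations above can be sketched in Java as follows. This is a minimal illustration, not the slides' own implementation: the class name ChainedHashTable and its method names are hypothetical, and the home address is assumed to come from the key's hashCode() reduced modulo the table size. Each position i holds a java.util.LinkedList, so insertion, retrieval and deletion only ever touch the chain at position i.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical separate-chaining table: an array (here a List) of linked lists.
class ChainedHashTable<K, V> {
    // One key/value pair stored in a chain.
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final List<LinkedList<Entry<K, V>>> table;
    private final int size;

    ChainedHashTable(int size) {
        this.size = size;
        table = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            table.add(new LinkedList<>());
        }
    }

    // Home address i; floorMod keeps it non-negative even if hashCode() is negative.
    int indexOf(K key) {
        return Math.floorMod(key.hashCode(), size);
    }

    // Insert: append to chain i, or overwrite the value if the key is already there.
    void insert(K key, V value) {
        LinkedList<Entry<K, V>> chain = table.get(indexOf(key));
        for (Entry<K, V> e : chain) {
            if (e.key.equals(key)) { e.value = value; return; }
        }
        chain.add(new Entry<>(key, value));
    }

    // Retrieve: search only chain i; null means "not found".
    V find(K key) {
        for (Entry<K, V> e : table.get(indexOf(key))) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    // Delete: unlink the entry from chain i; no special marker is needed.
    boolean delete(K key) {
        return table.get(indexOf(key)).removeIf(e -> e.key.equals(key));
    }

    // Number of synonyms currently chained at position i.
    int chainLength(int i) {
        return table.get(i).size();
    }
}
```

With Integer keys and size 13, the keys 1, 14 and 27 are synonyms and all end up in chain 1; deleting one of them simply removes it from that list.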

Separate Chaining with String Keys
• Recall that search keys can be numbers, strings or some other object.
• The following Java method hashes a string key by adding up the character codes of its characters and reducing the sum modulo the table size:

    public static int hash(String key, int tableSize) {
        int hashVal = 0;
        for (int i = 0; i < key.length(); i++) {
            hashVal += key.charAt(i);
        }
        return hashVal % tableSize;
    }

• The following class describes commodity items:

    class CommodityItem {
        String name;     // commodity name
        int quantity;    // commodity quantity needed
        double price;    // commodity price
    }
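
Because the hash method above only adds character codes, the order of the characters is ignored. The short Java demo below (the class name StringHashDemo is hypothetical; the hash method is copied unchanged) shows that two anagrams always become synonyms. One further caveat: for extremely long keys the int sum could overflow and % would then return a negative index; Math.floorMod would avoid that.

```java
// Demo of the character-sum hash from the slide (copied so this block compiles alone).
class StringHashDemo {
    // Sum of character codes, reduced modulo the table size.
    static int hash(String key, int tableSize) {
        int hashVal = 0;
        for (int i = 0; i < key.length(); i++) {
            hashVal += key.charAt(i);
        }
        return hashVal % tableSize;
    }

    public static void main(String[] args) {
        // "listen" and "silent" contain the same characters, so both sums are 655.
        System.out.println(hash("listen", 13)); // prints 5
        System.out.println(hash("silent", 13)); // prints 5
    }
}
```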

Example 1: Separate Chaining
• Devise an appropriate hash function and use it to load the information about the following commodity items into a hash table of size 13 using separate chaining.

    Name       Quantity  Price
    onion      1         10.0
    tomato     1         8.50
    cabbage    3         3.50
    carrot     1         5.50
    okra       1         6.50
    melon      2         10.0
    potato     2         7.50
    Banana     3         4.0
    olive      2         15.0
    salt       2         2.50
    cucumber   3         4.50
    mushroom   3         5.50
    orange     2         3.00

Example 1: Separate Chaining (cont'd)
[Figure: the resulting hash table with slots 0 to 12; the chain contents are not recoverable]
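
One possible solution can be sketched in Java, assuming the character-sum hash method from the previous slide is taken as the "appropriate hash function" (the example leaves the choice open, so other hash functions would give different chains). The class name Example1 and the CommodityItem constructor are additions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class CommodityItem {
    String name;     // commodity name
    int quantity;    // commodity quantity needed
    double price;    // commodity price

    CommodityItem(String name, int quantity, double price) {
        this.name = name;
        this.quantity = quantity;
        this.price = price;
    }
}

class Example1 {
    static final int TABLE_SIZE = 13;

    // Assumed hash function: sum of character codes mod table size.
    static int hash(String key, int tableSize) {
        int hashVal = 0;
        for (int i = 0; i < key.length(); i++) {
            hashVal += key.charAt(i);
        }
        return hashVal % tableSize;
    }

    // Builds the table: chain i holds every item whose name hashes to i, in load order.
    static List<LinkedList<CommodityItem>> load() {
        CommodityItem[] items = {
            new CommodityItem("onion", 1, 10.0), new CommodityItem("tomato", 1, 8.50),
            new CommodityItem("cabbage", 3, 3.50), new CommodityItem("carrot", 1, 5.50),
            new CommodityItem("okra", 1, 6.50), new CommodityItem("melon", 2, 10.0),
            new CommodityItem("potato", 2, 7.50), new CommodityItem("Banana", 3, 4.0),
            new CommodityItem("olive", 2, 15.0), new CommodityItem("salt", 2, 2.50),
            new CommodityItem("cucumber", 3, 4.50), new CommodityItem("mushroom", 3, 5.50),
            new CommodityItem("orange", 2, 3.00)
        };
        List<LinkedList<CommodityItem>> table = new ArrayList<>();
        for (int i = 0; i < TABLE_SIZE; i++) {
            table.add(new LinkedList<>());
        }
        for (CommodityItem item : items) {
            // Synonyms are appended to the same chain.
            table.get(hash(item.name, TABLE_SIZE)).add(item);
        }
        return table;
    }

    public static void main(String[] args) {
        List<LinkedList<CommodityItem>> table = load();
        for (int i = 0; i < TABLE_SIZE; i++) {
            StringBuilder line = new StringBuilder(i + ":");
            for (CommodityItem item : table.get(i)) {
                line.append(' ').append(item.name);
            }
            System.out.println(line);
        }
    }
}
```

Under this assumption the chains are 0: okra, potato; 1: onion, carrot; 4: cabbage; 5: Banana; 6: melon, mushroom; 7: salt; 9: cucumber; 10: tomato, olive; 12: orange; slots 2, 3, 8 and 11 stay empty.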

Introduction to Open Addressing
• In this method the entries are placed inside the array itself.
• The probe sequence is a sequence of functions $\{h_0, h_1, h_2, \ldots, h_{n-1}\}$, where $h_i : K \to \{0, 1, \ldots, n-1\}$ and K denotes the key space.
• To insert item r, we examine array locations $h_0(r), h_1(r), h_2(r), \ldots$ until an empty location is found.
• Similarly, to find item r, we examine the same sequence of locations in the same order.

Introduction to Open Addressing (cont'd)
• The most common probe sequences are of the form $h_i(r) = (h(r) + c(i)) \bmod n$, for $i = 0, 1, \ldots, n-1$.
• The function c(i) is required to have the following two properties:
• Property 1: $c(0) = 0$, so the first location examined is the home address h(r).
• Property 2: The set of values $\{c(0) \bmod n, c(1) \bmod n, c(2) \bmod n, \ldots, c(n-1) \bmod n\}$ must contain every integer between 0 and n-1 inclusive.
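
The two properties are easy to check mechanically for a candidate c(i). The Java sketch below (class and method names are hypothetical) marks every residue $c(i) \bmod n$ for $i = 0, \ldots, n-1$ and reports whether all n residues appear; it also computes the i-th probe location from a given home address.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical helper for testing candidate increment functions c(i).
class ProbeProperties {
    // Property 1: c(0) = 0.
    static boolean satisfiesProperty1(IntUnaryOperator c) {
        return c.applyAsInt(0) == 0;
    }

    // Property 2: {c(0) mod n, ..., c(n-1) mod n} contains every integer 0..n-1.
    static boolean satisfiesProperty2(IntUnaryOperator c, int n) {
        boolean[] seen = new boolean[n];
        for (int i = 0; i < n; i++) {
            seen[Math.floorMod(c.applyAsInt(i), n)] = true;
        }
        for (boolean s : seen) {
            if (!s) return false; // some residue is never produced
        }
        return true;
    }

    // The i-th probe location h_i(r) = (h(r) + c(i)) mod n, given home = h(r).
    static int probe(int home, IntUnaryOperator c, int i, int n) {
        return Math.floorMod(home + c.applyAsInt(i), n);
    }
}
```

For example, c(i) = 2i passes with n = 13 but fails with n = 12 (only even residues appear), and the quadratic c(i) = i² satisfies Property 1 but fails Property 2 for n = 13, since i² mod 13 takes only 7 distinct values.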

Open Addressing: Linear Probing
• Linear probe: here the function c(i) is a linear function in i: $c(i) = ai + b$.
• Property 1 requires that $c(0) = 0$. Therefore, b must be zero.
• For $c(i) = ai$ to satisfy Property 2, a and n must be relatively prime.
• The linear probing sequence that is usually used (a = 1) is $h_i(r) = (h(r) + i) \bmod n$, for $i = 0, 1, 2, \ldots, n-1$.
• Insert the record at the first empty slot in this sequence; if no empty slot is found after all n probes, the hash table is full and the insertion fails.
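
The insertion rule can be sketched in Java for integer keys, assuming h(r) = key mod n. The class LinearProbingTable is hypothetical, and a null entry stands for an empty slot. insert walks $h_i = (h(r) + i) \bmod n$ and returns the slot used, or -1 when all n probes hit occupied slots.

```java
// Hypothetical linear probing table for int keys; null marks an empty slot.
class LinearProbingTable {
    private final Integer[] slots;

    LinearProbingTable(int n) {
        slots = new Integer[n];
    }

    // Assumed home address h(r) = key mod n (non-negative even for negative keys).
    int home(int key) {
        return Math.floorMod(key, slots.length);
    }

    // Probes (h(key) + i) mod n for i = 0..n-1 and stores the key in the
    // first empty slot. Returns that slot, or -1 if the table is full.
    int insert(int key) {
        int n = slots.length;
        for (int i = 0; i < n; i++) {
            int idx = (home(key) + i) % n;
            if (slots[idx] == null) {
                slots[idx] = key;
                return idx;
            }
        }
        return -1; // every slot was probed: insertion fails
    }

    Integer get(int slot) {
        return slots[slot];
    }
}
```

With n = 7, inserting 3, 10 and 17 fills slots 3, 4 and 5, because 10 and 17 both have home address 3; a key whose home address is 6 when slot 6 is taken wraps around to slot 0.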

Example 2: Linear Probing
• Use the hash function h(r) = r.id % 13 to load the following records (the last field of each record is its id) into an array of size 13.

    Al-Otaibi, Ziyad                    1.73   985926
    Al-Turki, Musab Ahmad Bakeer        1.60   970876
    Al-Saegh, Radha Mahdi               1.58   980962
    Al-Shahrani, Adel Saad              1.80   986074
    Al-Awami, Louai Adnan Muhammad      1.73   970728
    Al-Amer, Yousuf Jauwad              1.66   994593
    Al-Helal, Husain Ali AbdulMohsen    1.70   996321

• Then insert the following records using linear probing to resolve collisions, if any.

    Al-Najjar, Khaled Ziyad             1.69   987615
    Al-Ali, Amr Ali Zaid                1.79   987630
    Al-Ramadi, Husam Yahya              1.58   987602

Example 2: Linear Probing (cont'd)

    Slot  Record
    0     (empty)
    1     Husain
    2     Yousuf
    3     (empty)
    4     (empty)
    5     Louai
    6     Ziyad
    7     Khaled
    8     Radha
    9     Amr
    10    Musab
    11    Adel
    12    Husam
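
The table can be reproduced with a short Java sketch of Example 2. Only first names and ids are kept, and the Student class and Example2 class are illustrative additions. The first seven ids have distinct home addresses (6, 10, 8, 11, 5, 2, 1), so probing only matters for the last three records.

```java
// Hypothetical record type: only the fields this example needs.
class Student {
    final String name;
    final int id;
    Student(String name, int id) { this.name = name; this.id = id; }
}

class Example2 {
    static final int N = 13;

    // Linear probing on h(r) = r.id % 13: returns the slot used, or -1 if full.
    static int insert(Student[] table, Student s) {
        for (int i = 0; i < N; i++) {
            int idx = (s.id % N + i) % N;
            if (table[idx] == null) {
                table[idx] = s;
                return idx;
            }
        }
        return -1;
    }

    static Student[] build() {
        Student[] table = new Student[N];
        Student[] records = {
            // First seven records: no collisions among them.
            new Student("Ziyad", 985926), new Student("Musab", 970876),
            new Student("Radha", 980962), new Student("Adel", 986074),
            new Student("Louai", 970728), new Student("Yousuf", 994593),
            new Student("Husain", 996321),
            // Then the three records that need linear probing.
            new Student("Khaled", 987615), new Student("Amr", 987630),
            new Student("Husam", 987602)
        };
        for (Student s : records) {
            insert(table, s);
        }
        return table;
    }

    public static void main(String[] args) {
        Student[] table = build();
        for (int i = 0; i < N; i++) {
            System.out.println(i + ": " + (table[i] == null ? "-" : table[i].name));
        }
    }
}
```

Khaled (home 5) probes slots 5 and 6 before landing in 7; Amr (home 7) probes 7 and 8 and lands in 9; Husam (home 5) probes eight slots and lands in 12.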

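Linear Probing: Some Notes
• Notice from this table that a large cluster has already formed: slots 5 through 12 are all occupied.
• In general, an empty cell immediately following a cluster has a higher chance of being filled next, because any key whose home address falls anywhere in the cluster ends up there.
• Long probe sequences are therefore much more likely when clusters exist, and every such insertion makes the cluster longer still.
• This effect, known as primary clustering, is one disadvantage of linear probing. Other methods attempt to improve on this.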
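
The clustering effect can be quantified for the Example 2 table. Assuming every home address 0 to 12 is equally likely for the next key, the Java sketch below (class name hypothetical) counts, for each empty slot, how many home addresses would send a new key there under linear probing.

```java
// For each empty slot, counts the home addresses whose linear probe sequence
// reaches that slot first (i.e. where the next key would be stored).
class ClusterAttraction {
    // occupied[j] is true when slot j already holds a record.
    static int[] attraction(boolean[] occupied) {
        int n = occupied.length;
        int[] count = new int[n];
        for (int home = 0; home < n; home++) {
            for (int i = 0; i < n; i++) {
                int idx = (home + i) % n;
                if (!occupied[idx]) {
                    count[idx]++; // first empty slot in the sequence wins
                    break;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        boolean[] occupied = new boolean[13];
        // Occupied slots at the end of Example 2: 1, 2 and 5..12.
        int[] filled = {1, 2, 5, 6, 7, 8, 9, 10, 11, 12};
        for (int j : filled) {
            occupied[j] = true;
        }
        int[] count = attraction(occupied);
        for (int j = 0; j < occupied.length; j++) {
            if (!occupied[j]) {
                System.out.println("slot " + j + ": " + count[j] + "/13");
            }
        }
    }
}
```

It prints 9/13 for slot 0 (homes 5 to 12 wrap around to it, plus home 0), 3/13 for slot 3 and 1/13 for slot 4, so the slot right after the cluster is nine times as likely to be filled next as slot 4.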

Introduction to Retrieval & Deletion
• Retrieval: To search for a record we:
  • Calculate its hash value.
  • Check that location of the array for the record.
    · If found, return the record.
    · If not, keep searching along the same probe sequence until you find the record or you reach an empty table location.
• Attempting to retrieve a non-existent record is very expensive: the search can only stop at an empty location, so it must probe past the whole cluster (or the whole table, if it is full).
• Deletion:
  • In open addressing, a record is not necessarily stored at its home position.
  • We cannot just set the location of a deleted record to empty, because a later search for any record placed beyond it would stop there and fail.
  • A special flag or key value is needed to mark the locations of deleted records.
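
The retrieval and deletion rules can be sketched in Java with linear probing on non-negative int keys, assuming h(key) = key mod n. The class name TombstoneTable and the sentinel values EMPTY (-1) and DELETED (-2) are illustrative choices; DELETED plays the role of the "special flag". Searches stop only at EMPTY, while insertion may reuse a DELETED slot, a common choice that assumes the key is not already stored further along the sequence.

```java
import java.util.Arrays;

// Hypothetical open-addressing table with a deleted-slot marker (tombstone).
class TombstoneTable {
    private static final int EMPTY = -1;   // never used
    private static final int DELETED = -2; // used before, now free
    private final int[] slots;

    TombstoneTable(int n) {
        slots = new int[n];
        Arrays.fill(slots, EMPTY);
    }

    // Search: probe from the home address until the key or an EMPTY slot.
    // DELETED slots do not stop the search. Returns the slot, or -1 if absent.
    int search(int key) {
        int n = slots.length;
        for (int i = 0; i < n; i++) {
            int idx = (key % n + i) % n;
            if (slots[idx] == EMPTY) return -1;
            if (slots[idx] == key) return idx;
        }
        return -1; // probed every slot without finding the key
    }

    // Insert at the first EMPTY or DELETED slot (assumes the key is not present).
    int insert(int key) {
        int n = slots.length;
        for (int i = 0; i < n; i++) {
            int idx = (key % n + i) % n;
            if (slots[idx] == EMPTY || slots[idx] == DELETED) {
                slots[idx] = key;
                return idx;
            }
        }
        return -1;
    }

    // Delete: mark the slot DELETED rather than EMPTY so searches keep probing past it.
    boolean delete(int key) {
        int idx = search(key);
        if (idx < 0) return false;
        slots[idx] = DELETED;
        return true;
    }
}
```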

Example 3: Retrieval & Deletion
• Consider the following hash table constructed in Example 2:

    Slot  Record
    0     (empty)
    1     Husain
    2     Yousuf
    3     (empty)
    4     (empty)
    5     Louai
    6     Ziyad
    7     Khaled
    8     Radha
    9     Amr
    10    Musab
    11    Adel
    12    Husam

• Delete Khaled's record (id 987615), then retrieve the record for Amr and then that of Husam.

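Example 3: Retrieval & Deletion (cont'd)

    Slot  Record
    0     (empty)
    1     Husain
    2     Yousuf
    3     (empty)
    4     (empty)
    5     Louai
    6     Ziyad
    7     ? (deleted marker)
    8     Radha
    9     Amr
    10    Musab
    11    Adel
    12    Husam

• Amr (id 987630) hashes to 7. Slot 7 holds the deleted marker, so the search continues to slot 8 (Radha) and finds Amr in slot 9.
• Husam (id 987602) hashes to 5. The search probes slots 5 through 11, passing over the deleted marker in slot 7, and finds Husam in slot 12.
• Had slot 7 simply been set to empty, both searches would have stopped there and wrongly reported the records as missing.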
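
To see why the marker matters, the Java sketch below rebuilds the Example 2 table from the ids alone (h(id) = id % 13, linear probing) and deletes Khaled's record in two ways: with a deleted marker, and by naively setting the slot back to empty. The class name Example3 and the sentinel values are illustrative assumptions.

```java
import java.util.Arrays;

// Replays Example 3: tombstone deletion versus naively emptying the slot.
class Example3 {
    static final int N = 13;
    static final int EMPTY = -1, DELETED = -2;

    // Loads the ten Example 2 ids in order, using linear probing.
    static int[] build() {
        int[] table = new int[N];
        Arrays.fill(table, EMPTY);
        int[] ids = {985926, 970876, 980962, 986074, 970728, 994593, 996321,
                     987615, 987630, 987602};
        for (int id : ids) {
            for (int i = 0; i < N; i++) {
                int idx = (id % N + i) % N;
                if (table[idx] == EMPTY) {
                    table[idx] = id;
                    break;
                }
            }
        }
        return table;
    }

    // Returns the slot holding id, or -1; the search stops only at an EMPTY slot.
    static int search(int[] table, int id) {
        for (int i = 0; i < N; i++) {
            int idx = (id % N + i) % N;
            if (table[idx] == EMPTY) return -1;
            if (table[idx] == id) return idx;
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] marked = build();
        marked[search(marked, 987615)] = DELETED; // delete Khaled with a marker
        int[] cleared = build();
        cleared[search(cleared, 987615)] = EMPTY;  // naive delete: slot 7 emptied

        System.out.println(search(marked, 987630) + " " + search(marked, 987602));   // prints 9 12
        System.out.println(search(cleared, 987630) + " " + search(cleared, 987602)); // prints -1 -1
    }
}
```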

Exercises
1. Given that c(i) = a*i for c(i) in linear probing, we discussed that this equation satisfies Property 2 only when a and n are relatively prime. Explain what the requirement of being relatively prime means in simple, plain language.
2. Consider the general probe sequence $h_i(r) = (h(r) + c(i)) \bmod n$. Are we sure that if c(i) satisfies Property 2, then $h_i(r)$ will cover all n hash table locations, 0, 1, ..., n-1? Explain.
3. Suppose you are given k records to be loaded into a hash table of size n, with k < n, using linear probing. Does the order in which these records are loaded matter for retrieval and insertion? Explain.
4. A prime number is always the best choice of a hash table size. Is this statement true or false? Justify your answer either way.
