15-121 Introduction to Data Structures Lecture 9: Searching
Outline • The simplest method: serial search • Binary search • Open-address hashing • Chained hashing Lecture 9: Searching
Search Algorithms Whenever large amounts of data need to be accessed quickly, search algorithms are crucially involved. Lecture 9: Searching
Search Algorithms Lie at the heart of many computer technologies. To name a few: • Databases • Information retrieval applications • Web infrastructure (file systems, domain name servers, etc.) • String searching for patterns Lecture 9: Searching
Search Algorithms: Two Broad Categories • Searching a static database • Accessing indexed Web pages • Finding a file on disk • Evaluating a dynamically changing set of hypotheses • Computer chess (search for a move) • Speech recognition (search for text given speech) We’ll be concerned with the first Lecture 9: Searching
The Simplest Search: Serial Lookup • Items are stored in an array or list. • To search for an item x: • Start at the beginning of the list • Compare the current item to x • If unequal, proceed to next item Lecture 9: Searching
Pseudocode for Serial Search
// Find x in an array a of length n
int i = 0;
boolean found = false;
while ((i < n) && !found) {
    if (a[i] == x) found = true;
    else i++;
}
if (found) ...
Analysis for Serial Search
• Best case: requires one array access: Θ(1)
• Worst case: requires n array accesses: Θ(n)
• Average case: to access an item, assuming its position is random (uniform): (1+2+3+...+n)/n = n(n+1)/(2n) = (n+1)/2 = Θ(n)
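The analysis above can be checked with a small runnable sketch. The class and method names (SerialSearchDemo, serialSearch, averageAccesses) are hypothetical and chosen only for illustration; the slides give just the pseudocode.

```java
// Illustrative sketch only; names are assumptions, not part of the lecture.
class SerialSearchDemo {
    // Returns the index of x in a, or -1 if x is absent.
    static int serialSearch(int[] a, int x) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == x) return i;   // one array access per step
        }
        return -1;
    }

    // Average number of array accesses when the target is equally likely
    // to sit at any of the n positions: (1 + 2 + ... + n) / n.
    static double averageAccesses(int n) {
        long total = 0;
        for (int pos = 0; pos < n; pos++) total += pos + 1;
        return (double) total / n;
    }

    public static void main(String[] args) {
        int[] a = {7, 3, 9, 4};
        System.out.println(serialSearch(a, 9));   // prints 2
        System.out.println(serialSearch(a, 5));   // prints -1
        System.out.println(averageAccesses(9));   // prints 5.0, which is (9+1)/2
    }
}
```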
A Useful Combinatorial Identity
1+2+3+…+n = n(n+1)/2
Why?
• Algebraic proof: see Main
• Visual counting
Visual Counting
• An n-by-n square grid has n*n cells.
• Its diagonal has n cells.
• Removing the diagonal leaves n*n - n cells, half of them below the diagonal.
• The triangle on and below the diagonal, whose rows hold 1, 2, ..., n cells, therefore has (n*n - n)/2 + n = n(n+1)/2 cells.
Binary Search
• Can be used whenever the data are totally ordered (e.g., the integers): all elements are comparable.
• Requires sorting in advance and storing in an array
• One of the simplest to implement, often “fast enough”
• Can be tricky to handle “boundary cases”
• This is a classic divide-and-conquer algorithm.
Idea of Binary Search • Closely related to the natural algorithm we use to look up a word in a dictionary • Open to the middle • If target comes before all words on the page, search in left half of book • Otherwise, search in right half. Lecture 9: Searching
Interface for Binary Search int search(int [] a, int first, int size, int target) • Parameters: • int [] a: array to be searched over • Search over a[first,first+1,...,first+size-1] • Precondition: • array is sorted in increasing order • first >= 0 Lecture 9: Searching
Implementation int search (int [] a, int start, int size, int target) { if (size <= 0) return -1; else { int middle = start + size/2; if (a[middle] == target) return middle; else if (target < a[middle]) return search(a, start, size/2, target); else return search(a, middle+1, size/2, target); } } Lecture 9: Searching
Implementation Where’s the error?? Suppose size is odd. Are new sizes correct? Suppose size is even. Are new sizes correct? Lecture 9: Searching
Implementation int search (int [] a, int first, int size, int target) { if (size <= 0) return -1; else { int middle = first + size/2; if (a[middle] == target) return middle; else if (target < a[middle]) return search(a, first, size/2, target); else return search(a, middle+1, (size-1)/2, target); } } Lecture 9: Searching
Boundary Cases • Binary search is sometimes tricky to get right. • A common source of bugs. • Test cases are not always helpful for checking correctness of code. • How many test cases would our first implementation solve? Lecture 9: Searching
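One way to explore the question about test cases is to run both versions against every small sorted array and every nearby target. The sketch below is an assumption-laden harness, not part of the lecture: BinarySearchCheck, countFailures and the Search interface are hypothetical names, and an out-of-bounds access is counted as a failure.

```java
// Illustrative harness; only the two search bodies come from the slides.
class BinarySearchCheck {
    // First version from the slides: both recursive calls use size/2.
    static int searchFirst(int[] a, int start, int size, int target) {
        if (size <= 0) return -1;
        int middle = start + size / 2;
        if (a[middle] == target) return middle;
        if (target < a[middle]) return searchFirst(a, start, size / 2, target);
        return searchFirst(a, middle + 1, size / 2, target);
    }

    // Corrected version: the part right of middle has (size-1)/2 elements.
    static int searchFixed(int[] a, int first, int size, int target) {
        if (size <= 0) return -1;
        int middle = first + size / 2;
        if (a[middle] == target) return middle;
        if (target < a[middle]) return searchFixed(a, first, size / 2, target);
        return searchFixed(a, middle + 1, (size - 1) / 2, target);
    }

    interface Search { int run(int[] a, int first, int size, int target); }

    // Reference answer computed by serial search.
    static int expected(int[] a, int target) {
        for (int i = 0; i < a.length; i++) if (a[i] == target) return i;
        return -1;
    }

    // Counts (array, target) pairs where s throws or disagrees with serial search.
    static int countFailures(Search s, int maxSize) {
        int failures = 0;
        for (int n = 0; n <= maxSize; n++) {
            int[] a = new int[n];
            for (int i = 0; i < n; i++) a[i] = 2 * i + 1;          // 1, 3, 5, ...
            for (int target = 0; target <= 2 * n + 1; target++) {  // hits, gaps, both ends
                int got;
                try {
                    got = s.run(a, 0, n, target);
                } catch (ArrayIndexOutOfBoundsException e) {
                    got = -2;                                      // ran off the array
                }
                if (got != expected(a, target)) failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("first version failures: " + countFailures(BinarySearchCheck::searchFirst, 8));
        System.out.println("fixed version failures: " + countFailures(BinarySearchCheck::searchFixed, 8));
    }
}
```

Many inputs still pass with the first version, which is why a handful of hand-picked tests can miss the bug; it shows up when the right-hand part of an even-sized range is searched.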
Binary Search with Other Data Structures • Can binary search be implemented using linked lists rather than arrays? • Are there any other data structures that could be used? Lecture 9: Searching
Analysis of Binary Search
• Recursively dividing the array in half represents the data as a full binary tree.
• Consider the simplest case: an array of size n = 2^k - 1, a complete binary tree.
• Take away one and divide by 2.
• New size = 2^(k-1) - 1.
• We can only do that k times, and k = lg(n+1).
• Thus, the worst case involves Θ(log n) operations.
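The counting argument can be written out as a short recurrence. Here n_j is notation introduced only for this sketch: the size of the range still under consideration after j unsuccessful probes.

```latex
\begin{align*}
n_0 &= 2^k - 1, \\
n_{j+1} &= \frac{n_j - 1}{2}
  \quad\Longrightarrow\quad n_j = 2^{\,k-j} - 1, \\
n_k &= 2^0 - 1 = 0
  \quad\text{after } k = \lg(n+1) \text{ steps.}
\end{align*}
```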
Average Case • A complete binary tree with k leaves has k-1 internal nodes. • So, about half of the n data elements require Θ(log n) operations to find. • Thus, assuming uniform distribution on target elements, average cost is also Θ(log n). Lecture 9: Searching
Binary Search is Limited
When a large number of items are accessed in a part of the program where efficiency is crucial, binary search may be too slow.
Improving Binary Search
Try to guess more precisely where the key is located in the interval.
Generalize
    middle = first + size/2
to
    middle = first + ((key - a[first]) / (a[first+size-1] - a[first])) * (size - 1)
Interpolation Search
• This modified method is called interpolation search.
• Uses about log(log(N)) comparisons in the average case, assuming uniformly distributed keys.
• But uses Θ(N) in the worst case.
• For analysis, see Perl, Itai, Avni, “Interpolation Search – A Log Log N Search”, CACM 21 (1978), pages 550-553
• Is log(log(N)) better than log(N)?
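A minimal iterative sketch of interpolation search follows. It is not the code from the lecture or from the cited paper: the class name, the lo/hi bounds in place of first/size, the use of long arithmetic to avoid overflow, and the guard against equal endpoints are all assumptions made for this illustration.

```java
// Illustrative sketch; assumes a is sorted in increasing order.
class InterpolationSearchDemo {
    // Returns an index of target in a, or -1 if target is absent.
    static int interpolationSearch(int[] a, int target) {
        int lo = 0, hi = a.length - 1;
        // The target can only be present while it lies within [a[lo], a[hi]].
        while (lo <= hi && target >= a[lo] && target <= a[hi]) {
            if (a[lo] == a[hi]) return lo;   // all keys in range equal the target
            // Guess the position by linear interpolation between the endpoints.
            long offset = ((long) target - a[lo]) * (hi - lo) / ((long) a[hi] - a[lo]);
            int mid = lo + (int) offset;
            if (a[mid] == target) return mid;
            if (a[mid] < target) lo = mid + 1;
            else hi = mid - 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] a = {2, 4, 6, 8, 10, 12};
        System.out.println(interpolationSearch(a, 8));   // prints 3
        System.out.println(interpolationSearch(a, 7));   // prints -1
    }
}
```

With evenly spaced keys, as in the example, the first guess lands directly on the target. With a skewed array such as {1, 2, 3, 4, 1000} the guesses creep forward one position at a time, which illustrates the Θ(N) worst case.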
Comparing Log N to Log(Log N) (logs base 2)
• Suppose N = 2^100: Log N = 100, Log(Log N) = Log(100) ≈ 6.64
• Suppose N = 2^(2^100): Log N = 2^100, Log(Log N) = Log(2^100) = 100
Comparing Log(N) to Log(Log N)
• Or, by taking limits:
• $\lim_{n\to\infty} \frac{\log(\log n)}{\log n}$ is of the form $\infty/\infty$.
• Apply L'Hôpital's rule and take derivatives:
  $\lim_{n\to\infty} \frac{\frac{1}{\log n}\cdot\frac{1}{n}}{\frac{1}{n}} = \lim_{n\to\infty} \frac{1}{\log n} = 0$
Hashing
• Fortunately, we can often do better
• Hashing is a technique in which the access time can be O(1) rather than O(log n)
Open Address Hashing The basic technique: • Items are stored in an array of size N • The preferred position in the array is computed using a hash function of the item’s key • When adding an item, if the preferred position is occupied, the next open position in the array is used instead. Lecture 9: Searching
Open Address Hashing Main’s presentation for Chapter 11 Lecture 9: Searching
A Basic Hash Table • We keep arrays for the keys and data, and a bit indicating whether a given position has been occupied private class Table { private int numItems; private Object[] keys; private Object[] data; private boolean[] hasBeenUsed; .... } Lecture 9: Searching
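The Hash Function
• We can use the built-in hashCode() method that Java provides
private int hash(Object key) {
    // floorMod keeps the index non-negative even when hashCode() is
    // Integer.MIN_VALUE, for which Math.abs returns a negative number
    return Math.floorMod(key.hashCode(), data.length);
}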
Calculating the Index
// If found, the return value is the index of key; otherwise -1
private int findIndex(Object key) {
    int count = 0;
    int i = hash(key);
    while ((count < data.length) && (hasBeenUsed[i])) {
        if (key.equals(keys[i])) return i;
        i = nextIndex(i);
        count++;
    }
    return -1;
}
Inserting an Item public Object put (Object key, Object element) { int index = findIndex(key); if (index != -1) { Object answer = data[index]; data[index] = element; return answer; } else if (numItems < data.length) { .... Lecture 9: Searching
Inserting an Item
public Object put (Object key, Object element) {
    ...
    else if (numItems < data.length) {
        index = hash(key);
        while (keys[index] != null)
            index = nextIndex(index);
        keys[index] = key;
        data[index] = element;
        hasBeenUsed[index] = true;
        numItems++;
        return null;
    }
    else throw new IllegalStateException("Table full");
    ....
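The fragments on the preceding slides can be assembled into one small class. This is a sketch under stated assumptions rather than the lecture's full code: the class name OpenAddressTable, the constructor, the get method, and the body of nextIndex (taken here to be "move one cell right, wrapping around") do not appear in the slides, and Math.floorMod is used for the hash so the index is never negative.

```java
// Illustrative assembly of the slide fragments; see the assumptions above.
class OpenAddressTable {
    private int numItems;
    private final Object[] keys;
    private final Object[] data;
    private final boolean[] hasBeenUsed;

    OpenAddressTable(int capacity) {
        keys = new Object[capacity];
        data = new Object[capacity];
        hasBeenUsed = new boolean[capacity];
    }

    private int hash(Object key) {
        return Math.floorMod(key.hashCode(), data.length);
    }

    // Assumed definition: linear probing with wraparound.
    private int nextIndex(int i) {
        return (i + 1) % data.length;
    }

    // Returns the index holding key, or -1 if key is not in the table.
    private int findIndex(Object key) {
        int count = 0;
        int i = hash(key);
        while (count < data.length && hasBeenUsed[i]) {
            if (key.equals(keys[i])) return i;
            i = nextIndex(i);
            count++;
        }
        return -1;
    }

    // Stores element under key; returns the element it replaced, or null.
    public Object put(Object key, Object element) {
        int index = findIndex(key);
        if (index != -1) {
            Object answer = data[index];
            data[index] = element;
            return answer;
        } else if (numItems < data.length) {
            index = hash(key);
            while (keys[index] != null) index = nextIndex(index);
            keys[index] = key;
            data[index] = element;
            hasBeenUsed[index] = true;
            numItems++;
            return null;
        } else {
            throw new IllegalStateException("Table full");
        }
    }

    // Returns the element stored under key, or null if there is none.
    public Object get(Object key) {
        int index = findIndex(key);
        return (index == -1) ? null : data[index];
    }

    public static void main(String[] args) {
        OpenAddressTable t = new OpenAddressTable(5);
        t.put(1, "a");
        t.put(6, "b");    // collides with key 1 (6 % 5 == 1), lands in the next cell
        System.out.println(t.get(6));   // prints b
    }
}
```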
Two Hashes are Better than One • Collisions can result in long stretches of positions with keys not in their “preferred” position • This is called clustering • To address this problem, when a collision results we jump a “random” number of positions, using a second hash function Lecture 9: Searching
Double Hashing • Find the first position using hash1(key) • If there’s a collision, step through the array in steps of size hash2(key): i = (i + hash2(key)) % data.length • To avoid cycles, hash2(key) and the length of the array must be relatively prime (no common factors) Lecture 9: Searching
Double Hashing
• Knuth’s technique to avoid cycles:
• Choose the length of the array so that both data.length and data.length-2 are prime
hash1(key) = Math.abs(key.hashCode()) % length
hash2(key) = 1 + (Math.abs(key.hashCode()) % (length-2))
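To see why a step size relatively prime to the table length avoids cycles, the sketch below lists the cells probed for one key. It assumes a table length of 13 (13 and 11 are both prime) and takes hash2 as one plus the remainder modulo length-2; the class and method names are hypothetical, and Math.floorMod stands in for Math.abs(...) % ... so the result is never negative.

```java
import java.util.Arrays;

// Illustrative sketch of a double-hashing probe sequence.
class DoubleHashingDemo {
    static int hash1(Object key, int length) {
        return Math.floorMod(key.hashCode(), length);
    }

    // Always between 1 and length-2, so never zero.
    static int hash2(Object key, int length) {
        return 1 + Math.floorMod(key.hashCode(), length - 2);
    }

    // The cells visited, in order, over `length` probes for one key.
    static int[] probeSequence(Object key, int length) {
        int[] seq = new int[length];
        int i = hash1(key, length);
        int step = hash2(key, length);
        for (int n = 0; n < length; n++) {
            seq[n] = i;
            i = (i + step) % length;
        }
        return seq;
    }

    // True if the sequence touches every cell of the table.
    static boolean visitsEverySlot(int[] seq) {
        boolean[] seen = new boolean[seq.length];
        for (int s : seq) seen[s] = true;
        for (boolean b : seen) if (!b) return false;
        return true;
    }

    public static void main(String[] args) {
        int[] seq = probeSequence(27, 13);   // hash1 = 1, hash2 = 6
        System.out.println(Arrays.toString(seq));
        // prints [1, 7, 0, 6, 12, 5, 11, 4, 10, 3, 9, 2, 8]
        System.out.println(visitsEverySlot(seq));   // prints true
    }
}
```

By contrast, a step of 6 in a table of length 12 would revisit its starting cell after two probes, because 6 and 12 share a common factor.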
Issues with O-A Hashing • Each array cell holds only one element • Collisions and clustering can degrade performance • Once the array is full, no more elements can be added, unless we: • create a new array with the right size and hash functions • re-hash the original elements Lecture 9: Searching
Chained Hashing • Each array cell can hold more than one element of the hash table • Hash the key of each element to obtain the array index • When a collision happens, the element is still placed at the original hash index • How is this handled? Lecture 9: Searching
Answer
• Each array location must be implemented with a data structure that can hold a group of elements with the same hash index
• Most common approach:
  • each array location stores the head of a linked list
  • items in the list all have the same hash index
Chained Hashing
[Figure: a table array with cells [0], [1], [2], [3], …; each cell points to a linked list of nodes, each holding an element, a key, and a link.]
Any number of elements can be added to the table without a need to rehash
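The node layout in the figure (element, key, link) can be sketched as a small class. This is an illustration under assumptions, not the lecture's implementation: ChainedTable, Node, put and get are hypothetical names, new nodes are added at the head of the list, and Math.floorMod is used for the hash index.

```java
// Illustrative sketch of chained hashing with hand-built linked lists.
class ChainedTable {
    // One list node, mirroring the figure: element, key, link.
    private static class Node {
        Object element;
        final Object key;
        Node link;
        Node(Object key, Object element, Node link) {
            this.key = key;
            this.element = element;
            this.link = link;
        }
    }

    private final Node[] table;   // table[i] is the head of the list for hash index i

    ChainedTable(int size) {
        table = new Node[size];
    }

    private int hash(Object key) {
        return Math.floorMod(key.hashCode(), table.length);
    }

    // Stores element under key; returns the element it replaced, or null.
    public Object put(Object key, Object element) {
        int i = hash(key);
        for (Node n = table[i]; n != null; n = n.link) {
            if (n.key.equals(key)) {
                Object old = n.element;
                n.element = element;
                return old;
            }
        }
        table[i] = new Node(key, element, table[i]);   // new head of the chain
        return null;
    }

    // Returns the element stored under key, or null if there is none.
    public Object get(Object key) {
        for (Node n = table[hash(key)]; n != null; n = n.link) {
            if (n.key.equals(key)) return n.element;
        }
        return null;
    }

    public static void main(String[] args) {
        ChainedTable t = new ChainedTable(3);   // only 3 cells
        for (int k = 0; k < 10; k++) t.put(k, "v" + k);   // 10 elements still fit
        System.out.println(t.get(7));   // prints v7
    }
}
```

The table never fills up, but each chain grows with the number of elements, so lookups slow down unless the array is eventually enlarged.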
Java HashMap Lecture 9: Searching
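The slide gives only a title, so the following is merely a guess at the kind of usage it introduces: a minimal example of the standard java.util.HashMap class. The word-counting task and the HashMapDemo name are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative use of the standard library's hash table.
class HashMapDemo {
    // Counts how many times each word occurs.
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);   // insert 1, or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(new String[]{"a", "b", "a"});
        System.out.println(counts.get("a"));           // prints 2
        System.out.println(counts.containsKey("c"));   // prints false
    }
}
```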