Searching Algorithms Briana B. Morrison With thanks to Dr. Hung
Topics • The searching problem • Using Brute Force • Lower Bounds • Interpolation Search • Searching in Trees • Hashing • Finding the k-th largest key
The searching problem • The problem is to retrieve an entire record based on the value of some key: find an index i such that x = S[i] if x equals one of the keys; if x does not equal any of the keys, report failure
Brute Force Brute force is a straightforward approach usually based directly on the problem statement and definitions. Examples: • Computing a^n (a > 0, n a nonnegative integer) • Computing n! • Multiplying two n-by-n matrices • Selection sort • Sequential search
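The first two examples can be sketched directly from their definitions. A minimal sketch in Java (the class and method names are mine, not from the slides):

```java
// Brute-force computations that follow the definitions literally.
public class BruteForce {
    // a^n by repeated multiplication: exactly n multiplications.
    public static long power(long a, int n) {
        long result = 1;
        for (int i = 0; i < n; i++)
            result *= a;
        return result;
    }

    // n! by repeated multiplication: n - 1 multiplications.
    public static long factorial(int n) {
        long result = 1;
        for (int i = 2; i <= n; i++)
            result *= i;
        return result;
    }
}
```

Both loops do work proportional to n, which is exactly what "based on the definition" buys: simple and correct, but not necessarily fast (a^n can be done in O(log n) multiplications).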
String matching • pattern: a string of m characters to search for • text: a (long) string of n characters to search in • Brute force algorithm: • Align pattern at beginning of text • moving from left to right, compare each character of pattern to the corresponding character in text until • all characters are found to match (successful search); or • a mismatch is detected • while pattern is not found and the text is not yet exhausted, realign pattern one position to the right and repeat step 2.
Brute force string matching – Examples: • Pattern: 001011 Text: 10010101101001100101111010 • Pattern: happy Text: It is never too late to have a happy childhood. Number of comparisons: at most m(n – m + 1) in the worst case Efficiency: O(nm)
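The two-step algorithm above (align, compare left to right, shift right on mismatch) can be sketched as follows; class and method names are illustrative:

```java
// Brute-force string matching: try every alignment of the pattern.
public class BruteMatch {
    // Returns the index of the first occurrence of pattern in text, or -1.
    public static int match(String text, String pattern) {
        int n = text.length(), m = pattern.length();
        for (int i = 0; i <= n - m; i++) {        // each alignment of the pattern
            int j = 0;
            while (j < m && text.charAt(i + j) == pattern.charAt(j))
                j++;                               // compare character by character
            if (j == m) return i;                  // all m characters matched
        }
        return -1;                                 // text exhausted, no match
    }
}
```

On the slide's first example the pattern 001011 is found well inside the text; each failed alignment costs at most m comparisons, giving the m(n – m + 1) worst case.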
Brute force strengths and weaknesses • Strengths: • wide applicability • simplicity • yields reasonable algorithms for some important problems • searching • string matching • matrix multiplication • yields standard algorithms for simple computational tasks • sum/product of n numbers • finding max/min in a list
Brute force strengths and weaknesses … • Weaknesses: • rarely yields efficient algorithms • some brute force algorithms unacceptably slow • not as constructive/creative as some other design techniques
Exhaustive search • A brute force solution to a problem involving search for an element with a special property, usually among combinatorial objects such as permutations, combinations, or subsets of a set.
Exhaustive search … • Method: • construct a way of listing all potential solutions to the problem in a systematic manner • all solutions are eventually listed • no solution is repeated • Evaluate solutions one by one, perhaps disqualifying infeasible ones and keeping track of the best one found so far • When search ends, announce the winner
Final comments: • Exhaustive search algorithms run in a realistic amount of time only on very small instances • In many cases there are much better alternatives! • In some cases exhaustive search (or variation) is the only known solution
Sequential (Linear) Searching • Sequential search – starts at the beginning and examines each element in turn. • If we know the array is sorted and we know the search value, we can start the search at the most efficient end. • If the array is sorted, we can stop the search when the condition is no longer valid (that is, the elements are either smaller or larger than the search value).
Sequential Search Algorithm SequentialSearch (A[0 … n], k) //The algorithm implements sequential search with the search key appended as a sentinel //Input: An array A[0 … n - 1] of elements and a search key k (cell A[n] is reserved for the sentinel) //Output: the position of the first element in A[0 … n - 1] whose value is equal to k, or -1 if no such element is found A[n] ← k i ← 0 while A[i] ≠ k do i ← i + 1 if i < n return i else return -1
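The sentinel trick removes the bounds check from the inner loop: planting k at A[n] guarantees the scan stops. A minimal Java sketch (names are mine):

```java
// Sequential search with the search key planted as a sentinel,
// so the loop needs no explicit i < n test.
public class SequentialSearch {
    // A must have one spare slot at index n; returns first index of k, or -1.
    public static int search(int[] A, int n, int k) {
        A[n] = k;                  // plant the sentinel
        int i = 0;
        while (A[i] != k)          // guaranteed to stop at or before index n
            i++;
        return (i < n) ? i : -1;   // i == n means only the sentinel matched
    }
}
```

The caller allocates n + 1 cells but stores only n keys; the last cell is scratch space for the sentinel.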
Sequential Search • Sequential search: examine each piece of data until the correct one is found. • 1. Worst case: O(n). • 2. Best case: O(1). • 3. Average case: about n/2 comparisons, still O(n). • Thus, we say sequential search is O(n).
Binary Search • Search requires the following steps: 1. Inspect the middle item of an array of size N. 2. Inspect the middle item of an array of size N/2. 3. Inspect the middle item of an array of size N/2^2, and so on, until N/2^k = 1. • This implies k = log₂N • k is the number of halvings (partitions)
Binary Search • Requires that the array be sorted • Rather than starting at either end, binary search splits the array in half and works only with the half that may contain the value • This halving continues until the desired value is found or the remaining values are all either smaller or larger than the search value.
Binary Search • BinarySearch (list, target, N) • //list: the elements to be searched • //target: the value being searched for • //N: the number of elements in the list • start = 1 • end = N • While start <= end do • middle = (start + end) / 2 • Select (compare (list[middle], target)) from • Case -1: start = middle + 1 • Case 0: return middle • Case 1: end = middle – 1 • End select • End while • Return 0 //indices are 1-based, so 0 signals "not found"
Binary Search • Suppose that the data set is n = 2^k – 1 sorted items. • Each time, examine the middle item. If it is larger than the target, search the left half; if smaller, search the right half. • The second ‘chunk’ to consider is n/2 in size, the third is n/4, the fourth is n/8, etc. • Worst case: examine k chunks. n = 2^k – 1, so k = log₂(n + 1). (Decision binary tree: one comparison on each level) • Best case: O(1) (found at n/2). • Thus the algorithm is O(log n). • Extension to non-power-of-2 data sets is easy.
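The same algorithm in runnable form, using the more common 0-based indexing and -1 for "not found" (names are illustrative):

```java
// Iterative binary search on a sorted (ascending) 0-based array.
public class BinarySearch {
    // Returns an index of target in A, or -1 if target is absent.
    public static int search(int[] A, int target) {
        int low = 0, high = A.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;  // avoids overflow of (low + high) / 2
            if (A[mid] == target)
                return mid;
            else if (A[mid] < target)
                low = mid + 1;                 // discard the left half
            else
                high = mid - 1;                // discard the right half
        }
        return -1;
    }
}
```

Each iteration discards half of the remaining range, giving the log₂(n + 1) worst case derived above.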
Binary Search • Best case: O(1) • Worst case: O(log₂N) (why?) • Average case: O(log₂N)/2 = O(log₂N) (why?)
Lower Bounds on Searching • Consider only comparisons of keys • Associate a decision tree with every deterministic algorithm that searches for a key x in an array of n keys. Each leaf represents a point at which the algorithm stops
Lower Bounds • Worst case number of comparisons is the number of nodes in the longest path from the root to a leaf in the binary tree. • This number is d + 1, where d is the depth of the tree • Establish a lower bound on the depth of the binary tree
Lower Bounds… A binary tree has one root, at most 2 nodes at depth 1, at most 2^2 at depth 2, …, at most 2^d nodes at depth d, so n ≤ 1 + 2 + 2^2 + 2^3 + … + 2^d = 2^(d+1) – 1. Hence n < 2^(d+1), so lg n < d + 1, and the worst-case number of comparisons, d + 1, is at least ⌈lg(n + 1)⌉ — exactly the bound binary search achieves.
Average Case • Binary Search’s average-case performance is not much better than its worst case.
Interpolation Search • Reasonable to assume that the keys are close to being evenly distributed between the first one and the last one (and sorted). • Instead of checking the middle, check where we would expect to find x: mid = low + ⌊(x – S[low])(high – low) / (S[high] – S[low])⌋
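A sketch of that probe rule in Java (names are mine; the bounds checks guard against x lying outside the remaining range and against division by zero when all remaining keys are equal):

```java
// Interpolation search: probe where x "should" be if the keys
// were spread evenly between A[low] and A[high].
public class InterpolationSearch {
    public static int search(int[] A, int x) {
        int low = 0, high = A.length - 1;
        while (low <= high && x >= A[low] && x <= A[high]) {
            if (A[low] == A[high])               // avoid division by zero
                return (A[low] == x) ? low : -1;
            // expected position under the even-distribution assumption
            int mid = low + (int) ((long) (x - A[low]) * (high - low)
                                   / (A[high] - A[low]));
            if (A[mid] == x)
                return mid;
            else if (A[mid] < x)
                low = mid + 1;
            else
                high = mid - 1;
        }
        return -1;
    }
}
```

For evenly distributed keys the expected number of probes is O(log log n); for adversarial distributions it degrades to O(n).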
Searching in Trees • Binary Search Tree • Best Case ? • Worst Case ? • Average Case? • B-Trees • Best Case ? • Worst Case ? • Average Case? • Advantage? Disadvantage? A(n) ≈ 1.38 lg n
Balanced trees: AVL trees • For every node, the difference in height between the left and right subtrees is at most 1. • The AVL property is maintained through rotations, each time the tree becomes unbalanced. • lg n ≤ h ≤ 1.4404 lg(n + 2) – 1.3277 average: 1.01 lg n + 0.1 for large n
Balanced trees: AVL trees • Disadvantage: needs extra storage for maintaining node balance. • A similar idea: red-black trees (height of subtrees is allowed to differ by up to a factor of 2).
AVL tree rotations • Small examples: • 1, 2, 3 • 3, 2, 1 • 1, 3, 2 • 3, 1, 2 • Larger example: 4, 5, 7, 2, 1, 3, 6 • See Figures 6.4 and 6.5 for general cases of rotations.
Balance factor • The algorithm maintains a balance factor for each node: the height of the node’s left subtree minus the height of its right subtree. For example:
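A small sketch of the balance-factor computation and the AVL check it supports (the Node class and method names are mine, not from the slides):

```java
// Computing heights and balance factors for a small binary tree.
public class Avl {
    static class Node {
        int key; Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key; this.left = left; this.right = right;
        }
    }

    // Height counted in edges; an empty tree has height -1.
    static int height(Node t) {
        return (t == null) ? -1 : 1 + Math.max(height(t.left), height(t.right));
    }

    // Balance factor = height of left subtree minus height of right subtree.
    static int balanceFactor(Node t) {
        return height(t.left) - height(t.right);
    }

    // The AVL property: |balance factor| <= 1 at every node.
    static boolean isAvl(Node t) {
        return t == null
            || (Math.abs(balanceFactor(t)) <= 1 && isAvl(t.left) && isAvl(t.right));
    }
}
```

A right-leaning chain 1 → 2 → 3 has balance factor -2 at its root, which is exactly the situation a rotation repairs. (A real AVL tree stores the factor in each node and updates it incrementally rather than recomputing heights.)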
Hashing … • Hashing • Hash Table • Hash Function • Hash Address • Collisions • Open Hashing (Separate Chaining) • Closed Hashing (Open Addressing) (example: Linear Probing – checks the cell following the one where the collision occurs) – implies that the table size m must be at least as large as the number of keys n.
Hashing Letter codes: A: 1, B: 2, C: 3, D: 4, …, Z: 26 Hash function: key mod 13
Hashing • If the hash function distributes n keys among m cells of the hash table evenly, each list will be about n/m keys long. Load factor: α = n/m • Efficiency of hashing (Open Hashing): successful search ≈ 1 + α/2 comparisons on average, unsuccessful ≈ α • Efficiency of hashing (Closed Hashing, linear probing): successful ≈ ½(1 + 1/(1 – α)), unsuccessful ≈ ½(1 + 1/(1 – α)²)
Hashing • Exercise: For the input 30, 20, 56, 75, 31, 19 and hash function h(K) = K mod 11 • (a) Construct the open hash table. • (b) Find the largest number of key comparisons in a successful search in this table. • (c) Find the average number of key comparisons in a successful search in this table.
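One way to work exercises like this is a small separate-chaining sketch using the slide's hash function h(K) = K mod 11 (class and method names are mine):

```java
import java.util.ArrayList;
import java.util.List;

// Open hashing (separate chaining) with h(K) = K mod 11.
public class OpenHash {
    static final int M = 11;

    @SuppressWarnings("unchecked")
    static List<Integer>[] build(int[] keys) {
        List<Integer>[] table = new List[M];
        for (int i = 0; i < M; i++)
            table[i] = new ArrayList<>();
        for (int k : keys)
            table[k % M].add(k);    // append k to the chain at cell h(k)
        return table;
    }

    // Key comparisons in a successful search for k: position in its chain.
    static int comparisons(List<Integer>[] table, int k) {
        List<Integer> chain = table[k % M];
        for (int i = 0; i < chain.size(); i++)
            if (chain.get(i) == k) return i + 1;
        return -1;                  // k is not in the table
    }
}
```

Building the table for the slide's input and calling comparisons on each key gives parts (b) and (c) directly: the largest count is the length of the longest chain walk, and the average is the mean over all six keys.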
Hashing • Exercise: For the input 30, 20, 56, 75, 31, 19 and hash function h(K) = K mod 11 • (a) Construct the closed hash table. • (b) Find the largest number of key comparisons in a successful search in this table. • (c) Find the average number of key comparisons in a successful search in this table.
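The closed-table variant differs only in collision handling: on a collision, linear probing checks the following cell (wrapping around) until a free one is found. A sketch with the same hash function (names are mine):

```java
// Closed hashing (open addressing with linear probing), h(K) = K mod 11.
public class ClosedHash {
    static final int M = 11;
    static final int EMPTY = -1;

    static int[] build(int[] keys) {
        int[] table = new int[M];
        java.util.Arrays.fill(table, EMPTY);
        for (int k : keys) {
            int i = k % M;
            while (table[i] != EMPTY)   // probe the next cell, wrapping around
                i = (i + 1) % M;
            table[i] = k;
        }
        return table;
    }

    // Key comparisons in a successful search for k: probes until k is found.
    static int comparisons(int[] table, int k) {
        int i = k % M, count = 0;
        while (table[i] != EMPTY) {
            count++;
            if (table[i] == k) return count;
            i = (i + 1) % M;
        }
        return -1;                      // hit an empty cell: k is absent
    }
}
```

Note that keys inserted late can end up far from their home cell (clustering), so the search-cost answers for the closed table come out worse than for the open table on the same input.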
Finding Largest Key public static int findLargest (int n, int[ ] S) { // keys are stored in S[1 … n] int large = S[1]; for (int i = 2; i <= n; i++) if (S[i] > large) large = S[i]; return large; } T(n) = n – 1 comparisons
Finding Both Smallest & Largest Keys public static void findBoth (int n, int[ ] S, Bothrec both) { // Bothrec holds the fields small and large both.small = S[1]; both.large = S[1]; for (int i = 2; i <= n; i++) if (S[i] < both.small) both.small = S[i]; else if (S[i] > both.large) both.large = S[i]; } Better performance than finding each separately. Why? (The else skips the second comparison whenever the first one succeeds.) Worst case? W(n) = 2(n – 1)
Intro to Adversary Arguments • The adversary’s goal is to make an algorithm work as hard as possible: it makes each decision so as to keep the algorithm going as long as possible, in effect selecting the worst possible input. • If the adversary can force the algorithm to execute the basic instruction f(n) times, then f(n) is a lower bound on the worst-case complexity
Finding 2nd-Largest Key • Find largest key, eliminate, then find 2nd largest • Sort and return 2nd from end • Performance?
Finding kth-Smallest Key • Assume keys are distinct • Sort the keys and return the kth key; or • Use quicksort’s partition step repeatedly, recursing only into the side that contains the kth slot (quickselect) W(n) = n(n – 1)/2 A(n) ≈ 3n
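The partition-based approach can be sketched as follows; this uses the Lomuto partition scheme for brevity, and the names are mine:

```java
// Quickselect: partition as in quicksort, then recurse only into
// the side that contains the kth slot. Assumes distinct keys.
public class QuickSelect {
    // k is 1-based: k = 1 asks for the smallest key. Modifies a in place.
    public static int kthSmallest(int[] a, int k) {
        return select(a, 0, a.length - 1, k - 1);
    }

    private static int select(int[] a, int low, int high, int k) {
        int p = partition(a, low, high);
        if (p == k) return a[p];                    // pivot landed in slot k
        else if (p > k) return select(a, low, p - 1, k);
        else return select(a, p + 1, high, k);
    }

    // Lomuto partition around a[high]; returns the pivot's final index.
    private static int partition(int[] a, int low, int high) {
        int pivot = a[high], i = low;
        for (int j = low; j < high; j++)
            if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
        int t = a[i]; a[i] = a[high]; a[high] = t;
        return i;
    }
}
```

Unlike quicksort, only one side of each partition is pursued, which is why the average cost drops from n log n to roughly 3n even though the worst case stays quadratic.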