Computing Science 1P. Lecture 21: Wednesday 18th April. Simon Gay, Department of Computing Science, University of Glasgow. 2006/07. What's coming up? Wed 18th April (today): lecture. Fri 20th April: lecture. Mon 23rd April, 12.00-14.00: FPP demo session; PRIZES!
Computing Science 1P Lecture 21: Wednesday 18th April Simon Gay Department of Computing Science University of Glasgow 2006/07
What's coming up?
Wed 18th April (today): lecture
Fri 20th April: lecture
Mon 23rd April, 12.00-14.00: FPP demo session; PRIZES!
Mon 23rd – Wed 25th April: labs: exam preparation / lab exam preparation
Wed 25th April: lecture / tutorial
Fri 27th April: lecture: revision / questions with Peter Saffrey
Mon 30th April – Wed 2nd May: Lab Exam
Wed 2nd May: No lecture
Fri 4th May: No lecture
THE END
Computing Science 1P Lecture 21 - Simon Gay
Lab Exam 2
As you know, there will be a second lab exam, during the week beginning 30th April. It is worth 10% of the module mark. The question has been available since Monday afternoon.

In the first lab exam, although there were many very good submissions, a worrying number of submissions showed very little evidence of advance preparation.

You can prepare for the lab exam in any way you want to, including asking a friend to explain the solution to you. On the day, of course, you are on your own.

Trying to solve the problem from scratch in the exam is just making life difficult for yourself.
Algorithms
We have looked at algorithms for sorting; we saw that choosing a better algorithm can have a dramatic effect on the efficiency of a program.

Now let's consider searching, another basic computing task.

The general problem: find a desired item of data in a collection.

Example: in a collection of (name,address) pairs, find a particular person's address.
Searching in unstructured data
Imagine that we have a list of (key,value) pairs and we do not know anything about the order. We can easily define:

    def find(key, data):
        # Look at each (key, value) pair in turn
        for i in data:
            if i[0] == key:
                return i[1]
        # The key is not in the list
        raise KeyError(key)
Searching in unstructured data
What can we say about the time taken by find? Just as for sorting, the relevant measure is the number of comparisons.

Clearly it is possible that the key we are looking for is at the end of the list. In that case we have to compare the given key with every key in the list: n comparisons, where n is the length of the list.

If we imagine testing find repeatedly with a large number of random lists, on average it will have to search half way along the list: about n/2 comparisons.

When analysing algorithms, sometimes we talk about the average case and sometimes the worst case. In this situation they are both of the same order: order n, because both grow in proportion to n.
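To make the comparison counts concrete, here is a minimal sketch that instruments a linear search. The helper name find_counting and the sample data are invented for illustration and are not part of the lecture.

```python
def find_counting(key, data):
    """Linear search over (key, value) pairs; also reports comparisons made."""
    comparisons = 0
    for k, v in data:
        comparisons += 1
        if k == key:
            return v, comparisons
    raise KeyError(key)

# Hypothetical data: 100 pairs with keys 0..99.
data = [(i, "value%d" % i) for i in range(100)]

# Worst case: the key is at the very end of the list.
worst = find_counting(99, data)[1]

# Average over searching once for every key in the list.
average = sum(find_counting(k, data)[1] for k in range(100)) / 100.0

print(worst, average)
```

With 100 items the worst case needs 100 comparisons and the average is about half of that, which is why both are described as order n.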
Searching in unstructured data
It's obvious that we can't do better than order n for searching in an unstructured list, because we can't avoid the possibility that the desired key is at the end.

Remarkably, there is an algorithm for quantum computers which takes only of the order of the square root of n operations to search in an unstructured list. However, quantum computers of a useful size have not yet been built. To find out more, look up Grover's algorithm.

But let's stick to conventional algorithms…
More efficient search
If we can't improve the algorithm for search in an unstructured list, the only alternative is to change the data structure: don't use an unstructured list!

The first idea is quite simple: use an ordered list instead. In other words, put the data in the list in such a way that the keys are in order. Often this means alphabetical order or numerical order, but other more complex orders could be defined.

Example: in a dictionary, the words are in alphabetical order, and we can take advantage of this to find words quickly.
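As a small illustration of putting the data in key order, the sketch below sorts a list of pairs by key using Python's built-in sorted. The names and addresses are made up for the example.

```python
# Hypothetical (name, address) pairs in no particular order.
people = [("kestrel", "3 Hill Rd"), ("badger", "9 Sett Lane"), ("cat", "1 High St")]

# Sort by the key, i.e. the first element of each pair.
ordered = sorted(people, key=lambda pair: pair[0])

keys = [name for name, address in ordered]
print(keys)
```

A different key function passed to sorted would give one of the "more complex orders" mentioned above.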
Binary search
Search for the key: cat

     0  android
     1  badger
     2  cat
     3  door
     4  ending
     5  fireman
     6  garage
     7  handle
     8  iguana
     9  jumper
    10  kestrel
    11  lemon

It could be anywhere in the list.

The list has length 12. Divide it by 2 and look at position 6.
cat < garage
Because the list is ordered, we now know that cat must be before garage, i.e. it is in the first half of the list.

Now repeat, searching in a list of length 6. Divide by 2 and look at position 3.
cat < door
We now know that cat must be before door.

Now repeat, searching in a list of length 3. Divide by 2 and look at position 1.
cat > badger
We now know that cat must be after badger.

We have narrowed down the possible position of cat to just one place. And in fact cat is there, so we have found it.

If a different word is there, then cat is not in the list.
Binary search
The idea of binary search is very simple, but implementing it correctly requires care: there are many possibilities for "off by one" errors.

Searching in a dictionary is often used as an example of binary search, but we don't really use dictionaries in exactly this way. Usually we flick through the pages quickly to find the right letter, then do something similar to binary search.

A typical dictionary has extra structure to support this process (e.g. words in the page headers; thumbholes for indexing).
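Because off-by-one errors are easy to make, it is worth knowing that Python's standard library already contains a binary search in the bisect module. The sketch below is not the lecture's implementation; the wrapper name find_index is invented here, and the word list is the one from the example slides.

```python
import bisect

words = ["android", "badger", "cat", "door", "ending", "fireman",
         "garage", "handle", "iguana", "jumper", "kestrel", "lemon"]

def find_index(key, data):
    """Return the index of key in the sorted list data, or raise KeyError."""
    # bisect_left returns the leftmost position at which key could be
    # inserted while keeping data sorted.
    i = bisect.bisect_left(data, key)
    if i < len(data) and data[i] == key:
        return i
    raise KeyError(key)

print(find_index("cat", words))
```

The final equality test is still needed: bisect_left only says where the key would go, not whether it is actually present.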
Another example
Search for the key: handle
[Animation over the same 12-word list, positions 0 to 11: the search region is divided in two at each step.]
Caption on a later frame: We could stop here, but if we follow the algorithm strictly, we continue dividing the region in two.
Caption on the final frame: Now we can certainly stop.
Analysing binary search
Remember that we are interested in the number of comparisons.

Suppose that we are searching in a list of length n. We compare the middle item with the search key. The result might tell us we have found the key, but in general it narrows down the region of the list in which we are searching.

The possible region of the list is now half the size it was.
Analysing binary search
We keep halving the size of the region, until we narrow it down to a single position in which the key should be found.

How many times do we have to halve the size?

    n = 16:  8, 4, 2, 1           4 comparisons
    n = 64:  32, 16, 8, 4, 2, 1   6 comparisons

It is the logarithm of n to base 2, i.e. the power to which 2 must be raised to give n.

Binary search is an order log n algorithm.
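The halving argument can be checked with a few lines of code. This is a minimal sketch; the function name halvings is made up for the illustration.

```python
def halvings(n):
    """Count how many times n can be halved (integer division) before reaching 1."""
    count = 0
    while n > 1:
        n //= 2
        count += 1
    return count

for n in (16, 64, 1024):
    print(n, halvings(n))
```

For a power of 2 the count is exactly the base-2 logarithm; for other values of n, repeated integer halving gives the logarithm rounded down.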
Analysing binary search
We can compare the efficiency of an order n algorithm with that of an order log n algorithm.
[Comparison table from the slide not recovered.]
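A comparison of this kind can be computed directly. The list sizes below are chosen for illustration and the figures are calculated here, not taken from the lecture. The sketch assumes Python 3.3 or later for math.log2.

```python
import math

sizes = [10, 1000, 1000000]

# For each size: comparisons of order n, and of order log n (rounded up).
rows = [(n, int(math.ceil(math.log2(n)))) for n in sizes]

print("%9s %9s" % ("n", "log2 n"))
for n, steps in rows:
    print("%9d %9d" % (n, steps))
```

The first column is roughly the worst-case work for a linear search, the second roughly the worst-case work for a binary search on a list of the same length.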
Implementing binary search

    def find(key, data):
        # data must be a list sorted in ascending order
        if not data:
            raise KeyError(key)  # an empty list cannot contain the key
        lower = 0
        upper = len(data) - 1
        length = upper - lower + 1
        while length > 1:
            midpoint = lower + length // 2  # integer division
            if key < data[midpoint]:
                upper = midpoint - 1
            else:
                lower = midpoint
            length = upper - lower + 1
        # The region is now a single position
        if key == data[lower]:
            return lower
        else:
            raise KeyError(key)
Implementing binary search
[Diagram: the find function shown beside the 12-word list, positions 0 to 11, with markers for lower, midpoint and upper.]
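To see how lower, midpoint and upper move, the following sketch runs the same algorithm on the example word list and records the three values at each step. The function name find_traced and the steps list are additions for illustration; the search logic follows the find function above, written with // for integer division.

```python
def find_traced(key, data):
    """Binary search that also records (lower, midpoint, upper) at each step.

    Assumes data is a non-empty list sorted in ascending order.
    """
    lower = 0
    upper = len(data) - 1
    length = upper - lower + 1
    steps = []
    while length > 1:
        midpoint = lower + length // 2
        steps.append((lower, midpoint, upper))
        if key < data[midpoint]:
            upper = midpoint - 1
        else:
            lower = midpoint
        length = upper - lower + 1
    if key == data[lower]:
        return lower, steps
    raise KeyError(key)

words = ["android", "badger", "cat", "door", "ending", "fireman",
         "garage", "handle", "iguana", "jumper", "kestrel", "lemon"]

index, steps = find_traced("handle", words)
for lower, midpoint, upper in steps:
    print(lower, midpoint, upper, words[midpoint])
print("found at", index)
```

Searching for handle takes four passes through the loop. On the third pass the midpoint already holds handle, but this version of the algorithm keeps dividing until the region has length 1.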
Termination
When writing programs with loops, we have to be sure that they terminate, i.e. eventually stop. In almost all of our previous programs, it has been obvious that loops terminate.

In a for loop, the number of iterations is known before we start, e.g.

    for x in range(10):

In a while loop, the condition can be anything, but we have always used a simple structure:

    i = 0
    while i < 10:
        # code inside the loop, not changing i
        i = i + 1
Termination
Binary search uses a while loop with a more complex structure:

    length = upper - lower + 1
    while length > 1:
        midpoint = lower + length // 2  # integer division
        if key < data[midpoint]:
            upper = midpoint - 1
        else:
            lower = midpoint
        length = upper - lower + 1

Let's prove that eventually length <= 1, so that the loop terminates.
Proving termination
Consider one iteration of the loop. At the beginning we have values lower, upper and length. At the end of the body of the loop we have new values lower', upper' and length'.

We will prove that length' < length.

Therefore as we go round the loop repeatedly, length gets smaller and smaller. It is always an integer value, so eventually it must reach 1 or less.
Proving termination
At the top of the loop we have values lower, upper. We have

    length = upper - lower + 1

Assume that length > 1, so that we go into the loop.

At the bottom of the loop we have new values lower', upper' and we have a new value

    length' = upper' - lower' + 1

We also have

    midpoint = lower + length // 2

where // is integer division, rounding down.

Now we consider the possible ways of calculating lower', upper'.
Proving termination
Case 1: we take the first branch of the if statement.

    upper' = midpoint - 1
    lower' = lower

    length' = upper' - lower' + 1
            = midpoint - 1 - lower + 1
            = midpoint - lower
            = lower + length // 2 - lower
            = length // 2
            < length        because length > 1

Here // denotes integer division, rounding down.
Proving termination
Case 2: we take the second branch of the if statement.

    upper' = upper
    lower' = midpoint

    length' = upper' - lower' + 1
            = upper - midpoint + 1
            = upper - (lower + length // 2) + 1
            = upper - lower - length // 2 + 1
            = length - length // 2
            < length        because length > 1, so length // 2 >= 1

Here // denotes integer division, rounding down.
Proving termination
We have proved that whichever path we take through the body of the while loop, length decreases. Therefore the loop must terminate.

With further calculation of a similar kind (exercise!) we can prove that, provided the list is not empty, when the loop terminates, length = 1 (not < 1), meaning that we really have identified one location in the list where the key should be found if it is present at all.
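The claims in the proof can also be tested by brute force. The sketch below is a check, not a replacement for the proof: it runs the loop on regions of many sizes and asserts that length strictly decreases, never drops below 1, and ends at exactly 1. The function check_loop and its go_left parameter are invented for this illustration; go_left stands in for the comparison between the key and data[midpoint].

```python
def check_loop(n, go_left):
    """Run the binary search loop on a region of size n, asserting the proof's claims.

    go_left(step) decides which branch of the if statement is taken.
    Returns the number of iterations performed.
    """
    lower, upper = 0, n - 1
    length = upper - lower + 1
    iterations = 0
    while length > 1:
        midpoint = lower + length // 2
        if go_left(iterations):
            upper = midpoint - 1      # first branch (Case 1)
        else:
            lower = midpoint          # second branch (Case 2)
        new_length = upper - lower + 1
        assert 1 <= new_length < length   # strictly smaller, never below 1
        length = new_length
        iterations += 1
    assert length == 1
    return iterations

for n in range(1, 200):
    check_loop(n, lambda step: True)            # always the first branch
    check_loop(n, lambda step: False)           # always the second branch
    check_loop(n, lambda step: step % 2 == 0)   # alternate between branches

print(check_loop(12, lambda step: True), check_loop(12, lambda step: False))
```

If any assertion failed, the script would stop with an AssertionError; running to completion is evidence, though not proof, that the argument above is right.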
Refining binary search
It might turn out that when we look at the midpoint of the list, the key we want happens to be there. We might as well take advantage of that case:

    while length > 1:
        midpoint = lower + length // 2  # integer division
        if key < data[midpoint]:
            upper = midpoint - 1
        elif key > data[midpoint]:
            lower = midpoint
        else:
            return midpoint
        length = upper - lower + 1

But notice that we have introduced an extra comparison. What can we do about this?
The function cmp
The problem is that when we compare two values, there are three possible results: equal, first one smaller, second one smaller.

The comparisons < <= > >= == return a boolean result, so they only tell us one of two possible results.

To solve this problem, Python 2 provides the built-in function cmp (it was removed in Python 3).

cmp(a,b) returns
    0 if a == b
    a negative value (typically -1) if a < b
    a positive value (typically 1) if a > b
Using cmp

    while length > 1:
        midpoint = lower + length // 2  # integer division
        r = cmp(key, data[midpoint])    # one three-way comparison
        if r < 0:
            upper = midpoint - 1
        elif r > 0:
            lower = midpoint
        else:
            return midpoint
        length = upper - lower + 1
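Since cmp is not available in Python 3, the sketch below defines a stand-in with the same contract so that the refined search can be run on current versions of Python. The helper is an assumption of this example, not part of the lecture, and the complete function wraps the loop above with the same setup and final test as the earlier find.

```python
def cmp(a, b):
    """Three-way comparison: -1, 0 or 1, in the style of Python 2's cmp."""
    return (a > b) - (a < b)

def find(key, data):
    """Binary search with an early exit; assumes data is sorted and non-empty."""
    lower = 0
    upper = len(data) - 1
    length = upper - lower + 1
    while length > 1:
        midpoint = lower + length // 2
        r = cmp(key, data[midpoint])
        if r < 0:
            upper = midpoint - 1
        elif r > 0:
            lower = midpoint
        else:
            return midpoint          # found at the midpoint: stop early
        length = upper - lower + 1
    if key == data[lower]:
        return lower
    raise KeyError(key)

words = ["android", "badger", "cat", "door", "ending", "fireman",
         "garage", "handle", "iguana", "jumper", "kestrel", "lemon"]

print(find("handle", words))
```

Note that this stand-in performs two ordinary comparisons internally, so in Python 3 it tidies the code rather than saving work. The saving described in the lecture relies on a built-in three-way comparison.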