LING 408/508: Computational Techniques for Linguists. Lecture 9, 9/10/2012.
Outline • Long HW #1 answers
#1c • If I leave my house at 6:52 am and run 1 mile at an easy pace (8:15 per mile), then 3 miles at tempo (7:12 per mile) and 1 mile at easy pace again, what time do I get home for breakfast?
#1c
• Need to convert to a common unit
    6:52 a.m. = 6 hours, 52 minutes, 0 seconds
    8:15 = 0 hours, 8 minutes, 15 seconds
    7:12 = 0 hours, 7 minutes, 12 seconds
• Choose seconds (the smallest unit, so every time is a whole number of seconds)
• Since there are three times to convert, write a function

    def convert_to_seconds(h, m, s):
        return 60*60*h + 60*m + s
#1c

    starttime = convert_to_seconds(6, 52, 0)
    easy = convert_to_seconds(0, 8, 15)
    tempo = convert_to_seconds(0, 7, 12)
    # 1 easy mile, then 3 tempo miles, then 1 easy mile
    endtime = starttime + 1*easy + 3*tempo + 1*easy
#1c

    # now convert seconds back into h, m, s
    # (endtime % 3600) is the number of seconds
    # remaining after removing the seconds for whole hours
    h = endtime // 3600
    m = (endtime % 3600) // 60
    s = (endtime % 3600) % 60
    print(h, 'hours')      # 7 hours
    print(m, 'minutes')    # 30 minutes
    print(s, 'seconds')    # 6 seconds

• Answer: I get home at 7:30:06 a.m.
#1c: entire program

    def convert_to_seconds(h, m, s):
        return 60*60*h + 60*m + s

    starttime = convert_to_seconds(6, 52, 0)
    easy = convert_to_seconds(0, 8, 15)
    tempo = convert_to_seconds(0, 7, 12)
    endtime = starttime + 1*easy + 3*tempo + 1*easy

    # now convert seconds back into h, m, s
    # (endtime % 3600) is the number of seconds
    # remaining after removing the seconds for whole hours
    h = endtime // 3600
    m = (endtime % 3600) // 60
    s = (endtime % 3600) % 60
    print(h, 'hours')
    print(m, 'minutes')
    print(s, 'seconds')
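The conversion back to hours, minutes, and seconds can also be written with the built-in divmod, which returns a quotient and a remainder in one call. This is only a sketch of an alternative, not the solution from the slides; the function name convert_to_hms is made up for illustration, and 27006 is the value of endtime that the program above computes.

```python
def convert_to_hms(total_seconds):
    """Split a number of seconds into (hours, minutes, seconds)."""
    h, remainder = divmod(total_seconds, 3600)   # whole hours, leftover seconds
    m, s = divmod(remainder, 60)                 # whole minutes, leftover seconds
    return h, m, s

# 27006 is endtime: 6:52:00 plus two easy miles and three tempo miles
h, m, s = convert_to_hms(27006)
print('{0}:{1:02d}:{2:02d}'.format(h, m, s))     # 7:30:06
```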
#2 palindrome, while loop
• If a list is not a palindrome, at least one pair of mirrored elements (the same distance from each end) is not identical
• Palindrome examples: 1 2 3 3 2 1 (even length) and 1 2 3 4 3 2 1 (odd length)
Use positive and negative list indices
• Let the current iteration be i.
• Index from beginning: L[i]
• Index from end: L[-1-i]

    element:            1   2   3   3   2   1
    index from start:   0   1   2
    index from end:                -3  -2  -1
What is the range for i?
• Starts at 0
• Ends at len(L)//2, exclusive
• Example: L = 1 2 3 4 3 2 1, so len(L)//2 is 3

    i = 0: compare L[0] and L[-1-0]
    i = 1: compare L[1] and L[-1-1]
    i = 2: compare L[2] and L[-1-2]
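To see which elements are paired up by this range, the comparisons can be listed explicitly. This is a small illustrative sketch; the helper name mirror_pairs is hypothetical and is not part of the homework solution.

```python
def mirror_pairs(L):
    """Return the (left, right) pairs compared for i in range(len(L)//2)."""
    pairs = []
    for i in range(len(L) // 2):
        pairs.append((L[i], L[-1 - i]))   # i-th from the start, i-th from the end
    return pairs

# odd length: the middle element (4) is never compared with anything
print(mirror_pairs([1, 2, 3, 4, 3, 2, 1]))   # [(1, 1), (2, 2), (3, 3)]
# even length: every element belongs to exactly one pair
print(mirror_pairs([1, 2, 3, 3, 2, 1]))      # [(1, 1), (2, 2), (3, 3)]
```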
#2

    # iteratively compare elements on the left and right,
    # starting from the outside and going in
    #
    # len(L)//2 tells us how many elements on either side
    # of the list should be compared, and works
    # regardless of whether length of list is even or odd
    def is_palindrome(L):
        i = 0
        while i < len(L)//2:
            # positive index for left side,
            # negative index for right side
            if L[i] != L[-1-i]:
                return False
            i += 1
        return True    # if reach here, never returned False
Testing code

    # test code
    # palindrome, odd length
    print(is_palindrome([1,2,3,2,1]))      # True
    # palindrome, even length
    print(is_palindrome([1,2,3,3,2,1]))    # True
    # empty list
    print(is_palindrome([]))               # True
    # not a palindrome, odd length
    print(is_palindrome([1,2,3]))          # False
    # not a palindrome, even length
    print(is_palindrome([1,2,3,4]))        # False
#2 palindrome, without while loop
• Take the input list, reverse it, and test for equality
• Example: 1 2 3 4 5 6 reversed is 6 5 4 3 2 1, which is not equal to the original, so the list is not a palindrome
#2 palindrome, without while loop

    # doesn't use a while loop
    #
    # compare list to itself in reverse
    def is_palindrome2(L):
        return L == L[::-1]

    def is_palindrome3(L):
        L2 = L[:]       # make a copy (don't want
        L2.reverse()    # to modify input list)
        return L == L2
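This doesn’t work
• Without the copy, L2 is just another name for the same list, so L2.reverse() also reverses L and L==L2 is always True

    def is_palindrome3(L):
        L2 = L  #[:]    # no copy is made, so
        L2.reverse()    # this modifies the input list
        return L == L2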
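The difference between assigning a list and copying it can be checked directly. The snippet below is a minimal illustration with made-up variable names (a, b, c), not code from the homework.

```python
a = [1, 2, 3]
b = a            # b is a second name for the same list object
c = a[:]         # c is a new list with the same elements

b.reverse()      # reverses the one list that both a and b refer to

print(a)         # [3, 2, 1]  (a changed too)
print(c)         # [1, 2, 3]  (the copy is unaffected)
print(a is b)    # True
print(a is c)    # False
```

Since a and b are the same object, a == b can never be False, which is why the version without the copy reports every list as a palindrome.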
#3 Birthday problem
• Write a function that generates a list of 28 random numbers between 1 and 365, inclusive. To generate a random number, use the randint function in the module random.

    >>> import random
    >>> help(random.randint)
    Help on method randint in module random:

    randint(self, a, b) method of random.Random instance
        Return random integer in range [a, b], including both end points.
    import random

    def make_class(num_students):
        students = []
        i = 0
        while i < num_students:
            students.append(random.randint(1,365))
            i += 1
        # same, with a for loop
        #for i in range(num_students):
        #    students.append(random.randint(1,365))
        return students
Write a function that determines whether or not a list contains at least one repeated value, returning either True or False. There are multiple ways to do this; I’ll show you 3 solutions that involve loops
    # loop over all pairs of elements,
    # see if they are the same
    # if loops terminate, there are no repeats
    def has_repeat1(L):
        i = 0
        while i < len(L)-1:
            j = i + 1
            while j < len(L):
                if L[i]==L[j]:
                    return True
                j += 1
            i += 1
        return False
    # loop over all pairs of elements,
    # see if they are the same
    # if loops terminate, there are no repeats
    def has_repeat2(L):
        for i in range(0, len(L)-1):
            for j in range(i+1, len(L)):
                if L[i]==L[j]:
                    return True
        return False
Some people sorted the list first
• Sort a list, then compare adjacent positions to find a repeat
• Solution also requires the extra operations of first sorting a list

    def has_repeat3(L):
        # don't do L.sort() because don't want to modify
        # list being passed in to the function
        L2 = sorted(L)
        i = 0
        while i < len(L2)-1:
            if L2[i]==L2[i+1]:
                return True
            i += 1    # move to the next adjacent pair
        return False
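For comparison, one of the other ways to detect a repeat avoids explicit loops by using a set, which keeps only distinct values. This is a sketch of an alternative that is not among the three loop-based solutions; the name has_repeat_set is made up, and it assumes the list elements are hashable (true for the integer birthdays used here).

```python
def has_repeat_set(L):
    """Return True if L contains at least one repeated value."""
    # a set drops duplicates, so it is shorter than L exactly when L has a repeat
    return len(set(L)) < len(L)

print(has_repeat_set([12, 200, 45, 200]))   # True
print(has_repeat_set([12, 200, 45]))        # False
print(has_repeat_set([]))                   # False
```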
Create 1,000 random classes of students. Calculate the probability that at least two students in the class have the same birthday. What is your result?

    num_classes = 1000
    num_students = 28
    num_repeats = 0
    for i in range(num_classes):
        students = make_class(num_students)
        if has_repeat1(students):
            num_repeats += 1
    print(num_repeats / num_classes)    # answer: about 0.656 (65.6%)
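The simulated estimate can be checked against the exact probability, which comes from multiplying the chances that each successive student misses all earlier birthdays. The sketch below is an addition for comparison, not part of the homework; the function name prob_shared_birthday is hypothetical, and it assumes 365 equally likely birthdays, as the simulation does.

```python
def prob_shared_birthday(num_students, num_days=365):
    """Exact probability that at least two students share a birthday."""
    prob_all_different = 1.0
    for k in range(num_students):
        # student k must avoid the k birthdays already taken
        prob_all_different *= (num_days - k) / num_days
    return 1 - prob_all_different

# should land close to the simulated value, roughly 0.65 for 28 students
print('{0:.3f}'.format(prob_shared_birthday(28)))
```

With only 1,000 random classes, the simulated value typically differs from the exact one by a percentage point or so from run to run.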
Entire program

    import random

    def make_class(num_students):
        students = []
        for i in range(num_students):
            students.append(random.randint(1,365))
        return students

    # loop over all pairs of elements,
    # see if they are the same
    #
    # if loops terminate, there are no repeats
    def has_repeat1(L):
        i = 0
        while i < len(L)-1:
            j = i + 1
            while j < len(L):
                if L[i]==L[j]:
                    return True
                j += 1
            i += 1
        return False

    num_classes = 100000
    num_students = 28
    num_repeats = 0
    for i in range(num_classes):
        students = make_class(num_students)
        if has_repeat1(students):
            num_repeats += 1
    print(num_repeats / num_classes)
#4 prime numbers
1. Create a list of integers from 2 to N: [2, 3, 4, ..., N].
2. Let p equal 2, the first prime number.
3. Mark all multiples of p less than or equal to N (2*p, 3*p, etc.) as not prime.
4. The first number after p in the list that has not been marked is a prime number. Replace p with this number.
5. Repeat steps 3 and 4 until p^2 is greater than N.
6. All the remaining unmarked numbers in the list are prime.
For example, for N = 15:
• Initial list for N=15, first prime number is p = 2
    [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
• Mark 4, 6, 8, 10, 12, and 14, which are multiples of p = 2
    still unmarked: 2, 3, 5, 7, 9, 11, 13, 15
• Mark 6, 9, 12, and 15, which are multiples of p = 3
    still unmarked: 2, 3, 5, 7, 11, 13
• Mark 10 and 15, which are multiples of p = 5. Stop since p^2 = 25 is greater than N = 15.
• Primes less than or equal to 15 are 2, 3, 5, 7, 11, and 13.
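The marking steps can be traced in code. The following is a minimal sketch that follows steps 1 to 6 literally and records which multiples are marked for each p; the name sieve_trace and the list of booleans are choices made for this illustration, not one of the three solutions in the lecture. Because it stops as soon as p^2 is greater than N, it does not mark the multiples of 5 for N = 15 (10 and 15 are already marked at that point).

```python
def sieve_trace(N):
    """Return (primes, trace), where trace lists the multiples marked for each p."""
    marked = [False] * (N + 1)      # marked[k] becomes True once k is known non-prime
    trace = []
    p = 2
    while p * p <= N:               # step 5: stop once p^2 is greater than N
        multiples = list(range(2 * p, N + 1, p))
        for m in multiples:         # step 3: mark 2*p, 3*p, ...
            marked[m] = True
        trace.append((p, multiples))
        p += 1                      # step 4: advance to the next unmarked number
        while marked[p]:
            p += 1
    primes = [k for k in range(2, N + 1) if not marked[k]]
    return primes, trace

primes, trace = sieve_trace(15)
for p, multiples in trace:
    print('p =', p, 'marks', multiples)
print('primes:', primes)            # primes: [2, 3, 5, 7, 11, 13]
```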
I’ll show you 3 different solutions, in order of increasing efficiency
• How do we represent this in Python?
    [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
• First attempt: as a shortened list of numbers
• With the non-primes removed, the list above is represented as [2, 3, 5, 7, 11, 13]
• Remove a number from list if it is not a prime
Solution #1

    def primes1(N):
        # all candidate primes
        candidates = list(range(2,N+1))
        p_idx = 0
        p = candidates[p_idx]
        while p**2 <= N:                    # stop once p**2 is greater than N
            i = 2
            while i*p <= N:
                if i*p in candidates:       # remove if
                    candidates.remove(i*p)  # not prime
                i += 1
            # find next prime
            p_idx += 1    # next number in list is a prime
            p = candidates[p_idx]
        return candidates  # remaining nums are prime numbers
Running time of solution #1
• Three nested loops: O(N^3)

    while p**2 <= N:
        while i*p <= N:
            if i*p in candidates:

• The in operator is an implicit third nested loop, since it performs linear search to find a number
• Could be faster if we use binary search, since the list of candidate primes is in increasing order
• Recall that binary search returns the index of a value in a list, or -1 if the value is not in the list
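As a reminder of that interface, here is a sketch of a search with the same return convention, built on the standard library's bisect module instead of a hand-written loop. The name binary_search_bisect is made up for this example, and it assumes the list is already sorted in increasing order.

```python
import bisect

def binary_search_bisect(L, val):
    """Return the index of val in the sorted list L, or -1 if val is absent."""
    idx = bisect.bisect_left(L, val)       # leftmost position where val could go
    if idx < len(L) and L[idx] == val:
        return idx
    return -1

candidates = [2, 3, 5, 7, 9, 11, 13, 15]
print(binary_search_bisect(candidates, 9))    # 4
print(binary_search_bisect(candidates, 10))   # -1
```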
Solution #2

    def primes2(N):
        candidates = list(range(2,N+1))
        p_idx = 0
        p = candidates[p_idx]
        while p**2 <= N:                    # stop once p**2 is greater than N
            i = 2
            while i*p <= N:
                remove_idx = binary_search(candidates, i*p)
                if remove_idx != -1:        # -1 means i*p was already removed
                    del candidates[remove_idx]
                i += 1
            # find next prime
            p_idx += 1
            p = candidates[p_idx]
        return candidates  # remaining nums are prime numbers
Running time of solution #2

    while p**2 <= N:
        while i*p <= N:
            remove_idx = binary_search(candidates, i*p)
            if remove_idx != -1:
                del candidates[remove_idx]

• Uses binary search instead of in operator
• Binary search is O(log N)
• Binary search is nested within 2 loops, so running time of algorithm is O(N^2 log N)
Idea behind algorithm #3
• Try alternative representation of data
    [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
• Represent as:
    [None,None,2,3,None,5,None,7,None,None,None,11,None,13,None,None]

    index: 0     1     2  3  4     5  6     7  8     9     10    11  12    13  14    15
    value: None  None  2  3  None  5  None  7  None  None  None  11  None  13  None  None

• Everything that isn’t None is a prime
• Benefit: directly access number to be marked as non-prime
• L[i] == i; for example, the index of 12 is 12
• Don’t need to search the list to find the index of a value!
• To make this work, add positions for 0 and 1 at beginning of list
• The following algorithm is O(N^2)
Solution #3

    def primes3(N):
        candidates = list(range(N+1))
        candidates[0:2] = [None, None]    # 0 and 1 are not primes
        p_idx = 2
        p = candidates[p_idx]
        while p**2 <= N:                  # stop once p**2 is greater than N
            i = 2
            while i*p <= N:
                candidates[i*p] = None
                i += 1
            while True:    # find next prime, skip over Nones
                p_idx += 1
                if candidates[p_idx] is not None:
                    p = candidates[p_idx]
                    break
        prime_numbers = []  # everything that isn't None is a prime
        for c in candidates:
            if c is not None:
                prime_numbers.append(c)
        return prime_numbers
Solution #3
• To find the next prime, skip over Nones
• Example: suppose p_idx is 3, advance to 5
    [None, None, 2, 3, None, 5, ...]

    while True:
        p_idx += 1
        if candidates[p_idx] is not None:
            p = candidates[p_idx]    # next prime
            break
Solution #3
• When the outer loop has terminated, everything that isn't None is a prime
• Recover list of prime numbers

    prime_numbers = []
    for c in candidates:
        if c is not None:
            prime_numbers.append(c)
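The same recovery step can be written as a list comprehension. This is an equivalent sketch, shown on the N = 15 list from the earlier slide rather than on the output of primes3 itself.

```python
# state of the list for N = 15 after all non-primes have been set to None
candidates = [None, None, 2, 3, None, 5, None, 7,
              None, None, None, 11, None, 13, None, None]

# keep every entry that is not None
prime_numbers = [c for c in candidates if c is not None]
print(prime_numbers)    # [2, 3, 5, 7, 11, 13]
```

Testing with "is not None" rather than truthiness matters in general, since a value such as 0 would otherwise be dropped as well.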
Empirical comparison of running times
(I’ve used semicolons to separate short statements)

    import time
    # time.perf_counter() is used because time.clock() was removed in Python 3.8
    start = time.perf_counter(); primes1(10000); end = time.perf_counter()
    print('{0:.5f} seconds'.format(end-start))
    start = time.perf_counter(); primes2(10000); end = time.perf_counter()
    print('{0:.5f} seconds'.format(end-start))
    start = time.perf_counter(); primes3(10000); end = time.perf_counter()
    print('{0:.5f} seconds'.format(end-start))

    # output:
    # 4.32617 seconds    algorithm 1: O(N^3)
    # 0.28724 seconds    algorithm 2: O(N^2 * log N)
    # 0.01832 seconds    algorithm 3: O(N^2)
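A single timed call can be noisy, so one option is to let the standard timeit module repeat the call and report an average. The sketch below is an assumption about how one might do that, not the method used for the numbers above; the function sieve is a hypothetical stand-in workload so the example runs on its own, and primes1, primes2, or primes3 can be substituted for it.

```python
import timeit

def sieve(N):
    """Stand-in workload (hypothetical); substitute primes1, primes2, or primes3."""
    is_prime = [True] * (N + 1)
    is_prime[0:2] = [False, False]         # 0 and 1 are not primes
    p = 2
    while p * p <= N:
        if is_prime[p]:
            for m in range(2 * p, N + 1, p):
                is_prime[m] = False
        p += 1
    return [k for k in range(N + 1) if is_prime[k]]

runs = 5
total = timeit.timeit(lambda: sieve(10000), number=runs)   # total seconds for all runs
print('{0:.5f} seconds per run'.format(total / runs))
```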
All on one slide

    def primes1(N):
        # all candidate primes
        candidates = list(range(2,N+1))
        p_idx = 0
        p = candidates[p_idx]
        while p**2 <= N:                    # stop once p**2 is greater than N
            i = 2
            while i*p <= N:
                if i*p in candidates:       # remove if
                    candidates.remove(i*p)  # not prime
                i += 1
            # find next prime
            p_idx += 1    # next number in list is a prime
            p = candidates[p_idx]
        return candidates  # remaining nums are prime numbers

    def primes2(N):
        candidates = list(range(2,N+1))
        p_idx = 0
        p = candidates[p_idx]
        while p**2 <= N:
            i = 2
            while i*p <= N:
                remove_idx = binary_search(candidates, i*p)
                if remove_idx != -1:
                    del candidates[remove_idx]
                i += 1
            # find next prime
            p_idx += 1
            p = candidates[p_idx]
        return candidates  # remaining nums are prime numbers

    def binary_search(L, val):
        lo = 0                      # initialize lo and hi
        hi = len(L) - 1
        while lo <= hi:             # stopping condition
            mid = (lo + hi) // 2    # middle index
            guess = L[mid]
            if guess == val:        # compare guess to value searched for
                return mid
            elif guess < val:
                lo = mid + 1
            elif guess > val:
                hi = mid - 1
        return -1                   # value not in list

    def primes3(N):
        candidates = list(range(N+1))
        candidates[0:2] = [None, None]    # 0 and 1 are not primes
        p_idx = 2
        p = candidates[p_idx]
        while p**2 <= N:
            i = 2
            while i*p <= N:
                candidates[i*p] = None
                i += 1
            while True:    # find next prime, skip over Nones
                p_idx += 1
                if candidates[p_idx] is not None:
                    p = candidates[p_idx]
                    break
        prime_numbers = []  # everything that isn't None is a prime
        for c in candidates:
            if c is not None:
                prime_numbers.append(c)
        return prime_numbers

    import time
    # time.perf_counter() is used because time.clock() was removed in Python 3.8
    start = time.perf_counter(); primes1(10000); end = time.perf_counter()
    print('{0:.5f} seconds'.format(end-start))
    start = time.perf_counter(); primes2(10000); end = time.perf_counter()
    print('{0:.5f} seconds'.format(end-start))
    start = time.perf_counter(); primes3(10000); end = time.perf_counter()
    print('{0:.5f} seconds'.format(end-start))

    # output:
    # 4.32617 seconds    algorithm 1: O(N^3)
    # 0.28724 seconds    algorithm 2: O(N^2 * log N)
    # 0.01832 seconds    algorithm 3: O(N^2)