1 / 28

Programming for Engineers in Python

Programming for Engineers in Python. Recitation 12. Plan. Dynamic Programming Coin Change problem Longest Common Subsequence Application to Bioinformatics. Teaching Survey . Please answer the teaching survey: https://www.ims.tau.ac.il/Tal / This will help us to improve the course

afya
Download Presentation

Programming for Engineers in Python

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Programming for Engineers in Python Recitation 12

  2. Plan • Dynamic Programming • Coin Change problem • Longest Common Subsequence • Application to Bioinformatics

  3. Teaching Survey • Please answer the teaching survey: https://www.ims.tau.ac.il/Tal/ • This will help us to improve the course • Deadline: 4.2.12

  4. Coin Change Problem • What is the smallest number of coins I can use to make exact change? • Greedy solution: pick the largest coin first, until you reach the change needed • In the US currency this works well: • Give change for 30 cents if you’ve got 1, 5, 10, and 25 cent coins: • 25 + 5 → 2 coins http://jeremykun.files.wordpress.com/2012/01/coins.jpg

  5. The Sin of Greediness • What if you don’t have 5 cent coins? • You got 1, 10, and 25 • Greedy solution: 25+1+1+1+1+1 → 6 coins • But a better solution is: 10+10+10 → 3 coins! • So the greedy approach isn’t optimal The Seven Deadly Sins and the Four Last Things by Hieronymus Bosch http://en.wikipedia.org/wiki/File:Boschsevendeadlysins.jpg

  6. Recursive Solution • Reminder – find the minimal # of coins needed to give exact change with coins of specified values • Assume that we can use 1 cent coins so there is always some solution • Denote our coin list by c1, c2, …, ck(c1=1) • k is the # of coins values we can use • Denote the change required by n • In the previous example: • n=30, k=3, c1=1, c2=10, c3=25

  7. Recursive Solution • Recursion Base: • If n=0 then we need 0 coins • If k=1, c1=1, so we need n coins • Recursion Step: • If n<ck we can’t use ck→ We solve for n and c1,…,ck-1 • Otherwise, we can either use ckor not use ck • If we use ck → we solve for n-ckand c1,…,ck • If we don’t use ck→ we solve for n and c1,…,ck-1

  8. Recursion Solution defcoins_change_rec( cents_needed, coin_values): ifcents_needed <= 0: # base 1 return 0 eliflen(coin_values) == 1: # base 2 returncents_needed# assume that coin_values[0]==1 elifcoin_values[-1] > cents_needed: # step 1 returncoins_change_rec( cents_needed, coin_values[:-1]) else: # step 2 s1 = coins_change_rec( cents_needed, coin_values[:-1] ) s2 = coins_change_rec( cents_needed-coin_values[-1], coin_values) returnmin(s1, s2+1) coins_rec.py

  9. Repeated calls • We count how many times we call the recursive function for each set of arguments: calls = {} defcoins_change_rec(cents_needed, coin_values): global calls calls[(cents_needed, coin_values)] = calls.get( (cents_needed, coin_values) , 0) + 1 … >>> print'result', coins_change_rec(30, (1,5,10,25)) result 2 >>> print'max calls',max(calls.values()) max calls 4

  10. Dynamic Programing - Memoization • We want to store the values of calculation so we don’t repeat them • We create a table called mem • # of columns: # of cents needed + 1 • # of rows: # of coin values + 1 • The table is initialized with some illegal value – for example -1: mem = [ [-1 fory inrange(cents_needed+1)] for x inrange(len(coin_values)) ]

  11. Dynamic Programing - Memoization • For each call of the recursive function, we check if mem already has the answer: ifmem[len(coin_values)][cents_needed] == -1: • In case that it doesn’t (the above is True) we calculate it as before, and we store the result, for example: ifcents_needed <= 0: mem[len(coin_values)][cents_needed]= 0 • Eventually we return the value returnmem[len(coin_values)][cents_needed] coins_mem.py

  12. Dynamic Programing - Iteration • Another approach is to first build the entire matrix • This matrix holds the minimal number of coins we need to get change for j cents using the first i coins (c1, c2, …, ci) • The solution will be min_coins[k,n] – the last element in the matrix • This will save us the recursive calls, but will enforce us to calculate all the values apriori • Bottom-up approach vs. the top-down approach of memoization

  13. Dynamic Programming approach • The point of this approach is that we have a recursive formula to break apart a problem to sub problems • Then we can use different approaches to minimize the number of calculations by storing the sub solutions in memory

  14. Bottom up - example matrix • Set n=4 and k=3 (coins are 1, 2, and 3 cents) • Base cases: • how many coins do I need to make change for zero cents? Zero! • So min_coins[i,0]=0 • And how many pennies do I need to make j cents? Exactly j (we assumed we can use pennies) • So min_coins[0,j]=j • So the base cases give us: • Next – the recursion step

  15. Bottom up - example matrix • For particular choice of i,j (but not i=0 or j=0) • To determine min_coins[i,j] – the minimum # of coins to get exact change of j using the first i coins • We can use the coin ci and add +1 to min_coins[i,j-ci] (only valid if j>ci) • We can decide not to use ci, therefore to use only c0 ,..,ci-1, and therefore min_coins[i-1,j] . • So which way do we choose? • The one with the least coins! min_coins[i,j] = min(min_coins[i,j-ci] +1, min_coins[i-1,j])

  16. Example matrix – recursion step • Set n=4 and k=3 (coins are 1, 2, and 3 cents) • So the base cases give us: • M(1,1)=1 • M(1,2)=1 • M(1,3)=min(M(1,1)+1,M(0,3))=min(2,2)=2 • M(1,4)=min(M(1,2)+1, M(0,4))=min(2,4)=2 • etc… coins_matrix.py The code for the matrix solution and the idea is from http://jeremykun.wordpress.com/2012/01/12/a-spoonful-of-python/

  17. Longest Common Subsequence • Given two sequences (strings/lists) we want to find the longest common subsequence • Definition – subsequence: B is a subsequence of A if B can be derived from A by removing elements from A • Examples • [2,4,6] is a subsequence of [1,2,3,4,5,6] • [6,4,2] is NOT a subsequence of [1,2,3,4,5,6] • ‘is’ is a subsequence of ‘distance’ • ‘nice’ is NOT a subsequence of ‘distance’

  18. Longest Common Subsequence • Given two subsequences (strings or lists) we want to find the longest common subsequence: • Example for a LCS: • Sequence 1: HUMAN • Sequence 2: CHIMPANZEE • Applications include: • BioInformatics (next up) • Version Control http://wordaligned.org/articles/longest-common-subsequence

  19. The DNA • Our biological blue-print • A sequence made of four bases – A, G, C, T • Double strand: • A connects to T • G connects to C • Every triplet encodes for an amino-acid • Example: GAG→Glutamate • A chain of amino-acids is a protein – the biological machine! http://sips.inesc-id.pt/~nfvr/msc_theses/msc09b/

  20. Longest common subsequence • The DNA changes: • Mutation: A→G, C→T, etc. • Insertion: AGC→ ATGC • Deletion: AGC → A‒C • Given two non-identical sequences, we want to find the parts that are common • So we can say how different they are • Which DNA is more similar to ours? The cat’s or the dog’s? http://palscience.com/wp-content/uploads/2010/09/DNA_with_mutation.jpg

  21. Recursion • An LCS of two sequences can be built from the LCSes of prefixes of these sequences • Denote the sequences seq1 and seq2 • Base – check if either sequence is empty: Iflen(seq1) == 0 orlen(seq2) == 0: return [ ] • Step – build solution from shorter sequences: If seq1[-1] == seq2[-1]: returnlcs (seq1[:-1],seq2[:-1]) + [ seq1[-1] ] else: returnmax(lcs(seq1[:-1],seq2), lcs(seq1,seq2[:-1]), key = len) lcs_rec.py

  22. Wasteful Recursion • For the inputs “MAN” and “PIG”, the calls are: (1, ('', 'PIG'))(1, ('M', 'PIG'))(1, ('MA', 'PIG'))(1, ('MAN', ''))(1, ('MAN', 'P'))(1, ('MAN', 'PI'))(1, ('MAN', 'PIG'))(2, ('MA', 'PI'))(3, ('', 'PI'))(3, ('M', 'PI'))(3, ('MA', ''))(3, ('MA', 'P'))(6, ('', 'P'))(6, ('M', ''))(6, ('M', 'P')) • 24 redundant calls! http://wordaligned.org/articles/longest-common-subsequence

  23. Wasteful Recursion • When comparing longer sequences with a small number of letters the problem is worse • For example, DNA sequences are composed of A, G, T and C, and are long • For lcs('ACCGGTCGAGTGCGCGGAAGCCGGCCGAA', 'GTCGTTCGGAATGCCGTTGCTCTGTAAA') we get an absurd: (('', 'GT'), 13,182,769)(('A', 'GT'), 13,182,769)(('A', 'G'), 24,853,152)(('', 'G'), 24,853,152)(('A', ''), 24,853,152) http://blog.oncofertility.northwestern.edu/wp-content/uploads/2010/07/DNA-sequence.jpg

  24. DP Saves the Day • We saw the overlapping sub problems emerge – comparing the same sequences over and over again • We saw how we can find the solution from solution of sub problems – a property we called optimal substructure • Therefore we will apply a dynamic programming approach • Start with top-down approach - memoization

  25. Memoization • We save results of function calls to refrain from calculating them again deflcs_mem( seq1, seq2, mem=None ): ifnotmem: mem = { } key = (len(seq1), len(seq2)) # tuples are immutable if key notinmem: # result not saved yet iflen(seq1) == 0 or len(seq2) == 0: mem[key] = [ ] else: if seq1[-1] == seq2[-1]: mem[key]= lcs_mem(seq1[:-1], seq2[:-1], mem) + [ seq1[-1] ] else: mem[key]= max(lcs_mem(seq1[:-1], seq2 ,mem), lcs_mem(seq1, seq2[:-1], mem), key=len) returnmem[key]

  26. “maximum recursion depth exceeded” • We want to use our memoized LCS algorithm on two long DNA sequences: >>> fromrandom import choice >>> defbase(): … return choice('AGCT') >>> seq1 = str([base() for x inrange(10000)]) >>> seq2 = str([base() for x inrange(10000)]) >>>printlcs(seq1, seq2) RuntimeError: maximum recursion depth exceeded in cmp • We need a different algorithm…

  27. link→

  28. DNA Sequence Alignment • Needleman-Wunsch DP Algorithm: • Python package: http://pypi.python.org/pypi/nwalign • On-line example: http://alggen.lsi.upc.es/docencia/ember/frame-ember.html • Code: needleman_wunsch_algorithm.py • Lecture videos from TAU: • http://video.tau.ac.il/index.php?option=com_videos&view=video&id=4168&Itemid=53

More Related