150 likes | 511 Views
Longest Common Subsequence (LCS). Dr. Nancy Warter-Perez. Functions. Function definition def adder(a, b, c): return a+b+c Function calls adder(1, 2, 3) -> 6. Functions – Polymorphism . >>>def fn2(c): … a = c * 3 … return a >>> print fn2(5) 15 >>> print fn2(1.5) 4.5
E N D
Longest Common Subsequence (LCS) Dr. Nancy Warter-Perez
Functions • Function definition • def adder(a, b, c): return a+b+c • Function calls • adder(1, 2, 3) -> 6 Introduction to Python – Part III
Functions – Polymorphism >>>def fn2(c): … a = c * 3 … return a >>> print fn2(5) 15 >>> print fn2(1.5) 4.5 >>> print fn2([1,2,3]) [1,2,3,1,2,3,1,2,3] >>> print fn2("Hi") HiHiHi Introduction to Python – Part III
Functions - Recursion def fn_Rec(x): if x == []: return fn_Rec(x[1:]) print x[0], y = [1,2,3,4] fn_Rec(y) >>> 4 3 2 1 Introduction to Python – Part III
Scopes • Scopes divine the “visibility” of a variable • Variables defined outside of a function are visible to all of the functions within a module (file) • Variables defined within a function are local to that function • To make a variable that is defined within a function global, use the global keyword Ex 2: x = 5 def fnc(): global x x = 2 print x, fnc() print x >>> 2 2 Ex 1: x = 5 def fnc(): x = 2 print x, fnc() print x >>> 2 5 Developing Pairwise Sequence Alignment Algorithms
Modules • Why use? • Code reuse • System namespace partitioning (avoid name clashes) • Implementing shared services or data • How to structure a Program • One top-level file • Main control flow of program • Zero or more supplemental files known as modules • Libraries of tools Developing Pairwise Sequence Alignment Algorithms
Modules - Import • Import – used to gain access to tools in modules Ex: contents of file b.py def spam(text): print text, 'spam' contents of file a.py import b b.spam('gumby') Developing Pairwise Sequence Alignment Algorithms
Dynamic Programming • Applied to optimization problems • Useful when • Problem can be recursively divided into sub-problems • Sub-problems are not independent • Examples • Needleman-Wunch Global Alignment Algorithm • Smith-Waterman Local Alignment Algorithm Developing Pairwise Sequence Alignment Algorithms
Longest Common Subsequence (LCS) Problem • Can have insertion and deletions but no substitutions (no mismatches) • Ex: V: ATCTGAT W: TGCATA LCS: TCTA Developing Pairwise Sequence Alignment Algorithms
LCS Problem (cont.) • Similarity score si-1,j si,j = max { si,j-1 si-1,j-1 + 1, if vi = wj • On board example • V: ATCTGAT • W:TGCATA Developing Pairwise Sequence Alignment Algorithms
Indels – insertions and deletions (e.g., gaps) • alignment of V and W • V = rows of similarity matrix (vertical axis) • W = columns of similarity matrix (horizontal axis) • Space (gap) in W (UP) • insertion • Space (gap) in V (LEFT) • deletion • Match (no mismatch in LCS) (DIAG) Developing Pairwise Sequence Alignment Algorithms
LCS(V,W) Algorithm for i = 0 to n si,0 = 0 for j = 1 to m s0,j = 0 for i = 1 to n for j = 1 to m if vi = wj si,j = si-1,j-1 + 1; bi,j = DIAG else if si-1,j >= si,j-1 si,j = si-1,j; bi,j = UP else si,j = si,j-1; bi,j = LEFT Developing Pairwise Sequence Alignment Algorithms
Print-LCS(b,V,i,j) if i == 0 or j == 0 return if bi,j = DIAG PRINT-LCS(b, V, i-1, j-1) print vi else if bi,j = UP PRINT-LCS(b, V, i-1, j) else PRINT-LCS(b, V, i, j-1) Note: b and V could be global variables Developing Pairwise Sequence Alignment Algorithms
Programming Workshop and Homework – Implement LCS • Workshop/Homework – Write a Python script to implement LCS (V, W). Prompt the user for 2 sequences (V and W) and display b and s • Homework (due Thursday, November 15th with Automatic Extension to Tuesday, November 27th) – Add the Print-LCS(V, i, j) function to your Python script. The script should prompt the user for 2 sequences and print the longest common sequence. Developing Pairwise Sequence Alignment Algorithms
Classic Papers • Needleman, S.B. and Wunsch, C.D. A General Method Applicable to the Search for Similarities in Amino Acid Sequence of Two Proteins. J. Mol. Biol., 48, pp. 443-453, 1970. (http://www.cs.umd.edu/class/spring2003/cmsc838t/papers/needlemanandwunsch1970.pdf) • Smith, T.F. and Waterman, M.S. Identification of Common Molecular Subsequences. J. Mol. Biol., 147, pp. 195-197, 1981.(http://www.cmb.usc.edu/papers/msw_papers/msw-042.pdf) Developing Pairwise Sequence Alignment Algorithms