Dynamic Programming 6.5-6.9 Brandon Andrews
Topics • Longest Common Subsequences • Global Sequence Alignment • Scoring Alignments • Local Sequence Alignment • Alignment with Gap Penalties • Questions
Longest Common Subsequences (LCS) • Goal: find sequence similarity between two sequences • The two sequences may differ in length • The sequences are denoted v and w and are viewed as strings of characters, e.g. v = ATTGCTA
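Subsequences • A subsequence is an ordered sequence of characters from v or w; the characters need not be consecutive • For example, if v = ATTGCTA, then AGCA and ATTA are subsequences • AGCA: ATTGCTA (positions 1, 4, 5, 7) • ATTA: ATTGCTA (for example, positions 1, 2, 3, 7)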
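The definition above can be checked mechanically. The following is a minimal sketch, not part of the original slides; the function name `is_subsequence` is made up for illustration.

```python
def is_subsequence(candidate: str, sequence: str) -> bool:
    """Return True if candidate can be read left to right inside sequence."""
    remaining = iter(sequence)
    # `ch in remaining` advances the iterator until it finds ch,
    # so each character must appear after the previous match.
    return all(ch in remaining for ch in candidate)


v = "ATTGCTA"
print(is_subsequence("AGCA", v))  # True
print(is_subsequence("ATTA", v))  # True
print(is_subsequence("GAT", v))   # False: no T follows the final A
```

The characters of a subsequence keep their original order but may skip over other characters, which is why GAT fails even though G, A, and T all occur in v.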
Operations • The only operations we can perform are insertion and deletion • Insertion: ATCTGAT -> A-TCTGAT • The hyphen marks the inserted position and can stand for any character • Deletion: an insertion into the other sequence, which offsets the characters so that the longest common subsequence lines up • v=AT-C-TGAT • w=-TGCAT-A- • How do we find TCTA using dynamic programming?
Review: Edit Distance • Edit distance: turning one sequence into another with the fewest operations • Allowed operations: insertion, deletion, and substitution • The longest common subsequence problem is nearly identical, but allows only insertion and deletion; in the grid, a match has weight 1 and a non-match has weight 0 (essentially the Manhattan Tourist problem with fixed weights)
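As a rough illustration of the grid view, here is a minimal sketch of the LCS dynamic program. The function name `lcs` and the table name `s` are assumptions made for this example; the slides give no code.

```python
def lcs(v: str, w: str) -> str:
    """Return one longest common subsequence of v and w."""
    n, m = len(v), len(w)
    # s[i][j] holds the LCS length of the prefixes v[:i] and w[:j].
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if v[i - 1] == w[j - 1]:
                # Match: follow the diagonal edge, weight 1.
                s[i][j] = s[i - 1][j - 1] + 1
            else:
                # Insertion or deletion: weight 0 edges.
                s[i][j] = max(s[i - 1][j], s[i][j - 1])

    # Walk back from the bottom-right corner to recover one LCS.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if v[i - 1] == w[j - 1]:
            out.append(v[i - 1])
            i -= 1
            j -= 1
        elif s[i - 1][j] >= s[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


result = lcs("ATCTGAT", "TGCATA")
print(len(result))  # 4
```

Several different subsequences of length 4 exist for this pair (TCTA is one of them), so which one is returned depends on how ties are broken during the walk back.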
Example • See other slides: Chapter 6: Edit Distance, Slides 54-58
Global Sequence Alignment • See other slides: Chapter 6: Alignment
Scoring Alignments • Scoring matrices are based on biological evidence • Certain amino acid mutations are more common than others • For instance, Asn, Asp, Glu, and Ser are the most mutable amino acids • Ser is approximately three times as likely to mutate into Phe as Trp is
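To make the role of a scoring matrix concrete, here is a minimal sketch that scores a finished alignment column by column. The function name `score_alignment`, the `toy_delta` table, and the indel score of -1 are all placeholders invented for this example; a real protein alignment would look up values from a matrix such as PAM instead.

```python
def score_alignment(v_row: str, w_row: str, delta: dict, indel: int) -> int:
    """Sum the column scores of two equal-length aligned rows."""
    if len(v_row) != len(w_row):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(v_row, w_row):
        if a == "-" or b == "-":
            total += indel          # column with a gap
        else:
            total += delta[(a, b)]  # match or mismatch, looked up in the matrix
    return total


# Toy nucleotide matrix (hypothetical values): +1 for a match, -1 otherwise.
toy_delta = {(a, b): (1 if a == b else -1) for a in "ACGT" for b in "ACGT"}

print(score_alignment("AT-C-TGAT", "-TGCAT-A-", toy_delta, indel=-1))  # -1
```

The example alignment has 4 matching columns and 5 gap columns, giving 4 - 5 = -1 under these placeholder scores.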
PAM • PAM (Point Accepted Mutation) 1 corresponds to 1 mutation for every 100 amino acids • This condition ensures that the proteins being analyzed are closely related • The scoring matrix uses probabilities, which change if the proteins are not closely related • The probability of mutation differs for each pair of amino acids • 1 PAM is the time in which an “average” protein mutates 1% of its amino acids • This yields a family of scoring matrices: PAM 1, PAM 2, and so on
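The slide does not say how PAM 2 follows from PAM 1. In the standard construction, the PAM n mutation probabilities are obtained by multiplying the PAM 1 probability matrix by itself n times. The sketch below assumes that construction and uses a made-up two-letter alphabet with made-up probabilities; the helper names `mat_mul` and `pam_n` are hypothetical, and the later conversion of probabilities into scores is left out.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def pam_n(pam1, n):
    """Return the n-th power of the PAM 1 mutation probability matrix."""
    size = len(pam1)
    # Start from the identity matrix (zero elapsed PAM units).
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, pam1)
    return result


# Toy matrix (hypothetical): each letter changes with probability 1%.
toy_pam1 = [[0.99, 0.01],
            [0.01, 0.99]]
pam2 = pam_n(toy_pam1, 2)
print(pam2[0][1])  # approximately 0.0198
```

Each row still sums to 1 after multiplication, so the result remains a table of probabilities.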
Local Sequence Alignment • Global alignment compares two entire strings • Local alignment instead looks for the best-matching substrings of the two strings • That is, it looks for short regions that are similar within larger sequences
Smith-Waterman Local Alignment Algorithm • Add an edge of weight 0 from the source to every other vertex, so that a local alignment can start anywhere in the grid at no cost
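In the dynamic programming table, the zero-weight edges from the source amount to adding 0 as an extra option in each cell's maximum. The following is a minimal sketch of that idea; the function name and the match, mismatch, and indel scores are placeholder assumptions, not values from the slides.

```python
def local_alignment_score(v: str, w: str,
                          match: int = 1,
                          mismatch: int = -1,
                          indel: int = -1) -> int:
    """Return the best local alignment score between v and w."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = s[i - 1][j - 1] + (match if v[i - 1] == w[j - 1] else mismatch)
            # The 0 is the free edge from the source: restart the alignment here.
            s[i][j] = max(0, diag, s[i - 1][j] + indel, s[i][j - 1] + indel)
            # The alignment may also end at any cell, so track the best one.
            best = max(best, s[i][j])
    return best


print(local_alignment_score("TTACGTAA", "GGACGTCC"))  # 4
print(local_alignment_score("AAAA", "TTTT"))          # 0
```

In the first call the shared region ACGT scores 4 and the dissimilar flanks are simply ignored, which is the difference from a global alignment of the same strings.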
Alignment with Gap Penalties • Gaps (runs of consecutive hyphens in one row of the alignment) are expected in the sequences • However, very small gaps could indicate dissimilarity, so a penalty is given for gaps that meet a criterion
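The slide does not name the criterion. One common scheme, the affine gap penalty, charges an opening cost for each gap plus a smaller cost per gap position, so many short gaps cost more than one long gap of the same total length. The sketch below assumes that scheme; the function name and the values of `rho` (opening cost) and `sigma` (extension cost) are hypothetical.

```python
import re


def affine_gap_penalty(aligned_row: str, rho: float, sigma: float) -> float:
    """Total penalty for all gaps in one row of an alignment.

    A gap is a maximal run of '-' characters; a gap of length x
    costs rho + sigma * x.
    """
    total = 0
    for run in re.finditer(r"-+", aligned_row):
        total += rho + sigma * len(run.group())
    return total


# Two gaps of length 1: 2 * (2 + 1) = 6
print(affine_gap_penalty("AT-C-TGAT", rho=2, sigma=1))  # 6
# One gap of length 3: 2 + 3 = 5
print(affine_gap_penalty("A---T", rho=2, sigma=1))      # 5
```

With these placeholder values, two isolated single-character gaps are penalized more heavily than one gap of three characters.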
References • An Introduction to Bioinformatics Algorithms