Lecture 6. Pairwise Local Alignment and Database Search
Csc 487/687 Computing for bioinformatics
Homology Search
• Given a sequence q, does there exist a sequence d in a database D such that q and d are homologous?
• Could perform global pairwise alignment between q and each sequence in D, but
  • Maybe only a segment of q is highly (beyond random) similar to a segment of a database sequence
  • Remote homology – only a motif is conserved
  • Sequence/domain rearrangements – sequences are not globally homologous, but share a domain
• Local alignment (alignment of a segment of q with a segment of d) is desirable
Homology Search – Task
• Present all sequences in D that have segments homologous to segments in q
• Avoid presenting sequences in D that are not homologous
• For each local alignment, calculate the statistical probability that the alignment is "random" (not caused by an evolutionary relation)
Definitions
• Segment – contiguous subsequence (substring) of q or d
• Segment pair – pair of segments, one from q and one from d (need not be of the same length)
• Local alignment – alignment of a segment pair
Dot Plot – Visualising Similarity
• For sequences q (length m) and d (length n), construct an m × n matrix
• Make a dot in cell (i,j) if q_i = d_j
• Possible to filter the matrix
  • E.g., use a window of length K: make a dot in (i,j) only if at least C% of the characters are similar between the K-windows around (i,j)
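The dot plot construction above can be sketched in a few lines of Python. This is a minimal illustration, not the tool used for the slides: the function names `dot_plot` and `render` are hypothetical, the windows start at (i,j) rather than being centred on it, and the filter is given as a match count (`min_matches`) instead of a percentage C. Indices are 0-based.

```python
def dot_plot(q, d, k=1, min_matches=1):
    """Return the set of 0-based cells (i, j) that receive a dot.

    Assumption: the length-k windows *start* at q[i] and d[j], and a dot
    is placed when at least min_matches positions are identical.
    With k=1 and min_matches=1 this is the unfiltered plot (q_i == d_j).
    """
    dots = set()
    for i in range(len(q) - k + 1):
        for j in range(len(d) - k + 1):
            matches = sum(1 for a, b in zip(q[i:i + k], d[j:j + k]) if a == b)
            if matches >= min_matches:
                dots.add((i, j))
    return dots


def render(q, d, dots):
    """Draw the matrix as text: rows follow q, columns follow d."""
    lines = ["  " + d]
    for i, ch in enumerate(q):
        row = "".join("*" if (i, j) in dots else "." for j in range(len(d)))
        lines.append(ch + " " + row)
    return "\n".join(lines)


if __name__ == "__main__":
    q, d = "ACGTACGT", "ACGTTACG"
    print(render(q, d, dot_plot(q, d)))            # unfiltered
    print(render(q, d, dot_plot(q, d, k=3, min_matches=3)))  # filtered
```

Raising `k` and `min_matches` removes isolated dots and leaves the diagonal runs that indicate similar segments.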
Dot Plots are Easy to Interpret
• Can identify, for instance, repeats
• Example:
  • Human HPRT gene (genomic sequence)
  • Dot if 8 identical bases
  • http://www.ansorge-group.embl.de/geneskipper/dotplot.htm
Dynamic Programming for Local Alignment (Smith & Waterman 1981)
• Assumptions
  • The scoring matrix has "negative expectation"
  • Gaps should decrease the alignment score (as before)
• Consequences
  • A subalignment with negative score coming first (prefix) or last (suffix) can be removed to improve the alignment score
  • Gaps should not be included unless the alignments on either side score high enough to make up for the gap penalty
[Figure: an alignment with its prefix and suffix marked]
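One common reading of "negative expectation" is that the average score of a randomly paired column is below zero, so that random extensions of an alignment tend to lower its score. The sketch below checks this for a toy DNA scheme; the uniform base frequencies and the +1/-1 match/mismatch scores are assumptions chosen for illustration, and `expected_score` is a hypothetical helper name.

```python
def expected_score(freqs, score):
    """Expected score of aligning two independently drawn residues.

    freqs maps each residue to its background probability;
    score(a, b) is the substitution score for the pair (a, b).
    """
    return sum(freqs[a] * freqs[b] * score(a, b) for a in freqs for b in freqs)


# Assumed background: the four bases are equally likely.
uniform_dna = {base: 0.25 for base in "ACGT"}


def match_mismatch(a, b):
    # Assumed toy scheme: +1 for identical bases, -1 otherwise.
    return 1 if a == b else -1


if __name__ == "__main__":
    e = expected_score(uniform_dna, match_mismatch)
    print(e)  # 4 matching pairs * 1/16 * (+1) + 12 mismatching pairs * 1/16 * (-1)
```

With these assumed values the expectation is negative, so the scheme satisfies the first assumption on the slide.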
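Recurrence relation
\[
H_{i,j} = \max
\begin{cases}
0 & \text{(empty alignment)} \\
H_{i-1,j-1} + s(q_i, d_j) & \text{(}q_{1..i-1},\ d_{1..j-1}\text{ extended by the column } q_i / d_j\text{)} \\
H_{i-1,j} - g & \text{(}q_{1..i-1},\ d_{1..j}\text{ extended by the column } q_i / -\text{)} \\
H_{i,j-1} - g & \text{(}q_{1..i},\ d_{1..j-1}\text{ extended by the column } - / d_j\text{)}
\end{cases}
\]
Here s is the scoring matrix and g > 0 is the gap penalty; the gap term did not survive in the slide text, so a linear penalty is assumed. The 0 option effectively allows for removal of negatively contributing prefixes.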
Initialization – Removing Initial Gaps
• Initial gaps – in either sequence – should be ignored, so the first row and the first column of the H matrix are filled with 0
The Best Local Alignment
• Should ignore negatively contributing suffixes of alignments
• Score of best local alignment – highest value in the dynamic programming matrix
• Alignment found by tracing back from the maximum value until a cell with value 0 (zero) has been reached
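The fill and traceback steps described on these slides can be put together as follows. This is a minimal sketch under stated assumptions, not the lecture's reference implementation: the function name `smith_waterman`, the match/mismatch scores (+2/-1) and the linear gap penalty (2) are illustrative choices, and ties in the traceback are resolved in favour of the diagonal move.

```python
def smith_waterman(q, d, match=2, mismatch=-1, gap=2):
    """Return (best score, aligned segment of q, aligned segment of d).

    Assumed scoring: simple match/mismatch scores and a linear gap
    penalty; a real search would use a substitution matrix instead.
    """
    m, n = len(q), len(d)
    # First row and column stay 0, so initial gaps are ignored.
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_pos = 0, (0, 0)

    # Fill the rest of the matrix row by row.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if q[i - 1] == d[j - 1] else mismatch
            H[i][j] = max(0,                     # empty alignment
                          H[i - 1][j - 1] + s,   # q_i aligned with d_j
                          H[i - 1][j] - gap,     # q_i aligned with a gap
                          H[i][j - 1] - gap)     # d_j aligned with a gap
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the maximum until a cell with value 0 is reached.
    i, j = best_pos
    aligned_q, aligned_d = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if q[i - 1] == d[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            aligned_q.append(q[i - 1]); aligned_d.append(d[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            aligned_q.append(q[i - 1]); aligned_d.append("-")
            i -= 1
        else:
            aligned_q.append("-"); aligned_d.append(d[j - 1])
            j -= 1
    return best, "".join(reversed(aligned_q)), "".join(reversed(aligned_d))


if __name__ == "__main__":
    print(smith_waterman("XXACGTXX", "YYACGTYY"))
```

Starting the traceback at the maximum cell discards any negatively contributing suffix, and stopping at the first 0 discards any negatively contributing prefix.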
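Calculating Best Local Alignment
[Figure: the H matrix. The first row and the first column are filled first, then the rest is filled row by row. The score of the best alignment is the highest value in the matrix; the best alignment is traced back from that cell to a cell with value 0.]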
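Time Complexity
• Sequences of lengths n and m: O(nm), since each cell of the matrix is computed from three neighbouring cells (assuming a linear gap penalty)
• Two sequences of length l: O(l²)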