Sequence Alignment I: Dot Matrices
Reading • Mount, Chapters 1, 2, and 3 (up to page 94)
Why compare sequences? • To find whether two (or more) genes or proteins are evolutionarily related to each other • To find structurally or functionally similar regions within proteins
Similar genes arise by gene duplication • Copy of a gene inserted next to the original • Two copies mutate independently • Each can take on separate functions • All or part can be transferred from one part of genome to another
Sequence Comparison Methods • Dot matrix analysis • Dynamic Programming • Word or k-tuple methods (FASTA and BLAST)
Dot matrices • [Figure: example dot matrix; the sequence letters c g g a c a c a c g label its axes]
Interpretation • Regions of similarity appear as diagonal runs of dots • Reverse diagonals (perpendicular to diagonal) indicate inversions • Reverse diagonals crossing diagonals (Xs) indicate palindromes
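The idea that similarity shows up as diagonal runs of dots can be sketched in a few lines of Python. This is only an illustrative sketch, not the method of any particular program: the function names `dot_matrix`, `render`, and `longest_diagonal` are hypothetical, and a dot is placed wherever two characters are identical (no window, no filtering).

```python
def dot_matrix(seq_a, seq_b):
    """Return grid[i][j] == True when seq_a[i] is identical to seq_b[j]."""
    return [[a == b for b in seq_b] for a in seq_a]


def render(seq_a, seq_b, grid):
    """Draw the grid as text: seq_b across the top, seq_a down the side."""
    lines = ["  " + " ".join(seq_b)]
    for ch, row in zip(seq_a, grid):
        lines.append(ch + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


def longest_diagonal(grid):
    """Length of the longest unbroken run of dots along any forward diagonal."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    # run[i + 1][j + 1] holds the length of the diagonal run ending at (i, j).
    run = [[0] * (cols + 1) for _ in range(rows + 1)]
    best = 0
    for i in range(rows):
        for j in range(cols):
            if grid[i][j]:
                run[i + 1][j + 1] = run[i][j] + 1
                best = max(best, run[i + 1][j + 1])
    return best
```

Calling `print(render(a, b, dot_matrix(a, b)))` on two short sequences of your own choosing gives a quick text version of the plot; a long value from `longest_diagonal` corresponds to a visible diagonal run.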
Interpretation • Separate diagonals can be linked to form an alignment with gaps • Each amino acid or base can be used only once • The path cannot double back • Each vertical or horizontal skip introduces a gap
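The linking rules can be checked mechanically. The sketch below assumes each diagonal run is described by a hypothetical tuple `(i, j, length)` giving its 0-based start in the first and second sequence; `chain_diagonals` is an illustrative name. It rejects chains that reuse a position or double back, and counts one gap each time the path shifts to a different diagonal.

```python
def chain_diagonals(segments):
    """Count the gaps needed to link diagonal runs into one alignment.

    segments: iterable of (i, j, length) tuples (hypothetical format).
    Raises ValueError if two runs overlap or the path would double back.
    """
    ordered = sorted(segments)  # order by position in the first sequence
    gaps = 0
    for (i1, j1, n1), (i2, j2, _) in zip(ordered, ordered[1:]):
        # The next run must start after the previous one ends in BOTH
        # sequences; otherwise a residue is reused or the path goes backward.
        if i2 < i1 + n1 or j2 < j1 + n1:
            raise ValueError("runs overlap or double back")
        # Moving to a different diagonal (j - i changes) requires a
        # vertical or horizontal skip, i.e. a gap.
        if (j2 - i2) != (j1 - i1):
            gaps += 1
    return gaps
```

Two runs on the same diagonal separated by mismatches are linked without a gap here; only a change of diagonal is counted.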
Filtering • Dot matrices for long sequences can be noisy because of insignificant (chance) matches • Solution: use a window and a threshold • Compare character by character within a window (the window size must be chosen) • Require a certain fraction of matches within the window before displaying it as a dot
Dot plot comparison using windows • Window size = 11 • Stringency = 7 (Put a dot only if at least 7 of the next 11 positions are identical.)
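A minimal sketch of the window-and-stringency filter follows. The function name `windowed_dots` is hypothetical, and placing the dot at the start of the window is an assumption made here to match the "next 11 positions" wording; some programs may place it elsewhere in the window.

```python
def windowed_dots(seq_a, seq_b, window=11, stringency=7):
    """Return (i, j) positions where the windows starting at seq_a[i] and
    seq_b[j] agree in at least `stringency` of `window` positions."""
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Count identical characters, position by position, in the window.
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            if matches >= stringency:
                dots.append((i, j))
    return dots
```

With `window=1, stringency=1` this reduces to the unfiltered dot matrix; raising both values removes isolated chance matches while keeping longer diagonal runs.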
Uses for dot matrices • Aligning two proteins or two nucleic acid sequences • Finding amino acid repeats within a protein by comparing a protein sequence to itself • Repeats appear as a set of diagonal runs stacked vertically and/or horizontally
Repeats • Human LDL receptor protein sequence (Genbank P01130) compared with itself • Window W = 1, stringency S = 1 (Mount, Fig. 3.6)
Repeats • Window W = 23, stringency S = 7 (Mount, Fig. 3.6)
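Comparing a sequence with itself can be sketched as below. The function `repeat_offsets` is a hypothetical helper, and the defaults are chosen for a short toy sequence, not for the LDL receptor figure. It skips the trivial main diagonal and tallies how many windowed matches fall on each off-diagonal, so a repeat with period d shows up as a large count at offset d.

```python
from collections import Counter


def repeat_offsets(seq, window=4, stringency=4):
    """Tally windowed self-matches by diagonal offset (j - i), skipping the
    main diagonal, where a sequence trivially matches itself."""
    counts = Counter()
    n = len(seq) - window + 1
    for i in range(n):
        # Only j > i: the self-comparison plot is symmetric about the diagonal.
        for j in range(i + 1, n):
            matches = sum(seq[i + k] == seq[j + k] for k in range(window))
            if matches >= stringency:
                counts[j - i] += 1
    return counts
```

For a made-up sequence consisting of an 8-residue unit repeated twice, all matches land on offset 8, which is the text equivalent of a single diagonal run parallel to the main diagonal.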
Using substitution matrices • Dots can have weights • Some matches are rewarded more than others, depending on how likely the substitution is • Use a PAM or BLOSUM matrix (more on these later) • Put a dot only if a minimum total or average weight is achieved within the window • See Mount, Fig. 3.5
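Weighted dots can be sketched by swapping the identity test for a scoring function. The scores below are a made-up toy scheme (identical = 2, same rough chemical group = 1, otherwise -1), not PAM or BLOSUM values, and `TOY_GROUPS`, `toy_score`, and `weighted_dots` are hypothetical names. A real matrix lookup could be passed in through the `score` parameter instead.

```python
# Toy residue groups for illustration only; NOT taken from PAM or BLOSUM.
TOY_GROUPS = ["ILVM", "DE", "KR", "FWY", "ST"]


def toy_score(a, b):
    """Made-up weights: 2 for identity, 1 within a toy group, else -1."""
    if a == b:
        return 2
    if any(a in group and b in group for group in TOY_GROUPS):
        return 1
    return -1


def weighted_dots(seq_a, seq_b, window=3, min_average=1.0, score=toy_score):
    """Return (i, j) where the average pair score over the window starting
    at seq_a[i] and seq_b[j] reaches `min_average`."""
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            total = sum(score(seq_a[i + k], seq_b[j + k]) for k in range(window))
            if total / window >= min_average:
                dots.append((i, j))
    return dots
```

Because similar but non-identical residues still contribute a positive weight, this variant can keep a diagonal that an identity-only comparison would break up.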