Pairwise alignment. Lesson 6. Based on a presentation by Irit Gat-Viks, which is based on presentations by Amir Mitchel (Introduction to bioinformatics course, Bioinformatics unit, Tel Aviv University) and Benny Shomer (Bar-Ilan University).
Definition
Alignment: comparing two (pairwise) or more (multiple) sequences, searching for a series of identical characters in the sequences.
VLSPADKTNVKAAWAKVGAHAAGHG
||| |    |   |||| |  ||||
VLSEAEWQLVLHVWAKVEADVAGHG
Sequence comparisons
• Goal: comparing two specific sequences
– Single pairwise comparisons
– We wish to optimize for accuracy, not speed
– Dynamic programming methods (Smith-Waterman, Needleman-Wunsch)
– Identify homologs, common domains, common active sites, etc.
• Goal: similarity search on a sequence database
– Multiple pairwise comparisons
– We wish to optimize for speed, not accuracy
– BLAST, FASTA programs
– Next goal: refine the database search. Are the reported matches really interesting?
How similar are two sequences?
• The common measure of sequence similarity is their alignment score.
• Simpler measures, e.g., % identity, are also common.
• Both require an algorithm that computes the optimal alignment between the sequences.
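As a small illustration of the simpler measure, the sketch below computes % identity for two sequences that have already been aligned. The function name and the choice of denominator (the alignment length, one of several conventions in use) are assumptions made for this example, not something the slides specify.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns holding the same residue in both rows.

    Assumption: the denominator is the full alignment length; other
    conventions (e.g. the shorter sequence length) are also used.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = len(aligned_a)
    if columns == 0:
        return 0.0
    # A column counts as an identity only if both residues match and are not gaps.
    identical = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * identical / columns


# The ungapped example pair from the Definition slide: 14 of 25 columns match.
print(percent_identity("VLSPADKTNVKAAWAKVGAHAAGHG",
                       "VLSEAEWQLVLHVWAKVEADVAGHG"))
```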
Comparison methods
• Global alignment – Finds the best alignment across the whole two sequences.
• Local alignment – Finds regions of similarity in parts of the sequences.
[Figure: schematic of a global alignment versus a local alignment]
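To make the local case concrete, here is a minimal Smith-Waterman-style sketch that returns only the best local score. The match, mismatch and gap values are illustrative assumptions, and a linear gap penalty is used for brevity.

```python
def local_alignment_score(s: str, t: str,
                          match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score (Smith-Waterman style, linear gap penalty).

    Scores are hypothetical defaults chosen for illustration only.
    """
    best = 0
    prev = [0] * (len(t) + 1)          # previous row of the DP table
    for i in range(1, len(s) + 1):
        curr = [0] * (len(t) + 1)      # current row; column 0 stays 0
        for j in range(1, len(t) + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            # The 0 option lets a local alignment restart anywhere.
            curr[j] = max(0,
                          prev[j - 1] + sub,   # align s[i-1] with t[j-1]
                          prev[j] + gap,       # gap in t
                          curr[j - 1] + gap)   # gap in s
            best = max(best, curr[j])
        prev = curr
    return best


# Only the shared core "ACGT" contributes to the score here.
print(local_alignment_score("XXXACGTYYY", "ZZACGTZZ"))
```

Unlike a global alignment, the unrelated flanking residues do not lower the score, because the alignment is free to start and end inside the sequences.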
Pairwise Alignment - Scoring
• The final score of the alignment is the sum of the positive scores and penalty scores:
Alignment score =
+ Number of identities
+ Number of similarities
- Number of gap insertions
- Number of gap extensions
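The scoring scheme above can be sketched as a function over two already-aligned rows. The weights, the `similar` set of residue pairs and the function name are all hypothetical choices for illustration; real tools take these values from a scoring matrix and gap parameters.

```python
def alignment_score(row_a: str, row_b: str, similar=frozenset(),
                    identity: int = 2, similarity: int = 1,
                    gap_open: int = 3, gap_extend: int = 1) -> int:
    """Sum identity/similarity rewards and gap penalties over alignment columns.

    `similar` is a set of frozenset({x, y}) residue pairs treated as similar
    (an assumption of this sketch). Dissimilar pairs contribute 0.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    score = 0
    gap_side = None  # which row carried a gap in the previous column
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            # First column of a gap pays the insertion penalty, later ones the extension.
            score -= gap_extend if side == gap_side else gap_open
            gap_side = side
        else:
            gap_side = None
            if x == y:
                score += identity
            elif frozenset((x, y)) in similar:
                score += similarity
    return score


similar_pairs = {frozenset("ST"), frozenset("DE")}
# Columns: V/V +2, L/L +2, S/T +1, gap open -3, gap extend -1, A/A +2, D/E +1
print(alignment_score("VLS--AD", "VLTPEAE", similar_pairs))
```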
Intuition of Dynamic Programming
If we already have the optimal solution to:
XY
AB
then we know the next pair of characters will be one of:
XYZ      XY-      XYZ
ABC  or  ABC  or  AB-
(where "-" indicates a gap). So we can extend the match by determining which of these has the highest score.
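V(i,j) := the optimal score of the alignment of S' = s_1...s_i and T' = t_1...t_j (0 ≤ i ≤ n, 0 ≤ j ≤ m), where σ(x,y) denotes the score of aligning x with y and "-" is a space.
V(i,j) has the following properties:
• Base conditions (alignment with 0 elements, i.e. spacing only):
\[ V(i,0) = \sum_{k=1}^{i} \sigma(s_k, -), \qquad V(0,j) = \sum_{k=1}^{j} \sigma(-, t_k) \]
• Recurrence relation, for 1 ≤ i ≤ n, 1 ≤ j ≤ m:
\[ V(i,j) = \max \begin{cases} V(i-1,j-1) + \sigma(s_i, t_j) \\ V(i-1,j) + \sigma(s_i, -) \\ V(i,j-1) + \sigma(-, t_j) \end{cases} \]
The three cases correspond to:
– aligning S' = s_1...s_{i-1} with T' = t_1...t_{j-1}, and s_i with t_j;
– aligning S' = s_1...s_{i-1} with T' = t_1...t_j, and s_i with '-';
– aligning S' = s_1...s_i with T' = t_1...t_{j-1}, and '-' with t_j.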
Optimal Alignment - Tabular Computation (backtracking the alignment)
• Add back pointer(s) from cell (i,j) to the father cell(s) realizing V(i,j).
• Trace back the pointers from (n,m) to (0,0).
• Needleman-Wunsch, '70
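The tabular computation and the traceback can be sketched together as follows. This is a minimal Needleman-Wunsch-style illustration, assuming a simple match/mismatch score and a linear gap penalty in place of a full scoring matrix; it keeps one back pointer per cell, so it reports a single optimal alignment even when several exist.

```python
def needleman_wunsch(s: str, t: str,
                     match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment by dynamic programming with back pointers.

    Returns (score, aligned_s, aligned_t). Scores are illustrative assumptions.
    """
    n, m = len(s), len(t)
    V = [[0] * (m + 1) for _ in range(n + 1)]
    ptr = [[None] * (m + 1) for _ in range(n + 1)]

    # Base conditions: a prefix aligned against nothing is all gaps.
    for i in range(1, n + 1):
        V[i][0], ptr[i][0] = V[i - 1][0] + gap, "up"
    for j in range(1, m + 1):
        V[0][j], ptr[0][j] = V[0][j - 1] + gap, "left"

    # Recurrence: best of diagonal (pair), up (gap in t), left (gap in s).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            candidates = [
                (V[i - 1][j - 1] + sub, "diag"),
                (V[i - 1][j] + gap, "up"),
                (V[i][j - 1] + gap, "left"),
            ]
            V[i][j], ptr[i][j] = max(candidates, key=lambda c: c[0])

    # Traceback from (n, m) to (0, 0), building the alignment in reverse.
    row_s, row_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        move = ptr[i][j]
        if move == "diag":
            row_s.append(s[i - 1]); row_t.append(t[j - 1]); i -= 1; j -= 1
        elif move == "up":
            row_s.append(s[i - 1]); row_t.append("-"); i -= 1
        else:
            row_s.append("-"); row_t.append(t[j - 1]); j -= 1
    return V[n][m], "".join(reversed(row_s)), "".join(reversed(row_t))


score, a, b = needleman_wunsch("ACGT", "AGT")
print(score)
print(a)
print(b)
```

Ties are broken in favour of the diagonal move because `max` returns the first maximal candidate; storing all maximal pointers instead would allow every optimal alignment to be enumerated.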
PAM vs. BLOSUM
• Choosing n
• Different BLOSUM matrices are derived from blocks with different identity percentages (e.g., BLOSUM62 is derived from blocks in which sequences sharing at least 62% identity are clustered together). Larger n → smaller evolutionary distance.
• PAM1 was constructed from a dataset of sequences with at least 85% identity. The other PAM matrices were derived from it computationally. Larger n → larger evolutionary distance.
• BLOSUM is based on more sequences.
[Figure labels: 62, 120, 250]
DNA scoring matrices
• Non-uniform substitutions in all nucleotides:
[Figure: DNA scoring matrix distinguishing match, transition mismatch and transversion mismatch]
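A non-uniform DNA scoring scheme of this kind can be sketched as a small function. The numeric scores below are hypothetical; the only point illustrated is that a transition (purine to purine, or pyrimidine to pyrimidine) is penalized less than a transversion.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def dna_score(x: str, y: str,
              match: int = 1, transition: int = -1, transversion: int = -2) -> int:
    """Score one pair of nucleotides; the default values are illustrative only."""
    x, y = x.upper(), y.upper()
    for base in (x, y):
        if base not in PURINES and base not in PYRIMIDINES:
            raise ValueError(f"unexpected nucleotide: {base!r}")
    if x == y:
        return match
    # Transition: both bases fall in the same chemical class.
    same_class = (x in PURINES) == (y in PURINES)
    return transition if same_class else transversion


print(dna_score("A", "A"))  # match
print(dna_score("A", "G"))  # transition
print(dna_score("A", "C"))  # transversion
```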
Topics to be Covered
• Introduction
• Comparison methods
– Global, local alignment
• Alignment parameters
• Alignment scoring matrices – proteins
• Alignment scoring matrices – DNA
• Evaluation
• Comparison programs
• Choosing between global / local alignment
Example: Global or local?
• Two human transcription factors:
• SP1 factor, binds to GC-rich areas.
• EGR-1 factor, active at the differentiation stage.
(FASTA formats from http://us.expasy.org/sprot/)
>sp|P08047|SP1_HUMAN Transcription factor Sp1 - Homo sapiens (Human).
MSDQDHSMDEMTAVVKIEKGVGGNNGGNGNGGGAFSQARSSSTGSSSSTGGGGQESQPSP
LALLAATCSRIESPNENSNNSQGPSQSGGTGELDLTATQLSQGANGWQIISSSSGATPTS
KEQSGSSTNGSNGSESSKNRTVSGGQYVVAAAPNLQNQQVLTGLPGVMPNIQYQVIPQFQ
TVDGQQLQFAATGAQVQQDGSGQIQIIPGANQQIITNRGSGGNIIAAMPNLLQQAVPLQG
LANNVLSGQTQYVTNVPVALNGNITLLPVNSVSAATLTPSSQAVTISSSGSQESGSQPVT
SGTTISSASLVSSQASSSSFFTNANSYSTTTTTSNMGIMNFTTSGSSGTNSQGQTPQRVS
GLQGSDALNIQQNQTSGGSLQAGQQKEGEQNQQTQQQQILIQPQLVQGGQALQALQAAPL
SGQTFTTQAISQETLQNLQLQAVPNSGPIIIRTPTVGPNGQVSWQTLQLQNLQVQNPQAQ
TITLAPMQGVSLGQTSSSNTTLTPIASAASIPAGTVTVNAAQLSSMPGLQTINLSALGTS
GIQVHPIQGLPLAIANAPGDHGAQLGLHGAGGDGIHDDTAGGEEGENSPDAQPQAGRRTR
REACTCPYCKDSEGRGSGDPGKKKQHICHIQGCGKVYGKTSHLRAHLRWHTGERPFMCTW
SYCGKRFTRSDELQRHKRTHTGEKKFACPECPKRFMRSDHLSKHIKTHQNKKGGPGVALS
VGTLPLDSGAGSEGSGTATPSALITTNMVAMEAICPEGIARLANSGINVMQVADLQSINI
SGNGF
>sp|P18146|EGR1_HUMAN Early growth response protein 1 (EGR-1) (Krox-24 protein) (ZIF268) (Nerve growth factor-induced protein A) (NGFI-A) (Transcription factor ETR103) (Zinc finger protein 225) (AT225) - Homo sapiens (Human).
MAAAKAEMQLMSPLQISDPFGSFPHSPTMDNYPKLEEMMLLSNGAPQFLGAAGAPEGSGS
NSSSSSSGGGGGGGGGSNSSSSSSTFNPQADTGEQPYEHLTAESFPDISLNNEKVLVETS
YPSQTTRLPPITYTGRFSLEPAPNSGNTLWPEPLFSLVSGLVSMTNPPASSSSAPSPAAS
SASASQSPPLSCAVPSNDSSPIYSAAPTFPTPNTDIFPEPQSQAFPGSAGTALQYPPPAY
PAAKGGFQVPMIPDYLFPQQQGDLGLGTPDQKPFQGLESRTQQPSLTPLSTIKAFATQSG
SQDLKALNTSYQSQLIKPSRMRKYPNRPSKTPPHERPYACPVESCDRRFSRSDELTRHIR
IHTGQKPFQCRICMRNFSRSDHLTTHIRTHTGEKPFACDICGRKFARSDERKRHTKIHLR
QKDKKADKSVVASSATSSLSSYPSPVATSYPSPVTTSYPSPATTSYPSPVPTSFSSPGSS
TYPSPVHSGFPSPSVATTYSSVPPAFPAQVSSFPSSAVTNSFSASTGLSDMTATFSPRTI
EIC
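Before feeding such records to an alignment program, they usually need to be read into memory. The sketch below is a minimal FASTA reader, assuming the standard layout in which each header starts with ">" on its own line and the sequence follows on one or more lines; the function name and the toy records in the usage example are hypothetical.

```python
def parse_fasta(text: str) -> dict:
    """Map each FASTA header (without the leading '>') to its joined sequence."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)           # sequence line of the current record
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Toy input (made-up identifiers, not the real SP1/EGR-1 entries).
example = """>seq1 first toy record
MSDQDH
SMDE
>seq2 second toy record
MAAAKAEM
"""
for name, seq in parse_fasta(example).items():
    print(name, len(seq))
```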
Available software
• http://en.wikipedia.org/wiki/Sequence_alignment_software
• http://fasta.bioch.virginia.edu/fasta_www/home.html
• LAlign (local alignment), PLalign (dot plot)
• PRSS / PRFX (significance by Monte Carlo)
• http://bioportal.weizmann.ac.il/toolbox/overview.html (many useful programs), Needle, Water.
• Bl2seq (NCBI)
Using LAlign • http://www.ch.embnet.org/software/LALIGN_form.html • http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NP_006758.2 • http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NP_066300.1
Bl2Seq at NCBI
http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi
Conclusions
• The proteins share only a limited area of sequence similarity. Therefore, the use of local alignment is recommended.
• We found a local alignment that pointed to a possible structural similarity, which in turn points to a possible functional similarity.
• Reasons to use global alignment:
• Checking minor differences between close homologs.
• Analyzing polymorphism.
• A good reason