Multiple Sequence Alignment: NP-Hardness and How to Deal with It

Jens Stoye, Bielefeld University, Germany

Presentation Transcript


  1. Multiple Sequence Alignment: NP-Hardness and How to Deal with It
     Jens Stoye, Bielefeld University, Germany

  2. Preliminaries: Pairwise Alignment

     >pdb|1KSW|A Chain A, Structure Of Human C-Src Tyrosine Kinase (Thr338gly Mutant) In Complex With N6-Benzyl Adp
     Length=452

      Score = 161 bits (408), Expect = 5e-47, Method: Compositional matrix adjust.
      Identities = 81/85 (95%), Positives = 81/85 (95%), Gaps = 1/85 (1%)

     Query  1    PRESLRLEAKLGQGCFGEVWMGTWNDTTRVAIKTLKPGTMSPEAFLQEAQVMKKLRHEKL  60
                 PRESLRLE KLGQGCFGEVWMGTWN TTRVAIKTLKPGTMSPEAFLQEAQVMKKLRHEKL
     Sbjct  182  PRESLRLEVKLGQGCFGEVWMGTWNGTTRVAIKTLKPGTMSPEAFLQEAQVMKKLRHEKL  241

     Query  61   VQLYAVVS-EPIYIVIEYMSKGSLL  84
                 VQLYAVVS EPIYIV EYMSKGSLL
     Sbjct  242  VQLYAVVSEEPIYIVGEYMSKGSLL  266

  3. Preliminaries: Pairwise Alignment
     Find the best alignment of two sequences: highest score/lowest cost
     Analysis: O(n^2) time for two sequences of length n
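The quadratic bound comes from filling one table entry per pair of prefixes. Below is a minimal Python sketch of this dynamic program (Needleman-Wunsch style global alignment), assuming a simple unit cost model (mismatch and gap each cost 1, matches are free) instead of the amino acid substitution scores used for the hit on slide 2. The name align_pair is our own, not from the slides.

```python
def align_pair(a: str, b: str, mismatch: int = 1, gap: int = 1) -> tuple[int, str, str]:
    """Optimal global alignment of a and b under a simple cost model.

    Returns (cost, row_a, row_b). The table has (len(a)+1) x (len(b)+1) entries,
    each filled in constant time, hence O(n^2) time for two sequences of length n.
    """
    n, m = len(a), len(b)
    # D[i][j] = minimum cost of aligning the prefixes a[:i] and b[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap                      # a[:i] against gaps only
    for j in range(1, m + 1):
        D[0][j] = j * gap                      # b[:j] against gaps only

    def sub(i: int, j: int) -> int:
        return 0 if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j - 1] + sub(i, j),  # match / mismatch
                          D[i - 1][j] + gap,            # gap in b
                          D[i][j - 1] + gap)            # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + sub(i, j):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return D[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


cost, top, bottom = align_pair("VQLYAVVSEPIYIVIEYMSKGSLL", "VQLYAVVSEEPIYIVGEYMSKGSLL")
print(cost)  # 2
print(top)
print(bottom)
```

Applied to the second part of the alignment on slide 2 (with the gap removed), it finds cost 2 under these unit costs: one gap plus the I/G mismatch.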

  4. Multiple Alignment: k sequences, not just 2

     sp|P00526|SRC  ---GLAK--DAWEIPRESLRLEAKLGQGCFGEVWMGTWND-TTRVAIKTLKPGT--MSPE 52
     sp|P00527|YES  ---GLAK--DAWEIPRESLRLEVKLGQGCFGEVWMGTWNG-TTKVAIKTLKLGT--MMPE 52
     sp|P00521|ABL  TIYGVSPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSLTVAVKTLKEDT--MEVE 58
     sp|P00542|FES  -VLNRAVPKDKWVLNHEDLVLGEQIGRGNFGEVFSGRLRADNTLVAVKSCRETLPPDIKA 59
     sp|P00530|FPS  -VLTRAVLKDKWVLNHEDVLLGERIGRGNFGEVFSGRLRADNTPVAVKSCRETLPPELKA 59
     sp|P00532|KRAF -------SSYYWKMEASEVMLSTRIGSGSFGTVYKGKWHGDVAVKILKVVDPTP--EQLQ 51

     sp|P00526|SRC  AFLQEAQVMKKLRHEKLVQLYAVVSEEP-IYIVIEYMSKGSLLDFLKGEMGKYLRLPQLV 111
     sp|P00527|YES  AFLQEAQIMKKLRHDKLVPLYAVVSEEP-IYIVTEFMTKGSLLDFLKEGEGKFLKLPQLV 111
     sp|P00521|ABL  EFLKEAAVMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVSAVVLL 118
     sp|P00542|FES  KFLQEAKILKQYSHPNIVRLIGVCTQKQPIYIVMELVQGGDFLTFLRT-EGARLRMKTLL 118
     sp|P00530|FPS  KFLQEARILKQCNHPNIVRLIGVCTQKQPIYIVMELVQGGDFLSFLRS-KGPRLKMKKLI 118
     sp|P00532|KRAF AFRNEVAVLRKTRHVNILLFMGYMTKDN-LAIVTQWCEGSSLYKHLHV-QETKFQMFQLI 109

     sp|P00526|SRC  DMAAQIASGMAYVERMNYVHRDLRAANILVGENLVCKVADFGLARLIEDNEYTARQGAK- 170
     sp|P00527|YES  DMAAQIADGMAYIERMNYIHRDLRAANILVGDNLVCKIADFGLARLIEDNEYTARQGAK- 170
     sp|P00521|ABL  YMATQISSAMEYLEKKNFIHRDLAARNCLVGENHLVKVADFGLSRLMTGDTYTAHAGAK- 177
     sp|P00542|FES  QMVGDAAAGMEYLESKCCIHRDLAARNCLVTEKNVLKISDFGMSREAADGIYAASGGLRQ 178
     sp|P00530|FPS  KMMENAAAGMEYLESKHCIHRDLAARNCLVTEKNTLKISDFGMSRQEEDGVYASTGGMKQ 178
     sp|P00532|KRAF DIARQTAQGMDYLHAKNIIHRDMKSNNIFLHEGLTVKIGDFGLATVKSRWSGSQQVEQPT 169

     sp|P00526|SRC  FPIKWTAPEAALYG---RFTIKSDVWSFGILLTELTTKGRVPYPGMVNR-EVLDQVERGY 226
     sp|P00527|YES  FPIKWTAPEAALYG---RFTIKSDVWSFGILLTELVTKGRVPYPGMVNR-EVLEQVERGY 226
     sp|P00521|ABL  FPIKWTAPESLAYN---KFSIKSDVWAFGVLLWEIATYGMSPYPGIDLS-QVYELLEKDY 233
     sp|P00542|FES  VPVKWTAPEALNYG---RYSSESDVWSFGILLWETFSLGASPYPNLSNQ-QTREFVEKGG 234
     sp|P00530|FPS  IPVKWTAPEALNYG---WYSSESDVWSFGILLWEAFSLGAVPYANLSNQ-QTREAIEQGV 234
     sp|P00532|KRAF GSVLWMAPEVIRMQDDNPFSFQSDVYSYGIVLYELMAG-ELPYAHINNRDQIIFMVGRGY 228

     sp|P00526|SRC  RMPCP----PECPESLHDLMCQCWRKDPEERPTFKYLQAQLLPACVLEVAE- 273
     sp|P00527|YES  RMPCP----QGCPESLHELMKLCWKKDPDERPTFEYIQSFLEDYFTAAEPSG 274
     sp|P00521|ABL  RMERP----EGCPEKVYELMRACWQWNPSDRPSFAEIHQAFETMFQESSIS- 280
     sp|P00542|FES  RLPCP----ELCPDAVFRLMEQCWAYEPGQRPSFSAIYQELQSIRKRHR--- 279
     sp|P00530|FPS  RLEPP----EQCPEDVYRLMQRCWEYDPHRRPSFGAVHQDLIAIRKRHR--- 279
     sp|P00532|KRAF ASPDLSRLYKNCPKAIKRLVADCVKKVKEERPLFPQILSSIELLQHSLPKIN 280

  5. Multiple Alignment: k sequences, not just 2

  6. Multiple Alignment – Why?
     Highlight similarities of the sequences in a family:
     • sequence assembly
     • molecular modeling, structure-function conclusions
     • database search (sequence families)
     • protein domains
     • primer design
     Highlight dissimilarities between the sequences in a family:
     • reconstruction of phylogenetic trees
     • analysis of single nucleotide polymorphisms (SNPs)
     "One or two homologous sequences whisper... a full multiple alignment shouts out loud" (Hubbard et al., 1996)

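  7. Multiple Alignment Objective Functions
     • Find the best alignment of k sequences: highest score/lowest cost
     • Based on pairwise projections $A|_{i,j}$ (rows i and j of the alignment A, with columns that contain only gaps removed):
       • sum of all pairs: $\mathrm{SP}(A) = \sum_{1 \le i < j \le k} \mathrm{score}(A|_{i,j})$
       • tree alignment score: $\mathrm{TA}(A) = \sum_{(u,v) \in E(T)} \mathrm{score}(A|_{u,v})$, for a given tree T with the input sequences at its leaves and inferred sequences at its inner nodes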
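Both objectives only ever look at pairwise projections, so evaluating a given alignment is straightforward. A minimal sketch, assuming unit costs (a mismatch or a letter opposite a gap costs 1, identical characters cost 0) and that rows use "-" for gaps; the function names are hypothetical:

```python
from itertools import combinations


def projection_cost(x: str, y: str) -> int:
    """Unit cost of the pairwise projection of two alignment rows.

    Columns where both rows have a gap disappear in the projection; every other
    column costs 1 if the characters differ (mismatch or letter vs. gap), else 0.
    """
    return sum(c != d for c, d in zip(x, y) if not (c == "-" and d == "-"))


def sp_cost(rows: list[str]) -> int:
    """Sum-of-pairs cost: total projection cost over all k(k-1)/2 pairs of rows."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all rows of an alignment must have the same length")
    return sum(projection_cost(x, y) for x, y in combinations(rows, 2))


def tree_cost(rows: list[str], edges: list[tuple[int, int]]) -> int:
    """Tree alignment cost: projection costs summed over the edges of a given tree.

    `rows` must also contain the (inferred) sequences at the inner tree nodes.
    """
    return sum(projection_cost(rows[u], rows[v]) for u, v in edges)


rows = ["AC-GT", "ACAGT", "A--GT"]
print(sp_cost(rows))                      # 4
print(tree_cost(rows, [(1, 0), (1, 2)]))  # 3, star-shaped tree centred on row 1
```

With unit costs a gap opposite a gap contributes nothing anyway, so removing all-gap columns does not change the numbers here; it matters once gap costs depend on gap length.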

  8. Alignment of 2 Sequences
     2 sequences: O(n^2) time

  9. Alignment of 3 Sequences
     3 sequences: O(n^3) time

  10. Alignment of k Sequences
      k sequences: O(n^k) time
      in fact even worse: O(2^k n^k) time, because each of the O(n^k) table cells has 2^k - 1 possible predecessors
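The exact dynamic program for the SP objective generalizes the pairwise table to a k-dimensional one. A minimal sketch, assuming unit costs and returning only the optimal cost (no traceback); the function names are hypothetical. The nested loops make the cost visible: (n+1)^k cells, each trying up to 2^k - 1 predecessor steps, plus an O(k^2) factor for scoring each column.

```python
from itertools import combinations, product


def column_cost(column: list[str]) -> int:
    """Unit-cost SP score of one column: each pair of differing characters costs 1."""
    return sum(x != y for x, y in combinations(column, 2))


def exact_sp_cost(seqs: list[str]) -> int:
    """Optimal sum-of-pairs cost of aligning all sequences in seqs."""
    k = len(seqs)
    # All 2^k - 1 nonzero 0/1 vectors: e[i] == 1 means sequence i contributes a letter
    steps = [e for e in product((0, 1), repeat=k) if any(e)]
    D = {}
    # product() over the ranges yields cells in lexicographic order, so every
    # predecessor (componentwise smaller) has already been computed
    for cell in product(*(range(len(s) + 1) for s in seqs)):
        if not any(cell):
            D[cell] = 0  # origin: all prefixes empty
            continue
        best = None
        for e in steps:
            prev = tuple(c - d for c, d in zip(cell, e))
            if min(prev) < 0:
                continue
            column = [seqs[i][cell[i] - 1] if e[i] else "-" for i in range(k)]
            cand = D[prev] + column_cost(column)
            if best is None or cand < best:
                best = cand
        D[cell] = best
    return D[tuple(len(s) for s in seqs)]


print(exact_sp_cost(["ACGT", "ACGT", "AGT"]))  # 2
```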

  11. NP-Hardness
      CS terminology: The computational problem of SP multiple sequence alignment is NP-hard.
      In practice: Don't even try it for more than 10 or 12 sequences.
      What can we do?
      • compute anyway
      • running time heuristics
      • approximation algorithms
      • fixed-parameter algorithms
      • correctness heuristics
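A back-of-the-envelope calculation shows why the plain dynamic program, without any pruning, breaks down so quickly. Assuming, purely for illustration, sequences of length n = 100 (a value not taken from the slides), the O(2^k n^k) bound gives:

$$
k = 3:\quad 2^{3} \cdot 100^{3} = 8 \cdot 10^{6}, \qquad
k = 10:\quad 2^{10} \cdot 100^{10} = 1024 \cdot 10^{20} \approx 10^{23}
$$

The first is instant; the second, even at 10^9 steps per second, would take about 10^14 seconds, i.e. millions of years.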

  12. Carrillo/Lipman Heuristics
      Running time heuristics: often faster, but not in the worst case.
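A minimal sketch of the pruning test behind this approach, assuming unit costs; function names are hypothetical. For each pair p, q we precompute d_pq(i, j), the cost of the cheapest pairwise alignment of s_p and s_q whose path passes through lattice point (i, j) (prefix cost plus suffix cost). Any multiple alignment passing through cell x has SP cost at least the sum of d_pq(x_p, x_q) over all pairs, so given an upper bound U on the optimum (for example the cost of any heuristic alignment), cells where this sum exceeds U cannot lie on an optimal path. For clarity the sketch still scans the whole lattice; a real implementation gains its speed by never visiting the excluded cells.

```python
from itertools import combinations, product


def prefix_costs(a: str, b: str) -> list[list[int]]:
    """P[i][j] = optimal unit-cost alignment cost of a[:i] with b[:j]."""
    n, m = len(a), len(b)
    P = [[i + j if i == 0 or j == 0 else 0 for j in range(m + 1)] for i in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            P[i][j] = min(P[i - 1][j - 1] + (a[i - 1] != b[j - 1]),
                          P[i - 1][j] + 1,
                          P[i][j - 1] + 1)
    return P


def through_costs(a: str, b: str) -> list[list[int]]:
    """d[i][j] = cost of the cheapest alignment of a and b passing through (i, j)."""
    fwd = prefix_costs(a, b)
    bwd = prefix_costs(a[::-1], b[::-1])  # bwd[n-i][m-j] = optimal cost of a[i:] vs b[j:]
    n, m = len(a), len(b)
    return [[fwd[i][j] + bwd[n - i][m - j] for j in range(m + 1)] for i in range(n + 1)]


def carrillo_lipman_cells(seqs: list[str], upper_bound: int) -> list[tuple[int, ...]]:
    """Lattice cells that can still lie on an alignment with SP cost <= upper_bound."""
    k = len(seqs)
    d = {(p, q): through_costs(seqs[p], seqs[q]) for p, q in combinations(range(k), 2)}
    admissible = []
    # Full scan only to demonstrate the test; it saves no time by itself
    for cell in product(*(range(len(s) + 1) for s in seqs)):
        lower_bound = sum(d[p, q][cell[p]][cell[q]] for p, q in d)
        if lower_bound <= upper_bound:
            admissible.append(cell)
    return admissible


seqs = ["ACGT", "ACGT", "AGT"]
print(len(carrillo_lipman_cells(seqs, 2)), "of", 5 * 5 * 4, "cells remain")
```

The tighter the upper bound U, the more cells are excluded; with a useless bound (U very large) nothing is pruned, which is why the method gives no worst-case improvement.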

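  13. Center Star Algorithm
      Approximation algorithm: never worse than 2 times the optimum (more precisely, at most 2 - 2/k times the optimal SP cost, for cost functions satisfying the triangle inequality)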
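A minimal sketch of the center star method, assuming unit costs (an edit distance, which satisfies the triangle inequality the approximation bound relies on); all function names are hypothetical. It picks the sequence with the smallest total distance to all others as the center, aligns every other sequence optimally to it, and merges these pairwise alignments into one multiple alignment, never removing a gap once it has been inserted.

```python
def align_pair(a: str, b: str) -> tuple[int, str, str]:
    """Optimal unit-cost global alignment of a and b: (cost, row_a, row_b)."""
    n, m = len(a), len(b)
    D = [[i + j if i == 0 or j == 0 else 0 for j in range(m + 1)] for i in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j - 1] + (a[i - 1] != b[j - 1]),
                          D[i - 1][j] + 1, D[i][j - 1] + 1)
    ra, rb, i, j = [], [], n, m
    while i > 0 or j > 0:  # traceback
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            ra.append(a[i - 1])
            rb.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            ra.append(a[i - 1])
            rb.append("-")
            i -= 1
        else:
            ra.append("-")
            rb.append(b[j - 1])
            j -= 1
    return D[n][m], "".join(reversed(ra)), "".join(reversed(rb))


def merge(rows: list[str], center_row: str, other_row: str) -> list[str]:
    """Add one pairwise alignment (center vs. new sequence) to the growing MSA.

    rows[0] is the center as it appears in the MSA; both center rows spell the
    same sequence, so we walk them in parallel ("once a gap, always a gap").
    """
    cols, msa_center = list(zip(*rows)), rows[0]
    out, i, j = [], 0, 0
    while i < len(msa_center) or j < len(center_row):
        if i < len(msa_center) and msa_center[i] == "-":
            out.append(cols[i] + ("-",))  # existing gap column: new row gets a gap
            i += 1
        elif j < len(center_row) and center_row[j] == "-":
            out.append(("-",) * len(rows) + (other_row[j],))  # new gap column in old rows
            j += 1
        else:
            out.append(cols[i] + (other_row[j],))  # same center letter in both
            i, j = i + 1, j + 1
    return ["".join(col[r] for col in out) for r in range(len(rows) + 1)]


def center_star(seqs: list[str]) -> list[str]:
    k = len(seqs)
    dist = [[align_pair(a, b)[0] for b in seqs] for a in seqs]
    c = min(range(k), key=lambda i: sum(dist[i]))  # center: minimal total distance
    rows, order = [seqs[c]], [c]
    for s in range(k):
        if s != c:
            _, center_row, other_row = align_pair(seqs[c], seqs[s])
            rows = merge(rows, center_row, other_row)
            order.append(s)
    result = [""] * k
    for idx, row in zip(order, rows):  # restore input order
        result[idx] = row
    return result


print(center_star(["ACGT", "ACGT", "AGT"]))  # ['ACGT', 'ACGT', 'A-GT']
```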

  14. Divide and Conquer Alignment
      No performance guarantee, but often very good

  15. Multiple Alignment in Practice
      Mostly progressive, e.g. CLUSTAL W
      Not covered:
      • hybrid approaches, e.g. T-COFFEE, MAUVE, Clustal Omega
      • local multiple alignment, e.g. DIALIGN
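The progressive idea itself is short: build a guide tree from pairwise distances, then repeatedly align alignments ("profiles") along that tree. A minimal sketch under loud simplifications: unit costs, a UPGMA-style average-linkage merge order instead of CLUSTAL W's neighbour-joining guide tree, and no sequence weighting or position-specific gap penalties. All function names are hypothetical.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Unit-cost optimal pairwise alignment cost (used for the merge order)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j - 1] + (x != y), prev[j] + 1, cur[j - 1] + 1))
        prev = cur
    return prev[-1]


def align_profiles(A: list[str], B: list[str]) -> list[str]:
    """Align two alignments column-wise, minimising the SP cost between their rows."""
    def cost(X: tuple, Y: tuple) -> int:
        # every cross pair of differing characters (incl. letter vs. gap) costs 1
        return sum(x != y for x in X for y in Y)

    cols_a, cols_b = list(zip(*A)), list(zip(*B))
    gap_a, gap_b = ("-",) * len(A), ("-",) * len(B)
    n, m = len(cols_a), len(cols_b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + cost(cols_a[i - 1], gap_b)
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + cost(gap_a, cols_b[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j - 1] + cost(cols_a[i - 1], cols_b[j - 1]),
                          D[i - 1][j] + cost(cols_a[i - 1], gap_b),
                          D[i][j - 1] + cost(gap_a, cols_b[j - 1]))
    merged, i, j = [], n, m
    while i > 0 or j > 0:  # traceback; existing columns are never split
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + cost(cols_a[i - 1], cols_b[j - 1]):
            merged.append(cols_a[i - 1] + cols_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + cost(cols_a[i - 1], gap_b):
            merged.append(cols_a[i - 1] + gap_b)
            i -= 1
        else:
            merged.append(gap_a + cols_b[j - 1])
            j -= 1
    merged.reverse()
    return ["".join(col[r] for col in merged) for r in range(len(A) + len(B))]


def progressive_align(seqs: list[str]) -> list[str]:
    dist = {(i, j): edit_distance(seqs[i], seqs[j])
            for i, j in combinations(range(len(seqs)), 2)}

    def avg_dist(ca: list[int], cb: list[int]) -> float:
        # average linkage between two clusters, as in UPGMA
        return sum(dist[min(i, j), max(i, j)] for i in ca for j in cb) / (len(ca) * len(cb))

    # each cluster: (input indices, aligned rows in the same order)
    clusters = [([i], [s]) for i, s in enumerate(seqs)]
    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: avg_dist(clusters[p[0]][0], clusters[p[1]][0]))
        (idx_a, rows_a), (idx_b, rows_b) = clusters[a], clusters[b]
        merged = (idx_a + idx_b, align_profiles(rows_a, rows_b))
        clusters = [c for t, c in enumerate(clusters) if t not in (a, b)] + [merged]
    order, rows = clusters[0]
    result = [""] * len(seqs)
    for idx, row in zip(order, rows):  # restore input order
        result[idx] = row
    return result


print(progressive_align(["ACGT", "ACGT", "AGT"]))  # ['ACGT', 'ACGT', 'A-GT']
```

Because each merge step aligns whole profiles and never revisits earlier decisions, a mistake made early in the guide tree cannot be corrected later; this greediness is what makes progressive alignment fast.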

  16. Thank you. Any further questions?
