
T-COFFEE, a novel method for combining biological information


Presentation Transcript


  1. T-COFFEE, a novel method for combining biological information. Cédric Notredame

  2. Potential Uses of A Multiple Sequence Alignment?

     chite  ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
     wheat  --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
     trybr  KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
     mouse  -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
                  ***. ::: .: ..  .    :  . .      *  .  *: *

     chite  AATAKQNYIRALQEYERNGG-
     wheat  ANKLKGEYNKAIAAYNKGESA
     trybr  AEKDKERYKREM---------
     mouse  AKDDRIRYDNEMKSWEEQMAE
            *   : .* . :

     Multiple Alignments Are CENTRAL to MOST Bioinformatics Techniques: Extrapolation, Phylogeny, Motifs/Patterns, Structure Prediction, Profiles.

  3. Why Is It Difficult To Compute A Multiple Sequence Alignment? A CROSSROAD PROBLEM. BIOLOGY: What is A Good Alignment? COMPUTATION: What is THE Good Alignment?

     chite  ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
     wheat  --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
     trybr  KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
     mouse  -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
                  ***. ::: .: ..  .    :  . .      *  .  *: *

  4. Why Is It Difficult To Compute A Multiple Sequence Alignment? BIOLOGY, COMPUTATION. A CIRCULAR PROBLEM: Good Alignment, Good Sequences.

  5. Progressive Alignment: Dynamic Programming Using A Substitution Matrix

  6. The T-Coffee Algorithm

  7. Progressive Alignment Principle and its Limitations…

  8. The Extended Library Principle…

  9. The Extended Library Principle…

  10. The Triplet Assumption SEQ A SEQ B

  11. Weighting And Extension. Weighting = Using The Surrounding Information (Coffee). Extension = Using Information from Other Sequences.
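
The extension step named on this slide can be sketched as follows. This is a minimal illustration, not the T-Coffee source: it assumes the rule described in the cited 2000 paper, where a pair of residues keeps its own weight and gains, for every third residue linked to both, the smaller of the two link weights. The function name `extend_library` and the dictionary layout are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def extend_library(primary):
    """Return triplet-extended weights for every residue pair.

    `primary` maps ((seq_a, pos_a), (seq_b, pos_b)) -> weight. Each pair is
    assumed to be listed once and to link two different sequences.
    """
    # neighbours[r][s] = primary weight linking residue r to residue s
    neighbours = defaultdict(dict)
    for (a, b), weight in primary.items():
        neighbours[a][b] = weight
        neighbours[b][a] = weight

    extended = defaultdict(float)
    # Direct evidence: the pair's own primary weight.
    for (a, b), weight in primary.items():
        extended[frozenset((a, b))] += weight
    # Indirect evidence: each residue c linked to both a and b contributes
    # the weaker of its two links.
    for c, linked in neighbours.items():
        for a, b in combinations(sorted(linked), 2):
            if a[0] != b[0]:  # a and b must come from different sequences
                extended[frozenset((a, b))] += min(linked[a], linked[b])
    return dict(extended)


# Toy library (made-up weights): one residue from each of A, B and C.
toy = {
    (("A", 1), ("B", 1)): 88,
    (("A", 1), ("C", 1)): 77,
    (("C", 1), ("B", 1)): 100,
}
weights = extend_library(toy)
print(weights[frozenset({("A", 1), ("B", 1)})])  # 88 + min(77, 100)
```

In the toy case the A/B pair is supported both directly and through sequence C, which is the sense in which extension uses information from other sequences.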

  12. T-Coffee Progressive Alignment: Dynamic Programming Using The Extended Library (Notredame, Higgins, Heringa, 2000)

  13. Mixing Local and Global Alignments: Global Alignment, Local Alignment, Extension, Multiple Sequence Alignment

  14. What is a library? Extension + T-Coffee: Library Based Multiple Sequence Alignment

     2
     Seq1 MySeq
     Seq2 MyotherSeq
     #1 2
     1 1 25
     3 8 70
     ….

     3
     Seq1 anotherseq
     Seq2 atsecondone
     Seq3 athirdone
     #1 2
     1 1 25
     #1 3
     3 8 70
     ….
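
To make the library layout on this slide concrete, here is a minimal parsing sketch. It assumes only what the slide shows: a sequence count, one `name sequence` line per sequence, then `#i j` headers each followed by `residue_i residue_j weight` lines. It is not a reader for the actual T-Coffee library file format, which may carry additional header and length fields; `parse_library` and `EXAMPLE` are hypothetical names.

```python
EXAMPLE = """\
2
Seq1 MySeq
Seq2 MyotherSeq
#1 2
1 1 25
3 8 70
"""


def parse_library(text):
    """Parse the simplified library layout into (sequences, constraints).

    sequences   : list of (name, sequence) tuples
    constraints : list of (seq_i, res_i, seq_j, res_j, weight) tuples,
                  all 1-based as written in the text
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    n_seq = int(lines[0])
    sequences = [tuple(ln.split(maxsplit=1)) for ln in lines[1:1 + n_seq]]

    constraints = []
    pair = None  # the (seq_i, seq_j) named by the most recent '#' header
    for ln in lines[1 + n_seq:]:
        if ln.startswith("#"):
            seq_i, seq_j = ln[1:].split()
            pair = (int(seq_i), int(seq_j))
        else:
            res_i, res_j, weight = (int(tok) for tok in ln.split())
            constraints.append((pair[0], res_i, pair[1], res_j, weight))
    return sequences, constraints


sequences, constraints = parse_library(EXAMPLE)
print(sequences)
print(constraints)
```

Each constraint reads as "residue `res_i` of sequence `seq_i` aligns with residue `res_j` of sequence `seq_j` with this weight".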

  15. Validating T-Coffee

  16. What Is BaliBase? BaliBase is a collection of reference multiple alignments. The structures of the sequences are known and were used to assemble the multiple alignment (MALN). Evaluation is carried out by comparing the structure-based reference alignment with its sequence-based counterpart.
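
One common way to carry out such a comparison is a sum-of-pairs score: the fraction of residue pairs aligned in the reference that are also aligned in the alignment under test. The sketch below is an illustration under that assumption, not the BaliBase scoring program; both alignments are given as lists of gapped strings with the same sequences in the same order, and the function names are hypothetical.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs placed in the same column.

    A residue is identified as (sequence index, ungapped position).
    """
    counters = [0] * len(alignment)
    pairs = set()
    for column in zip(*alignment):
        present = []
        for seq_index, char in enumerate(column):
            if char != "-":
                present.append((seq_index, counters[seq_index]))
                counters[seq_index] += 1
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(reference, test):
    """Fraction of reference-aligned residue pairs recovered by `test`."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        raise ValueError("reference alignment contains no aligned pairs")
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


reference = ["ACG-", "A-GT"]
test = ["ACG-", "AG-T"]
print(sum_of_pairs_score(reference, test))
```

Residues are tracked by their ungapped position, so the score is unaffected by where the two alignments happen to place gap-only differences in column numbering.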

  17. BaliBase (DALI, SAP …), Method X: Comparison

  18. T-Coffee Results Validation Using BaliBase

  19. Validation Using BaliBase

  20. Choosing The Right Method (MAFFT evaluation)

  21. Choosing The Right Method (MAFFT evaluation)

  22. Taking T-Coffee Further: Using Structures

  23. Mixing Heterogeneous Information With T-Coffee: Local Alignment, Global Alignment, Multiple Alignment, Specialist, Structural. Multiple Sequence Alignment

  24. Why Do We Want To Mix Sequences and Structures? STRUCTURE, FUNCTION
      • Sequences are Cheap and Common.
      • Structures are Expensive and Rare.
      • We WANT to use Structural information in multiple alignments:
        • To help the alignment
        • To extrapolate from Structures to Sequences.

  25. Helping an Alignment With Structures? Better gap penalties (ClustalW): low gap penalties, high gap penalties.

  26. Helping an Alignment With Structures? Better gap penalties (ClustalW). Revealing Very Distant Relationships (1tc3c, 1hstA).

  27. Is It Possible to Use Structural Information? Any_pair, THE new T-Coffee method: Seq Vs Seq (Local, Global), Seq Vs Struct (FUGUE), Struct Vs Struct (SAP). Evaluation on HOMSTRAD.

  28. Is It Possible to Use Structural Information? Validation of Any_pair on the HOMSTRAD Database (Orla O’Sullivan, Des Higgins and C. Notredame). Result: % of columns correctly aligned as judged from the HOMSTRAD reference alignment. CW: Clustal W. TC: T-Coffee default. SA: T-Coffee Using SAP. FU: T-Coffee Using FUGUE.
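
The "% of columns correctly aligned" measure can be sketched as below. This is an illustrative reading of the slide, not the evaluation code used for the HOMSTRAD validation: a reference column counts as correct when the test alignment contains a column with exactly the same residues, and columns containing gaps are counted too, which is an assumption of this sketch. Function names are hypothetical.

```python
def alignment_columns(alignment):
    """Describe each column as a tuple of ungapped residue positions.

    Gaps are recorded as None, so a column is identified by which residue
    of each sequence it holds.
    """
    counters = [0] * len(alignment)
    columns = []
    for column in zip(*alignment):
        ids = []
        for seq_index, char in enumerate(column):
            if char == "-":
                ids.append(None)
            else:
                ids.append(counters[seq_index])
                counters[seq_index] += 1
        columns.append(tuple(ids))
    return columns


def column_score(reference, test):
    """Percentage of reference columns reproduced exactly in `test`."""
    test_columns = set(alignment_columns(test))
    ref_columns = alignment_columns(reference)
    correct = sum(col in test_columns for col in ref_columns)
    return 100.0 * correct / len(ref_columns)


reference = ["ACG-", "A-GT"]
test = ["ACG-", "AG-T"]
print(column_score(reference, test))
```

A column-based score is stricter than a pairwise one: a single misplaced residue makes the whole column count as wrong.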

  29. Of the Importance of being Trustworthy… Identifying Good Bits in an Alignment

  30. How Good Is my Alignment?

     cah2_human  NGPEHWHK-DFPIAKGERQSPVDIDTHTAKYDP------------SLKPLSVS--YDQAT
     cahp_mouse  --GVEWGL-VFPDANGEYQSPINLNSREARYDP------------SLLDVRLSPNYVVCR
     cah4_rat    SGPEQWTG----DCKKNQQSPINIVTSKTKLNP------------SLTPFTFVG-YDQKK
     ptpg_mouse  YGPEHWVT-SSVSCGGSHQSPIDILDHHARVGD------------EYQELQLDG-FDNES
     cah6_human  LDEAHWPQ-HYPACGGQRQSPINLQRTKVRYNP------------SLKGLNMTGYETQAG
     cah_dunsa   -VGFDWTGGVCVNTGTSKQSPINIETDSLAEESERLGTADDTSRLALKGLLSS--SYQLT
     cahh_varv   --------------MSQQLSPINIETKKAISNA------------RLKPLNIH--YNESK
     cah2_chlre  EGKDGAG-NPWVCKTGRKQSPINVPQYHVLDGK------------GSK--IATGLQTQWS
                                    **:::

     cah2_human  ---------SLRILNNGHAFNVEFDD-SQDKAVLK--------------------GGPLD
     cahp_mouse  ---------DCEVTNDGHTIQVILKS----KSVLS--------------------GGPLP
     cah4_rat    ---------KWEVKNNQHSVEMSLGE----DIYIF--------------------GGDLP
     ptpg_mouse  SN-------KTWMKNTGKTVAILLKD----DYFVS--------------------GAGLP
     cah6_human  ---------EFPMVNNGHTVQIGLPS----TMRMT--------------------VAD-G
     cah_dunsa   ---------SEVAINLEQDMQFSFNAPDEDLPQLT--------------------IGGVV
     cahh_varv   ---------PTTIQNTGKLVRINFKG-----GYLS--------------------GGFLP
     cah2_chlre  YPDLMSNGSSVQVINNGHTIQVQWTY----DYAGHATIAIPAMRNQSNRIVDVLEMRPND
                               *  : . .

     cah2_human  G----TYRLIQFHFHWGSLD--GQGSEHTVDKKKYAAELHLVHWNTK-YGDFGKAVQQPD
     cahp_mouse  Q--GQEFELYEVRFHWGREN--QRGSEHTVNFKAFPMELHLIHWNSTLFGSIDEAVGKPH
     cah4_rat    T----QYKAIQLHLHWSEES--NKGSEHSIDGKHFAMEMHVVHKKMTTGDKVQDSDSKD-
     ptpg_mouse  G----RFKAEKVEFHWGHSNG-SAGSEHSVNGRRFPVEMQIFFYNPDDFDSFQTAISENR
     cah6_human  I----VYIAQQMHFHWGGASSEISGSEHTVDGIRHVIEIHIVHYNS-KYKTYDIAQDAPD
     cah_dunsa   H----TFKPVQIHFH-------HFASEHAIDGQLYPLEAHMVMASQN-DGS--------D
     cahh_varv   N----EYVLSSLHIYWGKED--DYGSNHLIDVYKYSGEINLVHWNKKKYSSYEEAKKHDD
     cah2_chlre  ASDRVTAVPTQFHFH--------STSEHLLAGKIFPLELHIVHKVTD---KLEACKG--G
                           ...::          *:* :    .  * ::.

  31. Measuring The Local Reliability: CORE

     cah2_human  NGPEHWHK-DFPIAKGERQSPVDIDTHTAKYDPSLKPLSVS
     cahp_mouse  --GVEWGL-VFPDANGEYQSPINLNSREARYDPSLLDVRLS
     cah4_rat    SGPEQWTG----DCKKNQQSPINIVTSKTKLNPSLTPFTFV
     ptpg_mouse  YGPEHWVT-SSVSCGGSHQSPIDILDHHARVGDEYQELQLD
     cah6_human  LDEAHWPQ-HYPACGGQRQSPINLQRTKVRYNPSLKGLNMT

     Measure of Reliability: $\mathrm{Core}(Q) = \dfrac{\sum_{x} \mathrm{Escore}(Q, x)}{N \times \mathrm{Max\ Escore}}$
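
Read as a fraction, the reliability measure on this slide divides the summed extended scores of residue Q by N times the maximum extended score. The sketch below is a minimal illustration under two assumptions that the slide does not spell out: the scores passed in are Escore(Q, x) for every x being summed over, and N is the number of those terms. The name `core_index` is hypothetical.

```python
def core_index(escores, max_escore):
    """Normalised reliability of one residue, between 0 and 1.

    escores    : extended scores Escore(Q, x), one per residue x considered
    max_escore : the largest extended score used for normalisation
    """
    if not escores:
        raise ValueError("at least one extended score is required")
    if max_escore <= 0:
        raise ValueError("max_escore must be positive")
    return sum(escores) / (len(escores) * max_escore)


# Made-up scores for a residue compared against three others.
print(core_index([50, 100, 150], max_escore=200))
```

Under this reading a residue scores 1 only when every one of its pairings reaches the maximum extended score, which matches the intent of a normalised local reliability.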

  32. Specificity and Sensitivity; CORE index 0.48

  33. What is the Local Quality of my Alignment I II

  34. Using Consistency For Automatic Annotation?

     T-COFFEE, Version_1.24(Wed Nov 15 18:31:29 PST 2000)
     Notredame, Higgins, Heringa, JMB(302)pp205-217,2000
     CPU TIME:11 sec.
     SCORE=39
     * BAD AVG GOOD *
     cah2_human  : 42
     cah4_rat    : 41
     cah6_human  : 40
     cahp_mouse  : 43
     cah_dunsa   : 33

     cah2_human  77664444-454555557666665554444444------------33322222-
     cah4_rat    54553332----233445655555554444444------------443323221
     cah6_human  44333443-333344445555444444444444------------444433331
     cahp_mouse  --633453-333345565554444334444455------------555444331
     cah_dunsa   -34334320212223456555555543333333ERLGTADDTSRL22222111-
     cah2_chlre  7663333-0333334566666555444343322------------222--1110
     ptpg_mouse  67763343-333334445444433333333333------------332222221
     cahh_varv   --------------5555555555554444433------------33322211-
     Cons        655433430333334455555554444444443------------333322221

     cah2_human  -11121---------22223334333322321-00011222-------------
     cah4_rat    -22222---------23333344443344442----22222-------------
     cah6_human  001122---------22233344333333433----22222-------------
     cahp_mouse  022333---------34344455554444543----33334-------------
     cah_dunsa   -11111---------11111111111111110P00000111-------------
     cah2_chlre  00000000DLMSNGS11223333333433332----22111ATIAIPAMRNQSN
     ptpg_mouse  -1111100-------12234445444544433----33333-------------
     cahh_varv   -11222---------22233333333333322-----1122-------------
     Cons        01112100-------22233334333333332-00022222-------------

  35. Evaluating An Alignment Not Generated With T-Coffee: t_coffee -infile CLUSTALW_ALN -in Library -do_score

  36. Running T-Coffee ONLINE

  37. WHERE? Cedric.notredame@europe.com, igs-server.cnrs-mrs.fr/~cnotred, igs-server.cnrs-mrs.fr/Tcoffee

  38. The T-Coffee Server

  39. The T-Coffee Server: ES45, 4 Proc, 1 Gb RAM

  40. T-Coffee Server HP/Compaq-ES45/4-2G

  41. The T-Coffee Server

  42. Data Input

  43. The Right Parameters

  44. The T-Coffee Server

  45. Evaluating An Alignment
