Multiple Alignments and Multivariate Analysis Clustal: 1988-2006
Multiple Alignments

Human beta       --------VHLTPEEKSAVTALWGKVN--VDEVGGEALGRLLVVYPWTQRFFESFGDLST
Horse beta       --------VQLSGEEKAAVLALWDKVN--EEEVGGEALGRLLVVYPWTQRFFDSFGDLSN
Human alpha      ---------VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DLS-
Horse alpha      ---------VLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHF-DLS-
Whale myoglobin  ---------VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKT
Lamprey globin   PIVDTGSVAPLSAAEKTKIRSAWAPVYSTYETSGVDILVKFFTSTPAAQEFFPKFKGLTT
Lupin globin     --------GALTESQAALVKSSWEEFNANIPKHTHRFFILVLEIAPAAKDLFSFLKGTSE
                 *: : : * . : .: * : * : .

Human beta       PDAVMGNPKVKAHGKKVLGAFSDGLAHLDN-----LKGTFATLSELHCDKLHVDPENFRL
Horse beta       PGAVMGNPKVKAHGKKVLHSFGEGVHHLDN-----LKGTFAALSELHCDKLHVDPENFRL
Human alpha      ----HGSAQVKGHGKKVADALTNAVAHVDD-----MPNALSALSDLHAHKLRVDPVNFKL
Horse alpha      ----HGSAQVKAHGKKVGDALTLAVGHLDD-----LPGALSNLSDLHAHKLRVDPVNFKL
Whale myoglobin  EAEMKASEDLKKHGVTVLTALGAILKKKGH-----HEAELKPLAQSHATKHKIPIKYLEF
Lamprey globin   ADQLKKSADVRWHAERIINAVNDAVASMDDT--EKMSMKLRDLSGKHAKSFQVDPQYFKV
Lupin globin     VP--QNNPELQAHAGKVFKLVYEAAIQLQVTGVVVTDATLKNLGSVHVSKGVAD-AHFPV
                 . .:: *. : . : *. * . : .

Human beta       LGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH------
Horse beta       LGNVLVVVLARHFGKDFTPELQASYQKVVAGVANALAHKYH------
Human alpha      LSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR------
Horse alpha      LSHCLLSTLAVHLPNDFTPAVHASLDKFLSSVSTVLTSKYR------
Whale myoglobin  ISEAIIHVLHSRHPGDFGADAQGAMNKALELFRKDIAAKYKELGYQG
Lamprey globin   LAAVIADTVAAG---D------AGFEKLMSMICILLRSAY-------
Lupin globin     VKEAILKTIKEVVGAKWSEELNSAWTIAYDELAIVIKKEMNDAA---
                 : : .: . .. . :

• Phylogenetic Analysis
• Secondary Str. Prediction
• Homology Detection
• Profile Analysis
• Homology Modeling

VHLTPEEKSAVTALWGKVN--VDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNP
-VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DLS-----HGSA
* * * * * **** * * *** * * * * * *** *

KVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHF
QVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHL
** ***** * ** * ** ** ** *** ** ** * ** *

GKEFTPPVQAAYQKVVAGVANALAHKYH
PAEFTPAVHASLDKFLASVSTVLTSKYR
**** * * * * * * **

• Dynamic Programming
• Needleman and Wunsch, 1970
• O(L^2) algorithm
• Maximise score (or minimise distance)
• Gap penalties
• Amino acid weight matrix

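The dynamic programming idea in the bullets above can be sketched in a few lines. This is a minimal illustration, not the scoring used on the slide: it assumes identity scoring (`match`, `mismatch`) in place of an amino acid weight matrix and a single linear `gap` penalty, and all three parameter values are arbitrary placeholders.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b.

    Returns (score, aligned_a, aligned_b). Scoring is an assumption:
    identity match/mismatch and a linear gap penalty.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    # Fill the O(L^2) table
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = needleman_wunsch("THE FAT CAT", "THE FAST CAT")
print(score)
print(top)
print(bottom)
```

Both the table fill and the traceback touch each cell at most once, which is where the O(L^2) cost for two sequences of length L comes from.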
Weighted Sums of Pairs: WSP
• Time O(L^N)

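As a rough illustration of what a weighted sum-of-pairs score measures, the sketch below scores every pair of rows in an existing alignment and adds the pair scores up with a weight per pair. The identity scoring, the gap cost and the weights are assumptions made for the example; the slide does not give the actual values.

```python
from itertools import combinations


def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    """Score one aligned column position for two rows (assumed scoring)."""
    if x == "-" and y == "-":
        return 0          # gap against gap costs nothing
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def weighted_sum_of_pairs(rows, weights=None):
    """Sum the pairwise alignment scores of all row pairs, each times a weight.

    rows    : list of equal-length gapped strings
    weights : dict {(i, j): weight} for i < j; defaults to 1.0 for every pair
    """
    pairs = list(combinations(range(len(rows)), 2))
    if weights is None:
        weights = {p: 1.0 for p in pairs}
    total = 0.0
    for i, j in pairs:
        s = sum(pair_score(x, y) for x, y in zip(rows[i], rows[j]))
        total += weights[(i, j)] * s
    return total


rows = ["AC-T", "ACGT", "A-GT"]   # toy alignment, not from the slides
print(weighted_sum_of_pairs(rows))
print(weighted_sum_of_pairs(rows, {(0, 1): 2.0, (0, 2): 1.0, (1, 2): 1.0}))
```

Scoring a given alignment this way is cheap; the O(L^N) cost comes from searching for the alignment that maximises the score.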
Weighted Sums of Pairs: WSP
Time O(L^N)

Sequences   Time
2           1 second
3           150 seconds
4           6.25 hours
5           39 days
6           16 years
7           2404 years

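The run times in the table grow by a factor of roughly 150 per added sequence, which is what O(L^N) predicts if L is about 150 and two sequences take one second. Both of those numbers are read off the table rather than stated on the slide, so treat them as assumptions in this small sketch:

```python
def estimated_seconds(n_sequences, length=150, base_seconds=1.0):
    """Extrapolate run time assuming cost grows as length ** n_sequences.

    `length` and `base_seconds` are assumptions inferred from the table:
    2 sequences take base_seconds, and each extra sequence multiplies
    the time by `length`.
    """
    return base_seconds * length ** (n_sequences - 2)


for n in range(2, 8):
    secs = estimated_seconds(n)
    print(n, secs, "seconds =", secs / 3600, "hours =", secs / 86400, "days")
```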
Progressive Alignment:
• Feng and Doolittle, 1987
• Barton and Sternberg, 1987
• Willie Taylor, 1987, 1988
• Hogeweg and Hesper, 1984
[Figure: tree with leaves Horse beta, Human beta, Horse alpha, Human alpha, Whale myoglobin, Lamprey cyanohaemoglobin, Lupin leghaemoglobin]

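Progressive alignment needs an order in which to join sequences and groups, usually read from a tree built on pairwise distances. The sketch below shows one simple way to get such an order. It assumes UPGMA clustering, which the slide does not specify, and the distance values are made-up placeholders, not measured distances between these globins.

```python
def upgma(names, dist):
    """Build a nested-tuple tree by repeatedly joining the two closest clusters.

    names : list of sequence names
    dist  : square list-of-lists of pairwise distances (symmetric)
    UPGMA is an assumption here; other tree methods would work as well.
    """
    trees = {i: name for i, name in enumerate(names)}   # cluster id -> subtree
    sizes = {i: 1 for i in trees}                       # cluster id -> leaf count
    d = {(i, j): dist[i][j] for i in trees for j in trees if i < j}
    next_id = len(names)
    while len(trees) > 1:
        i, j = min(d, key=d.get)                        # closest pair of clusters
        new = next_id
        next_id += 1
        for k in trees:
            if k in (i, j):
                continue
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            # size-weighted average distance to the merged cluster
            d[(k, new)] = (sizes[i] * dik + sizes[j] * djk) / (sizes[i] + sizes[j])
        trees[new] = (trees.pop(i), trees.pop(j))
        sizes[new] = sizes.pop(i) + sizes.pop(j)
        d = {p: v for p, v in d.items() if i not in p and j not in p}
    return next(iter(trees.values()))


# Hypothetical distances, for illustration only
NAMES = ["Human beta", "Horse beta", "Human alpha", "Horse alpha"]
DIST = [
    [0.00, 0.10, 0.50, 0.50],
    [0.10, 0.00, 0.50, 0.50],
    [0.50, 0.50, 0.00, 0.12],
    [0.50, 0.50, 0.12, 0.00],
]
print(upgma(NAMES, DIST))
```

Reading the nested tuples from the inside out gives the joining order: the closest pairs are aligned first, then the resulting groups are aligned to each other.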
Clustal
• 35000 citations
• Clustal1-Clustal4, 1988
  • Paul Sharp, Dublin
• Clustal V, 1992
  • EMBL Heidelberg
  • Rainer Fuchs
  • Alan Bleasby
• Clustal W 1994-2006, Clustal X 1997-2006
  • Toby Gibson, EMBL, Heidelberg
  • Julie Thompson, ICGEB, Strasbourg
• Clustal W and Clustal X 2.0, early 2007
  • University College Dublin

Since 1994? Benchmarks
• Protein structure alignments and superpositions
  • Barton and Sternberg; Fitch and McClure
  • Dali
  • BaliBase
  • Homstrad
  • Oxbench
  • Prefab etc. etc.
• Protein structure analysis
  • APDB: O'Sullivan O, Zehnder M, Higgins D, Bucher P, Grosdidier A, Notredame C. (2003) APDB: a novel measure for benchmarking sequence alignment methods without reference alignments. Bioinformatics 19 Suppl 1:i215-21.
• RNA alignments
  • Bralibase (Gardner PP, Wilm A & Washietl S (2005) NAR.)

Which Method is Best?
• Clustal W????
• MSA (Lipman, Altschul, Kececioglu)
• DCA (Stoye), PRRP (Gotoh), SAGA (Notredame)
• T-Coffee (Notredame)
  • 3-D Coffee, M-Coffee
• MAFFT (Katoh) and MUSCLE (Edgar)
• Probcons (Do, Brudno, Batzoglou)
For Global Protein alignments!!!

Clustal W and X 2.0?
• Jan 2007
• Re-engineered in C++
• Aim to increase accuracy
  • Iteration (Wallace, I. M., O'Sullivan, O. and Higgins, D. G. (2005) Evaluation of iterative alignment algorithms for multiple alignment. Bioinformatics 21:1408.)
• Reduce run times

ADE-4 http://pbil.univ-lyon1.fr/ADE-4/ Thioulouse J., Chessel D., Dolédec S., & Olivier J.M. (1997) ADE-4: a multivariate analysis and graphical display software. Statistics and Computing, 7, 1, 75-83.
Between Group Analysis BGA
• Dolédec, S. & Chessel, D. (1987) Acta Oecologica, Oecologica Generalis, 8, 3, 403-426.
• Supervised Correspondence Analysis or PCA
• MADE4
  • Culhane, A., Thioulouse, J., Perriere, G., Higgins, D.G. (2005) MADE4: an R package for multivariate analysis of gene expression data. Bioinformatics 21(11):2789-2790.
• CO-Inertia Analysis CIA
  • Dolédec, S. & Chessel, D. (1994) Freshwater Biology, 31, 277-294.
  • Thioulouse, J. & Lobry, J.R. (1995) CABIOS, 11, 321-329.
  • 2 datasets; Simultaneous CA or PCA

Use CA, PCA for Sequences?
PCOORD on sequence distances: Higgins, D.G. (1992) Sequence ordinations: a multivariate analysis approach to analysing large sequence data sets. CABIOS, 8, 15-22.
PCA on dipeptide composition: Van Heel, M. (1991) A new family of powerful multivariate statistical sequence analysis techniques. J. Mol. Biol. 220(4): 877-887.
PCA on alignment columns: Casari G, Sander C, Valencia A. (1995) A method to predict functional residues in proteins. Nat Struct Biol. 2(2):171-8.

Supervised PCA or CA?
[Figure labels: Malate Dehydrogenases, Lactate Dehydrogenases]
Between Group Analysis
[Figure labels: samples, genes, GSVD, N]
Trypsin-like serine proteases
[Figure labels: 15 Chymotrypsins, 10 Elastases, 31 Trypsins]
BGA With CA or PCA?
• CA:
  • Pretty pictures
  • Sequences/residues plots
  • Finds any clear/simple patterns
  • Binary aa variables
• PCA:
  • Use continuous variables
  • e.g. aa properties: size, charge, hydrophobicity etc.

BGA with PCA using 5 amino acid properties (A-E)
[Figure labels: Sequences (15 Chymotrypsins, 31 Trypsins, 10 Elastases); Residue weights]
BGA on Alignments
• Focus on any split in the data
• Binary or Property coding
• CA or PCA
• Sequence Weighting
• Pseudocounts

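To make "binary coding" concrete, here is a minimal sketch of how an alignment might be turned into a table of 0/1 variables, one per (column, residue) combination, before running CA. The function name `binary_encode` and the choice to code gaps as all zeros are assumptions made for the example, not details given on the slide.

```python
def binary_encode(rows, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    """Code an alignment as binary variables.

    Returns (labels, matrix):
      labels : list of (column index, residue) pairs, one per variable
      matrix : one row of 0/1 values per sequence
    Gaps and characters outside `alphabet` get no variable (an assumption).
    """
    length = len(rows[0])
    labels = []
    for col in range(length):
        # one variable for each residue type actually seen in this column
        seen = sorted({r[col] for r in rows if r[col] in alphabet})
        labels.extend((col, aa) for aa in seen)
    matrix = [[1 if row[col] == aa else 0 for col, aa in labels] for row in rows]
    return labels, matrix


labels, matrix = binary_encode(["VHL", "VQL", "-QL"])   # toy alignment
print(labels)
for row in matrix:
    print(row)
```

Property coding would replace each 0/1 indicator with continuous values such as size, charge or hydrophobicity per column, which is the form PCA expects.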
Iteration, Benchmarking, Clustal W 2.0: Gordon Blackshields, Mark Larkin, Paul McGettigan, Iain Wallace
Clustal: Toby Gibson, EMBL; Julie Thompson, ICGEB, Strasbourg
BGA, CIA, MADE4: Aedín Culhane, Guy Perriere, Jean Thioulouse, Ian Jeffery, Ailís Fagan

SeqA GARFIELD THE LAST FAT CAT
SeqB GARFIELD THE FAST CAT
SeqC GARFIELD THE VERY FAST CAT
SeqD THE FAT CAT

SeqA GARFIELD THE LAST FA-T CAT
SeqB GARFIELD THE FAST CA-T ---
SeqC GARFIELD THE VERY FAST CAT
SeqD -------- THE ---- FA-T CAT

Weighted Sums of Pairs

Program   Approach             Reference
MSA       Branch and Bound     Lipman, Altschul and Kececioglu, 1989
FastMSA   Tweaked MSA          Gupta, Kececioglu and Schaeffer, 1995
DCA       Divide and Conquer   Stoye, Moulton and Dress, 1997
SAGA      Genetic Algorithm    Notredame and Higgins, 1996
PRRP      Iteration            Gotoh, 1996

Genetic Algorithm: Selection (WSP), Mutation, Recombination (cross-overs)
SAGA
• Cedric Notredame
• Sequence Alignment by Genetic Algorithm
• Optimise any objective function
• Notredame, C. and Higgins, D.G. (1996) SAGA: Sequence alignment by genetic algorithm. Nucleic Acids Research, 24:1515-1524.

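The selection, mutation and recombination loop can be sketched as a toy genetic algorithm over alignments. This is not SAGA: it assumes an unweighted sum-of-pairs objective with identity scoring, a single mutation operator (insert one gap), a one-point cross-over, and keep-the-best-half selection. All function names and parameter values are hypothetical.

```python
import random


def sp_score(rows, match=1, mismatch=-1, gap=-2):
    """Unweighted sum-of-pairs score (assumed objective function)."""
    total = 0
    for a in range(len(rows)):
        for b in range(a + 1, len(rows)):
            for x, y in zip(rows[a], rows[b]):
                if x == "-" and y == "-":
                    continue
                if x == "-" or y == "-":
                    total += gap
                else:
                    total += match if x == y else mismatch
    return total


def tidy(rows):
    """Right-pad rows to equal width, then drop columns that are all gaps."""
    width = max(len(r) for r in rows)
    rows = [r + "-" * (width - len(r)) for r in rows]
    keep = [c for c in range(width) if any(r[c] != "-" for r in rows)]
    return ["".join(r[c] for c in keep) for r in rows]


def mutate(rows, rng):
    """Insert a single gap at a random position in one random row."""
    rows = list(rows)
    i = rng.randrange(len(rows))
    pos = rng.randrange(len(rows[i]) + 1)
    rows[i] = rows[i][:pos] + "-" + rows[i][pos:]
    return tidy(rows)


def crossover(p1, p2, rng):
    """Join the left columns of p1 to the matching remainder of p2."""
    cut = rng.randrange(1, len(p1[0]))
    lefts, rights = [], []
    for r1, r2 in zip(p1, p2):
        left = r1[:cut]
        used = sum(ch != "-" for ch in left)     # residues already placed
        k = seen = 0
        while seen < used:                       # skip those residues in p2
            seen += r2[k] != "-"
            k += 1
        lefts.append(left)
        rights.append(r2[k:])
    width = max(len(r) for r in rights)
    # right-justify the right-hand parts so p2's columns stay together
    return tidy([l + "-" * (width - len(r)) + r for l, r in zip(lefts, rights)])


def evolve(seqs, generations=100, pop_size=20, seed=0):
    rng = random.Random(seed)
    population = [tidy(seqs)]
    while len(population) < pop_size:
        population.append(mutate(population[0], rng))
    for _ in range(generations):
        population.sort(key=sp_score, reverse=True)
        survivors = population[: pop_size // 2]   # selection: keep the best half
        children = []
        while len(survivors) + len(children) < pop_size:
            if rng.random() < 0.5:
                children.append(mutate(rng.choice(survivors), rng))
            else:
                children.append(crossover(rng.choice(survivors),
                                          rng.choice(survivors), rng))
        population = survivors + children
    return max(population, key=sp_score)


best = evolve(["GARFIELD", "GARFILD", "GAFIELD"])
print("\n".join(best))
```

Because the objective is only called through `sp_score`, any other scoring function could be swapped in, which is the sense in which a genetic algorithm can optimise any objective function.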
Structure Test Cases MSA SAGA
Which method is best?
• Best score?
• Empirical tests?
• Sets of test cases
  • Fitch and McClure
  • BaliBase
  • Homstrad
  • Oxbench
  • Prefab etc. etc.
• APDB: O'Sullivan O, Zehnder M, Higgins D, Bucher P, Grosdidier A, Notredame C. (2003) APDB: a novel measure for benchmarking sequence alignment methods without reference alignments. Bioinformatics 19 Suppl 1:i215-21.

COFFEE
• Consistency based Objective Function For Evaluation of Ehhhh things
• Maximum Weight Trace (John Kececioglu)
• Maximise similarity to a LIBRARY of residue pairs
• Notredame, C., Holm, L. and Higgins, D.G. (1998) COFFEE: An objective function for multiple sequence alignments. Bioinformatics 14: 407-422.

Pairs of Residues
e.g. Seq N, Residue I; Seq M, Residue J; Weight = w

Human beta   VHLTPEEKSAVTALWGKVN--VDEVGGEAL
Horse beta   VQLSGEEKAAVLALWDKVN--EEEVGGEAL
Human alpha  -VLSPADKTNVKAAWGKVGAHAGEYGAEAL
Horse alpha  -VLSAADKTNVKAAWSKVGGHAGEYGAEAL

T-Coffee
• Heuristic approximation to COFFEE
• Uses progressive alignment (Trees)
• Heterogeneous data
  • Sequences
  • Structures
  • Genomes
  • ESTs
• Notredame, C, Higgins, DG and Heringa, J. (2000) T-Coffee: A novel method for fast and accurate multiple sequence alignment. J.Mol.Biol., 302: 205-217.

T-Coffee
• Mixed data sources
• Primary library from
  • Lalign (SIM) (default):
    • 10 best local alignments
  • Clustalw (default)
    • All pairwise alignments
  • SAP (Willie Taylor, Structure Superposition)
  • Multiple alignments
• Check library for CONSISTENCY
  • Upweight pairs of residues that agree with other pairs

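The consistency step, upweighting a residue pair when a third sequence agrees with it, can be sketched as follows. The rule used here (add the smaller of the two weights along each path through a third sequence) is taken as an assumption for the illustration; the slide only says that agreeing pairs are upweighted. The library layout and the weights are hypothetical, and for brevity the sketch only re-weights pairs already in the library.

```python
from collections import defaultdict


def extend_library(library):
    """Upweight residue pairs that are supported through a third sequence.

    library : dict {((seq_a, i), (seq_b, j)): weight}, treated as symmetric
    Returns a new dict with the same keys and extended weights.
    Assumed rule: for each residue z in a third sequence that is paired
    with both x and y, add min(w(x, z), w(z, y)) to w(x, y).
    """
    neighbours = defaultdict(dict)          # residue -> {paired residue: weight}
    for (x, y), w in library.items():
        neighbours[x][y] = w
        neighbours[y][x] = w
    extended = {}
    for (x, y), w in library.items():
        total = w
        for z, w_xz in neighbours[x].items():
            third_seq = z[0] not in (x[0], y[0])
            if third_seq and y in neighbours[z]:
                total += min(w_xz, neighbours[z][y])
        extended[(x, y)] = total
    return extended


# Hypothetical primary library: (sequence name, residue index) pairs with weights
library = {
    (("A", 0), ("B", 0)): 88,
    (("A", 0), ("C", 0)): 77,
    (("C", 0), ("B", 0)): 100,
    (("A", 1), ("B", 2)): 50,    # no third sequence supports this pair
}
for pair, weight in extend_library(library).items():
    print(pair, weight)
```

Pairs that sit on a consistent triangle gain weight, while the unsupported pair keeps its primary weight, so the later progressive alignment favours residue pairs that several sources agree on.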
Mixing Heterogeneous Information
[Figure labels: Local Alignment, Global Alignment, Multiple Alignment, Specialist, Structural, T-Coffee, Multiple Sequence Alignment]
Copyright Cédric Notredame, 2000, all rights reserved
Mixing Heterogeneous Information
[Figure labels: e.g. SAP (Taylor and Orengo), Structure Superposition, Weighted Residue Pairs]
Copyright Cédric Notredame, 2000, all rights reserved
Including Structures in an Alignment
[Bar chart: %accuracy (axis 0 to 80) for clustalw, T_Coffee Default, and T_Coffee plus all structures; bar labels 35.24, 38.39, 66.49]
3D-Coffee: O'Sullivan, O., Suhre, K., Abergel, C., Higgins, DG and Notredame, C (2004) J.Mol.Biol.
Recent Developments
• 20-30 new programs in past 2 years
• MUSCLE
  • Bob Edgar, ISMB, 2004
  • Iteration/progressive alignment
  • FAST
  • Big Alignments
• PROBCONS
  • Tom Do, Michael Brudno, Serafim Batzoglou
  • ISMB 2004
  • "P-Coffee"
  • VERY accurate