Evolution teaches to predict protein structure and function. Burkhard Rost CUBIC Columbia University rost@columbia.edu http://www.columbia.edu/~rost http://cubic.bioc.columbia.edu/. Evolution teaches prediction. Is Bioinformatics up to the data deluge?
Evolution teaches to predict protein structure and function Burkhard Rost CUBIC Columbia University rost@columbia.edu http://www.columbia.edu/~rost http://cubic.bioc.columbia.edu/ Burkhard Rost (Columbia New York)
Evolution teaches prediction • Is Bioinformatics up to the data deluge? • Sequence comparison: do we know what we do? • conservation of structure and function • Structure prediction: where are we today? • How to learn from the evolutionary odyssey? • secondary structure • transmembrane proteins • solvent accessibility • Are 1D predictions useful? • sub-cellular localisation • whole genomes • 3D structure: threading • floppy regions
http://cubic.bioc.columbia.edu/ • Claus Andersen Copenhagen • Bastian Bruning Nijmegen • Hepan Tan Columbia • Trevor Siggers Columbia • Miguel Andrade EMBL • Sean O’Donoghue LION • Andrej Sali Marc Marti-Renom Rockefeller • Alfonso Valencia Florencio Pazos Madrid • Michal Linial Jerusalem Volker Eyrich Rajesh Nair Jinfeng Liu Dariusz Przybylski Yanay Ofran Henry Bigelow Kazimierz Wrzeszczynski Sven Mika Chien Peter Chen Burkhard Rost • http://cubic.bioc.columbia.edu/
CUBIC http://cubic.bioc.columbia.edu Dariusz Przybylski Trevor Siggers Volker Eyrich Murat Cokol Jinfeng Liu Hepan Tan
The Data Deluge Conclusion: Bioinformatics will have a hell of a problem
Data Deluge: what do we want?
Data Deluge: numbers 50 1.200.000 500.000 2000 17.000 800 35.000
Data Deluge: what CAN we do?
Data Deluge: what CAN we do? Not much … yet
Evolution teaches prediction • Bioinformatics up to the data deluge? NO, but work in progress! • Sequence comparison: do we know what we do? • conservation of structure and function • Structure prediction: where are we today? • How to learn from the evolutionary odyssey? • secondary structure • transmembrane proteins • solvent accessibility • Are 1D predictions useful? • sub-cellular localisation • whole genomes • 3D structure: threading • floppy regions
Dynamic programming: optimal alignment
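The slide gives only the title, so the following is a minimal sketch of what dynamic programming for optimal alignment can look like, assuming a global (Needleman-Wunsch style) alignment with a linear gap penalty. The function name and the scoring values (`match`, `mismatch`, `gap`) are illustrative assumptions, not taken from the talk; real protein alignments would typically use a substitution matrix instead of a flat match/mismatch score.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of strings a and b.

    Scoring parameters are illustrative assumptions, not values from the talk.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap          # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("ACDE", "ACE"))
```

The table is filled in O(len(a) × len(b)) time, which is why this guarantees the optimal score but is too slow for scanning whole databases without heuristics.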
BLAST: fast matching of single ‘words’
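As a rough illustration of the word-matching idea, the sketch below indexes all words of length `k` in a query and looks up where they occur in a subject sequence. It is a simplification under stated assumptions: it finds exact word matches only, whereas BLAST itself also considers similar words scored with a substitution matrix and then extends the hits. The function name `word_hits` and the default `k=3` are hypothetical choices for this example.

```python
from collections import defaultdict


def word_hits(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where a k-letter word matches exactly.

    Simplified seeding step only; no neighbourhood words, no extension.
    """
    # Index every k-letter word of the query by its start position.
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)

    # Scan the subject once and look each word up in the index.
    hits = []
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], []):
            hits.append((i, j))
    return hits


if __name__ == "__main__":
    print(word_hits("MKVLAT", "GGKVLAGG"))
```

The speed-up over full dynamic programming comes from the single pass over the subject: only the regions around word hits need to be examined more closely.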
Profile-based comparison
Zones
Sequence -> Structure • Sequence folds into unique structure S -> T • Similar sequences fold into similar structures S + S’ -> T • Most sequences don’t fold at all S -> no T
Twilight zone = false positives explode [Figure: number of protein pairs (log scale, 10^1 to 10^6) against distance from HSSP threshold (-15 to 10); second axis: percentage sequence identity (10 to 35)] B Rost 1999 Prot. Engin.:12, 85-94
Significant sequence identity B Rost 1999 Prot. Engin.:12, 85-94
Evolution did it ! B Rost 1999 Prot. Engin.:12, 85-94
Similar sequence -> similar structure? B Rost 1999 Prot. Engin.:12, 85-94
Detecting true hits in Twilight zone B Rost 1999 Prot. Engin.:12, 85-94
Finding similar structures in Twilight zone B Rost 1999 Prot. Engin.:12, 85-94
‘Secure’ thresholds for BLAST B Rost 1999 Prot. Engin.:12, 85-94
Accuracy vs. coverage
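To make the trade-off concrete, here is a small sketch that computes both quantities at a given score threshold. It assumes the common definitions: accuracy is the fraction of reported hits that are true, and coverage is the fraction of all true pairs that are reported. The slide does not spell these out, so treat the definitions, the function name, and the toy data as assumptions.

```python
def accuracy_coverage(hits, threshold):
    """hits: list of (score, is_true) pairs; returns (accuracy, coverage).

    Assumed definitions (not stated on the slide):
      accuracy = true hits reported / all hits reported
      coverage = true hits reported / all true hits
    Accuracy is None when nothing is reported at this threshold.
    """
    total_true = sum(1 for _, is_true in hits if is_true)
    reported = [is_true for score, is_true in hits if score >= threshold]
    true_reported = sum(reported)
    accuracy = true_reported / len(reported) if reported else None
    coverage = true_reported / total_true if total_true else 0.0
    return accuracy, coverage


if __name__ == "__main__":
    # Toy, made-up scores purely for illustration.
    toy_hits = [(90, True), (80, True), (60, False), (50, True), (30, False)]
    for t in (70, 40):
        print(t, accuracy_coverage(toy_hits, t))
```

Lowering the threshold can only keep or raise coverage, while accuracy typically drops as false positives enter; sweeping the threshold traces out the accuracy vs. coverage curve.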
BLAST is not enough ... B Rost 1999 Prot. Engin.:12, 85-94
Sequence Space Hopping B Rost 1999 Prot. Engin.:12, 85-94
Success through sequence space hopping B Rost 1999 Prot. Engin.:12, 85-94
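One way to picture sequence space hopping, assuming it means reaching distant relatives by chaining significant hits through intermediate sequences, is a breadth-first walk over a graph of pairwise hits. In the sketch below, `neighbors` is a hypothetical precomputed mapping from each sequence identifier to the sequences it hits above some secure threshold; neither the name nor the data structure comes from the talk.

```python
from collections import deque


def hop(start, neighbors, max_hops=2):
    """Return {sequence_id: hops} for everything reachable from start.

    neighbors: dict mapping a sequence id to the ids it hits significantly
    (hypothetical, precomputed input for this illustration).
    """
    seen = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if seen[current] == max_hops:
            continue  # do not expand beyond the allowed number of hops
        for other in neighbors.get(current, ()):
            if other not in seen:
                seen[other] = seen[current] + 1
                queue.append(other)
    del seen[start]
    return seen


if __name__ == "__main__":
    toy = {"Q": ["A"], "A": ["Q", "B"], "B": ["A", "C"], "C": ["B"]}
    print(hop("Q", toy, max_hops=2))
```

Limiting `max_hops` matters in practice: every extra hop can add coverage, but it also gives false positives a chance to propagate along the chain.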
Zones
Profile-based database search B Rost 2001 Structural Bioinformatics:in press
Zones
Hypothetical distribution of similar structures
FAKE DATA
Midnight zone: real - random AS Yang and B Honig 2000 J. Mol. Biol.:301, 679-689 B Rost 1997 Folding & Design:2, S19-S24
Evolution into the Midnight zone [Figure: number of structure pairs (0 to 1600) against percentage pairwise sequence identity; two panels, 0 to 100 and 0 to 25] B Rost and S O'Donoghue 1998 EMBL preprint
Protein structures evolved at random - almost • average pairwise sequence identity of similar structures < 10% • -> most pairs have ‘random’ identity levels • 3 - 4% anchor residues • 4 billion years of evolution reached equilibrium • rate of creating new structures slower than drift towards mean • averages for convergent and divergent evolution similar • convergent evolution may have been a major event
Structure space B Rost 1998 Structure:6, 259-263
Gold-mine out of reach! Percentage of pairs
Conservation of function Devos & Valencia 2000, Proteins, 41, pp. 98
Conservation of EC number
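EC numbers are hierarchical, with four dot-separated levels, so one simple way to quantify conservation between two enzymes is to count how many leading levels they share. The sketch below does that; the function name is hypothetical, and treating "-" (an unassigned level) as the end of the comparison is an assumption made for this illustration, not something stated on the slide.

```python
def shared_ec_levels(ec_a, ec_b):
    """Count the leading EC levels (0 to 4) that two EC numbers have in common.

    A "-" marks an unassigned level and stops the comparison (assumption).
    """
    shared = 0
    for x, y in zip(ec_a.split("."), ec_b.split(".")):
        if x != y or x == "-":
            break
        shared += 1
    return shared


if __name__ == "__main__":
    print(shared_ec_levels("1.1.1.1", "1.1.1.2"))
    print(shared_ec_levels("3.4.-.-", "3.4.21.4"))
```

Counting pairs with 4 shared levels against pairs with fewer, binned by sequence similarity, is one way to tabulate how quickly enzymatic function diverges as sequences drift apart.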
Conservation of EC number 2
Conservation of EC number: BLAST
Conservation in detail
Accuracy vs. coverage: EC number
Conservation of EC numbers