Quality and effectiveness of protein structure models DIMACS 2006 Anna.Tramontano@uniroma1.it
The paradigm: Sequence → Molecular structure → Molecular function
… Detecting homology
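Proteins evolve. [Figure: r.m.s.d. (axis 0.00 to 3.50) versus fraction sequence identity (axis 1.0 to 0) after structural superposition.] $\mathrm{r.m.s.d.} = \left[\frac{1}{N}\sum_{i=1}^{N} d_i^2\right]^{1/2}$, where $d_i$ is the distance between the $i$-th pair of equivalent atoms after superposition. Chothia and Lesk, EMBO J., 1986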
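A minimal sketch of the r.m.s.d. definition on this slide, assuming the two structures have already been optimally superposed (for example with the Kabsch algorithm, not shown) and that equivalent atoms are listed in the same order. The function name and the coordinates are hypothetical toy values.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]

def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """sqrt((1/N) * sum_i d_i^2) over N pairs of equivalent, already superposed atoms."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(a, b):
        # d_i^2: squared distance between the i-th pair of equivalent atoms
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(a))

# Toy example: every atom of the model is displaced by 1.0 along x.
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
target = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(rmsd(model, target))  # 1.0
```

No superposition is performed here, so the value is only meaningful once the structures are already in a common frame.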
Comparative modelling
AVGIFRAAVCTRGVAKAVDFVP +
AVGIFRAAVCTRGVAKAVDFVP
| || | | || |||||  ||
AIGIWRSATCTKGVAKA--FVA
If the alignment is correct, we can use the Chothia and Lesk relationship to predict the expected quality of the model.
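The sequence identity of the alignment on this slide can be computed directly and fed into an exponential fit of the Chothia and Lesk type. This is a sketch under stated assumptions: identity is counted over gap-free columns only, and the coefficients a = 0.40 and b = 1.87 (with H the fraction of mutated residues) are the values commonly quoted for the 1986 fit; check them against the paper before relying on the number. Both function names are hypothetical.

```python
import math

def fraction_identity(seq_a: str, seq_b: str) -> float:
    """Identical residues / aligned columns, ignoring columns that contain a gap ('-')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns")
    return sum(1 for x, y in pairs if x == y) / len(pairs)

def expected_core_rmsd(identity: float, a: float = 0.40, b: float = 1.87) -> float:
    """Exponential fit rmsd = a * exp(b * H), with H = 1 - identity.
    ASSUMPTION: default a, b are the commonly quoted Chothia-Lesk coefficients."""
    return a * math.exp(b * (1.0 - identity))

query = "AVGIFRAAVCTRGVAKAVDFVP"
homolog = "AIGIWRSATCTKGVAKA--FVA"
ident = fraction_identity(query, homolog)   # 14 identities over 20 gap-free columns
print(ident)                                # 0.7
print(round(expected_core_rmsd(ident), 2))  # 0.7
```

As the slide stresses, the estimate only holds if the alignment itself is correct; a misaligned region adds error that this relationship does not capture.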
AVGIFRAAVCTRGVAKAVDFVPVESMETTMRSPVFTDNSSPPAVPQSFQVAHLHAPTGSGKSTKVPAAYAAQGYKVLVLNPSVAATLGFGAYMSKAHGIDPNIRTGVRTITTGAPVTYSTYGKFLADGGCSGGAYDIIICDECHSTDSTTILGIGTVLDQAETAGARLVVLATATPPGSVTVPHPNIEEVALSNTGEIP
Fold recognition: score and select model. Orengo, Curr. Opin. Struct. Biol., 1994
AVGIFRAAVCTRGVAKAVDFVP… → AVGIFR, AAVCTR, GVAKAVDF. Fragment based: score and select model. Bystroff and Baker, JMB, 1998
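A sketch of only the first step of a fragment-based approach: cutting the query into short, overlapping windows, each of which would then be matched against a library of local structures. The window length of 9 is an assumption (Rosetta, for example, uses 3- and 9-residue fragments); the slide's AVGIFR / AAVCTR / GVAKAVDF split is illustrative, and library matching, assembly and the "score and select" step are not shown. `fragment_windows` is a hypothetical helper.

```python
from typing import List, Tuple

def fragment_windows(sequence: str, length: int) -> List[Tuple[int, str]]:
    """Return (start, fragment) for every overlapping window of `length` residues."""
    if length <= 0:
        raise ValueError("fragment length must be positive")
    return [(i, sequence[i:i + length]) for i in range(len(sequence) - length + 1)]

query = "AVGIFRAAVCTRGVAKAVDFVP"
windows = fragment_windows(query, 9)
print(len(windows))  # 14
print(windows[0])    # (0, 'AVGIFRAAV')
```

Overlapping windows mean every residue is covered by several candidate fragments, which gives the later assembly step alternatives to choose from.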
AVSRAFT RAFTAAF DGHTYIPK CASP: Critical assessment of techniques for protein structure prediction The evaluation Moult et al., Proteins, 1995
[Figure: numbers of groups, targets and models per CASP experiment (1 to 6).] The evaluation. Tramontano, NSB, 2003
[Figure: CASP4, CASP5, CASP6: best models, Max P.AL0.] The evaluation. Cozzetto and Tramontano, Proteins, 2004
http://predictioncenter.gov State of the art Moult et al., Proteins, 2005.
http://www.caspur.it/PMDB State of the art. Castrignanò et al., NAR, 2006.
Protein preparation → Protein crystallization → Diffraction data measurements → Phase estimation → Model building. Molecular replacement
Model → Rotation search + Translation search → ? Molecular replacement
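The split into a rotation search followed by a translation search can be illustrated with a 2D geometric toy. This is not molecular replacement: real programs score orientations and positions against the diffraction data (for example through Patterson-based rotation and translation functions), whereas this sketch simply compares the model with known target coordinates. What carries over is the decoupling: comparing centred coordinates makes the rotation score independent of position, so the orientation can be fixed first. In this toy the translation is then computed from the centroids rather than searched. All names and coordinates are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def rotate(points: List[Point], theta: float) -> List[Point]:
    """Rotate 2D points about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def centroid(points: List[Point]) -> Point:
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def centred(points: List[Point]) -> List[Point]:
    cx, cy = centroid(points)
    return [(x - cx, y - cy) for x, y in points]

def mean_sq_dev(a: List[Point], b: List[Point]) -> float:
    return sum((x1 - x2) ** 2 + (y1 - y2) ** 2
               for (x1, y1), (x2, y2) in zip(a, b)) / len(a)

def two_stage_placement(model: List[Point], target: List[Point], steps: int = 360):
    # Stage 1 (rotation search): grid over angles, scored on centred
    # coordinates so the score does not depend on where the model sits.
    m0, t0 = centred(model), centred(target)
    angles = [2 * math.pi * k / steps for k in range(steps)]
    best_theta = min(angles, key=lambda th: mean_sq_dev(rotate(m0, th), t0))
    # Stage 2 (translation): with orientation fixed, the shift that
    # superposes the centroids is computed directly in this toy.
    rx, ry = centroid(rotate(model, best_theta))
    tx, ty = centroid(target)
    return best_theta, (tx - rx, ty - ry)

model = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
target = [(x + 5.0, y + 3.0) for x, y in rotate(model, math.pi / 2)]
theta, shift = two_stage_placement(model, target)
print(round(math.degrees(theta)), [round(v, 6) for v in shift])  # 90 [5.0, 3.0]
```

Splitting the six-dimensional placement problem this way replaces one huge joint search with two much smaller ones.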
Completely automatic procedure: CASP models → MolRep (10x10) → AMoRe (20) → RefMac (10) → ArpWarp. Molecular replacement
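[Figure: GDT-TS axis from 40 to 100.] ? GDT-TS (distance-based measure): $\mathrm{GDT\text{-}TS} = \frac{N_{CA}(1\,\text{Å}) + N_{CA}(2\,\text{Å}) + N_{CA}(4\,\text{Å}) + N_{CA}(8\,\text{Å})}{4}$, where $N_{CA}(d)$ is the percentage of Cα atoms within $d$ of their position in the experimental structure after superposition. Molecular replacement. Giorgetti et al., Bioinformatics, 2005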
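A minimal sketch of the GDT-TS formula, assuming per-residue Cα distances from a single, fixed superposition are already available. The real measure (as computed by LGA) searches over many superpositions to maximise each $N_{CA}(d)$ separately, so this simplified version will generally give a lower or equal value; counting a distance exactly at the cutoff as "within" is also an assumption. `gdt_ts` is a hypothetical name.

```python
from typing import Sequence

def gdt_ts(ca_distances: Sequence[float],
           cutoffs: Sequence[float] = (1.0, 2.0, 4.0, 8.0)) -> float:
    """Average over cutoffs of the percentage of target residues whose model CA
    lies within the cutoff (one fixed superposition). Use float('inf') for
    target residues that are missing from the model."""
    n = len(ca_distances)
    within = sum(sum(1 for d in ca_distances if d <= c) for c in cutoffs)
    return 100.0 * within / (n * len(cutoffs))

# Five residues: 1, 2, 3 and 4 of them fall within 1, 2, 4 and 8 Angstrom.
print(gdt_ts([0.5, 1.5, 3.0, 6.0, 10.0]))  # 50.0
```

Averaging over four cutoffs rewards models that are roughly right everywhere as well as those that are very accurate in part of the chain.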
What if we don’t know the quality of the model? What if we don’t know how to build models? Molecular replacement Giorgetti et al., submitted
ACTFGARTEADEASRTFCGAVHI GFRLPMNHTYWPLYHMVCS… Structure factors Molecular replacement Giorgetti et al., submitted
60% success rate. If one of the retrieved models works, the procedure is successful. Molecular replacement
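The success criterion on this slide (a target counts as solved if at least one retrieved model works) amounts to an `any()` over each target's models. The outcome lists below are invented for illustration and chosen to give 0.6; they are not the study's data. `success_rate` is a hypothetical name.

```python
from typing import List

def success_rate(outcomes_per_target: List[List[bool]]) -> float:
    """Fraction of targets for which at least one retrieved model succeeded."""
    solved = sum(1 for outcomes in outcomes_per_target if any(outcomes))
    return solved / len(outcomes_per_target)

# Hypothetical outcomes for 5 targets; True = molecular replacement with that model worked.
outcomes = [[False, True], [False, False, False], [True], [False, True, True], [False]]
print(success_rate(outcomes))  # 0.6
```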
Levels of function: molecular (catalytic activity), cellular (extracellular), biological (blood coagulation). Function prediction
? AVSRAFT RAFTAAF DGHTYIPK The experiment Moult et al., Proteins, 1995
Scheme of the experiment: collect known info on targets; ask people to provide ADDITIONAL information; compare predictions. Is there a consensus? Once the structure is known, can we say more? Function prediction
EC number, binding, binding site(s), residue role(s), post-translational (PT) modifications, free-text comments. Function prediction. Soro and Tramontano, Proteins, 2005
We had too few predictions per target to draw any sensible conclusion. However, for the sake of the experiment, we tried to see what we could do and what the problems in analysing the data would be (other than the format), pretending that the numbers were significant. Function prediction
Summary table for target T0230
• Molecular function: unknown. COG annotation: predicted metal-sulfur cluster biosynthetic enzyme (Group: General function prediction only; Category: Poorly characterized)
• Predictions:
| GO number | GO name | Frequency |
| 287 | magnesium ion binding | 1 |
| 4176 | ATP-dependent peptidase activity | 1 |
|  | mannose-1-phosphate guanylyltransferase activity | 1 |
| 4672 | protein kinase activity | 1 |
| 5094 | Rho GDP-dissociation inhibitor activity | 1 |
| 5554 | Molecular function unknown | 1 |
| 6812 | PROCESS | 1 |
| 6825 | PROCESS | 1 |
| 8170 | N-methyltransferase activity | 1 |
| 16822 | hydrolase activity, acting on acid carbon-carbon bonds | 1 |
| 46872 | metal ion binding | 1 |
Function prediction
Summary table for target T0230
• Molecular function: unknown. COG annotation: predicted metal-sulfur cluster biosynthetic enzyme (Group: General function prediction only; Category: Poorly characterized)
• Predictions:
| GO number | GO name | Frequency | GO parents |
| 287 | magnesium ion binding | 1 | 46872, 43167, 5488 |
| 4176 | ATP-dependent peptidase activity | 1 | 8233, 16787, 3824 |
|  | mannose-1-phosphate guanylyltransferase activity | 1 | 8905, 16779, 16772, 16740, 3824 |
| 4672 | protein kinase activity | 1 | 16773, 16772, 16740 (16301), 3824 |
|  | Rho GDP-dissociation inhibitor activity | 1 | 5092, 5083, 30695, 30234 |
| 8170 | N-methyltransferase activity | 1 | 8168, 16741, 16740, 3824 |
|  | hydrolase activity, acting on acid carbon-carbon bonds | 1 | 16787, 3824 |
| 46872 | metal ion binding | 1 | 43167, 5488 |
Shared parents: 16787 hydrolase; 16740 transferase activity; 3824 catalytic activity. Function prediction
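One way to see why listing GO parents helps find a consensus: let each prediction vote for its own term and for every listed parent, then count votes per term. The rows below are the T0230 entries from the table (GO numbers missing on the slide are left as `None`, and the parenthesised 16301 is omitted); the voting rule itself is an illustrative assumption, not necessarily the procedure used by Soro and Tramontano. `term_support` is a hypothetical name.

```python
from collections import Counter
from typing import List, Optional, Tuple

# (GO number or None if missing on the slide, GO name, GO parents)
Prediction = Tuple[Optional[str], str, List[str]]

predictions: List[Prediction] = [
    ("287", "magnesium ion binding", ["46872", "43167", "5488"]),
    ("4176", "ATP-dependent peptidase activity", ["8233", "16787", "3824"]),
    (None, "mannose-1-phosphate guanylyltransferase activity",
     ["8905", "16779", "16772", "16740", "3824"]),
    ("4672", "protein kinase activity", ["16773", "16772", "16740", "3824"]),
    (None, "Rho GDP-dissociation inhibitor activity", ["5092", "5083", "30695", "30234"]),
    ("8170", "N-methyltransferase activity", ["8168", "16741", "16740", "3824"]),
    (None, "hydrolase activity, acting on acid carbon-carbon bonds", ["16787", "3824"]),
    ("46872", "metal ion binding", ["43167", "5488"]),
]

def term_support(preds: List[Prediction]) -> Counter:
    """For each GO term, count the predictions that assign it directly or via a parent."""
    support: Counter = Counter()
    for go_id, _name, parents in preds:
        terms = set(parents)          # a set, so each prediction votes once per term
        if go_id is not None:
            terms.add(go_id)
        support.update(terms)
    return support

support = term_support(predictions)
print(support.most_common(2))  # [('3824', 5), ('16740', 3)]
```

With these rows, 3824 (catalytic activity) collects 5 of the 8 votes even though no single leaf term was predicted more than once.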
Results: GO consensus Function prediction Soro and Tramontano, Proteins, 2005
18 months later… Annotations in DB decreased by 5%; 24 new targets were annotated. We looked at the methods (abstracts, directly contacting predictors, literature). Function prediction
[Figure: five-digit binary patterns (e.g. 11011, 10100, 11100) with their counts.] Function prediction
18 months later… 4 newly annotated targets had been correctly predicted by at least one method. 85% of the consensus non-redundant predictions were correct. Function prediction
CASP is about to start again: we will start collecting targets next week. There will be a few differences. http://predictioncenter.org Announcements
Acknowledgements: Claudia Bonaccini, Michele Ceriani, Domenico Cozzetto, Emanuela Giombini, Alejandro Giorgetti, Paolo Marcatili, Veronica Morea, Romina Oliva, Massimiliano Orsini, Marialuisa Pellegrini, Domenico Raimondo, Simonetta Soro, Ivano Talamo; Krzysztof Fidelis, Tim Hubbard, Andriy Kryshtafovych, John Moult, Burkhard Rost, Adam Zemla; structural biologists; predictors. BioSapiens (EU VI Framework), Ministero della Salute, Università di Roma, Istituto Pasteur Roma, Facoltà di Medicina, San Paolo, CNR