Modelling Workshop - Some Relevant Questions
Prof. David Jones (dtj@cs.ucl.ac.uk), University College London

• Where are we now?
• Where are we going?
• Where should we be going?
• New ideas?
• Can we combine theory with experiment?
• What do users need?
• What are we trying to achieve?
Reasons to Predict Protein Structure

• I want a picture of my protein for my thesis
  • EASY – but does anyone care if it’s wrong?
• I want to identify possible domain boundaries
  • POSSIBLE
• I want to identify surface residues
  • POSSIBLE
• I want some clues as to the function of my protein
  • POSSIBLE?
• I want to dock drug molecules into the structure
  • FORGET IT!

NEED TO CLEARLY DEFINE WHO WANTS THE MODELS AND WHAT THEY WANT TO DO WITH THEM!
Types of Model and Production Costs

Comparative models:
• Automatic all-atom model (1 server @ 200 models per day): $0.30/model
• Sophisticated (multi-template) automatic model (1 server @ 30 models per day): $2/model
• Human-assisted (e.g. CASP) model (1 human @ 100 models per year): $200/model
• “Deluxe” model – incorporating experimental data (1 human @ 1-4 models per year): $4000-$50000/model

Fold recognition models:
• Automatic low-resolution model (1 server @ 200 models per day): $0.30/model
• Meta model (10 servers @ 10 models per day): $5/model

Ab initio models:
• Automatic low-resolution model, e.g. Robetta (Beowulf cluster @ 5 models per day): $50/model
• Hand-built topological model – perhaps incorporating experimental data (1 human @ 10 models per year): $2000/model

Docking models:
• Automatic: similar to ab initio folding – $50/model
• “Deluxe” docking: equivalent to “deluxe” modelling – $4000-$50000/model

Assumptions (see the cost sketch below):
• Scientist salary: $50000/year
• Server costs (inc. maintenance & support): $10000/year
• Beowulf costs (inc. maintenance, support and electricity): $100000/year
• Costs of experimental data not included

Archiving costs should reflect the cost of generating the models!
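A minimal sketch of the arithmetic behind the per-model figures, using the resource prices stated above. The utilisation assumption (continuous operation, 365 days/year) is mine, not the slide's; the slide's costs are rounded, order-of-magnitude estimates that presumably fold in overheads, so exact agreement is not expected.

```python
# Per-model cost = annual resource cost / annual model throughput.
# Resource prices come from the slide's assumptions; DAYS_PER_YEAR is
# my assumption, and the slide's published figures are rounded,
# order-of-magnitude values, so small discrepancies are expected.

DAYS_PER_YEAR = 365  # assumed continuous operation for servers/clusters

def cost_per_model(annual_cost_usd: float, models_per_year: float) -> float:
    """Annual resource cost divided by annual model throughput."""
    return annual_cost_usd / models_per_year

# Slide assumptions: server $10k/yr, scientist $50k/yr, Beowulf $100k/yr
print(cost_per_model(10_000, 200 * DAYS_PER_YEAR))   # automatic server model
print(cost_per_model(50_000, 100))                   # human-assisted (CASP) model
print(cost_per_model(100_000, 5 * DAYS_PER_YEAR))    # ab initio on a Beowulf cluster
```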
Methods for Quality Control

CASE 1 – Single PDB file with no supporting evidence
• This is clearly of limited use
• Can apply standard QC methods developed for X-ray structures (e.g. PROCHECK)
  • Many incorrect models pass these checks
• Methods do exist which can generate reliability estimates (e.g. MODCHECK or ProQ)
  • However, these methods have reliability issues of their own
• Can present a summary of various quality measures, but how can these be interpreted?
  • Which quality estimators do you believe?

CASE 2 – Model submitted with supporting evidence
• Much more useful
• What evidence?
  • Alignments
  • Method description
  • Experimental data (good – but how to evaluate it?)
• Generating consistent quality measures based on a wide variety of methods and supporting evidence is going to require a lot of hard research

CASE 3 – Community modelling (many models for the same target)
• Example: CASP experiments or meta servers
• Can generate global and local reliability scores from a large population of models (see the sketch after this list)
  • Cluster all structures using a structural similarity measure (which one?)
  • Derive “fold confidence” from the relative size of the largest cluster and some measure of the cluster’s tightness according to a metric (GDT/RMSD/MaxSub/TM?)
  • Derive positional confidence in a similar way, i.e. by looking at the RMSD “scatter” for each equivalent position within the ensemble of superposed models
  • Side-chain confidence could also be estimated in a similar way, e.g. by looking at the scatter of chi angles or side-chain RMSDs
  • Problem: looking at 100 models from 100 different methods is very different from looking at 100 models from a single method, e.g. the Robetta server
• Would allow a running “community quality measure” to be maintained for a particular target: as different models are submitted from different groups or servers, quality statistics could be compiled automatically, and the quality estimator would change over time as more models arrive
• Would require models to be indexed according to target
• Would need checks in place to stop poisoning of the data (deliberate or accidental)
• Might need to record performance histories of methods (a sensitive issue for developers)
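A minimal sketch of the CASE 3 clustering idea, assuming the models have already been aligned and superposed onto a common frame and are supplied as an (n_models, n_residues, 3) array of CA coordinates. The choice of CA RMSD as the similarity measure, single-linkage clustering, and the 4 Å cutoff are illustrative assumptions on my part; the slide deliberately leaves the metric (GDT/RMSD/MaxSub/TM) open.

```python
# Sketch of cluster-based "fold confidence" and positional confidence,
# assuming pre-superposed CA coordinates of shape (n_models, n_residues, 3).
# RMSD metric, single-linkage clustering and the 4 A cutoff are illustrative
# assumptions, not choices made on the slide.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def pairwise_rmsd(coords: np.ndarray) -> np.ndarray:
    """All-against-all CA RMSD over pre-superposed models."""
    diff = coords[:, None, :, :] - coords[None, :, :, :]   # (n, n, L, 3)
    return np.sqrt((diff ** 2).sum(-1).mean(-1))           # (n, n)

def community_confidence(coords: np.ndarray, cutoff: float = 4.0):
    rmsd = pairwise_rmsd(coords)
    # Single-linkage clustering on the condensed distance matrix
    tree = linkage(squareform(rmsd, checks=False), method="single")
    labels = fcluster(tree, t=cutoff, criterion="distance")
    sizes = np.bincount(labels)
    top = sizes.argmax()
    members = coords[labels == top]
    # Global "fold confidence": fraction of models in the dominant cluster
    fold_conf = sizes[top] / len(coords)
    # Positional confidence: per-residue coordinate scatter (RMS deviation
    # from the cluster mean) within the dominant cluster
    scatter = np.sqrt(((members - members.mean(0)) ** 2).sum(-1).mean(0))
    return fold_conf, scatter
```

Re-running this as new models arrive for a target would give the running “community quality measure” described above, though (as noted) a population dominated by one server behaves very differently from one drawn from many independent methods, so some per-source weighting would probably be needed in practice.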