BALBES (current working name)
A. Vagin, F. Long, J. Foadi, A. Lebedev, G. Murshudov
Chemistry Department, University of York
Outline
• Introduction
• Database
• System manager
• Scientific programs
• Calibrating the system
• An example
• Release and development plan
Introduction
• The number of entries in the Protein Data Bank (PDB) increases every year. This has many implications for macromolecular crystallography. One challenge is how to use these entries efficiently in the development of structure solution software.
• Analysis of the PDB shows that around 67% of the structures deposited this year are reported as solved by molecular replacement.
• With better algorithms and better organisation of the data bank, this proportion is expected to become substantially higher.
Our system has three main components: (1) a reorganised database, (2) a manager written in Python that makes decisions and (3) scientific programs such as MOLREP and REFMAC.
Database: Reorganisation of the PDB
• All entries in the PDB have been analysed for homology, and only a non-redundant set of structures is stored.
• The database is organised hierarchically according to sequence identity.
• Where domains are present, information about them is stored.
• Multimers of each structure are stored.
• Fragments of various lengths (under way)
• Intensity curves for various types of macromolecules (later)
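The idea of keeping a non-redundant set and then searching it by similarity can be illustrated with a minimal Python sketch. This is not the BALBES implementation: the entry names and sequences are made up, the 0.9 threshold is arbitrary, and `difflib.SequenceMatcher` is used only as a crude stand-in for an alignment-based sequence identity.

```python
from difflib import SequenceMatcher


def identity(seq_a, seq_b):
    """Crude similarity in [0, 1]; a stand-in for alignment-based identity."""
    return SequenceMatcher(None, seq_a, seq_b).ratio()


def non_redundant(entries, threshold=0.9):
    """Greedy filter: keep an entry only if it is below the threshold
    against every entry already kept."""
    kept = []
    for name, seq in entries:
        if all(identity(seq, kept_seq) < threshold for _, kept_seq in kept):
            kept.append((name, seq))
    return kept


def search(query, kept):
    """Return the kept entries sorted by decreasing similarity to the query."""
    return sorted(kept, key=lambda entry: identity(query, entry[1]), reverse=True)


# Hypothetical entries: e2 duplicates e1, e3 is unrelated.
entries = [
    ("e1", "MKTAYIAKQR"),
    ("e2", "MKTAYIAKQR"),
    ("e3", "GGGGSSSSPP"),
]
representatives = non_redundant(entries)
best = search("MKTAYIAKQK", representatives)[0][0]
```

A real system would replace `identity` with a proper sequence alignment and would index the representatives so that a search does not compare the query against every entry.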
Database (continued)
The resulting database is of portable size, which enables:
• fast searches for similar structures (less than 10 seconds on a typical Mac G5 processor for most test cases so far)
• all actions to be performed locally (independent of the internet)
• retrieval of the required information about similar structures (domains, tertiary structures)
System Manager
The manager is written in Python and relies on XML files for information exchange. It handles:
• Data
  • Twinning
  • Pseudotranslation
  • Resolution for molecular replacement
  • Completeness and other properties
• Sequence
  • Finds template structures with their domain and multimeric organisations
  • Finds the number of molecules in the asymmetric unit
  • “Corrects” template molecules using sequence alignment
• Protocols
  • Runs various protocols with molecular replacement and refinement and makes decisions accordingly
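As an illustration of XML-based information exchange between a Python manager and other programs, here is a minimal sketch using the standard library. The element names (`data_summary`, `twinning`, `mr_resolution` and so on) and the values are hypothetical; the actual BALBES file layout is not described in these slides.

```python
import xml.etree.ElementTree as ET


def write_data_summary(props):
    """Serialise a dict of data properties to an XML string.
    Element names are hypothetical, chosen only for this example."""
    root = ET.Element("data_summary")
    for name, value in props.items():
        child = ET.SubElement(root, name)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


def read_data_summary(xml_text):
    """Parse the XML string back into a dict of strings."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


# Made-up values for the properties listed on the slide.
props = {
    "twinning": False,
    "pseudotranslation": True,
    "mr_resolution": 3.0,
    "completeness": 98.5,
}
xml_text = write_data_summary(props)
parsed = read_data_summary(xml_text)
```

Values come back as strings, so a manager built this way would need to convert them to booleans or floats before making decisions.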
Scientific programs
• MOLREP (molecular replacement): simple molecular replacement, phased rotation and translation functions, spherically averaged phased translation function, dyad search, search with one model fixed, etc.
• REFMAC: maximum likelihood refinement, phased refinement, rigid body refinement, extensive dictionary, map coefficients, etc.
• SFCHECK: twinning tests, pseudotranslation, optical resolution, optimal resolution for molecular replacement, analysis of coordinates against electron density, etc.
• Auxiliary programs: alignment, search in the DB, analysis of sequence and data to suggest the number of expected monomers, removal of bits of structure from coordinates according to their fit to the electron density, semiautomatic domain definition, etc.
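The slides do not say how the number of expected monomers is suggested. One widely used heuristic is the Matthews coefficient, V_M = V_cell / (Z × n × M), where Z is the number of symmetry operators, n the number of copies in the asymmetric unit and M the molecular weight in Da. The sketch below picks the n whose V_M is closest to a target of about 2.5 Å³/Da. The function names and the target are assumptions made for illustration, not the method used by the auxiliary programs.

```python
def matthews_coefficient(cell_volume, n_symops, n_copies, mol_weight):
    """V_M in cubic angstroms per dalton for n_copies molecules
    in the asymmetric unit."""
    return cell_volume / (n_symops * n_copies * mol_weight)


def suggest_copies(cell_volume, n_symops, mol_weight, target=2.5, max_copies=12):
    """Return the copy number whose V_M lies closest to the target value.
    The target of 2.5 is an assumed typical value for protein crystals."""
    return min(
        range(1, max_copies + 1),
        key=lambda n: abs(
            matthews_coefficient(cell_volume, n_symops, n, mol_weight) - target
        ),
    )


# Made-up example: 20 kDa protein, 4 symmetry operators.
one_copy = suggest_copies(200000.0, 4, 20000.0)
two_copies = suggest_copies(400000.0, 4, 20000.0)
```

In practice several copy numbers can give plausible V_M values, so a decision-making manager would more likely treat this as a ranked list of candidates than as a single answer.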
Calibrating the System
Step 1: Making the database. More than 30,000 structures had been deposited in the PDB by the end of 2004, but only ~10,000 of them were non-redundant. These 10,000 were used to construct our database of known structures.
Step 2: Testing the system. About 1000 structures were deposited between January and May 2005. We tried to solve all of them with our automated approach. The success rate with our current version was ~75%. This is actually higher than the proportion reported as solved using MR!
Test Case Statistics: overall test results

Method (as reported in PDB)   Number of cases   Successes   Rate (%)
ALL                           1027              777         75.6
MR                            695               609         87.6
SAD                           80                23          28.8
MAD                           117               40          34.1
SIR                           10                5           50
MIR                           23                9           39.0
OTHER                         102               89          87.9

Note that not all structures that were used as search models are present in our DB.
Schematic view of the success rate of our system
• All: 100%
• Solved automatically by our system: 75%
• Reported to be solved by MR: 67%
Progress to date We are analysing all failed cases and have already significantly enhanced the system as a result. We have developed several new techniques by carefully analysing these results. Success is great for funding! Failure is great for future developments!
Example: Addition of domains (flowchart)
1. Search with the whole molecule. Is it a solution? Yes: refine and exit. No: go to step 2.
2. Are there domains? No: other protocols. Yes: go to step 3.
3. Run MR for each domain and find the best. Is it a solution? No: other protocols. Yes: go to step 4.
4. Refine and produce a map.
5. Mask out the found domain(s).
6. Use SPTF, PRF and PTF to find the missing domains.
7. Is the solution complete? Yes: refine and exit. No: repeat the refine, mask and search steps.
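The decision flow on this slide can be sketched in Python, the language of the manager. This is a minimal illustration under stated assumptions, not the BALBES code: `run_mr`, `refine` and `find_missing` are hypothetical callables standing in for MOLREP and REFMAC runs, a solution is represented as a dict with `placed` and `score` keys, and returning `None` stands for "fall back to other protocols".

```python
def solve_with_domains(whole, domains, run_mr, refine, find_missing, max_rounds=10):
    """Follow the domain-addition flow; return a refined solution or None."""
    # Search with the whole molecule first.
    solution = run_mr(whole)
    if solution is not None:
        return refine(solution)
    if not domains:
        return None  # no domains: other protocols
    # MR for each domain, keep the best-scoring hit.
    hits = [hit for hit in (run_mr(d) for d in domains) if hit is not None]
    if not hits:
        return None  # no domain solution: other protocols
    solution = max(hits, key=lambda hit: hit["score"])
    for _ in range(max_rounds):
        # Refine (and, in a real run, produce a map and mask found domains).
        solution = refine(solution)
        missing = [d for d in domains if d not in solution["placed"]]
        if not missing:
            return solution  # solution complete
        # Stand-in for the SPTF/PRF/PTF search for missing domains.
        updated = find_missing(solution, missing)
        if updated["placed"] == solution["placed"]:
            return None  # no progress: other protocols
        solution = updated
    complete = all(d in solution["placed"] for d in domains)
    return refine(solution) if complete else None


# Toy stand-ins so the sketch can be run end to end.
def toy_mr(name):
    scores = {"A": 0.4, "B": 0.6}
    return {"placed": [name], "score": scores[name]} if name in scores else None


def toy_refine(solution):
    return dict(solution, refined=True)


def toy_find_missing(solution, missing):
    return dict(solution, placed=solution["placed"] + [missing[0]])


result = solve_with_domains("whole", ["A", "B"], toy_mr, toy_refine, toy_find_missing)
```

Passing the scientific steps in as callables keeps the decision logic separate from the programs it drives, which makes it straightforward to test the flow with toy functions such as the ones above.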
Example: Domain motions - 1tj3
Finding the whole molecule was problematic. Finding the large domain, refining it and then using SPTF/PT/TF with a masked map was straightforward.
Conclusions
• The database is an essential ingredient of efficient automation.
• With relatively simple protocols it will be possible to solve more than 80% of structures automatically.
• The interplay of different protocols is very promising.
• A huge number of tests helps to prioritise developments and generate ideas.
Development Plans
Development currently under way and in the immediate future:
• Update the database by adding entries based on PDB files deposited in 2005 (thanks to Eugene for PISA, which we use for multimer analysis)
• Add multichain domain definitions
• Test the system against PDB files deposited in 2006
• Target release date: May-June 2006
• Combine with some protocols from experimental phasing and automatic model building (Foadi, Cowtan)
Future:
• Combine with automatic model building
• Make decisions during refinement about twinning and other properties
• Pass information about search templates to refinement
• Combine with experimental phasing
• Regular updates
Acknowledgements
• All CCP4 and YSBL people
• Wellcome Trust, BBSRC, EU BIOXHIT and NIH for support