Zheng-Qing (Albert) Fu SER-CAT, APS, Argonne National Laboratory Biochem. & Mol. Biology, Univ. Of Georgia, Athens,

Automatic Structure Determination --- given a data set, solve the structure quickly and better, by using a parallel workflow engine to automatically and systematically search algorithm/program and parameter space. Zheng-Qing (Albert) Fu SER-CAT, APS, Argonne National Laboratory


Presentation Transcript


  1. Automatic Structure Determination: given a data set, solve the structure faster and better by using a parallel workflow engine to automatically and systematically search the algorithm/program and parameter space • Zheng-Qing (Albert) Fu • SER-CAT, APS, Argonne National Laboratory • Biochem. & Mol. Biology, Univ. of Georgia, Athens, Georgia • 2007 ACA Summer School

  2. What we learnt from Structural Genomics. Chart labels: All Targets; Cloned; Crystals; Structures; Cloned (7%); Crystals (33%); Structures. Overall Success Rate (from Clone to Structure): 2.45%

  3. From gene to final structure, crystallographic analysis of protein structures is a complicated multi-step, multi-discipline, costly, and systematic engineering project. Diagram steps (listed from final structure back to gene): Structure Refinement; Map Tracing; Phasing; Data Processing; Data Collection; Crystallization; Protein Prep; Gene. Diagram annotations: Tedious & Time Consuming; Data Collection, Data Processing and Structure-Solving Process (Intensive Computing); Key to Success; Bottleneck. Fu (2002): Diffraction Methods in Structural Biology, Gordon Research Conferences. New London, CT, USA.

  4. Why Automation? Reason #1: Automation may optimize the steps of the whole process, and thus improve the success rate and the accuracy of the final structure.

  5. Why Automation? Reason #2: Structural biology in the post-genomics era challenges X-ray crystallography to provide better hardware, better software, and better full services. <<< A Decade Ago >>> Every structural biologist was also an excellent crystallographer. <<< Nowadays >>> Most new-generation structural biologists know only some basic concepts of crystallography, if any at all. They depend on other people's recipes, and at most learn how to run a bunch of computer programs. Do they want to, or have the ability to, solve new problems related to crystallography?

  6. Why Automation? Reason #3: Even experienced crystallographers may make careless mistakes. Blood Coagulation Inhibitor: a small protein containing 12 Cys. Source: venom of habu (rattlesnake). A good target for S phasing. Native data were collected at both the home source and the SER-CAT synchrotron beamline. Panels: Synchrotron Source (1.74 Å); Home Cr Source (2.29 Å). Automation may help avoid such unrecoverable mistakes, which may happen at any step of the complicated process.

  7. Automation of Part of the Whole Process from Data Collection to Structure-Solving: Feasibility, Current Implementation. Parts: Data Acquisition & Processing; Structure-Solving Process

  8. Data Acquisition & Processing

  9. During data collection, any problem with the diffraction system, such as with the X-ray source, shutter, goniometer and stage, detector, or crystal mounting, or any other mechanical, optical, or electronic defect, can ruin the data quality, leading to failure of the whole process. 1). How can these problems be detected and avoided before it is too late?

  10. In addition to the unexpected problems, there are many other issues during data collection: 2). Is the diffraction quality acceptable? 3). Is the data quality still improving? 4). Is the data collected so far enough to solve the structure? 5). Should we continue collecting more frames, or would it be better to mount a fresh crystal? All these questions can be answered if and only if we know how to monitor the Signal/Noise ratio during data collection.

  11. A New Statistical Index, Ras ($R_{as}$), to More Objectively and Accurately Evaluate the Signal/Noise Ratio. Signal/Noise ratio 1): $R_{as} = D_a / D_c$, with $D_a = \langle \Delta I / \sigma_I \rangle_a$ and $D_c = \langle \Delta I / \sigma_I \rangle_c$. Here $D_a$ is the ratio of the Bijvoet difference to the standard error in intensity, averaged over acentric reflections. $D_c$ is evaluated in the same way as $D_a$, but using centric reflections. Theoretically, it should be zero. $D_c$ is the counterpart of $D_a$, and thus can serve as an indicator of the noise level. $R_{as}$, thus defined, can serve as a signal/noise ratio in terms of anomalous scattering: the higher the better. Tests show that it is more objective and reliable than other indices currently used for measuring anomalous signal. 1). Fu et al. (2004). Acta Cryst D60:499-506.
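
The definition on this slide can be illustrated with a minimal sketch. It assumes each reflection is available as a (Bijvoet difference, standard error, centric flag) tuple, and it assumes the averages are taken over absolute differences, which the slide does not spell out. The function name and the input layout are hypothetical, and the numbers are made up.

```python
from statistics import mean


def anomalous_signal_to_noise(reflections):
    """Return (Da, Dc, Ras) for an iterable of (delta_i, sigma_i, is_centric).

    Assumption: the averages use |delta_i| / sigma_i; the slide only
    writes <dI/sigma_I> without stating how signs are handled.
    """
    acentric, centric = [], []
    for delta_i, sigma_i, is_centric in reflections:
        if sigma_i <= 0:
            continue  # skip reflections without a usable error estimate
        ratio = abs(delta_i) / sigma_i
        (centric if is_centric else acentric).append(ratio)
    if not acentric or not centric:
        raise ValueError("need at least one acentric and one centric reflection")
    d_a = mean(acentric)  # anomalous signal plus noise
    d_c = mean(centric)   # noise only: centric Bijvoet pairs should agree
    if d_c == 0:
        raise ValueError("Dc is zero, so Ras is undefined")
    return d_a, d_c, d_a / d_c


# Made-up numbers, for illustration only.
demo = [
    (30.0, 10.0, False),
    (-50.0, 10.0, False),
    (10.0, 10.0, True),
    (-10.0, 10.0, True),
]
d_a, d_c, r_as = anomalous_signal_to_noise(demo)
```

With real data, centric differences are not exactly zero because of measurement error, which is why Dc can act as the noise reference in the ratio.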

  12. Signal-based Data Collection: with Ras as a reliable indicator, diffraction data can be acquired more appropriately for a given crystal by monitoring the Signal/Noise ratio throughout data collection
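
One way such monitoring could be turned into a decision rule is sketched below. It is an illustration only, not the procedure used at the beamline: the function name, the three possible answers, and both thresholds (`min_ras`, `min_gain`) are hypothetical.

```python
def collection_advice(ras_history, min_ras=1.5, min_gain=0.02):
    """Suggest the next action from Ras values computed after each batch of frames.

    min_ras and min_gain are hypothetical thresholds chosen for illustration.
    """
    if len(ras_history) < 2:
        return "collect"  # not enough history to judge a trend
    latest = ras_history[-1]
    gain = latest - ras_history[-2]
    if gain > min_gain:
        return "collect"  # signal/noise is still improving
    if latest >= min_ras:
        return "stop"     # plateau reached with a usable anomalous signal
    return "remount"      # no longer improving and the signal is still weak


# Made-up Ras values after successive batches of frames.
advice = collection_advice([1.10, 1.32, 1.55, 1.56])
```

The rule maps onto the questions raised for data collection: whether the quality is still improving, whether enough data has been collected, and whether a fresh crystal should be mounted.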

  13. Structure-Solving Process

  14. After the data are processed, we face a different set of issues in the structure-solving process. 1). There are numerous programs (or algorithms) to choose from. A program may outperform others in some cases and vice versa. Which programs should be used? 2). Each program has multiple parameters. Which parameters should be adjusted? What combination of parameters gives the best result? 3). If phasing produced a traceable map, is it the best map to work on for fitting and refining to complete the structure?

  15. For a given data set, different combinations of programs and parameter settings can produce totally different results. Some may succeed in giving a solution, but many others will fail 1). Test result on solving the structure of a hydrolase protein (864 AAs, 30 Se). The 2.8 Å data were provided by Dr. Turner. Green dots are the percentages of residues automatically traced from maps generated by phasing with different programs (SHELXD, ISAS, SOLVE, RESOLVE) and parameter settings. Pink represents the resolution cutoff for heavy-atom site searching: solid squares indicate SHELXD, open squares SOLVE. Blue represents the resolution cutoff for phasing and density modification: solid diamonds indicate SOLVE/RESOLVE, open diamonds ISAS. The current common trial-and-error practice in solving a structure is time-consuming and tedious. It may not give the best solution, and may even fail to find any solution at all for data of marginal quality. 1). Fu, Rose, Wang (2005): Acta Cryst D61:951-959.
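
The search space described on this slide can be written down explicitly instead of being explored by hand. The sketch below only enumerates combinations and picks the best one under a caller-supplied score. It does not run any crystallographic program: the program names are used as plain labels, and the cutoff values and function names are hypothetical.

```python
from itertools import product

SITE_PROGRAMS = ("SHELXD", "SOLVE")           # heavy-atom site search
PHASING_PROGRAMS = ("SOLVE/RESOLVE", "ISAS")  # phasing and density modification
SITE_CUTOFFS = (3.0, 3.5, 4.0)                # hypothetical resolution cutoffs, in Å
PHASING_CUTOFFS = (2.8, 3.0, 3.2)             # hypothetical resolution cutoffs, in Å


def enumerate_trials(site_programs=SITE_PROGRAMS, site_cutoffs=SITE_CUTOFFS,
                     phasing_programs=PHASING_PROGRAMS,
                     phasing_cutoffs=PHASING_CUTOFFS):
    """List every program/parameter combination as a dictionary."""
    return [
        {
            "site_program": site_program,
            "site_cutoff": site_cutoff,
            "phasing_program": phasing_program,
            "phasing_cutoff": phasing_cutoff,
        }
        for site_program, site_cutoff, phasing_program, phasing_cutoff
        in product(site_programs, site_cutoffs, phasing_programs, phasing_cutoffs)
    ]


def best_trial(trials, score):
    """Return the trial with the highest score, e.g. percentage of residues traced."""
    return max(trials, key=score)


trials = enumerate_trials()
```

Even this small grid yields 36 combinations, which shows how quickly manual trial and error becomes impractical. In practice `score` would come from running each trial, for example the fraction of residues traced automatically.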

  16. Parallel Workflow Engine to systematically search the program and parameter spaces to find the best solution for given data. Figure 1. The dark blocks represent parallel tasks dynamically generated from various crystallographic computing programs with different parameter settings. The tasks are distributed by the workflow engine to the computing facility and run in parallel. Upon completion, the workflow engine harvests and analyzes the results, and dynamically creates and starts another group of tasks for the next step, and so on until the whole process finishes. Fu (2003). Proceedings of the 5th Int. Conference on Mol. Struct. Biology. Vienna, Austria, Sept. 3-7. Fu et al. (2005). Acta Cryst D61:951-959.
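
The run, harvest, analyze, and expand cycle described in the figure caption can be sketched with Python's standard `concurrent.futures` module. This is a toy illustration under stated assumptions, not the engine from the cited papers: a local thread pool stands in for the computing facility, the scoring functions are made-up stand-ins for real programs, and every function name is hypothetical. The toy scores say nothing about how the real programs perform.

```python
from concurrent.futures import ThreadPoolExecutor


def run_stage(tasks, worker, max_workers=4):
    """Run one group of tasks in parallel and return (task, score) pairs."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(worker, tasks))
    return list(zip(tasks, scores))


def run_workflow(initial_tasks, stages, keep=2):
    """Run stages in order; each stage is a (worker, expand) pair.

    worker(task) returns a score. expand(task) returns the next-stage tasks
    for a surviving task, or expand is None for the final stage.
    """
    tasks = list(initial_tasks)
    history = []
    for worker, expand in stages:
        scored = run_stage(tasks, worker)                    # harvest results
        scored.sort(key=lambda pair: pair[1], reverse=True)  # analyze: best first
        history.append(scored)
        if expand is None:
            break
        survivors = [task for task, _ in scored[:keep]]
        tasks = [nxt for task in survivors for nxt in expand(task)]
    return history


# Toy stand-ins for real programs; the scores are invented.
def toy_site_score(task):
    program, cutoff = task
    return (1.0 if program == "SHELXD" else 0.3) - abs(cutoff - 3.5)


def expand_to_phasing(site_task):
    return [(site_task, program, cutoff)
            for program in ("SOLVE/RESOLVE", "ISAS")
            for cutoff in (2.8, 3.0)]


def toy_phasing_score(task):
    site_task, program, cutoff = task
    base = 0.8 if program == "ISAS" else 0.6
    return base - abs(cutoff - 2.8) + 0.1 * toy_site_score(site_task)


site_tasks = [("SHELXD", 3.0), ("SHELXD", 3.5), ("SOLVE", 3.0), ("SOLVE", 3.5)]
history = run_workflow(site_tasks, [(toy_site_score, expand_to_phasing),
                                    (toy_phasing_score, None)])
best_task, best_score = history[-1][0]
```

Only the best `keep` results of a stage are expanded into the next group of tasks, which keeps the search from growing with the full product of all stages.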

  17. Algorithm and Design

  18. Where are we?

  19. Acknowledgment: George Wu and many Ph.D. students, including Dongsheng Che, Jizhen Zhao, Feng Sun, and Haijin Yan, Dept. of Computer Sciences, UGA; B.C. Wang, John Rose, SER-CAT, SECSG, UGA; John Chrzas, Zhongmin Jin, Jim Fait, SER-CAT, APS; Andy Howard, Illinois Institute of Technology; Robert Sparks, Bruker (formerly Siemens) AXS Inc.; Xuong Nguyen-Huu, UC San Diego; George Sheldrick, University of Göttingen, Germany; Randy Read, Cambridge University, England; Tom Terwilliger, Los Alamos National Lab; Peter Briggs (CCP4, England) and the authors of all the programs plugged into SGXPro. Work is supported in part with funds from the National Institutes of Health (GM62407) and SER-CAT, APS
