This paper explores the use of grid computing to accelerate structure-based design against influenza neuraminidases, focusing on the H9N2, H7N7, and H5N1 strains. The study uses the AutoDock and DIANE frameworks to distribute the docking tasks on a traditional PC cluster and the LCG-GRID. The performance evaluation shows that DIANE/AutoDock provides efficient load balancing and handles the docking jobs effectively.
Using Grid Computing to Accelerate Structure-Based Design Against Influenza A Neuraminidases
Hurng-Chun Lee, Li-Yung Ho, and Ying-Ta Wu* (ywu@gate.sinica.edu.tw)
*Genomics Research Center, Academia Sinica, Taiwan
EGEE User Forum, CERN, 01-03.03.2006
Outline: Influenza A Pandemic
[Figure: timeline of influenza A subtypes, labeled by HA and NA: H1N1, H1N1, H2N2, H3N2, H1N1; H9N2, H7N7, H5N1; years 2005 and 2006]
Feb 26, 2006: 92 deaths / 170 cases (http://www.who.int/csr/disease/avian_influenza)
Neuraminidases cleave host receptors and help release new virions
Structure-Based Drug Design: Neuraminidase and Inhibitors
[Figure: inhibitors in the neuraminidase binding pocket; labels: Oseltamivir, R=H, R’=amine, Zanamivir, R=guanidine]
Drug-Resistant Variants and Point Mutation
[Figure legend: mutation sites predicted by structure overlay and sequence alignment; reported mutation sites]
Docking Engine: AutoDock 3.0.5
Garrett M. Morris, David S. Goodsell, Ruth Huey, William E. Hart, Scott Halliday, Rik Belew, Arthur J. Olson
1. Prepare the Target Protein
-- add polar hydrogen atoms
-- assign charges to atoms
-- decide range of binding site
2. Run AutoGrid
3. Prepare the Ligand
-- assign charges to atoms
-- decide flexible bonds (run AutoTors)
4. Run AutoDock
5. Evaluate Results and Rank Score
[Figure labels: AutoGrid, AutoTors, AutoDock]
Morris et al. (1998), J. Computational Chemistry, 19: 1639-1662.
Application Characteristics
• Virtual screening based on molecular docking is the most time-consuming part of the structure-based drug design workflow
• Number of docking tasks = N x M
-- N: number of ligands
-- M: number of target structures
• CPU-bound application, huge amount of output, no communication between tasks
• Task complexity is unpredictable
-- difficult to split the tasks with a trivial domain decomposition method
The pitiful …
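The N x M task count above can be illustrated with a minimal Python sketch. The ligand and target names are hypothetical placeholders, and the counts (100 ligands, 5 target structures) are borrowed from the test case described later in the talk; this is not code from the DIANE/AutoDock framework.

```python
from itertools import product

def enumerate_docking_tasks(ligands, targets):
    """Return one (ligand, target) pair per independent docking task."""
    return list(product(ligands, targets))

# Hypothetical names; only the counts come from the slides.
ligands = [f"ligand_{i:03d}" for i in range(100)]   # N = 100
targets = [f"conformation_{j}" for j in range(5)]   # M = 5

tasks = enumerate_docking_tasks(ligands, targets)
print(len(tasks))  # 500
```

Because the tasks do not communicate, each pair can be handed to any free worker. The unpredictable run time per pair is what makes a fixed, even split across nodes a poor fit.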
Issues of Grid Applications
• Because the Grid is loosely coupled, distributing application jobs on it is not trivial
-- extra work is needed for efficient job handling and result gathering
-- effort is also needed to handle transient network or site problems
-- complexities should be hidden, and the end-user interface should be application oriented
• The significant Grid system overhead means the Grid benefits only jobs with long computing times
-- not suitable for pilot jobs used for decision making
What is DIANE?
DIANE = Distributed Analysis Environment
• A lightweight framework for parallel scientific applications in the master-worker model
-- ideal for applications without communication between parallel tasks (e.g. most bioinformatics applications that analyze huge numbers of independent datasets)
• The framework takes care of all synchronization, communication, and workflow management details on behalf of the application
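The master-worker model can be sketched in a few lines of Python. This is a generic illustration using standard-library threads and a shared queue, not DIANE's implementation or API; `run_master_worker` and `fake_dock` are hypothetical names introduced here.

```python
import queue
import threading

def run_master_worker(tasks, n_workers, work):
    """Run independent tasks with a pull-based pool of workers."""
    todo = queue.Queue()
    for task in tasks:
        todo.put(task)

    results = {}
    lock = threading.Lock()

    def worker():
        # Each worker pulls the next task as soon as it is free, so
        # long and short tasks balance out without any pre-partitioning.
        while True:
            try:
                task = todo.get_nowait()
            except queue.Empty:
                return
            outcome = work(task)
            with lock:
                results[task] = outcome

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

def fake_dock(task_id):
    """Hypothetical stand-in for one docking run."""
    return task_id * task_id

results = run_master_worker(range(10), n_workers=3, work=fake_dock)
```

Pulling work from a shared queue, rather than splitting the task list up front, is one common way to get load balancing when task run times are not known in advance.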
Distributing AutoDock tasks on the Grid using DIANE
DIANE/AutoDock
A generic framework into which applications can easily plug in
[Callouts: application-specific job attributes; job-level failure recovery definition]

autodock.job:

    # -*- python -*-
    Application = 'Autodock'

    # Application-specific job attributes
    JobInitData = {'macro_repos'  : '/home/hclee/diane_demo/autodock/macro',
                   'ligand_repos' : '/home/hclee/diane_demo/autodock/ligand',
                   'ftprotocol'   : 'gass',
                   'output_prefix': 'autodock_test'}

    ## The input files will be staged in to workers
    InputFiles = []

    ## The definition of failure recovery
    def failRecovery(self):
        # Job-level policy: report each failed task, then ignore it
        print('*' * 30)
        for t in self.master.tasks.failed():
            print("ignoring failed task: %s" % t)
            t.ignore()
        print('*' * 30)
        return 1

% diane.startjob –-job autodock.job –ganga –w 32@lcg,32@pbs
• Intuitive job execution command
• Possible to mix heterogeneous computing backends
DIANE/AutoDock – integrated user interface
Performance Evaluation
• Test case
-- target: 1 protein in 5 conformations
-- ligands: 100 small compounds (including 7 positives)
-- 500 docking tasks in total
• Test environment
-- DIANE backend handler: SSH
-- Hardware spec: traditional PC cluster with NFS (2 x Intel Xeon 2.8 GHz + 2 GB memory per node)
-- Grid: LCG
Test Results: DIANE/AutoDock framework on Cluster
Each DIANE job contains 500 tasks (5 protein conformations x 100 compounds)
Duration time: total elapsed time of a DIANE job
Test Results: DIANE/AutoDock framework on Cluster
Handling docking jobs on a traditional PC cluster
[Figure annotations: good load balance; a DIANE/AutoDock task]
DIANE/AutoDock framework on LCG-GRID
[Figure annotation: terminated]
Without redundant scheduling
With redundant scheduling
[Figure annotation: the job was reassigned to other nodes]
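The contrast between the two scheduling modes can be illustrated with a toy simulation. This is a minimal sketch under strong assumptions (round-robin assignment, workers that either always return or never do) and does not reproduce how DIANE detects or reassigns stuck tasks; `run_job` and the worker names are hypothetical.

```python
from itertools import cycle

def run_job(task_ids, workers, stalled, redundant):
    """Simulate one job and return (completed, stuck) task lists.

    stalled:   set of worker names that never return a result.
    redundant: if True, tasks stuck on stalled workers are re-issued
               to the remaining healthy workers.
    """
    # Assumed initial policy: hand tasks out round-robin.
    assignment = dict(zip(task_ids, cycle(workers)))
    completed = [t for t, w in assignment.items() if w not in stalled]
    stuck = [t for t, w in assignment.items() if w in stalled]

    healthy = [w for w in workers if w not in stalled]
    if redundant and healthy:
        # Re-issued tasks finish on a healthy worker.
        completed.extend(stuck)
        stuck = []
    return sorted(completed), sorted(stuck)

workers = ["node_a", "node_b", "node_c"]
without = run_job(range(6), workers, stalled={"node_b"}, redundant=False)
with_redundancy = run_job(range(6), workers, stalled={"node_b"}, redundant=True)
print(without)          # ([0, 2, 3, 5], [1, 4])
print(with_redundancy)  # ([0, 1, 2, 3, 4, 5], [])
```

In this toy model a single unresponsive node holds back a third of the tasks unless they are reassigned, which is the kind of transient site problem the framework is meant to absorb.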
Compound library enrichment
All positives were docked within RMSD < 1.5 Å
AutoDock parameters:
-- translation step = 2.0 Å
-- quaternion step = 20 degrees
-- torsion step = 20 degrees
-- number of energy evaluations = 1.5 x 10^6
-- max. number of generations = 2.7 x 10^4
-- number of runs = 10
[Figure legend: red = positives]
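For reference, the settings listed on this slide can be written out as docking parameter file (DPF) lines. The sketch below assumes the slide's settings correspond to the AutoDock DPF keywords `tstep`, `qstep`, `dstep`, `ga_num_evals`, `ga_num_generations`, and `ga_run`; it produces only a fragment, not a complete DPF, and `render_dpf_lines` is a hypothetical helper.

```python
# Assumed mapping from the slide's settings to AutoDock DPF keywords.
docking_params = {
    "tstep": 2.0,                 # translation step, in Å
    "qstep": 20.0,                # quaternion step, in degrees
    "dstep": 20.0,                # torsion step, in degrees
    "ga_num_evals": 1500000,      # 1.5 x 10^6 energy evaluations
    "ga_num_generations": 27000,  # 2.7 x 10^4 generations at most
    "ga_run": 10,                 # number of docking runs
}

def render_dpf_lines(params):
    """Format each setting as a 'keyword value' line."""
    return [f"{key} {value}" for key, value in params.items()]

dpf_lines = render_dpf_lines(docking_params)
print("\n".join(dpf_lines))
```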
Probe effects due to minor changes in target’s binding sites
Summary
• Modeling compound-protein complexes can be sped up by distributing molecular docking processes on the Grid.
• With the DIANE framework, distributing molecular docking tasks on the Grid is easy to implement and gives the end user an intuitive interface.
• The DIANE framework also provides functionality for tuning the system to tackle the issues of distributing molecular docking tasks on the loosely coupled Grid.
• This simple test case demonstrated that huge compound databases can be effectively enriched by executing docking tasks on the Grid. However, more resources are required to build a real HTP docking service for the life science community.
Acknowledgements
LCG-ARDA, CERN
Li-Yung Ho, Hurng-Chun Lee, Hsing-Yen Chen, Dr. Simon Lin, Jakub Moscicki, Dr. Massimo Lamanna
Support from the Genomics Research Center, Academia Sinica, and the National Science Council, Taiwan, is highly appreciated
Interacting Complexes
A key step to structure-based inhibitor design
[Figure: PDB 1F8B]