Building Grid-enabled Virtual Screening Service for Drug Discovery
Ying-Ta Wu (1) and Hurng-Chun Lee (2)
(1) Academia Sinica Genomic Research Center
(2) Academia Sinica Grid Computing Center (ASGC)
Outline
• Avian flu drug analysis on the Grid
• Developing grid-enabled virtual screening service
• The next large-scale virtual screening on avian flu
The virtual screening
• 300,000 chemical compounds: ZINC, chemical combinatorial library
• Millions of chemical compounds available in laboratories: high-throughput screening at $2/compound is nearly impossible
• Molecular docking (AutoDock): ~137 CPU years, 600 GB data
• Data challenge on EGEE, AuverGrid, TWGrid: ~6 weeks on ~2000 computers
• Hits sorting and refining
• In vitro screening of 100 hits
• Target (PDB): Neuraminidase (8 structures)
1st large-scale avian flu virtual screening on the Grid
• In 2006, a grid-enabled high-throughput screening against the H5N1 virus was performed
• 300,000 ligands were docked against 8 targets using AutoDock
• The computing requirement of 137 CPU years was met by a six-week high-throughput screening (HTS) activity on EGEE, AuverGrid and TWGrid
• Two computing models (WISDOM and DIANE) were adopted for submitting docking jobs, addressing two different user concerns (scalability and interactivity)
• The goal was to analyze the efficacy of known drugs against possible neuraminidase mutations
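As a rough sanity check on the figures quoted on this slide, the sketch below derives the implied average cost per docking and the average number of CPUs kept busy. It assumes a 365.25-day year and that every ligand was docked against every target; neither assumption is stated in the slides, and the variable names are illustrative.

```python
# Back-of-envelope check of the data-challenge figures quoted on the slide.
# Assumptions (not from the slides): 365.25-day year, full ligand x target matrix.
HOURS_PER_YEAR = 365.25 * 24

ligands = 300_000
targets = 8
cpu_years = 137
weeks = 6

dockings = ligands * targets                     # one docking per ligand/target pair
cpu_hours = cpu_years * HOURS_PER_YEAR           # total computing requirement in hours
minutes_per_docking = cpu_hours * 60 / dockings  # average cost of one docking
avg_busy_cpus = cpu_hours / (weeks * 7 * 24)     # CPUs busy on average over the run

print(f"dockings: {dockings:,}")
print(f"CPU minutes per docking: {minutes_per_docking:.0f}")
print(f"average busy CPUs over {weeks} weeks: {avg_busy_cpus:.0f}")
```

Under these assumptions each docking costs about half a CPU hour and roughly 1,200 CPUs were busy on average, which fits a pool of about 2000 machines that were not all available all the time.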
High-throughput screening using WISDOM
• WISDOM: Wide In-Silico Docking On Malaria
• The platform has been successfully tested in a previous data challenge
• A workflow for Grid job handling: automatic job submission, status check and report, error recovery
• Push model job scheduling + batch mode job handling
[Figure: WISDOM deployment diagram. Labels: user interface, compounds database, compounds list, compounds sublists, target structures, parameter settings, software, Computing Element and Storage Element at Site1 and Site2, results, statistics]
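The push model with compound sublists can be illustrated with a minimal sketch: the compound list is split into sublists and each sublist is assigned to a site before any job runs. This is not WISDOM code; the function names, the round-robin assignment and the ligand identifiers are assumptions made for illustration only.

```python
from itertools import cycle

def split_into_sublists(compounds, size):
    """Split the compound list into consecutive sublists of at most `size` items."""
    return [compounds[i:i + size] for i in range(0, len(compounds), size)]

def push_assign(sublists, sites):
    """Push model: decide up front which site receives each sublist (round-robin)."""
    plan = {site: [] for site in sites}
    for site, sublist in zip(cycle(sites), sublists):
        plan[site].append(sublist)
    return plan

# Hypothetical ligand identifiers; a real run would read them from the compounds database.
compounds = [f"ligand_{i:03d}" for i in range(10)]
plan = push_assign(split_into_sublists(compounds, 4), ["Site1", "Site2"])
for site, batches in plan.items():
    print(site, batches)
```

Because the assignment is fixed before submission, a slow or failing site delays its own sublists, which is why a push-model workflow needs the status checks and error recovery listed on the slide.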
Interactive screening using DIANE + GANGA
• DIANE: Distributed Analysis Environment
• An overlay system on top of a variety of distributed computing environments, taking care of all synchronization, communication and workflow management details on behalf of the application
• A lightweight framework for parallel scientific applications in the master-worker model
• Pull model job scheduling + interactive mode job handling with a flexible failure recovery mechanism
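The pull model can be sketched with Python threads: a master holds the task queue, idle workers pull the next task themselves, and a failed task is put back on the queue for another attempt. This is a generic master-worker illustration, not the DIANE API; `run_master`, `flaky_dock` and the retry limit are hypothetical names and choices.

```python
import queue
import threading

def run_master(tasks, n_workers, dock, max_attempts=3):
    """Master keeps a task queue; workers pull tasks and failed ones are re-queued."""
    todo = queue.Queue()
    for task in tasks:
        todo.put((task, 1))                  # (task, attempt number)
    results, failed = {}, []
    lock = threading.Lock()

    def worker():
        while True:
            task, attempt = todo.get()       # pull: the worker asks for work
            if task is None:                 # sentinel: no more work
                todo.task_done()
                return
            try:
                value = dock(task)
                with lock:
                    results[task] = value
            except Exception:
                if attempt < max_attempts:
                    todo.put((task, attempt + 1))   # failure recovery: retry later
                else:
                    with lock:
                        failed.append(task)
            finally:
                todo.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    todo.join()                              # wait until every task is finished or given up
    for _ in threads:
        todo.put((None, 0))                  # tell each worker to stop
    for t in threads:
        t.join()
    return results, failed

seen, seen_lock = set(), threading.Lock()

def flaky_dock(ligand):
    """Hypothetical stand-in for a docking run: 'lig3' fails on its first attempt."""
    with seen_lock:
        first_time = ligand not in seen
        seen.add(ligand)
    if ligand == "lig3" and first_time:
        raise RuntimeError("simulated worker failure")
    return len(ligand)

results, failed = run_master([f"lig{i}" for i in range(6)], n_workers=3, dock=flaky_dock)
print(sorted(results), failed)
```

Since workers ask for work only when they are free, faster workers naturally process more tasks, and a lost task costs one retry rather than a whole batch.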
The grid statistics
• ~600 GBytes of docking results were produced and archived on the Grid
• ~83% of jobs completed successfully according to the Grid Logging and Bookkeeping service, but only ~70% of the results were actually written to the Grid storage element
Enrichment of primary in silico HTS
Original type: T06
• 2qwe: Zanamivir (known drug)
• Five out of six known effective compounds can be identified in the first 15% of the ranking
• E = (5/6)/15% = 5.5 (< 1 in most cases)
[Figure: ranking plot with a 15% cut-off. Labels: GNA 2.4%, 4AM 13%, DAN 35%; pKd = 5.3, 7.3, 7.5; Ki = 4 µM, 150 nM, 1 nM; GNA = zanamivir]
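The enrichment quoted on this slide is the fraction of known actives recovered divided by the fraction of the ranked library that was inspected. The sketch below recomputes it; the function and parameter names are illustrative, not taken from the presentation.

```python
def enrichment_factor(actives_found, actives_total, fraction_screened):
    """Enrichment = (share of known actives recovered) / (share of library inspected)."""
    if actives_total <= 0 or not 0 < fraction_screened <= 1:
        raise ValueError("need actives_total > 0 and 0 < fraction_screened <= 1")
    return (actives_found / actives_total) / fraction_screened

# Five of six known effective compounds in the first 15% of the ranking.
e = enrichment_factor(5, 6, 0.15)
print(f"E = {e:.2f}")
```

The expression evaluates to about 5.56, which the slides quote as 5.5. A value of 1 would mean the ranking does no better than picking compounds at random.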
Mutation effects
• Top 15% by HTS: 300,000 x 15% = 45,000
• Top 5% by clustering: 45,000 x 5% = 2,250
• AutoDock re-rank
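The two-stage reduction on this slide is simple arithmetic, reproduced here as a small sketch. The `funnel` helper is hypothetical and uses integer percentages to avoid floating-point rounding.

```python
def funnel(n_start, stages):
    """Apply successive 'keep the top p%' filters; p is an integer percentage."""
    counts = [("start", n_start)]
    n = n_start
    for name, percent in stages:
        n = n * percent // 100
        counts.append((name, n))
    return counts

steps = funnel(300_000, [("top 15% by HTS", 15), ("top 5% by clustering", 5)])
for name, n in steps:
    print(f"{name}: {n:,}")
```

The 2,250 compounds left after both filters are the ones passed to the AutoDock re-ranking step.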
Effects of point mutation
• Most known effective inhibitors lose their binding affinity for a mutated target
• 2qwe: 2.4% → 11.5%
• 1f8c: 13% → 55%
[Figure: ranking plot for target T01 (E119A). Labels: DNA, 4AM 55%, 11.5%]
Q: How can we deliver a user-friendly service that integrates high-throughput virtual screening with data analysis?
Lessons learnt
• The grid-enabled virtual screening application does benefit drug analysis in terms of money and time
  • 137 CPU years in 6 weeks using about 2000 grid worker nodes
  • Primary HTS helps filter out 85% of compounds
  • Global enrichment rate: 5.5
• Mutations do affect the efficiency of the known drugs and potential hits
• Gaps between the current system and a real end-user application
  • Lack of a well-annotated ligand database
  • Lack of a friendly user interface to run the virtual screening process on the Grid
  • Lack of an easy-to-use interface to access the produced docking results for further analysis
  • Lack of an automatic refinement pipeline
GUI: first step to a real end-user application
• Interactive analysis
• Job history
• Progress monitoring
Common database
• Chemical properties to better annotate the compounds
• Results essential for further analysis are extracted and stored in a result database
• Database access through AMGA
  • for access control
  • for data replication
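One way to picture the common database is a compound table carrying the chemical annotations and a result table carrying the extracted docking scores, joined when results are analyzed. The sketch below uses an in-memory SQLite database purely as a stand-in; it does not use AMGA and does not model access control or replication. The table layout, column names and all values are placeholders invented for illustration; only the target labels T01 and T06 appear in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compound (
    compound_id TEXT PRIMARY KEY,
    mol_weight  REAL,              -- example chemical annotation
    logp        REAL               -- example chemical annotation
);
CREATE TABLE docking_result (
    compound_id TEXT REFERENCES compound(compound_id),
    target      TEXT,
    energy      REAL,              -- docking score, lower is better
    PRIMARY KEY (compound_id, target)
);
""")

# Placeholder rows, not real compounds or real docking scores.
conn.executemany("INSERT INTO compound VALUES (?, ?, ?)",
                 [("cmpd_a", 100.0, 1.0), ("cmpd_b", 200.0, 2.0), ("cmpd_c", 300.0, 3.0)])
conn.executemany("INSERT INTO docking_result VALUES (?, ?, ?)",
                 [("cmpd_a", "T06", -1.0), ("cmpd_b", "T06", -3.0),
                  ("cmpd_c", "T06", -2.0), ("cmpd_a", "T01", -2.5)])

# Best two compounds for one target, together with an annotation column.
rows = conn.execute("""
    SELECT r.compound_id, r.energy, c.mol_weight
    FROM docking_result AS r
    JOIN compound AS c ON r.compound_id = c.compound_id
    WHERE r.target = ?
    ORDER BY r.energy
    LIMIT 2
""", ("T06",)).fetchall()
print(rows)
```

Keeping only the values needed for analysis in the result table means the bulk docking output can stay on the Grid storage elements while queries run against a much smaller database.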
Proposal of the 2nd data challenge
• Proposed plan
  • Testing phase: May 2007
  • Official launch: June 2007
• Biology goals
  • Further analysis of the effect of the mutations
  • Further analysis of the open conformation of NA
• Grid goals
  • Improving the service usability
  • Enabling the refinement pipeline
  • Reducing researchers' effort in data analysis
Credit
• Docking workflow preparation
  • Contact point: Y.T. Wu
  • E. Rovida, P. D'Ursi, N. Jacq
• Grid resource management
  • Contact point: J. Salzemann
  • TWGrid: H.C. Lee, H. Y. Chen
  • AuverGrid: E. Medernach
  • EGEE: Y. Legré
• Platform deployment on the Grid
  • Contact point: H.C. Lee, J. Salzemann
  • M. Reichstadt, N. Jacq
• Users (deputy)
  • J. Salzemann (N. Jacq)
  • M. Reichstadt (E. Medernach)
  • L. Y. Ho (H. C. Lee)
  • I. Merelli, C. Arlandini (L. Milanesi)
  • J. Montagnat (T. Glatard)
  • R. Mollon (C. Blanchet)
  • I. Blanque (D. Segrelles)
  • D. Garcia
Mini Workshop
• Tomorrow afternoon from 2 pm in Conference Room 4
• Discussion of collaboration issues for the 2nd avian flu data challenge
Your participation is welcome!
DIANE Directory Service
• Improving the scalability of the DIANE framework
• The Directory Service is a server containing a list of all the Masters
• Each Master registers itself with the Directory Service
• The Workers obtain a Master through the Directory Service
• The Directory Service has an algorithm for load balancing of the Workers and prioritization of the Masters
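The registration and lookup flow can be sketched as a small in-process class. The slides do not describe the actual load-balancing or prioritization algorithm, so the rule used here (send each new Worker to the Master with the fewest Workers per unit of priority) is an assumption, as are the class and method names; this is not the DIANE interface.

```python
class DirectoryService:
    """Toy directory: Masters register, Workers ask which Master to join."""

    def __init__(self):
        self._masters = {}   # name -> {"priority": int, "workers": int}

    def register_master(self, name, priority=1):
        """Called by a Master to announce itself; higher priority attracts more Workers."""
        if priority <= 0:
            raise ValueError("priority must be positive")
        self._masters[name] = {"priority": priority, "workers": 0}

    def obtain_master(self):
        """Called by a Worker; returns the Master with the lowest workers/priority load."""
        if not self._masters:
            raise LookupError("no Master registered")

        def load(name):
            m = self._masters[name]
            return (m["workers"] / m["priority"], name)   # name breaks ties deterministically

        chosen = min(self._masters, key=load)
        self._masters[chosen]["workers"] += 1
        return chosen

# Hypothetical Master names and priorities.
directory = DirectoryService()
directory.register_master("master_a", priority=2)
directory.register_master("master_b", priority=1)
assignments = [directory.obtain_master() for _ in range(6)]
print(assignments)
```

With these priorities the six Workers split four to two, so the higher-priority Master ends up with twice as many Workers as the other one.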