1 / 23

Presented by Vinod Kasam

Grid Enabled High Throughput Virtual Screening Against Four Different Targets Implicated in Malaria. Presented by Vinod Kasam. CLADE workshop, HPDC conference, June 25, 2007, Monterey Bay. Outline. Wisdom introduction Biological targets Resources used in wisdom

donkor
Download Presentation

Presented by Vinod Kasam

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Grid Enabled High Throughput Virtual Screening Against Four Different Targets Implicated in Malaria Presented by Vinod Kasam CLADE workshop, HPDC conference, June 25, 2007, Monterey Bay

  2. Outline • Wisdom introduction • Biological targets • Resources used in wisdom • Production environment • Results • Issues • Conclusions 25-06-2007, Monterey Bay

  3. Introduction to the disease : malaria • ~300 million people worldwide are affected • 1-1.5 million people die every year • Widely spread • Caused by protozoan parasites of the genus Plasmodium Complex life cycle with multiple stages 25-06-2007, Monterey Bay

  4. WISDOM-II, second large scale docking deployment against malaria Involved in Malaria target Biology partners GST from Plasmodiumfalciparum Parasite detoxification U. of Pretoria, South-Africa DHFR from Plasmodiumvivax Parasite DNA synthesis U. of Los Andes, Venezuela U. of Modena, Italia DHFR from Plasmodium falciparum Parasite DNA synthesis U. of Modena, Italia Tubulin from Plasmodium/plant/mamal Parasite cell replication CEA, Acamba project, France 25-06-2007, Monterey Bay

  5. WISDOM : Wide In Silico Docking On Malaria • Biological goal Proposition of new inhibitors for a family of proteins produced by Plasmodium • Biomedical informatics goalDeployment of in silico virtual docking on the grid • Grid goal Deployment of a CPU consuming application generating large data flows to test the grid operation and services => “data challenge” 25-06-2007, Monterey Bay

  6. High Throughput Virtual Docking Compounds: ZINC- 4,3M Chembridge - 500 000 Millions of chemical compounds available High Throughput Screening 1-10$/compound, several hours Molecular docking (FlexX, Autodock) 20 cents/compound, 1 minute Data challenge on EGEE ~ 3 months on ~2000 computers Hits screening using assays performed on living cells Leads Targets: 3D structures in PDB One homology model Clinical testing Drug 25-06-2007, Monterey Bay

  7. Objective of the WISDOM development • Objective • Dock a whole compound database in a limited time with a minimal human involvement during the data challenge. • Need an optimized environment • Production in Limited time • Performance are important • Need a fault tolerant environment • Stress usage of the grid during the DC • Grid is heterogeneous and dynamic • Data produced are important and can’t be easily reproduced • Need an automatic production environment • Grid API are not fully adapted for a bulk use at a large scale • Ease the execution • User-friendly hi-level services 25-06-2007, Monterey Bay

  8. Use of a production system • Managing thousands of jobs and files is a manually labor-intensive task • Job preparation, submission and monitoring, output retrieval, failure identification and resolution, job resubmission… • In order to efficiently use the resources • The amount of transferred data impacts on grid performance • The data must be installed on the grid • The database is stored into subsets • Grid process introduces significant delays • The submitted jobs must be sufficiently long in order to reduce the impact of this middleware overhead • The production system will provide automated and fault-tolerant jobs and files management 25-06-2007, Monterey Bay

  9. Grid added value for international collaboration on neglected and emerging diseases • Grids offer unprecedented opportunities for sharing information and resources world wide • Grids are unique tools for : • Collecting and sharing information (Epidemiology, Genomics) • Networking experts • Mobilizing resources routinely or in emergency (vaccine & drug discovery) 25-06-2007, Monterey Bay

  10. Grid added value of EGEE for a large scale in silico experimentation • Large computing and storage resources • 24 hours a day availability of resources, user support • Workload Management Service • Information and Monitoring Services • Data Management Services • Security • Reliability of services 25-06-2007, Monterey Bay

  11. Storage Element Computing Element Computing Element Simplified grid workflow • FlexX license server : • 6000 floating licenses offered by BioSolveIT to SCAI • Maximum number of concurrent used licenses was 5000 Results Compounds list Site1 Parameter settings Target structures Compounds sub lists Statistics Resource Broker User interface Site2 Compounds database Storage Element Software Results 25-06-2007, Monterey Bay

  12. Schema of the current WISDOM production environment User Interface User Interface CEs &WNs SEs Submits the jobs CEs &WNs SEs D M S / G F T P WMS WMS WISDOM production system FlexX job FlexX Checks job status Resubmits Statistics Structure file FLEXlm FlexLM Compounds file Statistics license license Output file Local server HealthGrid Server Web Site inputs Web Site WISDOM DB outputs 25-06-2007, Monterey Bay

  13. Grid infrastructures and projects contributing to WISDOM-II EMBRACE BioinfoGrid SHARE EGEE Auvergrid EUMedGrid EUChinaGrid TWGrid EELA : European grid project : European grid infrastructure : Regional/national grid infrastructure 25-06-2007, Monterey Bay

  14. Instances on different infrastructures Instances deployed on the different infrastructures during the WISDOM-II data challenge 25-06-2007, Monterey Bay

  15. Deployment on different infrastrucures • Up to 5000 computers in more than17 countries mobilized from october 2006 – Jan 2007 to provide CPU • 1.738 TB of data produced Distribution of jobs 25-06-2007, Monterey Bay

  16. Statistics of deployment • First DC: • 80 CPU years • 1 TB • 1700 CPUs used in parallel • July 1st - August 15th 2005 • 2nd DC • 100 CPU years • 800 GB • 1700 CPUs used used in parallel • May 1st -April 15th 2006 • 3rd DC • 413 CPU years • 1.7 TB • Up to 5000 CPUs in parallel • 1st October 2006 - 31 January 2007 The crunching factor is the ratio of the total CPU time over the duration of the experiment. It represents the average number of CPUs used simultaneously all along the data challenge and is a metric of the parallelization gain. 25-06-2007, Monterey Bay

  17. Biological results The repartition of docking energies of the ZINC database against GST A structure. (The red column represents a score of -24kj/Mol, the docking score of a co-crystallized ligand (GTX) of GST A chain) 25-06-2007, Monterey Bay

  18. Issues • Scheduling efficiency of the grid is still a major issue • The resource broker is still the main bottleneck • This deployment also shows that it is not possible to do a naive blacklisting of the failing resources, for the simple fact that virtually all the grid resources have produced aborted jobs, and this blacklisting should also take care of the site scheduled downtimes. • Store and treat the data in a relational database 25-06-2007, Monterey Bay

  19. Interactive Web Portal • User Friendly Interface for biologists • Real Time output of the results • 3D views of the docking poses and structures • Resubmission of docking jobs 25-06-2007, Monterey Bay

  20. Conclusion • Take advantage of the EGEE services, APIs and resources. • Demonstrated the relevance of computational grids in life science applications • Manual intervention is reduced (automatic resubmission of jobs) • Use of AMGA to store results and statistics immediately. • Interoperable Web Service Interface WSDL following the WS-I profile • Improved flexibility to deploy other bioinformatics applications. 25-06-2007, Monterey Bay

  21. The next steps • To address the issue of resource brokers, we are trying to submit the jobs by bypassing resource brokers • Docking step still requires a lot of manual intervention • Task: improve output data collection and post-docking analysis • The next step after docking is Molecular Dynamics • Task: deploy Molecular Dynamics computations on grid infrastructures (successfully deployed already on one target, plasmepsin) • Contribution from CNRS-IN2P3, within the framework of BioinfoGRID, Fraunhofer SCAI and University of Modena • Beyond virtual screening, the long term vision: building a grid for malaria • To provide services to research labs working on malaria • To collect and analyze epidemiological data 25-06-2007, Monterey Bay

  22. Long term vision: a grid for malaria LPC Clermont-Ferrand: Biomedical grid SCAI Fraunhofer: Knowledge extraction, Chemoinformatics CEA, Acamba project: Biological targets, Chemogenomics Univ. Modena: Biological targets, Molecular Dynamics HealthGrid: Biomedical grid, Dissemination ITB CNR: Bioinformatics, Molecular modelling Academica Sinica: Grid user interface Univ. Los Andes: Biological targets, Malaria biology Univ. Pretoria: Bioinformatics, Malaria biology Use the grid technology to foster research and development on malaria and other neglected diseases Contacts also established with WHO, Microsoft, TATRC, Argonne, SDSC, SERONO, NOVARTIS, Sanofi-Aventis, Hospitals in subsaharian Africa, 25-06-2007, Monterey Bay

  23. Acknowledments Auvergrid Accamba BioInfoGRID EGEE EMBRACE EUChinaGRID EUMedGRID SHARE TWGrid Conseil Regional d’Auvergne European Union Academia Sinica BioSolveIT CNR-ITB CNRS CEA Healthgrid IN2P3 LPC SCAI Fraunhofer Università di Modena e Reggio Emilia Université Blaise Pascal University of Pretoria University of Los Andes wisdom.healthgrid.org 25-06-2007, Monterey Bay

More Related