
Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu


Presentation Transcript


  1. Life sciences applications on the EGEE Grid
     Gergely Sipos, MTA SZTAKI Laboratory of Parallel and Distributed Systems, www.lpds.sztaki.hu, sipos@sztaki.hu

  2. The EGEE Project
     Aim of EGEE: “to establish a seamless European Grid infrastructure for the support of the European Research Area (ERA)”
     • EGEE: 1 April 2004 – 31 March 2006. 71 partners in 27 countries, federated in regional Grids
     • EGEE-II: 1 April 2006 – 30 April 2008. Expanded consortium
     • EGEE-III: 1 May 2008 – 30 April 2010. Transition to sustainable model

  3. Life sciences cluster in EGEE
     Life sciences is one of the strategic communities for EGEE.
     • Aims of the life sciences cluster in EGEE:
       • To increase the impact of EGEE on this community
       • To drive the development of the EGEE services
       • To develop domain-specific, high-level services
     • Main topics:
       • Drug discovery
       • Medical imaging
       • Bioinformatics

  4. Enabling Grids for E-sciencE: Biomed Virtual Organization
     Size of the infrastructure today:
     • > 250 sites in 48 countries
     • > 68,000 CPU cores
     • ~ 20 PB disk + tape MSS
     • > 150,000 jobs/day
     • > 9,000 registered users
     Out of which, Biomed VO:
     • > 100 sites in 30 countries
     • ~ 17,000 CPUs
     • > 150 registered users

  5. Life sciences applications
     [Figure: layered architecture]
     • Applications level: Applications, Domain-specific services
     • Production grid infrastructure level: EGEE middleware services, Communication layer (GEANT, Internet...), Resources

  6. Application example 1: WISDOM
     [Figure: layered architecture]
     • Applications level: WISDOM, AMGA metadata catalog, DIANE grid job scheduler, GAP user interface module
     • Production grid infrastructure level: Biomed Virtual Organization, EGEE middleware services, Communication layer (GEANT, Internet...), Resources

  7. WISDOM: In silico Drug Discovery
     • WISDOM: http://wisdom.healthgrid.org/
     • Goal: find new drugs for neglected and emerging diseases
       • Neglected diseases lack R&D
       • Emerging diseases require very rapid response time
     • Need for an optimized environment
       • To achieve production in a limited time
       • To optimize performance
     • Method: grid-enabled virtual docking
       • Cheaper than in vitro tests
       • Faster than in vitro tests

  8. High throughput virtual docking
     [Figure: drug discovery pipeline]
     • Millions of chemical compounds available in laboratories
     • High Throughput Screening: 1-10 $/compound, nearly impossible
     • Chemical compounds: ZINC; Chembridge – 500,000; Drug like – 500,000
     • Target structures: PDB
     • Targets (enzymes): Plasmepsin II (1lee, 1lf2, 1lf3), Plasmepsin IV (1ls5)
     • Molecular docking (FlexX, Autodock): ~80 CPU years, 1 TB data
     • Grid infrastructure: EGEE
     • Computational data challenge: ~6 weeks on ~1000/1600 computers
     • Pipeline: Hits → screening using assays performed on living cells → Leads → Clinical testing → Drug

  9. Computing model & workflow
     • Simulation jobs run on the EGEE Grid
     • Simulation results stored on the EGEE Grid

  10. Efficiency
      • Second data challenge for avian flu drug analysis
      • 8 targets against 300,000 compounds (2,400,000 simulations)
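The simulation count on this slide is the product of targets and compounds: every compound is docked against every target. A minimal sketch of that enumeration, assuming one simulation per (target, compound) pair; the identifiers and the batch size of 1,000 dockings per grid job are hypothetical and do not come from the slides:

```python
from itertools import islice, product


def docking_tasks(targets, compounds):
    """Yield one (target, compound) pair per docking simulation."""
    return product(targets, compounds)


def batches(tasks, size):
    """Group an iterable of tasks into lists of at most `size` items."""
    it = iter(tasks)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Hypothetical identifiers: the slide gives only the counts (8 and 300,000).
targets = [f"target_{i}" for i in range(8)]
compounds = range(300_000)

n_simulations = len(targets) * len(compounds)
print(n_simulations)  # 2400000

# Hypothetical batch size: 1,000 dockings bundled into one grid job.
n_jobs = sum(1 for _ in batches(docking_tasks(targets, compounds), 1_000))
print(n_jobs)  # 2400
```

Bundling many short dockings into one job is only one possible design; the actual job granularity used in the data challenge is not stated here.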

  11. Statistics of deployment
      • First Data Challenge: July 1st - August 15th 2005
        • Target: malaria
        • 80 CPU years
        • 1 TB of data produced
        • 1700 CPUs used in parallel
        • First large-scale docking on a world-wide e-infrastructure
      • Second Data Challenge: April 15th - June 30th 2006
        • Target: avian flu
        • 100 CPU years
        • 800 GB of data produced
        • 1700 CPUs used in parallel
        • Infrastructure was configured in 45 days
      • Third Data Challenge: October 1st - December 15th 2006
        • Target: malaria
        • 400 CPU years
        • 1.6 TB of data produced
        • Up to 5000 CPUs used in parallel
        • Very high docking throughput: > 100,000 compounds per hour
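The figures above can be cross-checked with a back-of-the-envelope calculation: dividing the CPU time by the number of CPUs used in parallel gives the shortest possible wall-clock time, which can be compared with the length of each challenge window. This is a rough sketch only. It assumes the quoted CPU counts were available for the whole run and that a CPU year is 365.25 days, neither of which the slides state.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # assumption: length of one CPU year in days

# (CPU years, CPUs in parallel, start, end) as quoted on the slide.
challenges = {
    "First (malaria)": (80, 1700, date(2005, 7, 1), date(2005, 8, 15)),
    "Second (avian flu)": (100, 1700, date(2006, 4, 15), date(2006, 6, 30)),
    "Third (malaria)": (400, 5000, date(2006, 10, 1), date(2006, 12, 15)),
}


def ideal_days(cpu_years, cpus):
    """Wall-clock days if all CPUs were busy for the whole run."""
    return cpu_years * DAYS_PER_YEAR / cpus


for name, (cpu_years, cpus, start, end) in challenges.items():
    window = (end - start).days
    ideal = ideal_days(cpu_years, cpus)
    print(f"{name}: window {window} days, ideal {ideal:.1f} days")
```

In each case the ideal time is well below the calendar window, which would be consistent with the quoted CPU counts being peaks rather than sustained averages.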

  12. Application example 2: Bronze standard
      [Figure: layered architecture]
      • Applications level: Bronze standard workflow, MOTEUR workflow manager
      • Production grid infrastructure level: Biomed Virtual Organization, EGEE middleware services, Communication layer (GEANT, Internet...), Resources

  13. Scientific challenge
      • Medical image registration is the process by which two images acquired independently are registered into a common frame.
      • [Figure: O1 and O2 related by transformation T, shown Unregistered and Registered]
      • Registration accuracy is critical for many image analysis procedures
      • Bronze Standard is a statistical procedure to estimate the performance of registration algorithms
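One way to picture a statistical accuracy estimate of this kind: when no ground-truth transformation is known, the results of several registration methods on the same image pair can be averaged into a consensus, and each method is then scored by its distance from that consensus. The sketch below is an illustration under strong assumptions, not the Bronze Standard procedure itself. It handles translations only, uses a single image pair, and the method names and numbers are made up.

```python
from math import dist
from statistics import mean


def consensus(estimates):
    """Per-axis mean of all translation estimates (the reference)."""
    return tuple(mean(axis) for axis in zip(*estimates.values()))


def errors(estimates):
    """Euclidean distance of each method's estimate from the consensus."""
    ref = consensus(estimates)
    return {name: dist(t, ref) for name, t in estimates.items()}


# Made-up translation estimates (x, y, z) in mm for one image pair.
# Method names are placeholders, not the algorithms used in the talk.
estimates = {
    "method_a": (10.0, -3.0, 0.5),
    "method_b": (10.4, -2.8, 0.3),
    "method_c": (9.6, -3.2, 0.7),
}

for name, err in errors(estimates).items():
    print(f"{name}: {err:.2f} mm from consensus")
```

A real evaluation would also need to cover rotations and many image pairs, which is where the grid workload comes from.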

  14. Implementation on EGEE
      [Figure: Bronze standard workflow diagram]
      • Inputs and parameters: A, B, Params, MethodToTest
      • Workflow services: GetFromEGEE, FormatConv, CrestLines, PFRegister, PFMatchICP, Yasmina, Baladin, MultiTransfoTest, WriteResults
      • Outputs: Accuracy Translation, Accuracy Rotation
      • Scale: ~100 image pairs, ~800 EGEE jobs

  15. Application example 3: Bioinformatics Grid Portal
      [Figure: layered architecture]
      • Applications level: Bioinformatics Grid Portal
      • Production grid infrastructure level: Biomed Virtual Organization, EGEE middleware services, Communication layer (GEANT, Internet...), Resources

  16. GPSA: Bioinformatics Grid Portal
      • Scientific objectives
        • Protein sequence analysis
        • Analyse data from high-throughput biology: genome projects, structural biology, …
      • Tools
        • Web interface: NPS@
        • Protein databases are stored on grid storage as flat files
          • SWISS-PROT, SP-TrEMBL, NRL_3D, PATTINPROT, …
        • Legacy bioinformatics applications
          • FASTA, BLAST, PSI-BLAST, SSEARCH, …
      • Contact
        • http://npsa-pbil.ibcp.fr/
        • Christophe.Blanchet@ibcp.fr

  17. How to get involved with EGEE
      • More information on EGEE:
        • http://www.eu-egee.org
        • Life Sciences cluster: http://technical.eu-egee.org/index.php?id=258
      • Coordinator of the life sciences cluster:
        • Vincent BRETON (breton@clermont.in2p3.fr)
      • To get your own application ported to EGEE:
        • Support team: http://www.lpds.sztaki.hu/gasuc
      • To get access to the Biomed Virtual Organization:
        • Obtain a certificate from NIIF CA: http://www.ca.niif.hu/
        • Register with the Virtual Organization: https://voms.cnaf.infn.it:8443/voms/bio/webui/request/user/create
        • Access the grid from P-GRADE Portal, Bioinformatics Grid Portal, etc.
      • EGEE User Forum, Catania, Italy, 2-6 March 2009:
        • http://indico.cern.ch/conferenceDisplay.py?confId=40435

  18. www.eu-egee.org, www.lpds.sztaki.hu
      Gergely Sipos, sipos@sztaki.hu
