Life sciences applications on the EGEE Grid. Gergely Sipos, MTA SZTAKI Laboratory of Parallel and Distributed Systems, www.lpds.sztaki.hu, sipos@sztaki.hu
The EGEE Project Aim of EGEE: “to establish a seamless European Grid infrastructure for the support of the European Research Area (ERA)” • EGEE: 1 April 2004 – 31 March 2006; 71 partners in 27 countries, federated in regional Grids • EGEE-II: 1 April 2006 – 30 April 2008; expanded consortium • EGEE-III: 1 May 2008 – 30 April 2010; transition to sustainable model
Life sciences cluster in EGEE Life sciences is one of the strategic communities for EGEE • Life sciences cluster in EGEE: • To increase the impact of EGEE on this community • To drive the development of the EGEE services • To develop domain specific, high level services • Main topics: • Drug discovery • Medical imaging • Bioinformatics
Enabling Grids for E-sciencE Biomed Virtual Organization Size of the infrastructure today: • > 250 sites in 48 countries • > 68 000 CPU cores • ~ 20 PB disk + tape MSS • > 150 000 jobs/day • > 9000 registered users Of which, Biomed VO: • > 100 sites in 30 countries • ~ 17 000 CPUs • > 150 registered users
Life sciences applications [Figure: layered diagram. Production grid infrastructure level: resources, communication layer (GEANT, Internet...), EGEE middleware services. Applications level: domain-specific services and applications.]
Application example 1: WISDOM [Figure: layered diagram. Production grid infrastructure level: resources, communication layer (GEANT, Internet...), Biomed Virtual Organization, EGEE middleware services. Applications level: WISDOM, AMGA metadata catalog, DIANE grid job scheduler, GAP user interface module.]
WISDOM In silico Drug Discovery • WISDOM: http://wisdom.healthgrid.org/ • Goal: find new drugs for neglected and emerging diseases • Neglected diseases lack R&D • Emerging diseases require a very rapid response • Need for an optimized environment • To reach production in a limited time • To optimize performance • Method: grid-enabled virtual docking • Cheaper than in vitro tests • Faster than in vitro tests
High throughput virtual docking • Chemical compounds: ZINC; Chembridge – 500,000; drug-like – 500,000 • Target structures: PDB; Plasmepsin II (1lee, 1lf2, 1lf3), Plasmepsin IV (1ls5) (enzymes) • Molecular docking: FlexX, Autodock; ~80 CPU years, 1 TB data • Grid infrastructure: EGEE; computational data challenge, ~6 weeks on ~1000/1600 computers • Millions of chemical compounds are available in laboratories; high throughput screening at 1-10$/compound is nearly impossible • [Figure: pipeline from hits, through screening using assays performed on living cells, to leads, clinical testing and drug.]
Computing model & workflow: simulation jobs run on the EGEE Grid; simulation results are stored on the EGEE Grid
Efficiency • Second data challenge for avian flu drug analysis • 8 targets against 300,000 compounds (2,400,000 simulations)
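The simulation count on this slide follows from docking every target against every compound. The sketch below reproduces that arithmetic and shows how such a workload might be cut into grid jobs. The batch size of 500 dockings per job is a hypothetical value chosen only for illustration; the slides do not say how WISDOM sized its jobs.

```python
import math

TARGETS = 8              # from the slide
COMPOUNDS = 300_000      # from the slide
DOCKINGS_PER_JOB = 500   # hypothetical batch size, not a WISDOM figure


def total_simulations(targets: int, compounds: int) -> int:
    """Every target is docked against every compound."""
    return targets * compounds


def jobs_needed(simulations: int, per_job: int) -> int:
    """Number of grid jobs when each job handles at most `per_job` dockings."""
    return math.ceil(simulations / per_job)


simulations = total_simulations(TARGETS, COMPOUNDS)
print(simulations)                                  # 2400000
print(jobs_needed(simulations, DOCKINGS_PER_JOB))   # 4800
```

A smaller batch size gives more, shorter jobs, which tends to spread better over many sites at the cost of more scheduling overhead; the right trade-off depends on the infrastructure.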
Statistics of deployment • First Data Challenge: July 1st - August 15th 2005 • Target: malaria • 80 CPU years • 1 TB of data produced • 1700 CPUs used in parallel • First large-scale docking on a world-wide e-infrastructure • Second Data Challenge: April 15th - June 30th 2006 • Target: avian flu • 100 CPU years • 800 GB of data produced • 1700 CPUs used in parallel • Infrastructure was configured in 45 days • Third Data Challenge: October 1st - December 15th 2006 • Target: malaria • 400 CPU years • 1.6 TB of data produced • Up to 5000 CPUs used in parallel • Very high docking throughput: > 100,000 compounds per hour
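The CPU-year totals and the challenge dates can be combined into a rough average number of CPUs kept busy over each challenge. The sketch below does this under two assumptions that are not stated on the slide: one CPU year means one CPU running for 365 days, and the start and end dates are both inclusive. Under these assumptions the averages come out below the "CPUs used in parallel" figures, which is consistent with reading those figures as peaks.

```python
from datetime import date


def average_cpus(cpu_years: float, start: date, end: date) -> float:
    """Average number of busy CPUs over the period (assumes 365-day CPU years)."""
    days = (end - start).days + 1  # both endpoints counted, an assumption
    return cpu_years * 365.0 / days


# CPU years and dates as given on the slide
challenges = {
    "first (malaria)": (80, date(2005, 7, 1), date(2005, 8, 15)),
    "second (avian flu)": (100, date(2006, 4, 15), date(2006, 6, 30)),
    "third (malaria)": (400, date(2006, 10, 1), date(2006, 12, 15)),
}

for name, (cpu_years, start, end) in challenges.items():
    print(f"{name}: about {average_cpus(cpu_years, start, end):.0f} CPUs on average")
```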
Application example 2: Bronze standard [Figure: layered diagram. Production grid infrastructure level: resources, communication layer (GEANT, Internet...), Biomed Virtual Organization, EGEE middleware services. Applications level: Bronze standard workflow, MOTEUR workflow manager.]
Scientific challenge • Medical image registration is the process by which two independently acquired images are brought into a common frame. [Figure: objects O1 and O2 related by a transformation T, shown unregistered and registered.] • Registration accuracy is critical for many image analysis procedures • Bronze Standard is a statistical procedure to estimate the performance of registration algorithms
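To give a feel for how a statistical reference can stand in for a missing ground truth, here is a deliberately simplified sketch. It assumes translation-only transformations for a single image pair, takes the component-wise mean of all estimates as the reference, and scores each algorithm by its distance to that reference. This is an illustration only, not the Bronze Standard procedure itself; the algorithm names and numbers are made up.

```python
import math
from statistics import fmean


def consensus_translation(estimates):
    """Component-wise mean of the translation vectors from all algorithms."""
    vectors = list(estimates.values())
    return tuple(fmean(axis) for axis in zip(*vectors))


def translation_errors(estimates):
    """Euclidean distance of each algorithm's estimate to the consensus."""
    reference = consensus_translation(estimates)
    return {name: math.dist(vec, reference) for name, vec in estimates.items()}


# Made-up translation estimates (x, y, z) for one image pair
estimates = {
    "algorithm_a": (1.0, 2.0, 3.0),
    "algorithm_b": (1.2, 2.0, 3.0),
    "algorithm_c": (0.8, 2.0, 3.0),
}

for name, error in translation_errors(estimates).items():
    print(f"{name}: {error:.3f}")
```

With more algorithms and more image pairs the consensus becomes a steadier reference, which is one reason a large number of registrations is worth computing.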
Implementation on EGEE • ~100 image pairs • ~800 EGEE jobs [Figure: Bronze standard workflow. Inputs: image pairs A and B. Services: GetFromEGEE, FormatConv, CrestLines, PFMatchICP, PFRegister, Yasmina, Baladin, MethodToTest (each with its Params), WriteResults, MultiTransfoTest. Outputs: Accuracy Translation, Accuracy Rotation.]
Application example 3: Bioinformatics Grid Portal [Figure: layered diagram. Production grid infrastructure level: resources, communication layer (GEANT, Internet...), Biomed Virtual Organization, EGEE middleware services. Applications level: Bioinformatics Grid Portal.]
GPSA: Bioinformatics Grid Portal • Scientific objectives • Protein sequence analysis • Analyse data from high-throughput biology: genome projects, structural biology, … • Tools • Web interface: NPS@ • Protein databases are stored on grid storage as flat files • SWISS-PROT, SP-TrEMBL, NRL_3D, PATTINPROT, … • Legacy bioinformatics applications • FASTA, BLAST, PSI-BLAST, SSEARCH, … • Contact • http://npsa-pbil.ibcp.fr/ • Christophe.Blanchet@ibcp.fr
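Sequence search tools such as FASTA and BLAST commonly read sequences as plain text in FASTA format: a header line starting with ">" followed by one or more lines of residues. The slide does not say which flat-file layout the portal uses for each database, so treat the sketch below as an assumption-laden illustration of reading one such flat file; the sample records are made up.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]             # drop the leading ">"
            chunks = []
        elif header is not None:
            chunks.append(line)           # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Made-up records, for illustration only
sample = """>seq1 hypothetical example protein
MKTAYIAKQR
QISFVKSHFS
>seq2 another made-up entry
GSHMLEDP
""".splitlines()

records = list(read_fasta(sample))
for header, sequence in records:
    print(header, len(sequence))
```

Because the reader is a generator over lines, it can be pointed at an open file handle and will not load a large database into memory at once.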
How to get involved with EGEE • More information on EGEE: • http://www.eu-egee.org • Life Sciences cluster: http://technical.eu-egee.org/index.php?id=258 • Coordinator of life sciences cluster: • Vincent BRETON (breton@clermont.in2p3.fr) • To get your own application ported to EGEE: • Support team: http://www.lpds.sztaki.hu/gasuc • To get access to the Biomed Virtual Organization: • Obtain a certificate from NIIF CA: http://www.ca.niif.hu/ • Register with the Virtual Organization: https://voms.cnaf.infn.it:8443/voms/bio/webui/request/user/create • Access the grid from P-GRADE Portal, Bioinformatics Grid Portal, etc. • EGEE User Forum, Catania, Italy, 2-6 March, 2009: • http://indico.cern.ch/conferenceDisplay.py?confId=40435
www.eu-egee.org www.lpds.sztaki.hu Gergely Sipos sipos@sztaki.hu