470 likes | 480 Views
Learn about the EGEE community, its focus on specific scientific communities, collaboration with other EU projects, and the role of pilot applications in high energy physics and biomedical research.
E N D
User communities and applications David Fergusson 28th February
Enabling Grids for EsciencE • What is the EGEE community? • Researchers in eScience (applications NA4) • eResearch • European community • World grid community • Industry (industry forum) • What is not the EGEE community?
eScience/eResearch • EGEE’s initial focus is on specific scientific communities • High Energy Physics (Large Hadron Collider) • Biomedical • Geology • Chemistry • Astrophysics • Collaborating with other EU projects in other areas • For example, digital libraries - DILIGENT
Applications in EGEE • Production service supporting multiple VOswith different requirements • Data • Volume • Location – distributed? • Write Once or Update? • Metadata archives? • Controlled or open access? • Computation • High throughput (~ current LCG) • High performance, supercomputing • No. of sites, scientists,… • Establish viable general process to bring other scientific communities on board
An EGEE community • EGEE communities are based around the idea of Virtual Organisations. • A Virtual Organisation: • Owns shared computing resources • Authorises and authenticates its members access to resources • Manages its own resources
EGEE: adding a VO EGEE has a formal procedure for adding selected new user communities (Virtual Organisations): • Negotiation with one of the Regional Operations Centres • Seek balance between the resources contributed by a VO and those that they consume. • Resource allocation will be made at the VO level. • Many resources need to be available to multiple VOs : shared use of resources is fundamental to a Grid
The role of the pilot applications – HEP and Biomedicine • Initial area of focus to establish a strong user base on which to build a broad EGEE user community • Provide early feedback to the infrastructure activities on their experience with application deployment and VO management • Act as guinea pigs and provide early feedback to the middleware developers on their experience with new services
EGEE pilot application: Large Hadron Collider • Data Challenge: • 10Petabytes/year of data !!! • 20 million CDs each year! • Simulation, reconstruction, analysis: • LHC data handling requires computing power equivalent to ~100,000 of today's fastest PC processors! • Operational challenges • Reliable and scalable through project lifetime of decades Mont Blanc (4810 m) Downtown Geneva
The characteristics of pilot HEP applications • Very large scale from project day 1 • Virtual Organizations were already set up at project day 1 • Very centralized: jobs are sent in a very organized way • Multi-grid: data challenges are deployed on several grids • ALICE LCG, Alien • ATLAS LCG, US Grid2003, Nordugrid • CMS LCG, US Grid2003 • LHCb LCG, Dirac
http://www.cern.ch LHC ~9 km SPS CERN The Large Hadron Collider
Overview of experiences with LHC data challenges • There was continual evolution throughout 2004, with LCG and experiments gaining more experience in the development and use of an expanding LCG grid • All experiments had excellent relations with LCG-EIS support – a model for the future support of VOs • Global job efficiencies ranged from 60-80% as experience developed – must get up to 90+% for user analysis - look to new middleware developments and tighter operational procedures • Sources of problems and losses • Site configuration, management and stability • Data Management (especially metadata handling) • Difficult to monitor job running and causes of failure • D0 in early 2005 showed that one can run with good efficiency with a set of well controlled sites
EGEE pilot application: BioMedical • BioMedical • Bioinformatics (gene/proteome databases distributions) • Medical applications (screening, epidemiology, image databases distribution, etc.) • Interactive application (human supervision or simulation) • Security/privacy constraints • Heterogeneous data formats - Frequent data updates - Complex data sets - Long term archiving • http://egee-na4.ct.infn.it/biomed/applications.html
The characteristics of biomedical pilot applications • Prototype level at project day 1 • VO was created after the project kicked-off • Very decentralized: application developers use the grid at their own pace • Very demanding on services • Compute intensive applications • Applications requiring large amounts of short jobs • Need for interactivity or guaranteed response time • Resources were focused on the deployment of large scale applications on LCG-2 • Integration of Biomed VO used to identify issues relevant to all VOs to be deployed during EGEE lifetime • Decentralized usage of the infrastructure highlights different weaknesses from the more centralized HEP data challenges
RLS, VO LDAP Server: CC-IN2P3 4 RBs: CNAF, IFAE, LAPP, UPV • 15 resource centres ( ) • 17 CEs (>750 CPUs) • 16 SEs 4 RBs 1 RLS 1 LDAP Server Status of Biomedical VO PADOVA BARI
Biomedical applications • 3 batch-oriented applications ported on LCG2 • SiMRI3D: medical image simulation • xmipp_MLRefine: molecular structure analysis • GATE: radiotherapy planning • 3 high throughput applications ported on LCG2 • CDSS: clinical decision support system • GPS@: bioinformatics portal (multiple short jobs) • gPTM3D: radiology images analysis (interactivity) • New applications to join in the near future • Especially in the field of drug discovery
EGEE pilot application: BioMedical • BioMedical • Bioinformatics (gene/proteome databases distributions) • Medical applications (screening, epidemiology, image databases distribution, etc.) • Interactive application (human supervision or simulation) • Security/privacy constraints • Heterogeneous data formats - Frequent data updates - Complex data sets - Long term archiving • BioMed applications deployed • GATE - Geant4 Application for Tomographic Emission • GPS@ - genomic web portal • CDSS - Clinical Decision Support System
12 Biomed applications • GATE: Geant4 Application for Tomographic Emission (LPC) • Docking platform for tropical diseases: grid-enabled docking platform for in sillico drug discovery (LPC) • CDSS: Clinical Decision Support System (UPV) • GPS@: Grid genomic web portal (IBCP) • SiMRI 3D: Magnetic Resonance Image simulator (CREATIS) • gPTM 3D: Interactive radiological image visualization and processing tool (LRI) • xmipp_ML_refine: Macromolecular 3D structure analysis (CNB) • xmipp_multiple_CTFs : Electronmicroscopic images CTF calculation (CNB) • GridGRAMM: Molecular Docking web (CNB) • GROCK: Mass screenings of molecular interaction (CNB • Mammogrid: Mammograms analysis (EU project) • SPLATCHE: Genome evolution modeling (U. Berne/WHO)
...and more to come • SPLATCHE • first application being migrated from GILDA to biomed VO • Pharmacokinetics in MRI (UPV) • MRI registration for contrast agent diffusion study • Some progress on biological sequences analysis (M. Lexa) • ...
BLAST – comparing DNA or protein sequences • BLAST is the first step for analysing new sequences: to compare DNA or protein sequences to other ones stored in personal or public databases. Ideal as a grid application. • Requires resources to store databases and run algorithms • Can compare one or several sequence against a database in parallel • Large user community
1. Query the medical image database and retrieve a patient image Exam image patient key ACL ... Medical images Metadata 2. Compute similarity measures over the database images Submit 1 job per image 3. Retrieve most similar cases Applications deployed Applications tested Applications under preparation Similar images Low score images Bio-medicine applications • Bio-informatics • Phylogenetics • Search for primers • Statistical genetics • Bio-informatics web portal • Parasitology • Data-mining on DNA chips • Geometrical protein comparison • Medical imaging • MR image simulation • Medical data and metadata management • Mammographies analysis • Simulation platform for PET/SPECT
gPTM3D : Grid-Enabling Interactive Medical Analysis Interaction Acquire Explore Analyse Interpret Render
Use case Planning percutaneous nephrolithotomy
Evolution of biomedical applications • Growing interest of the biomedical community • Partners involved proposing new applications • New application proposals (in various health-related areas) • Enlargement of the biomedical community (drug discovery) • Growing scale of the applications • Progressive migration from prototypes to pre-production services for some applications • Increase in scale (volume of data and number of CPU hours) • Towards pre-production • Several initiatives to build user-friendly portals and interfaces to existing applications in order to open to an end-users community
A look at the future: the HealthGrid vision In this context "Health" does not involve only clinical practice but covers the whole range of information from molecular level (genetic and proteomic information) over cells and tissues, to the individual and finally the population level (social healthcare). HealthGRID Patient related data PublicHealth Association Modelling Computation Databases Public Health Patient Patient Tissue, organ Tissue, organ Cell Cell Molecule Molecule Computational recommendation INDIVIDUALISED HEALTHCARE MOLECULAR MEDICINE
Earth Sciences in EGEE • Research • Earth observations by satellite • (ESA(IT), KNMI(NL), IPSL(FR), UTV(IT), RIVM(NL),SRON(NL)) • Climate : • DKRZ(GE),IPSL(FR) • Solid Earth Physics: • IPGP (FR) • Hydrology: • Neuchâtel University (CH) • Industry • CGG : Geophysics Company (FR)
Climate Applications in EGEE • Model: Atmosphere, Ocean, Hydrology, Atmospheric and Marine chemistry…. • Goal:Comparison of model outputs from different runs and/or institutes • Large volume of data (TB) from different model outputs, and experimental data • Run made on supercomputer => Link the EGEE infrastruture with supercomputer Grids (DEISA) EXAMPLE: For the IPCC Assessment reports many experiment are performed with different models (different spatial resolution, differenttime-step,different "physics" ..) and varioussites. The generated data need to be comparedin a comprehensive and "unified" way.
Geophysics Applications Seismic processing Generic Platform: - Based on Geocluster, an industrial application – to be a starter of the core member VO. - Include several standard tools for signal processing, simulation and inversion. • - Opened: any user can write new algorithms in new modules (shared or not) • - Free for academic research • Controlled by license keys (opportunity to explore license issue at a grid level) • initial partners F, CH, UK, Russia, Norway
Sample Vah river Geographical Information Systems Results: flow + water depths Computer vision Flood simulation
SURFACE Construction of the Potential Energy Surface PROPERTIES DYNAMICS Dynamical properties Calculation Calculation of Averaged quantities Good Results? no yes end Computational Chemistry: molecular simulator Ar - Benzene
The MAGIC telescope • Largest Imaging Air Cherenkov Telescope(17 m mirror dish) • Located on Canary Island La Palma (@ 2200 m asl) • Lowestenergy threshold ever obtained with a Cherenkov telescope • Aim: detect –ray sources in the unexplored energy range: 30 (10)-> 300 GeV
The MAGIC Physics Program • Cosmological g-Ray Horizon • AGNs • Pulsars • Origin of Cosmic Rays • Tests of Quantum Gravity effects • SNRs • GRBs • Cold Dark Matter
Feedback to LCG-2 middleware developers and infrastructure • From HEP applications • Experiment Integration Support group and Grid Applications Group produced documents summarizing problems encountered in use of LCG-2 • From Biomed applications • Very significant exchanges related to the set-up of the biomed VO and the deployment of relevant services • Request to use MPI
Museo Virtual de Artes El Pais (MUVA) http://www3.diarioelpais.com/muva/. Books are being scanned in at 767 MB per page1/2 Terabyte for Gutenberg Bible Paintings are being scanned in at 30 GB eachin the EU CRISATEL Project Grid Applications: art
Who else can benefit from EGEE? • EGEE Generic Applications Advisory Panel: • For new applications • EU projects: MammoGrid, Diligent, SEE-GRID … • Expression of interest: Planck/Gaia (astroparticle), SimDat (drug discovery) http://agenda.cern.ch/age?a042351 Next meeting at EGEE conference (November)
New communities identification • Through training, dissemination and outreach, communities already using advanced computing and keen to use EGEE infrastructure are identified • These communities are encouraged to prepare a document describing their interest to use EGEE • A scientific advisory panel (EGAAP) assesses and chooses among the interested communities the ones which seem the most mature to deploy their applications on EGEE
GILDA, an infrastructure for dissemination and demonstration • Goals • Demonstration of grid operation for tutorials and outreach • Initial deployment of new applications for testing purposes • Key features • Initiative of the INFN Grid Project using LCG-2 middleware • On request, anyone can quickly receive a grid certificate and a VO membership allowing them to use the infrastructure for 2 weeks • Certificate expires after two weeks but can be renewed • Use of friendly interface: Genius grid portal • Very important for the first steps of new user communities on to the grid infrastructure
GILDA numbers • 14 sites in 2 continents • >1200 certificates issued, 10% renewed at least once • >35 tutorials and demos performed in 10 months • >25 jobs/day on the average • Job success rate above 96% • >320,000 hits on the web site from 10’s of different countries • >200 copies of the UI live CD distributed in the world
NA4 Applications and GILDA • 7 Virtual Organizations supported: • Biomed • Earth Science Academy (ESR) • Earth Science Industry (CGG) • Astroparticle Physics (MAGIC) • Computational Chemistry (GEMS) • Grid Search Engines (GRACE) • Astrophysics (PLANCK) • Development of complete interfaces with GENIUS for 3 Biomed Applications: GATE, hadronTherapy, and Friction/Arlecore • Development of complete interfaces with GENIUS for 4 Generic Applications: EGEODE (CGG), MAGIC, GEMS, and CODESA-3D (ESR) (see demos!) • Development of complete interfaces with GENIUS for 16 demonstrative applications available on the GILDA Grid Demonstrator (https://grid-demo.ct.infn.it)
Summary • EGEE and grids – not just physics • For communities to benefit they need to know what grids can do for them – dissemination • Many communities are beginning to adopt the grid • EGEE has a mechanism for assisting communities onto the grid
Practical URLs • homepages.nesc.ac.uk/~gcw • grid-demo.ct.infn.it