
User communities and applications



Presentation Transcript


  1. User communities and applications David Fergusson 28th February

  2. Enabling Grids for EsciencE • What is the EGEE community? • Researchers in eScience (applications NA4) • eResearch • European community • World grid community • Industry (industry forum) • What is not the EGEE community?

  3. eScience/eResearch • EGEE’s initial focus is on specific scientific communities • High Energy Physics (Large Hadron Collider) • Biomedical • Geology • Chemistry • Astrophysics • Collaborating with other EU projects in other areas • For example, digital libraries - DILIGENT

  4. Applications in EGEE • Production service supporting multiple VOs with different requirements • Data • Volume • Location – distributed? • Write once or update? • Metadata archives? • Controlled or open access? • Computation • High throughput (~ current LCG) • High performance, supercomputing • No. of sites, scientists,… • Establish a viable general process to bring other scientific communities on board

  5. An EGEE community • EGEE communities are based around the idea of Virtual Organisations. • A Virtual Organisation: • Owns shared computing resources • Authorises and authenticates its members' access to resources • Manages its own resources
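As a rough illustration of the Virtual Organisation idea above, the sketch below models a VO as a set of member identities plus a pool of resources it manages, with a check that grants access only to members and only to that VO's resources. Everything here is hypothetical: the class and method names, the certificate-style identity strings and the host name are made up, and real EGEE VOs relied on grid certificates and membership services that this toy does not model (it treats "is a member" as already authenticated).

```python
from dataclasses import dataclass, field


@dataclass
class VirtualOrganisation:
    """Toy model of a VO: its members and the resources it manages (hypothetical)."""
    name: str
    members: set[str] = field(default_factory=set)    # identities already authenticated
    resources: set[str] = field(default_factory=set)  # resources managed by this VO

    def add_member(self, identity: str) -> None:
        self.members.add(identity)

    def add_resource(self, resource: str) -> None:
        self.resources.add(resource)

    def authorise(self, identity: str, resource: str) -> bool:
        # Access only for members of this VO, and only to resources this VO manages.
        return identity in self.members and resource in self.resources


biomed = VirtualOrganisation("biomed")
biomed.add_member("/O=Grid/CN=Alice")       # made-up identity string
biomed.add_resource("ce01.example.org")     # made-up resource name
print(biomed.authorise("/O=Grid/CN=Alice", "ce01.example.org"))  # True
print(biomed.authorise("/O=Grid/CN=Bob", "ce01.example.org"))    # False
```

Because a resource can be registered with several `VirtualOrganisation` objects, the same toy also reflects the point that many resources are shared between VOs.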

  6. EGEE: adding a VO • EGEE has a formal procedure for adding selected new user communities (Virtual Organisations): • Negotiation with one of the Regional Operations Centres • Seek a balance between the resources contributed by a VO and those it consumes • Resource allocation will be made at the VO level • Many resources need to be available to multiple VOs: shared use of resources is fundamental to a Grid

  7. The role of the pilot applications – HEP and Biomedicine • Initial area of focus to establish a strong user base on which to build a broad EGEE user community • Provide early feedback to the infrastructure activities on their experience with application deployment and VO management • Act as guinea pigs and provide early feedback to the middleware developers on their experience with new services

  8. EGEE pilot application: Large Hadron Collider • Data Challenge: • 10 Petabytes/year of data! • 20 million CDs each year! • Simulation, reconstruction, analysis: • LHC data handling requires computing power equivalent to ~100,000 of today's fastest PC processors! • Operational challenges • Reliable and scalable through a project lifetime of decades • (Figure labels: Mont Blanc (4810 m), Downtown Geneva)
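A quick back-of-envelope check of the data-volume figures on this slide. It assumes decimal units (1 PB = 10^15 bytes) and a 365-day year; neither assumption is stated on the slide.

```python
# Back-of-envelope check of the LHC data-volume figures quoted above.
# Assumptions: decimal units (1 PB = 10**15 bytes) and a 365-day year.
PETABYTE = 10**15
SECONDS_PER_YEAR = 365 * 24 * 3600

annual_bytes = 10 * PETABYTE
cds_per_year = 20_000_000

bytes_per_cd = annual_bytes / cds_per_year        # capacity per CD implied by the slide
sustained_rate = annual_bytes / SECONDS_PER_YEAR  # average bytes per second over a year

print(f"Implied CD capacity: {bytes_per_cd / 1e6:.0f} MB")
print(f"Average data rate:   {sustained_rate / 1e6:.0f} MB/s")
```

Under these assumptions the "20 million CDs" figure corresponds to about 500 MB per CD. Spread evenly over the year, 10 PB works out to an average of roughly 317 MB/s.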

  9. The characteristics of pilot HEP applications • Very large scale from project day 1 • Virtual Organizations were already set up at project day 1 • Very centralized: jobs are sent in a very organized way • Multi-grid: data challenges are deployed on several grids • ALICE: LCG, AliEn • ATLAS: LCG, US Grid2003, NorduGrid • CMS: LCG, US Grid2003 • LHCb: LCG, DIRAC

  10. The Large Hadron Collider • (Aerial view labels: CERN, LHC (~9 km), SPS) • http://www.cern.ch

  11. The LHC Experiments

  12. Overview of experiences with LHC data challenges • There was continual evolution throughout 2004, with LCG and the experiments gaining more experience in the development and use of an expanding LCG grid • All experiments had excellent relations with LCG-EIS support, a model for the future support of VOs • Global job efficiencies ranged from 60-80% as experience developed; they must reach 90+% for user analysis, which calls for new middleware developments and tighter operational procedures • Sources of problems and losses: • Site configuration, management and stability • Data Management (especially metadata handling) • Difficulty monitoring running jobs and the causes of failure • D0 in early 2005 showed that one can run with good efficiency with a set of well-controlled sites
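Here is one way to see why raising job efficiency from 60-80% to 90+% matters. The calculation rests on an idealised assumption that the slide does not make: each submission succeeds independently with probability equal to the efficiency, and failed jobs are resubmitted until they succeed. Under that assumption the expected number of submissions per completed job is 1/efficiency (a geometric distribution). The function name is hypothetical.

```python
def expected_submissions(efficiency: float) -> float:
    """Expected submissions per successful job, assuming independent attempts
    that each succeed with probability `efficiency` (geometric distribution)."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return 1 / efficiency


for eff in (0.6, 0.8, 0.9):
    wasted = expected_submissions(eff) - 1  # failed attempts per completed job
    print(f"efficiency {eff:.0%}: {expected_submissions(eff):.2f} submissions, "
          f"{wasted:.2f} wasted per job")
```

At 60% efficiency there are about two failed attempts for every three completed jobs. At 90% it is about one in nine. Failures caused by site configuration are often correlated rather than independent, so treat this as an idealised estimate only.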

  13. EGEE pilot application: BioMedical • BioMedical • Bioinformatics (gene/proteome database distribution) • Medical applications (screening, epidemiology, image database distribution, etc.) • Interactive applications (human supervision or simulation) • Security/privacy constraints • Heterogeneous data formats • Frequent data updates • Complex data sets • Long-term archiving • http://egee-na4.ct.infn.it/biomed/applications.html

  14. The characteristics of biomedical pilot applications • Prototype level at project day 1 • VO was created after the project kicked off • Very decentralized: application developers use the grid at their own pace • Very demanding on services • Compute-intensive applications • Applications requiring large numbers of short jobs • Need for interactivity or guaranteed response time • Resources were focused on the deployment of large-scale applications on LCG-2 • Integration of the Biomed VO was used to identify issues relevant to all VOs to be deployed during the EGEE lifetime • Decentralized usage of the infrastructure highlights different weaknesses from the more centralized HEP data challenges

  15. Status of Biomedical VO • 15 resource centres • 17 CEs (>750 CPUs) • 16 SEs • 4 RBs: CNAF, IFAE, LAPP, UPV • 1 RLS and 1 VO LDAP server: CC-IN2P3 • (Map labels: PADOVA, BARI)

  16. Biomedical VO: production jobs on EGEE

  17. Biomedical applications • 3 batch-oriented applications ported on LCG2 • SiMRI3D: medical image simulation • xmipp_MLRefine: molecular structure analysis • GATE: radiotherapy planning • 3 high throughput applications ported on LCG2 • CDSS: clinical decision support system • GPS@: bioinformatics portal (multiple short jobs) • gPTM3D: radiology images analysis (interactivity) • New applications to join in the near future • Especially in the field of drug discovery

  18. EGEE pilot application: BioMedical • BioMedical • Bioinformatics (gene/proteome database distribution) • Medical applications (screening, epidemiology, image database distribution, etc.) • Interactive applications (human supervision or simulation) • Security/privacy constraints • Heterogeneous data formats • Frequent data updates • Complex data sets • Long-term archiving • BioMed applications deployed • GATE: Geant4 Application for Tomographic Emission • GPS@: genomic web portal • CDSS: Clinical Decision Support System

  19. 12 Biomed applications • GATE: Geant4 Application for Tomographic Emission (LPC) • Docking platform for tropical diseases: grid-enabled docking platform for in silico drug discovery (LPC) • CDSS: Clinical Decision Support System (UPV) • GPS@: Grid genomic web portal (IBCP) • SiMRI 3D: Magnetic Resonance Image simulator (CREATIS) • gPTM 3D: Interactive radiological image visualization and processing tool (LRI) • xmipp_ML_refine: Macromolecular 3D structure analysis (CNB) • xmipp_multiple_CTFs: Electron microscopy images CTF calculation (CNB) • GridGRAMM: Molecular Docking web (CNB) • GROCK: Mass screenings of molecular interactions (CNB) • Mammogrid: Mammograms analysis (EU project) • SPLATCHE: Genome evolution modeling (U. Berne/WHO)

  20. ...and more to come • SPLATCHE • first application being migrated from GILDA to biomed VO • Pharmacokinetics in MRI (UPV) • MRI registration for contrast agent diffusion study • Some progress on biological sequences analysis (M. Lexa) • ...

  21. BLAST – comparing DNA or protein sequences • BLAST is the first step in analysing new sequences: it compares DNA or protein sequences to others stored in personal or public databases. Ideal as a grid application. • Requires resources to store databases and run algorithms • Can compare one or several sequences against a database in parallel • Large user community
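Each query sequence can be compared against the database independently, so a BLAST workload splits naturally into separate grid jobs. The sketch below shows only the partitioning step: it groups query sequences into batches, with one batch per hypothetical grid job. The function name, the batch size and the short example sequences are illustrative assumptions. Submitting the jobs and running BLAST itself are not shown.

```python
from typing import Iterator


def batch_queries(sequences: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive batches of query sequences, one batch per grid job."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(sequences), batch_size):
        yield sequences[start:start + batch_size]


queries = ["MKTAYIAKQR", "GSHMLEDPVD", "MVLSPADKTN", "MHHHHHHSSG", "MSKGEELFTG"]
for job_id, batch in enumerate(batch_queries(queries, batch_size=2)):
    # Each batch would go to a worker node that runs BLAST against
    # a local copy of the sequence database (not shown here).
    print(f"job {job_id}: {len(batch)} sequence(s)")
```

A smaller batch size gives more parallelism but more scheduling overhead per sequence. That trade-off matters for the many-short-jobs pattern described for the biomedical VO.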

  22. Bio-medicine applications • Bio-informatics • Phylogenetics • Search for primers • Statistical genetics • Bio-informatics web portal • Parasitology • Data-mining on DNA chips • Geometrical protein comparison • Medical imaging • MR image simulation • Medical data and metadata management • Mammography analysis • Simulation platform for PET/SPECT • (Diagram: medical image search workflow. 1. Query the medical image database (exam image, patient key, ACL ... metadata) and retrieve a patient image; 2. Compute similarity measures over the database images, submitting 1 job per image; 3. Retrieve the most similar cases (similar images vs. low-score images). Legend: applications deployed, applications tested, applications under preparation.)
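The three-step medical image workflow on this slide can be sketched as follows: retrieve a patient image, score it against every database image, then keep the most similar cases. Several details are assumptions made for illustration: the negative sum-of-squared-differences score, the tiny integer "images" and the function names. On the grid, each `score` call would run as a separate job ("1 job per image") rather than as an iteration of a local loop.

```python
def score(query: list[int], candidate: list[int]) -> int:
    """Similarity as negative sum of squared pixel differences (higher = more similar)."""
    if len(query) != len(candidate):
        raise ValueError("images must have the same number of pixels")
    return -sum((q - c) ** 2 for q, c in zip(query, candidate))


def most_similar(query: list[int], database: dict[str, list[int]], k: int) -> list[str]:
    # Step 2: one similarity computation per database image (one grid job each).
    scores = {image_id: score(query, pixels) for image_id, pixels in database.items()}
    # Step 3: rank and keep the k highest-scoring (most similar) cases.
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Step 1 (assumed done): the patient's exam image has been retrieved from the database.
patient_image = [10, 20, 30, 40]
database = {
    "case_a": [11, 19, 31, 40],
    "case_b": [90, 80, 70, 60],
    "case_c": [10, 22, 29, 41],
}
print(most_similar(patient_image, database, k=2))  # ['case_a', 'case_c']
```

The per-image scores are independent of each other, which is why the slide can submit one job per image and then merge the results in a final ranking step.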

  23. Bio-medicine applications

  24. Bio-medicine applications

  25. Bio-medicine applications

  26. gPTM3D: Grid-Enabling Interactive Medical Analysis • (Diagram labels: Interaction; Acquire, Explore, Analyse, Interpret, Render)

  27. Use case: planning percutaneous nephrolithotomy

  28. Evolution of biomedical applications • Growing interest of the biomedical community • Partners already involved are proposing new applications • New application proposals (in various health-related areas) • Enlargement of the biomedical community (drug discovery) • Growing scale of the applications • Progressive migration from prototypes to pre-production services for some applications • Increase in scale (volume of data and number of CPU hours) • Towards pre-production • Several initiatives to build user-friendly portals and interfaces to existing applications, to open them to a community of end users

  29. A look at the future: the HealthGrid vision • In this context "Health" does not involve only clinical practice but covers the whole range of information, from the molecular level (genetic and proteomic information) through cells and tissues to the individual and finally the population level (social healthcare). • (Diagram labels: HealthGRID; patient-related data; public health; association; modelling; computation; databases; levels: molecule, cell, tissue/organ, patient, public health; computational recommendation; INDIVIDUALISED HEALTHCARE; MOLECULAR MEDICINE)

  30. Earth Sciences in EGEE • Research • Earth observations by satellite (ESA (IT), KNMI (NL), IPSL (FR), UTV (IT), RIVM (NL), SRON (NL)) • Climate: DKRZ (DE), IPSL (FR) • Solid Earth Physics: IPGP (FR) • Hydrology: Neuchâtel University (CH) • Industry • CGG: Geophysics Company (FR)

  31. Climate Applications in EGEE • Model: Atmosphere, Ocean, Hydrology, Atmospheric and Marine chemistry… • Goal: comparison of model outputs from different runs and/or institutes • Large volume of data (TB) from different model outputs, and experimental data • Runs are made on supercomputers => link the EGEE infrastructure with supercomputer Grids (DEISA) • EXAMPLE: For the IPCC Assessment Reports many experiments are performed with different models (different spatial resolution, different time-step, different "physics", ...) at various sites. The generated data need to be compared in a comprehensive and "unified" way.

  32. Geophysics Applications • Generic platform for seismic processing: • Based on Geocluster, an industrial application, as a starting point for the core member VO • Includes several standard tools for signal processing, simulation and inversion • Open: any user can write new algorithms in new modules (shared or not) • Free for academic research • Controlled by license keys (an opportunity to explore licensing issues at grid level) • Initial partners: France, Switzerland, UK, Russia, Norway

  33. (Diagram: flood simulation on a sample, the Vah river. Labels: Geographical Information Systems; Computer vision; Flood simulation; Results: flow + water depths)

  34. Computational Chemistry: molecular simulator (Ar-Benzene) • (Flowchart: SURFACE: construction of the potential energy surface; DYNAMICS: dynamical properties calculation; PROPERTIES: calculation of averaged quantities; Good results? If yes, end; otherwise iterate)

  35. The MAGIC telescope • Largest Imaging Air Cherenkov Telescope (17 m mirror dish) • Located on La Palma, Canary Islands (2200 m a.s.l.) • Lowest energy threshold ever obtained with a Cherenkov telescope • Aim: detect γ-ray sources in the unexplored energy range 30 (10) to 300 GeV

  36. The MAGIC Physics Program • Cosmological γ-Ray Horizon • AGNs • Pulsars • Origin of Cosmic Rays • Tests of Quantum Gravity effects • SNRs • GRBs • Cold Dark Matter

  37. Feedback to LCG-2 middleware developers and infrastructure • From HEP applications • Experiment Integration Support group and Grid Applications Group produced documents summarizing problems encountered in use of LCG-2 • From Biomed applications • Very significant exchanges related to the set-up of the biomed VO and the deployment of relevant services • Request to use MPI

  38. Engineering applications

  39. Engineering applications

  40. Grid Applications: art • Museo Virtual de Artes El Pais (MUVA) http://www3.diarioelpais.com/muva/ • Books are being scanned in at 767 MB per page: 1/2 Terabyte for the Gutenberg Bible • Paintings are being scanned in at 30 GB each in the EU CRISATEL Project

  41. Who else can benefit from EGEE? • EGEE Generic Applications Advisory Panel: • For new applications • EU projects: MammoGrid, Diligent, SEE-GRID … • Expression of interest: Planck/Gaia (astroparticle), SimDat (drug discovery) • http://agenda.cern.ch/age?a042351 • Next meeting at EGEE conference (November)

  42. New communities identification • Through training, dissemination and outreach, communities that already use advanced computing and are keen to use the EGEE infrastructure are identified • These communities are encouraged to prepare a document describing their interest in using EGEE • A scientific advisory panel (EGAAP) assesses the interested communities and selects those that seem most mature to deploy their applications on EGEE

  43. GILDA, an infrastructure for dissemination and demonstration • Goals • Demonstration of grid operation for tutorials and outreach • Initial deployment of new applications for testing purposes • Key features • Initiative of the INFN Grid Project using LCG-2 middleware • On request, anyone can quickly receive a grid certificate and a VO membership allowing them to use the infrastructure for 2 weeks • Certificate expires after two weeks but can be renewed • Use of a friendly interface: the GENIUS grid portal • Very important for the first steps of new user communities onto the grid infrastructure

  44. GILDA numbers • 14 sites on 2 continents • >1200 certificates issued, 10% renewed at least once • >35 tutorials and demos performed in 10 months • >25 jobs/day on average • Job success rate above 96% • >320,000 hits on the web site from tens of different countries • >200 copies of the UI live CD distributed worldwide

  45. NA4 Applications and GILDA • 7 Virtual Organizations supported: • Biomed • Earth Science Academy (ESR) • Earth Science Industry (CGG) • Astroparticle Physics (MAGIC) • Computational Chemistry (GEMS) • Grid Search Engines (GRACE) • Astrophysics (PLANCK) • Development of complete interfaces with GENIUS for 3 Biomed Applications: GATE, hadronTherapy, and Friction/Arlecore • Development of complete interfaces with GENIUS for 4 Generic Applications: EGEODE (CGG), MAGIC, GEMS, and CODESA-3D (ESR) (see demos!) • Development of complete interfaces with GENIUS for 16 demonstrative applications available on the GILDA Grid Demonstrator (https://grid-demo.ct.infn.it)

  46. Summary • EGEE and grids – not just physics • For communities to benefit they need to know what grids can do for them – dissemination • Many communities are beginning to adopt the grid • EGEE has a mechanism for assisting communities onto the grid

  47. Practical URLs • homepages.nesc.ac.uk/~gcw • grid-demo.ct.infn.it
