EDG Application The European DataGrid Project Team http://www.eu-datagrid.org
EDG Application Areas • High Energy Physics • Earth Observation Science Applications • Biomedical Applications
High Energy Physics • 4 Experiments on LHC (shown: CMS, ATLAS, LHCb) • ~6-8 PetaBytes / year • ~10^8 events/year • ~10^3 batch and interactive users
CERN’s Network in the World • Europe: 267 institutes, 4603 users • Elsewhere: 208 institutes, 1632 users
CMS jobs description • CMKIN: MC generation of the proton-proton interaction for a physics channel (dataset) • CMSIM: Detailed simulation of the CMS detector, processing the data produced during the CMKIN step [Diagram: CMKIN job writes output data to a Grid Storage Element; CMSIM job reads from a Grid Storage Element and writes output data to a Grid Storage Element] * PIII 1GHz 512MB 46.8 SI95
CMS production components interfaced to EDG middleware • Production is managed from the EDG User Interface with IMPALA/BOSS • CMS Virtual Organization server at NIKHEF (Amsterdam) [Diagram labels: CMS; EDG; UI (IMPALA/BOSS); RefDB; BOSS DB; parameters; JDL; Workload Management System; CEs with CMS software; SEs; arrows: push data or info, pull info]
CMS production components interfaced to EDG middleware • CMKIN jobs running on all EDG Testbed sites with CMS software installed • CMSIM jobs running on CE close to the input data • Produced data: scripts for batch replication to a dedicated SE [Diagram labels: CMS; EDG; UI (IMPALA/BOSS); RefDB; BOSS DB; parameters; JDL; Workload Management System; Replica Manager; input data location; data registration; CEs with CMS software; WN; SEs; arrows: push data or info, pull info]
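The placement rule on this slide (CMSIM jobs run on a CE close to the input data) can be illustrated with a minimal sketch. The site names, the `CLOSE_SES` mapping, the `REPLICAS` table, and the helper functions are all hypothetical stand-ins; in the diagram this role belongs to the Workload Management System together with the Replica Manager, not to user code.

```python
# Hypothetical sketch: pick Computing Elements (CEs) whose close Storage
# Element (SE) already holds a replica of the job's input file.

# Assumed mapping from each CE to the SEs considered "close" to it.
CLOSE_SES = {
    "ce.site-a.example": {"se.site-a.example"},
    "ce.site-b.example": {"se.site-b.example"},
    "ce.site-c.example": {"se.site-a.example", "se.site-c.example"},
}

# Assumed replica catalogue: logical file name -> SEs holding a copy.
REPLICAS = {
    "lfn:cmkin_run_001.ntpl": {"se.site-a.example"},
}


def replica_locations(lfn):
    """Return the set of SEs that hold a replica of lfn (empty if unknown)."""
    return REPLICAS.get(lfn, set())


def candidate_ces(lfn, ces_with_software):
    """Return, sorted by name, the CEs that are close to a replica of lfn.

    ces_with_software is the list of CEs where the CMS software is installed.
    """
    holders = replica_locations(lfn)
    return sorted(
        ce for ce in ces_with_software
        if CLOSE_SES.get(ce, set()) & holders
    )


if __name__ == "__main__":
    installed = ["ce.site-a.example", "ce.site-b.example", "ce.site-c.example"]
    print(candidate_ces("lfn:cmkin_run_001.ntpl", installed))
```

A CMKIN job has no input data, so under this sketch any CE with the CMS software would qualify for it; only the CMSIM step needs the replica lookup.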
CMS production components interfaced to EDG middleware • Job monitoring and bookkeeping: BOSS DBs, EDG Logging & Bookkeeping service [Diagram labels: CMS; EDG; UI (IMPALA/BOSS); RefDB; BOSS DB; parameters; JDL; Workload Management System; Replica Manager; input data location; data registration; job output filtering; runtime monitoring; CEs with CMS software; WN; SEs; arrows: push data or info, pull info]
CMS use of the system (Statistics) • Events production within EDG is part of the official CMS production http://cmsdoc.cern.ch/cms/production/www/html/general/index.html [Plot: number of events vs. time; further panels labelled CEs and SEs]
Summary of CMS work and the planning for use of EDG middleware • RESULTS • We can distribute and run CMS software in the EDG environment • We have generated ~250K events for physics with ~10000 jobs in a 3-week period • OBSERVATIONS and PLANNING for the future • We were able to quickly add new sites to provide extra resources • There was a fast turnaround in bug fixing and installing new software • The stress test was labor intensive (since software was developing and th • Release EDG 2.0 should fix the major problems and allow for enhanced scalability, and we look forward to evaluating it and using it in our Data Challenge work
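As a rough reading of the quoted results, the approximate figures (~250K events, ~10000 jobs, 3 weeks) work out to about 25 events per job. The short sketch below does that arithmetic; it assumes a 21-day period and uses the rounded numbers from the slide, so the rates are order-of-magnitude only.

```python
def production_rates(events, jobs, days):
    """Return average events per job, jobs per day and events per day."""
    return {
        "events_per_job": events / jobs,
        "jobs_per_day": jobs / days,
        "events_per_day": events / days,
    }


# Rounded figures from the slide; 3 weeks is taken as 21 days (assumption).
rates = production_rates(events=250_000, jobs=10_000, days=21)

if __name__ == "__main__":
    for name, value in rates.items():
        print(f"{name}: {value:.1f}")
```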
EDG EO challenge: Processing / validation of 1y of GOME data • Raw satellite data from the GOME instrument (~75 GB, ~5000 orbits/y) • LIDAR data (7 stations, 2.5 MB per month) • ESA(IT) – KNMI(NL): Processing of raw GOME data to ozone profiles, 2 alternative algorithms, ~28000 profiles/day (example of 1 day total O3) • IPSL(FR): Validate some of the GOME ozone profiles (~10^6/y) coincident in space and time with ground-based measurements • Visualization & analysis [Diagram labels: Level 1; Level 2; DataGrid environment]
Processing Sequence • 1. Search Level-1 catalogue • 2. Retrieve Level-2 products • 3. Level-2 products already registered in RC? • Yes: 4. Return available Level-2 products • No: 5. Perform GRID processing on-the-fly • 6. Transfer Level-1 data from Archive to the Grid • 7. Register Level-1 data • 8. Submit jobs to process Level-1 data • 9. Process Level-1 data • 10. Transfer Level-2 data to SE • 11. Register Level-2 data • 12. Return new Level-2 products [Diagram labels: Web Portal; EO Product Catalogue; EO Product Archive; EO Grid Engine; EDG User Interface; EO Replica Catalogue; EDG Resource Broker; EDG Storage Element; EDG Computing Element]
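The branch at step 3 (return registered Level-2 products, otherwise process on the fly and register the result) can be sketched as follows. This is a minimal illustration, not the EO Grid Engine: plain dictionaries stand in for the EO Product Catalogue and the Replica Catalogue, and `process_on_grid` is a hypothetical callback covering steps 5 to 10.

```python
def get_level2_products(query, product_catalogue, replica_catalogue, process_on_grid):
    """Return Level-2 products for a query, processing on the Grid if needed.

    product_catalogue: dict mapping a query to a list of Level-1 identifiers.
    replica_catalogue: dict mapping a Level-1 identifier to its Level-2 product.
    process_on_grid:   callable that turns a Level-1 identifier into a
                       Level-2 product (hypothetical stand-in for steps 5-10).
    """
    level1_ids = product_catalogue.get(query, [])      # step 1
    results = []
    for level1_id in level1_ids:
        level2 = replica_catalogue.get(level1_id)      # steps 2-3
        if level2 is None:
            level2 = process_on_grid(level1_id)        # steps 5-10
            replica_catalogue[level1_id] = level2      # step 11
        results.append(level2)                         # step 4 or step 12
    return results


if __name__ == "__main__":
    catalogue = {"2000-10": ["orbit-1", "orbit-2"]}
    registered = {"orbit-1": "lfn:l2/orbit-1"}
    print(get_level2_products("2000-10", catalogue, registered,
                              lambda level1_id: "lfn:l2/" + level1_id))
```

Registering each new product as soon as it is produced means a repeated query takes the "Yes" branch and triggers no further Grid jobs.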
GOME Ozone Profile Validation • Goal of the DataGrid application: validate satellite data against all available ground-based data in an easy way • Comparison of ozone profiles provided by the satellite with lidar data at different locations and times (see the web portal) • Statistical comparison and analysis in order to improve algorithms [Figure labels: ERS/GOME satellite; OZONE LAYER (50 km, 10 km); Lidar at the Haute Provence Observatory]
Validation Processing Sequence: satellite data validation • 1. Queries and data information retrieval from the Lidar metadata catalogue • 2. Queries and data information retrieval from the GOME Level 2 orbit or pixel metadata catalogues • 3. Submission of the job on the GRID • 4. When completed, comparison between lidar and satellite ozone profiles [Diagram labels: GRID Portal; Lidar site; Lidar data catalogue; Level 2 Catalogue; ComputingElement; Storage Elements with Lidar data; Storage Elements with GOME L2 data]
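The catalogue queries select satellite profiles that are coincident in space and time with lidar measurements. A minimal sketch of such a coincidence search is shown below. The record layout, the 500 km and 12 hour thresholds, and the function names are assumptions made for illustration; the slides do not state the actual criteria.

```python
import math
from datetime import datetime, timedelta


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (haversine formula, spherical Earth)."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def coincident_pairs(gome, lidar, max_km=500.0, max_dt=timedelta(hours=12)):
    """Return (gome_id, lidar_id) pairs close in both space and time.

    Each record is a dict with keys "id", "lat", "lon" and "time".
    The default thresholds are illustrative assumptions.
    """
    pairs = []
    for g in gome:
        for l in lidar:
            close_in_time = abs(g["time"] - l["time"]) <= max_dt
            close_in_space = great_circle_km(g["lat"], g["lon"], l["lat"], l["lon"]) <= max_km
            if close_in_time and close_in_space:
                pairs.append((g["id"], l["id"]))
    return pairs


if __name__ == "__main__":
    # Illustrative coordinates and times, not real measurements.
    lidar = [{"id": "lidar-1", "lat": 44.0, "lon": 6.0, "time": datetime(2000, 10, 2, 20, 0)}]
    gome = [{"id": "gome-1", "lat": 45.0, "lon": 6.0, "time": datetime(2000, 10, 2, 10, 0)}]
    print(coincident_pairs(gome, lidar))
```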
Validation Output • Figure 1: Estimation of the bias between GOME and lidar using one month of data. • Figure 2: Example of 2 profiles: comparison between the GOME profile and the lidar profile for 2 October 2000.
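One simple way to estimate a bias like the one in Figure 1 is to average the difference between matched profiles at each altitude level. The sketch below assumes both profiles of a pair are sampled on the same altitude grid and defines the bias as the mean of (GOME minus lidar); the slides do not say which definition was used, so this is an assumption.

```python
from statistics import mean


def mean_bias(pairs):
    """Mean (GOME - lidar) difference per altitude level.

    pairs is a non-empty list of (gome_profile, lidar_profile) tuples, where
    each profile is a list of ozone values on a common altitude grid.
    """
    n_levels = len(pairs[0][0])
    for gome_profile, lidar_profile in pairs:
        if len(gome_profile) != n_levels or len(lidar_profile) != n_levels:
            raise ValueError("all profiles must share the same altitude grid")
    return [
        mean(gome_profile[k] - lidar_profile[k] for gome_profile, lidar_profile in pairs)
        for k in range(n_levels)
    ]


if __name__ == "__main__":
    # Made-up numbers in arbitrary units, two altitude levels.
    matched = [([2.0, 4.0], [1.0, 4.0]), ([4.0, 6.0], [1.0, 8.0])]
    print(mean_bias(matched))
```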
Perspectives for Biomedical Applications • Grids open new perspectives in large-scale genomics analysis • Complete genome annotation • Cross-genome analysis • Data mining on distributed databases • Pipelining of large automated bio-informatics analyses • Medical image processing • Large database processing • Anatomy and physiology modeling • Epidemiological studies
Biomedical Applications [Legend: applications deployed; applications tested on EDG; applications under preparation] • Bio-informatics • Phylogenetics: BBE Lyon (T. Sylvestre) • Search for primers: Centrale Paris (K. Kurata) • Statistical genetics: CNG Evry (N. Margetic) • Bio-informatics web portal: IBCP (C. Blanchet) • Parasitology: LBP Clermont, Univ B. Pascal (N. Jacq) • Data-mining on DNA chips: Karolinska (R. Médina, R. Martinez) • Geometrical protein comparison: Univ. Padova (C. Ferrari) • Medical imaging • MR image simulation: CREATIS (H. Benoit-Cattin) • Medical data and metadata management: CREATIS (J. Montagnat) • Mammography analysis: ERIC/Lyon 2 (S. Miguet, T. Tweed) • Simulation platform for PET/SPECT based on Geant4: GATE collaboration (L. Maigne)
Medical Imaging • 1. query • 2. visualisation • 3. similarity search • 4. scores • 5. best results visualisation [Diagram labels: Medical images; Metadata (LFN, image, patient, hospital, ...)]
Graphic layer • Grid File Browsing • Job Monitoring • File registration and retrieval
Graphical Interfaces • Image registration: local files, Grid files, metadata • Image retrieval: query over metadata, query result
Image Registration [Diagram labels: Imager; SE; metadata (LFN, image, patient, hospital, ...)]
Similarity search • Similarity computation • Job monitoring • Ranked list of images • Results visualization [Screenshot labels: Source image; Most similar images; Low score images]
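The flow from similarity computation to a ranked list of images can be sketched as below. The similarity measure here, 1 / (1 + mean squared pixel difference), is an assumption chosen for simplicity and is not the measure used by the application; images are represented as flat lists of pixel values keyed by a hypothetical logical file name.

```python
def similarity(a, b):
    """Score in (0, 1]: 1 / (1 + mean squared pixel difference)."""
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return 1.0 / (1.0 + mse)


def rank_images(source, candidates):
    """Return (lfn, score) pairs sorted from most to least similar.

    candidates maps a logical file name to a flat list of pixel values.
    """
    scored = [(lfn, similarity(source, pixels)) for lfn, pixels in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


if __name__ == "__main__":
    source_image = [0, 1, 2, 3]
    library = {
        "lfn:a": [0, 1, 2, 3],
        "lfn:b": [3, 2, 1, 0],
        "lfn:c": [0, 1, 2, 4],
    }
    for lfn, score in rank_images(source_image, library):
        print(lfn, round(score, 3))
```

Because each score depends only on the source image and one candidate, the scores can be computed as independent jobs and merged into the ranked list afterwards.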
Future: Interfacing medical data with the Grid [Diagram labels: Medical (trusted) site: Imager, Medical server, grid-server interface (header blanking, encryption); Grid middleware: Replication Service, Replica Catalog, Replica Master File, RC interface, RS interface, Metadata interface, Storage Element, Storage Element core, MSS, Client 1 interface, Client 2 interface; File metadata: ACL, size, checksum, ...; Application metadata: ACL, encryption key, sensitive metadata, ...]
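The header blanking step at the grid-server interface can be illustrated with a small sketch: sensitive header fields are emptied before the image leaves the trusted site, and their values are kept separately as sensitive metadata. The field names and the dictionary representation of an image header are assumptions for illustration, and encryption of the image data is deliberately left out.

```python
# Assumed set of header fields treated as sensitive (illustrative only).
SENSITIVE_FIELDS = frozenset({"PatientName", "PatientID", "PatientBirthDate"})


def blank_header(header, sensitive=SENSITIVE_FIELDS):
    """Split an image header into a blanked copy and the sensitive values.

    Returns (public, private): public keeps every key but empties the
    sensitive ones; private holds only the sensitive key/value pairs.
    The input header is not modified.
    """
    public = {key: ("" if key in sensitive else value) for key, value in header.items()}
    private = {key: value for key, value in header.items() if key in sensitive}
    return public, private


if __name__ == "__main__":
    header = {"PatientName": "DOE^JANE", "PatientID": "123", "Modality": "MR"}
    public, private = blank_header(header)
    print(public)   # safe to store with the image on a Storage Element
    print(private)  # to be kept as access-controlled sensitive metadata
```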
Parallel Processing • Magnetic Resonance Image simulation using the grid • 3 levels of parallelism: • Parallel isochromat computations • Multi-slice MRI computation • Parallel magnetization kernel [Diagram labels: Virtual object; MRI sequence; Magnetisation computation kernel; Reconstruction algorithm; MRI Image]
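The multi-slice level of parallelism can be sketched by mapping a per-slice computation over independent slices. Here `simulate_slice` is a hypothetical placeholder (it only sums its inputs), and a local thread pool stands in for what would be separate Grid jobs in the real application.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate_slice(slice_values):
    """Placeholder for the per-slice MRI computation (sums the inputs)."""
    return sum(slice_values)


def simulate_volume(slices, max_workers=4):
    """Run simulate_slice on every slice concurrently, preserving slice order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(simulate_slice, slices))


if __name__ == "__main__":
    virtual_object = [[1, 2], [3, 4], [5]]
    print(simulate_volume(virtual_object))
```

Slices are independent of one another, which is why this level distributes naturally; the isochromat and kernel levels would parallelise work inside `simulate_slice` itself.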
Summary • Use Cases • High Energy Physics • Earth Observation • Biomedical Applications
Further Information • High Energy Physics http://datagrid-wp8.web.cern.ch/DataGrid-WP8/ • Bio-Informatics http://marianne.in2p3.fr/datagrid/wp10/index.html • Earth Observation http://styx.esrin.esa.it/grid/