Grid Technologies Applied to Earth Sciences
Manuel López-Puertas (IAA, Granada)
1st e-CA Meeting, Granada, 19-20 June 2007
EGEE
• EU grid project, follow-on to DataGrid (2004-2008)
• 202 sites in 47 countries worldwide
• 30,000 CPUs, 11 petabytes, 30,000 concurrent jobs
• Originated in high-energy physics and life sciences => expanded to many other areas (Geosciences)
• Ideal for scientific research whose time and resource demands are impractical for traditional IT
• 25 linked grid projects
More details on EGEE in the posters
EGEE Earth Science grid projects
• CYCLOPS aims to bridge the gap between the Grid and GMES (Global Monitoring for Environment and Security) communities
• EU-IndiaGrid, funded by the EC: a European and Indian Grid-focused project
• DEISA (200 teraflops of supercomputing infrastructure). Founded by Europe's National Research and Education Networks (NRENs) and the European Commission
• DEGREE (Dissemination and Exploitation of GRids in Earth sciencE)
• G-POD (Grid Processing on Demand) (ESRIN, ESA)
• Access to ESA multi-mission catalog
WHY EGEE for GEOSCIENCES?
• Grid is very well suited to Geosciences applications:
• For statistical approaches that need intensive computation/storage
• For rapid solutions when there are many independent jobs
• For sharing and/or processing large datasets in large-scale European projects
DEGREE: Dissemination and Exploitation of GRids in Earth sciencE
• EC Specific Support Action project (10 institutes)
• OBJECTIVES:
• Bridge the Earth Science and Grid communities throughout Europe
• Ensure that Earth Science requirements are satisfied in the next Grid generation
• Ensure the integration of emerging technologies for managing Earth Science knowledge
• Demonstrate the value of Grid for Geosciences with scientific results already obtained
Earth Sciences Applications and Requirements (DEGREE Report) (1/2)
• Deals with enormous amounts of data (size and number of files) (Envisat: 500 GB daily)
• Large computational needs
• Differences from other science domains:
• Deals with geospatial 4D data
• Many different domains
• Scattered among all countries and numerous institutes
• Complex work
Earth Sciences Applications and Requirements (DEGREE Report) (2/2)
Specific requirements:
• Reliability (good, well-established Quality of Service)
• Real-time and instantaneous access
• Need to access licensed software (IDL, Matlab, Geocluster, …)
• Data policies on input/output data (complicated security requirements)
• Data scattered around various institutes, in various formats, with metadata in various forms
=> Data management (accessibility and harmonization) is essential
=> Need for standardization of Grid services
• Earth e-science can be an essential improvement in research
EXAMPLES (1): GOME ozone profiles; MERIS global mosaic
• Grid properties/requirements:
• Large number of files (~40000; ~40000 per algorithm)
• Metadata
• Complex algorithm
• License for IDL
• Grid properties/requirements:
• Large dataset (size + number of files)
• High-security access
• Dataflow
• Monitoring
• License for: Globus GT3/4 & gLite and LCG under testing
EXAMPLES (2): GRIMI-2 (MIPAS); Ozone in polar regions
• Grid properties/requirements:
• Currently being ported to EGEE
• High security and restricted access for data
• Licensed software
• Grid properties/requirements:
• A full reprocessing needs 4 TB of input data and 1 TB of output
• Full processing time on e.g. 100 nodes: 12 days!
• 2 Grid nodes => an NRT service
• Globus GT3/4 & gLite and LCG under testing
Other examples
• SEISSOL, CMT, SpecFem3D (research into earthquake simulations)
• KORBA aquifer
• COMSIMM (looking at current and future climate trends)
• ICAROS (Chemical Assimilation of Remote Sensing Observations of the Stratosphere)
• Space weather (SPIDR)
ES GRID, e-collaborations and SOA portals
• Surveyed: 30 portals
• Analyzed: 17 (Grid: 8; data dissemination: 4; collaborative: 5)
VOs in Earth Sciences (EGEE)
• For Geosciences: two VOs:
• ESR (Earth Science Research): 50 members on average, from 10 countries, all from academia
• EGEODE (Expanding GEOscience On DEmand): 30 members (~15 from academia), centred on the use of the Geocluster software developed by CGG-Veritas (France). Devoted to: generic seismic platform
Summary
• Earth Sciences deal with enormous amounts of data
• Large computational needs
=> Grid is very well suited to Geosciences applications
• However, the Earth Science community is “very reluctant to deploy their applications”
• Not many applications ported. Why?
• Differences from other science fields (geospatial 4D data, multidisciplinary, scattered, complex work)?
• Some alternatives (GPOD-ESRIN) seem driven more by cost-effectiveness than by e-science
• Others….
• NEEDS: data policies (security requirements, standardization)
• Earth e-science can be an essential improvement in research
Our (GAPT-IAA) Needs
• Earth Observation satellite data (MIPAS/Envisat)
• Scientifically accurate retrievals (not operational)
• Data volume: 100 scans/orbit × 14 orbits/day = 1400 scans/day; ~4 years => 2 M scans
• 20 species from each scan
• Pre-processing (on in-house computers)
• Full calculation:
• Place a fixed set of large input files (HITRAN database) +
• ~20 MB (input/output) per scan (job)
• A facility to send arbitrarily structured jobs to the batch computer (e.g., LSF, SLURM, etc.)
• Large RAM (1-3 GB)
• Compiler restrictions: Sun compiler to Linux Opteron: OK; PPCs, IBM compilers, Intel: problems
• CPU: 10 min (LTE) to 5 hours (NLTE) per species and scan (Dec HP Fortran V5.5)
• 1 hour × 20 × 2 M ≈ 4500 years!
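The workload arithmetic on this slide can be reproduced with a short sketch. The function names are hypothetical, and the 1-hour average per species and scan is the slide's own round figure. The wall-clock estimate additionally assumes perfectly parallel, independent jobs with no queueing or data-transfer overhead, which a real grid would not deliver; the 30,000-CPU figure is simply the EGEE total quoted earlier, used as an illustrative upper bound.

```python
HOURS_PER_YEAR = 24 * 365  # ignoring leap years


def total_scans(scans_per_orbit=100, orbits_per_day=14, years=4):
    """Number of scans accumulated over the mission period."""
    return scans_per_orbit * orbits_per_day * 365 * years


def cpu_years(n_scans, n_species=20, hours_per_job=1.0):
    """Serial CPU time, in years, for one job per species and scan."""
    return n_scans * n_species * hours_per_job / HOURS_PER_YEAR


def wall_clock_days(n_scans, n_cpus, n_species=20, hours_per_job=1.0):
    """Idealised wall-clock time in days (assumes perfect parallel scaling)."""
    return n_scans * n_species * hours_per_job / n_cpus / 24


if __name__ == "__main__":
    print(total_scans())                                 # 2044000, i.e. ~2 M scans
    print(round(cpu_years(2_000_000)))                   # 4566, i.e. ~4500 years
    print(round(wall_clock_days(2_000_000, 30_000), 1))  # 55.6 days on 30,000 CPUs
```

Under these idealised assumptions, a task that would take roughly 4500 years on a single CPU shrinks to a couple of months, which is the motivation for moving this kind of retrieval to a grid.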