The Earth System Grid (ESG) facilitates management, discovery, and analysis of distributed terascale climate research data. This collaborative project aims to enable smooth workflows for computation, collaboration, data management, access, distribution, analysis, and visualization.
The Earth System Grid (ESG) APAN eScience Workshop January 27, 2005 Don Middleton On behalf of many project collaborators and a lot of great work! NCAR Scientific Computing Division Section Head, Visualization & Enabling Technologies
The ESG Collaboration • LBNL: Climate storage facility • ANL: Computational grids & grid-based applications • LLNL: Model diagnostics & inter-comparison • USC/ISI: Computational grids & grid-based applications • LANL: High-resolution ocean models & computing • NCAR: Climate change prediction and scenarios • ORNL: Climate storage & computational resources
A Lot of Data: Simulation Dataset Sizes by Resolution • T42 CCSM (current, 280km): 7.5GB/yr × 100 years = 0.75TB for one run • T85 CCSM (140km): 29GB/yr × 100 years = 2.9TB for one run • T170 CCSM (70km): 110GB/yr × 100 years = 11TB for one run
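The per-run totals on this slide follow from multiplying the yearly output by the length of the run. A minimal sketch of that arithmetic, assuming decimal units (1 TB = 1000 GB); the function and variable names are illustrative only:

```python
# Yearly output in GB for each CCSM resolution, taken from the slide.
GB_PER_YEAR = {"T42": 7.5, "T85": 29.0, "T170": 110.0}


def run_size_tb(gb_per_year, years=100, gb_per_tb=1000):
    """Size of one run in TB, assuming decimal units (1 TB = 1000 GB)."""
    return gb_per_year * years / gb_per_tb


# Size of a 100-year run at each resolution.
run_sizes = {res: run_size_tb(gb) for res, gb in GB_PER_YEAR.items()}

for res, tb in run_sizes.items():
    print(f"{res}: {tb:g} TB per 100-year run")
```

Each halving of the grid spacing roughly quadruples the yearly output, which is why the higher-resolution runs reach multiple terabytes each.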
Advances at the Earth Simulator ESC Climate Model at T1279 (approx. 10km)
We Will Examine Practically Every Aspect of the Earth System from Space in This Decade Longer-term Missions - Observation of Key Earth System Interactions Aqua Terra Landsat 7 Aura ICEsat Jason-1 QuikScat Exploratory - Explore Specific Earth System Processes and Parameters and Demonstrate Technologies Triana GRACE VCL SRTM PICASSO Cloudsat EO-1 Courtesy of Tim Killeen, NCAR
The Earth System Grid http://www.earthsystemgrid.org • U.S. DOE SciDAC funded R&D effort - a “Collaboratory Pilot Project” • Build an “Earth System Grid” that enables management, discovery, distributed access, processing, & analysis of distributed terascale climate research data • Build upon Globus Toolkit and DataGrid technologies and deploy • Potential broad application to other areas
ESG People PIs: • Ian Foster (ANL) • Don Middleton (NCAR) • Dean Williams (LLNL) Team: • Veronika Nefedova (ANL) • Luca Cinquini (NCAR) • Gary Strand (NCAR) • Peter Fox (NCAR) • Jose Garcia (NCAR) • Rob Markel (NCAR) • Bob Drach (LLNL) • David Bernholdt (ORNL) • Kasidit Chanchio (ORNL) • Line Pouchard (ORNL) • Carl Kesselman (ISI) • Ann Chervenak (ISI) • Arie Shoshani (LBNL) • Alex Sim (LBNL)
ESG: Challenges • Enabling the simulation and data management team • Enabling the core research community in analyzing and visualizing results • Enabling broad multidisciplinary communities to access simulation results We need integrated scientific work environments that enable smooth WORKFLOW for knowledge development: computation, collaboration & collaboratories, data management, access, distribution, analysis, and visualization.
ESG: Strategies • Keep track of what we have, particularly what’s on deep storage • Metadata and Replica Catalogs • Move data a minimal amount, keep it close to computational point of origin when possible • Data access protocols, distributed analysis • When we must move data, do it fast and with a minimum amount of human intervention • Storage Resource Management, fast networks • Harness a federation of sites, web portals • Globus Toolkit -> The Earth System Grid -> The UltraDataGrid
ESG Technologies: Security • Core security infrastructure provided by Globus GSI: digital certificates, public/private keys, proxies • ESG web-based digital registration system: • Hides from user complex details of digital certificate generation • Allows easy web access by common users to ESG data services
ESG Technologies: Metadata • Collection-level descriptive metadata (“climate metadata”) • Describes logical objects involved in climate modeling • Stored in a set of relational tables in an OGSA-DAI MySQL database (RDBMS with a Grid Service interface) • Database input and output are XML • Location and replica metadata • Indicates the physical locations of the many copies of a single logical file • Stored in a system of distributed RLS (Replica Location Services): cross-updating, grid-enabled MySQL databases installed at each site • Any Replica Location Index (RLI) in the system can be used as a starting point for obtaining all replicas (at any site) of a given logical file name (LFN)
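The replica lookup described above can be pictured as a two-level mapping: a per-site catalog from logical file name to physical copies, and an index that knows which catalogs to ask. The sketch below is a toy in-memory model of that idea, not the Globus RLS API; the class names, site names, and `gsiftp://` URLs are all hypothetical.

```python
from collections import defaultdict


class LocalReplicaCatalog:
    """Hypothetical per-site catalog: logical file name -> physical URLs."""

    def __init__(self, site):
        self.site = site
        self.replicas = defaultdict(set)

    def register(self, lfn, pfn):
        # Record one physical copy of the logical file at this site.
        self.replicas[lfn].add(pfn)


class ReplicaLocationIndex:
    """Hypothetical index that can reach every attached site catalog."""

    def __init__(self):
        self.catalogs = {}

    def attach(self, catalog):
        self.catalogs[catalog.site] = catalog

    def lookup(self, lfn):
        # Collect every known physical copy, site by site, in a stable order.
        found = []
        for site in sorted(self.catalogs):
            found.extend(sorted(self.catalogs[site].replicas.get(lfn, ())))
        return found


ncar = LocalReplicaCatalog("NCAR")
ornl = LocalReplicaCatalog("ORNL")
ncar.register("ccsm/run1/atm.nc", "gsiftp://ncar.example.org/mss/atm.nc")
ornl.register("ccsm/run1/atm.nc", "gsiftp://ornl.example.org/hpss/atm.nc")

index = ReplicaLocationIndex()
index.attach(ncar)
index.attach(ornl)

replicas = index.lookup("ccsm/run1/atm.nc")
print(replicas)
```

In the real system the index entries are kept current by cross-updating between sites; here that is simplified to a single index holding direct references to each catalog.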
ESG Technologies: Metadata • THREDDS metadata catalogs: • Generated from collection-level + location/replica metadata • NcML metadata: • NetCDF specific • Describes specific content of each file • Used to create virtual dataset aggregations
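As an illustration of a virtual dataset aggregation, the sketch below builds a small NcML document that joins several NetCDF files along an existing time dimension. It assumes the Unidata NcML 2.2 namespace and the `joinExisting` aggregation type; the file names and the `build_aggregation` helper are hypothetical, and the NcML actually used by ESG may have differed.

```python
import xml.etree.ElementTree as ET

# Unidata NcML 2.2 namespace (assumed here; the slide does not name a version).
NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def build_aggregation(locations, dim_name="time"):
    """Return NcML text that joins the given files along an existing dimension."""
    ET.register_namespace("", NCML_NS)  # serialize with a default namespace
    root = ET.Element(f"{{{NCML_NS}}}netcdf")
    agg = ET.SubElement(
        root,
        f"{{{NCML_NS}}}aggregation",
        {"dimName": dim_name, "type": "joinExisting"},
    )
    for loc in locations:
        # One child element per physical file in the virtual dataset.
        ET.SubElement(agg, f"{{{NCML_NS}}}netcdf", {"location": loc})
    return ET.tostring(root, encoding="unicode")


# Hypothetical file names for two consecutive years of one run.
ncml = build_aggregation(["run1_year0001.nc", "run1_year0002.nc"])
print(ncml)
```

A client that understands NcML can then open the aggregation as if it were one file spanning both years.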
ESG Technologies: Data Transport • SRM (Storage Resource Manager) • Middleware that allows seamless access to data resources whether they are stored on rotating or deep storage • File transfer between any deep storage (NCAR MSS, ORNL HPSS, NERSC) and local cache • Reliable, high performance transfer between sites via GridFTP • Robust, efficient cache management capabilities • OPeNDAP-g • Integration of OPeNDAP API with Globus technologies (GSI authentication and GridFTP data transfer) • Extension for aggregation of NetCDF data
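The cache-management role of an SRM can be illustrated with a size-bounded disk cache that stages files from deep storage on demand. This is a toy sketch only: the least-recently-used eviction policy, the `StagingCache` class, and the file names and sizes are assumptions for illustration, and a real SRM does far more (request queuing, pinning, transfer retries).

```python
from collections import OrderedDict


class StagingCache:
    """Hypothetical size-bounded cache with least-recently-used eviction."""

    def __init__(self, capacity, fetch_from_deep_storage):
        self.capacity = capacity              # cache size, arbitrary units
        self.fetch = fetch_from_deep_storage  # callable: file name -> file size
        self.files = OrderedDict()            # name -> size, least recent first
        self.used = 0

    def get(self, name):
        if name in self.files:
            self.files.move_to_end(name)      # mark as most recently used
            return "hit"
        size = self.fetch(name)               # stage the file from deep storage
        if size > self.capacity:
            raise ValueError(f"{name} does not fit in the cache")
        while self.used + size > self.capacity:
            # Evict least recently used files until the new one fits.
            _, evicted_size = self.files.popitem(last=False)
            self.used -= evicted_size
        self.files[name] = size
        self.used += size
        return "staged"


# Hypothetical file sizes standing in for a deep-storage catalog.
deep_storage = {"a.nc": 60, "b.nc": 30, "c.nc": 40}
cache = StagingCache(100, deep_storage.__getitem__)

results = [cache.get(n) for n in ["a.nc", "b.nc", "a.nc", "c.nc"]]
print(results, list(cache.files), cache.used)
```

In this run the second request for `a.nc` is served from the cache, and staging `c.nc` evicts `b.nc`, the least recently used file.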
ESG Technologies: Web Portal • Main entry point into ESG system: provides simple, convenient web-based access to wide range of data services to access climate model data • Integrates and makes use of all other ESG technologies • Main ESG web portal at NCAR: gateway to distributed climate model datasets (PCM, CCSM data stored at NCAR, ORNL, NERSC, LLNL) • Same software under deployment by LLNL/PCMDI to serve locally stored IPCC data world wide
ESG Metrics (November 2004) • Community Climate System Model • 28.4 Terabytes, including 21 simulations, 141 datasets, and 289,374 files • Parallel Climate Model • 20.42 Terabytes, including 98 simulations, 434 datasets, and 44,000 files • Total • 48.8 Terabytes, 119 simulations, 575 datasets, in over 333,872 files • 167 registrations, 132 approved, 154.2GB downloaded to date • Plus new IPCC Data • 150 user registrations, 1.1TB of data downloaded, in 16,000 files
The Importance of Community: Collaborations & Relationships • GO-ESSP (multi-agency, intl.) • CCSM Data Management Group • IPCC • Globus Project • OPeNDAP/DODS (multi-agency) • NCAR’s Community Data Portal (CDP) • NSF National Science Digital Libraries Program (UCAR & Unidata THREDDS Project) • U.K. e-Science & British Atmospheric Data Centre • NOAA NOMADS and CEOS-grid • VSTO (new NSF/NMI-funded project) • Other SciDAC Projects: Climate, Security & Policy for Group Collaboration, Scientific Data Management ISIC, & High-performance DataGrid Toolkit
Establish new paradigms for managing and accessing scientific data based on semantic organization. Data->Knowledge Petascale Knowledge Environment Mass Storage System (2.0PB)