
The Earth System Grid (ESG)

The Earth System Grid (ESG) facilitates management, discovery, and analysis of distributed terascale climate research data. This collaborative project aims to enable smooth workflows for computation, collaboration, data management, access, distribution, analysis, and visualization.

Presentation Transcript


  1. The Earth System Grid (ESG) APAN eScience Workshop January 27, 2005 Don Middleton On behalf of many project collaborators and a lot of great work! NCAR Scientific Computing Division Section Head, Visualization & Enabling Technologies

  2. The ESG Collaboration LBNL: Climate storage facility ANL: Computational grids & grid-based applications LLNL: Model diagnostics & inter-comparison USC/ISI: Computational grids & grid-based applications LANL: High-resolution ocean models & computing NCAR: Climate change prediction and scenarios ORNL: Climate storage & computational resources

  3. A Global Coupled Climate Model

  4. A Lot of Data: Simulation Dataset Sizes by Resolution • T42 CCSM (current, 280km) • 7.5GB/yr, 100 years → 0.75TB for one run • T85 CCSM (140km) • 29GB/yr, 100 years → 2.9TB for one run • T170 CCSM (70km) • 110GB/yr, 100 years → 11TB for one run
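The per-run totals on this slide follow directly from the yearly output rates. A minimal sketch of that arithmetic, assuming decimal units (1 TB = 1000 GB); the dictionary and function names are illustrative and not part of any ESG software:

```python
# Output volume per simulated year, in gigabytes, as quoted on the slide.
GB_PER_YEAR = {"T42": 7.5, "T85": 29.0, "T170": 110.0}


def run_size_tb(gb_per_year, years=100):
    """Total size of one run in terabytes, assuming 1 TB = 1000 GB."""
    return gb_per_year * years / 1000.0


# Size of a single 100-year run at each resolution.
sizes = {res: run_size_tb(gb) for res, gb in GB_PER_YEAR.items()}

for res, tb in sizes.items():
    print(f"{res}: {tb:.2f} TB per 100-year run")
```

Each halving of the grid spacing roughly quadruples the volume, which is why the T170 run is about fifteen times larger than the T42 run.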

  5. CCM at T170 Resolution

  6. Advances at the Earth Simulator ESC Climate Model at T1279 (approx. 10km)

  7. We Will Examine Practically Every Aspect of the Earth System from Space in This Decade Longer-term Missions - Observation of Key Earth System Interactions Aqua Terra Landsat 7 Aura ICEsat Jason-1 QuikScat Exploratory - Explore Specific Earth System Processes and Parameters and Demonstrate Technologies Triana GRACE VCL SRTM PICASSO Cloudsat EO-1 Courtesy of Tim Killeen, NCAR

  8. IPCC

  9. The ESG Collaboration LBNL: Climate storage facility ANL: Computational grids & grid-based applications LLNL: Model diagnostics & inter-comparison USC/ISI: Computational grids & grid-based applications LANL: High-resolution ocean models & computing NCAR: Climate change prediction and scenarios ORNL: Climate storage & computational resources

  10. The Earth System Grid http://www.earthsystemgrid.org • U.S. DOE SciDAC funded R&D effort - a “Collaboratory Pilot Project” • Build an “Earth System Grid” that enables management, discovery, distributed access, processing, & analysis of distributed terascale climate research data • Build upon Globus Toolkit and DataGrid technologies and deploy • Potential broad application to other areas

  11. ESG People PIs: • Ian Foster (ANL) • Don Middleton (NCAR) • Dean Williams (LLNL) Team: • Veronika Nefedova (ANL) • Luca Cinquini (NCAR) • Gary Strand (NCAR) • Peter Fox (NCAR) • Jose Garcia (NCAR) • Rob Markel (NCAR) • Bob Drach (LLNL) • David Bernholdt (ORNL) • Kasidit Chanchio (ORNL) • Line Pouchard (ORNL) • Carl Kesselman (ISI) • Ann Chervenak (ISI) • Arie Shoshani (LBNL) • Alex Sim (LBNL)

  12. ESG: Challenges • Enabling the simulation and data management team • Enabling the core research community in analyzing and visualizing results • Enabling broad multidisciplinary communities to access simulation results We need integrated scientific work environments that enable smooth WORKFLOW for knowledge development: computation, collaboration & collaboratories, data management, access, distribution, analysis, and visualization.

  13. ESG: Strategies • Keep track of what we have, particularly what’s on deep storage • Metadata and Replica Catalogs • Move data a minimal amount, keep it close to computational point of origin when possible • Data access protocols, distributed analysis • When we must move data, do it fast and with a minimum amount of human intervention • Storage Resource Management, fast networks • Harness a federation of sites, web portals • Globus Toolkit -> The Earth System Grid -> The UltraDataGrid

  14. ESG Home

  15. PCM Metadata

  16. PCM Files and MSS

  17. ESG CCSM

  18. CCSM Datasets

  19. Subsetting List

  20. Subsetting Interface

  21. Click IPCC Data

  22. ESG architecture

  23. ESG topology

  24. ESG Technologies: Security • Core security infrastructure provided by Globus GSI: digital certificates, public/private keys, proxies • ESG web-based digital registration system: • Hides from user complex details of digital certificate generation • Allows easy web access by common users to ESG data services

  25. ESG Technologies: Metadata • Collection-level description metadata (“climate metadata”) • Describes logical objects involved in climate modeling • Stored in a set of relational tables in an OGSA-DAI MySQL database (RDBMS with Grid Service interface) • Input and output of the database is XML • Location and replica metadata • Indicates the physical locations of the many copies of a single logical file • Stored in a system of distributed RLS (Replica Location Services): cross-updating grid-enabled MySQL databases installed at each site • Any RLI (Replica Location Index) in the system can be used as a starting point for obtaining all replicas (at any site) of a given logical file name (lfn)
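The replica lookup described on this slide can be sketched with two small in-memory classes. This is an illustration of the idea only, not the Globus RLS API: the class names, method names, host names, and file names below are all hypothetical, and the real service stores its mappings in MySQL databases rather than Python dictionaries.

```python
from collections import defaultdict


class LocalReplicaCatalog:
    """Per-site catalog: logical file name (lfn) -> physical copies at that site."""

    def __init__(self, site):
        self.site = site
        self.replicas = defaultdict(set)

    def register(self, lfn, pfn):
        self.replicas[lfn].add(pfn)


class ReplicaLocationIndex:
    """Index recording which site catalogs hold copies of each lfn."""

    def __init__(self):
        self.catalogs = {}
        self.sites_by_lfn = defaultdict(set)

    def update_from(self, catalog):
        # Stand-in for the cross-updating between sites.
        self.catalogs[catalog.site] = catalog
        for lfn in catalog.replicas:
            self.sites_by_lfn[lfn].add(catalog.site)

    def find_replicas(self, lfn):
        found = []
        for site in sorted(self.sites_by_lfn.get(lfn, ())):
            found.extend(sorted(self.catalogs[site].replicas[lfn]))
        return found


# Hypothetical logical file with one physical copy at each of two sites.
lfn = "pcm.run01.atm.1990-01.nc"
ncar = LocalReplicaCatalog("ncar")
ornl = LocalReplicaCatalog("ornl")
ncar.register(lfn, "gsiftp://ncar.example.org/mss/" + lfn)
ornl.register(lfn, "gsiftp://ornl.example.org/hpss/" + lfn)

# Every site keeps an index that is updated from all catalogs.
ncar_index = ReplicaLocationIndex()
ornl_index = ReplicaLocationIndex()
for index in (ncar_index, ornl_index):
    for catalog in (ncar, ornl):
        index.update_from(catalog)

print(ncar_index.find_replicas(lfn))
```

Because each index is updated from every catalog, a query against either site's index returns the same list of physical copies.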

  26. ESG Metadata Schema

  27. ESG Technologies: Metadata • THREDDS metadata catalogs: • Generated from collection-level + location/replica metadata • NcML metadata: • NetCDF specific • Describes specific content of each file • Used to create virtual dataset aggregations

  28. ESG Technologies: Data Transport • SRM (Storage Resource Manager) • Middleware that allows seamless access to data resources whether they are stored on rotating or deep storage • File transfer between any deep storage (NCAR MSS, ORNL HPSS, NERSC) and local cache • Reliable, high performance transfer between sites via GridFTP • Robust, efficient cache management capabilities • OPeNDAP-g • Integration of OPeNDAP API with Globus technologies (GSI authentication and GridFTP data transfer) • Extension for aggregation of NetCDF data
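The role of a disk cache in front of deep storage can be illustrated with a small sketch. The slides do not say which eviction policy SRM uses, so least-recently-used eviction is an assumption here, and the `DiskCache` class, its methods, and the file names are hypothetical rather than part of any SRM interface.

```python
from collections import OrderedDict


class DiskCache:
    """Fixed-size cache in front of slow deep storage (LRU eviction assumed)."""

    def __init__(self, capacity_bytes, fetch_from_deep_storage):
        self.capacity = capacity_bytes
        self.fetch = fetch_from_deep_storage
        self.files = OrderedDict()  # file name -> bytes, oldest use first
        self.used = 0
        self.hits = 0
        self.misses = 0

    def get(self, name):
        if name in self.files:
            # Cache hit: mark the file as most recently used.
            self.files.move_to_end(name)
            self.hits += 1
            return self.files[name]
        # Cache miss: stage the file from deep storage, then keep a copy.
        data = self.fetch(name)
        self.misses += 1
        self._store(name, data)
        return data

    def _store(self, name, data):
        # Evict least recently used files until the new one fits.
        while self.files and self.used + len(data) > self.capacity:
            _, evicted = self.files.popitem(last=False)
            self.used -= len(evicted)
        if len(data) <= self.capacity:
            self.files[name] = data
            self.used += len(data)


# Stand-in for a tape archive: three 40-byte files.
tape = {"a.nc": b"x" * 40, "b.nc": b"y" * 40, "c.nc": b"z" * 40}
cache = DiskCache(capacity_bytes=100, fetch_from_deep_storage=tape.__getitem__)

for name in ["a.nc", "b.nc", "a.nc", "c.nc"]:
    cache.get(name)

print(list(cache.files), cache.hits, cache.misses)
```

In this run the second request for `a.nc` is served from the cache, and staging `c.nc` evicts `b.nc`, the file that was used least recently.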

  29. ESG Technologies: Web Portal • Main entry point into ESG system: provides simple, convenient web-based access to wide range of data services to access climate model data • Integrates and makes use of all other ESG technologies • Main ESG web portal at NCAR: gateway to distributed climate model datasets (PCM, CCSM data stored at NCAR, ORNL, NERSC, LLNL) • Same software under deployment by LLNL/PCMDI to serve locally stored IPCC data world wide

  30. ESG Technologies: Aggregation/subsetting

  31. ESG Metrics (November 2004) • Community Climate System Model • 28.4 Terabytes, including 21 simulations, 141 datasets, and 289,374 files • Parallel Climate Model • 20.42 Terabytes, including 98 simulations, 434 datasets, and 44,000 files • Total • 48.8 Terabytes, 119 simulations, 575 datasets, in over 333,872 files • 167 registrations, 132 approved, 154.2GB downloaded to date • Plus new IPCC Data • 150 user registrations, 1.1TB of data downloaded, in 16,000 files
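The totals on this slide can be checked against the per-model figures with a few lines of arithmetic. A minimal sketch using only numbers from the slide; file counts are left out because the Parallel Climate Model count (44,000) appears to be rounded.

```python
# Per-model holdings as quoted on the slide (November 2004).
ccsm = {"terabytes": 28.4, "simulations": 21, "datasets": 141}
pcm = {"terabytes": 20.42, "simulations": 98, "datasets": 434}

# Sum each quantity across the two models.
totals = {key: ccsm[key] + pcm[key] for key in ccsm}

print(f"{totals['terabytes']:.1f} TB, "
      f"{totals['simulations']} simulations, "
      f"{totals['datasets']} datasets")
```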

  32. The Importance of Community: Collaborations & Relationships • GO-ESSP (multi-agency, intl.) • CCSM Data Management Group • IPCC • Globus Project • OPeNDAP/DODS (multi-agency) • NCAR’s Community Data Portal (CDP) • NSF National Science Digital Libraries Program (UCAR & Unidata THREDDS Project) • U.K. e-Science & British Atmospheric Data Center • NOAA NOMADS and CEOS-grid • VSTO (new NSF/NMI-funded project) • Other SciDAC Projects: Climate, Security & Policy for Group Collaboration, Scientific Data Management ISIC, & High-performance DataGrid Toolkit

  33. ‘ing Our Data

  34. Establish new paradigms for managing and accessing scientific data based on semantic organization. Data->Knowledge Petascale Knowledge Environment Mass Storage System (2.0PB)

  35. www.earthsystemgrid.org
