Earth System Grid Center for Enabling Technologies (ESG-CET) Overview


Presentation Transcript


  1. Earth System Grid Center for Enabling Technologies (ESG-CET) Overview
     Climate change is not only a scientific challenge of the first order but also a major technological challenge. The international climate community is expected to generate hundreds of petabytes of simulation data within the next three to seven years.
     ESG-CET Team

  2. Data management and analysis for the Earth System Grid
     • ESG’s mission is to provide climate researchers worldwide with access to the data, information, models, analysis tools, and computational capabilities required to make sense of enormous climate simulation datasets.
     • ESG’s goals:
       • make data more useful to climate researchers by developing Grid technology that enhances data usability,
       • meet specific distributed database, data access, and data movement needs of national and international climate projects,
       • provide a universal and secure web-based data access portal for broad multi-model data collections,
       • provide a wide range of Grid-enabled climate data analysis tools and diagnostic methods to international climate centers and U.S. government agencies, and
       • develop key ideas and concepts that are important contributions to other domain areas.
     Earth System Grid Center for Enabling Technologies: (ESG-CET)

  3. Growing data
     • Early 1990s (e.g., AMIP1, PMIP, CMIP1): modest collection of monthly mean 2D files: ~1 GB
     • Late 1990s (e.g., AMIP2): large collection of monthly mean and 6-hourly 2D and 3D fields: ~500 GB
     • Present (e.g., IPCC/CMIP3): fairly comprehensive output from both ocean and atmospheric components; monthly, daily, and 3-hourly: ~35 TB
     • Future (2010):
       • The IPCC 5th Assessment Report (AR5): expected to be between 2.5 and 15 PB
       • The Climate Science Computational End Station (CCES) project at ORNL: expected to be around 3 PB
       • The North American Regional Climate Change Assessment Program (NARCCAP): expected to be around 1 PB
       • The Cloud Feedback Model Intercomparison Project (CFMIP) archives: expected to be 0.3 PB
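To put the growth figures on this slide in perspective, the short calculation below converts each quoted archive size to bytes and computes the growth factor from one era to the next. It is only an arithmetic sketch: it assumes decimal units (1 GB = 10^9 bytes), the dictionary keys and the function name are illustrative, and the AR5 entries are simply the low and high ends of the quoted 2.5 to 15 PB range.

```python
# Archive sizes quoted on the slide, converted to bytes (decimal units assumed).
GB, TB, PB = 10**9, 10**12, 10**15

archive_bytes = {
    "early 1990s": 1 * GB,      # AMIP1, PMIP, CMIP1
    "late 1990s": 500 * GB,     # AMIP2
    "present": 35 * TB,         # IPCC/CMIP3
    "AR5 low": 2.5 * PB,        # low end of the quoted AR5 range
    "AR5 high": 15 * PB,        # high end of the quoted AR5 range
}


def growth_factor(earlier: str, later: str) -> float:
    """Return how many times larger the later archive is than the earlier one."""
    return archive_bytes[later] / archive_bytes[earlier]


if __name__ == "__main__":
    steps = [
        ("early 1990s", "late 1990s"),
        ("late 1990s", "present"),
        ("present", "AR5 low"),
        ("present", "AR5 high"),
    ]
    for earlier, later in steps:
        print(f"{earlier} -> {later}: x{growth_factor(earlier, later):.1f}")
```

On these numbers, even the low AR5 estimate is roughly a seventyfold increase over the CMIP3 archive, comparable to the jump from AMIP2 to CMIP3.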

  4. Network Traffic, Climate and Physics Data, and Network Capacity
     Ignore the units of the quantities being graphed; they are normalized to 1 in 1990. Just look at the long-term trends: all of the “ground truth” measures are growing significantly faster than ESnet projected capacity.
     [Figure: historical values and projections for ESnet traffic, HEP experiment data, climate model data, and the ESnet capacity roadmap. All three data series are normalized to “1” at Jan. 1990. Annotated 2010 values: 40 PBy and 4 PBy.]

  5. Data integration challenges facing climate science
     • Modeling groups will generate more data in the near future than exist today
     • Large part of research consists of writing programs to analyze data
     • How best to collect, distribute, and find data on a much larger scale?
     • At each stage tools could be developed to improve efficiency
     • Substantially more ambitious community modeling projects (petabyte (PB, 10^15 bytes) and exabyte (EB, 10^18 bytes)) will require a distributed database
     • Metadata describing extended modeling simulations (e.g., atmospheric aerosols and chemistry, carbon cycle, dynamic vegetation, etc.) (But wait, there’s more: economy, public health, energy, etc.)
     • How to make information understandable to end-users so that they can interpret the data correctly
     • More users than just Working Group (WG) 1 (science): also WG2 (impacts) and WG3 (mitigation) (policy makers, economists, health officials, etc.)
     • Integration of multiple analysis tools, formats, data from unknown sources
     • Trust and security on a global scale (not just an agency or country, but worldwide)

  6. Earth System Grid Center for Enabling Technologies (ESG-CET)
     Current ESG Sites
     http://www.earthsystemgrid.org
     http://www-pcmdi.llnl.gov
     ESG Goals
     • Petabyte-scale data volumes
     • Globally federated sites
     • “Virtual Datasets” created through subsetting and aggregation
     • Metadata-based search and discovery
     • Bulk data access
     • Web-based and analysis tool access
     • Increased flexibility and robustness
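The idea of a “Virtual Dataset” built by aggregation and subsetting can be sketched in a few lines. The example below is not ESG code: it assumes a hypothetical `VirtualDataset` class, made-up file names, and files that differ only in the time steps they cover. It shows how several time-ordered files can present one logical time axis, with each global index resolved to a file and a position inside that file.

```python
from bisect import bisect_right
from itertools import accumulate


class VirtualDataset:
    """Hypothetical sketch: several time-ordered files seen as one time axis."""

    def __init__(self, files):
        # files: list of (filename, number_of_time_steps), ordered by simulation time.
        self.names = [name for name, _ in files]
        # Cumulative step counts, e.g. [12, 24, 36] for three 12-step files.
        self.ends = list(accumulate(steps for _, steps in files))

    def __len__(self):
        return self.ends[-1] if self.ends else 0

    def locate(self, t):
        """Map a global time index to (filename, index within that file)."""
        if not 0 <= t < len(self):
            raise IndexError(t)
        i = bisect_right(self.ends, t)          # first file whose end is beyond t
        start = self.ends[i - 1] if i > 0 else 0
        return self.names[i], t - start

    def subset(self, start, stop):
        """Resolve the half-open global range [start, stop) to per-file positions."""
        return [self.locate(t) for t in range(start, stop)]


# Three illustrative yearly files of monthly means (names are made up).
ds = VirtualDataset([("tas_1990.nc", 12), ("tas_1991.nc", 12), ("tas_1992.nc", 12)])
print(len(ds))            # total number of time steps across all files
print(ds.locate(13))      # second month of the second file
print(ds.subset(11, 13))  # a range that crosses a file boundary
```

A user of such an aggregation asks for a time range and never needs to know where the file boundaries fall.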

  7. Climate model data management and analysis issues
     Before ESG: tremendous manual intervention, inefficient by any measure
     • Data
       • Different formats, not standardized
       • Different sites require knowledge of different methods of access
       • Log onto multiple sites to hopefully find and retrieve data
       • Gigabyte (GB, 10^9 bytes) to terabyte (TB, 10^12 bytes) data volume
     • Metadata
       • Painful to produce
       • Most kept in files separate from data
       • Data lost or reproduced numerous times
     • Locating data
       • Manual
       • Unreachable unless one is “in the know” (location kept in someone’s brain)
       • Not formalized
     • Data requests/analysis
       • Beginnings of a formal process
       • Far too much done by hand
       • Logging nearly non-existent
     ESG present, future: computers do the more complicated and repetitive tasks and scientists focus on research
     • Data
       • Standard output
       • Model compliance tools to facilitate standard output
       • Quality assurance
       • Different sites but standardized access protocol
       • One-stop shop
       • Terabyte (TB, 10^12 bytes) to petabyte (PB, 10^15 bytes) data volume
     • Metadata
       • Exhaustive detail
       • Created via semi-automated processes
       • Put data in databases and make it visible to others
     • Locating data
       • Formalized process
       • Highly granular: down to per-file, per-model, per-variable level
       • Readily searchable with sophisticated search tools
     • Data requests/analysis/visualization
       • Completely automated
       • All logging done automatically
       • Secure

  8. Current ESG architecture and underlying technologies
     [Figure: architecture diagram. Labels, in the order they appear: Climate Data; Metadata Catalog; NcML (metadata schema); OPeNDAP-g (aggregation and subsetting); Data Management; Storage Resource Mgr; Data Transfer; Globus Security Infrastructure; Data Mover Lite; GridFTP; Monitoring and Discovery Services; Replica Location Service; Security; Access Control; MyProxy; User Registration; Long-term Storage; Tertiary data storage systems.]

  9. ESG: The world’s source for climate modeling data
     [Figure: ESG monthly download volumes]
     [Figure: ESG usage: over 500 sites worldwide]

  10. Intercomparison example
     Current Usage
     • Browse database
     • Download data
     • Organize data on local site
     • Regrid data at local site
     • Perform diagnostics
     • Produces results
     Future Usage
     • Search, browse and discover distributed data
     • Remote site
     • Request data
     • Regrids
     • Data system reduction
     • ESG returns user defined products

  11. ESG-CET improvements over ESG II
     • Much broader model for scientific metadata
     • “Faceted” search capability guides the user toward datasets of interest
       • At a given point in the search, only those options which produce non-empty result sets are shown
       • Avoids dead-end searches
     • Flexible browsing hierarchy
     • Automated, GUI-based publication tools
     • Single sign-on
     • Full support for data aggregations
       • A collection of files, usually ordered by simulation time, that can be treated as a single file for purposes of data access, computation, and visualization
     • File-streaming and release capabilities for data access to deep storage
     • Client access to subsetting, visualization services
     • Server-generated visualization products
     • Fine-grained access to datasets based on user groups and roles
     • User notification service
       • Users can choose to be notified when a dataset has been modified
     • Pre-computed products (e.g., global averages)
     • User workspace (storing of favorite products, search criteria, etc.)
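The faceted search behaviour described on this slide, where only options that lead to non-empty result sets are offered, can be illustrated with a small in-memory sketch. Everything here is an assumption for illustration: the function names, the facet names (`model`, `experiment`, `variable`), and the four catalog records are hypothetical and are not taken from the ESG-CET metadata model.

```python
from collections import Counter, defaultdict


def matches(record, selection):
    """True if the record carries every facet value the user has selected."""
    return all(record.get(facet) == value for facet, value in selection.items())


def facet_options(records, selection):
    """Return the matching records and, per unselected facet, value -> hit count.

    Options are counted only over records that already match the selection,
    so every option offered is guaranteed to yield a non-empty result set.
    """
    hits = [r for r in records if matches(r, selection)]
    options = defaultdict(Counter)
    for record in hits:
        for facet, value in record.items():
            if facet not in selection:
                options[facet][value] += 1
    return hits, {facet: dict(counts) for facet, counts in options.items()}


# Hypothetical catalog entries, for illustration only.
catalog = [
    {"model": "model_a", "experiment": "historical", "variable": "tas"},
    {"model": "model_a", "experiment": "historical", "variable": "pr"},
    {"model": "model_b", "experiment": "historical", "variable": "tas"},
    {"model": "model_b", "experiment": "scenario", "variable": "pr"},
]

hits, options = facet_options(catalog, {"variable": "tas"})
print(len(hits))
print(options)
```

After the user picks `variable = tas`, the `scenario` experiment is no longer offered, because no surface temperature record belongs to it; that is how the dead-end search is avoided.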

  12. Gateways and nodes
     • Federated architecture
       • Federation is a virtual trust relationship among independent management domains that have their own set of services. Users authenticate once to gain access to data across multiple systems and organizations.
     • Gateways
       • Portals, search capability, distributed metadata, registration and user management
       • Initially PCMDI, NCAR, ORNL, eventually GFDL
       • May be customized to an institution’s requirements
       • More complex architecture than nodes, fewer sites
     • Nodes
       • Where data is stored and published
       • Data may be on disk or tertiary mass store
       • Each node has a trust relationship with a specific gateway, for publication
       • Data reduction
       • Less complex architecture
     • A site can be both a gateway and a node
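The relationships on this slide (nodes hold data, each node publishes through one specific gateway, and a user signs on once for the whole federation) can be modelled with a toy data structure. This is a sketch under stated assumptions, not the ESG-CET design: the classes, method names, node names, and dataset identifier are hypothetical, and real authentication is reduced to a set of signed-on user names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    gateway: str                                   # the one gateway this node publishes through
    datasets: set = field(default_factory=set)     # data stored at this node


@dataclass
class Gateway:
    name: str
    catalog: dict = field(default_factory=dict)    # dataset id -> node name

    def publish(self, node: Node, dataset_id: str) -> None:
        # Publication is accepted only from nodes that trust this gateway.
        if node.gateway != self.name:
            raise PermissionError(f"{node.name} does not publish through {self.name}")
        node.datasets.add(dataset_id)
        self.catalog[dataset_id] = node.name


@dataclass
class Federation:
    gateways: dict = field(default_factory=dict)   # gateway name -> Gateway
    sessions: set = field(default_factory=set)     # users who have signed on

    def add(self, gateway: Gateway) -> None:
        self.gateways[gateway.name] = gateway

    def sign_on(self, user: str) -> None:
        # Toy stand-in for single sign-on: one login is valid at every gateway.
        self.sessions.add(user)

    def find(self, user: str, dataset_id: str):
        """Return (gateway name, node name) for a dataset, searching all gateways."""
        if user not in self.sessions:
            raise PermissionError("sign on first")
        for gateway in self.gateways.values():
            if dataset_id in gateway.catalog:
                return gateway.name, gateway.catalog[dataset_id]
        return None


federation = Federation()
pcmdi, ncar = Gateway("PCMDI"), Gateway("NCAR")
federation.add(pcmdi)
federation.add(ncar)

node = Node("example-node", gateway="NCAR")        # hypothetical data node
ncar.publish(node, "example.dataset.v1")           # hypothetical dataset id

federation.sign_on("alice")
print(federation.find("alice", "example.dataset.v1"))
```

Keeping the catalog on the gateway and the files on the node mirrors the slide's split between the more complex gateways and the simpler, more numerous nodes.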

  13. Architecture of the next generation of ESG-CET

  14. The next generation ESG-CET system
     • Distributed and federated architecture
     • Support discipline-specific Gateways
     • Support browser-based and direct client access

  15. Use Case 1: scientific metadata search
     “Find surface temperature data across all models for a specific IPCC experiment that has volcanic forcing.”
     • Capture scientific metadata in a detailed object model
     • Faceted search to slice through data via user-selected categories
     • Link to Data Access Points (files or products)
     • Broker application for one-click request of data products?

  16. Use Case 2: large number of files
     “Download 1000 files from deep storage to my desktop while I am sleeping.”
     Gateway interaction via Web Services:
     • User finds and requests data through the Gateway
     • User passes the request identifier to DML (DataMover Lite)
     • DML downloads files as they become available
     • DML “releases” files already downloaded
     • User checks request status via DML (or the Gateway)
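The download loop in this use case (poll a request, fetch whatever has been staged from deep storage, then release it) can be simulated without any network access. The sketch below is not the DataMover Lite interface: `StagingRequest`, `download_all`, the request identifier, and the file names are all hypothetical, and staging is faked by making a fixed number of files available on each poll.

```python
from collections import deque


class StagingRequest:
    """Hypothetical stand-in for a Gateway request that stages files from deep storage."""

    def __init__(self, request_id, filenames, per_poll=2):
        self.request_id = request_id
        self._pending = deque(filenames)   # still in deep storage
        self.staged = set()                # staged and ready to download
        self._per_poll = per_poll

    def poll(self):
        """Simulate staging: a few more files become available on each poll."""
        for _ in range(min(self._per_poll, len(self._pending))):
            self.staged.add(self._pending.popleft())
        return sorted(self.staged)

    def release(self, filename):
        """Tell the server its staged copy of this file is no longer needed."""
        self.staged.discard(filename)

    @property
    def done(self):
        return not self._pending and not self.staged


def download_all(request):
    """Fetch every file of a request as it becomes available, releasing each one."""
    downloaded = []
    while not request.done:
        for name in request.poll():
            downloaded.append(name)    # a real client would transfer the bytes here
            request.release(name)      # frees the staged copy for other requests
        # A real client would also wait between polls instead of looping at once.
    return downloaded


names = [f"file_{i:03d}.nc" for i in range(5)]   # made-up file names
request = StagingRequest("req-1", names)
print(download_all(request))
print(request.done)
```

Releasing each file right after it is downloaded keeps the staging area small, which matters when a single request covers a thousand files.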

  17. Use Case 3: high-end product, federation
     “Show me sea surface temperature plots for 3 different datasets (output of different models, same forcing) that are stored at 3 different locations.”
     • More powerful data selection algorithm
     • Integration of LAS product server on Gateway, Data Nodes
     • Single sign-on authentication via OpenID
     • Common authorization model

  18. Use Case 4: multiple intercomparison example

  19. Immediate and future challenges of software development
     • Sustain and build upon the very successful ESG archives (e.g., CCSM, CMIP3, CFMIP, PCM, POP, etc.)
     • Address future scientific needs for data management and analysis by extending support for sharing and diagnosing climate simulation data:
       • SciDAC II: A Scalable and Extensible Earth System Model for Climate Change Science,
       • Coupled Model Intercomparison Project, Phase 5 (CMIP5) for scientists contributing to the IPCC Fifth Assessment Report (AR5) in 2010,
       • The Climate Science Computational End Station (CCES),
       • The North American Regional Climate Change Assessment Program (NARCCAP), and
       • Other wide-ranging climate model evaluation activities.
     • How to make information understandable to end-users so that they can interpret the data correctly
     • Local and remote analysis and visualization tools in a distributed environment (i.e., subsetting, concatenating, regridding, filtering, …)
       • Integrating analysis into a distributed environment
       • Providing climate diagnostics
       • Delivering climate component software to the community

  20. AR5 testbed partners
     Major driver for global federation: CMIP5 for the IPCC AR5 in 2010
     By early 2009 the testbed is expected to include:
     • Program for Climate Model Diagnosis and Intercomparison - PCMDI (U.S.)
     • National Center for Atmospheric Research - NCAR (U.S.)
     • Geophysical Fluid Dynamics Laboratory - GFDL (U.S.)
     • Oak Ridge National Laboratory - ORNL (U.S.)
     • British Atmospheric Data Centre - BADC (U.K.)
     • Max Planck Institute for Meteorology - MPI (Germany)
     • The University of Tokyo Center for Climate System Research - CCSR (Japan)

  21. ESG-CET AR5 timeline
     • 2008: Design and implement core functionality:
       • Browse and search
       • Registration
       • Single sign-on / security
       • Publication
       • Distributed metadata
       • Server-side processing
     • Early 2009: Testbed
       • Plan to include at least seven centers in the US, Europe, and Japan: PCMDI, NCAR, GFDL, ORNL, BADC, MPI, CCSR
     • 2009: Deal with system integration issues, develop production system
     • 2010: Modeling centers publish data
     • 2011-2012: Research and journal article submissions
     • 2013: IPCC AR5 Assessment Report

  22. U.S. collaborations
     • NOAA
       • GFDL is an active contributor to AR5 and ESG-CET, CF and GO-ESSP
       • Data Archive and Access Requirements Working Group (DAARWG)
     • NASA
       • Facilitating Climate Modeling Research By Integrating NASA and the Earth System Grid
     • SciDAC Scientific Data Management Center (SDM)
       • DataMover Lite: efficient bulk transfer of data in a secure grid environment
     • SciDAC Visualization and Analytics Center (VACET)
       • University of Utah, LLNL, LBNL, ORNL
       • Integration of VisTrails visual analysis tool with CDAT
     • Ultrascale Visualization
       • Web-enabled collaborative climate visualization
     • Earth System Curator (ESC)
       • Developing database schemas and interfaces for model configuration
     • Tech-X Corporation
       • Analyze and visualize petabytes of archived data on Mosaic grids
     • VisTrails, Inc.
       • Complete audit trail of computational processes
     • Many more…

  23. International collaborations
     • Global Organization for the Earth System Science Portal (GO-ESSP): focused on facilitating the organization and implementation of an infrastructure for full data sharing among a consortium spanning continents, countries, and intergovernmental agencies
     • CF: The Climate and Forecast Metadata Convention
       • Designed to promote the processing and sharing of files created with the NetCDF application programmer's interface
       • Enables users of data from different sources to decide which quantities are comparable, and facilitates building applications with powerful extraction, regridding, and display capabilities
     • CMOR: Climate Model Output Rewriter
       • Used to produce CF-compliant netCDF files that fulfill the requirements of many of the climate community's standard model experiments (such as CMIP, CFMIP, NARCCAP, etc.)

  24. AR5 open issues and questions
     • What is the set of runs to be done and, derived from that, what data volumes can we expect?
     • Expected participants: where will data be hosted? (Who is going to step up and host the data nodes, and provide the level of support expected in terms of manpower and hardware capability?)
       • Minimum software and hardware requirements for a data holding site (e.g., ftp access and ESG authentication and authorization)
       • Skilled staff for a help desk
     • AR5 archive to be globally distributed with support for WG1, WG2, and WG3. Will there be a need for a central (or core) archive and what will it look like?
     • Replication of holdings: disaster protection, a desire to have a replica of the core data archive on every continent, etc.
     • Number of users and level of access: scientists, policy makers, economists, health officials, etc.
