400 likes | 412 Views
Explore the advanced cyberinfrastructure for Earth system modeling, including supercomputers, high-bandwidth networks, data centers, and Grids, as recommended by the Atkins Report. Enhance climate data production and model capabilities for a transformative research experience. Collaborate across disciplines to leverage technology for scientific breakthroughs. Propose capacity and capability improvements for climate model data, resolving spatial and temporal resolutions with enhanced quality and scope. Realize the Earth Simulator vision with a factor O(1000-10000) increase in computing power. Join the Earth System Grid project to manage terascale climate research data efficiently, empower multidisciplinary communities, and revolutionize scientific workflows.
E N D
CyberinfrastructureforEarth System Modeling Don Middleton NCAR Scientific Computing Division APAN eScience Workshop, Honolulu January 28, 2004
Cyberinfrastructure forEarth System Modeling • Supercomputers • High-bandwidth networks • Models • Data centers and Grids • Collaboratories • Analysis and Visualization
“Atkins Report” • “A new age has dawned…” • “The Panel’s overarching recommendation is that the National Science Foundation should establish and lead a large-scale, interagency, and internationally coordinated Advanced Cyberinfrastructure Program (ACP) to create, deploy, and apply cyberinfrastructure in ways that radically empower all scientific and engineering research and allied education. We estimate that sustained new NSF funding of $1 billion per year is needed to achieve critical mass and to leverage the coordinated co-investment from other federal agencies, universities, industry, and international sources necessary to empower a revolution. The cost of not acting quickly or at a subcritical level could be high, both in opportunities lost and in increased fragmentation and balkanization of the research.” • Atkins Report, Executive Summary
Characteristics of Infrastructure(from Kim Mish workshop presentation) • Essential • So important that it becomes ubiquitous • Reliable • Example: the built environment of the Roman Empire • Expensive • Nothing succeeds like excess (e.g. Interstate system) • Inherently one-off (often, few economies of scale) • Clear factorization between research and practice • Generally deploy what provably works
Climate Model Data Production • T42 CCSM (current, 280km) • 7.5GB/yr, 100 years -> .75TB • T85 CCSM (140km) • 29GB/yr, 100 years -> 2.9TB • T170 CCSM (70km) • 110GB/yr, 100 years -> 11TB
Capacity-related Improvements Increased turnaround, model development, ensemble of runs Increase by a factor of 10, linear data • Current T42 CCSM • 7.5GB/yr, 100 years -> .75TB * 10 = 7.5TB
Capability-related Improvements Spatial Resolution: T42 -> T85 -> T170 Increase by factor of ~ 10-20, linear data Temporal Resolution: Study diurnal cycle, 3 hour data Increase by factor of ~ 4, linear data CCM3 at T170 (70km)
Capability-related Improvements Quality: Improved boundary layer, clouds, convection, ocean physics, land model, river runoff, sea ice Increase by another factor of 2-3, data flat Scope: Atmospheric chemistry (sulfates, ozone…), biogeochemistry (carbon cycle, ecosystem dynamics), middle Atmosphere Model… Increase by another factor of 10+, linear data
Model Improvement Wishlist Grand Total: Increase compute by a Factor O(1000-10000)
Advances at the Earth Simulator ESC Climate Model at T1279 (approx. 10km)
We Will Examine Practically Every Aspect of the Earth System from Space in This Decade Longer-term Missions - Observation of Key Earth System Interactions Aqua Terra Landsat 7 Aura ICEsat Jason-1 QuikScat Exploratory - Explore Specific Earth System Processes and Parameters and Demonstrate Technologies Triana GRACE SRTM VCL Cloudsat EO-1 PICASSO Courtesy of Tim Killeen, NCAR
The Earth System Grid http://www.earthsystemgrid.org • U.S. DOE SciDAC funded R&D effort - a “Collaboratory Pilot Project” • Build an “Earth System Grid” that enables management, discovery, distributed access, processing, & analysis of distributed terascale climate research data • Build upon Globus Toolkit and DataGrid technologies and deploy • Potential broad application to other areas
ANL Ian Foster (PI) Veronika Nefedova (John Bresenhan) (Bill Allcock) LBNL Arie Shoshani Alex Sim ORNL David Bernholdte Kasidit Chanchio Line Pouchard LLNL/PCMDI Bob Drach Dean Williams (PI) USC/ISI Anne Chervenak Carl Kesselman (Laura Perlman) NCAR David Brown Luca Cinquini Peter Fox Jose Garcia Don Middleton (PI) Gary Strand ESG Team
ESG Scenario • End 2002: 1.2 million files comprising ~75TB of data at NCAR, ORNL, LANL, NERSC, and PCMDI • End 2007: As much as 3 PB (3,000 TB) of data (!) • Current practice is already broken – the future will be even worse if something isn’t done…
ESG: Challenges • Enabling the simulation and data management team • Enabling the core research community in analyzing and visualizing results • Enabling broad multidisciplinary communities to access simulation results We need integrated scientific work environments that enable smooth WORKFLOW for knowledge development: computation, collaboration & collaboratories, data management, access, distribution, analysis, and visualization.
ESG: Strategies • Harness a federation of sites, web portals • Globus Toolkit -> The Earth System Grid -> The UltraDataGrid • Move data a minimal amount, keep it close to computational point of origin when possible • Data access protocols, distributed analysis • When we must move data, do it fast and with a minimum amount of human intervention • Storage Resource Management, fast networks • Keep track of what we have, particularly what’s on deep storage • Metadata and Replica Catalogs
HRM HRM Storage/Data Management Tera/Peta-scale Archive Server Client Selection Control Monitoring Tools for reliable staging, transport, and replication HRM Server Tera/Peta-scale Archive
OPeNDAP An Open Source Project for a Network Data Access Protocol (originally DODS, the Distributed Oceanographic Data System)
OPeNDAP-g • Transparency • Performance • Security • Authorization • (Processing) Distributed Data Access Services Typical Application Distributed Application Application Application Application netCDF lib OPeNDAP Client ESG client OPeNDAP Via http OPeNDAP Via Grid ESG + DODS data OpenDAP Server ESG Server Data (local) Data (remote) Big Data (remote)
ESG: NcML Core Schema • For XML encoding of metadata (and data) of any generic netCDF file • Objects: netCDF, dimension, variable, attribute • Beta version reference implementation as Java Library (http://www.scd.ucar.edu/vets/luca/netcdf/extract_metadata.htm) nc:netCDFType nc:dimension nc:VariableType nc:attribute netCDF nc:variable nc:values nc: attribute
isA Person [0,1] firstName [0,1] lastName [0,1] contact LEGEND Object [1] id Institution [0,1] name [0,1] type [0,1] contact AbstractClass worksFor Class participant role= isA inheritance association Project [0,n] topic type= [0,1] funding Activity [0,1] name [0,1] description [0,1] rights [0,n] date type= [0,n] note [0,n] participant role= [0,n] reference uri= Service [0,1] name [0,1] description isA isPartOf isA Campaign serviceId Investigation Ensemble isA isPartOf Experiment Analysis Observation Simulation [0,n] simulationInput type= [0,n] simulationHardware hasParent hasChild hasSibling Dataset [0,1] type [0,1] conventions [0,n] date type= [0,n] format type= uri= [0,1] timeCoverage [0,1] spaceCoverage generatedBy isPartOf
ESG Current Topology LBNL ANL gridFTP SERVER CAS NCAR HPSS visualize LAS SERVER ORNL HRM gridFTP gridFTP SERVER DISK LRC RLI gridFTP SERVER HPSS gridFTP HRM HRM execute MSS LLNL DISK RLI LRC gridFTP HRM LRC cross-update cross-update RLI gridFTP SERVER LRC RLI GRAM GATEKEEPER query ESG WEB PORTAL Tomcat/Struts submit ISI authenticate OGSA-DAI MySQL RDBMS query MyProxy
Data->Knowledge Petascale Knowledge Repository Mass Storage System (1.3PB) Establish new paradigms for managing and accessing scientific data based on semantic organization.
Collaborations & Relationships • CCSM Data Management Group • The Globus Project • Other SciDAC Projects: Climate, Security & Policy for Group Collaboration, Scientific Data Management ISIC, & High-performance DataGrid Toolkit • OPeNDAP/DODS (multi-agency) • NSF National Science Digital Libraries Program (UCAR & Unidata THREDDS Project) • U.K. e-Science and British Atmospheric Data Center • NOAA NOMADS and CEOS-grid • Earth Science Portal group (multi-agency, intnl.) • ESMF (emerging)
NCL: Core • Approx. 500 built-in functions and procedures • File I/O & data model for Earth sciences • Unique grids, Climate-modeling routines • Spherical harmonics, Regridding and interpolation • Graphics (wind barbs, simple 3D plots) • 36 NCL core visual representations • Contours, XY plots, vectors, streamlines, maps, histograms, text, markers, polygons • Supported on Unix, Linux, Mac, and PC 10 years, 20 People involved with development, 50 person-years of effort, about 1.5 million lines of source, 500K lines of documentation
NCL as CI for a Community • CAM & CCSM Processor – 100 functions, 200 examples, 20K lines of NCL code (CGD) • WGNE Climate Diagnostics Processor – 10K lines of NCL code (CGD) • Award-winning Aviation Weather Site (RAP) • MM5 Analysis Package (RIP) • Weather Research & Forecast Model: Initial community analysis software and RIP • Community Data Portal (SCD)
NCL http://ngwww.ucar.edu/ncl
Collaborative Environments and the AccessGrid Science Portals + AccessGrid: University of Michigan (Knoop, Hardin) Vegetation & Ecosystem Mapping Program (VEMAP) NCAR/SCD VETS/KEG Argonne National Labs