A PROPOSED EARTH SCIENCE COLLABORATORY
K-S Kuo (1,2), Chris Lynnes (1), Rahul Ramachandran (3)
(1) NASA Goddard Space Flight Center, USA
(2) Caelum Research Corporation, USA
(3) University of Alabama-Huntsville, USA
IGARSS 2011, Vancouver, Canada
Why ESC?
• Data Intensive Science
  • Many forms and sources of data
    • In situ measurements
    • Remote sensing observations
    • Model simulations
  • Large volumes of data
• Effectiveness as a scientist
  • Increasing proportion of effort in data management
  • Threatening:
    • Reproducibility
    • Correctness
    • Productivity
What is an ESC?
Vision of a rich model development/simulation and data analysis environment that:
• Provides access to various Earth Science models
• Facilitates model and analysis software development
• Provides access across a wide spectrum of Earth Science data
• Provides a diverse set of science analysis services and tools
• Supports the application of services and tools to data
• Supports collaboration, i.e., sharing of data, tools and results
• Supports discovery and publication of all science artifacts
Basically, a new and natural place for Earth scientists to conduct their work and collaborate with others!
The Situation Today: Islands of data and services with selective connectivity
[Figure: Data Center A, Data Center B, and Data Center C shown as separate islands]
High-Level View
[Figure: ESC architecture with Laboratory Notebook, Workflow, Tool Library, Mediator, Data Library, and Data Centers, supported by Cyberinfrastructure]
Tool Library
• Discovery
• Social
  • Sharing
  • Tagging
  • Discussion
• Configuration Management
  • Testing
  • Versioning
• Packager
  • autoconf
  • RPM
  • Web wrapper
• Provisioned
  • GrADS
  • IDL
  • MATLAB
  • NCL
  • NCO
  • CDAT
• Contributed
  • [Tool 1], [Tool 2], [Tool 3], [Tool 4], [Tool 5], …
• Community
  • Quality filter
  • Coincidence
  • Feature detection
  • Event service
  • Visualization
• Personal
  • [Tool 1], [Tool 2], [Tool 3], [Tool 4], [Tool 5], …
Data Library
• Cache
• Discovery
• Social
  • Sharing
  • Tagging
  • Discussion
• Configuration Management
  • Testing
  • Versioning
• Packager
  • data probe
  • format check
  • metadata wizard
• Provisioned
  • EOSDIS
• Contributed
  • [Dataset 1], [Dataset 2], [Dataset 3], [Dataset 4], [Dataset 5], …
• Community
  • Field campaigns
  • MEaSUREs
  • ACCESS
  • Validation
• Personal
  • [Dataset 1], [Dataset 2], [Dataset 3], …
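The Data Library's packager includes a "format check" step. One minimal approach is to accept a contributed file only if it is netCDF, either classic or netCDF-4/HDF5. Such a check can read the file's leading signature bytes. The sketch below rests on that assumption. The function names `identify_format` and `check_format` are hypothetical. A real checker would also validate metadata conventions, which this sketch does not attempt.

```python
from pathlib import Path
from typing import Optional

# netCDF classic-family files start with b"CDF" followed by a version byte.
NETCDF_VERSIONS = {
    1: "netCDF classic",
    2: "netCDF 64-bit offset",
    5: "netCDF CDF-5",
}
# netCDF-4 files are HDF5 files; this is the 8-byte HDF5 format signature.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def identify_format(header: bytes) -> Optional[str]:
    """Label a file from its leading bytes; return None if unrecognized."""
    if len(header) >= 4 and header[:3] == b"CDF":
        return NETCDF_VERSIONS.get(header[3])
    if header[:8] == HDF5_SIGNATURE:
        # The signature alone does not say whether netCDF-4 conventions are followed.
        return "HDF5 (possibly netCDF-4)"
    return None


def check_format(path: Path) -> Optional[str]:
    """Hypothetical 'format check' step that inspects only the first 8 bytes.

    HDF5 also allows its signature at offsets 512, 1024, ... after a user
    block; this sketch only looks at offset 0.
    """
    with path.open("rb") as f:
        return identify_format(f.read(8))
```

Looking only at leading bytes keeps the check cheap enough to run on every upload. Deeper validation can then be reserved for files that pass.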
Workflow Library
• Discovery
• Social
  • Sharing
  • Tagging
  • Discussion
• Configuration Management
  • Testing
  • Versioning
• Packager
  • Workflow editor
• Provisioned
  • Processing Algorithms
• Contributed
  • [Workflow 1], [Workflow 2], [Workflow 3], [Workflow 4], [Workflow 5], …
• Community
  • GeoBrain
  • SciFlo
  • Data Mining
  • Giovanni
• Personal
  • [Workflow 1], [Workflow 2], [Workflow 3], …
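A workflow in the Workflow Library chains tools so that some steps consume the outputs of others. The sketch below assumes a workflow is stored as a mapping from each step to the steps it depends on. Under that assumption, the steps can be run in topological order with Python's standard `graphlib` module (Python 3.9+). The function `run_workflow` and the step names are hypothetical. They are not drawn from GeoBrain, SciFlo, or Giovanni.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set

# A step receives the results of all steps run so far and returns its own result.
Step = Callable[[Dict[str, object]], object]


def run_workflow(dependencies: Dict[str, Set[str]],
                 steps: Dict[str, Step]) -> Dict[str, object]:
    """Run every step after its prerequisites; results are keyed in execution order.

    dependencies maps each step name to the set of steps it needs first.
    A cycle raises graphlib.CycleError before any step runs.
    """
    results: Dict[str, object] = {}
    for name in TopologicalSorter(dependencies).static_order():
        results[name] = steps[name](results)
    return results


# Hypothetical workflow: subset a dataset, regrid and quality-filter it, then plot.
dependencies = {
    "subset": set(),
    "regrid": {"subset"},
    "quality_filter": {"subset"},
    "plot": {"regrid", "quality_filter"},
}
steps: Dict[str, Step] = {
    "subset": lambda r: "subset.nc",
    "regrid": lambda r: f"regrid({r['subset']})",
    "quality_filter": lambda r: f"qc({r['subset']})",
    "plot": lambda r: f"plot({r['regrid']}, {r['quality_filter']})",
}
results = run_workflow(dependencies, steps)
print(results["plot"])  # plot(regrid(subset.nc), qc(subset.nc))
```

Storing only the dependency graph leaves any valid execution order to the runner. The same stored workflow could therefore later be scheduled in parallel without being edited.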
Laboratory Notebook
• Discovery
• Social
  • Sharing
  • Tagging
  • Discussion
• Configuration Management
  • Versioning
• Packager
  • Project Manager
  • Experiment manager
  • Notebook editor
• Provisioned
  • Tutorials
  • User guides
  • Example uses
  • Educational packages
• Project
  • [Project 1], [Project 2], [Project 3], [Project 4], [Project 5], …
• Community
  • Project results
  • Publications
  • Example cases
  • Educational packages
• Personal
  • Notes
  • Journals
  • …
Mediator
• Mediates tool interaction with data
  • OPeNDAP – a common data model (accessible by most tools)
  • Custom modules reformat data for the rest of the tools
• Ontology matches tools with data, and vice versa.
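The mediator's matching role can be illustrated with a toy matcher. The sketch assumes each dataset is annotated with ontology concepts for its variables and with its storage format. It also assumes each tool declares the concepts it needs and the formats it reads. A tool then matches a dataset directly when the formats agree. It matches indirectly when a reformatting module can bridge the format gap. All class, function, dataset, tool, concept, and converter names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Dataset:
    name: str
    concepts: FrozenSet[str]  # ontology terms for the variables it holds
    fmt: str                  # storage format, e.g. "HDF4"


@dataclass(frozen=True)
class Tool:
    name: str
    needs: FrozenSet[str]     # ontology terms the tool requires
    formats: FrozenSet[str]   # formats the tool reads natively


# Hypothetical reformatting modules, keyed by (source format, target format).
Converters = Dict[Tuple[str, str], str]


def route(tool: Tool, ds: Dataset, converters: Converters) -> Optional[List[str]]:
    """Return a dataset-to-tool access path, or None if no match is possible."""
    if not tool.needs <= ds.concepts:
        return None                      # the data lacks what the tool needs
    if ds.fmt in tool.formats:
        return [ds.name, tool.name]      # the tool reads the data directly
    for target in sorted(tool.formats):  # sorted for a deterministic choice
        module = converters.get((ds.fmt, target))
        if module is not None:
            return [ds.name, module, tool.name]  # mediated through a reformatter
    return None


def compatible(tools: Iterable[Tool], datasets: Iterable[Dataset],
               converters: Converters) -> Dict[Tuple[str, str], List[str]]:
    """Every (tool, dataset) pair the mediator can serve, with its access path."""
    datasets = list(datasets)
    pairs: Dict[Tuple[str, str], List[str]] = {}
    for tool in tools:
        for ds in datasets:
            path = route(tool, ds, converters)
            if path is not None:
                pairs[(tool.name, ds.name)] = path
    return pairs


precip = Dataset("precip_l3", frozenset({"precipitation_rate"}), "HDF4")
clouds = Dataset("cloud_l2", frozenset({"cloud_fraction"}), "netCDF")
viewer = Tool("map_viewer", frozenset({"precipitation_rate"}), frozenset({"netCDF"}))
converters = {("HDF4", "netCDF"): "hdf4_to_netcdf"}
print(compatible([viewer], [precip, clouds], converters))
# {('map_viewer', 'precip_l3'): ['precip_l3', 'hdf4_to_netcdf', 'map_viewer']}
```

Each reformatter connects one source format to one target format. Adding a module for a new format therefore makes every tool that reads the target format usable with that data.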
Cyberinfrastructure: Services used by all other components
• Security
  • authentication
  • authorization
  • code audit/padded cell
  • integrity checking
• Social
  • tagging
  • sharing
  • discussions
  • groups
• Cloud
  • elastic provisioned storage and computing
• Discovery
  • data, tools, workflows, experiments
  • search by keyword, variable, time, author
• Information Management
  • provenance
  • identifiers
  • archive
• Semantic Web
  • data ontology
  • tools ontology
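The discovery service searches data, tools, workflows, and experiments by keyword, variable, time, and author. The sketch below is a minimal in-memory version. It assumes every artifact carries those kinds of metadata. The `Artifact` record and `search` function are hypothetical, and a production service would sit on a real search index.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Iterable, List, Optional, Set


@dataclass
class Artifact:
    name: str
    kind: str                      # "data", "tool", "workflow", or "experiment"
    author: str
    keywords: Set[str] = field(default_factory=set)
    variables: Set[str] = field(default_factory=set)
    start: Optional[date] = None   # temporal coverage, if the artifact has one
    end: Optional[date] = None


def search(artifacts: Iterable[Artifact], keyword: Optional[str] = None,
           variable: Optional[str] = None, during: Optional[date] = None,
           author: Optional[str] = None, kind: Optional[str] = None) -> List[str]:
    """Return names of artifacts matching every criterion that was supplied."""
    hits = []
    for a in artifacts:
        if keyword is not None and keyword.lower() not in {k.lower() for k in a.keywords}:
            continue
        if variable is not None and variable not in a.variables:
            continue
        if during is not None and not (
            a.start is not None and a.end is not None and a.start <= during <= a.end
        ):
            continue
        if author is not None and author != a.author:
            continue
        if kind is not None and kind != a.kind:
            continue
        hits.append(a.name)
    return hits


catalog = [
    Artifact("precip_daily", "data", "user_a", {"precipitation", "rainfall"},
             {"precip_rate"}, date(1998, 1, 1), date(2010, 12, 31)),
    Artifact("regrid_tool", "tool", "user_b", {"regridding"}),
]
print(search(catalog, keyword="Precipitation", during=date(2005, 6, 1)))
# ['precip_daily']
```

Unspecified criteria are simply skipped. A single entry point can therefore serve both broad browsing and narrow, multi-criterion queries.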
Key Advantages of ESC
• Tool availability will be a force multiplier
  • More tools will be usable with more datasets
  • Tools will be more readily available to more users
• Knowledge sharing evolves from text on paper to a rich mixture of data, tools, workflows and articles
  • A “wikihow” for Earth Science data analysis
  • Incorporating live data, services and workflows
• ESC maintains a record of the analysis process
  • Share, repeat, and build upon analysis techniques
  • Transparency of the process is built in
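A provenance log is one way to keep a record of the analysis process. The sketch assumes each step is recorded as the tool used, its version, its parameters, and the identifiers of its inputs. Hashing a canonical JSON form of that record then gives a content-derived identifier. Two runs with identical inputs and settings yield the same identifier. A shared result can therefore be checked for whether it was produced the same way. The record layout, `step_id`, and `ProvenanceLog` are hypothetical and not ESC's stated design.

```python
import hashlib
import json
from typing import Dict, List, Set


def step_id(tool: str, version: str, params: Dict[str, object], inputs: List[str]) -> str:
    """Content-derived identifier for one analysis step (SHA-256 of canonical JSON)."""
    record = {"tool": tool, "version": version, "params": params, "inputs": inputs}
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ProvenanceLog:
    """Ordered record of analysis steps; a step's inputs may be earlier step ids."""

    def __init__(self) -> None:
        self.steps: List[Dict[str, object]] = []

    def record(self, tool: str, version: str,
               params: Dict[str, object], inputs: List[str]) -> str:
        sid = step_id(tool, version, params, inputs)
        self.steps.append({"id": sid, "tool": tool, "version": version,
                           "params": params, "inputs": list(inputs)})
        return sid

    def lineage(self, sid: str) -> List[str]:
        """Tools that led to step sid, earliest first (raw dataset ids are skipped)."""
        by_id = {s["id"]: s for s in self.steps}
        seen: Set[str] = set()
        order: List[str] = []

        def visit(i: str) -> None:
            if i in seen or i not in by_id:
                return
            seen.add(i)
            for parent in by_id[i]["inputs"]:
                visit(parent)
            order.append(by_id[i]["tool"])

        visit(sid)
        return order


log = ProvenanceLog()
s1 = log.record("subset", "1.0", {"bbox": [0, 10, 0, 10]}, ["dataset:precip_l3"])
s2 = log.record("plot", "2.3", {"cmap": "viridis"}, [s1])
print(log.lineage(s2))  # ['subset', 'plot']
```

The identifier of a later step depends on the identifiers of its inputs. Changing any earlier setting therefore changes every downstream identifier, which makes divergent analyses easy to detect.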
Prior Art
• Talkoot, myExperiment.org – workflow sharing, virtual notebooks
• Earth System Grid – provisioned tools, format standards/checkers
• NASA Earth Exchange (NEX)
• Land Information System – OPeNDAP as access infrastructure
• Earth System Modeling Framework – programmatic approach to integration
• Giovanni, LAS – community services/tools
• Canadian Space Science Data Portal (EOS, Feb. 22, 2011)
• Nebula – cloud provisioning
A Use Case: GPM Precipitation Retrieval Algorithm Development
• GPM Core Satellite: Dual-Frequency Precipitation Radar (JAXA) and GPM Microwave Imager (NASA)
• GPM Constellation: International partner satellites carrying mostly microwave radiometers
• Retrieval algorithms – 3 types
  • Radar-only
  • Radiometer-only
  • Radar-radiometer-combined
• Participants in algorithm development are distributed across Japan, NASA centers (GSFC, MSFC, JPL), NCAR, and universities (FSU, UWisc, etc.)
A Use Case: GPM Algorithm Development – Current Situation
• Interdependence among the 3 types of algorithms
• Communication/coordination – Narrow bandwidth
  • Periodic workshop meetings and teleconferences
• Data access – Duplicative
  • Each location/group has a copy or subset of the required data
• Sharing of data/tools – Individual, not concerted
  • Through FTP/email
• Knowledge sharing – Delayed
A Use Case: GPM Algorithm Development – with ESC
[Figure: ESC Clients for participants A, B, …, Z connect to VM images in the ESC cloud; tools, data, and personal "mySci" catalogs sit alongside a shared Community Catalog]
A Use Case: GPM Algorithm Development – Multi-level Membership
[Figure: participants A through M grouped under GPM, with sub-groups Combined Algorithm, Radar-Only, and Radiometer-Only]
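Multi-level membership maps naturally onto nested groups. In the sketch, belonging to a sub-group such as Radar-Only implies belonging to the parent GPM group. Membership in the parent grants access to whatever is shared with GPM. The sketch assumes the three algorithm sub-groups sit directly under GPM. The assignment of participants A through M to groups below is illustrative and not read from the figure. `groups_of` and `can_access` are hypothetical names.

```python
from typing import Dict, Optional, Set

# Assumed hierarchy: the three algorithm sub-groups named on the slide sit under GPM.
PARENT: Dict[str, Optional[str]] = {
    "GPM": None,
    "Combined Algorithm": "GPM",
    "Radar-Only": "GPM",
    "Radiometer-Only": "GPM",
}

# Hypothetical direct memberships (illustrative only); "D" belongs to two sub-groups.
DIRECT: Dict[str, Set[str]] = {
    "GPM": {"A"},
    "Combined Algorithm": {"B", "C", "D"},
    "Radar-Only": {"D", "E", "F", "G", "H"},
    "Radiometer-Only": {"I", "J", "K", "L", "M"},
}


def groups_of(member: str, parent: Dict[str, Optional[str]] = PARENT,
              direct: Dict[str, Set[str]] = DIRECT) -> Set[str]:
    """All groups a member belongs to: direct groups plus every ancestor group."""
    result: Set[str] = set()
    for group, members in direct.items():
        if member in members:
            g: Optional[str] = group
            # Walk up the hierarchy; stop early if this ancestor chain was already added.
            while g is not None and g not in result:
                result.add(g)
                g = parent[g]
    return result


def can_access(member: str, shared_with: str,
               parent: Dict[str, Optional[str]] = PARENT,
               direct: Dict[str, Set[str]] = DIRECT) -> bool:
    """A member sees artifacts shared with any group they belong to."""
    return shared_with in groups_of(member, parent, direct)


print(sorted(groups_of("D")))  # ['Combined Algorithm', 'GPM', 'Radar-Only']
```

With inherited membership, an item shared with GPM reaches every sub-group at once, while work in progress can stay within a single algorithm team.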
A Use Case: GPM Algorithm Development – in ESC
• Enhanced communication/coordination – wide bandwidth
• Efficient data access – less duplication
• Improved sharing – more pervasive
• Effective knowledge sharing – immediate
Thank you!
Why now?
• Because we can do it (finally)!
  • Advances in standards acceptance and implementation (OPeNDAP, autoconf)
  • A consistent, loosely coupled architecture encapsulates complexity and maximizes flexibility
  • Social networking has reached the mainstream
  • Key lessons can be learned from prior efforts
• The need is growing
  • Interest in working with multiple datasets is growing
  • Calls for transparency and reproducibility are growing
What’s New?
• Macro View (forest-level)
  • Systematic approach to making data available to services and vice versa
  • Integration of all major analysis components
  • Consistent view of all architectural components
  • Cyberinfrastructure services for all architectural components
• Micro View (tree-level): Nothing!
How to move forward?
• Option 1
  • RFC to community on feasibility, challenges, approach
  • Followed by RFPs for components and integration
• Option 2
  • Narrow end-to-end prototype
  • Followed by refactoring and broadening