NOAA Data Management Activities
Deirdre Jones, EDMC Chair
Jeff de La Beaujardière, DM Architect
Prepared for DAARWG, 2011-11-15
Outline
• Motivation
• Recent EDMC Accomplishments
• EDMC FY2012 Plans
• DM Framework in NEO Strategy
• Data catalog approaches
Motivation
• NOAA Strategic Plan calls for:
  • Improved data interoperability and usability through application and use of common data management standards
  • Enhanced access and use of environmental data through data storage and access solutions, integration of systems, and long-term stewardship
  • Increased volume and diversity of data and information effectively integrated into models
New EDMC Procedural Directives
• Data Management Planning
  • Directs managers of all projects and systems that produce data to write DM Plans
• Data Documentation
  • Directs NOAA programs to provide data documentation (metadata)
• Data Sharing by NOAA Grantees
  • Directs NOAA grantees to make their data publicly available
All three are agenda topics for tomorrow.
EDMC Plans for FY2012 (1/2)
• Implement approved procedural directives
  • EDMC developing detailed work plan
  • Further discussion tomorrow
• Begin to develop additional Procedural Directives
  • Data Access and Discovery
    • Goal: Enable users to find and retrieve NOAA data
    • Goal: Automate publication of NOAA data to data.gov and GEOSS
  • Data Citation
    • Goal: Enable datasets to be referenced by unique identifier to provide credit, enable usage metrics, and distinguish duplicates
EDMC Plans for FY2012 (2/2)
• Hold 3rd annual NOAA-wide EDM Conference
  • To engage stakeholders
• Host OGC Workshop
  • Coordination on data access standards
• Support DAARWG Meetings (twice annually)
  • To receive guidance from advisory board
• Support development of Archive Concept of Operations
  • Called for in CLASS External Review
  • Briefing after lunch today
Data Management Framework
from National Earth Observations (NEO) Strategy, ch. 4 (inter-agency draft)
Jeff de La Beaujardière, PhD, NOAA DM Architect
Data Management Framework
• Principles
  • Full and Open Access
  • Preservation
  • Information Quality
  • Ease of Use
• Governance
• Architecture
• Standards
• Assessment
• Data Lifecycle
from National Earth Observations (NEO) Strategy - Data Management Chapter (in preparation 2011)
Data Lifecycle
• Planning and Production Activities
• Data Management Activities
• Usage Activities
from National Earth Observations (NEO) Strategy - Data Management Chapter (in preparation 2011)
Data Lifecycle
• Planning and Production Activities: Requirements Definition, Planning, Development, Deployment, Operations
• Data Management Activities: Collection, Processing, Quality Control, Documentation, Cataloging, Dissemination, Preservation, Stewardship, Usage Tracking, Final Disposition
• Usage Activities: Discovery, Reception, Analysis, Product Generation, User Feedback, Citation, Tagging, Gap Assessment
from NEO Strategy - DM Chapter (in prep. 2011)
Data Lifecycle: Applicability of EDMC Directives
[Diagram: the Data Lifecycle with directive labels placed beside each activity group]
• Planning and Production Activities (Requirements Definition, Planning, Development, Deployment, Operations): DM Planning
• Data Management Activities (Collection, Processing, Quality Control, Documentation, Cataloging, Dissemination, Preservation, Stewardship, Usage Tracking, Final Disposition): Data Documentation, Cataloging, Data Sharing, Data Services, What-to-Archive
• Usage Activities (Discovery, Reception, Analysis, Product Generation, User Feedback, Citation, Tagging, Gap Assessment): Data Citation
Data Lifecycle: Some of the possible feedback loops in the Data Lifecycle
[Diagram: feedback arrows drawn between lifecycle activities]
• Planning and Production Activities: Requirements Definition, Planning, Development, Deployment, Operations
• Data Management Activities: Collection, Processing, Quality Control, Documentation, Cataloging, Dissemination, Preservation, Stewardship, Usage Tracking, Final Disposition
• Usage Activities: Discovery, Reception, Analysis, Product Generation, User Feedback, Citation, Tagging, Gap Assessment
(Proposed) NOAA Data Catalog Approach
Jeff de La Beaujardière, PhD, NOAA DM Architect
Catalog Goals
• Users can find NOAA data for desired phenomenon, location and time
  • Without knowing Office/Program structure
  • Single starting point to find the data that is accessible via web services and well documented
• Data providers can register their services once, in a community catalog
  • And have their data be visible in a master catalog
• NOAA leadership can see improvements in NOAA data discovery & access
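The first goal (finding data by phenomenon, location and time) can be sketched as a simple filter over catalog records. This is only an illustration under stated assumptions: `DatasetRecord`, `find_datasets`, the phenomenon names and the example records are all hypothetical, and a real catalog would work from full metadata records rather than this simplified structure. The bounding-box test also ignores boxes that cross the antimeridian.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical, simplified catalog entry (not a real metadata record)."""
    title: str
    phenomenon: str
    bbox: tuple  # (west, south, east, north) in degrees
    start: date
    end: date


def bbox_overlaps(a, b):
    """True if two (west, south, east, north) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def find_datasets(records, phenomenon, bbox, start, end):
    """Return records for the phenomenon that overlap the area and time window."""
    return [
        r for r in records
        if r.phenomenon == phenomenon
        and bbox_overlaps(r.bbox, bbox)
        and r.start <= end and start <= r.end
    ]


# Made-up example records, for illustration only
records = [
    DatasetRecord("Example buoy SST", "sea_surface_temperature",
                  (-80.0, 25.0, -70.0, 35.0), date(2005, 1, 1), date(2010, 12, 31)),
    DatasetRecord("Example tide gauge", "sea_level",
                  (-80.0, 25.0, -70.0, 35.0), date(2000, 1, 1), date(2011, 1, 1)),
]

hits = find_datasets(records, "sea_surface_temperature",
                     (-75.0, 30.0, -60.0, 40.0), date(2010, 6, 1), date(2010, 6, 30))
print([r.title for r in hits])
```

Note that the query names no Office or Program: the user supplies only what, where and when.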
Some Existing Community-Specific Catalogs
[Diagram: catalogs shown above the Services and Data layers]
• IOOS Catalog
• UAF Catalog
• GeoPlatform (ArcGIS.com Portal)
• NODC Geoportal
• NCDC Geoportal
• NGDC Geoportal
• CWIC
• CLASS Catalog
Conceptual NOAA Distributed Catalog Architecture
[Layered diagram, top to bottom]
• Users & Clients: NOAA Web Site, Analysis Tools, data.gov, GEOSS, others...
• Interfaces: UI, API
• NOAA Master Catalog: federated search (or scheduled harvest)
• Community Catalogs: UAF, IOOS, NCDC, NODC, NGDC, others...
• Services
• Data
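The two options named on this slide, federated search and scheduled harvest, differ in when the master catalog contacts the community catalogs. The sketch below illustrates that difference only; `CommunityCatalog`, `MasterCatalog`, the catalog names and the dataset titles are hypothetical and do not describe any actual NOAA system or API.

```python
class CommunityCatalog:
    """Hypothetical stand-in for one community catalog."""

    def __init__(self, name, titles):
        self.name = name
        self.titles = list(titles)

    def search(self, text):
        """Case-insensitive substring match on dataset titles."""
        return [t for t in self.titles if text.lower() in t.lower()]


class MasterCatalog:
    """Hypothetical master catalog supporting both approaches."""

    def __init__(self, catalogs):
        self.catalogs = list(catalogs)
        self.index = []  # filled by harvest(): (catalog name, title) pairs

    def federated_search(self, text):
        """Query every community catalog at request time."""
        return [(c.name, t) for c in self.catalogs for t in c.search(text)]

    def harvest(self):
        """Copy all entries into a local index; would run on a schedule."""
        self.index = [(c.name, t) for c in self.catalogs for t in c.titles]

    def search_index(self, text):
        """Answer from the harvested copy without contacting the catalogs."""
        return [(n, t) for n, t in self.index if text.lower() in t.lower()]


# Made-up catalogs and titles, for illustration only
catalogs = [
    CommunityCatalog("catalog_a", ["Example ocean temperature grid", "Example wave heights"]),
    CommunityCatalog("catalog_b", ["Example air temperature stations"]),
]
master = MasterCatalog(catalogs)

live = master.federated_search("temperature")
master.harvest()
cached = master.search_index("temperature")
print(live == cached)

# A dataset registered after the harvest is seen only by the federated search
catalogs[1].titles.append("Example soil temperature probes")
print(len(master.federated_search("temperature")), len(master.search_index("temperature")))
```

The trade-off shown is the usual one: federated search is always current but depends on every community catalog responding, while a harvested index answers locally but can be stale until the next harvest.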
[OV-2] Data Management Overview Graphic: Connections and Information Flow (Note: Not all activities illustrated)
[Diagram: nodes connected by labeled arrows]
• Nodes: Requirements, Gap Assessment, Tools, Data User, Data Producer, Result (paper, decision, policy, response), Documentation* (Metadata), Data, DM Plan*, Catalog Service, Archive Decision*, Data Access Service, Data Inventory, Metrics Dashboard, OAIS Reference Model, Archive, Archive ConOps, NOAA Leadership
• Arrow labels: guide, assess, analyze, use, understand, write, create, get, find, add ID, publish, cite, register, compile, measure, preserve
• Documentation and Data are marked "(possibly colocated)"
* topic of current EDMC Directive
DM Principles from NEO Strategy
• Full and Open Access: Earth observations should be made fully and openly available to all users promptly, in a non-discriminatory manner, and free of charge.
• Preservation: Earth observations should be managed as an asset and preserved for future use.
• Information Quality: Earth observations should be of known quality and fully documented.
• Ease of Use: Earth observations should be easily discoverable and accessible online using interoperable services and standardized formats that encourage the broadest possible use.
from National Earth Observations (NEO) Strategy - Data Management Chapter (in preparation 2011)
Procedural Directive: Data Management Planning (DMP)
• Summary
  • Directs managers of all projects and systems that produce data to write DM Plans
  • Provides guidance on content of DM Plans, including:
    • General description of the data
    • Data documentation and standards
    • Data access methods
    • Initial data storage and long-term preservation
  • Provides a DMP template and FAQs
• Feedback
  • Hundreds of comments through briefings, workshops, and meetings shaped principles, concepts and final text
  • 117 comments received during official 30-day comment period
  • EDMC approval was unanimous
Procedural Directive: Data Documentation
• Summary
  • Directs NOAA programs to provide data documentation (metadata)
  • Requires use of ISO 19115/19139
  • Provides guidance on metadata content, including:
    • Metadata for Discovery
    • Metadata for Use
    • Metadata and Documentation for Understanding
    • Documentation of Collections
    • Documentation of Datasets
    • Documentation of Services
  • Highlights metadata resources, tools and challenges
• EDMC approval was unanimous
Procedural Directive: Data Sharing by NOAA Grantees
• Summary
  • Directs NOAA grantees to make their environmental data publicly available
  • Requires data sharing plan to be provided with new proposals and published at award
  • Data must be shared in a "timely" fashion but no later than two years after collection
  • Exceptions or extensions granted for legal reasons or on a case-by-case basis upon request
  • Provides guidance on data sharing plans
    • Includes metadata
    • FAQs and template
• Feedback
  • EDMC approval
  • Feedback from Cooperative Institutes and Sea Grant Program
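As a worked illustration of the "no later than two years after collection" limit, the sketch below computes the latest sharing date for a given collection date. It is an assumption-laden example, not the directive's own rule: the function name is hypothetical, the way the two years are counted (same calendar day two years later) is assumed, and the exceptions and extensions mentioned above are not modeled.

```python
from datetime import date


def latest_sharing_date(collected: date, years: int = 2) -> date:
    """Latest date to share data, counting calendar years from collection.

    Assumption: the limit falls on the same calendar day `years` later.
    """
    try:
        return collected.replace(year=collected.year + years)
    except ValueError:
        # Collected on Feb 29 and the target year is not a leap year:
        # fall back to Feb 28 of the target year.
        return collected.replace(year=collected.year + years, day=28)


def is_overdue(collected: date, today: date) -> bool:
    """True if unshared data collected on `collected` is past the limit on `today`."""
    return today > latest_sharing_date(collected)


print(latest_sharing_date(date(2011, 11, 15)))
print(is_overdue(date(2009, 6, 1), date(2011, 11, 15)))
```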
Good Data Management supports NOAA Leadership Priorities
[Diagram] NOAA Data + Good Documentation + Standardized Services enable a Data Catalog, which enables selected NOAA Leadership Priorities for NOAA data:
• Data Inventory
• Ability to find, access, understand NOAA data
• Metrics Dashboard
• Visibility in data.gov and GEOSS
Tagging Concept
[Diagram: external catalogs, the NOAA Master Catalog's metadata records, and a Tag Database]
• External Catalogs or Portals: DWH Response, data.gov, GEOSS Data CORE, other portal
  • Datasets with a relevant tag are recorded by external catalogs.
• NOAA Master Catalog: metadata records
• Tag Database: DWH, data.gov, GEOSS CORE, GEOSS StP, Purpose E, Purpose F
  • Tags are not inserted into metadata records by data providers. Instead, the Catalog adds tags to indicate datasets relevant to a particular purpose.
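The tagging idea on this slide (tags kept by the catalog, separate from provider-supplied metadata records) can be sketched as a small lookup structure. `TagDatabase`, the record identifiers and the record contents below are hypothetical; only the tag names come from the slide.

```python
from collections import defaultdict


class TagDatabase:
    """Hypothetical store mapping each tag to the record IDs that carry it."""

    def __init__(self):
        self._ids_by_tag = defaultdict(set)

    def tag(self, record_id, tag):
        """Catalog-side action: mark a record as relevant to a purpose."""
        self._ids_by_tag[tag].add(record_id)

    def records_for(self, tag):
        """Record IDs an external catalog or portal would pick up for this tag."""
        return sorted(self._ids_by_tag.get(tag, set()))


# Provider-supplied metadata records stay untouched (made-up examples)
metadata_records = {
    "rec-001": {"title": "Example dataset A"},
    "rec-002": {"title": "Example dataset B"},
}

tags = TagDatabase()
tags.tag("rec-001", "data.gov")
tags.tag("rec-001", "DWH")
tags.tag("rec-002", "GEOSS CORE")

print(tags.records_for("DWH"))
print(metadata_records["rec-001"])
```

Keeping the tags outside the records means a dataset can be marked for a new purpose without asking the data provider to edit and republish its metadata.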
Potential Relationship of GeoPlatform to NOAA Master Catalog
A) No relation
B) GeoPlatform is Master Catalog
C) GeoPlatform feeds Master Catalog
D) Master Catalog feeds GeoPlatform
[Diagrams for each option. Labels: GeoPlatform (Map Svcs Only, or Map & Data Svcs); Master Catalog (Map & Data Svcs); Community Catalogs (Cat. 1, Cat. 2, ... Cat. N); Map Services (WMS 1, WMS 2)]
GeoPlatform and Master Catalog working together
[Layered diagram, top to bottom]
• Clients: Web-based Map Viewer, data.gov, GEOSS, GCMD, other catalog
• Interfaces: UI, CS/W, WAF, Other API
• GeoPlatform (ArcGIS.com Portal): List of WMS, List of manual registrations
• NOAA Master Catalog (Geoportal or t.b.d.)
• Community catalogs: Catalog 1, Cat. 2, Catalog 3
• Services and data: ArcGIS server, KML, Shapefile
• Manual registration: some datasets might be registered directly in GeoPlatform
UAF Distributed Catalog Architecture
[Layered diagram, top to bottom]
• Analysis Tools: ArcGIS, Matlab, IDV, ERDDAP
• API
• Community Catalog: Unified Access Framework (UAF) Catalog
• Project Catalogs: THREDDS Catalog (three shown)
• Project Data & Services: DAP services over gridded data (three shown)
Use Google instead of a Dedicated Catalog?
[Layered diagram, top to bottom; links from users to the search engine are marked "?"]
• Users & Clients: NOAA Web Site, data.gov, GEOSS
• Google & other search engine crawlers
  • Agreed convention to identify geodata servers (e.g., /geodata.xml)
• Community Catalogs
• Project Data & Services: service, data
Probably want both formal catalog & search engine support
[Layered diagram, top to bottom]
• Users & Clients: NOAA Web Site, external catalogs, general users
• Interfaces: UI, API, simple search
• NOAA Master Catalog (machine API, spatial & temporal queries, controlled vocabularies)
• Google (free-text search)
• Community Catalogs: Geoportal Server, GeoNetwork, WAF, THREDDS Catalog
• Project Data & Services: service, data