This document outlines the evolution and current activity of NOAA's Archive Architecture Team, which focuses on the management, preservation, and stewardship of environmental data. It discusses the challenges, summarizes the team's observations, and proposes a way forward for effective data management.
EDMC Archive Architecture Team (AAT) Prepared for the Data Archiving and Access Requirements Working Group (DAARWG) Ken McDonald December 9, 2010
Outline • Introduction • Task Evolution • Phase I - Data Center/CLASS Focus • Phase II – “Centers of Data” • Current Activity • Summary of Observations • Way Forward • Discussion Prepared by Archive Architecture Team for DAARWG
NOAA’s Environmental Information Management Challenges Broad Scope for Environmental Data Stewardship • ~150 Research & Operational Observing Systems • ~4-5 petabytes of data/year (~15 PB total) • Data Management Challenges are Changing • No longer just about data volume • Data discovery and integration • Data stewardship and information
Origin of Archive Architecture Team “NOAA is, at its foundation, an environmental information generating organization.” - From the NGSP, 2010 • At early DAARWG meetings NOAA reported on Data Centers, CLASS, GEO-IDE • Preservation and stewardship of data and information was a general area of interest • DAARWG recognized some lack of coordination across initiatives • Archive Architecture Team formed in response to concerns [Organization chart: NOAA • NOSC (NOAA Observing System Council) • CIO Council (Chief Information Officer Council) • EDMC (Environmental Data Management Committee) • AAT (Archive Architecture Team) • DMIT (Data Management Integration Team)]
Archive Architecture Team Members • Tina Chang/NMFS • Jim Sargent/NMFS • Maureen Kenny/NOS • Justin Cooke/NWS • Bob Lipschutz/OAR-ESRL • Derrick Snowden/OAR • Richard Bouchard/NWS-NDBC • Lewis McCulloch/NESDIS-OSD • Ken McDonald/NESDIS-OSD • Adam Steckel/NESDIS-OSD • Steve DelGreco/NESDIS-NCDC • Scott Cross/NESDIS-NODC/NCDDC • Dan Kowal/NESDIS-NGDC • Brad Nunn/NESDIS-NODC/NCDDC • Ken Casey/NESDIS-NODC • Rick Vizbulis/NESDIS-OSD-CLASS • Doug Zirkle/NESDIS-OSD-CLASS
Phase I Study – Data Centers and CLASS [Figure: Open Archival Information System Reference Model (OAIS-RM) Functions] • Data Centers and CLASS follow the international standard for information preservation (OAIS-RM) • This standard identifies the functions required to provide long-term preservation • Study looked at details of the RM and extensions for a multi-center deployment of CLASS • Clarified GEO-IDE focus as NOAA-wide data access and integration • Raised the question of NOAA data collections at other facilities (“Centers of Data”)
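The functions that the OAIS Reference Model identifies can be written down as a small checklist. The sketch below is illustrative only: the six functional entities are the ones named in the OAIS Reference Model, but the `ArchiveSite` type, the `uncovered_functions` helper, and the two sites are hypothetical and do not describe how any NOAA Data Center or CLASS actually divides its work. It shows one way to check that a multi-site deployment covers every function.

```python
from dataclasses import dataclass
from enum import Enum


class OaisFunction(Enum):
    """The six functional entities named in the OAIS Reference Model."""
    INGEST = "Ingest"
    ARCHIVAL_STORAGE = "Archival Storage"
    DATA_MANAGEMENT = "Data Management"
    ADMINISTRATION = "Administration"
    PRESERVATION_PLANNING = "Preservation Planning"
    ACCESS = "Access"


@dataclass(frozen=True)
class ArchiveSite:
    """Hypothetical record of which OAIS functions one site performs."""
    name: str
    functions: frozenset


def uncovered_functions(sites):
    """Return the OAIS functions that no site in the deployment performs."""
    covered = set()
    for site in sites:
        covered |= site.functions
    # Enum iteration follows definition order, so the result is stable.
    return [f for f in OaisFunction if f not in covered]


# Two made-up sites (assumption, not an actual NOAA configuration).
example_sites = [
    ArchiveSite(
        "hypothetical-site-a",
        frozenset({
            OaisFunction.INGEST,
            OaisFunction.ARCHIVAL_STORAGE,
            OaisFunction.DATA_MANAGEMENT,
        }),
    ),
    ArchiveSite(
        "hypothetical-site-b",
        frozenset({OaisFunction.ACCESS, OaisFunction.ADMINISTRATION}),
    ),
]

missing = uncovered_functions(example_sites)
print([f.value for f in missing])  # prints ['Preservation Planning']
```

A gap reported by such a check would be one of the questions a multi-center study has to resolve: which site, if any, takes on the missing function.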
Phase II Study – Centers of Data • Study broadened to consider archives for all NOAA data and information • Team members assembled information by line office on major data collections • Also characterized line office functions and facilities • Cases where data migrates to Data Centers for preservation and stewardship identified as “best practices.” • “What to archive” procedure was concurrently developed by a separate team [Figure: Major NOAA Data and Information Repositories]
Current Study Focus • Address need for a NOAA Archive Concept of Operations • Start with review of all relevant documentation • Use observations and lessons from earlier study phases • Concept should include “What to Archive” procedure but also “How” and “Where” • ConOps will describe overall procedure for making archive decisions and how it fits in the end-to-end environmental data management life-cycle A common language to promote integration across the diverse stakeholders in the NOAA environmental data lifecycle
Principles for Effective Environmental Data Management • Data should be archived and accessible • Adequate resources for end-to-end management • Management activities should involve users • Interagency and international partnerships • Metadata are essential • Expert stewards required for management • Process to decide what data to archive • Archive must support discovery, access, and integration • Effective management requires a formal, ongoing planning process National Research Council Committee on Archiving and Accessing Environmental and Geospatial Data at NOAA, 2007
Following the NOAA-wide Policy (NAO 212-15) Environmental data will be visible, accessible and independently understandable to users, except where limited by law, regulation, policy or by security requirements. • Specification of end-to-end data management life-cycle components • Using NAO Definitions • Data Management • Data Management Services • Data Stewardship • Envision “Archive Procedure” coming out of ConOps to specify decision making process
End-to-End Data Management Lifecycle Components As specified in revised NAO 212-15 • Determining what environmental data are required to be preserved for the long term and how preservation will be accomplished • Developing and maintaining metadata throughout the environmental data lifecycle that comply with standards • Obtaining user requirements and feedback • Developing and following data management plans that are coordinated with the appropriate NOAA archive for all observing and data management systems • Conducting scientific data stewardship to address data content, access, and user understanding • Providing for delivery to the archive and secure storage • Providing for data access and dissemination • Enabling integration and/or interoperability with other information and products
Phase I and II Study Results • Use of the OAIS Reference Model by the Data Centers and CLASS is a good starting point for an Archive ConOps • Identified and described full set of archive functions • Provides common terminology • Multiple examples of Project/Program collaboration with Data Centers • NWS/National Data Buoy Center sends DART buoy data to NGDC and all other collections to NODC • NOS sends hydrographic survey data to NGDC • OAR has used NCDC as its archive for U.S. Integrated Surface Irradiance (ISIS) Level 2 data (SURFRAD) since collection began in 1995 • Key NMFS data sets are transferred to NODC archive
Archive Function Reflected in NOAA Program Plans • More projects developing Data Management Plans to address full data lifecycle • Major satellite campaigns and large surface observation programs • Coral Reef Program • Greater understanding of the role of the National Data Centers • Rolling Deck to Repository (R2R)
A Framework for Developing a “Concept of Operations for NOAA’s Archives” Archive Architecture Team Report [Diagram: layers ordered from WHY to HOW] • NRC Principles and Guidelines: Justification and need for stewardship. The “objective vision.” • NAO 212-15: NOAA policy direction. • OAIS-RM: Best practices and common language for discussing archives. The “conceptual framework” for archives and information preservation. • “What to Archive” Procedure: Provides decisions regarding the information to preserve without specifying anything about how to archive. • “How to Archive” CONOPS…: Provides overview or vision for how NOAA’s Archives work together. Provides a way to shape implementation decisions, or at least frames the questions that need to be answered.
Observations and Conclusions • Terminology important…to a point • Drawing from NAO 212-15 and OAIS • OAIS very precise but different from common usage • ConOps will have to reconcile definitions • Three NOAA Data Centers recognized as enterprise archive centers • Fully aligned with charter and expertise • Good partnering relationships with NOAA programs • Clarified role of CLASS as the IT component • Single solution not most effective • Diversity of collections, resource restrictions, heritage capabilities, etc. may require different approaches for different circumstances (e.g., “levels of service”) • Archive ConOps and procedures should reflect this • Focus of ConOps will be “Information Preservation” • Science stewardship linked to preservation but requires its own set of procedures • ConOps will address relationship to science stewardship and other data life-cycle functions
End-to-end Data Life Cycle Decision Making Process [Figure: The Decision Making Process associated with NOAA’s End-to-end Data Life Cycle, from the Concept of Operations for the Preservation and Stewardship of NOAA’s Environmental Information. Elements shown: • Data Identification Stage • NOAA Procedure for Scientific Records Appraisal and Archival Approval (“What to Archive”) • Resource Verification Stage • Submission Agreement Stage • NOAA Archive Qualification Stage]
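The four named stages can be read as a gated sequence in which a data set moves forward only when the current stage approves it. The sketch below rests on assumptions: the stage order simply follows the order in which the labels appear on the slide, and the approve-or-hold rule, the `Stage` enum, and the `advance` function are hypothetical illustrations, not the team's documented procedure.

```python
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    """Stage names from the slide; the numeric order is an assumption."""
    DATA_IDENTIFICATION = 1
    RESOURCE_VERIFICATION = 2
    SUBMISSION_AGREEMENT = 3
    ARCHIVE_QUALIFICATION = 4


def advance(current: Stage, approved: bool) -> Optional[Stage]:
    """Return the stage a data set is in after a decision at `current`.

    Hypothetical rule: an unapproved data set stays where it is, an
    approved one moves to the next stage, and None means the last
    stage has been passed.
    """
    if not approved:
        return current
    if current is Stage.ARCHIVE_QUALIFICATION:
        return None
    return Stage(current + 1)


def run(decisions) -> Optional[Stage]:
    """Apply a sequence of approve/hold decisions from the first stage."""
    stage: Optional[Stage] = Stage.DATA_IDENTIFICATION
    for approved in decisions:
        if stage is None:
            break  # already through every stage
        stage = advance(stage, approved)
    return stage


# A data set approved twice, then held at the third stage.
result = run([True, True, False])
print(result.name)  # prints SUBMISSION_AGREEMENT
```

Writing the stages out this way makes the open questions explicit, for example what happens to a data set that is held at a stage and who makes each decision.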
Detailed View: What to Archive Process NOAA Procedure for Scientific Records Appraisal and Archival Approval (aka the “What to Archive” Process)
Way Forward • Plan is to continue development of major decision processes: • Identify key questions • Leverage best practices to develop options • Leverage early usage of “What to Archive” procedure • Include issue of access to Archived data • Propose procedure(s) to determine answers • Develop flow chart for each procedure • Use procedures to vet approach with management and stakeholders • Envision significant interactions with EDMC and CIO Council • Concurrently, develop ConOps to fully document procedures
Areas for Discussion • “Information Preservation” – Correct focus? • “Levels of Service” – Good idea? • “Separation of Preservation and Stewardship” – Right approach? • “Effective procedures” – How do we provide useful guidance?