Data Set Prioritization, Maturity Matrix and Levels of Service
Ron Weaver (with liberal borrowing from Bruce R. Barkstrom and John J. Bates, NOAA-NESDIS-NCDC)
PoDAG 25: Levels of Service
Outline
• Background: Why do we need data set prioritization?
• Definitions: Maturity Matrix; Levels of Service (LOS)
• Examples: How might NSIDC employ LOS and MM?
• Discussion: Role of PoDAG in the process
BACKGROUND
• Prioritization has been called for by multiple studies (cf.):
  • NRC Climate Data Records
  • NSB Long-Lived Data Sets
  • NASA Roadmapping efforts
  • NESDIS CLASS / Science Data Stewardship
• NASA HQ is asking the DAACs to prioritize their data holdings.
• NSIDC was the first DAAC to go through a prioritization process (January 2006). JPL is the second and is using templates developed by NSIDC and ESDIS.
An Approach to Prioritization
Prioritization for what purpose?
• Identify data that is scientifically important
• Decide whether to ingest, keep, or throw away
• Change the level/type of service
[Figure: data set activity and impact over time, in years.]
Terra launched in 1999, yet papers by non-MODIS team members appeared in the literature only in 2005.
An Approach to Prioritization
Data set priority might be determined from an assessment of the following:
• Data set activity level
• Stakeholder interest
• Maturity Matrix
The Level of Service is the outcome of the prioritization process and informs users and stakeholders of the actions taken (or to be taken).
MATURITY MATRIX
MM: Objectives and Approaches
• Objectives
  • Reduce difficulty and confusion in the community about scientific data stewardship
  • Produce an easily understood way of identifying the maturity of data products and hence the science data stewardship requirements
• Approaches
  • Barkstrom/Bates SDS Model
  • NASA Roadmap MMI Model
MM: A Simple Maturity Model
• Represent data maturity in terms of three separate dimensions:
  • Scientific Maturity
  • Preservation Maturity
  • Societal Impact
• Total maturity is simply the length of the vector
[Figure: the vector "Maturity of data for use" plotted on axes of Scientific Maturity, Preservation Maturity, and Societal Impact.]
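The "length of the vector" rule can be made concrete as a Euclidean norm. The sketch below is a minimal illustration and rests on an assumption the slide does not make: each dimension has already been reduced to a non-negative number on a shared scale. The function name `total_maturity` is hypothetical.

```python
import math


def total_maturity(scientific: float, preservation: float, societal: float) -> float:
    """Total maturity as the Euclidean length of the 3-D maturity vector.

    Assumes (hypothetically) that all three scores share a common scale.
    """
    for value in (scientific, preservation, societal):
        if value < 0:
            raise ValueError("maturity scores must be non-negative")
    # math.hypot accepts any number of coordinates in Python 3.8+.
    return math.hypot(scientific, preservation, societal)


# Example with hypothetical scores:
print(total_maturity(2, 2, 1))  # 3.0, since sqrt(4 + 4 + 1) = 3
```

Because it is a plain vector length, a data set that is strong in one dimension and weak in the others can score the same as one that is moderate in all three. The slide's "simple" model accepts that trade-off.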
MM: Questions
• How do we ensure a common understanding of errors and their impact?
  • Two parts: the error budget of the environmental element and that of the measurement method, leading to a signal-to-noise determination
• How do we produce understandable measures of costs, including data production and long-term stewardship?
• What metrics should we use for the long-term value of data?
• How do we ensure that prioritizations are "apples to apples"?
[Figure: Measurement Maturity Index (MMI), levels 1 to 7. Stages shown: Instrument Incubator, Technology, Pathfinder Measurements, Operational Precursor, Operational Mission, Planned Improvement, Validated Modeling, Decision Support Pilot, Initial Policy, Routine Use. Milestones: technology development starts; Pathfinder mission launch; pilot program for decision support begins; Operational Precursor mission launch; decision support use demonstrated; Operational Mission launch; decision support use routine; technology improvement begins.]
MM: NSIDC’s Proposed Approach
• Science Maturity
  1: physical understanding of the measurement process (algorithms and data sources documented)
  2: key measurement characteristics understood (instrument variability documented)
  3: data processing steps transparent and available
  4: rigorous validation (community acceptance of algorithms and validation)
• Preservation Maturity
  1: systematic approach to preservation implemented (metadata and data using known standards)
  2: threats to data loss mitigated (routine media refresh, off-site backup)
  3: long-term preservation assured (funding and systems in place for multi-year curation)
• Societal Impact (very tentative)
  1: short-term predictions that impact society (health, property, etc.)
  2: useful for determining trends that impact society
  3: useful for characterizing impacts or uncertainty in other measurements that do impact society
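The three scales above have different lengths: Science runs 1 to 4, while Preservation and Societal Impact run 1 to 3. The sketch below encodes them as a lookup table and rejects out-of-range levels. It is a hypothetical illustration, not an NSIDC tool.

The names `NSIDC_LEVELS`, `describe`, and `normalized` are invented for this example. The normalization step, which divides by each scale's top level, is an assumption that would be needed before the three levels could share a common scale. The slide does not specify any such step.

```python
# Hypothetical encoding of NSIDC's proposed maturity levels (descriptions abridged).
NSIDC_LEVELS: dict[str, dict[int, str]] = {
    "science": {
        1: "physical understanding of the measurement process",
        2: "key measurement characteristics understood",
        3: "data processing steps transparent and available",
        4: "rigorous validation",
    },
    "preservation": {
        1: "systematic approach to preservation implemented",
        2: "threats to data loss mitigated",
        3: "long-term preservation assured",
    },
    "societal": {  # marked "very tentative" on the slide
        1: "short-term predictions that impact society",
        2: "useful for determining trends that impact society",
        3: "useful for characterizing impacts or uncertainty in other measurements",
    },
}


def describe(dimension: str, level: int) -> str:
    """Return the description of a level, rejecting unknown dimensions or levels."""
    scale = NSIDC_LEVELS.get(dimension)
    if scale is None:
        raise KeyError(f"unknown dimension: {dimension!r}")
    if level not in scale:
        raise ValueError(f"{dimension} level must be 1 to {max(scale)}, got {level}")
    return scale[level]


def normalized(dimension: str, level: int) -> float:
    """Map a level onto (0, 1] by dividing by the scale's top level (an assumption)."""
    describe(dimension, level)  # validates dimension and level
    return level / max(NSIDC_LEVELS[dimension])


print(describe("science", 4))         # rigorous validation
print(normalized("preservation", 3))  # 1.0
```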
Levels of Service: As Defined by NSIDC
Levels of Service from a Data Center Perspective
Levels of Service from a Data Center Perspective
• Not in the previous list, but certainly considered:
  • Production
  • Tool preparation and access
  • Long-term archival issues
  • Resource demands
The downloadable version of this presentation contains additional descriptive material derived from the NSIDC Data Policy Document.
Examples: How a DAAC Might Approach Prioritization
An Approach to Prioritization
Prioritization for what purpose?
• Identify data that is scientifically important
• Keep or throw away
• Change the level/type of service
Data set priority might be determined from an assessment of the following:
• Data set activity level
• Stakeholder interest
• Maturity Matrix
The Level of Service defines the outcome of the prioritization process and informs users and stakeholders of the actions taken.
Data Set Activity Levels
• Ingest
  • None: no ingest or production; no new data being added
  • Low: less than 10% of total volume ingested or produced in a given year, or little or no staff intervention
  • Nominal: between 10% and 80% of volume ingested or produced in a given year, and/or routine staff intervention
  • High: more than 80% of volume ingested or produced in a given year, and/or significant staff intervention
• Distribution*
  • None: no requests; the data set is archived in a steady state
  • Low: between 1 and 5 requests per year, less than 5% of total data volume
  • Nominal: more than 5 requests
  • High: more than 50 requests and/or more than 100 GB per month
*Distribution impact on NSIDC is driven more by the number of requests requiring user services interaction at the low end, and more by data volume at the high end.
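The activity bands above are threshold rules, so they translate directly into small classifiers. The sketch below makes several assumptions that the slide leaves open, and these should be checked against NSIDC practice:

- The staff-intervention criteria are not modeled.
- The "less than 5% of total data volume" condition for Low distribution is ignored.
- The Nominal and High request counts are read as per year, since the slide states the period only for the Low band.

All names here (`Activity`, `ingest_level`, `distribution_level`) are hypothetical.

```python
from enum import Enum


class Activity(Enum):
    NONE = "None"
    LOW = "Low"
    NOMINAL = "Nominal"
    HIGH = "High"


def ingest_level(fraction_per_year: float) -> Activity:
    """Classify ingest/production from the fraction (0-1) of total volume added per year.

    Staff-intervention criteria are not modeled (simplifying assumption).
    """
    if not 0.0 <= fraction_per_year <= 1.0:
        raise ValueError("fraction_per_year must be between 0 and 1")
    if fraction_per_year == 0.0:
        return Activity.NONE
    if fraction_per_year < 0.10:
        return Activity.LOW
    if fraction_per_year <= 0.80:  # "between 10 and 80%"
        return Activity.NOMINAL
    return Activity.HIGH


def distribution_level(requests_per_year: int, gb_per_month: float) -> Activity:
    """Classify distribution activity from request count and monthly volume.

    Assumes request thresholds are per year and ignores the Low band's
    "<5% of total volume" condition.
    """
    if requests_per_year < 0 or gb_per_month < 0:
        raise ValueError("counts and volumes must be non-negative")
    if requests_per_year == 0:
        return Activity.NONE
    # High is checked first: either criterion ("and/or") is enough.
    if requests_per_year > 50 or gb_per_month > 100:
        return Activity.HIGH
    if requests_per_year > 5:
        return Activity.NOMINAL
    return Activity.LOW


print(ingest_level(0.25).value)           # Nominal
print(distribution_level(3, 0.5).value)   # Low
print(distribution_level(12, 150).value)  # High (volume exceeds 100 GB/month)
```

High is tested before Nominal in `distribution_level`, which mirrors the footnote: a modest number of requests can still count as high impact when the volume is large.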
Maturity Matrix Levels
• Science Maturity
  1: physical understanding of the measurement process (algorithms and data sources documented)
  2: key measurement characteristics understood (instrument variability documented)
  3: data processing steps transparent and available
  4: rigorous validation (community acceptance of algorithms and validation)
• Preservation Maturity
  1: systematic approach to preservation implemented (metadata and data using known standards)
  2: threats to data loss mitigated (routine media refresh, off-site backup)
  3: long-term preservation assured (funding and systems in place for multi-year curation)
• Societal Impact (very tentative)
  1: short-term predictions that impact society (health, property, etc.)
  2: useful for determining trends that impact society
  3: useful for characterizing impacts or uncertainty in other measurements that do impact society
A Proposed Template
Unanswered Questions in General
• It is important that the prioritization strategy fit in situ data sets as well as remote sensing (global coverage) data sets. Does this framework fit both?
• How do we characterize unanticipated future uses?
• Should a different set of questions be asked at initial consideration time than when long-term retention is being considered?
• What is a data set?
  • In an ESDR framework, are the individual SSMIs (F-8, F-11, F-13 …) treated as data sets, or is the SSMI time series the data set?
  • This matters from a naming-convention point of view.
Unanswered Questions for PoDAG
• How should PoDAG proceed on prioritization?
BACKUP SLIDES
Template for Data Producers
• Title for specific data set (ESDT) or group of data sets
• Brief Narrative Description
• Product Algorithm Theoretical Basis
• Science Need (justification)
• Quality and Accuracy Information (cal/val, relative and absolute uncertainty, stability, maturity of algorithm)
• Intended or Appropriate Product Use (including limitations on use where appropriate)
• Science Value (use of product for science, papers written, breakthroughs, multidisciplinary use)
Template for DAAC and ESDIS-SOO
• Title for specific data set (ESDT) or group of data sets
• Heritage
  • Rationale for DAAC involvement in the data set(s)
  • Where the data came from
  • Authorization or agreement for the DAAC to manage these data (EOS Program, DAAC User Working Group, MOUs, requests, other)
• Descriptive Metrics (as described in the SOO metrics presentation)
  • Size (e.g., data volume, number of granules)
  • Activity levels
• Level and Type of Service(s)
  • Characterization of services from the DAAC
  • Current involvement/responsibility:
    • DAAC developed and/or managed
    • DAAC provided infrastructure
    • Shared responsibility with other NSIDC or external programs
    • Brokered with other institutions (hosted at other institutions, with a web presence on the DAAC website)
Review Template Prepared by NSIDC
• Heritage
• Justification
  • Science
  • EOSDIS
  • UWG
• DAAC Responsibility (DAAC, NSIDC Shared, Brokered)
The following slides are from John Bates, NOAA NCDC.
Metrics – A Maturity Model for Climate Data Records*
• Reduce difficulty and confusion in the community about scientific data stewardship
• Produce an easily understood way of identifying the maturity of data products and science data stewardship approaches
• Help identify areas needing improvement
* With Bruce Barkstrom, NASA
Component Maturity for Climate Data Records
• Identify key attributes of maturity in each dimension
• Develop a maturity ranking for each attribute on a scale of 1 to 5
• Summarize component maturity by weighting each attribute
  • Simplest weight = 1/(number of attributes)
  • Develop more complex weightings after experience with the approach
• Advantage: much of the work can be done with a simple spreadsheet
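The weighting step is a weighted average of 1-to-5 rankings, which is exactly what a spreadsheet column would compute. The sketch below uses the slide's simplest weighting (1/N) by default and also accepts custom weights, which it normalizes by their sum. The normalization is an assumption; the slide does not say how non-uniform weights should be scaled.

The function name `component_maturity` and the example rankings are hypothetical. The attribute names are taken from the Scientific Maturity key attributes listed in this deck.

```python
from typing import Mapping, Optional


def component_maturity(
    rankings: Mapping[str, int],
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Weighted summary of attribute rankings (each 1-5) for one maturity dimension."""
    if not rankings:
        raise ValueError("need at least one attribute ranking")
    for name, rank in rankings.items():
        if not 1 <= rank <= 5:
            raise ValueError(f"{name}: ranking {rank} is outside 1-5")
    if weights is None:
        # Simplest weighting from the slide: 1 / number of attributes.
        weights = {name: 1 / len(rankings) for name in rankings}
    if set(weights) != set(rankings):
        raise ValueError("weights and rankings must cover the same attributes")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Dividing by the total lets custom weights be given in any units (assumption).
    return sum(weights[name] * rank for name, rank in rankings.items()) / total_weight


# Hypothetical rankings for the Scientific Maturity attributes:
scientific = {
    "Physical Understanding of Measurement Process": 4,
    "Measurement of Key Instrument Characteristics": 3,
    "Public Accessibility of Data Processing": 2,
    "Rigorous Validation": 3,
}
print(component_maturity(scientific))  # 3.0
```

A per-dimension score like this is one plausible input to the three-dimensional maturity vector described earlier in the deck, though the slides do not spell out that link.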
Scientific Maturity Key Attributes
• Physical Understanding of Measurement Process
• Measurement of Key Instrument Characteristics
• Public Accessibility of Data Processing
• Rigorous Validation
Key Attribute Assessment Areas: Public Accessibility of Data Processing
Preservation Maturity Key Attributes
• Systematic Approach to Guaranteeing Preservation of Data Understanding
• Systematic Reduction of Threats to Preservation
• Assurance of Preservation Cost Effectiveness
Societal Benefit Key Attributes
• Bibliometric Metrics
  • Publications and citations
• Scientific Community Knowledge
  • Data use, including interdisciplinary data fusion and statistical studies
• Economic and Policy Utility
  • Market valuation increase
  • Reduction in time to influence policy
  • Benefit/hazard reduction resulting from data use
Some Caveats
• Using a Maturity Model will be exploratory and iterative
  • No expectation that we'll get it "right" the first time through
• Community diversity must be incorporated
  • Different views of data processing, calibration, validation, and the need for knowledge preservation
  • Different vocabularies
• Deep uncertainty needs to be incorporated
  • Diversity of opinions on areas of scientific controversy and value needs a common framework and disciplined discussion; openness is key
• Including "societal benefit" is very difficult and risky
Key Benefits
• Allows us to develop an approach consistent with NRC recommendations on metrics
• Open process
  • Can surface divergent needs and opinions
  • Can provide a disciplined forum for discussing and resolving differences
• Periodic evaluation is required
  • Incorporates new information and deeper thought
  • Evaluation allows new directions