Development of Program Level Product Quality Metrics
Robert Frouin (1), Rama Hampapuram (2), Greg Hunolt (3), Kamel Didan (4), and others (5)
(1) Scripps Institution of Oceanography, (2) GSFC / ESDIS, (3) SGT, (4) UofA, (5) MEaSUREs PIs
_________________________________
ESDSWG Meeting – MPARWG Breakout, 20-22 October 2010, New Orleans
Goal • The purpose is to stir a discussion about the concepts of product quality metrics useful to the program (managers, missions, etc.) • That may (and should) lead to agreement on an approach for providing program-level metric(s) on the usability of MEaSUREs products by the user community • This discussion started in Aug. 2010 (involving all MEaSUREs PIs) • Some level of detail (or a way forward) needs to be worked out, preferably at this meeting
Context • With global-scale and multi-temporal data records increasingly available and easier to acquire and use for science, it becomes imperative that a programmatic-level product quality metric be in place to ensure they are properly supporting science and policy making. There are four overarching themes: • 1. Traceability (reproducibility, repeatability, etc.) • 2. Fidelity (high quality, known error and uncertainty, etc.) • 3. Transparency (community algorithms, good practices, documentation, interoperability, etc.) • 4. Impact (science, economics, society, etc.)
MEaSUREs and Product Quality • “Product Quality” has two parts • Scientific quality of the data • Usability of the package consisting of data and documentation • Projects may track these in detail for their own purposes • Details may vary from project to project • Programmatic interest is in tracking progress and aggregated reporting • Common, agreed-upon definitions across projects • A simple (small) set of metrics indicating overall progress in individual projects as well as in the Program as a whole
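To make the idea of common definitions and aggregated reporting concrete, here is a minimal sketch of the kind of per-project record and program-level rollup this could imply. The field names and levels are illustrative assumptions, not MEaSUREs requirements.

```python
# A minimal sketch: each project reports the same small record, and the
# program aggregates the levels. Field names and levels are assumptions,
# not an agreed-upon MEaSUREs reporting format.
from collections import Counter
from dataclasses import dataclass

@dataclass
class ProjectQualityReport:
    project: str
    science_quality: str   # "High" | "Medium" | "Low"
    documentation: str
    accessibility: str

def program_rollup(reports):
    """Aggregate per-project levels into simple program-level counts."""
    rollup = {}
    for field in ("science_quality", "documentation", "accessibility"):
        rollup[field] = Counter(getattr(r, field) for r in reports)
    return rollup

# Hypothetical example projects, for illustration only.
reports = [
    ProjectQualityReport("ocean_color", "High", "Medium", "High"),
    ProjectQualityReport("vegetation_index", "High", "High", "Medium"),
]
print(program_rollup(reports))
```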
Starting Points • Progress so far • Robert Frouin’s list of criteria • Uniqueness • Interpretability • Accuracy • Consistency • Completeness • Relevance • Accessibility • Level of usability • Greg Hunolt’s strawman tables
Importance of Assessing Product Quality -To measure how well products conform to “requirements” (who defines the requirements, and how?). -To track maturity and progress (e.g., accuracy and coverage). -To ascertain whether products are used “properly” (consider user creativity!). -To take necessary corrective actions or make improvements.
Objective -To determine what program level product quality metrics would make sense – i.e. be meaningful, clear and concise, and be practical to collect and report. -Dimensions and criteria should be defined for programmatic assessments and planning, i.e., they may differ from the detailed standards for product quality developed at the project level.
NASA Guidelines for Ensuring Quality of Information -From NASA’s viewpoint, the basic standard of information quality has three components: utility, objectivity, and integrity. -In ensuring the quality of the disseminated NASA “information”, all of these components must be “sufficiently” addressed.
-Utility: Refers to the extent that the information can be used for its intended purpose, by its intended audience. -Objectivity: Refers to the extent that the information is accurate, clear, complete, and unbiased. -Integrity: Refers to the protection of NASA’s information from unauthorized access, revision, modification, corruption, falsification, and inadvertent or unintentional destruction. -The disseminated information and the methods used to produce this information should be as transparent as possible so that they can, in principle, be reproducible by qualified individuals.
Dimensions and Criteria to Consider for Product Quality Metrics • -Uniqueness: How unique is the data set? Can it be obtained from other sources at the same temporal and spatial resolution, over the same time period, with the same accuracy? • How “meaningful” is this criterion, and how should it be measured? • -Interpretability: Is the data clearly defined, with appropriate symbols and units? Is the data easily comprehended? Are the algorithms explained adequately? Are possible usages and limitations of the data documented properly?
-Accuracy: How does the data agree with independent, correct sources of information (reference data), especially in situ measurements? How biased is the data? How does accuracy depend on spatial and temporal scales, geographic region, and season? -Consistency: Is the data always produced in the same way (e.g., from one time period to the next)? Is the data coherent spatially and temporally, and does it remain within the expected domain of values? Is the data in accordance with other (relevant) data or information?
-Completeness: Is some data missing (e.g., due to algorithm limitations or nonexistent input)? Is the data sufficiently comprehensive (e.g., long-term, extended spatially) and accurate for usability? -Relevance: How significant or appropriate is the data for the applications envisioned? What advantages are provided by the data? -Accessibility: How available, easily and quickly retrievable is the data? Is the data sufficiently up-to-date? Can the data be easily manipulated? Does the data have security restrictions?
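As one way to ground these criteria in reportable numbers, here is a minimal sketch, assuming NumPy arrays and an illustrative fill value, of how accuracy (bias and RMSE against in situ reference data) and completeness (fraction of non-missing values) could be computed. These definitions are assumptions for illustration, not project-level standards.

```python
# A minimal sketch of two criteria as numbers. The fill value, array
# shapes, and thresholds are illustrative assumptions.
import numpy as np

def accuracy_stats(product, reference):
    """Bias and RMSE of product values matched to reference measurements."""
    diff = product - reference
    return {"bias": float(np.mean(diff)),
            "rmse": float(np.sqrt(np.mean(diff ** 2)))}

def completeness(values, fill_value):
    """Fraction of values that are not the missing-data fill value."""
    valid = values != fill_value
    return float(np.count_nonzero(valid)) / values.size

# Hypothetical matched product / in situ pairs.
product = np.array([0.41, 0.44, 0.39, 0.52])
in_situ = np.array([0.40, 0.45, 0.41, 0.50])
print(accuracy_stats(product, in_situ))

# Hypothetical grid with a -9999 fill value.
grid = np.array([0.2, -9999.0, 0.3, 0.25, -9999.0])
print(completeness(grid, fill_value=-9999.0))
```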
Straw Man Approach to Product Quality Metrics -Usability is an overarching criterion because, for a product to be fully usable, the product must not only be of high science quality, but that quality, along with all other information required for use of the product, must be documented. -This suggests the possibility of defining a set of usability levels that would address not only intrinsic science quality but also the other factors that contribute to, or are required for, a product to be usable (i.e., documentation, accessibility, and support services).
Straw Man Usability Levels -The usability levels would derive from the science quality, documentation, and accessibility levels, in which criteria defined previously could come into play.
Straw Man Intrinsic Science Quality Levels The “Factors” could be selected criteria that apply to Intrinsic Science Quality. Each criterion or ‘factor’ used could have its set of questions, and the answers to those questions could be the basis for “High”, “Medium” or “Low” for that factor.
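A minimal sketch of how this straw man logic might be wired up, assuming two rules that are illustrative only: a factor is rated from the share of its questions answered “yes”, and the usability level is the weakest of its component levels. Neither rule is the agreed-upon scheme.

```python
# A minimal sketch of the straw man: per-factor question answers map to
# High/Medium/Low, and usability takes the weakest component level.
# The 0.8 / 0.5 cutoffs and the "weakest link" rule are assumptions.
LEVELS = {"Low": 0, "Medium": 1, "High": 2}

def factor_level(yes_answers, total_questions):
    """Rate one factor from the share of its questions answered 'yes'."""
    share = yes_answers / total_questions
    if share >= 0.8:
        return "High"
    if share >= 0.5:
        return "Medium"
    return "Low"

def usability_level(science_quality, documentation, accessibility):
    """Usability cannot exceed its weakest component (assumed rule)."""
    return min((science_quality, documentation, accessibility),
               key=lambda lvl: LEVELS[lvl])

sq = factor_level(yes_answers=5, total_questions=6)   # "High"
doc = factor_level(yes_answers=3, total_questions=6)  # "Medium"
acc = factor_level(yes_answers=6, total_questions=6)  # "High"
print(usability_level(sq, doc, acc))                  # -> "Medium"
```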
-In this approach, the metrics associated with usability, intrinsic science quality, documentation, and accessibility / support services should be defined for those items that need to be tracked at the program level, i.e., that are both important and potentially problematic, or a key measure of a project’s progress. -Some level of detail is necessary. Some criteria must be objective, since the perceptions of the individuals involved with product development may be subjective. -The metrics should provide information on the state of the product both without conceptual knowledge of the application (project-independent) and with specific applications in mind (project-dependent).
Interaction with Users (who measures the metric?) -The quality of a product as perceived by users, i.e., the real-world quality of the product, may be very different from the analysis by those involved in generating it. -User surveys are complementary to internal metrics (i.e., those collected from stakeholders). They are necessary to assess, using comparative analysis, proper usage and adequate documentation and accessibility, which may lead to corrective actions for improving product quality.
Same sensor(s) and a “simple” reprocessing (C4 to C5) leads to major change [Figure: summer NDVI comparisons and winter EVI comparisons; differences sometimes exceed 10%]
Consider • A published paper using the MODIS C4 data record • A new analysis using C5 confirmed the basic findings of the published paper, but there were noticeable spatial differences • Some had issues with the differences [Figure: C4-based vs. C5-based Amazon response to the 2005 drought; Saleska, Didan, Huete & Da Rocha (Science, 2007)]
Implications on the carbon cycle [Figure: MODIS C5 EVI-based annual GPP; MODIS C4 EVI-based annual GPP; C5 – C4 annual GPP difference; C5 – C4 annual GPP percent difference]
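The arithmetic behind the C5 – C4 comparison panels is simple; here is a minimal sketch, assuming two co-registered annual GPP grids and an illustrative fill value. Real MODIS grids would first need matching projections and QA screening.

```python
# A minimal sketch of absolute and percent differences between two
# co-registered grids. Shapes, values, and the -9999 fill are assumptions.
import numpy as np

def grid_differences(gpp_c5, gpp_c4, fill_value=-9999.0):
    """C5 - C4 difference and percent difference, NaN where invalid."""
    valid = (gpp_c4 != fill_value) & (gpp_c5 != fill_value) & (gpp_c4 != 0)
    diff = np.where(valid, gpp_c5 - gpp_c4, np.nan)
    pct = np.where(valid, 100.0 * (gpp_c5 - gpp_c4) / gpp_c4, np.nan)
    return diff, pct

# Hypothetical 2x2 annual GPP grids.
gpp_c4 = np.array([[1.2, 1.0], [0.8, -9999.0]])
gpp_c5 = np.array([[1.3, 0.9], [0.8, 0.7]])
diff, pct = grid_differences(gpp_c5, gpp_c4)
print(diff)   # absolute C5 - C4 difference
print(pct)    # percent difference relative to C4
```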
Also consider • Data from MODIS that describe the behavior of a patch of vegetation • Use all the data (most users do this) • Documentation is not clear as to what not to do; for example, atmospherically corrected data give a false sense of being “corrected” • Filter and use the remaining data (few users do this, but then it becomes a challenge to use RS data in general; see the sketch below) • Find a workaround • Case-by-case basis • The challenge is how to make sense of these issues • Error and uncertainty reported as a single number by MODIS (global multi-temporal data) is for the most part useless!
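As a minimal sketch of the “filter and use the remaining data” option: drop observations whose QA rank falls below a threshold before building a time series. The QA encoding and threshold here are assumptions; actual MODIS QA bit fields vary by product and must be decoded per the product user guide.

```python
# A minimal sketch of QA-based filtering. The 0-3 QA ranking and the
# max_rank threshold are illustrative assumptions, not the MODIS encoding.
import numpy as np

def filter_by_qa(values, qa_rank, max_rank=1):
    """Keep observations whose QA rank is good enough (0 = best)."""
    keep = qa_rank <= max_rank
    return np.where(keep, values, np.nan)

# Hypothetical NDVI time series with per-observation QA ranks.
ndvi = np.array([0.62, 0.15, 0.58, 0.60, 0.05])
qa   = np.array([0,    3,    1,    0,    2   ])  # 0=best ... 3=worst (assumed)
print(filter_by_qa(ndvi, qa))   # low-quality points become NaN
```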
Synoptic time-series (TS) data is quite problematic • Know the limitations of the data
Global data performance [Figure: quarterly (JFM, AMJ, JAS, OND) and annual-average performance maps on a 0–100% scale]