Provenance, Production, and Planning
Bruce R. Barkstrom
NOAA's National Climatic Data Center
Asheville, NC
Basic Facts
• Much of Earth Science (and some space science) data results from discrete production:
  • Files
  • Jobs
• Files and jobs are:
  • Denumerable
  • Indexable as time series
• The connection between files and jobs is a graph
• The societal importance of climate data will require legal-strength proof of chain of custody and production
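Because files and jobs are discrete, countable objects connected by a graph, a file's chain of production can be recovered by walking that graph upstream. The following is a minimal sketch under stated assumptions, not a description of any NCDC system: it assumes (hypothetically) that every file is produced by exactly one job and that files and jobs are identified by strings. The names `ProvenanceGraph`, `add_job`, and `lineage`, and the example identifiers, are illustrative only.

```python
from collections import defaultdict, deque


class ProvenanceGraph:
    """Hypothetical bipartite provenance graph: jobs consume files and produce files."""

    def __init__(self):
        self.job_inputs = defaultdict(set)  # job id -> ids of files the job read
        self.producer = {}                  # file id -> id of the single job that wrote it

    def add_job(self, job, inputs, outputs):
        """Record one production job with its input and output files."""
        self.job_inputs[job].update(inputs)
        for f in outputs:
            # Assumption: a file has exactly one producing job.
            if f in self.producer:
                raise ValueError(f"{f} already produced by {self.producer[f]}")
            self.producer[f] = job

    def lineage(self, file_id):
        """Return (jobs, files) upstream of file_id, found by breadth-first search."""
        jobs, files = set(), set()
        queue = deque([file_id])
        while queue:
            f = queue.popleft()
            job = self.producer.get(f)
            # Files with no recorded producer (e.g. raw inputs) end the walk;
            # jobs already visited are not expanded twice.
            if job is None or job in jobs:
                continue
            jobs.add(job)
            for src in self.job_inputs[job]:
                if src not in files:
                    files.add(src)
                    queue.append(src)
        return jobs, files


# Usage with hypothetical daily calibration jobs feeding one monthly gridding job.
g = ProvenanceGraph()
g.add_job("calib_1999-01-01", {"raw_1999-01-01"}, {"l1b_1999-01-01"})
g.add_job("calib_1999-01-02", {"raw_1999-01-02"}, {"l1b_1999-01-02"})
g.add_job("grid_1999-01", {"l1b_1999-01-01", "l1b_1999-01-02"}, {"l3_1999-01"})
jobs, files = g.lineage("l3_1999-01")
# sorted(jobs) -> ['calib_1999-01-01', 'calib_1999-01-02', 'grid_1999-01']
```

The traversal uses an explicit queue rather than recursion, so very deep production chains are not limited by Python's recursion depth; a real archive holding millions of objects would presumably keep this graph in a database rather than in memory.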
Key Consequences
• Jobs may use previously produced data to guide the next step in production
• Provenance graphs may include millions of objects
• Provenance cannot be expected to fit within files
• Current metadata standards are (provably) incomplete
• New versions of data products may be produced by four kinds of changes:
  • Input data, source code, coefficients, and connectivity, plus hardware/infrastructure code
• The time-series organization of files provides a reasonable basis for a hierarchical, permanent registration schema for files and file contents
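One way to read the last two points together is as a key whose levels run from product to version to time index, with each new version tagged by the kind of change that created it. This is a minimal sketch under explicit assumptions: the hierarchy (product, then version, then year/month/day), the names `Change`, `FileKey`, and `next_version`, and the product name `cloud_fraction` are all hypothetical illustrations, not a schema taken from the presentation.

```python
from dataclasses import dataclass, replace
from datetime import date
from enum import Enum


class Change(Enum):
    """The four kinds of change listed on the slide (enum labels are illustrative)."""
    INPUT_DATA = "input data"
    SOURCE_CODE = "source code"
    COEFFICIENTS = "coefficients"
    CONNECTIVITY = "connectivity"


@dataclass(frozen=True, order=True)
class FileKey:
    """Hypothetical permanent key: product -> version -> year/month/day."""
    product: str
    version: int
    day: date

    def path(self) -> str:
        # One hierarchy level per path component, coarsest level first.
        return f"{self.product}/v{self.version:03d}/{self.day:%Y/%m/%d}"


def next_version(key: FileKey, change: Change, log: dict) -> FileKey:
    """Return the key for the same day in the next version and log why it exists."""
    new_key = replace(key, version=key.version + 1)
    log[(new_key.product, new_key.version)] = change
    return new_key


# Usage with a hypothetical product.
log = {}
k1 = FileKey("cloud_fraction", 1, date(1986, 3, 15))
k2 = next_version(k1, Change.COEFFICIENTS, log)
print(k1.path())  # cloud_fraction/v001/1986/03/15
print(k2.path())  # cloud_fraction/v002/1986/03/15
```

Because the dataclass is ordered field by field, sorting keys groups them by product, then version, then date, so each version's files come out as a time series.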