200 likes | 329 Views
Stanford Digital Repository. PREMIS & Geospatial Resources. Nancy J. Hoebelheinrich Knowledge Motifs nhoebel@kmotifs.com. To Be Discussed. An Overview of PREMIS data elements especially important for Geospatial Resources Examples from Geospatial Resources Issues & discussion.
E N D
Stanford Digital Repository PREMIS & Geospatial Resources Nancy J. Hoebelheinrich Knowledge Motifs nhoebel@kmotifs.com PREMIS Implementation Fair, San Francisco, CA October 7, 2009
To Be Discussed • An Overview of PREMIS data elements especially important for Geospatial Resources • Examples from Geospatial Resources • Issues & discussion PREMIS Implementation Fair, San Francisco, CA October 7, 2009
Simple Example of use of PREMIS Object Data Elements • Applied at file level • Automatic insertion by Ingest code to retain important provenance info for each file before moving into the preservation repository • Original file name from data provider • Original checksum • Original file size PREMIS Implementation Fair, San Francisco, CA October 7, 2009
PREMIS Object Excerpt (v1.1) PREMIS Implementation Fair, San Francisco, CA October 7, 2009
More about PREMIS Object relationships • Defined as associations b/w two or more: • Object entities or • Entities of different types, e.g., an Object & an Agent. • Recorded for long term preservation purposes • Typical relationship types = structural (component of representation), derivative (format varieties), dependent (required schema or database structure) • Could be expressed using other schemas for packaging the resource such as METS or XFDU or MPEG DIDL PREMIS Implementation Fair, San Francisco, CA October 7, 2009
Use of PREMIS Rights data elements • Applied at representation level • Reference to donor’s Deposit Agreement (using METS) • Key info from the ingested Deposit Agreement for immediate playback PREMIS Implementation Fair, San Francisco, CA October 7, 2009
PREMIS Rights Excerpt (v1.1) PREMIS Implementation Fair, San Francisco, CA October 7, 2009
Event 1: Transform of descriptive MD from MS Access db => XML => MODS Applied at representation level Why this event? In case of questions from outside data provider Retain singular scripts & transform mechanisms Test practicability of recording such events in production environment Use of PREMIS Event for simple event PREMIS Implementation Fair, San Francisco, CA October 7, 2009
PREMIS Event Excerpt (v1.1) PREMIS Implementation Fair, San Francisco, CA October 7, 2009
Dataset 1: official street centerline file used by emergency services to locate street addresses Dataset 2: aspects of the road network including topography, angles & geometry of the road network used for a tourist map Another example: GIS Dataset: Street network of given metropolitan area Event to be documented: Merge c:\temp\states1;c:\temp \states2; c:\temp\USA PREMIS Implementation Fair, San Francisco, CA October 7, 2009
Want to describe full process of data creation Includes “merge” and data sources Advantage of PREMIS – can describe events once in repository Why this event? Important to describe processes during different phases of lifecycle, even prior to ingestion Use of PREMIS Event Data Elements PREMIS Implementation Fair, San Francisco, CA October 7, 2009
For data management within the repository Audit trail for descriptive MD Version of Ingest code? Data provider who created / altered the resource or the metadata, e.g., USGS which added FGDC MD to HRO from Monterey Bay Water Resource Use of PREMIS Agent Data Elements PREMIS Implementation Fair, San Francisco, CA October 7, 2009
PREMIS & Geospatial data -- Comments based on experiences: • Works well when: • Domain specific MD exists, e.g., FGDC for descriptive and technical MD • There are levels of the resource with MD to be associated, e.g., at representation & file(s) level • Need to document various points in the lifecycle of the data PREMIS Implementation Fair, San Francisco, CA October 7, 2009
PREMIS & Geospatial data -- Comments based on experiences: • In earlier versions of PREMIS unclear how to document: • Context • Environment including at time of creation • “Significant properties” • Existence of geospatial format registries PREMIS Implementation Fair, San Francisco, CA October 7, 2009
PREMIS v 2.0 more flexible • Still XML binding • Allows for containers • Allows hierarchical relationships • Extensible by use of new <premis:extension> element to insert other elements, XML fragments, e.g., technical MD, provenance metadata, etc. • Board considering the inclusion of mechanism used by packaging schemas to “wrap” or “reference” other metadata PREMIS Implementation Fair, San Francisco, CA October 7, 2009
For more detail, see “An Investigation into Archiving Geospatial data Formats “ prepared for NGDA Project, funded by NDIIPP (http://www.ngda.org/research.php) Formats examined Approaches of FGDC, PREMIS, and Center for International Earth Science Information Network (CIESIN)‘s Geospatial Electronic Record (GER) model on basis of: Environment/ computer platform Semantic underpinnings domain specific terminology provenance data quality appropriate use PREMIS & Complex Geospatial Data PREMIS Implementation Fair, San Francisco, CA October 7, 2009
Examples of Geospatial “Context” • Placing dataset in Time & Space • Semantic underpinnings, e.g., • Abstract • Description of purpose / research methodology • Intended use of data to avoid misinterpretation or misuse • Where to put? • FGDC has place • PREMIS would not necessarily consider this as “preservation” metadata, but rather “descriptive” or technical MD, however see v 2.0 PREMIS Implementation Fair, San Francisco, CA October 7, 2009
Examples of “Environment” and/or “Significant properties” for geospatial data • HW info pertinent at time of data creation • SW info pertinent at time of data creation (?) • Lineage or “provenance” data e.g., to communicate processing steps used to create scientific data product • Events, parameters & source data which influenced or impacted the creation of the data set prior to its ingestion into the archive in order to full understand the data that you’re getting PREMIS Implementation Fair, San Francisco, CA October 7, 2009
“Environment” & “Significant properties”, continued… • Data Quality – describing completeness, logical consistency, attribute accuracy • Data Trustworthiness – data creator / provider reliable? = “authentic” • Data Provenance – processes & sources for dataset = “understandable & reliable” • Understanding of the specific needs of the “designated community” PREMIS Implementation Fair, San Francisco, CA October 7, 2009
Questions? / comments? Nancy J. Hoebelheinrich njhoebel@gmail.com PREMIS Implementation Fair, San Francisco, CA October 7, 2009