This presentation discusses the history and uses of PREMIS (Preservation Metadata Implementation Strategies) in the context of geospatial resources. It provides an overview of PREMIS data elements and their implementation, as well as examples of how PREMIS is used with geospatial resources.
Stanford Digital Repository PREMIS & Geospatial Resources
Nancy J. Hoebelheinrich, InfoAnalytics, San Mateo, CA
ESIP 2009 Summer Meeting, UC Santa Barbara, CA, July 7 – 10, 2009
To Be Discussed
• A Brief History of PREMIS
• An Overview of PREMIS data elements
• Uses for Geospatial Resources: Examples
A Brief History of PREMIS
• PREMIS (preservation metadata) came initially from the cultural heritage / digital preservation communities
• Built upon a previous initiative (2001-02)
  • Sponsored by two key library descriptive MD utilities (OCLC and RLG)
  • Preservation Metadata Framework working group
  • Issued a report outlining types of information that should be associated with an archived digital object
A Brief History of PREMIS
• In 2003 a PREMIS working group formed
  • Composed of practitioners building or working on preservation repositories, including national data centers in the UK, US, Netherlands, etc.
  • Focused upon implementable data elements
• Resulted in a two-pronged effort:
  • Implementation survey
  • Data dictionary of CORE preservation semantic units (= data elements)
A Brief History of PREMIS
• PREMIS working group publications:
  • “Implementing Preservation Repositories for Digital Materials: Current Practice and Emerging Trends in the Cultural Heritage Community”, December 2004
  • “PREMIS Data Dictionary for Preservation Metadata, version 1.0”, May 2005
A Brief History of PREMIS
• PREMIS Implementation
  • PREMIS Editorial Committee formed
  • Maintained by Library of Congress
  • “PREMIS Data Dictionary for Preservation Metadata, version 2.0”, March 2008
  • Who uses it? See the implementation registry
  • PREMIS Implementors Group (PIG) listserv for practitioners
PREMIS Data Model for an “intellectual entity”
• OBJECT: discrete unit of information in digital form
• RIGHTS: rights or permissions info associated with Object or Agent
• EVENTS: important lifecycle events
• AGENTS: parties to Events and/or Rights
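
The four entities and their links can be sketched as plain data classes. This is an illustrative sketch only, not the PREMIS schema: the class names, field names, and sample identifiers below are assumptions made for the example, and the `category` values are the three Object subtypes named in this presentation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PremisObject:
    identifier: str
    category: str  # "file", "bitstream" or "representation"


@dataclass
class Agent:
    identifier: str
    name: str


@dataclass
class Event:
    identifier: str
    event_type: str
    object_ids: List[str] = field(default_factory=list)  # Objects the event acted on
    agent_ids: List[str] = field(default_factory=list)   # parties to the event


@dataclass
class Rights:
    identifier: str
    statement: str
    object_ids: List[str] = field(default_factory=list)  # Objects the rights cover
    agent_ids: List[str] = field(default_factory=list)   # parties to the rights


def agents_for_object(object_id, events, rights):
    """Collect the identifiers of every Agent tied to an Object via Events or Rights."""
    ids = set()
    for item in [*events, *rights]:
        if object_id in item.object_ids:
            ids.update(item.agent_ids)
    return sorted(ids)


# Hypothetical sample data, for illustration only.
obj = PremisObject("obj-1", "file")
events = [Event("ev-1", "ingestion", ["obj-1"], ["agent-ingest"])]
rights = [Rights("r-1", "deposit agreement", ["obj-1"], ["agent-donor"])]
linked_agents = agents_for_object(obj.identifier, events, rights)
print(linked_agents)
```

Keeping the links as identifier lists mirrors the idea that Events and Rights point at Objects and Agents rather than containing them.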
PREMIS Data Model
More about PREMIS Object
• Is an abstraction, meant to cluster semantic units and clarify relationships
• Has 3 subtypes:
  • File – the usual suspect
  • Bitstream – contiguous or non-contiguous data within a file that has meaningful common properties for preservation purposes
  • Representation – set of files, including structural metadata, needed for a complete and reasonable rendition of an Intellectual Entity
Assumptions underlying PREMIS
• Not about “descriptive” metadata (used for search & discovery)
• Not about “technical” metadata (usually about the format(s) of the component files or bitstreams)
• These areas are to be covered by domain-specific metadata, e.g., FGDC or ISO profiles
• Mind the Gap!
Simple Example of use of PREMIS Object Data Elements
• Applied at file level
• Automatic insertion by Ingest code to retain important provenance info for each file before moving into the preservation repository
  • Original file name from data provider
  • Original checksum
  • Original file size
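
A minimal sketch of what such ingest code might record for one file follows. It is not the Stanford Digital Repository's ingest code. The function name, the `local` identifier type, and the choice of SHA-256 are assumptions for illustration; the element names follow PREMIS semantic unit names as commonly documented and should be checked against the Data Dictionary version in use. The output omits the PREMIS namespace and is not claimed to be schema-valid.

```python
import hashlib
import os
import tempfile
import xml.etree.ElementTree as ET


def describe_file(path, original_name, object_id):
    """Build a PREMIS-style <object> element holding name, checksum and size."""
    with open(path, "rb") as fh:
        data = fh.read()

    obj = ET.Element("object")
    ident = ET.SubElement(obj, "objectIdentifier")
    ET.SubElement(ident, "objectIdentifierType").text = "local"  # assumed type
    ET.SubElement(ident, "objectIdentifierValue").text = object_id

    chars = ET.SubElement(obj, "objectCharacteristics")
    fixity = ET.SubElement(chars, "fixity")
    ET.SubElement(fixity, "messageDigestAlgorithm").text = "SHA-256"
    ET.SubElement(fixity, "messageDigest").text = hashlib.sha256(data).hexdigest()
    ET.SubElement(chars, "size").text = str(len(data))  # size in bytes

    # File name as supplied by the data provider, before any renaming at ingest.
    ET.SubElement(obj, "originalName").text = original_name
    return obj


# Demonstration on a throwaway file; the names here are hypothetical.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
    tmp_path = tmp.name

record = describe_file(tmp_path, "roads_2009.shp", "obj-0001")
os.remove(tmp_path)
print(ET.tostring(record, encoding="unicode"))
```

Capturing these values before the file moves into the repository gives later fixity checks something to compare against.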
PREMIS Object Excerpt (v1.1)
More about PREMIS Object relationships
• Defined as associations between two or more:
  • Object entities, or
  • Entities of different types, e.g., an Object & an Agent
• Recorded for long-term preservation purposes
• Typical relationship types = structural (component of representation), derivative (format varieties), dependent (required schema or database structure)
• Could be expressed using other schemas for packaging the resource, such as METS, XFDU, or MPEG DIDL
Use of PREMIS Rights data elements
• Applied at representation level
• Reference to donor’s Deposit Agreement (using METS)
• Key info from the ingested Deposit Agreement for immediate playback
PREMIS Rights Excerpt (v1.1)
Use of PREMIS Event for simple event
• Event 1: Transform of descriptive MD from MS Access db => XML => MODS
• Applied at representation level
• Why this event?
  • In case of questions from outside data provider
  • Retain singular scripts & transform mechanisms
  • Test practicability of recording such events in production environment
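
A transformation event of this kind could be recorded roughly as follows. This is a hedged sketch, not the excerpt shown in the talk. The function name, identifier values, the `local` identifier type, and the sample date are hypothetical; the element names follow PREMIS Event semantic unit names as commonly documented, without namespace, and the result is not claimed to be schema-valid.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def build_event(event_id, event_type, detail, object_id, agent_id, when):
    """Build a PREMIS-style <event> element linked to one Object and one Agent."""
    event = ET.Element("event")

    ident = ET.SubElement(event, "eventIdentifier")
    ET.SubElement(ident, "eventIdentifierType").text = "local"  # assumed type
    ET.SubElement(ident, "eventIdentifierValue").text = event_id

    ET.SubElement(event, "eventType").text = event_type
    ET.SubElement(event, "eventDateTime").text = when.isoformat()
    ET.SubElement(event, "eventDetail").text = detail

    agent = ET.SubElement(event, "linkingAgentIdentifier")
    ET.SubElement(agent, "linkingAgentIdentifierType").text = "local"
    ET.SubElement(agent, "linkingAgentIdentifierValue").text = agent_id

    obj = ET.SubElement(event, "linkingObjectIdentifier")
    ET.SubElement(obj, "linkingObjectIdentifierType").text = "local"
    ET.SubElement(obj, "linkingObjectIdentifierValue").text = object_id
    return event


# Hypothetical values for the metadata transform described on the slide.
transform = build_event(
    event_id="ev-0001",
    event_type="transformation",
    detail="Descriptive MD transformed: MS Access db => XML => MODS",
    object_id="rep-0001",            # the representation the event applies to
    agent_id="transform-script-v1",  # script or person that ran the transform
    when=datetime(2009, 1, 15, 12, 0, tzinfo=timezone.utc),
)
print(ET.tostring(transform, encoding="unicode"))
```

Recording the script as the linked Agent is one way to retain the transform mechanism alongside the event itself.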
PREMIS Event Excerpt (v1.1)
Another example:
• GIS Dataset: Street network of given metropolitan area
  • Dataset 1: official street centerline file used by emergency services to locate street addresses
  • Dataset 2: aspects of the road network including topography, angles & geometry of the road network used for a tourist map
• Event to be documented: Merge c:\temp\states1;c:\temp\states2; c:\temp\USA
Use of PREMIS Event Data Elements
• Want to describe full process of data creation
  • Includes “merge” and data sources
• Advantage of PREMIS – can describe events once in repository
• Why this event?
  • Important to describe processes during different phases of lifecycle, even prior to ingestion
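
One way to picture a lifecycle record that spans both pre-ingest and in-repository events is a small lineage lookup. This is a sketch under stated assumptions, not a PREMIS structure: the `LifecycleEvent` class, the `pre_ingest` flag, the `repo:USA` identifier, and the dates are all hypothetical. The three paths are the ones given in the merge example of this presentation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class LifecycleEvent:
    event_type: str
    occurred_on: date
    sources: Tuple[str, ...]  # inputs the event consumed
    outcome: str              # what the event produced
    pre_ingest: bool          # True if it happened before repository ingest


def lineage(events, object_id, _seen=None):
    """Return every event that led to object_id, oldest first."""
    seen = set() if _seen is None else _seen
    found = []
    for ev in events:
        if ev.outcome == object_id and ev not in seen:
            seen.add(ev)
            for src in ev.sources:  # walk back through each input
                found.extend(lineage(events, src, seen))
            found.append(ev)
    return sorted(found, key=lambda e: e.occurred_on)


# Hypothetical dates and repository identifier.
EVENTS = [
    LifecycleEvent("ingestion", date(2009, 3, 1), (r"c:\temp\USA",), "repo:USA", False),
    LifecycleEvent("merge", date(2009, 2, 1),
                   (r"c:\temp\states1", r"c:\temp\states2"), r"c:\temp\USA", True),
]

history = lineage(EVENTS, "repo:USA")
for ev in history:
    phase = "pre-ingest" if ev.pre_ingest else "repository"
    print(ev.occurred_on, ev.event_type, phase, ev.sources)
```

The lookup returns the merge before the ingestion, so the data sources behind the archived dataset stay attached to its history.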
Use of PREMIS Agent Data Elements
• For data management within the repository
• Audit trail for descriptive MD
• Version of Ingest code?
• Data provider who created / altered the resource or the metadata, e.g., USGS which added FGDC MD to HRO from Monterey Bay Water Resource
PREMIS & Geospatial data -- Comments based on experiences:
• Works well when:
  • Domain-specific MD exists, e.g., FGDC for descriptive and technical MD
  • There are levels of the resource with MD to be associated, e.g., at representation & file(s) level
  • There is a need to document various points in the lifecycle of the data
PREMIS & Geospatial data -- Comments based on experiences:
• In earlier versions of PREMIS it was unclear how to document:
  • Context
  • Environment, including at time of creation
  • “Significant properties”
  • Existence of geospatial format registries
PREMIS v 2.0 more flexible
• Still XML binding
• Allows for containers
• Allows hierarchical relationships
• Extensible by use of new <premis:extension> element to insert other elements or XML fragments, e.g., technical MD, provenance metadata, etc.
• Board considering the inclusion of the mechanism used by packaging schemas to “wrap” or “reference” other metadata
PREMIS & Complex Geospatial Data
• For more detail, see “An Investigation into Archiving Geospatial data Formats” prepared for NGDA Project, funded by NDIIPP (http://www.ngda.org/research.php)
  • Formats examined
  • Approaches of FGDC, PREMIS, and Center for International Earth Science Information Network (CIESIN)’s Geospatial Electronic Record (GER) model on basis of:
    • Environment / computer platform
    • Semantic underpinnings
    • Domain-specific terminology
    • Provenance
    • Data quality
    • Appropriate use
Examples of Geospatial “Context”
• Placing dataset in Time & Space
• Semantic underpinnings, e.g.,
  • Abstract
  • Description of purpose / research methodology
  • Intended use of data to avoid misinterpretation or misuse
• Where to put?
  • FGDC has a place
  • PREMIS would not necessarily consider this “preservation” metadata, but rather “descriptive” or technical MD; however, see v 2.0
Examples of “Environment” and/or “Significant properties” for geospatial data
• HW info pertinent at time of data creation
• SW info pertinent at time of data creation (?)
• Lineage or “provenance” data, e.g., to communicate processing steps used to create scientific data product
• Events, parameters & source data which influenced or impacted the creation of the data set prior to its ingestion into the archive, in order to fully understand the data that you’re getting
“Environment” & “Significant properties”, continued…
• Data Quality – describing completeness, logical consistency, attribute accuracy
• Data Trustworthiness – data creator / provider reliable? = “authentic”
• Data Provenance – processes & sources for dataset = “understandable & reliable”
• Understanding of the specific needs of the “designated community”
Questions? / comments?
Nancy J. Hoebelheinrich
njhoebel@gmail.com