Explore the challenges in preserving digital data, the importance of data standards, and tools for long-term retention. Learn about sustaining digital information, sustainability metrics, and access scenarios. Discover how ontologies can aid in classification, policy evaluation, and sustainability measurement.
Standards for Long-Term Retention of Digital Information: Can Ontologies Help?
Joshua Lubell
National Institute of Standards and Technology
lubell@nist.gov
Collaborative Expedition Workshop
National Science Foundation
July 18, 2007
The Problem
• Too much digital data!
  • It takes about 15 minutes for the world to churn out new digital information equivalent to the entire collection of the US Library of Congress
• Proprietary file formats
  • Expected lifetime of a typical manufacturing software application: only 3 years
• Short-lived computing hardware and software
  • Expected lifetime of today’s storage/retrieval technologies: only 10 years
• Products often outlive computer software/hardware by an order of magnitude
  • Aircraft can last 50 years or more
  • Healthcare records should be preserved through the patient’s lifetime, and perhaps beyond
• Methods/tools address preservation, but not reuse or re-engineering requirements
Data Standards
• Necessary to avoid being locked into a vendor format or application that could disappear in the near future
• Likely to be more stable than proprietary tools/formats
• But data standards are only part of the solution
• Information is more than just data!
Information = Data + Interpretation
(from Reference Model for an Open Archival Information System, ISO 14721:2003)
[Diagram labels]
• Data Object (example: binary file)
• Representation Information (metadata) (example: definition of PDF format)
• Information Object (example: electronic tech manual)
An Information Package
[Diagram labels]
• Information Objects
  • Content Information
  • Preservation Description Information
    • Sub-categories: Reference, Provenance, Context, Fixity
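The information package described above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class and function names are hypothetical, the field names simply mirror the labels on the slide, and using a SHA-256 digest as the fixity value is one common choice that the slide does not prescribe.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class InformationObject:
    """Data plus the information needed to interpret it (hypothetical model)."""
    data: bytes                # the Data Object: raw bits
    representation_info: str   # e.g. the name of the format definition


@dataclass(frozen=True)
class PreservationDescription:
    """The four sub-categories named on the slide."""
    reference: str    # identifier for the content
    provenance: str   # where the content came from
    context: str      # how the content relates to its environment
    fixity: str       # assumption: SHA-256 hex digest of the data


@dataclass(frozen=True)
class InformationPackage:
    content: InformationObject
    pdi: PreservationDescription


def make_package(data: bytes, representation_info: str,
                 reference: str, provenance: str, context: str) -> InformationPackage:
    """Build a package, computing the fixity value from the data."""
    digest = hashlib.sha256(data).hexdigest()
    return InformationPackage(
        content=InformationObject(data, representation_info),
        pdi=PreservationDescription(reference, provenance, context, digest),
    )


def fixity_ok(package: InformationPackage) -> bool:
    """Return True if the stored digest still matches the data."""
    return hashlib.sha256(package.content.data).hexdigest() == package.pdi.fixity
```

A repository could call `fixity_ok` periodically to detect silent corruption; the other three sub-categories are kept as free text here only because the slide gives no further structure for them.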
Tools for Tackling Long-Term Retention
• Standards for representing digital artifacts
  • STEP – ISO 10303 (product data)
  • XML (documents)
  • Graphics, audio, video, multimedia standards
  • Scientific modeling standards
• Methods for representing preservation information
  • Digital object typing/packaging
    • METS (Metadata Encoding and Transmission Standard)
    • MPEG-21
    • DOPs (Digital Object Prototypes)
  • Ontology languages
  • Rules languages
    • Schematron (ISO 19757-3:2006)
  • Digital format registries (UK Archives, Harvard, Univ. of Maryland)
Sustaining Digital Information
What is sustainability? From The Free Dictionary:
• Noun: the act of sustaining life by food or providing a means of subsistence; "they were in want of sustenance"; "fishing was their main sustainment"
• Transitive verb
  1. To keep in existence; maintain.
  2. To supply with necessities or nourishment; provide for.
  3. To support from below; keep from falling or sinking; prop.
  4. To support the spirits, vitality, or resolution of; encourage.
  5. To bear up under; withstand: can't sustain the blistering heat.
  6. To experience or suffer: sustained a fatal injury.
  7. To affirm the validity of: The judge has sustained the prosecutor's objection.
  8. To prove or corroborate; confirm.
  9. To keep up (a joke or assumed role, for example) competently.
Sustaining Digital Information
• Minimal
  • “Prop up”
  • Prevent destruction
• Better
  • Preserve
  • Ensure authenticity, availability
• Ideal
  • Nurture
  • “Care and feeding”
  • Enable reuse
Sustainability Metrics
• Library of Congress digital format sustainability factors
  • Disclosure
  • Adoption
  • Transparency
  • Self-documentation
  • External dependencies
  • Impact of patents
  • Technical protection mechanisms
• What are the sustainability factors for an archiving and/or records management strategy?
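To make the idea of a sustainability metric concrete, the seven factors listed above could be combined into a single score. The sketch below is purely illustrative: the 0 to 2 rating scale, the equal weighting, and the function name are all assumptions, and nothing here is an official Library of Congress scoring method.

```python
from typing import Mapping

# The seven factors named on the slide, as hypothetical dictionary keys.
FACTORS = (
    "disclosure",
    "adoption",
    "transparency",
    "self_documentation",
    "external_dependencies",
    "impact_of_patents",
    "technical_protection_mechanisms",
)

MAX_RATING = 2  # assumption: 0 = poor, 1 = acceptable, 2 = good


def sustainability_score(ratings: Mapping[str, int]) -> float:
    """Return an equally weighted score between 0.0 and 1.0.

    Every factor must be rated; a missing or out-of-range rating is an error
    rather than a silent zero, so incomplete assessments are not mistaken
    for poor formats.
    """
    missing = [f for f in FACTORS if f not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {', '.join(missing)}")
    for factor in FACTORS:
        if not 0 <= ratings[factor] <= MAX_RATING:
            raise ValueError(f"rating for {factor} must be 0..{MAX_RATING}")
    return sum(ratings[f] for f in FACTORS) / (MAX_RATING * len(FACTORS))
```

Equal weighting is the simplest possible choice; an archive with specific risks (for example, patent exposure) would presumably weight the factors differently.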
Access Scenarios: The Three Rs
• Reference
  • Preserve information in its original state
  • Example (product data engineering): 3D visualization
• Reuse
  • Allow for future modification, re-engineering
  • Example: ISO 10303-203:1994 (STEP AP203)
• Rationale
  • Encode construction history, design intent, tolerancing info, lifecycle management info, etc.
  • Example: STEP AP203 ed.2 ++
  • Ontologies and/or other representations needed
So How Can Ontologies Help?
• Digital object type classification
• Prediction of records management policy consequences
• Evaluating a records management system based on sustainability criteria
• Tailoring repository access according to the Three Rs
• Measuring long-term sustainability based on the Three Rs
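As one illustration of tailoring repository access according to the Three Rs, a repository could record which kinds of content an archived item carries and derive the access scenarios it supports. This is a minimal sketch, not an ontology: the content-kind names and the mapping from scenario to required content are hypothetical, loosely based on the examples given for each scenario.

```python
from enum import Enum
from typing import List, Set


class AccessScenario(Enum):
    REFERENCE = "reference"   # view information in its original state
    REUSE = "reuse"           # modify or re-engineer
    RATIONALE = "rationale"   # recover design intent and history


# Assumption: hypothetical content kinds each scenario needs.
REQUIRED_CONTENT = {
    AccessScenario.REFERENCE: {"visualization"},
    AccessScenario.REUSE: {"editable_model"},
    AccessScenario.RATIONALE: {"editable_model", "design_intent"},
}


def supported_scenarios(available: Set[str]) -> List[AccessScenario]:
    """List the scenarios whose required content is all present."""
    return [s for s in AccessScenario if REQUIRED_CONTENT[s] <= available]
```

For example, an item archived only as a 3D visualization would support Reference alone, while an item that also carries an editable model and design intent would support all three scenarios.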