Global Digital Format Registry: An Update • July 2006
Global Digital Format Registry • “The Global Digital Format Registry (GDFR) will provide sustainable services to collect, review, store, discover, and deliver significant representation information about digital formats.” • Centrally-organized collection and review • Distributed storage, discovery, and delivery via a peer-to-peer network
The GDFR project • Harvard University Library (HUL) funded for 2 years by the Mellon Foundation • Staffing and technical work subcontracted by HUL to OCLC (June 2006) • Project oversight • Steering Committee (SC) for policy oversight • Technical Working Group (TWG) for technical oversight • Active solicitation of the international stakeholder community for review and comment
Deliverables • Functional requirements • Technical specifications • Implementation plan (technology platform) • Inter-nodal protocol • Reference software implementation for nodes • Released under LGPL • Editorial process • Initial population • Succession plan
Schedule • Month 1 Staffing, establish public web site • Months 2-6 Consultation, design, prototyping (public discussion planned for DLF Fall Forum, Boston, November 2006) • Months 7-12 Protocol, node implementation • Months 13-18 Initial population, inter-nodal testing • Months 19-24 Integration testing
What is a format? • “A serialization of an abstract information model” • A set of syntactic and semantic rules for mapping from an information model to a byte stream (and, in most instances, for mapping back) • Encompasses the nominal sense of “file format” as well as a range of conceptual models from the micro to the macro level • IEEE 754 floating point number … File system
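The definition above can be made concrete with the smallest example on the slide, an IEEE 754 floating point number. The following is only a minimal sketch: the function names `serialize` and `deserialize` are illustrative, and the choice of the 64-bit big-endian encoding is an assumption made for the example. It shows one set of rules mapping an abstract value to a byte stream and back.

```python
import struct

# Illustrative names; the 64-bit big-endian encoding (">d") is an assumption
# chosen for this example, not something the slides prescribe.

def serialize(value: float) -> bytes:
    """Map the abstract number to a byte stream (IEEE 754 binary64, big-endian)."""
    return struct.pack(">d", value)

def deserialize(data: bytes) -> float:
    """Map the byte stream back to the abstract number."""
    (value,) = struct.unpack(">d", data)
    return value

encoded = serialize(1.5)
decoded = deserialize(encoded)
print(encoded.hex())  # 3ff8000000000000
print(decoded)        # 1.5
```

The same abstract value serialized under a different rule set (for example little-endian) would yield a different byte stream, which is why the rules themselves are what a registry needs to document.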
GDFR network • Peer-to-peer network communicating over a common protocol • Structured delegation for distribution • DNS analogy • “Root” node • Top-level nodes • Distribution classes • Local data • Unvetted data • Vetted data
Representation Information • Identifiers • Responsibility • Classification • Relationships • Specifications • Signatures • Grammar • Tools • Assessment
Identifiers • Canonical and alias identifiers in a variety of naming systems • Common usage “TIFF” • MIME “image/tiff” • PRONOM PUID “fmt/10” • LC FDD “fdd000022” • Canonical GDFR-defined identifier in the “info” URI scheme
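One way to picture the alias scheme is as a lookup from (naming system, identifier) pairs to a single canonical identifier. This is a rough sketch only: the canonical value `info:gdfr/example/tiff`, the naming-system keys, and the `resolve` function are hypothetical, since the slides do not give the actual "info" URI syntax; the four alias values are the ones listed above.

```python
from typing import Optional

# Hypothetical canonical identifier: the real GDFR "info" URI syntax is not
# given in the slides, so this value is a placeholder for illustration.
CANONICAL_TIFF = "info:gdfr/example/tiff"

# Alias identifiers from the slide, keyed by (naming system, identifier).
# The naming-system keys themselves are illustrative labels.
ALIASES = {
    ("common", "TIFF"): CANONICAL_TIFF,
    ("mime", "image/tiff"): CANONICAL_TIFF,
    ("pronom-puid", "fmt/10"): CANONICAL_TIFF,
    ("lc-fdd", "fdd000022"): CANONICAL_TIFF,
}

def resolve(system: str, identifier: str) -> Optional[str]:
    """Return the canonical identifier for an alias, or None if unknown."""
    return ALIASES.get((system, identifier))

print(resolve("mime", "image/tiff"))
print(resolve("mime", "image/unknown"))
```

Keying on the naming system as well as the value keeps identical strings from different systems from colliding.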
Responsibility • Creator • Owner • Maintenance agency and process • Legal conditions for use
Classification • Ontological CLASSES, abstract families, concrete formats, and relationships • BYTESTREAM • IMAGE • STILL • RASTER • GIF • GIF87a • GIF89a is-new-version-of GIF87a • JPEG • ISO 10918-1 • JFIF is-subtype-of ISO 10918-1 • TIFF • TIFF 4.0 • TIFF 5.0 is-new-version-of TIFF 4.0 • TIFF 6.0 is-new-version-of TIFF 5.0 • TIFF/IT is-subtype-of TIFF 6.0 • TIFF/IT/CT is-subtype-of TIFF/IT • TIFF/IT/CT/P1 is-subtype-of TIFF/IT/CT
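The is-subtype-of links on this slide form chains that a client could walk to find every more general format. A minimal sketch follows, assuming (the slides do not say so explicitly) that subtyping is transitive and that each format has at most one direct supertype; the `SUBTYPE_OF` table and `supertypes` function are illustrative names.

```python
# is-subtype-of links copied from the slide. Treating them as a transitive
# chain with one direct supertype per format is an assumption for this sketch.
SUBTYPE_OF = {
    "TIFF/IT/CT/P1": "TIFF/IT/CT",
    "TIFF/IT/CT": "TIFF/IT",
    "TIFF/IT": "TIFF 6.0",
    "JFIF": "ISO 10918-1",
}

def supertypes(fmt: str) -> list:
    """Follow is-subtype-of links upward, nearest supertype first."""
    chain = []
    while fmt in SUBTYPE_OF:
        fmt = SUBTYPE_OF[fmt]
        chain.append(fmt)
    return chain

print(supertypes("TIFF/IT/CT/P1"))  # ['TIFF/IT/CT', 'TIFF/IT', 'TIFF 6.0']
```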
Relationships • Subtype • ASCII is-subtype-of UTF-8 • UTF-8 has-subtype ASCII • Version • TIFF 6.0 is-version-of TIFF 5.0 • TIFF 5.0 has-version TIFF 6.0 • Encapsulation • WAVE can-contain μ-law • μ-law is-contained-by WAVE • Affinity • JPEG is-similar-to SPIFF • SPIFF is-similar-to JPEG
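Each relationship on this slide is listed together with its inverse, so a registry need only record one direction and derive the other. The sketch below is one possible way to do that, not the GDFR design; the `INVERSE` table uses the predicate names from the slide, while the `assert_relation` helper and the set-of-triples storage are assumptions.

```python
# Predicate pairs taken from the slide; is-similar-to is its own inverse.
INVERSE = {
    "is-subtype-of": "has-subtype",
    "has-subtype": "is-subtype-of",
    "is-version-of": "has-version",
    "has-version": "is-version-of",
    "can-contain": "is-contained-by",
    "is-contained-by": "can-contain",
    "is-similar-to": "is-similar-to",
}

def assert_relation(triples: set, subject: str, predicate: str, obj: str) -> None:
    """Record a relationship and its inverse (hypothetical helper)."""
    triples.add((subject, predicate, obj))
    triples.add((obj, INVERSE[predicate], subject))

triples = set()
assert_relation(triples, "ASCII", "is-subtype-of", "UTF-8")
assert_relation(triples, "WAVE", "can-contain", "μ-law")
assert_relation(triples, "JPEG", "is-similar-to", "SPIFF")

for triple in sorted(triples):
    print(*triple)
```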
Specifications • Bibliographic citation, including descriptive (e.g. ISBN) and actionable (e.g. URI) identifiers • IP considerations probably prohibit the free distribution of specification documents
Signatures • External • Generally indicative • File extension(s) • Internal • Generally dispositive • Magic number • Other well-defined internal syntactic structures
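The distinction between indicative and dispositive signatures can be sketched as a two-stage check: try the magic number first, and fall back to the file extension only as a hint. The tables below are a small illustrative subset (the well-known leading bytes of GIF, TIFF, and JPEG files), and the `identify` function and its return labels are assumptions, not part of the GDFR design.

```python
import os

# Internal signatures (magic numbers): a small, well-known subset.
MAGIC = [
    (b"GIF87a", "GIF87a"),
    (b"GIF89a", "GIF89a"),
    (b"II*\x00", "TIFF"),      # little-endian TIFF header
    (b"MM\x00*", "TIFF"),      # big-endian TIFF header
    (b"\xff\xd8\xff", "JPEG"),
]

# External signatures (file extensions): only indicative.
EXTENSIONS = {".gif": "GIF", ".tif": "TIFF", ".tiff": "TIFF",
              ".jpg": "JPEG", ".jpeg": "JPEG"}

def identify(filename: str, head: bytes):
    """Return (format, evidence): the magic number wins over the extension."""
    for magic, fmt in MAGIC:
        if head.startswith(magic):
            return fmt, "internal"
    ext = os.path.splitext(filename)[1].lower()
    if ext in EXTENSIONS:
        return EXTENSIONS[ext], "external"
    return None, "unknown"

# A TIFF file misnamed as .jpg is still identified by its magic number.
print(identify("scan.jpg", b"II*\x00\x08\x00\x00\x00"))  # ('TIFF', 'internal')
```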
Grammar • Formal notation of a format • Typed to permit multiple parallel formulations, e.g. BNF, ABNF, BSDL, DFDL, EAST • May be feasible only for relatively simple formats
Tools • Services, systems, and tools using formats as inputs or outputs • Described in terms of some functional taxonomy, e.g. edit, transform, render
Assessment • Format-specific risk assessment • Typed to permit multiple parallel formulations • LC Sustainability/Quality & Functionality (SQF) • OCLC INFORM • DSTC PANIC • Cornell Virtual Remote Control (VRC)
General development goals • First create a generalized registry framework, then specialize it for the GDFR application • To the extent that this does not affect other goals and schedules • Platform/network transport independent • Full information content of GDFR is expressible in XML form • GDFR network is re-instantiatable from its XML expression
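The last two goals amount to a round-trip property: exporting the registry to XML and re-importing it should lose nothing. The sketch below illustrates that property on a toy record. The element and attribute names (`registry`, `format`, `identifier`, `system`) are hypothetical, since the slides do not define the GDFR XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical element and attribute names; the real GDFR schema is not
# defined in the slides.

def to_xml(records: dict) -> str:
    """Express the registry content as an XML string."""
    root = ET.Element("registry")
    for name, identifiers in sorted(records.items()):
        fmt = ET.SubElement(root, "format", {"name": name})
        for system, value in sorted(identifiers.items()):
            ET.SubElement(fmt, "identifier", {"system": system}).text = value
    return ET.tostring(root, encoding="unicode")

def from_xml(text: str) -> dict:
    """Re-instantiate the registry content from its XML expression."""
    root = ET.fromstring(text)
    return {
        fmt.get("name"): {i.get("system"): i.text for i in fmt.findall("identifier")}
        for fmt in root.findall("format")
    }

records = {"TIFF": {"mime": "image/tiff", "pronom-puid": "fmt/10"}}
xml_text = to_xml(records)
restored = from_xml(xml_text)
print(xml_text)
print(restored == records)  # True
```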
Related Work • PRONOM www.nationalarchives.gov.uk/pronom/ • Representation Information Registry/Repository dev.dcc.ac.uk/twiki/bin/view/Main/DCCRegRepV04 • LC Digital Formats Web www.digitalpreservation.gov/formats/ • NARA GDFR governance investigation