Unified Digital Format Registry (UDFR) Stakeholder Meeting. Library of Congress Washington, DC April 13, 14, 2011. Welcome!. Stephen Abrams, Associate director Lisa Colvin, UDFR project manager Alex Genadinik, UDFR project developer University of California Curation Center
Unified Digital Format Registry (UDFR) Stakeholder Meeting
Library of Congress, Washington, DC
April 13-14, 2011
Welcome!
Stephen Abrams, Associate director
Lisa Colvin, UDFR project manager
Alex Genadinik, UDFR project developer
University of California Curation Center

• Bibliothèque nationale de France
• Data Conservancy / Johns Hopkins U
• DataONE / UC Santa Barbara
• Deutsche Nationalbibliothek
• Ex Libris
• Family Search
• Florida Center for Library Automation
• GDFR / Harvard University
• Georgia Institute of Technology
• Government Printing Office [US]
• Koninklijke Bibliotheek
• Library of Congress
• Los Alamos National Laboratory
• National Archives [UK]
• National Archives [US]
• National Library of New Zealand
• New York University
• Open Planets F / Nationaal Archief
• Tessella
• University of Pennsylvania
• Virginia Institute of Technology
Objectives
The desired outcomes of this stakeholder meeting are:
• Agreement on the scoping of functional and non-functional requirements
• Agreement on the data modeling process and ontology
• Agreement on key technology decisions
• Agreement on project plan and schedule
• Groundwork for the administrative and technical continuity of UDFR as an ongoing service
Key questions
• What subset (or superset) of PRONOM and GDFR functionality and data modeling should be supported?
• Is there a useful distinction between format “facts” and “policies”?
• What are the criteria for contributor eligibility?
• To what level of technical review should/will contributed information be subject, and by whom? Are new contributions immediately visible in an unreviewed state?
• What is the appropriate granularity of provenance and review?
• Should UDFR identifiers be transparent or opaque?
• Should UDFR support static or dynamic inheritance of properties?
• Must there be an explicit grant of license by content contributors?
• What is the proper replication model: master/slave(s) or peer-to-peer?
• Should UDFR support classes of information that are not replicated?
• What are the criteria for node eligibility?
• What is the ongoing relationship between PRONOM and UDFR?
Project background
• Why worry about formats?
Information preservation
Bit preservation
• Since formatted digital assets are inherently mediated by technology, they are particularly susceptible to disruptive technological change
Format: a set of syntactic and semantic rules for mapping between an information model and a serialized bit stream
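The definition of a format above can be made concrete with a toy example. The sketch below is an illustration only and is not part of UDFR, PRONOM, or GDFR. It assumes a hypothetical information model (`Note`) and an invented format: a 4-byte signature `NOTE`, a big-endian 2-byte payload length, and UTF-8 text. The syntactic rules fix the byte layout; the semantic rules say what the bytes mean.

```python
import struct
from dataclasses import dataclass

MAGIC = b"NOTE"  # hypothetical signature, invented for this example


@dataclass
class Note:
    """Information model: a single piece of text."""
    text: str


def serialize(note: Note) -> bytes:
    """Map the information model to a serialized bit stream."""
    payload = note.text.encode("utf-8")
    # ">H" is a big-endian unsigned 16-bit length; payloads over 65535
    # bytes make struct.pack raise struct.error.
    return MAGIC + struct.pack(">H", len(payload)) + payload


def parse(stream: bytes) -> Note:
    """Map a serialized bit stream back to the information model."""
    if stream[:4] != MAGIC:
        raise ValueError("stream does not start with the NOTE signature")
    (length,) = struct.unpack(">H", stream[4:6])
    payload = stream[6:6 + length]
    if len(payload) != length:
        raise ValueError("stream is shorter than its declared length")
    return Note(payload.decode("utf-8"))
```

Without knowledge of these rules, the bytes can still be copied faithfully (bit preservation), but the text cannot be recovered (information preservation). A registry records the rules so that they outlive the software that applies them.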
Project background
• PRONOM http://www.nationalarchives.gov.uk/PRONOM/Default.aspx
• Global Digital Format Registry (GDFR) http://www.gdfr.info/
• Unified Digital Format Registry (UDFR) http://www.udfr.org/
• “The Unified Digital Format Registry (UDFR) will provide a reliable, sustainable and publicly accessible knowledge base of file format information”
• Fully open source implementation that “unifies” the function and data holdings of PRONOM and GDFR
UDFR project
1 year, 2+ FTE, funded by the Library of Congress
• Features
  • Use cases and functional requirements developed by the stakeholder community over the past two years
  • Support for linked data and semantic web
  • Support for a distributed network of independent but interoperable UDFR nodes
• Deliverables
  • Working, documented, single-node registry system, initially populated with an export from PRONOM, GDFR, and other appropriate sources
  • BSD license
Community building
How can we ensure the administrative and technical continuity of the UDFR once the LC-funded work is completed?
• Policy and strategic planning
• Operation of the initial registry node
• Recruitment of additional nodes
• Technical maintenance and enhancement of the code base
• Content contribution
• Review of contributed information
Policy and strategic planning
What is the lightest-weight governance structure that is effective?
• Continue as an ad hoc group or develop a more formal organization?
• Operate as a loose consortium under an MOU
• Look for an administrative umbrella under an existing organization
Operational considerations
CDL is prepared to provide an operational home for the initial production node on an interim basis
• Any long-term commitment may require some (minimal) level of cost recovery
Additional replication nodes
• Eligibility requirements?
• Minimal/maximal number desired?
Technical maintenance and enhancement
• Manage source code in a public code repository
• Enhancement planning and prioritization
• Call for community-wide evaluation at 6/12 months of production operation
• Eligibility for contributors? Committers?
Content contribution
• Contributor eligibility
• Are contributors recruited or self-selected?
• What can we do to encourage contribution?
• Engagement by institution and discipline
Technical review
• Reviewer eligibility
• Are reviewers recruited or self-nominated?
• Single or multiple levels of scrutiny?
• Standard criteria for evaluation
• What is the appropriate level of due diligence?
Follow-up planning
Next steps
• Ongoing project work with early prototype releases
• Production release (single node) in January 2012
• Governance, policy, and planning structure
• Solicitation of replication nodes
• Solicitation of content contribution
• 6/12 month evaluation
Key questions … answered ? !
• What subset (or superset) of PRONOM and GDFR functionality and data modeling should be supported?
• Is there a useful distinction between format “facts” and “policies”?
  • Priority for “facts”; support for “policies” as time permits.
• What are the criteria for contributor eligibility?
  • No criteria, but user account required (i.e. no anonymous contribution).
• To what level of technical review should/will contributed information be subject, and by whom? Are new contributions immediately visible in an unreviewed state?
  • Opportunity (but not a requirement) for review. Strong provenance will be maintained, as well as explicit tagging indicating the level of review.
• What is the appropriate granularity of provenance and review?
  • Individual assertion.
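The answers on contribution, review, and provenance can be sketched as a small data structure. This is a minimal sketch under stated assumptions, not the UDFR data model: the class and field names (`Assertion`, `ReviewLevel`, `mark_reviewed`) and the two review levels are hypothetical. It shows one record per individual assertion, each carrying its own contributor, timestamp, and review tag, with anonymous contribution rejected.

```python
from dataclasses import dataclass, field, replace
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class ReviewLevel(Enum):
    # Hypothetical levels; the slides only say the level of review is tagged.
    UNREVIEWED = "unreviewed"
    REVIEWED = "reviewed"


@dataclass(frozen=True)
class Assertion:
    """One statement about a format, with its own provenance."""
    subject: str        # identifier of the format being described
    predicate: str      # the property being asserted
    value: str          # the asserted value
    contributor: str    # user account that made the assertion
    asserted_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    review_level: ReviewLevel = ReviewLevel.UNREVIEWED
    reviewer: Optional[str] = None

    def __post_init__(self) -> None:
        # A user account is required: no anonymous contribution.
        if not self.contributor:
            raise ValueError("anonymous contribution is not allowed")


def mark_reviewed(assertion: Assertion, reviewer: str) -> Assertion:
    """Return a copy tagged as reviewed; the original is left unchanged."""
    return replace(
        assertion, review_level=ReviewLevel.REVIEWED, reviewer=reviewer
    )
```

Because review is an opportunity rather than a requirement, a new assertion is usable as soon as it is created and simply carries the `UNREVIEWED` tag until someone reviews it.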
Key questions … answered ? !
• Should UDFR identifiers be transparent or opaque?
  • Opaque, and without a node identifier component (to avoid the co-reference problem).
• Should UDFR support static or dynamic inheritance of properties?
  • Not clear if inheritance is a feature of the model, the query system, or the UI.
• Must there be an explicit grant of license by content contributors?
  • Yes, ideally using CC0.
• What is the proper replication model: master/slave(s) or peer-to-peer?
  • Master/slave(s), but replication is not the highest immediate priority. However, nothing in the design or implementation of the registry should preclude adding support for replication in the future.
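One way to picture an opaque identifier with no node component is a random value that encodes nothing about the format or about where the record was created. The sketch below is an assumption for illustration, not the UDFR identifier scheme: the function names are hypothetical, and random UUIDs are only one possible choice.

```python
import string
import uuid


def mint_identifier() -> str:
    """Mint an opaque identifier: 32 random hex characters.

    Nothing in the value names the format, its version, or the node
    that minted it.
    """
    return uuid.uuid4().hex


def looks_opaque(identifier: str) -> bool:
    """Check the shape produced by mint_identifier (32 hex characters)."""
    return len(identifier) == 32 and all(
        ch in string.hexdigits for ch in identifier
    )
```

Leaving the node out of the identifier means a record keeps the same name wherever it is stored, so two nodes holding the same record do not end up with two names for one format.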
Key questions … answered ? !
• Should UDFR support classes of information that are not replicated?
  • Need to deal gracefully with legally encumbered information. In a master/slave configuration, data entered at a slave node would remain local.
• What are the criteria for node eligibility?
  • With no consensus on the immediate need for replication, this question does not require an immediate answer. Some identified criteria include: geographic dispersion and high-availability operation.
• What is the ongoing relationship between PRONOM and UDFR?
  • Continued close consultation and collaboration.
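The replication answer above can be sketched as a simple filter. This is a hedged illustration, not a UDFR design: the `Record` type, its fields, and the node names are hypothetical, and treating legally encumbered records as non-replicated is an assumption (the slides only say such information must be handled gracefully).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Record:
    identifier: str
    origin: str               # "master", or the name of a slave node
    encumbered: bool = False  # assumed flag for legally encumbered content


def replicable(records: List[Record]) -> List[Record]:
    """Select the records the master would push to its slaves.

    Records entered at a slave node stay local, and (by assumption)
    so do legally encumbered records.
    """
    return [
        r for r in records
        if r.origin == "master" and not r.encumbered
    ]
```

For example, given one ordinary master record, one record entered at a slave, and one encumbered master record, only the first would be selected for replication.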
Thank you! • http://www.udfr.org/ • Safe travels!