Versioning of Digital Objects in a Fedora-based Repository • Matthias Razum, FIZ Karlsruhe • DORSDL Workshop, Alicante, September 21, 2006
Outline • Motivation • Versioning Concepts in eSciDoc • Content Models • Technical Approach • Conclusion
Project Setup and Mission • eSciDoc is a joint project of the Max Planck Society (MPS) and FIZ Karlsruhe • Funded by a €6 million, five-year grant (2004 – 2009) from the German Federal Ministry of Education and Research • It aims to build an integrated information, communication and publishing platform for web-based scientific work, demonstrated by way of example in multidisciplinary applications within the MPS • eSciDoc is not merely a research project; it aims to establish an innovative production system
Repositories for eScience • The contents of an institutional repository or a digital library form the ‘institutional memory’ of an organization • And just like human memory, they should allow information objects to be associated in novel contexts, thus creating new scholarship • Interdisciplinary work is becoming increasingly important, so systems have to span scientific disciplines • Repositories should be open, application-independent and flexible, thus laying the groundwork today for repurposing the information in future applications
Turning Static Objects into ‘Living’ Knowledge • e-Scholarship allows researchers to publish all intermediate results of knowledge generation, from first ideas, theories and discussions with peers to final results • Institutional repositories and digital libraries need to support scholars from the earliest steps of this process, enabling them to share their work in progress with peers • Taking this a step further leads to interactive authoring environments that support collaboration and annotation • As a result, objects lose their static nature and become ‘active nodes’ in a network of knowledge
Implications • The concept of ‘ownership’ of an artifact is loosened and partly replaced by an ongoing authoring process that spans persons, places, and time • Collaborative authoring raises an issue familiar to software developers: versioning of digital objects • All intermediate or working versions of artifacts, not just the final versions, should become part of the repository • Good Scientific Practice requires provenance data and versioning for objects
Outline • Motivation • Versioning Concepts in eSciDoc • Content Models • Technical Approach • Conclusion
Versioning on Object Level • Fedora’s basic object model – as defined in FOXML – consists of an identifier, some key descriptive properties and a set of datastreams • Currently, each change to a datastream creates a new version of that datastream, but not of the object itself • Authors and editors, on the other hand, perceive an object as one coherent entity, not as a set of datastreams • They therefore ask for ‘whole-object’ versioning that matches this mental model
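The gap between datastream-level and whole-object versioning can be sketched in a few lines. This is a minimal illustration, not Fedora's API. The `DigitalObject` class, its `modify` and `snapshot` methods, and the datastream IDs `DC` and `CONTENT` are all hypothetical. Timestamps are plain integers standing in for Fedora's datastream version dates.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical Fedora-like object: every datastream keeps its own version history."""
    pid: str
    # datastream ID -> list of (timestamp, content), oldest first
    datastreams: dict[str, list[tuple[int, str]]] = field(default_factory=dict)

    def modify(self, ds_id: str, content: str, timestamp: int) -> None:
        # Datastream-level versioning: only this one datastream gains a new version.
        self.datastreams.setdefault(ds_id, []).append((timestamp, content))

    def snapshot(self, at: int) -> dict[str, int]:
        """Whole-object view: latest version timestamp of each datastream at time `at`."""
        view = {}
        for ds_id, versions in self.datastreams.items():
            stamps = [ts for ts, _ in versions if ts <= at]
            if stamps:
                view[ds_id] = max(stamps)
        return view


obj = DigitalObject("demo:1")
obj.modify("DC", "title v1", timestamp=1)
obj.modify("CONTENT", "draft", timestamp=1)
obj.modify("CONTENT", "revised draft", timestamp=5)

print(obj.snapshot(at=3))  # {'DC': 1, 'CONTENT': 1}
print(obj.snapshot(at=5))  # {'DC': 1, 'CONTENT': 5}
```

Each call to `modify` versions only one datastream. The whole-object version that authors think in terms of has to be reconstructed, here by `snapshot`. Whole-object versioning makes that view a first-class, recorded entity.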
Fixed and Floating Object References • Scholarly work strongly relies on citations and external references to existing material (e.g. primary data and supplementary material) • In the context of digital repositories, these associations are expressed as object relations • Versioning of objects then raises the question of how to handle relations pointing to a versioned object • eSciDoc implements two approaches: fixed relations, which point to exactly one given version of an object, and floating relations, which always point to the latest version of an object
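Fixed and floating relations differ only in how they are resolved. The sketch below assumes a hypothetical in-memory `REPOSITORY` that maps each PID to its list of version IDs. It models a relation as a target PID plus an optional version ID: a set version makes the relation fixed, and an omitted one makes it floating. This illustrates the concept only; it is not eSciDoc's actual relation syntax.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical repository: PID -> list of version IDs, oldest first.
REPOSITORY: dict[str, list[str]] = {
    "item:1": ["1.0", "1.1", "1.2"],
}


@dataclass(frozen=True)
class Relation:
    target_pid: str
    # A fixed relation pins a version; a floating relation leaves it as None.
    version: Optional[str] = None

    @property
    def is_fixed(self) -> bool:
        return self.version is not None


def resolve(rel: Relation, repo: dict[str, list[str]]) -> tuple[str, str]:
    """Return the (PID, version ID) that the relation currently points to."""
    versions = repo[rel.target_pid]
    if rel.is_fixed:
        if rel.version not in versions:
            raise KeyError(f"{rel.target_pid} has no version {rel.version}")
        return rel.target_pid, rel.version
    return rel.target_pid, versions[-1]  # floating: always the latest version


citation = Relation("item:1", version="1.1")  # fixed
supplement = Relation("item:1")               # floating

print(resolve(citation, REPOSITORY))    # ('item:1', '1.1')
print(resolve(supplement, REPOSITORY))  # ('item:1', '1.2')

REPOSITORY["item:1"].append("1.3")      # the target object gets a new version
print(resolve(citation, REPOSITORY))    # ('item:1', '1.1'), unchanged
print(resolve(supplement, REPOSITORY))  # ('item:1', '1.3'), follows the update
```

A citation of a specific result would typically use a fixed relation. A link to supplementary material that is expected to evolve would use a floating one.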
Internal and Public Versions • Versions represent intermediate states of work and are visible only to the authors of digital objects • Revisions are published versions of objects with persistent identifiers • Creating a revision is an intellectual step that usually includes some form of quality assurance, whereas versioning is an automated process
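The split between automatic versions and deliberate revisions can be sketched as follows. The `ScholarlyObject` class and its methods are hypothetical. The `quality_checked` flag stands in for whatever editorial quality assurance precedes publication, and the persistent identifier string is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Version:
    number: int
    content: str
    revision_id: Optional[str] = None  # set only when the version is published


@dataclass
class ScholarlyObject:
    """Hypothetical object: saving creates versions automatically; revisions are explicit."""
    versions: list[Version] = field(default_factory=list)

    def save(self, content: str) -> Version:
        # Versioning is automatic: every save yields a new internal version.
        v = Version(number=len(self.versions) + 1, content=content)
        self.versions.append(v)
        return v

    def publish(self, persistent_id: str, quality_checked: bool) -> Version:
        # Creating a revision is a deliberate step, gated here by a QA flag.
        if not quality_checked:
            raise ValueError("revision requires quality assurance")
        latest = self.versions[-1]
        latest.revision_id = persistent_id
        return latest

    def visible_versions(self, is_author: bool) -> list[Version]:
        # Authors see every version; everyone else sees only revisions.
        if is_author:
            return list(self.versions)
        return [v for v in self.versions if v.revision_id is not None]


obj = ScholarlyObject()
obj.save("first idea")
obj.save("discussed with peers")
obj.publish("hypothetical-pid/rev-1", quality_checked=True)
obj.save("work in progress again")

print([v.number for v in obj.visible_versions(is_author=True)])   # [1, 2, 3]
print([v.number for v in obj.visible_versions(is_author=False)])  # [2]
```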
Container Objects • eSciDoc allows the grouping of objects by means of container objects like collections or bundles. • A change to one of the contained objects substantially changes the container object as well. Therefore, any change to a contained object should lead to a new version of the container object. • The same applies to revisioning: container objects are citable objects with their own persistent identifier. Revisioning of contained objects forces a new revision of the container object too.
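The propagation rule for containers can be sketched as two event handlers. The `Container` class, its integer version and revision counters, and the member PIDs are hypothetical simplifications. A real implementation would also assign the container its own persistent identifier for each revision.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """Hypothetical container whose version and revision track its members."""
    pid: str
    members: dict[str, int] = field(default_factory=dict)  # member PID -> member version
    version: int = 1
    revision: int = 0

    def member_changed(self, member_pid: str, member_version: int) -> None:
        # Any change to a member (or a newly added member) is a change to the container.
        self.members[member_pid] = member_version
        self.version += 1

    def member_revised(self, member_pid: str, member_version: int) -> None:
        # A new member revision forces a new container revision as well.
        self.member_changed(member_pid, member_version)
        self.revision += 1


collection = Container("collection:1", members={"item:1": 1, "item:2": 1})
collection.member_changed("item:2", 2)   # item:2 edited
collection.member_revised("item:1", 2)   # item:1 published as a revision
print(collection.version, collection.revision)  # 3 1
```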
Outline • Motivation • Versioning Concepts in eSciDoc • Content Models • Technical Approach • Conclusion
Content Models in General • An important part of implementing a Fedora repository is modeling the different classes or “genres” of digital objects that will be created, stored, and managed in the repository • A content model will typically describe the following: • Datastream composition • the number and kinds of datastreams that must be present in the digital object • the format(s) of those datastreams, either MIME types or format identifiers • whether each kind of datastream is required or optional • whether each kind of datastream has cardinality constraints • semantic identifiers for each kind of datastream • Relationships • in cases where a content model is a “graph” of related content models • Disseminators (optional)
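The datastream-composition part of a content model lends itself to mechanical checking. The following is a minimal sketch that validates an object's datastreams against required/optional flags, allowed MIME types and cardinality limits. `DatastreamSpec`, `validate` and the example model are hypothetical. Relationships and disseminators are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatastreamSpec:
    """One kind of datastream in a hypothetical content model."""
    kind: str                      # semantic identifier for this kind of datastream
    formats: frozenset[str]        # allowed MIME types
    required: bool = True
    max_count: Optional[int] = 1   # cardinality constraint; None means unbounded


def validate(specs: list[DatastreamSpec], datastreams: list[tuple[str, str]]) -> list[str]:
    """Check (kind, MIME type) pairs against the specs; return a list of problems."""
    problems = []
    known = {s.kind for s in specs}
    for kind, _mime in datastreams:
        if kind not in known:
            problems.append(f"unexpected datastream kind {kind!r}")
    for spec in specs:
        matches = [mime for kind, mime in datastreams if kind == spec.kind]
        if spec.required and not matches:
            problems.append(f"missing required datastream {spec.kind!r}")
        if spec.max_count is not None and len(matches) > spec.max_count:
            problems.append(f"too many {spec.kind!r} datastreams ({len(matches)} > {spec.max_count})")
        for mime in matches:
            if mime not in spec.formats:
                problems.append(f"{spec.kind!r} has disallowed format {mime}")
    return problems


model = [
    DatastreamSpec("content", frozenset({"application/pdf", "image/tiff"})),
    DatastreamSpec("metadata", frozenset({"text/xml"}), required=False, max_count=None),
]

print(validate(model, [("content", "application/pdf"), ("metadata", "text/xml")]))  # []
print(validate(model, [("metadata", "text/plain")]))
# ["missing required datastream 'content'", "'metadata' has disallowed format text/plain"]
```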
Structural View of Content Item [diagram: a Content Item with one set of Essential Properties (hasProperties), one eSciDoc Metadata record (hasDefaultMD), Metadata records (hasMD), Licenses (hasLicense), revisions (hasRevision) and Content Components (hasComponent); each Content Component carries its own CC Metadata (hasMD) and CC License (hasLicense)]
Content Item Modeled as Fedora Object [diagram: a Content Item object (datastreams RELS-EXT, eSciDoc MD, MD1 ... MDn, WOV MD) linked via hasComponent (*) to Content Component objects (datastreams RELS-EXT, CC MD, License1 ... Licensen, Content Stream)]
Container Modeled as Fedora Object [diagram: a Container object (datastreams RELS-EXT, eSciDoc MD, MD1 ... MDn, Structure Map, WOV MD) linked via hasMember (*) to Content Item objects (datastreams RELS-EXT, eSciDoc MD, MD1 ... MDn, WOV MD)]
Outline • Motivation • Versioning Concepts in eSciDoc • Content Models • Technical Approach • Conclusion
Whole-Object Versioning Metadata • Fedora versioning works automatically within objects • The eSciDoc middleware keeps track of whole-object versions via objectVersion metadata • The eSciDoc middleware can also tag particular whole-object versions as “revisions”, which serve as the official published views of the object
Animated View: Revision at t3
| | t0 | t1 | t2 | t3 | t4 |
|---|---|---|---|---|---|
| Content Item (PID: parent:1), VersionID | 1.0 | 1.1 | 1.2 | 1.3 | 1.4 |
| Content Item (PID: parent:1), DOI | -- | -- | -- | x.y/rev:1 | -- |
| CC1 (PID: child:1), Version | t0 | t0 | t0 | t0 | t4 |
| CC2 (PID: child:2), Version | t0 | t1 | t1 | t1 | t1 |
| CC3 (PID: child:3), Version | | | t2 | t2 | t2 |
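The Animated View timeline for parent:1 can be replayed with a small sketch of whole-object versioning metadata. Each whole-object version stores the version of every component it includes, and a revision is simply a version tagged with a DOI. The `WholeObjectVersioning` class and the "1.n" numbering scheme are hypothetical and only mirror the slide. The timeline shows the revision at t3 as its own version 1.3, so the sketch records a new version that carries the DOI.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ObjectVersion:
    version_id: str
    components: dict[str, str]   # component PID -> component version (timestamp label)
    doi: Optional[str] = None    # set only for revisions


@dataclass
class WholeObjectVersioning:
    """Hypothetical sketch of the WOV metadata kept for one parent object."""
    pid: str
    history: list[ObjectVersion] = field(default_factory=list)

    def record(self, changed: dict[str, str], doi: Optional[str] = None) -> ObjectVersion:
        # Start from the previous snapshot, then apply changed or newly added components.
        components = dict(self.history[-1].components) if self.history else {}
        components.update(changed)
        version = ObjectVersion(f"1.{len(self.history)}", components, doi)
        self.history.append(version)
        return version

    def revisions(self) -> list[ObjectVersion]:
        # Revisions are the tagged, officially published whole-object versions.
        return [v for v in self.history if v.doi is not None]


wov = WholeObjectVersioning("parent:1")
wov.record({"child:1": "t0", "child:2": "t0"})  # t0: version 1.0
wov.record({"child:2": "t1"})                   # t1: CC2 modified
wov.record({"child:3": "t2"})                   # t2: CC3 added
wov.record({}, doi="x.y/rev:1")                 # t3: tagged as a revision
wov.record({"child:1": "t4"})                   # t4: CC1 modified

for v in wov.history:
    print(v.version_id, v.doi or "--")
# 1.0 --
# 1.1 --
# 1.2 --
# 1.3 x.y/rev:1
# 1.4 --
```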
Object Version XML
<objectVersion versionID="1.0">
  <comment>this is the first whole object version</comment>
  <component PID="child:5" dateTime="2006-05-10T12:21:57Z"/>
  <component PID="child:6" dateTime="2006-05-10T12:21:57Z"/>
</objectVersion>
<objectVersion versionID="1.1" revisionID="doi:10.11.1234">
  <comment>child:5 is the same; child:6 modified; child:7 ingested</comment>
  <component PID="child:5" dateTime="2006-05-10T12:21:57Z"/>
  <component PID="child:6" dateTime="2006-08-11T09:23:09Z"/>
  <component PID="child:7" dateTime="2006-08-11T09:23:09Z"/>
</objectVersion>
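Each objectVersion entry records the dateTime of every component version it includes, so two whole-object versions can be compared mechanically. The sketch below uses Python's standard `xml.etree.ElementTree`. It assumes the two entries are wrapped in a hypothetical `<objectVersions>` root element, because the slide shows only a fragment. The PIDs in the comment are aligned with the component PIDs. `load_versions` and `diff` are illustrative helpers, not part of eSciDoc.

```python
import xml.etree.ElementTree as ET

# The two objectVersion entries, wrapped in a hypothetical root element.
WOV_XML = """
<objectVersions>
  <objectVersion versionID="1.0">
    <comment>this is the first whole object version</comment>
    <component PID="child:5" dateTime="2006-05-10T12:21:57Z"/>
    <component PID="child:6" dateTime="2006-05-10T12:21:57Z"/>
  </objectVersion>
  <objectVersion versionID="1.1" revisionID="doi:10.11.1234">
    <comment>child:5 is the same; child:6 modified; child:7 ingested</comment>
    <component PID="child:5" dateTime="2006-05-10T12:21:57Z"/>
    <component PID="child:6" dateTime="2006-08-11T09:23:09Z"/>
    <component PID="child:7" dateTime="2006-08-11T09:23:09Z"/>
  </objectVersion>
</objectVersions>
"""


def load_versions(xml_text: str) -> dict[str, dict[str, str]]:
    """Map each versionID to {component PID: dateTime}."""
    root = ET.fromstring(xml_text.strip())
    return {
        ov.get("versionID"): {c.get("PID"): c.get("dateTime") for c in ov.findall("component")}
        for ov in root.findall("objectVersion")
    }


def diff(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify components as unchanged, modified, ingested, or removed."""
    return {
        "unchanged": sorted(p for p in new if p in old and new[p] == old[p]),
        "modified": sorted(p for p in new if p in old and new[p] != old[p]),
        "ingested": sorted(p for p in new if p not in old),
        "removed": sorted(p for p in old if p not in new),
    }


versions = load_versions(WOV_XML)
print(diff(versions["1.0"], versions["1.1"]))
# {'unchanged': ['child:5'], 'modified': ['child:6'], 'ingested': ['child:7'], 'removed': []}
```

The computed diff agrees with the free-text comment on version 1.1, which suggests such comments could be generated rather than written by hand.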
Outline • Motivation • Versioning Concepts in eSciDoc • Content Models • Technical Approach • Conclusion
Conclusion • Versioning is essential for repositories that cover the whole object lifecycle • Fedora already comes with a powerful versioning mechanism, but it cannot fulfill all of eSciDoc’s requirements • Atomistic content models make versioning even more complex • The proposed approach meets advanced versioning requirements and at the same time demonstrates Fedora’s flexibility and adaptability
Acknowledgements The concepts in this presentation are based on • eSciDoc’s Logical Data Model, created by Natasa Bulatovic (ZIM, Max Planck Society) • a joint workshop of ZIM and FIZ with Sandy Payette and Carl Lagoze
Questions • matthias.razum@fiz-karlsruhe.de • www.escidoc-project.de/homepage.html