Interoperability of enhanced publications: the DRIVER Tech Watch Report • SUETR Interoperability Workshop, Tuesday, Dec 9th, 2008 • Karen Van Godtsenhoven, UGENT
Context: DRIVER-II Technology Watch • DRIVER-II project: create an EU repository infrastructure, build services on top of it, deliver software (D-NET), streamline developments (DRIVER guidelines, validator), support repository managers (helpdesk and mentor service) and raise awareness (Open Access) • DRIVER-II focuses on services and demonstrators for enhanced publications (EPs) • EPs can contain all kinds of data formats, but within DRIVER-II the basis is a textual element • Discovery work package: create an object model for EPs, a demonstrator and the Technology Watch report • Long Term Preservation, GRID computing, CRIS systems and
Interoperability of enhanced publications (Russell, Vanderfeesten, Hochstenbach, Van Godtsenhoven) • Interoperability in the DRIVER context: exchange and dissemination of enhanced publications as complex, compound objects • The interoperability chapter focuses on five types of structural metadata (the relationships between the files within an object), NOT on ingest or descriptive metadata (e.g. SWORD, BagIt) • For every type, a theoretical description and applications (case studies) are given, as well as an evaluation in the light of DRIVER.
Five formats for dissemination of enhanced publications • 1. Envelope models or packaging formats: METS, MPEG-21 DIDL, LOM/IMS-CP, ODF, OOXML, ... • 2. Overlays, maps, feeds: RDF, SWAP, POWDER, ORE • 3. Embedding formats: RDFa, Microformats • 4. New/old publishing formats: ODF, OOXML, CML, XHTML • 5. Web services: OKI (SOA), GData (ROA)
1. Envelopes • These formats provide access to the metadata, structural data, identifiers and binary streams of a publication, all in one package (= envelope). • 1. MPEG-21 DIDL in the DARE context • 2. METS • 3. IMS-CP • 4. ODF packages • 5. OOXML / Package convention • 6. Open e-book package
Envelopes, II • Comparison: table with all features • Outcome: all package formats are useful for representing an enhanced publication as a Dissemination Information Package. Most of this usefulness comes from the ability to express different relationships among the different parts. This gives DRIVER the opportunity to harvest enhanced publications packaged in the different formats used by different user communities. On an aggregated level, where all sources are harvested, it is possible to create relational maps between all sub-parts of the enhanced publications.
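To make the envelope idea concrete, here is a minimal sketch in Python of a METS-style package that lists the files of an enhanced publication and relates them in a structural map. The element names (`fileSec`, `fileGrp`, `file`, `FLocat`, `structMap`, `div`, `fptr`) and the two namespaces come from METS and XLink; the file IDs, labels and `example.org` URLs are made up for illustration, and the output has not been validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_envelope(parts):
    """Build a minimal METS-like envelope.

    `parts` is a list of (file_id, label, url) tuples; all values used
    below are hypothetical examples, not real repository content.
    """
    mets = ET.Element(f"{{{METS_NS}}}mets")
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp")
    struct_map = ET.SubElement(mets, f"{{{METS_NS}}}structMap")
    root_div = ET.SubElement(
        struct_map, f"{{{METS_NS}}}div", {"LABEL": "Enhanced publication"}
    )
    for file_id, label, url in parts:
        # File inventory: where each binary stream can be found.
        file_el = ET.SubElement(file_grp, f"{{{METS_NS}}}file", {"ID": file_id})
        ET.SubElement(
            file_el,
            f"{{{METS_NS}}}FLocat",
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": url},
        )
        # Structural metadata: how each file fits into the publication.
        div = ET.SubElement(root_div, f"{{{METS_NS}}}div", {"LABEL": label})
        ET.SubElement(div, f"{{{METS_NS}}}fptr", {"FILEID": file_id})
    return mets


EXAMPLE_PARTS = [
    ("F1", "Article text", "https://repository.example.org/ep1/article.pdf"),
    ("F2", "Dataset", "https://repository.example.org/ep1/data.csv"),
    ("F3", "Supplementary video", "https://repository.example.org/ep1/video.mp4"),
]

envelope = build_envelope(EXAMPLE_PARTS)
xml_text = ET.tostring(envelope, encoding="unicode")
print(xml_text)
```

The same inventory-plus-structure split appears, with different element names, in the other packaging formats listed above, which is what makes an aggregated relational map across formats plausible.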
2. Overlays, Maps, Feeds • These formats provide an overlay on top of an existing network of internet resources. They tend to group references to resources, identify them and describe the content, structure and relations of all parts • 1. SWAP • 2. ORE • 3. POWDER
2.1 SWAP • Scholarly Works Application Profile: a Dublin Core profile for describing scholarly works • Designed to offer a solution to a range of interoperability issues that arise when using simple DC • Supports provision of richer and more consistent metadata • Plus, e.g., version control and identification of full text • Based on FRBR; uses the DCMI Abstract Model/description sets • The hierarchical model could be suitable for DRIVER enhanced publications
However… • Despite much enthusiasm and support for the SWAP concept, no ‘proper’ implementation… • Requires commitment/resources to implement (people too busy trying to do the basics…) • Repository software developers need to implement first (currently have export plug-ins only) • Too complex? (FRBR…) • SWAP-Lite needed?! • DRIVER as aggregator – wait and see if uptake happens...
SWAP case studies (partial implementation) • University of Warwick • configured EPrints themselves to take SWAP records (not an easy task) • some problems encountered were viewed in the community as being caused by SWAP (not the case, e.g. Refs) • CLADDIER project • used for citations: selected a small number of SWAP fields • limited application
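The FRBR-based hierarchy behind SWAP can be sketched as nested Python data classes. The four entity names (ScholarlyWork, Expression, Manifestation, Copy) follow the SWAP model; the attribute names, the `is_full_text` flag and the sample URIs are assumptions made for this illustration and are not SWAP property names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Copy:
    uri: str
    is_full_text: bool = False  # hypothetical flag, not a SWAP property


@dataclass
class Manifestation:
    media_type: str
    copies: List[Copy] = field(default_factory=list)


@dataclass
class Expression:
    version: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class ScholarlyWork:
    title: str
    expressions: List[Expression] = field(default_factory=list)


def full_text_uris(work: ScholarlyWork, version: Optional[str] = None) -> List[str]:
    """Walk work -> expression -> manifestation -> copy and collect full-text URIs."""
    return [
        c.uri
        for e in work.expressions
        if version is None or e.version == version
        for m in e.manifestations
        for c in m.copies
        if c.is_full_text
    ]


work = ScholarlyWork(
    title="An example paper",
    expressions=[
        Expression("preprint", [Manifestation("application/pdf", [
            Copy("https://repository.example.org/1/preprint.pdf", True),
        ])]),
        Expression("published", [Manifestation("application/pdf", [
            Copy("https://publisher.example.com/1/abstract.html"),
            Copy("https://repository.example.org/1/final.pdf", True),
        ])]),
    ],
)

print(full_text_uris(work))
print(full_text_uris(work, version="published"))
```

The depth of this nesting is also a plausible reading of the "Too complex?" concern: even a simple full-text lookup has to traverse three levels.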
2.2 OAI-ORE • Version 1.0 just released • The collection of resources that make up a scholarly publication is called an Aggregation; its members are Aggregated Resources. To instantiate, describe and identify Aggregations, OAI-ORE defines Resource Maps, which also provide information about the context in which an Aggregation was defined. • OAI-ORE suggests several published models for ORE documents, using Atom, RDF/XML, OAI-PMH and RDFa. • Case studies include SCOPE, the TheoREM project, experiments at Urbana-Champaign, ORE serialization of objects based on the Fedora model, functional ORE • DRIVER and ORE need to exchange views and ensure interoperability, since ORE is a major player in the repository world.
ORE: an Aggregation containing three Aggregated Resources described by a Resource Map
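The situation in this figure can be written down as a handful of RDF triples. The sketch below, in plain Python without an RDF library, uses the ORE terms `ore:ResourceMap`, `ore:Aggregation`, `ore:describes` and `ore:aggregates`; all `example.org` URIs are invented for illustration, and a real Resource Map would carry more metadata (for instance creator and modification date) than shown here.

```python
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def resource_map_triples(rem_uri, aggregation_uri, resource_uris):
    """Return (subject, predicate, object) triples for a minimal Resource Map."""
    triples = [
        (rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        (rem_uri, ORE + "describes", aggregation_uri),
        (aggregation_uri, RDF_TYPE, ORE + "Aggregation"),
    ]
    # One ore:aggregates statement per Aggregated Resource.
    triples += [(aggregation_uri, ORE + "aggregates", r) for r in resource_uris]
    return triples


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Hypothetical identifiers for an enhanced publication with three parts.
triples = resource_map_triples(
    "https://repository.example.org/rem/ep1",
    "https://repository.example.org/aggregation/ep1",
    [
        "https://repository.example.org/ep1/article.pdf",
        "https://repository.example.org/ep1/data.csv",
        "https://repository.example.org/ep1/video.mp4",
    ],
)
ntriples = to_ntriples(triples)
print(ntriples)
```

Unlike an envelope, nothing is packaged here: the Resource Map only points at resources that stay where they are on the web.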
2.3 POWDER • POWDER, or Protocol for Web Description Resources (W3C working draft): describes a group of resources by publishing machine-readable metadata documents. Groups of resources (= aggregations) can be described as a whole, either by enumerating the individual items or by matching URIs against descriptions of the URI schemes used • Use case: trustmarks and verification (online safety) • POWDER makes it possible to describe many resources at once. This is the inverse of the ORE scenario: in ORE, one starts from an aggregation and wants to know which resources it contains and what their properties are; in POWDER, one starts from a resource, wants to know which aggregation it belongs to, and learns about the resource through that aggregation.
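The "start from a resource, find its group" direction can be illustrated with a few lines of Python. This is only a sketch of the matching idea, not POWDER syntax: the dictionary layout, the host and path-prefix rules and the description values are all assumptions made for this example.

```python
from urllib.parse import urlsplit

# Hypothetical group descriptions: each one applies to every URI whose host
# and path prefix match, so many resources are described at once.
DESCRIPTIONS = [
    {
        "hosts": {"repository.example.org"},
        "path_prefix": "/theses/",
        "description": {"type": "doctoral thesis", "access": "open"},
    },
    {
        "hosts": {"repository.example.org", "data.example.org"},
        "path_prefix": "/datasets/",
        "description": {"type": "research data", "access": "open"},
    },
]


def describe(uri, descriptions=DESCRIPTIONS):
    """Return the description of the first group whose URI pattern matches, else None."""
    parts = urlsplit(uri)
    for group in descriptions:
        if parts.hostname in group["hosts"] and parts.path.startswith(group["path_prefix"]):
            return group["description"]
    return None


print(describe("https://repository.example.org/theses/2008/42.pdf"))
print(describe("https://data.example.org/datasets/ep1/data.csv"))
print(describe("https://elsewhere.example.com/theses/1.pdf"))
```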
3. Embedding • Existing resources are ‘beautified’/enriched by adding semantic annotations. For example, the PDF link is embedded in a splash page with special markup highlighting its location. • Microformats community (W3C): widely used (Yahoo, Flickr) • Case study: Zotero (Urbana-Champaign), unAPI (clipboard-like content copy across sites and browsers) • Microformats could enable collecting references whilst working with DRIVER. • Easy to harvest from: machine-readable HTML annotations
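As an illustration of harvesting from such annotations, the sketch below parses a splash page with Python's standard `html.parser` and pulls out links that carry a marker class. The class name `fulltext` and the sample page are hypothetical and chosen only for this example; they are not taken from a microformat specification.

```python
from html.parser import HTMLParser

# Hypothetical splash page: the full-text link is flagged with class="fulltext".
SPLASH_PAGE = """
<html><body>
  <h1>An example paper</h1>
  <a href="https://repository.example.org/123/record">Record details</a>
  <a class="download fulltext" href="https://repository.example.org/123/paper.pdf">PDF</a>
</body></html>
"""


class FullTextLinkParser(HTMLParser):
    """Collect href values of <a> elements annotated with the marker class."""

    def __init__(self, marker="fulltext"):
        super().__init__()
        self.marker = marker
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if self.marker in classes and attr.get("href"):
            self.links.append(attr["href"])


parser = FullTextLinkParser()
parser.feed(SPLASH_PAGE)
print(parser.links)
```

Because the annotation lives in the page a human already reads, no separate metadata document has to be published or kept in sync.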
4. New publishing formats • ODF (ISO/IEC 26300:2006) versus OOXML (ISO/IEC 29500-1:2008) • ISO standard file formats for saving and exchanging office documents (alternative to proprietary formats, e.g. doc, ppt) • open up access to structured content which can be reused by other services, e.g. DRIVER • controversy surrounding the development of OOXML, e.g. Microsoft chose not to support the existing ODF standard • CML (Chemical Markup Language): disciplinary format for chemical structures • Plus: disciplinary XML types, structured and crawlable data
5. Web services • DRIVER needs to add APIs (in addition to OAI-PMH) on top of digital repositories to answer questions from agents about the content of the collections. • Web services are a very large field with two main approaches: ROA and SOA (DRIVER combines them: ROA externally and SOA internally) • Case studies: GData (ROA), Open Knowledge Initiative (SOA) • Outcome: widely used in the research as well as the e-learning community (OKI). DRIVER should follow up on developments and try to stay interoperable (GData).
Credits • Karen Van Godtsenhoven • University of Ghent • Karen.VanGodtsenhoven@ugent.be • Mikael Karstens Elbaek, DTU • Gert Schmeltz Pedersen, DTU • Barbara Sierman, KB • Maurice Vanderfeesten, SURF • Rosemary Russell, UKOLN
DRIVER II Project • http://www.driver-community.eu/ • Helpdesk: http://helpdesk.driver.research-infrastructures.eu • Mentor service: http://www.driver-support.eu/forms/contactsform.php?la=en • Supported by European Commission • Available for re-use -