
Sandy Payette Cornell Information Science

The Mellon-Funded Fedora Project A Briefing for the Los Alamos National Laboratory August 26, 2002. Sandy Payette Cornell Information Science. Motivation. The Problem of Complex Content. Some familiar objects. Digital Library Content not just documents. Complex, compound, dynamic objects.


Presentation Transcript


  1. The Mellon-Funded Fedora Project: A Briefing for the Los Alamos National Laboratory, August 26, 2002 • Sandy Payette, Cornell Information Science

  2. Motivation The Problem of Complex Content

  3. Some familiar objects • Digital Library Content: not just documents ... • Complex, compound, dynamic objects

  4. Key Research Questions • How can clients interact with heterogeneous collections of complex objects in a simple and interoperable manner? • How can complex objects be designed to be both generic and genre-specific at the same time? • How can we hide the complexity of an object’s underlying data structures and relationships from clients? • How can we associate services and tools with objects to provide different presentations or transformations of the object content? • How can we associate specialized, fine-grained access control policies with specific objects, or with groups of objects?

  5. The Flexible Extensible Digital Object Repository Architecture (FEDORA) • Developed as a DARPA and NSF-funded research project at Cornell (1997-present) • CORBA-based reference implementation • Extensive interoperability testing • Policy Enforcement • Interpreted and re-implemented at University of Virginia (1999) • Simple web-oriented implementation, focused on access to collections • Java servlet and relational db • Virginia prototype supported testbed of 10,000,000 digital objects with very good results (1999-2001) • Andrew W. Mellon Foundation granted Virginia and Cornell $1,000,000 to develop a full-featured production FEDORA system that is web-based (2002+)

  6. FEDORA Original Research Goals • Flexibility – object model that fits many different contexts • Management – of distributed digital content and services • Access – stable interfaces to digital objects; behavior-centric • Interoperability – among digital objects and repositories • Extensibility – easy evolution of object behaviors • Security – rights management and access control • Preservation – of content, plus “look and feel”

  7. Model for Collaboration Digital Library Research and Real Library Requirements • University of Virginia developing extensive digital collections since 1992 • Virginia Digital Library R&D Group chartered with finding solution for integration • Formal Requirements analysis • Search for commercial products • Discovery: Cornell research parallels stated requirements

  8. Virginia Requirements: Heterogeneous Digital Collections

  9. Virginia Requirements: Managing the Collections • Scalability to support hundreds of millions of objects • Persistent unique names for all resources without respect to machine address • Support inter-relationships among objects • Manage the digital resources and metadata, as well as computer programs, services and tools that support them • Enforce appropriate policies for use of Library resources • Provide a high level of security • Support preservation activities appropriately

  10. Virginia Requirements: Delivering the Collections • Well-architected, flexible relationships between services/tools and digital content • Digital objects, themselves, have ability to provide users with an appropriate launch-pad or tool to use the object content • Every resource can be used in any number of contexts • Move towards a digital library that is configurable by an “aware” user • Provide resource discovery (searching) across the full collection • Deep searching in particular collections

  11. Shortcomings of commercial digital library products • Narrow focus on specific media formats (e.g. image databases, document management) • Fail to effectively address interrelationships among digital entities • Fail to address interoperability; no open interfaces to facilitate sharing of services; no standard protocols for cross-system interoperability • Fail to provide facilities for managing programs and tools that are integral to delivering digital content • Not extensible; do not enable easy integration of new tools and services

  12. The Fedora Architecture Overview of Basic Model

  13. FEDORA Basic Architectural Abstractions • Digital Object: container for aggregating any digital content; content disseminations based on behavior definitions; extensibility of behavior mechanisms • Repository: service layer for “contained” Digital Objects; object lifecycle management; access management

  14. FEDORA Digital Object • Persistent ID (PID): globally unique persistent id • Disseminators (public view): access methods for obtaining “disseminations” of digital object content • System Metadata (internal view): metadata necessary to manage the object • Datastreams (protected view): content that makes up the “basis” of the object
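
The four parts of a digital object named on this slide can be pictured as a small data structure. The following is a minimal Python sketch, not Fedora's actual implementation: the class names, field names, example PID, and example URL are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Datastream:
    """Protected view: one piece of content forming the basis of the object."""
    ds_id: str
    location: str  # pointer to the content (hypothetical field)


@dataclass
class Disseminator:
    """Public view: names the access methods that clients may call."""
    behavior_definition: str
    methods: List[str]


@dataclass
class DigitalObject:
    pid: str  # globally unique persistent id
    system_metadata: Dict[str, str] = field(default_factory=dict)  # internal view
    datastreams: Dict[str, Datastream] = field(default_factory=dict)
    disseminators: Dict[str, Disseminator] = field(default_factory=dict)

    def public_methods(self) -> List[str]:
        """Clients see only method names, never the datastreams themselves."""
        return sorted(m for d in self.disseminators.values() for m in d.methods)


obj = DigitalObject(pid="example:1")  # hypothetical PID
obj.datastreams["basis1"] = Datastream("basis1", "http://example.org/thumb.jpg")
obj.disseminators["web_image1"] = Disseminator("web_image", ["get_thumb", "get_high"])
print(obj.public_methods())
```

Keeping datastreams and disseminators in separate fields mirrors the slide's split between the protected view and the public view.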

  15. FEDORA Digital Object Architecture • Data Object: Persistent ID (PID), System Metadata, Datastreams, Disseminators • Behavior Definition Object: Persistent ID (PID), System Metadata, Datastreams (Service Definition Metadata) • Behavior Mechanism Object: Persistent ID (PID), System Metadata, Datastreams (Service Binding Metadata)

  16. Data Object Association to External Behavior Service

  17. Digital Object Interoperability: Common Behaviors for Variable Content • Functional equivalency

  18. Digital Object Extensibility: Adding New Behaviors • The same underlying content can be operated on in novel ways to create new disseminations not originally conceived of • Diagram labels: Book, Photo Collection, Digital Object

  19. Virginia Prototype Content Models and Fedora Demos

  20. General Image Content Model (Mycenae image example) • Persistent ID (PID) • Disseminator web_image1 (Behavior Definition: web_image; Behavior Mechanism: web_image1): get_thumb → HTTP GET; get_med → imagedisplay.java; get_high → HTTP GET; get_veryhigh → HTTP GET • Disseminator web_default_image (Behavior Definition: web_default; Behavior Mechanism: web_default_image): get_as_page → imagedisplay.java; get_in_context → HTTP GET (thumb) • System Metadata • Metadata: admin (administrative metadata); desc (descriptive metadata) • Datastreams: basis1 (pointer to thumbnail size image); basis2 (pointer to medium resolution image); basis3 (pointer to high resolution image); basis4 (pointer to highest resolution image)

  21. MrSID Image Content Model (Pavilion III image example) • Persistent ID (PID) • Disseminator web_image_mrsid (Behavior Definition: web_image; Behavior Mechanism: web_image_mrsid): get_thumb → get_image.pl; get_med → get_image.pl; get_high → get_image.pl; get_veryhigh → get_image.pl • Disseminator web_default_image (Behavior Definition: web_default; Behavior Mechanism: web_default_image): get_as_page → get_image.pl; get_in_context → get_image.pl • System Metadata • Metadata: admin (administrative metadata); desc (descriptive metadata) • Datastreams: basis1 (pointer to MrSID formatted image)
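
The General Image and MrSID content models share the web_image behavior definition but bind it to different mechanisms, which is the "functional equivalency" idea from slide 17. The sketch below illustrates that idea in Python under stated assumptions: the PIDs, the returned strings, and the `disseminate` helper are hypothetical, and only two of the four image methods are shown.

```python
from typing import Callable, Dict

# A mechanism maps each method of a behavior definition to an implementation.
Mechanism = Dict[str, Callable[[str], str]]

# Plain images: each resolution is a separate datastream fetched over HTTP.
web_image1: Mechanism = {
    "get_thumb": lambda pid: f"HTTP GET thumbnail datastream of {pid}",
    "get_high": lambda pid: f"HTTP GET high-resolution datastream of {pid}",
}

# MrSID images: one datastream, every size rendered by the same script.
web_image_mrsid: Mechanism = {
    "get_thumb": lambda pid: f"get_image.pl renders thumbnail from MrSID datastream of {pid}",
    "get_high": lambda pid: f"get_image.pl renders high resolution from MrSID datastream of {pid}",
}

# Which mechanism each (hypothetical) object is bound to.
bindings: Dict[str, Mechanism] = {
    "demo:mycenae": web_image1,
    "demo:pavilion3": web_image_mrsid,
}


def disseminate(pid: str, method: str) -> str:
    """Run a behavior method without the client knowing the mechanism."""
    mechanism = bindings[pid]
    if method not in mechanism:
        raise ValueError(f"{pid} does not support {method}")
    return mechanism[method](pid)


for pid in bindings:
    print(disseminate(pid, "get_thumb"))
```

A client calls `get_thumb` the same way on both objects; only the binding table knows which implementation runs.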

  22. Finding Aid Content Model (Finding Aid example) • Persistent ID (PID) • Disseminator web_ead1 (Behavior Definition: web_ead; Behavior Mechanism: web_ead1): get_web_default → eaddoc.java; get_tp → tp.xsl; get_admin → admin.xsl; get_summary → summary.xsl; get_scopecontent → scopecontent.xsl; get_bioghist → bioghist.xsl; get_component → component.xsl; get_arrangement → arrangement.xsl; get_organization → organization.xsl; get_document → document.xsl; get_menu → menu.xsl • Disseminator web_default_ead1 (Behavior Definition: web_default; Behavior Mechanism: web_default_ead1): get_as_page → eaddoc.java; get_in_context → document.xsl • System Metadata • Metadata: admin (administrative metadata); desc (descriptive metadata) • Datastreams: basis1 (pointer to XML Finding Aid source)

  23. TEI Letter Content Model (TEI letter example) • Persistent ID (PID) • Disseminator web_teiletter1 (Behavior Definition: web_teiletter; Behavior Mechanism: web_teiletter1): get_teiletter_default → teiletterdoc.pl; get_original → letter.header.xsl; get_modern → modern.xsl; get_teiheader → teiheader.xsl; get_pageimages → pageimages.xsl • Disseminator web_default_teiletter (Behavior Definition: web_default; Behavior Mechanism: web_default_teiletter): get_as_page → teiletterdoc.pl; get_in_context → letter.header.xsl • System Metadata • Metadata: admin (administrative metadata); desc (descriptive metadata) • Datastream(s): basis1 (pointer to XML TEI letter source)

  24. TEI Book Content Model (TEI book example) • Persistent ID (PID) • Disseminator web_teibook1 (Behavior Definition: web_teibook; Behavior Mechanism: web_teibook1): get_web_default → teidoc.java; get_teiheader → admin.xsl; get_toc → contents.xsl; get_menu_teibook → menu.xsl; get_tp_teibook → tp.xsl; get_id → id.xsl • Disseminator web_default_teibook (Behavior Definition: web_default; Behavior Mechanism: web_default_teibook): get_as_page → teidoc.java; get_in_context → contents.xsl • System Metadata • Metadata: admin (administrative metadata); desc (descriptive metadata) • Datastreams: basis1 (pointer to XML TEI book source)

  25. GDMS Content Model (Mycenae example) (lawn example) • Persistent ID (PID) • Disseminator web_gdms2 (Behavior Definition: web_gdms; Behavior Mechanism: web_gdms2): get_web_default → imagedef.java; get_gdmswalk → gdmswalk.xsl; get_menu → imagemenu.xsl • Disseminator web_default_gdms (Behavior Definition: web_default; Behavior Mechanism: web_default_gdms): get_as_page → imagedef.java; get_in_context → HTTP GET • System Metadata • Metadata: admin (administrative metadata); desc (descriptive metadata) • Datastream: basis1 (pointer to XML GDMS source file)

  26. Numerical Data Content Model (ICPSR survey example)

  27. The New FEDORA Technical Specifications – Part I

  28. Background Material Overview of Web Service Technologies

  29. What is a Web Service? • A distributed application that runs over the internet. • An addressable network endpoint which receives structured messages and returns structured responses. • A web application that publishes an open interface through which clients can send requests and receive responses.

  30. How is this different from plain old web applications? • Formally defined API (application programming interface) defines a set of abstract operations for a web service • Published bindings for client to run operations • Standard protocol for invoking operations on the service. • XML as standard means of encoding service requests and responses.

  31. Why are Web Services important? • Interoperability • Web applications can interact and build upon each other • Data is transferred in an interoperable manner (e.g., over HTTP) • Data is encoded in an interoperable format (XML) • Works in decentralized, distributed, operating-system independent environment. • Standards-oriented • Means to expose complex operations with rich data typing (via XML Schema language typing) • Ease of integrating distributed systems via the Web • W3C effort to develop this service architecture

  32. How are Web Services Implemented? • The Simple Object Access Protocol (SOAP) Approach • SOAP is a messaging protocol that can run over different transport protocols (e.g., HTTP, SMTP) • Operation oriented (send a request to an endpoint) • Like CORBA, RMI, DCOM…but for Web and simpler • Application APIs can be defined and published using the Web Service Description Language (WSDL) • Requests and responses sent as XML messages • Supports simple and complex data typing in requests and responses • Supports transmission of binary data within request or response packages

  33. How are Web Services Implemented? • The REST (Representational State Transfer) Approach • URI + HTTP + XML • URI/resource driven; message built into a URI (URL) • HTTP GET or POST • Response is XML data • Issues: • Not a standard, but a style of doing web apps; arguably it just gives a fancy name to how lots of people do applications on the web by default; nothing really new here; just argues to do things the way we have been, maybe a little more standard by using XML. • Fragile service definition – URLs change • No data typing on requests • Limited ability to transmit complex requests on URL • W3C behind SOAP, but only one strong voice out there for REST (Prescod).
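
To make "message built into a URI" concrete, here is a minimal Python sketch using only the standard library. The endpoint, the parameter names (`pid`, `method`, `width`), and the example values are assumptions for illustration and do not come from the presentation.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://repository.example.org/get"  # hypothetical endpoint


def build_request_url(pid: str, method: str, **params: str) -> str:
    """Encode the whole request (object, operation, arguments) in the URL."""
    query = urlencode({"pid": pid, "method": method, **params})
    return f"{BASE}?{query}"


url = build_request_url("demo:1", "get_thumb", width="100")
print(url)

# The server gets every argument back as an untyped string,
# which is the "no data typing on requests" issue listed on the slide.
print(parse_qs(urlsplit(url).query))
```

Because the request lives entirely in the URL, it can be issued with a plain HTTP GET, but anything more structured than a flat list of string parameters is awkward to express.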

  34. Example of Web Service using SOAP • My Application sends a SOAP Request (XML) over SOAP/HTTP to the Google Web Service: doSpellingSuggestion(payet) • The Google Web Service returns a SOAP Response (XML) over SOAP/HTTP: payette

  35. XML SOAP Request
      <?xml version="1.0" encoding="UTF-8"?>
      <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
          xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
          xmlns:xsd="http://www.w3.org/1999/XMLSchema">
        <SOAP-ENV:Body>
          <m:doSpellingSuggestion xmlns:m="urn:GoogleSearch">
            <key>/e325JlNPASJu</key>
            <phrase>payet</phrase>
          </m:doSpellingSuggestion>
        </SOAP-ENV:Body>
      </SOAP-ENV:Envelope>

  36. XML SOAP Response
      <?xml version="1.0" encoding="UTF-8"?>
      <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
          xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
          xmlns:xsd="http://www.w3.org/1999/XMLSchema">
        <SOAP-ENV:Body>
          <ns1:doSpellingSuggestionResponse xmlns:ns1="urn:GoogleSearch"
              SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
            <return xsi:type="xsd:string">payette</return>
          </ns1:doSpellingSuggestionResponse>
        </SOAP-ENV:Body>
      </SOAP-ENV:Envelope>
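
A client has to unwrap the SOAP envelope to reach the returned value. The following is a small Python sketch of that step using the standard library's `xml.etree.ElementTree`; the function name `extract_suggestion` is an assumption, and the embedded message is the response from this slide with the element name and its namespace declaration separated by a space so that it parses.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
GOOGLE_NS = "urn:GoogleSearch"

RESPONSE = b"""<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:doSpellingSuggestionResponse xmlns:ns1="urn:GoogleSearch"
        SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <return xsi:type="xsd:string">payette</return>
    </ns1:doSpellingSuggestionResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""


def extract_suggestion(soap_bytes: bytes) -> str:
    """Walk Envelope -> Body -> response element -> <return> and read its text."""
    root = ET.fromstring(soap_bytes)
    path = f"{{{SOAP_ENV}}}Body/{{{GOOGLE_NS}}}doSpellingSuggestionResponse/return"
    node = root.find(path)
    if node is None or node.text is None:
        raise ValueError("no <return> element in SOAP body")
    return node.text


suggestion = extract_suggestion(RESPONSE)
print(suggestion)
```

The lookup uses full namespace URIs rather than the `SOAP-ENV` and `ns1` prefixes, since prefixes are arbitrary and may differ between messages.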

  37. New Fedora: Key Features • Repository system exposed as two related Web services • described using WSDL • both SOAP and HTTP bindings • Digital objects encoded and stored as XML using Metadata Encoding and Transmission Standard (METS) • Digital object behaviors implemented as linkages to distributed web services (also described using WSDL) • Digital objects support versioning of both content and services.

  38. New Fedora System

  39. Web Service Communication View

  40. The New FEDORA Encoding Digital Objects in XML

  41. Metadata Encoding and Transmission Standard (METS) • XML “standard” for encoding descriptive, administrative, and structural metadata of digital library objects • Developed under auspices of the Digital Library Federation • METS standard maintained by the Network Development and MARC Standards Office of the Library of Congress http://www.loc.gov/standards/mets/

  42. METS Schema • METS is written in the XML Schema Language • METS defines four sections for an object • Descriptive metadata • Administrative metadata • File group • Structure map • METS goals include: • Facilitate management of objects within a repository • Provide a standard format for exchange of objects between repositories • Provide standard format for transmission of objects to users for rendering (via tools or applications)
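
The four METS sections listed above can be seen in a skeleton document. This Python sketch builds one with the standard library; the `ID` values, the `OBJID` value, and the helper names are placeholders, and the result is a structural outline rather than a complete or schema-validated METS record.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets_skeleton(obj_id: str) -> ET.Element:
    root = ET.Element(q("mets"), {"OBJID": obj_id})
    ET.SubElement(root, q("dmdSec"), {"ID": "DMD1"})       # descriptive metadata
    ET.SubElement(root, q("amdSec"), {"ID": "AMD1"})       # administrative metadata
    file_sec = ET.SubElement(root, q("fileSec"))
    ET.SubElement(file_sec, q("fileGrp"), {"ID": "FG1"})   # file group
    struct_map = ET.SubElement(root, q("structMap"))       # structure map
    ET.SubElement(struct_map, q("div"))
    return root


skeleton = build_mets_skeleton("example:1")  # hypothetical object id
print(ET.tostring(skeleton, encoding="unicode"))
```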

  43. Mapping Fedora to METS

  44. Mapping Fedora to METS

  45. Digital Object Versioning • Versioning within Data Objects • Datastream versioning • Date/time stamped • New version every time datastream is modified • Disseminator versioning • Date/time stamped • New version if disseminator is modified to reference a different Behavior Mechanism (“better mousetrap”) • Versioning within Behavior Definition and Mechanism Objects • New versions of WSDL metadata recorded in these objects (with date/time stamps) • This deserves much more explanation than this slide can offer!
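
The datastream rule on this slide (a new date/time-stamped version on every modification) can be sketched as an append-only list. This Python example is an illustration only: the class names are hypothetical, and the `as_of` lookup is an assumption about how the stamps might be used, not something the slide states.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class DatastreamVersion:
    created: datetime
    location: str


@dataclass
class VersionedDatastream:
    ds_id: str
    versions: List[DatastreamVersion] = field(default_factory=list)

    def modify(self, location: str, when: Optional[datetime] = None) -> None:
        """Every modification appends a new date/time-stamped version."""
        stamp = when or datetime.now(timezone.utc)
        self.versions.append(DatastreamVersion(stamp, location))

    def as_of(self, when: datetime) -> Optional[DatastreamVersion]:
        """Return the newest version created at or before `when`, if any."""
        candidates = [v for v in self.versions if v.created <= when]
        return max(candidates, key=lambda v: v.created) if candidates else None


t1 = datetime(2002, 8, 1, tzinfo=timezone.utc)
t2 = datetime(2002, 8, 20, tzinfo=timezone.utc)
ds = VersionedDatastream("basis1")
ds.modify("http://example.org/image-v1.jpg", when=t1)
ds.modify("http://example.org/image-v2.jpg", when=t2)
print(ds.as_of(datetime(2002, 8, 10, tzinfo=timezone.utc)))
```

Old versions are never overwritten, so a request dated between the two modifications still resolves to the first version.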

  46. METS: Sample Fedora Object (image digital object)

  47. Fedora Dissemination Database • Alternate form of object storage that will act as a cache of most recent versions of digital objects • Ensure high-performance access (disseminations) • Repository system replicates from authoritative XML version of objects to relational database • Plan to phase out the database in Phase 2-3: • Access sub-system to work completely off the XML storage, as XML tools improve performance-wise. • Pursue different caching strategies as necessary

  48. The New FEDORA Repository System Design

  49. Fedora Repository System

  50. FEDORA Web Service API Definitions • “API-M” – interface for management sub-system • Operations necessary to create and maintain objects and their components • Interface directly with authoritative XML version of object • “API-A” – interface for access sub-system • Operations necessary for clients to perform disseminations on objects in the repository • No direct access to object internal structure or components • Will work against cached representation of object to optimize performance.
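
The split between the two interfaces can be sketched as two classes over one repository: the management side writes the authoritative copy and replicates it, while the access side reads only the cache and exposes methods rather than datastreams. This is a minimal Python illustration; the operation names (`ingest`, `modify_datastream`, `get_dissemination`), the in-memory dictionaries, and the method-to-datastream bindings are assumptions, not the actual API-M or API-A definitions.

```python
from typing import Dict


class Repository:
    """Holds the authoritative objects plus a cache replicated from them."""

    def __init__(self) -> None:
        self.authoritative: Dict[str, Dict[str, str]] = {}
        self.cache: Dict[str, Dict[str, str]] = {}


class ManagementAPI:
    """API-M sketch: create and maintain objects and their components."""

    def __init__(self, repo: Repository) -> None:
        self._repo = repo

    def ingest(self, pid: str, datastreams: Dict[str, str]) -> None:
        self._repo.authoritative[pid] = dict(datastreams)
        self._replicate(pid)

    def modify_datastream(self, pid: str, ds_id: str, content: str) -> None:
        self._repo.authoritative[pid][ds_id] = content
        self._replicate(pid)

    def _replicate(self, pid: str) -> None:
        # Copy the latest state to the cache that the access side reads.
        self._repo.cache[pid] = dict(self._repo.authoritative[pid])


class AccessAPI:
    """API-A sketch: clients name a method, never a datastream."""

    def __init__(self, repo: Repository, bindings: Dict[str, str]) -> None:
        self._repo = repo
        self._bindings = bindings  # method name -> datastream id (hypothetical)

    def get_dissemination(self, pid: str, method: str) -> str:
        ds_id = self._bindings[method]
        return self._repo.cache[pid][ds_id]


repo = Repository()
api_m = ManagementAPI(repo)
api_a = AccessAPI(repo, {"get_thumb": "basis1"})
api_m.ingest("demo:1", {"basis1": "thumbnail-v1"})
api_m.modify_datastream("demo:1", "basis1", "thumbnail-v2")
print(api_a.get_dissemination("demo:1", "get_thumb"))
```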
