The Mellon-Funded Fedora Project: A Presentation to the European Digital Library Conference, September 17, 2002, by Sandy Payette and Thornton Staples
The Flexible Extensible Digital Object Repository Architecture (FEDORA) • Developed as a DARPA- and NSF-funded research project at Cornell (1997-present) • Interpreted and re-implemented at the University of Virginia (1999) • Virginia prototype supported a testbed of 10,000,000 digital objects with very good results (1999-2001) • The Andrew W. Mellon Foundation granted Virginia and Cornell $1,000,000 to develop a full-featured, web-based production FEDORA system (2002+)
Managing the Collection • Provide a way to universally name all resources without respect to machine address • Track all files for resources, metadata and computer programs consistently • Enforce appropriate policies for use of Library resources • Manage resources in all media and content types • Support preservation activities appropriately
Delivering the Collection • Deliver tools with content for all media and content types • Allow every resource to be used in any number of contexts • Interoperate with other digital libraries • Move toward a library that aware users can configure for themselves
Supporting Digital Scholarship • Supporting the creation of digital scholarly projects • Collecting born-digital scholarly projects • For preservation • Taking over responsibility for primary delivery • Supporting information communities
Shortcomings of commercial digital library products • Narrow focus on specific media formats (e.g. image databases, document management) • Fail to effectively address interrelationships among digital entities • Fail to address interoperability; no open interfaces to facilitate sharing of services; no standard protocols for cross-system interoperability • Fail to provide facilities for managing programs and tools that are integral to delivering digital content. • Not extensible; do not enable easy integration of new tools and services
The Current Project • An efficient, scalable, freely distributable FEDORA repository system ASAP • A complete basic management interface with the initial release • Add important digital library functionality in later releases • Multiple testbed repositories to deploy and evaluate the software • Make all software open source
Deployment Group • The Digital Library group, Indiana University • The Humanities Computing group, New York University • The Digital Collections and Archives Department, Tufts University • The Humanities Computing group, King's College London • The Refugee Studies Center, Oxford University • Audio/Video Project, Library of Congress • Library group, Los Alamos National Laboratory
The Fedora Architecture: Overview of Basic Model
FEDORA Basic Architectural Abstractions • Digital Object: container for aggregating any digital content • Content disseminations based on behavior definitions • Extensibility of behavior mechanisms • Repository: service layer for “contained” Digital Objects • Object lifecycle management • Access management
FEDORA Digital Object • Persistent ID (PID): globally unique persistent identifier • Disseminators (public view): access methods for obtaining “disseminations” of digital object content • System Metadata (internal view): metadata necessary to manage the object • Datastreams (protected view): content that makes up the “basis” of the object
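The three views of a digital object can be sketched as a small Python class. This is an illustrative model only, not FEDORA's implementation: the class and method names (DigitalObject, add_datastream, disseminate) are hypothetical, and the "protected" datastreams are private by Python naming convention only. Clients ask for a named dissemination; the object decides which datastreams feed it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A dissemination method receives the object's datastreams and returns content.
DisseminationMethod = Callable[[Dict[str, bytes]], bytes]


@dataclass
class DigitalObject:
    pid: str                                                       # globally unique persistent ID
    system_metadata: Dict[str, str] = field(default_factory=dict)  # internal view
    # Protected view: the "basis" content. The leading underscore signals that
    # clients are not meant to read it directly (a convention, not enforcement).
    _datastreams: Dict[str, bytes] = field(default_factory=dict)
    # Public view: named access methods that produce disseminations.
    _disseminators: Dict[str, DisseminationMethod] = field(default_factory=dict)

    def add_datastream(self, ds_id: str, content: bytes) -> None:
        self._datastreams[ds_id] = content

    def add_disseminator(self, name: str, method: DisseminationMethod) -> None:
        self._disseminators[name] = method

    def list_disseminations(self) -> List[str]:
        return sorted(self._disseminators)

    def disseminate(self, name: str) -> bytes:
        # The client supplies only a dissemination name; the datastreams
        # themselves never leave the object except through this call.
        return self._disseminators[name](self._datastreams)
```

For example, `obj = DigitalObject("101")`, then `obj.add_datastream("basis1", b"thumbnail bytes")` and `obj.add_disseminator("get_thumb", lambda ds: ds["basis1"])`; afterwards `obj.disseminate("get_thumb")` yields the thumbnail without the caller knowing it lives in `basis1`.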
FEDORA Digital Object Architecture • Data Object: Persistent ID (PID), System Metadata, Datastreams, Disseminators • Behavior Definition Object: Persistent ID (PID), System Metadata, Datastreams (Service Definition Metadata) • Behavior Mechanism Object: Persistent ID (PID), System Metadata, Datastreams (Service Binding Metadata)
Digital Object Interoperability: Common Behaviors for Variable Content • Functional equivalency
Virginia Prototype: Content Models and Fedora Demos
General Image Content Model (Mycenae image example) • Persistent ID (PID) • Disseminator web_image1: Behavior Definition web_image, Behavior Mechanism web_image1 (get_thumb: HTTP GET; get_med: imagedisplay.java; get_high: HTTP GET; get_veryhigh: HTTP GET) • Disseminator web_default_image: Behavior Definition web_default, Behavior Mechanism web_default_image (get_as_page: imagedisplay.java; get_in_context: HTTP GET (thumb)) • System Metadata: admin (administrative metadata); desc (descriptive metadata) • Datastreams: basis1 (pointer to thumbnail-size image); basis2 (pointer to medium-resolution image); basis3 (pointer to high-resolution image); basis4 (pointer to highest-resolution image)
MrSID Image Content Model (Pavilion III image example) • Persistent ID (PID) • Disseminator web_image_mrsid: Behavior Definition web_image, Behavior Mechanism web_image_mrsid (get_thumb, get_med, get_high, get_veryhigh: all via get_image.pl) • Disseminator web_default_image: Behavior Definition web_default, Behavior Mechanism web_default_image (get_as_page, get_in_context: both via get_image.pl) • System Metadata: admin (administrative metadata); desc (descriptive metadata) • Datastreams: basis1 (pointer to MrSID-formatted image)
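The two image content models illustrate functional equivalency: both objects subscribe to the same web_image Behavior Definition, but each binds it to a different Behavior Mechanism. The Python sketch below models that idea under stated assumptions. The class names are hypothetical, mechanisms are plain functions that return a description of the call they would make, the datastream paths are invented, and the `size=` argument to get_image.pl is a made-up parameter.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

Datastreams = Dict[str, str]            # datastream ID -> pointer (location)
Binding = Callable[[Datastreams], str]  # how a mechanism fulfills one method


@dataclass(frozen=True)
class BehaviorDefinition:
    bdef_id: str
    methods: FrozenSet[str]  # abstract method names clients may request


@dataclass
class BehaviorMechanism:
    bmech_id: str
    bdef: BehaviorDefinition
    bindings: Dict[str, Binding]

    def __post_init__(self) -> None:
        # A mechanism must implement every method its definition declares.
        missing = self.bdef.methods - set(self.bindings)
        if missing:
            raise ValueError(f"{self.bmech_id} lacks {sorted(missing)}")


def disseminate(datastreams: Datastreams, mech: BehaviorMechanism, method: str) -> str:
    if method not in mech.bdef.methods:
        raise KeyError(f"{method} is not defined by {mech.bdef.bdef_id}")
    return mech.bindings[method](datastreams)


WEB_IMAGE = BehaviorDefinition(
    "web_image", frozenset({"get_thumb", "get_med", "get_high", "get_veryhigh"}))

# General image model: four separate image files, mostly fetched by HTTP GET.
web_image1 = BehaviorMechanism("web_image1", WEB_IMAGE, {
    "get_thumb": lambda ds: f"HTTP GET {ds['basis1']}",
    "get_med": lambda ds: f"imagedisplay.java {ds['basis2']}",
    "get_high": lambda ds: f"HTTP GET {ds['basis3']}",
    "get_veryhigh": lambda ds: f"HTTP GET {ds['basis4']}",
})

# MrSID model: one file; a script extracts each size (size= is hypothetical).
web_image_mrsid = BehaviorMechanism("web_image_mrsid", WEB_IMAGE, {
    name: (lambda ds, size=name[4:]: f"get_image.pl {ds['basis1']} size={size}")
    for name in sorted(WEB_IMAGE.methods)
})

mycenae = {"basis1": "mycenae/thumb.jpg", "basis2": "mycenae/med.jpg",
           "basis3": "mycenae/high.jpg", "basis4": "mycenae/veryhigh.jpg"}
pavilion = {"basis1": "pavilion3.sid"}
```

A client calling `disseminate(..., "get_thumb")` gets a thumbnail from either object without knowing whether it came from a stored JPEG or from an on-the-fly MrSID extraction; only the mechanism differs.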
Numerical Data Content Model (ICPSR survey example)
The New FEDORA: Repository System Implementation
Web Services and XML • Web Service: a web application that publishes an open interface through which clients can send structured messages and receive structured responses • Simple Object Access Protocol (SOAP) • SOAP is a messaging protocol that can run over different transport protocols (e.g., HTTP, SMTP) • Requests and responses are sent as XML messages • Web Services Description Language (WSDL) • An XML-based language used to formally define APIs (application programming interfaces) as a set of abstract operations and service bindings • Uses XML Schema to support simple and complex data types in requests and responses
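To make "requests and responses sent as XML messages" concrete, here is a minimal Python sketch that builds and parses a SOAP 1.1 envelope with the standard library. The envelope namespace is the real SOAP 1.1 one; the service namespace `urn:example:fedora-api-a` and the parameter element names are hypothetical stand-ins, not the actual Fedora WSDL. The sketch covers only the message, not the HTTP or SMTP transport.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
API_NS = "urn:example:fedora-api-a"                     # hypothetical service namespace

ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("api", API_NS)


def build_soap_request(operation: str, params: Dict[str, str]) -> str:
    """Wrap an operation and its parameters in a SOAP Envelope/Body."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    op = ET.SubElement(body, f"{{{API_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(op, f"{{{API_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="unicode")


def read_soap_request(xml_text: str) -> Tuple[str, Dict[str, str]]:
    """Server side: recover the operation name and parameters from the XML."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    if body is None or len(body) == 0:
        raise ValueError("SOAP message has no Body content")
    op = body[0]
    local = lambda tag: tag.split("}", 1)[1]  # strip the {namespace} prefix
    return local(op.tag), {local(child.tag): child.text or "" for child in op}
```

Because both sides agree on the XML structure (which WSDL would formally describe), the server can recover the operation and its typed arguments regardless of which transport carried the message.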
New Fedora: Key Features • Repository system exposed as two related Web services • described using WSDL • both SOAP and HTTP bindings • Digital objects encoded and stored as XML using Metadata Encoding and Transmission Standard (METS) • Digital object behaviors implemented as linkages to distributed web services (also described using WSDL) • Digital objects support versioning of both content and services.
The New FEDORA: Encoding Digital Objects in XML
Metadata Encoding and Transmission Standard (METS) • Emerging XML standard for encoding descriptive, administrative, and structural metadata of digital library objects • Developed under the auspices of the Digital Library Federation • METS standard maintained by the Network Development and MARC Standards Office of the Library of Congress http://www.loc.gov/standards/mets/
METS Schema • METS is written in the XML Schema Language • METS defines four sections for an object • Descriptive metadata • Administrative metadata • File group • Structure map • METS goals include: • Facilitate management of objects within a repository • Provide a standard format for exchange of objects between repositories • Provide a standard format for transmission of objects to users for rendering (via tools or applications)
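The four METS sections listed above can be sketched as a skeleton document built with Python's standard library. The METS, XLink, and Dublin Core namespaces and the element names (dmdSec, amdSec, fileSec, fileGrp, file, FLocat, structMap, div, fptr) are real; the object ID, title, file IDs, and URL in the usage note are hypothetical, the administrative section is left empty, and the output has not been validated against the METS schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
DC = "http://purl.org/dc/elements/1.1/"
for prefix, uri in (("mets", METS), ("xlink", XLINK), ("dc", DC)):
    ET.register_namespace(prefix, uri)


def mets_tag(name: str) -> str:
    return f"{{{METS}}}{name}"


def build_mets(obj_id: str, title: str, files: Dict[str, str]) -> ET.Element:
    """Skeleton METS object: descriptive, administrative, file, and structure sections."""
    root = ET.Element(mets_tag("mets"), {"OBJID": obj_id})

    # Descriptive metadata: a Dublin Core title wrapped inline.
    dmd = ET.SubElement(root, mets_tag("dmdSec"), {"ID": "DMD1"})
    wrap = ET.SubElement(dmd, mets_tag("mdWrap"), {"MDTYPE": "DC"})
    xml_data = ET.SubElement(wrap, mets_tag("xmlData"))
    ET.SubElement(xml_data, f"{{{DC}}}title").text = title

    # Administrative metadata: left empty in this sketch.
    ET.SubElement(root, mets_tag("amdSec"), {"ID": "AMD1"})

    # File group: one <file> per content file, located by URL.
    file_sec = ET.SubElement(root, mets_tag("fileSec"))
    grp = ET.SubElement(file_sec, mets_tag("fileGrp"))

    # Structure map: a single division pointing at every file.
    struct = ET.SubElement(root, mets_tag("structMap"))
    top = ET.SubElement(struct, mets_tag("div"), {"DMDID": "DMD1", "LABEL": title})

    for file_id, url in files.items():
        f = ET.SubElement(grp, mets_tag("file"), {"ID": file_id})
        ET.SubElement(f, mets_tag("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK}}}href": url})
        ET.SubElement(top, mets_tag("fptr"), {"FILEID": file_id})
    return root
```

For instance, `ET.tostring(build_mets("101", "White Birds", {"FILE1": "http://example.org/bird.sid"}), encoding="unicode")` yields a `<mets:mets>` document whose structure map links the descriptive record to the file entry.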
Digital Object Versioning • Versioning within Data Objects • Datastream versioning • Date/time stamped • New version every time a datastream is modified • Disseminator versioning • Date/time stamped • New version if a disseminator is modified to reference a different Behavior Mechanism (“better mousetrap”) • Versioning within Behavior Definition and Mechanism Objects • New versions of WSDL metadata recorded in these objects (with date/time stamps) • This deserves much more explanation than this slide can offer!
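Date/time-stamped datastream versioning can be sketched as an append-only history that is queried by timestamp. This is a hypothetical Python model, not FEDORA's storage format: every modification appends a new timestamped version, and `as_of` uses binary search to return whichever version was current at a given moment.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class VersionedDatastream:
    ds_id: str
    _stamps: List[datetime] = field(default_factory=list)  # ascending timestamps
    _contents: List[bytes] = field(default_factory=list)   # content for each stamp

    def modify(self, content: bytes, when: Optional[datetime] = None) -> datetime:
        """Record a new date/time-stamped version; earlier versions are kept."""
        if when is None:
            when = datetime.now(timezone.utc)
        if self._stamps and when <= self._stamps[-1]:
            raise ValueError("version timestamps must strictly increase")
        self._stamps.append(when)
        self._contents.append(content)
        return when

    def current(self) -> bytes:
        return self._contents[-1]

    def as_of(self, when: datetime) -> bytes:
        """Return the version that was current at `when`."""
        i = bisect_right(self._stamps, when)  # count of versions stamped <= when
        if i == 0:
            raise LookupError(f"{self.ds_id} did not exist at {when.isoformat()}")
        return self._contents[i - 1]
```

Timestamps should all be timezone-aware (as in the default) so they compare cleanly. The same pattern could record a disseminator's history, storing the referenced Behavior Mechanism PID instead of content bytes.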
The New FEDORA: Repository Services and Sub-systems
FEDORA Web Service API Definitions • “API-M” – interface for management sub-system • Operations necessary to create and maintain objects and their components • Interface directly with authoritative XML version of object • “API-A” – interface for access sub-system • Operations necessary for clients to perform disseminations on objects in the repository • No direct access to object internal structure or components • Will work against cached representation of object to optimize performance.
Fedora Management Sub-System: Implements API-M • Object Management • Object Component Management • Object Validation • PID Generation • Interacts with Storage Subsystem • Access control via Security Subsystem
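Two of the management sub-system's duties, object validation and PID generation, can be sketched together as an ingest step. Everything here is hypothetical: the `namespace:number` PID format, the validation rules (at least one datastream; every disseminator names a Behavior Definition and a Behavior Mechanism by PID), and the in-memory dictionary standing in for the storage subsystem. Security checks are omitted.

```python
import itertools
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator


class ValidationError(Exception):
    """Raised when an object fails the (illustrative) ingest checks."""


def validate(obj: Dict[str, Any]) -> None:
    # Illustrative rules only, not FEDORA's actual validation.
    if not obj.get("datastreams"):
        raise ValidationError("object has no datastreams")
    for i, diss in enumerate(obj.get("disseminators", [])):
        for key in ("bdef_pid", "bmech_pid"):
            if not diss.get(key):
                raise ValidationError(f"disseminator {i} is missing {key}")


@dataclass
class ManagementSubsystem:
    namespace: str = "demo"  # hypothetical PID namespace
    # Stand-in for the storage subsystem: PID -> object description.
    _store: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    _ids: Iterator[int] = field(default_factory=lambda: itertools.count(1))

    def generate_pid(self) -> str:
        return f"{self.namespace}:{next(self._ids)}"

    def ingest(self, obj: Dict[str, Any]) -> str:
        validate(obj)  # reject invalid objects before a PID is consumed
        pid = self.generate_pid()
        self._store[pid] = obj
        return pid

    def get_object(self, pid: str) -> Dict[str, Any]:
        return self._store[pid]
```

Validating before assigning a PID is a design choice in this sketch: rejected objects never consume an identifier, so PIDs stay gap-free for accepted objects.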
Fedora Access Sub-System: Implements API-A • Object Reflection • Identify the types of Behavior Definitions to which an object subscribes (via the object’s Disseminators) • Reflect on a Behavior Definition to identify the kinds of disseminations that can be run on the object (i.e., as method requests) • Dissemination • Fulfills requests for particular methods (i.e., of a Behavior Definition) to be run on an object • Mediates access to supporting services (i.e., Behavior Mechanisms) used to present or transform datastreams of the object • Returns a view of the object’s content to the client
API-A: Object Reflection Requests (Identify Types of Behavior Definitions) • Each Disseminator is said to “subscribe” to a Behavior Definition • It does this by referencing the PID of a particular Behavior Definition Object. • Each Behavior Definition Object contains metadata that describes a set of related behaviors (or operations) • Via API-A, clients can send a service request to determine which Behavior Definitions an object subscribes to.
API-A: Object Reflection Request (Get Behavior Methods) • Each Disseminator has a Behavior Definition Object associated with it. • Each Disseminator has a Behavior Mechanism Object associated with it that describes how to bind to a particular service that complies with the Disseminator’s Behavior Definition. • Via API-A, clients can send a service request to obtain the list of method definitions associated with a particular Disseminator of the digital object.
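The two reflection requests can be sketched as lookups over Disseminator subscriptions, using the PID 101 example from the next slide. This is a hypothetical in-memory model, not the API-A implementation. The Web-image method names are adapted (hyphenated) from the content-model slides, the Admin definition's methods are not given in the slides so `get-admin-metadata` is invented, and the mechanism PIDs are placeholders. Note that clients never see the mechanism PIDs.

```python
from typing import Dict, List, Tuple

# Behavior Definition Objects: BID -> method names they declare.
BEHAVIOR_DEFINITIONS: Dict[str, Tuple[str, ...]] = {
    "Web-default": ("get-as-page", "get-in-context"),
    # Adapted from the web_image content model's method names.
    "Web-image": ("get-thumb", "get-med", "get-high", "get-veryhigh"),
    # Hypothetical: the slides do not list the Admin definition's methods.
    "Admin": ("get-admin-metadata",),
}

# Data objects: PID -> {subscribed BID: Behavior Mechanism PID}.
# Each entry stands for one Disseminator; mechanism PIDs are placeholders.
DATA_OBJECTS: Dict[str, Dict[str, str]] = {
    "101": {
        "Web-default": "bmech:web-default-mrsid",
        "Web-image": "bmech:web-image-mrsid",
        "Admin": "bmech:admin",
    },
}


def get_behavior_definitions(pid: str) -> List[str]:
    """Reflection: which Behavior Definitions do this object's Disseminators subscribe to?"""
    return list(DATA_OBJECTS[pid])


def get_behavior_methods(pid: str, bid: str) -> List[str]:
    """Reflection: which methods can be requested through the Disseminator for `bid`?"""
    if bid not in DATA_OBJECTS[pid]:
        raise LookupError(f"object {pid} does not subscribe to {bid}")
    return list(BEHAVIOR_DEFINITIONS[bid])
```

With this data, `get_behavior_definitions("101")` returns `["Web-default", "Web-image", "Admin"]` and `get_behavior_methods("101", "Web-default")` returns `["get-as-page", "get-in-context"]`, matching the responses shown for the MrSID image object.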
API-A: Object Reflection Requests (MrSID image object, PID = 101) • Request GetBehaviorDefinitions?PID=101 returns: Web-default, Web-image, Admin • Request GetBehaviorMethods?PID=101&BID=Web-default returns: get-as-page; get-in-context • In the repository, the object holds System Metadata and a Basis datastream (MrSID-encoded image file)
API-A: Dissemination Request • Clients can obtain content from a digital object with minimal knowledge about the object. • Behavior Definition identifiers and method definitions are the basis for making dissemination requests on digital objects • Clients do not need to know the particulars of how to attach to the service (Behavior Mechanism) that is operating on their behalf. • A dissemination request requires just three things: • Digital Object Identifier (PID) • Behavior Definition Identifier (BID) • Method name (and optional parameters) for a behavior
API-A: Dissemination Request (MrSID image object, PID = 101) • Request: GetDissemination?PID=101&BID=Web-default&method=get-as-page • The object's Disseminators are Web-default, Web-image, and Admin; its Basis datastream is an MrSID-encoded image of a bird • Result: a “Bird Digital Library” page showing “White Birds: Image 1, Image 2, Image 3”
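The dissemination request above can be sketched end to end in Python: the client builds a URL from just PID, BID, and method, and the server resolves the object's Disseminator to a mechanism and runs it against the datastreams. The base URL `http://repo.example.org/fedora`, the datastream value `bird.sid`, and the mechanism functions are hypothetical; a real Behavior Mechanism would be a separate web service rather than a local function.

```python
from typing import Callable, Dict, Tuple
from urllib.parse import parse_qs, urlencode, urlsplit


def dissemination_url(base: str, pid: str, bid: str, method: str, **params: str) -> str:
    """Client side: PID, BID, and a method name are all a client needs."""
    query = {"PID": pid, "BID": bid, "method": method, **params}
    return f"{base}/GetDissemination?{urlencode(query)}"


# Hypothetical server-side state for object 101.
DATASTREAMS: Dict[str, Dict[str, str]] = {"101": {"basis": "bird.sid"}}
Method = Callable[..., str]
# (PID, BID) -> method name -> mechanism implementation (stand-ins for services).
BINDINGS: Dict[Tuple[str, str], Dict[str, Method]] = {
    ("101", "Web-default"): {
        "get-as-page": lambda ds, **kw: f"<html>page for {ds['basis']}</html>",
        "get-in-context": lambda ds, **kw: f"context view of {ds['basis']}",
    },
}


def handle_dissemination(url: str) -> str:
    """Server side: resolve the Disseminator's mechanism and run the method."""
    parts = urlsplit(url)
    if not parts.path.endswith("/GetDissemination"):
        raise ValueError(f"not a dissemination request: {parts.path}")
    query = {k: v[0] for k, v in parse_qs(parts.query).items()}
    pid, bid, method = query.pop("PID"), query.pop("BID"), query.pop("method")
    mechanism = BINDINGS[(pid, bid)]
    # Any remaining query items are passed through as optional parameters.
    return mechanism[method](DATASTREAMS[pid], **query)
```

The client code never mentions `bird.sid` or the mechanism; if the image were moved or the mechanism replaced, only the server-side tables would change, which is the stable-interface benefit described next.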
Disseminations: Benefits • Simple access: dissemination requests shield clients from the internal structure of digital objects • Stable interface: dissemination requests are like requests against an abstract interface in that they are not tied to object implementation details that may change over time (e.g., storage locations of datastreams) • Foster interoperability: different digital objects can vary in both the format of content and how it is structured, yet we can access them in a consistent manner via disseminations.
Fedora Project Plan • Phase 1: (pre-release Oct 31, 2002; final Jan 2003) • Repository system with management and access subsystems exposed as web services • Storage subsystem with XML object store and replication to relational database cache • Object builder tools (GUI and batch) • Basic set of behavior services • Phase 2: Add more production support • Security and policy enforcement • Additional management tools • Optimize performance for accessing XML objects • Object versioning • Collection objects • Advanced disk management • Phase 3: Enhance end-user support • New kinds of disseminators, with supporting behavior services • Efficiency and scale optimization
FEDORA Web Site: www.fedora.info