European FEDORA User Meeting Copenhagen, 28 September 2005. Introducing “Pergamos”. A FEDORA-based Digital Library System utilizing Digital Object Prototypes. Kostas Saidis saiko@di.uoa.gr. Libraries Computer Center Department of Informatics & Telecommunications University of Athens.
Outline • Motivation – The University of Athens (UoA) DL • Digital Objects (DOs) • DO Storage (FEDORA) • DO Manipulation (DL Application Logic) • Digital Object Prototypes • Automatic DO Type Conformance • Scope of Prototypes & Collection Management • Implementation Details • A Preview of Pergamos • Discussion
The UoA DL Project • Over 1 million objects originating from 8 disparate collections • Folklore notebooks, Ancient papyri, UoA Historical Archive, Byzantine music manuscripts, Theatrical photos & brochures, Informatics research papers and dissertations, Medical images, Press articles • Heterogeneous material, in terms of content type, metadata, structure, user requirements • Mostly digitized material, requiring detailed cataloging
UoA DL Project Metadata • Build a Web-based DL System to handle all material • Centralized DL approach due to • Existing hardware infrastructure • Funding restrictions • Administration simplicity • FEDORA is our DO Repository
UoA DL Project Metadata Contd. • Small Team • 2.5 developers, 1 librarian, 1 manager • Requirements, Specifications, Development, Digitization & Cataloging Management … • … while everyday tasks keep running! • Cataloging Personnel • Scholars & Experts in each collection’s domain (not librarians) • Strict Schedule • First Collection deadline: early 2006 • Project deadline: end of 2006
Motivation • Simplify & speed up the cataloging process • Provide effective Web-based cataloging interfaces • Automate content ingestion • Decrease development time • Avoid custom coding for each content variation • Elaborate on reusable and configurable DL modules • Provide the means to treat content variations in a unified manner
Digital Objects • A Digital Object is a human-generated artifact consisting of the digital content and related information
FEDORA • FEDORA Digital Object Model • Content Models, Datastreams, Behavior Definitions, Mechanisms & Disseminators • FEDORA is a DO Repository • Focus on how each DO part is encoded & stored • Handles effectively issues related to storage, preservation & versioning, searching & indexing, interoperability
DL Application Logic • Cataloging, Workflows, Collection Building & Management, User Interfaces, etc. • DL Modules manipulate DOs at a higher level of abstraction • Focus on the overall behavior of the DO (what are the DO parts and how do they behave) • DOs reflect the underlying “real world” objects – they behave according to their nature, their essence, their type
DO Typing information Do we effectively capture, express and utilize the nature (type) of DOs?
An example – Theatrical Collection • Albums containing photos of National Theater Performances • What is a Photo DO? • A digital image • stored in various formats (e.g. high quality, www quality, thumbnail) • accompanied by the metadata required for describing the picture • What is an Album DO? • A container of Photo DOs accompanied by theatrical play metadata
A 2nd example – Historical Archive • University’s Senate Session Proceedings > Folders > Sessions > Items • What is an Item DO? • A digital image (capturing 1 or 2 pages) • stored in various formats (e.g. high quality, www quality, thumbnail) • What is a Session DO? • A container of Item DOs + metadata • What is a Folder DO? • A container of Session DOs + metadata
DO Typing Information • FEDORA Content Models express DO Typing information • Content Models are metadata attributes (e.g. “photo”, “album”) that we use as a guide • Humans interpret Content Models, not the DL System • Manual resolution of DO Typing issues
Problems • Catalogers carry out manual XML editing at a low level of abstraction, with overly technical, complex & over-detailed semantics • Developers produce ad-hoc, custom, non-reusable implementations of the behavior variations of DO types • DL modules exhibit limited evolution and configuration capabilities
DO Typing Information The DL System should resolve DO Typing issues automatically (in a manner transparent to the DL Application Logic)
Automatic DO Type Conformance • The designer specifies the various DO types… • … and the DL System makes DOs conform to these type specifications automatically • How?
The OO Viewpoint • In the OO model an object is itself aware of its “nature” and behaves accordingly • Objects are conceived as instances of a type, automatically conforming to the type’s definitions & specifications • OO types are separate entities (named either classes or prototypes)
Digital Object Prototypes • A DO Prototype is a DO Type Specification, a separate entity that defines the DO’s: • Constitutional parts – metadata sets, files, structure, etc. • Private behaviors – DO internal operations such as serializations, validations, assignment of default values, content conversions, etc. • Public behaviors (behavior schemes) – the DO external interface, consisting of high-level operations such as Detail View, Browse View, Edit View, etc.
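The three ingredients of a prototype can be pictured as a small data structure. The following is a minimal sketch only, not the Pergamos implementation: the class `DOPrototype`, its attribute names, and the two sample behaviors are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DOPrototype:
    """Hypothetical in-memory form of a DO type specification."""
    name: str
    # Constitutional parts: metadata sets, files, allowed child types.
    md_sets: List[str] = field(default_factory=list)
    files: List[str] = field(default_factory=list)
    allowed_children: List[str] = field(default_factory=list)
    # Private behaviors: internal operations (defaults, validation, ...).
    private_behaviors: Dict[str, Callable[[dict], None]] = field(default_factory=dict)
    # Public behaviors (behavior schemes): the external interface.
    public_behaviors: Dict[str, Callable[[dict], str]] = field(default_factory=dict)


def assign_defaults(do: dict) -> None:
    # Private behavior: fill in a default title when none was catalogued.
    do.setdefault("title", "Untitled")


def detail_view(do: dict) -> str:
    # Public behavior: a trivial textual "Detail View".
    return f"Photo: {do['title']}"


photo = DOPrototype(
    name="photo",
    md_sets=["DC"],
    files=["hq", "web", "thumb"],
    private_behaviors={"assignDefaults": assign_defaults},
    public_behaviors={"detailView": detail_view},
)

do = {}  # a bare DO, here just a dict of field values
photo.private_behaviors["assignDefaults"](do)
print(photo.public_behaviors["detailView"](do))
```

Keeping behaviors as named entries of the prototype, rather than as code inside each DL module, is what lets a module ask for "detailView" without knowing which type it is dealing with.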
DO Prototypes & Instances • The designer carries out the definition of DO Prototypes – the DL System handles the rest • DO Prototypes represent the realization of the Content Model notion in an OO fashion: • The process of generating a DO from a Prototype is called instantiation • The resulting object is an instance of the prototype • A DO instance automatically conforms to the Prototype’s specifications • Stored DOs vs DO instances
Digital Object Dictionary • The runtime environment in which DO instances and Prototypes operate: • Instantiation of DOs based on the prototype specifications (private behaviors: load & parse XML, assign default values, etc.) • Exposure of the public DO behaviors through a high-level, uniform API (for use by DL Modules) • Serialization of the DO instance back to FEDORA (private behaviors: serialize data structures to XML, perform validations, etc.)
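The instantiate / serialize round trip can be sketched with the standard library alone. Everything here is an assumption made for illustration: the repository is a plain dict of XML strings rather than FEDORA, and the `<do>`/`<field>` layout, the prototype dict keys (`defaults`, `mandatory`), and the class names are hypothetical.

```python
import xml.etree.ElementTree as ET


class DOInstance:
    def __init__(self, pid, type_name, fields):
        self.pid = pid
        self.type_name = type_name
        self.fields = fields


class DODictionary:
    """Toy runtime: prototypes are dicts, the 'repository' maps PIDs to XML."""

    def __init__(self, prototypes, repository):
        self.prototypes = prototypes
        self.repository = repository

    def instantiate(self, pid):
        # Private behaviors: load & parse XML, then assign default values.
        root = ET.fromstring(self.repository[pid])
        type_name = root.get("type")
        proto = self.prototypes[type_name]
        fields = {el.get("name"): el.text for el in root.findall("field")}
        for name, default in proto["defaults"].items():
            fields.setdefault(name, default)
        return DOInstance(pid, type_name, fields)

    def serialize(self, do):
        # Private behaviors: validate mandatory fields, then write XML back.
        proto = self.prototypes[do.type_name]
        missing = [f for f in proto["mandatory"] if not do.fields.get(f)]
        if missing:
            raise ValueError(f"missing mandatory fields: {missing}")
        root = ET.Element("do", {"type": do.type_name})
        for name, value in sorted(do.fields.items()):
            ET.SubElement(root, "field", {"name": name}).text = value
        self.repository[do.pid] = ET.tostring(root, encoding="unicode")


prototypes = {"photo": {"defaults": {"language": "el"}, "mandatory": ["title"]}}
repository = {"demo:1": '<do type="photo"><field name="title">Antigone</field></do>'}

dictionary = DODictionary(prototypes, repository)
do = dictionary.instantiate("demo:1")   # the default "language" is filled in here
do.fields["title"] = "Antigone, act 1"
dictionary.serialize(do)                # validated, then stored as XML again
```

The point of the sketch is that neither the defaults nor the validation appear in the calling code; both are driven by the prototype.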
A DL Module performs the following steps: • Acquire the DO instance: do = dictionary.acquireObject("type") or do = dictionary.acquireObject("uoadl:1024") • Perform operations upon it: do.getMDSet("DC").getField("title") and dictionary.executeBehavior(do, "editView") • Store the DO in the repository: dictionary.saveObject(do) • Result: a cleaner, simpler, more effective expression of DL Application Logic
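The module-side calls listed on this slide can be exercised against an in-memory stand-in. The method names `acquireObject`, `getMDSet`, `getField`, `executeBehavior` and `saveObject` come from the slide; everything else is an assumption: the classes, the `setField` helper, the `demo:` PIDs, and in particular the rule that a key containing ":" is a PID while any other key names a type to instantiate.

```python
class MDSet:
    """A metadata set: a named bag of fields."""

    def __init__(self, fields=None):
        self._fields = dict(fields or {})

    def getField(self, name):
        return self._fields.get(name)

    def setField(self, name, value):  # hypothetical; the slide shows only getField
        self._fields[name] = value


class DigitalObject:
    def __init__(self, pid, type_name):
        self.pid = pid
        self.type_name = type_name
        self._md_sets = {"DC": MDSet()}

    def getMDSet(self, name):
        return self._md_sets[name]


class Dictionary:
    """In-memory stand-in; a real dictionary would talk to the repository."""

    def __init__(self, behaviors):
        self._behaviors = behaviors  # {(type name, behavior name): callable}
        self._store = {}
        self._counter = 0

    def acquireObject(self, key):
        # Assumption: a key containing ":" is the PID of a stored DO;
        # any other key names a type to instantiate.
        if ":" in key:
            return self._store[key]
        self._counter += 1
        return DigitalObject(f"demo:{self._counter}", key)

    def executeBehavior(self, do, behavior):
        return self._behaviors[(do.type_name, behavior)](do)

    def saveObject(self, do):
        self._store[do.pid] = do


def edit_view(do):
    title = do.getMDSet("DC").getField("title")
    return f"Edit form for {do.pid}: title={title!r}"


dictionary = Dictionary({("photo", "editView"): edit_view})
do = dictionary.acquireObject("photo")       # new instance of a type
do.getMDSet("DC").setField("title", "Antigone")
dictionary.saveObject(do)
same = dictionary.acquireObject("demo:1")    # existing DO, by PID
print(dictionary.executeBehavior(same, "editView"))
```

The module never touches XML or datastreams; it only names a type, a metadata set, a field and a behavior.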
3-tier DL Architecture (diagram) • Separation of Concerns • Tiers shown: Storage, DO Typing & Instantiation, Composition of DO behaviors
Pergamos If it sounds like Greek…
Scope of Prototypes • Should we have global DO Types? • Collection-pertinent types: A DO Prototype is defined in the context of a Collection • Support fine-grained definition of collection-specific kinds of material • Hierarchical naming scheme for types • Theatrical Collection Photo: dl.theatre.photo • Medical Collection Photo: dl.medical.photo • Stored in the “contentModel” metadata attribute • Avoid type collisions
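A registry keyed by the full hierarchical name shows how two collections can each define their own "photo" type without colliding. This is a sketch under assumptions: `PrototypeRegistry` and its methods are hypothetical, and the metadata set names "play" and "diagnosis" are invented for the example; only the type names dl.theatre.photo and dl.medical.photo come from the slide.

```python
class PrototypeRegistry:
    """Maps full type names such as 'dl.theatre.photo' to their specifications."""

    def __init__(self):
        self._prototypes = {}

    def register(self, collection, type_name, spec):
        full_name = f"{collection}.{type_name}"
        if full_name in self._prototypes:
            raise KeyError(f"type already defined: {full_name}")
        self._prototypes[full_name] = spec
        return full_name

    def lookup(self, content_model):
        # content_model is the value stored in the DO's "contentModel" attribute.
        return self._prototypes[content_model]


registry = PrototypeRegistry()
registry.register("dl.theatre", "photo", {"md_sets": ["DC", "play"]})
registry.register("dl.medical", "photo", {"md_sets": ["DC", "diagnosis"]})

theatre_photo = registry.lookup("dl.theatre.photo")
medical_photo = registry.lookup("dl.medical.photo")
```

Both collections use the short name "photo", yet each lookup returns the specification defined in its own collection.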
Collection Management • DL = Hierarchy of DO instances • Collections are also DOs • The DL itself is a DO, representing the “super-collection” (the collection of all the collections) • Easily add new collections & sub-collections • All content is modeled in a unified manner & can be characterized • Allow the DL designer to work out the details of each collection independently, yet in a uniform manner
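The "everything is a DO" hierarchy can be sketched as an ordinary tree in which the DL, its collections and their content are all nodes of the same class. `DONode`, the `demo:` PIDs and the type names "dl", "dl.collection" and "dl.theatre.album" are assumptions for illustration; "dl.theatre.photo" follows the naming scheme shown earlier.

```python
class DONode:
    """A DO in the hierarchy; the DL root and collections use the same class."""

    def __init__(self, pid, type_name, parent=None):
        self.pid = pid
        self.type_name = type_name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def ancestors(self):
        # Walk upward: container, collection, ..., the DL itself.
        node = self.parent
        while node is not None:
            yield node
            node = node.parent

    def walk(self):
        # Depth-first traversal of this DO and everything it contains.
        yield self
        for child in self.children:
            yield from child.walk()


dl = DONode("demo:dl", "dl")  # the "super-collection"
theatre = DONode("demo:theatre", "dl.collection", parent=dl)
album = DONode("demo:album1", "dl.theatre.album", parent=theatre)
photo = DONode("demo:photo1", "dl.theatre.photo", parent=album)

print([node.pid for node in photo.ancestors()])
```

Adding a new collection amounts to adding one more child under the root, using the same operations as for any other DO.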
Implementation details • DO Prototypes are • Specified in XML form • Stored in the “TEMPLATE” datastream of the appropriate Collection DO • Loaded, parsed & interpreted by the DO Dictionary in its bootstrap procedure • Transparent to FEDORA • DO Instances are supplied with the “CONTAINER” datastream, containing the pids of the DOs they “contain”
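A bootstrap step of this kind could look roughly as follows. The slides do not show the actual XML, so the element and attribute names in both documents below are invented for illustration, as are the function names; only the datastream names "TEMPLATE" and "CONTAINER" come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical content of a Collection DO's "TEMPLATE" datastream.
TEMPLATE = """
<prototypes collection="dl.theatre">
  <prototype name="album"><child type="photo"/></prototype>
  <prototype name="photo"/>
</prototypes>
""".strip()

# Hypothetical content of an Album DO's "CONTAINER" datastream.
CONTAINER = """
<container><member pid="demo:2"/><member pid="demo:3"/></container>
""".strip()


def load_prototypes(template_xml):
    """Return {full type name: allowed child type names} for one collection."""
    root = ET.fromstring(template_xml)
    collection = root.get("collection")
    result = {}
    for proto in root.findall("prototype"):
        full_name = f"{collection}.{proto.get('name')}"
        result[full_name] = [child.get("type") for child in proto.findall("child")]
    return result


def contained_pids(container_xml):
    """Return the PIDs of the DOs that a container DO holds."""
    root = ET.fromstring(container_xml)
    return [member.get("pid") for member in root.findall("member")]


prototypes = load_prototypes(TEMPLATE)
members = contained_pids(CONTAINER)
```

Because both documents are ordinary datastreams, the repository stores them without needing to understand them, which is consistent with the prototypes being transparent to FEDORA.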
DO Prototypes in detail • MD Sets • Specification of each individual field (label, description, multi-value, mandatory, UI characteristics) • Serialization information (how to store it in FEDORA) • Field mappings (under development) • Files: Automatic conversions (tiff -> jpeg + thumb) • Batch Import: automatically create DOs from zip bundles • Structure: allowed children types • Browsers: browse field • Indices: e.g. subject catalog • Behavior schemes: atomic DO elements
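Field specifications such as "mandatory" and "multi-value" lend themselves to generic validation. The sketch below assumes a hypothetical `FieldSpec` class and `validate` function, and represents each field's values as a list; it does not reproduce the actual prototype schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FieldSpec:
    """Hypothetical per-field specification taken from a prototype's MD set."""
    label: str
    mandatory: bool = False
    multi_value: bool = False


def validate(md_values: Dict[str, List[str]], specs: Dict[str, FieldSpec]) -> List[str]:
    """Check catalogued values against the field specs; return error messages."""
    errors = []
    for name, spec in specs.items():
        values = md_values.get(name, [])
        if spec.mandatory and not values:
            errors.append(f"{spec.label}: required")
        if not spec.multi_value and len(values) > 1:
            errors.append(f"{spec.label}: single value expected")
    return errors


specs = {
    "title": FieldSpec("Title", mandatory=True),
    "subject": FieldSpec("Subject", multi_value=True),
}

print(validate({"subject": ["theatre", "tragedy"]}, specs))
print(validate({"title": ["Antigone"], "subject": ["theatre"]}, specs))
```

One validation routine serves every type, since the per-type differences live in the specs rather than in the code.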
Pergamos • Historical Archive (production) • Folklore Notebooks (testing) • Theatrical Collection, Medical Images & Byzantine music manuscripts (finalization of requirements & specifications) • Undergoing development … the remaining collections are coming next • Historical Archive will be published in early 2006… • … with a multi-lingual UI, hopefully!
Future Work • Fully implement the OO paradigm • OO Inheritance for DO Prototypes (e.g. the Notebook type derives from the Book type) • OO Polymorphism for DO instances (e.g. the DO “uoadl:1234” is both a Notebook & a Book) • Supply general purpose linking capabilities that exceed structural relations (FEDORA Metadata for Object-to-Object Relationships?) • Deliver on schedule…
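Since inheritance and polymorphism are listed as future work, the following is purely a speculative sketch of what prototype derivation might look like, not a description of Pergamos. The `Prototype` class, its methods and the "folklore" metadata set are hypothetical; the Notebook/Book pairing comes from the slide.

```python
class Prototype:
    """Hypothetical prototype with single inheritance from a parent prototype."""

    def __init__(self, name, md_sets=(), parent=None):
        self.name = name
        self.own_md_sets = list(md_sets)
        self.parent = parent

    def lineage(self):
        # This prototype first, then its parent, grandparent, and so on.
        proto = self
        while proto is not None:
            yield proto
            proto = proto.parent

    def md_sets(self):
        # Inherited sets come first, followed by the type's own additions.
        result = []
        for proto in reversed(list(self.lineage())):
            for md_set in proto.own_md_sets:
                if md_set not in result:
                    result.append(md_set)
        return result

    def is_a(self, name):
        # Polymorphism check: true for the type itself and all its ancestors.
        return any(proto.name == name for proto in self.lineage())


book = Prototype("book", md_sets=["DC"])
notebook = Prototype("notebook", md_sets=["folklore"], parent=book)

print(notebook.md_sets())
print(notebook.is_a("book"), book.is_a("notebook"))
```

Under this reading, a DO instantiated from the notebook prototype would be accepted wherever a book is expected, while the reverse would not hold.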
Conclusions • If in doubt, use FEDORA • Flexible & Extensible (they mean it) • 1 year of Pergamos development, 2 months of testing & 3 months of production use (Historical Archive) with no serious problems • Though, Sandy & Carl, I’d be grateful for some minutes of your time!!! • DO Prototypes: a realization of Content Models in OO terms, implemented on top of FDOM to handle DO Typing issues automatically • Detailed report on Pergamos to appear…
Thank You • Questions? • Comments? • For details: • Kostas Saidis, George Pyrounakis, Mara Nikolaidou, "On the Effective Manipulation of Digital Objects: A Prototype-based Instantiation Approach", Proc. 9th European Conference on Research and Advanced Technology for Digital Libraries (ECDL 2005), Vienna, Austria, September 2005 • email: saiko@di.uoa.gr