UC Digital Library Forum
• August 5, 2002
• UCLA Digital Library
• http://digital.library.ucla.edu
• Presenter: Curtis Fornadley
• Senior Programmer/Analyst, UCLA Library
• curtisf@library.ucla.edu
• Hardware and Software Architecture
• Project Architecture
• What is the Open Archives Initiative?
• The OAI Sheet Music Harvester
UCLA Digital Library
UCLA Digital Library Hardware Architecture
UCLA Digital Library Software Architecture
- Java 2 Enterprise Edition (J2EE) (v1.3-1.4)
- Oracle 8i (9i Fall 2002)
- Oracle interMedia Tool Kit
- JRun Application Server (v3.1)
- XML, XSLT
- MS Access, for metadata collection
- Microsoft Windows NT 4 and Windows 2000
Digital Library Projects
Web-based applications to search and present digital content and metadata.
All projects share similar design patterns.
Combining Text (XML) and Format (XSLT) to Create HTML
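The XML-plus-XSLT pattern named on this slide can be sketched with the standard `javax.xml.transform` API. This is a minimal illustration only: the `<song>` record, its element names, and the stylesheet are hypothetical and are not taken from the UCLA applications.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XmlToHtml {
    // Hypothetical content record; element names are illustrative only.
    static final String XML =
        "<song><title>Example Waltz</title><composer>A. Composer</composer></song>";

    // Hypothetical stylesheet that supplies the HTML presentation.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:template match='/song'>"
      + "<html><body><h1><xsl:value-of select='title'/></h1>"
      + "<p><xsl:value-of select='composer'/></p></body></html>"
      + "</xsl:template></xsl:stylesheet>";

    // Applies the stylesheet to the XML text and returns the resulting HTML.
    static String render(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render(XML, XSLT));
    }
}
```

Keeping the content (XML) and the presentation (XSLT) in separate inputs means the same record can be rendered differently by swapping only the stylesheet.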
Archive of Popular American Music (APAM)
• APAM contains ~450,000 pieces of sheet music
• Metadata is collected in UCLA Core; there was no pre-existing metadata
• Content is digitized in house (about 850 sheets so far)
• Sheet music is hosted as PDF files
• All covers and PDFs are hosted from the Oracle DB as BFILEs
• Dynamic sizing of cover images through Oracle interMedia tools
• http://digital.library.ucla.edu/sheetmusic
• In production, last updated March 2002
• The basis for the OAI Sheet Music Harvester Project
Open Archives Initiative Protocol for Metadata Harvesting (OAI Version 2.0)
“The OAI protocol facilitates metadata harvesting”
http://www.openarchives.org
OAI Requests and Responses
OAI requests and responses use HTTP - “just like the web”
OAI Requests: use either the HTTP GET or POST method.
OAI Responses: formatted as HTTP responses. Every OAI response is valid XML.
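As a rough sketch of what an OAI request over HTTP GET looks like, the following builds a request URL from a verb and its arguments. The base URL `http://example.org/oai` and the record identifier are placeholders, not real endpoints, and the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class OaiRequestUrl {
    // Builds a GET URL of the form baseUrl?verb=...&name=value&...
    static String build(String baseUrl, String verb, Map<String, String> args) {
        StringBuilder url = new StringBuilder(baseUrl)
            .append("?verb=").append(encode(verb));
        for (Map.Entry<String, String> arg : args.entrySet()) {
            url.append('&').append(encode(arg.getKey()))
               .append('=').append(encode(arg.getValue()));
        }
        return url.toString();
    }

    // Percent-encodes reserved characters such as ':' and '/'.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Example: a GetRecord request against a placeholder repository.
    static String exampleGetRecord() {
        Map<String, String> args = new LinkedHashMap<>();
        args.put("identifier", "oai:example.org:sheetmusic/1"); // hypothetical identifier
        args.put("metadataPrefix", "oai_dc");
        return build("http://example.org/oai", "GetRecord", args);
    }

    public static void main(String[] args) {
        System.out.println(exampleGetRecord());
    }
}
```

The same arguments could instead be sent in the body of an HTTP POST; only the transport of the name/value pairs changes.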
Important OAI “Verbs”
• The meat of the OAI protocol is six “verbs”, each issued in a request to harvest metadata.
• 1) GetRecord - to retrieve an individual record
• 2) Identify - to retrieve information about a repository
• 3) ListIdentifiers - to retrieve the identifiers of records that can be harvested from a repository
• 4) ListMetadataFormats - to retrieve the metadata formats available from a repository. Dublin Core (DC) is the minimum requirement.
• 5) ListRecords - to harvest records from a repository
• 6) ListSets - to retrieve the set structure in a repository
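The six verbs can be modeled as a small Java enum. The sketch below also records the arguments that OAI-PMH 2.0 makes mandatory for each verb and reports which ones a request is missing. It is a simplification: optional arguments and the protocol's exclusive-argument cases are not modeled, and the `missing` helper is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Constant names match the protocol's verb spelling, so name() yields the request value.
enum OaiVerb {
    GetRecord("identifier", "metadataPrefix"),
    Identify(),
    ListIdentifiers("metadataPrefix"),
    ListMetadataFormats(),
    ListRecords("metadataPrefix"),
    ListSets();

    private final Set<String> required;

    OaiVerb(String... required) {
        this.required = Collections.unmodifiableSet(
            new LinkedHashSet<>(Arrays.asList(required)));
    }

    // Returns the required argument names that are absent from the supplied set.
    Set<String> missing(Set<String> supplied) {
        Set<String> result = new LinkedHashSet<>(required);
        result.removeAll(supplied);
        return result;
    }
}
```

For example, `OaiVerb.GetRecord.missing(Collections.singleton("identifier"))` reports that `metadataPrefix` still has to be supplied.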
Important OAI “Nouns”
Repository - a server to which OAI protocol requests can be submitted. The repository outputs metadata in the form of a record.
Record - an XML-encoded byte stream that is returned by a repository in response to an OAI request for metadata from an item in that repository. At a minimum, repositories must be able to return records with metadata expressed in unqualified Dublin Core.
Set - a construct for grouping items in a repository for the purpose of selective harvesting of records.
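To make the record definition concrete, here is a minimal sketch that reads unqualified Dublin Core values out of a single `<record>` element using the JDK's DOM parser. The sample record (identifier, date, title, creator) is invented for illustration; only the namespace URIs are the standard OAI-PMH 2.0 and Dublin Core ones.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

final class DcRecordReader {
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // Hypothetical record, shaped like the <record> element of an OAI response.
    static final String SAMPLE =
        "<record xmlns='http://www.openarchives.org/OAI/2.0/'>"
      + "<header><identifier>oai:example.org:sheetmusic/1</identifier>"
      + "<datestamp>2002-08-05</datestamp></header>"
      + "<metadata>"
      + "<oai_dc:dc xmlns:oai_dc='http://www.openarchives.org/OAI/2.0/oai_dc/'"
      + " xmlns:dc='http://purl.org/dc/elements/1.1/'>"
      + "<dc:title>Example Waltz</dc:title><dc:creator>A. Composer</dc:creator>"
      + "</oai_dc:dc></metadata></record>";

    // Returns the text of every Dublin Core element with the given local name.
    static List<String> values(String recordXml, String dcElement) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace-based lookup
            Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(recordXml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagNameNS(DC_NS, dcElement);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse record", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(values(SAMPLE, "title"));
    }
}
```

Looking elements up by namespace URI rather than by the `dc:` prefix keeps the reader working when a repository chooses a different prefix.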
Sheet Music OAI Data Providers
• UCLA - currently online (Java)
• Library of Congress - currently online
• Johns Hopkins University - any day now
• Indiana University - September 2002
• Duke - within the next 12 months
• Brown - within the next 12 months
• Each participating institution is responsible for creating its own OAI-compliant sheet music repository.
• Major hurdles to becoming a Data Provider:
• - Programming
• - Data Mapping
High Level Design of OAI Sheet Music Service Provider
OAI Sheet Music Project
• Development Goals and Challenges
• Leveraging UIUC Harvester code
• Challenge of reverse engineering and extending code
• Being flexible - combine relational and XML text indexing
• Performance vs. Functionality: an ongoing challenge
• Testing of 0.1 Service Provider - August 2002
• Debut of the pilot - late Fall 2002
Hypothetical User Interface for Sheet Music Service Provider
The biggest challenge is to create a Service Provider that extends the usable services offered to users.
Conceptualize -> Design -> Implement
Summary
• John Ober’s Charge: “Discuss architecture and standards used in projects and the technical challenges yet to be faced.”
• Challenges:
• Metadata collection - Automated vs. Manual
• Meeting infrastructure storage needs: Online - Nearline - Backup
• Personal Challenges and Thoughts:
• Many challenges are not technical
• Developing a personal filter on information
• Risk assessment: when is the right time to adopt a new technology?
• Surface knowledge vs. deep understanding. Islands of knowledge
• No stable body of knowledge to turn to for advice or help