WMS, RUcore and Fedora Mini-Conference • Wednesday Morning • Greetings and Introduction – Grace • Collaboration and Architecture Overview – Ron • RUcore Data Model – Grace • WMS Tutorial – Mary Beth, Kalaivani, Sharon • Lunch (box lunch in conference room) • Wednesday Afternoon • Hands-On Experience – Mary Beth, Kalaivani, Sharon • Feedback from WMS sessions • Collaboration Discussion – All
WMS, RUcore and Fedora Mini-Conference • Thursday Morning • Brief Recap – Ron • WMS architecture – Yang • User Interface, Search engine and collections – Chad • Management services – Ron • Lunch (on your own) • Thursday Afternoon • Further collaboration discussion • Wrap-up and next steps
Possible Areas for Collaboration • Data: Registries, File formats, Content Models • Software Development: Requirements, Sharing software, Joint development, Life cycle support • Sharing Content: Exchange, harvesting, Federated Searching • Fedora Experimentation: Relationship services, Directory ingest, Use of XACML, Very large files, Event management
Fedora Enterprise Architecture Major Goals – 2007 thru 2009 • Paradigm Focus • Scholarly Communication Collaboration • Libraries and Museums Access and Publishing • Infinite Scalability • Size of and number of objects • Capacity and throughput (e.g. ingest 20TB a day) • Life cycle preservation • Trust Model • Transactions - Begin/Commit • Transactions across repositories • Enable graph based objects (compound objects)
Layered Architecture • Diagram labels: Applications; Middleware; App. Prog. Interface; Repository; Persistence and Data
Layered Architecture - RUcore • Applications and Portals (NJDH, RUcore, workflow, etc.) • Middleware Services (searching, alerting, integrity, etc.) • API • Fedora Core & Framework • FOXML & Datastreams
RUcore - How it Works • Diagram labels: User Input; Metadata and Archival masters; XML; Faculty Submissions; Dissertations; Workflow Management System; Digital Object Ingest; Fedora Repository Service; Digital Object Repository (Fedora); User, Collection, & Preservation Services; RUCORE Portal; NJ Digital Highway; Custom Portals
Simple and Compound Objects • Article Object (Simple): Persistent ID; Metadata; Behaviors (Disseminators); Data streams • Data streams: SMAP1 – StrMap (TOC); DJVU1 – presentation; PDF1 – presentation; XML1 – OCR text; ARCH1 – Archival master (tiffs of each page) • Compound Object - Graph Model: objects A1 and A2 linked to the article by IsAnnotationOf
Collections in RUcore • A digital collection is simply a grouping of objects according to some criteria. • Types of digital collections in RUcore • Explicit – A digital collection whose object membership is specified explicitly within the descriptive metadata. • Dynamic – A digital collection of objects which are grouped according to user-specified criteria.
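The two membership types can be illustrated with a minimal sketch. It assumes a hypothetical in-memory record layout: the `DigitalObject` class, its `subject` and `collections` fields, and both function names are illustrative only and are not RUcore's actual metadata schema or API.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Set


@dataclass
class DigitalObject:
    pid: str
    subject: str
    # Explicit membership: collection IDs named in the object's
    # descriptive metadata (hypothetical field).
    collections: Set[str] = field(default_factory=set)


def explicit_members(objects: Iterable[DigitalObject], collection_id: str) -> List[str]:
    """PIDs of objects whose metadata names the collection directly."""
    return [o.pid for o in objects if collection_id in o.collections]


def dynamic_members(objects: Iterable[DigitalObject],
                    criteria: Callable[[DigitalObject], bool]) -> List[str]:
    """PIDs of objects that satisfy user-specified search criteria."""
    return [o.pid for o in objects if criteria(o)]


objects = [
    DigitalObject("rutgers-lib:1", "GIS", {"njdh"}),
    DigitalObject("rutgers-lib:2", "history", {"njdh"}),
    DigitalObject("rutgers-lib:3", "GIS"),
]

njdh = explicit_members(objects, "njdh")
gis = dynamic_members(objects, lambda o: o.subject == "GIS")
```

An object can belong to an explicit collection and still be picked up by any number of dynamic collections, since dynamic membership is computed from a query rather than stored on the object.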
Using Explicit and Dynamic Collections • Personal Collections • Department Collections • Including Faculty Personal collections (e.g. preprints, reports, etc.) • ETDs for the Department • Centers and Grant Funded Research • New Jersey Digital Highway • Center for Remote Sensing and Spatial Analysis (CRSSA) – Access and preservation of GIS resources related to New Jersey
RUcore Collection Architecture • Circles – collection objects • Rectangles – content objects • Solid line – explicit membership • Dashed line – dynamic membership • Diagram labels: RUCORE; NJDH (Grant Project); Rutgers University Libraries; Rutgers University Centers/Departments; New Jersey Historical Society; Eagleton Archive; General Collections; Special Collections; Roosevelt; objects O1, O2, P1, P2, B1, N1, N2, N3, M1
Collection Architecture - Lefty • Solid line – explicit membership • Dashed line – dynamic membership • Diagram labels: RUCORE; Rutgers University (1782.2); RUL (1782.1); Princeton (1782.1); Penn State (1782.1); N’Western (1782.1); ETDs (Graduate School); RU ETDs; Dept. ETDs; Center/Dept Collections; Department; FacColl One; FacColl Two; D1, D2, D3 • http://hdl.rutgers.edu/1782.1/NorthwesternU.collection.165 • http://hdl.rutgers.edu/1782.1/PennStateUniv.collection.164 • http://hdl.rutgers.edu/1782.1/PrincetonUniv.collection.166
Management Services (incl. Collection and Preservation) • Management • Super-user editing (handles, datastreams, metadata) • Purging an object • Export (FOXML, METS) • Collections • Collection administration • Statistics • Preservation • Creation of archival master • Creation of persistent ID (handle) • Checksum verification
Management Services • Access to individual objects is provided by a special search portal using the same indexes as the public search but providing Fedora API management functionality: • Viewing, Exporting and/or purging objects • Editing metadata, adding/changing datastreams • Validating objects, checking audit trails, testing signatures • There is a special Fedora database search allowing access to all objects whether or not they are members of an active collection.
Collection Administration • Edit collection information • Add parents to a collection • Add dynamic search terms to a collection • Generate an XML structure map
Collections - Indexing and Ingest • Active Collections may be indexed individually or all together at any time, though this is typically done using a nightly cron job. • Ingest is done through the management API and is typically called by the WMS program, but may be called directly from the management interface as well.
Preservation - Alerting • All Fedora API management functions trigger alerting messages, are stored in the Fedora audit trails, and are registered in the collection statistics database. • Statistics are kept for all object downloads as well as editing activities and may be accessed at collection or repository levels.
Preservation – PIDs and Handles • Handles are normally created as part of the ingest process, but may be manually created, changed, or purged on a per object basis using the management interface. • Three global registries for RU • 1782.1 – Rutgers University Libraries • 1782.2 – Rutgers University • 1782.3 – NJ Digital Highway
Object Integrity – Verifying Checksums • Archival datastreams have SHA1 checksums, created during the WMS pipeline process, as well as file size data, both stored in the technical metadata section of each object. • SHA1 checksums are tested using the sha1sum checking algorithm in conjunction with a management function that polls the repository and extracts sha1sum character strings from the techMD of individual objects or groups of objects. It has a calendar feature that allows it to be run as a cron job on a subset of objects for each day of the week, with result reports emailed to the appropriate data managers.
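The check described above can be sketched as follows, using Python's standard `hashlib` in place of the `sha1sum` command. The function names are hypothetical, and the way objects are split across weekdays (position in the sorted PID list modulo 7) is an assumption made for illustration; the slides do not say how the calendar feature selects its daily subset.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_sha1, expected_size):
    """Compare a file against the checksum and size recorded in its techMD."""
    if Path(path).stat().st_size != expected_size:
        return False  # a size mismatch is cheaper to detect than a hash mismatch
    return sha1_of_file(path) == expected_sha1.lower()


def todays_subset(pids, weekday):
    """Assumed scheduling rule: spread objects evenly over the seven weekdays.

    `weekday` follows datetime.date.weekday(): 0 is Monday, 6 is Sunday.
    """
    return [pid for i, pid in enumerate(sorted(pids)) if i % 7 == weekday]
```

A nightly cron job could call `todays_subset(all_pids, datetime.date.today().weekday())`, run `verify` on each archival datastream in that subset, and mail any failures to the data managers.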
Certification as a Trusted Repository* • Ultimately, we want to become certified as a trusted repository. There are four major areas: • A. Organization – Repository staff have skills appropriate to their duties. • B. Repository Functions – Repository actively monitors Archival Information Package integrity. • C. Designated Community – Repository defines its Designated Community. • D. Technologies – Repository has technologies to monitor security. • * RLG/NARA draft “An Audit Checklist for the Certification of Trusted Digital Repositories”
Preservation Services Architecture • Diagram labels: Preservation Portal; Preservation Services; Alerting; Migration; Monitoring; Statistics; . . .; Event Messaging; Preservation Integrity; Preservation Monitoring; Fedora Service Framework; Fedora Repository Service; Content Models; Format Registry; Digital Object Repository
Content Models (Content Model Dissemination Architecture – CMDA) • The CM object specifies constraints on the digital object (DO) • MIME type and format • Min/max number of datastreams • Whether multiple datastreams are ordered • The CM is used to determine runtime behavior • On ingest, Fedora validates the DO based on CM constraints • Disseminators are not bound into the DO • Run-time binding occurs through the CM object and the RELS-EXT datastream • The CM can point to a format registry
Content Models, Formats, and Disseminators • Book Object: Persistent ID; Metadata; Rels-Ext (cmodel: book); Data streams: SMAP1 – StrMap (TOC), DJVU1 – presentation, PDF1 – presentation, XML1 – OCR text, ARCH1 – Archival master (tiffs of each page) • Content Model Object: Persistent ID; Metadata; Rels-Ext; Composite Model • Bdef Object: Persistent ID; Metadata; MethodMap • Bmech Object: Persistent ID; Metadata; Rels-Ext; WSDL • Relationships: hasCM, hasBdef, hasBmech • Format Registry: pdf, tar, tiff • Composite Model: <dsCompositeModel> <dsTypeModel ID="PDF1" ordered="false" min="1" max="1"> <form MIME="application/pdf"></form> </dsTypeModel> <dsTypeModel ID="ARCH1" ordered="false" min="1" max="1"> <form MIME="application/tar"></form> </dsTypeModel> . . </dsCompositeModel>
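The ingest-time check against a composite model can be sketched as below. This is an illustration of the constraint idea only, not Fedora's validator: the `TypeModel` class and `validate` function are hypothetical, and the two entries simply mirror the `PDF1` and `ARCH1` constraints shown in the book example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TypeModel:
    ds_id: str      # datastream ID, e.g. "PDF1"
    mime: str       # required MIME type
    min_count: int  # minimum number of datastreams with this ID
    max_count: int  # maximum number of datastreams with this ID


def validate(datastreams: List[Tuple[str, str]], model: List[TypeModel]) -> List[str]:
    """Check (ds_id, mime) pairs against the model; return a list of problems."""
    errors = []
    for tm in model:
        found = [mime for ds_id, mime in datastreams if ds_id == tm.ds_id]
        if not tm.min_count <= len(found) <= tm.max_count:
            errors.append(
                f"{tm.ds_id}: expected {tm.min_count}..{tm.max_count} "
                f"datastreams, found {len(found)}"
            )
        for mime in found:
            if mime != tm.mime:
                errors.append(f"{tm.ds_id}: MIME type {mime} is not {tm.mime}")
    return errors


# Constraints taken from the book composite model on the slide.
BOOK_MODEL = [
    TypeModel("PDF1", "application/pdf", 1, 1),
    TypeModel("ARCH1", "application/tar", 1, 1),
]

ok = validate([("PDF1", "application/pdf"), ("ARCH1", "application/tar")], BOOK_MODEL)
bad = validate([("PDF1", "image/tiff")], BOOK_MODEL)
```

Here `ok` is empty, while `bad` reports both the wrong MIME type on `PDF1` and the missing `ARCH1` datastream. The `ordered` attribute is left out of the sketch for brevity.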
Events and Outcomes • An event is: • an action that involves at least one object, agent, and/or rights entity (PREMIS). • an occurrence that is significant to the performance of a task. • Event outcome – a situation or state that follows an event and is a result of the event.
Fedora Event Management • Generic Framework • Events can have messages which are associated with all types of services (preservation, collection, user, etc.) • Messages represent events with actions and outcomes • Fedora will provide a middleware messaging solution based on an open-source Java Message Service (JMS) implementation • Fedora Working Group Focus • Preservation events are atomic (i.e. associated with a Fedora API) • The event message will be based on the PREMIS event entity • Initial types: ingest, delete, modify, fixityCheck
The Event Message • Event message structure • The message payload will be XML-based and use the PREMIS event entity semantic units • Global identifiers (URIs) will be used for event type and outcome • An example might look like the following: <event> <eventIdentifier> <eventIdentifierType>Rucore event</eventIdentifierType> <eventIdentifierValue>30169</eventIdentifierValue> </eventIdentifier> <eventType>info:premis/preservation/event/ingest</eventType> <eventDateTime>2006-07-16T19:20:30</eventDateTime> <eventDetail>(to be used for general information)</eventDetail> <eventOutcomeInformation> <eventOutcome>info:premis/preservation/outcome/success</eventOutcome> <eventOutcomeDetail>(more text)</eventOutcomeDetail> </eventOutcomeInformation> <linkingAgentIdentifier>rutgers-lib:200</linkingAgentIdentifier> <linkingAgentIdentifier>rutgers-lib:400</linkingAgentIdentifier> <linkingObjectIdentifier>rutgers-lib:4291</linkingObjectIdentifier> </event>
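A message of this shape could be assembled programmatically as sketched below, using Python's standard `xml.etree.ElementTree`. The `build_event` function and its parameters are hypothetical; the element names and sample values are copied from the example on the slide, and the optional detail elements are omitted.

```python
import xml.etree.ElementTree as ET


def build_event(event_id, event_type, when, outcome, agents, objects):
    """Build an <event> element following the slide's example layout."""
    event = ET.Element("event")
    ident = ET.SubElement(event, "eventIdentifier")
    ET.SubElement(ident, "eventIdentifierType").text = "Rucore event"
    ET.SubElement(ident, "eventIdentifierValue").text = str(event_id)
    ET.SubElement(event, "eventType").text = event_type
    ET.SubElement(event, "eventDateTime").text = when
    info = ET.SubElement(event, "eventOutcomeInformation")
    ET.SubElement(info, "eventOutcome").text = outcome
    for agent in agents:
        ET.SubElement(event, "linkingAgentIdentifier").text = agent
    for obj in objects:
        ET.SubElement(event, "linkingObjectIdentifier").text = obj
    return event


message = ET.tostring(
    build_event(
        30169,
        "info:premis/preservation/event/ingest",
        "2006-07-16T19:20:30",
        "info:premis/preservation/outcome/success",
        agents=["rutgers-lib:200", "rutgers-lib:400"],
        objects=["rutgers-lib:4291"],
    ),
    encoding="unicode",
)
```

Building the payload through an XML library rather than string concatenation guarantees matched open and close tags and correct escaping of any special characters in the detail text.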
Event Management - Ingest (Using the publisher/subscriber model) • Diagram labels: User Input; XML; Workflow Management System; Digital Object Ingest; Digital Object Repository (Fedora); JMS (snd/rcv) on each participant; JMS Topic Queue holding messages with eventType ingest, delete, . . .; Preservation Service (alerting); Preservation Service (reporting)
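The publisher/subscriber pattern named in the slide title can be illustrated with a small in-process sketch. This is not JMS and does not use any Fedora API: the `Topic` class is a hypothetical stand-in for a JMS topic, and the two lists stand in for the alerting and reporting preservation services.

```python
class Topic:
    """Toy stand-in for a message topic: routes payloads by event type."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, event_type, handler):
        """Register a callable to receive every message of this event type."""
        self._subscribers.setdefault(event_type, []).append(handler)

    def publish(self, event_type, payload):
        """Deliver the payload to all subscribers; return how many received it."""
        handlers = self._subscribers.get(event_type, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


topic = Topic()
alerts, report = [], []

# The alerting service only cares about ingests; reporting wants both types.
topic.subscribe("ingest", alerts.append)
topic.subscribe("ingest", report.append)
topic.subscribe("delete", report.append)

# The repository publishes without knowing who is listening.
topic.publish("ingest", "rutgers-lib:4291")
topic.publish("delete", "rutgers-lib:4291")
```

The point of the pattern is the decoupling: new preservation services can subscribe to event types later without any change to the code that publishes ingest or delete events.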