Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation
Objectives • Background: Why we even considered a digital repository • FOE – version 1 • DSpace & Fedora: 50,000 foot view • FOE – version 2 • FOE – version 3 • Where to from here? TRLN: Staff Enrichment Series: 8 Nov, 2007
Background
75th Anniversary • Duke University School of Medicine established in 1930 • 2005 – year-long celebration • New published history • Articles, videos, speeches • Alumni weekend gala event • Josiah C. Trent Foundation Grant
Digitization Project • 500 images documenting the first 3 decades of the School of Medicine and Hospital • Image groups: • Buildings • Education • Events • Clinical • People • Technology
Digitization Project (cont.) • Selection – Whole staff • Digitization – Outsourced to University Photography • Description – Technical services and Reference coordinators • Subject terms – Technical services coordinator; Head, Cataloging services • Controlled vocabulary – NoteTab templates and libraries
FOE1.0: XML, XSLT, and PostgreSQL
FOE1.0 • 600 images = 600 XML files = 2 XSLT stylesheets • XML = EAD2002 • XSLT = 1) convert XML to HTML; 2) convert XML to SQL statements • PostgreSQL database used only for search • Result: http://archives.mc.duke.edu/projects/bld/bld00012.html
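The second stylesheet's job, turning each record into SQL for the search-only database, can be sketched roughly as follows. This is Python rather than XSLT, and it is only an illustration: the table name `foe_items`, its two columns, the simplified EAD-like sample record, and its placeholder title are all assumptions, not taken from the FOE files.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical EAD-like record. The real FOE1.0 files are not
# shown in the slides; only the identifier bld00012 appears there.
SAMPLE = """<ead>
  <eadheader><eadid>bld00012</eadid></eadheader>
  <archdesc level="item">
    <did><unittitle>Example building photograph</unittitle></did>
  </archdesc>
</ead>"""


def sql_quote(value: str) -> str:
    """Return value as a SQL string literal, doubling any single quotes."""
    return "'" + value.replace("'", "''") + "'"


def ead_to_insert(xml_text: str, table: str = "foe_items") -> str:
    """Build one INSERT statement from the record's id and title."""
    root = ET.fromstring(xml_text)
    item_id = root.findtext("eadheader/eadid", default="").strip()
    title = root.findtext("archdesc/did/unittitle", default="").strip()
    return (
        f"INSERT INTO {table} (id, title) "
        f"VALUES ({sql_quote(item_id)}, {sql_quote(title)});"
    )


print(ead_to_insert(SAMPLE))
```

Generating SQL text from every record mirrors the slide's description; a script with a live database connection would more likely pass the values as query parameters instead of quoting them by hand.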
Issues • SQL search statements worked…not • No indexing by search engines • JDBC • I am not a programmer • Definite need for improvements
DSpace & Fedora: A Bird's-eye View
Need for a Digital Repository • DSpace • First released in 2002. Developed by MIT Libraries and Hewlett-Packard (USA Today) • Current version (download) • Optimal performance in a *nix environment, but should operate in any environment • Written in Java • VERY active listservs • Manakin – TAMU-created “front end” that makes UI localization easier
Need for a Digital Repository (cont.) • FEDORA (Flexible Extensible Digital Object and Repository Architecture) • Began as a DARPA- and NSF-funded research project at Cornell in 1997 • 2001, UVA and Cornell: $1M Mellon grant • 1.0 released 2003 • Current version 2.2.1 (download) • Optimal performance in a *nix environment, but will run on Windows-based systems • Written in Java • Several front-end tools developed (more in a moment)
Side-by-side testing • Testing environment: • Lenovo T60, 120 GB hard drive, 2 GB memory, Fedora 7 (the Linux distribution), 2.6.23 kernel, Java 1.5
Requirements • DSpace: Java 1.4+; Apache Ant 1.6.2+; PostgreSQL 7.3+ (or Oracle 9+); Jakarta Tomcat 4.x/5.x (I used 6.x); can also run on Jetty or Caucho Resin • Fedora: JDK 1.5+; Optional: MySQL, PostgreSQL, Oracle 9, Jakarta Tomcat; Ant 1.6.5+ if building from source code
File Size & Download Times • DSpace: 16 MB; 1:43 over a T1 line; 1:13 on a T line • Fedora: 72 MB; 7:49 over a T1 line; 1:53 over a T line
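As a rough sanity check on the T1 figures, the ideal transfer time can be computed from file size and line rate. The sketch assumes the nominal T1 rate of 1.544 Mbit/s, treats 1 MB as 1,000,000 bytes, and ignores protocol overhead and server speed, so measured times should come out slower than these.

```python
T1_BITS_PER_SECOND = 1_544_000  # nominal T1 line rate


def download_seconds(size_mb: float, bits_per_second: float) -> float:
    """Ideal transfer time in seconds (1 MB = 1,000,000 bytes, no overhead)."""
    return size_mb * 1_000_000 * 8 / bits_per_second


def as_min_sec(seconds: float) -> str:
    """Format a duration as m:ss, rounded to the nearest second."""
    whole = round(seconds)
    return f"{whole // 60}:{whole % 60:02d}"


for name, size_mb in (("DSpace", 16), ("Fedora", 72)):
    print(name, as_min_sec(download_seconds(size_mb, T1_BITS_PER_SECOND)))
# prints:
# DSpace 1:23
# Fedora 6:13
```

The measured 1:43 and 7:49 are each somewhat above these ideal values, which is what overhead on a real connection would produce.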
Installation Time • DSpace: PostgreSQL installation and setup, 8 minutes; Ant build and configuration, 8 minutes; DSpace/Tomcat configuration and deployment, 8 minutes; total time to live, 24 minutes • Fedora: PostgreSQL installation and setup, 8 minutes; Fedora install, 5 minutes; total time to live, 13 minutes
Initial Live View • DSpace front page • Fedora front page
FOE2.0 Choosing our Digital Repository
Deciding Factors • DSpace: Off-the-shelf view; Workflow process; Individual submitters, one project admin; Item submission form (link here); Bulk load script (dc, item, mapfile); Searchbot harvestable; OAI harvestable • Fedora: Off-the-shelf view; One submitter; Item submission not intuitive (link); Bulk load script (FOXML); Content Models (will return); Disseminators; Behavior Definitions; Would require extensive programming
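The "dc, item" part of the DSpace bulk load can be sketched as below, assuming the layout of DSpace's Simple Archive Format: one directory per item holding a `dublin_core.xml` of `dcvalue` elements and a `contents` file listing the item's files. The helper name `write_item`, the directory name `item_000`, and the file name `example.jpg` are hypothetical; the title and date are the ones quoted later in the deck. The mapfile is left out of the sketch.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def write_item(item_dir, fields, bitstreams):
    """Write dublin_core.xml and contents for one item directory.

    fields maps (element, qualifier) pairs to values; bitstreams is a
    list of file names that would sit alongside the metadata.
    """
    item_dir = Path(item_dir)
    item_dir.mkdir(parents=True, exist_ok=True)

    root = ET.Element("dublin_core")
    for (element, qualifier), value in fields.items():
        dcvalue = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
        dcvalue.text = value
    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    # One file name per line.
    (item_dir / "contents").write_text("\n".join(bitstreams) + "\n", encoding="utf-8")


with tempfile.TemporaryDirectory() as archive:
    write_item(
        Path(archive) / "item_000",
        {("title", "none"): "A. Jack Tannenbaum", ("date", "issued"): "2005-11-10"},
        ["example.jpg"],  # hypothetical file name
    )
    print(sorted(p.name for p in (Path(archive) / "item_000").iterdir()))
```

A loop over the existing image records could call `write_item` once per image to build the whole archive before handing it to the importer.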
FOE2.0 = DSpace: Cup is Half Full • March 2006 • Foundations' new home • Data submission form • Item View bld00012 • Item Update • Access Restrictions • Handle server
FOE2.0 = DSpace: Cup is Half Empty • Object is entered as one item • DSpace is self-contained • No real way to show complex relationships • All or nothing metadata • Access Restrictions • Handle server • Searchbot indexing: • DSpace@DukeMed: Item 2193/77Title:, A. Jack Tannenbaum. Issue Date:, 10-Nov-2005 ... Abstract:, A. Jack Tannenbaum received his medical degree from Duke University in 1935. ...
FOE3.0 “Our goal is to never be satisfied”
Content Models: Reusing datastreams (next 2 slides borrowed from EDUCAUSE 2004 presentation by Grizzle, Wayland, and Wilper)
Atomistic Model
Compound Model
An old favorite blanket • 2005-2007 Fedora minimally utilized • Primarily used for archiving Library Administrative documents (Council and Management Team minutes, and Policies and procedures) • Use of XACML policies to restrict access (156\.16\.\d{1,3}\.\d{1,3} lock down) • Began looking at front-end GUIs
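The lock-down pattern quoted above can be tried out on its own. This sketch uses Python's `re` module with a whole-string match; how the XACML policy engine applies the pattern (anchoring, regex dialect) may differ, and the function name `is_allowed` and the sample addresses are illustrative.

```python
import re

# Pattern as quoted on the slide: any address beginning 156.16.
ALLOWED_RANGE = re.compile(r"156\.16\.\d{1,3}\.\d{1,3}")


def is_allowed(ip: str) -> bool:
    """True when the whole address string matches the lock-down pattern."""
    return ALLOWED_RANGE.fullmatch(ip) is not None


print(is_allowed("156.16.4.20"))    # True
print(is_allowed("156.160.4.20"))   # False: second octet is 160, not 16
print(is_allowed("10.156.16.4.2"))  # False: whole-string match rejects extra prefix
```

Note that `\d{1,3}` also accepts values such as 999, which is harmless when the input is always a real client address but makes the pattern a prefix filter rather than an IP validator.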
Front End tools • Fez – A web front-end management system for Fedora that is developed in PHP. Fez functionality includes: Web-based browsing and searching; Semi-advanced searching; Complex security; Basic image handling; Dublin Core. http://espace.library.uq.edu.au/documentation/ • Elated - ELATED is a lightweight, general-purpose application for managing digital files. ELATED is built on top of the Fedora Repository system, and can be used as a digital assets management system, an institutional repository, or to meet other collection archiving, publishing and searching needs. Dublin Core metadata entry and search; Custom metadata by collection; Automatic previews for images; Collections with simple editorial workflow; Indexing and searching of content; User feedback, enabled by collection; Select and import existing Fedora objects http://elated.sourceforge.net/ • Both require extensive programming for localization
External Forces at Play • In fall 2006 we began a project to digitize 10,000+ cytopathology slides • Images converted to JPEG2000 to improve the user experience (example) • Archives purchased Aware JPEG2000 Image Server • History of Medicine image database, Historical Images in Medicine (HIM), needed a new platform
Call out of the blue • VTLS – Vital • Open Repositories
FOE3.0 = Fedora/Vital: Cup is Half Full • June 2007 • Foundations' new home (link) • Data submission (3 ways to enter items) • Item View bld00012 • Object is entered as many datastreams (Fedora view) • Vital/Fedora/Aware…interoperability • Complex relationships • Multiple metadata streams • Handle server • Searchbot indexing: • A. Jack Tannenbaum. | MeDSpaceDescription: A. Jack Tannenbaum received his medical degree from Duke University in 1935. ... per00165, A. Jack Tannenbaum. 302.3 kB, JPEG 2000 Image ...
FOE3.0 = Fedora/Vital: Cup is Half Empty • Fedora is open source, Vital is not • Customization possible with programming knowledge • No way at this time to implement XACML policies (workarounds exist) • Vital upgrades require full software installation • Local customization can cause breaks in certain functions
Selected Links • DSpace – http://dspace.org • Manakin – http://di.tamu.edu/projects/xmlui/install • Fedora – http://www.fedora-commons.org/ • Elated – http://elated.sourceforge.net/ • Fez – http://espace.library.uq.edu.au/documentation/ • Vital – http://vtls.com • DSpace@DukeMed – http://dspace.mclibrary.duke.edu • MeDSpace – http://medspace.mc.duke.edu/vital/access/manager/Index