
Foundations of Excellence


  1. Foundations of Excellence: DSpace vs Fedora, or what I do on my summer vacation

  2. Objectives • Background: Why we even considered a digital repository • FOE – version 1 • DSpace & Fedora: 50,000 foot view • FOE – version 2 • FOE – version 3 • Where to from here? TRLN: Staff Enrichment Series: 8 Nov, 2007

  3. Background

  4. 75th Anniversary • Duke University School of Medicine established in 1930 • 2005 – year-long celebration • New published history • Articles, videos, speeches • Alumni weekend gala event • Josiah C. Trent Foundation Grant

  5. Digitization Project • 500 images documenting the first 3 decades of the School of Medicine and Hospital • Image groups: • Buildings • Education • Events • Clinical • People • Technology

  6. Digitization Project (cont.) • Selection – Whole staff • Digitization – Outsourced to University Photography • Description – Technical services and Reference coordinators • Subject terms – Technical services coordinator; Head, Cataloging services • Controlled vocabulary – NoteTab templates and libraries

  7. FOE1.0: XML, XSLT, and PostgreSQL

  8. FOE1.0 • 600 images = 600 XML files, processed by 2 XSLT stylesheets • XML = EAD 2002 • XSLT = 1) convert XML to HTML; 2) convert XML to SQL statements • PostgreSQL database used only for search • Result: http://archives.mc.duke.edu/projects/bld/bld00012.html
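
The XML-to-SQL step on this slide can be illustrated with a minimal sketch. It is not the FOE1.0 code: it uses Python's `xml.etree.ElementTree` in place of XSLT and an in-memory SQLite database as a stand-in for PostgreSQL. The sample record, the `images` table, and the function names are all hypothetical; the record only borrows a few EAD 2002 element names (`did`, `unitid`, `unittitle`, `unitdate`).

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified EAD-style record (the real files are not shown in the slides).
SAMPLE_RECORD = """
<c level="item">
  <did>
    <unitid>bld00012</unitid>
    <unittitle>Example building photograph</unittitle>
    <unitdate>1930</unitdate>
  </did>
</c>
"""


def record_to_row(xml_text):
    """Pull the searchable fields out of one XML record."""
    did = ET.fromstring(xml_text.strip()).find("did")
    return (
        did.findtext("unitid"),
        did.findtext("unittitle"),
        did.findtext("unitdate"),
    )


def load_records(conn, records):
    """Create the (assumed) search table and insert one row per record."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images (id TEXT PRIMARY KEY, title TEXT, date TEXT)"
    )
    # Parameterized inserts avoid hand-built SQL strings.
    conn.executemany(
        "INSERT INTO images VALUES (?, ?, ?)", [record_to_row(r) for r in records]
    )
    conn.commit()


def search_titles(conn, term):
    """Return the ids of records whose title contains the search term."""
    cur = conn.execute(
        "SELECT id FROM images WHERE title LIKE ? ORDER BY id", (f"%{term}%",)
    )
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL search database
load_records(conn, [SAMPLE_RECORD])
print(search_titles(conn, "building"))  # ['bld00012']
```

The database holds only the fields needed for searching; the XML files remain the source of the displayed pages, which matches the "used only for search" role described above.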

  9. Issues • SQL search statements worked…not • No indexing by search engines • JDBC • I am not a programmer • Definite need for improvements

  10. DSpace & Fedora: A Bird's-eye View

  11. Need for a Digital Repository • DSpace • First released in 2002. Developed by MIT Libraries and Hewlett-Packard (USA Today) • Current version (download) • Optimal performance in a *nix environment, but should operate in any environment • Written in Java • VERY active listservs • Manakin – TAMU-created “front end” that makes UI localization easier

  12. Need for a Digital Repository (cont.) • FEDORA (Flexible Extensible Digital Object and Repository Architecture) • Began as a DARPA and NSF-funded research project at Cornell in 1997 • 2001, UVA and Cornell: $1M Mellon grant • 1.0 released 2003 • Current version 2.2.1 (download) • Optimal performance in a *nix environment, but will run on Windows-based systems • Written in Java • Several front-end tools developed (more in a moment)

  13. Side-by-side testing • Testing environment: • Lenovo T60, 120 GB hard drive, 2 GB memory, Fedora 7 (the Linux distribution), 2.6.23 kernel, Java 1.5

  14. Requirements • DSpace: Java 1.4+; Apache Ant 1.6.2+; PostgreSQL 7.3+ (or Oracle 9+); Jakarta Tomcat 4.x/5.x (I used 6.x), can also run on Jetty or Caucho Resin • Fedora: JDK 1.5+; optional: MySQL, PostgreSQL, Oracle 9, Jakarta Tomcat; Ant 1.6.5+ if building from source code

  15. File Size & Download Times • DSpace: 16 MB; 1:43 over a T1 line; 1:13 on a T line • Fedora: 72 MB; 7:49 over a T1 line; 1:53 over a T line
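
The T1 figures on this slide can be sanity-checked with a little arithmetic. The sketch below converts the reported sizes and times into effective throughput; it assumes decimal units (1 MB = 8 megabits), and the function names are illustrative only.

```python
def to_seconds(mmss):
    """Convert a 'minutes:seconds' string such as '1:43' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def megabits_per_second(size_mb, mmss):
    """Effective throughput, assuming 1 MB = 8 megabits (decimal units)."""
    return size_mb * 8 / to_seconds(mmss)


# Sizes and T1 download times as reported on the slide.
for name, size_mb, t1_time in [("DSpace", 16, "1:43"), ("Fedora", 72, "7:49")]:
    rate = megabits_per_second(size_mb, t1_time)
    print(f"{name}: {rate:.2f} Mbit/s over the T1 line")
# DSpace: 1.24 Mbit/s over the T1 line
# Fedora: 1.23 Mbit/s over the T1 line
```

Both downloads come out at roughly the same rate, a little under the nominal 1.544 Mbit/s of a T1 line, so the two T1 timings are consistent with each other.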

  16. Installation Time • DSpace: PostgreSQL installation and setup, 8 minutes; Ant build and configuration, 8 minutes; DSpace/Tomcat configuration and deployment, 8 minutes; total time to live, 24 minutes • Fedora: PostgreSQL installation and setup, 8 minutes; Fedora install, 5 minutes; total time to live, 13 minutes

  17. Initial Live View • DSpace Front Page • Fedora Front Page

  18. FOE2.0: Choosing our Digital Repository

  19. Deciding Factors • DSpace: Off-the-shelf view; Workflow process; Individual submitters, one project admin; Item submission form (link here); Bulk load script (dc, item, mapfile); Searchbot harvestable; OAI harvestable • Fedora: Off-the-shelf view; One submitter; Item submission not intuitive (link); Bulk load script (FOXML); Content Models (will return); Disseminators; Behavior Definitions; Would require extensive programming
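
The "dc, item, mapfile" in the DSpace bulk-load entry presumably refers to DSpace's batch import layout (the Simple Archive Format): one directory per item, holding a `dublin_core.xml` metadata file, a `contents` file that lists the item's files, and the files themselves, with the importer recording which directory became which item in a map file. The sketch below only writes such a directory; it is not the script used for FOE2.0, and the item name, metadata values, and file name are made up.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def write_item(archive_dir, item_name, metadata, bitstreams):
    """Write one item directory: dublin_core.xml, contents, and the files."""
    item_dir = Path(archive_dir) / item_name
    item_dir.mkdir(parents=True)

    # dublin_core.xml: one <dcvalue> per (element, qualifier, value) triple.
    root = ET.Element("dublin_core")
    for element, qualifier, value in metadata:
        dcvalue = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
        dcvalue.text = value
    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    # The files themselves, plus a contents file naming one file per line.
    for filename, data in bitstreams.items():
        (item_dir / filename).write_bytes(data)
    (item_dir / "contents").write_text("\n".join(bitstreams) + "\n", encoding="utf-8")
    return item_dir


archive = tempfile.mkdtemp()  # throwaway directory for the example
item = write_item(
    archive,
    "item_000",
    [("title", "none", "Example building photograph"), ("date", "issued", "2005")],
    {"bld00012.jpg": b"placeholder image bytes"},
)
print(sorted(p.name for p in item.iterdir()))
# ['bld00012.jpg', 'contents', 'dublin_core.xml']
```

A directory of such items is what the DSpace import command consumes; the exact command-line options depend on the DSpace version and are not shown here.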

  20. FOE2.0 = DSpace: Cup is Half Full • March 2006 • Foundations' new home • Data submission form • Item View bld00012 • Item Update • Access Restrictions • Handle server

  21. FOE2.0 = DSpace: Cup is Half Empty • Object is entered as one item • DSpace is self-contained • No real way to show complex relationships • All or nothing metadata • Access Restrictions • Handle server • Searchbot indexing: • DSpace@DukeMed: Item 2193/77Title:, A. Jack Tannenbaum. Issue Date:, 10-Nov-2005 ... Abstract:, A. Jack Tannenbaum received his medical degree from Duke University in 1935. ...

  22. FOE3.0: “Our goal is to never be satisfied”

  23. Content Models: Reusing datastreams (next 2 slides borrowed from an EDUCAUSE 2004 presentation by Grizzle, Wayland, and Wilper)

  24. Atomistic Model

  25. Compound Model

  26. An old favorite blanket • 2005-2007 Fedora minimally utilized • Primarily used for archiving Library Administrative documents (Council and Management Team minutes, and Policies and procedures) • Use of XACML policies to restrict access (156\.16\.\d{1,3}\.\d{1,3} lock down) • Began looking at front-end GUIs
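
The address pattern quoted in the XACML bullet can be tried out on its own. The sketch below only shows which client addresses the regular expression accepts; how the XACML policy itself applies the pattern is not shown in the slides, and the function name and test addresses are illustrative.

```python
import re

# Pattern as quoted on the slide: any address of the form 156.16.x.y.
ALLOWED_PATTERN = re.compile(r"156\.16\.\d{1,3}\.\d{1,3}")


def is_allowed(ip_address):
    """True if the whole address matches the pattern (fullmatch anchors both ends)."""
    return ALLOWED_PATTERN.fullmatch(ip_address) is not None


for ip in ["156.16.1.20", "156.160.1.20", "10.156.16.1"]:
    print(ip, is_allowed(ip))
# 156.16.1.20 True
# 156.160.1.20 False
# 10.156.16.1 False
```

Note that `\d{1,3}` accepts any one to three digits, so the pattern would also match an out-of-range octet such as 999; that is harmless for an allow-list keyed to real client addresses, but it means the pattern is not a general IP validator.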

  27. Front End tools • Fez – A web front-end management system for Fedora, developed in PHP. Fez functionality includes: Web-based browsing and searching; Semi-advanced searching; Complex security; Basic image handling; Dublin Core. http://espace.library.uq.edu.au/documentation/ • Elated – ELATED is a lightweight, general-purpose application for managing digital files. ELATED is built on top of the Fedora Repository system, and can be used as a digital assets management system, an institutional repository, or to meet other collection archiving, publishing and searching needs. ELATED functionality includes: Dublin Core metadata entry and search; Custom metadata by collection; Automatic previews for images; Collections with simple editorial workflow; Indexing and searching of content; User feedback, enabled by collection; Select and import existing Fedora objects. http://elated.sourceforge.net/ • Both require extensive programming for localization

  28. External Forces at play • In fall 2006 we began a project to digitize 10,000+ cytopathology slides • Images converted to JPEG2000 to improve the user experience (example) • Archives purchased Aware JPEG2000 Image Server • History of Medicine image database, Historical Images in Medicine (HIM), needed a new platform

  29. Call out of the blue • VTLS – Vital • Open Repositories

  30. FOE3.0 = Fedora/Vital: Cup is Half Full • June 2007 • Foundations' new home (link) • Data submission (3 ways to enter items) • Item View bld00012 • Object is entered as many datastreams (Fedora view) • Vital/Fedora/Aware…interoperability • Complex relationships • Multiple metadata streams • Handle server • Searchbot indexing: • A. Jack Tannenbaum. | MeDSpace Description: A. Jack Tannenbaum received his medical degree from Duke University in 1935. ... per00165, A. Jack Tannenbaum. 302.3 kB, JPEG 2000 Image ...

  31. FOE3.0 = Fedora/Vital: Cup is Half Empty • Fedora is open source, Vital is not • Customization possible with programming knowledge • No way at this time to implement XACML policies (workarounds exist) • Vital upgrades require full software installation • Local customization can cause breaks in certain functions

  32. Conclusions and obligatory links

  33. Selected Links • DSpace – http://dspace.org • Manakin – http://di.tamu.edu/projects/xmlui/install • Fedora – http://www.fedora-commons.org/ • Elated – http://elated.sourceforge.net/ • Fez – http://espace.library.uq.edu.au/documentation/ • Vital – http://vtls.com • DSpace@DukeMed – http://dspace.mclibrary.duke.edu • MeDSpace – http://medspace.mc.duke.edu/vital/access/manager/Index
