So you want to start a digital library? A presentation by Tom, Hugh, and Noel
Digital Libraries in focus • UC Berkeley Digital Library Project • The Perseus Project • The Digital Scriptorium • The William Blake Archive
Berkeley D-Lib Overview • Very much a test bed: emphasis on developing technologies for the digital library, not so much focus on building a coherent, fully-functional library (so far…) • technology-focused • Contents
Perseus Project • “The Perseus Project is an evolving digital library of resources for the study of the ancient world and beyond. Collaborators initially formed the project to construct a large, heterogeneous collection of materials, textual and visual, on the Archaic and Classical Greek world…Recent expansion into Latin texts and tools and Renaissance materials has served to add more coverage within Perseus and has prompted the project to explore new ways of presenting complex resources for electronic publication.” • (inter)connection-focused • Starting Points
The Digital Scriptorium • The Digital Scriptorium is basically the extension into cyberspace of Duke’s Rare Book, Manuscript, and Special Collections Library. • collection-focused • Projects
The William Blake Archive • “..the Blake Archive was conceived as an international public resource that would provide unified access to major works of visual and literary art that are highly disparate, widely dispersed, and more and more often severely restricted as a result of their value, rarity, and extreme fragility.” • (single) book-focused • Does one thing really well • The texts
Components of a Digital Library • STORAGE: formatting (must be standardized; need for multiple formats); archiving (on-site, system-wide, both?) • MANAGEMENT: metadata/history (useful/meaningful metadata can increase usability; digital library should maintain detailed records of object history); collections (arbitrary groupings of data) • DELIVERY: search capabilities (multiple ways of accessing the data, i.e. entry-points); browsing (predefined structure to the data, cf. collections); user interaction (ability to re-view the data); accessibility (users of differing physical and mental capabilities must have access to the library)
Components of a Digital Library [Diagram. SERVER: storage, management, delivery (searching, browsing, using). CLIENT: management, delivery (searching, browsing, using).]
STORAGE MANAGEMENT DELIVERY UC Berkeley’s Digital Library Project formatting metadata/history search capabilities • addresses and implements multiple search techniques; results vary • represents a test-bed of info and archiving best-practices • metadata standards defined, some implemented archiving collections browsing • discrete, disconnected collections • addresses and implements multiple searching techniques user interaction • experimental tools in text, image, GIS, etc. (buggy) Informix Universal Server. Database backend. DBI. Perl module for web cgi access to databases. AMASS Storage software. From Emass/ADIC. "Transforms offline storage into direct access mass storage." Cheshire II Search Engine. In-house search engine project. accessibility • information-overkill • reliance on Java = not universally accessible
STORAGE MANAGEMENT DELIVERY The Perseus Project formatting metadata/history search capabilities • standardized to the Web • metadata embedded with texts, images • multiple access points (via both texts and objects) • only basic formats available archiving collections browsing • offers numerous predefined collections • further file formats retained user interaction accessibility • easily navigable site • not approved by Bobby UNAVAILABLE TO THE PUBLIC.
STORAGE MANAGEMENT DELIVERY Duke’s Digital Scriptorium formatting metadata/history search capabilities • multiple Web-centric formats available • metadata via SGML/HTML • basic metadata search capabilities (limited by SGML) archiving collections browsing • offers useful predefined collections (“canned searches”) • masters not retained, only JPEG format used • discrete, disconnected collections • includes history behind data user interaction DynaWeb. From Inso. A tool that allows searches through structured SGML documents and translates from SGML to HTML on-the-fly. SGML. Using the Encoded Archival Description DTD. Webinator. From Thunderstone. Used to index the various static HTML pages in the Scriptorium. Also used to index the Duke Papyrus Collection. accessibility • easily navigable site • Bobby-approved
STORAGE MANAGEMENT DELIVERY The William Blake Archive formatting metadata/history search capabilities • multiple, standard formats, most available from the site • metadata retained on every region of every image • text and image-based searches (both based on metadata) archiving collections browsing • Works-in-Progress area allows for collaborative CM • the limited scope limits passive collection-browsing • TIFF originals retained user interaction • INote software allows for individual image markup accessibility • easily navigable site • not approved by Bobby DynaWeb. SGML. Java applets (ImageSizer, INote).
Finding Things in the Digital Library • Analog library: catalog / keyword search; browsing; special collections (varied / unique finding aids) • Digital library: metadata-based searches; virtual collections (varied finding aids); content-based (exploitive) searches
Finding Examples • Using metadata (Blake Images) • Browsing (Perseus Texts) • By collection (Digital Scriptorium) • Using content • Berkeley Cheshire II (Documents) • Berkeley Cheshire II Tilebars (Documents) • Other media types (images, video, audio) • Helping the user distinguish (or not) • Berkeley (what am I really searching against) • Perseus search tools (metadata-based with pointers to content-based options)
Texts in Berkeley D-Lib • Multivalent documents • “Multivalent documents (MVD) represent an open, extensible, network-centric document model.” • Enable high functionality for scanned page images. E.g., in a scanned page image “enlivened” by MVD, you can select and paste text, highlight matching search terms, and perform a variety of other manipulations, such as sorting a table in a scanned image. • Support distributed annotations. With MVD, annotations of many sorts can be made by any user on any supported document type. • Generate alternative views of components of documents. For example, MVD lenses allow a different view of a region of a screen. A magnification lens will magnify a region; an “OCR lens” will show what an OCR process produces for that region. • Alternative selection. Instead of just selecting text, you can choose to have the selection modified in particular ways.
Texts in Perseus • Homer’s Iliad 1.1-32 (in Greek) • Homer’s Iliad 1.1-32 (with links to Perseus’ morphology parsing tool) • Homer’s Iliad 1.1-32 (with links to lemmas in the online lexicon) • Homer’s Iliad 1.1-32 (in Beta Code) • Homer’s Iliad 1.1-32 (in English, with links to searchable terms)
The Digital Scriptorium • Metadata: • EAD, which has 145 tags. • EAD is designed to describe hierarchical collections. An EAD file contains components (<c></c>), which can contain other components nested within them (<c01><c02></c02></c01>).
An Example of EAD <c03 level="item"><did> <unitid id="SHE-156">156.</unitid> <unitdate normal="16650404">4 April 17 Chas. II [1665]</unitdate><note><p><list> <item>(1) <persname authfilenumber="957702">George Shepperd</persname> of the <geogname authfilenumber="NT0526">town and county of Newcastle upon Tine</geogname>, gent.</item> <item>(2) <persname authfilenumber="23549">Anne Carr (née Franks)</persname> of <geogname authfilenumber="SS0032">South Sheiles</geogname> in the county of Durham, widow.</item></list> Lease by (1) to (2) of his half part of the messuage in <geogname authfilenumber="PO0016">Pockerley</geogname> in the county of Durham with its <subject authfilenumber="c56">collieries and coalmines</subject>, and a fulling mill.<lb> Term: 1 month from <date normal="16650331">31 March 1665</date>.<lb> Consideration: £10.<lb> Signed: (1 ). Seal: red wax, papered, on parchment tag.</p></note> <physdesc><extent>Parchment. 1m.</extent></physdesc> <unitloc loctype="container">114/5-1</unitloc></did> <c04 level="item"><did> <unitid id="SHE-156a">156. (a)</unitid> <unitdate normal="16650414">14 April 17 Chas. II [1665]</unitdate><note><p> Attached to 156:<lb> Minutes of consultation with <persname authfilenumber="68239"> cousin Nan</persname> about above agreement.<lb> Refers to a book of surveys called <title render="italic">The Book of Pockerley</title> created in <date normal="162203xx">March 1622</date>.<lb> See <ref target="SHE-2056">no. 2056 below</ref> for letter containing description of this meeting.</p></note> <physdesc><extent>Paper. 1f.</extent></physdesc> <unitloc loctype="container">114/5-2</unitloc></did></c04></c03>
Rendered into plain text: 156. 4 April 17 Chas. II [1665] (1) George Shepperd of the town and county of Newcastle upon Tine, gent. (2) Anne Carr (née Franks) of South Sheiles in the county of Durham, widow. Lease by (1) to (2) of his half part of the messuage in Pockerley in the county of Durham with its collieries and coalmines, and a fulling mill. Term: 1 month from 31 March 1665. Consideration: £10. Signed: (1 ). Seal: red wax, papered, on parchment tag. Parchment. 1m. [114/5-1] 156. (a) 14 April 17 Chas. II [1665] Attached to 156: Minutes of consultation with "cousin Nan" about above agreement. Refers to a book of surveys called The Book of Pockerley created in March 1622. See no. 2056 below for letter containing description of this meeting. Paper. 1f. [114/5-2]
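The nesting of EAD components can be walked recursively to produce a plain-text listing like the one above. The following is a minimal sketch, not the Scriptorium's actual rendering pipeline (the slides credit DynaWeb for that). It assumes a simplified, well-formed XML version of the fragment (the original is SGML, where <lb> has no closing tag), with the notes omitted; the names EAD_FRAGMENT, is_component, and render_component are illustrative only.

```python
import re
import xml.etree.ElementTree as ET

# Simplified XML version of the slide's EAD fragment (assumption: notes are
# left out and the markup is made well-formed so a stock XML parser can read it).
EAD_FRAGMENT = """
<c03 level="item">
  <did>
    <unitid id="SHE-156">156.</unitid>
    <unitdate normal="16650404">4 April 17 Chas. II [1665]</unitdate>
    <physdesc><extent>Parchment. 1m.</extent></physdesc>
    <unitloc loctype="container">114/5-1</unitloc>
  </did>
  <c04 level="item">
    <did>
      <unitid id="SHE-156a">156. (a)</unitid>
      <unitdate normal="16650414">14 April 17 Chas. II [1665]</unitdate>
      <physdesc><extent>Paper. 1f.</extent></physdesc>
      <unitloc loctype="container">114/5-2</unitloc>
    </did>
  </c04>
</c03>
"""

# EAD components are named <c> (unnumbered) or <c01> through <c12> (numbered).
COMPONENT_TAG = re.compile(r"c(0[1-9]|1[0-2])?")


def is_component(element):
    """Return True if the element is an EAD component at any nesting level."""
    return COMPONENT_TAG.fullmatch(element.tag) is not None


def render_component(component, depth=0):
    """Return one plain-text line per component, indented by nesting depth."""
    did = component.find("did")
    unitid = did.findtext("unitid", default="").strip()
    unitdate = did.findtext("unitdate", default="").strip()
    extent = did.findtext("physdesc/extent", default="").strip()
    unitloc = did.findtext("unitloc", default="").strip()
    lines = [f"{'  ' * depth}{unitid} {unitdate} {extent} [{unitloc}]"]
    # Child components sit beside <did>, so recurse over direct children only.
    for child in component:
        if is_component(child):
            lines.extend(render_component(child, depth + 1))
    return lines


rendered = render_component(ET.fromstring(EAD_FRAGMENT))
print("\n".join(rendered))
```

Because each component carries its own <did> description, the same function handles any depth of nesting; only the indentation changes.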
Texts in the Blake Archive • Also, essentially, multivalent documents. • Though with much stricter bounds than the Berkeley MVDs. • They, too, use SGML markup to describe their archive.
Texts in the Blake Archive <component type="figure" location="D"> <characteristic>shepherd</characteristic> <characteristic>male</characteristic> <characteristic>young</characteristic> <characteristic>short hair</characteristic> <characteristic>tights</characteristic> <characteristic>standing</characteristic> <characteristic>contrapposto</characteristic> <characteristic>looking</characteristic> <illusobjdesc> A young, short-haired male shepherd in tights stands in contrapposto, watching his grazing flock of sheep--perhaps looking at the sheep that lifts its head toward him. He holds a crook in his left hand; his purse is visible near his right knee. </illusobjdesc> </component>
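Markup like this is what makes metadata-based image searching possible: a query for a set of characteristics can be answered by matching against each component's <characteristic> elements. The sketch below illustrates the idea only and is not the Archive's search implementation. The <plate> wrapper, the second component, and the function name find_components are assumptions added for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper around the slide's component; the "animal" component is
# invented here so the search has something to exclude.
PLATE = """
<plate>
  <component type="figure" location="D">
    <characteristic>shepherd</characteristic>
    <characteristic>male</characteristic>
    <characteristic>young</characteristic>
    <characteristic>standing</characteristic>
    <illusobjdesc>A young, short-haired male shepherd in tights stands in
    contrapposto, watching his grazing flock of sheep.</illusobjdesc>
  </component>
  <component type="animal" location="C">
    <characteristic>sheep</characteristic>
    <characteristic>grazing</characteristic>
    <illusobjdesc>A flock of sheep grazes.</illusobjdesc>
  </component>
</plate>
"""


def find_components(root, *terms):
    """Return (type, location) for components having ALL requested characteristics."""
    wanted = {term.lower() for term in terms}
    hits = []
    for component in root.iter("component"):
        present = {
            c.text.strip().lower()
            for c in component.findall("characteristic")
            if c.text
        }
        if wanted <= present:  # every requested term is present
            hits.append((component.get("type"), component.get("location")))
    return hits


plate = ET.fromstring(PLATE)
print(find_components(plate, "shepherd", "young"))
```

Since each component records its location on the plate, a hit identifies a region of the image rather than just the image as a whole.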
Possibilities for texts • Full markup = very powerful finding/linking capabilities • Text Encoding Initiative (~400 tags!) • Perseus is an example of how fully marked-up texts can be used.
What Else? • Georeferences • Contextual finding/browsing • Intelligent full-text searching
Imagery ISSUES • storage • management • delivery (searching, browsing, interaction)
Imagery: Best-Practices Example The Blake Archive storage of multiple resolutions and TIFF originals TIFF v. JPEG
Imagery: Best-Practices Example The Blake Archive metadata applied to images regionally
Imagery: Best-Practices Example The Blake Archive As a result, searching is improved. It further allows for interactive programs like INote, a regional metadata assignment program, used by contributors (thus far) to enhance this metadata store.
Imagery: Further Issues? The Perseus Project While Perseus does archive larger image versions, the images that are accessible on the Web are useful only as peripheral learning aids, not learning tools in themselves. Perseus has strong searching tools for text and has applied this paradigm to its imagery. This creates very powerful and useful metadata binding to the image object. But can we do more? Unacceptable for research.
Imagery: Interesting delivery? Image searching via pattern recognition. Berkeley’s Blobworld http://elib.cs.berkeley.edu/photos/blobworld/
Geographic data in the digital library • Tools for using geodata • Perseus Atlas • Berkeley GIS viewer • Tools for searching with geodata or relating it to other objects • Bueller? Bueller?
Searching / relating geodata • Interactive map: select feature(s) by browsing or query, get access to “related” objects in the collection • Pick a non-geodata object, use GIS & full-text searches in background to “lookup” potentially related objects (geodata and/or not) • Plot features found in non-geodata source on an interactive map
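Two of the ideas above, finding place names in a non-geodata object and selecting features from a map region, can be sketched in a few lines. This is only an illustration under stated assumptions: the GAZETTEER table is a hypothetical stand-in for a real gazetteer or GIS layer, its coordinates are approximate, and the function names are invented for the example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    name: str
    lat: float  # degrees north
    lon: float  # degrees east


# Hypothetical three-entry gazetteer with approximate coordinates.
GAZETTEER = {
    "athens": Place("Athens", 37.98, 23.73),
    "sparta": Place("Sparta", 37.07, 22.43),
    "troy": Place("Troy", 39.96, 26.24),
}


def places_in_text(text):
    """Return gazetteer places mentioned in the text, in order of first mention."""
    found, seen = [], set()
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in GAZETTEER and word not in seen:
            seen.add(word)
            found.append(GAZETTEER[word])
    return found


def within_box(places, south, west, north, east):
    """Keep places inside a latitude/longitude bounding box (map selection)."""
    return [
        p for p in places
        if south <= p.lat <= north and west <= p.lon <= east
    ]


passage = "The Greeks sailed from Sparta to Troy; Athens sent ships to Troy."
mentioned = places_in_text(passage)
print([p.name for p in mentioned])
print([p.name for p in within_box(mentioned, 36.0, 20.0, 39.0, 25.0)])
```

The points returned by places_in_text are what would be plotted on an interactive map; running the lookup in the other direction (box first, then texts mentioning the selected places) gives the "related objects" case.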
Lessons learned • What kind of digital library (libraries) do we want? • Repository & access for multiple more-or-less discrete collections? • Cutting-edge test bed for cool DL technologies? • “Working library” to support a set of defined needs (research, teaching, outreach)? • A set of tools, resources & expertise to allow units and divisions to assemble one or more of the above? • Hybrid?
Lessons learned • What kind of digital library (libraries) do we want? • Clearly defined mission, capabilities, features and institutional home/support keys to successful implementation (Blake)
Lessons learned • What kind of digital library (libraries) do we want? • Clearly defined mission, capabilities, features and institutional home/support keys to successful implementation (Blake) • The storage/management/delivery model will underpin whatever choices we make • How does each candidate vendor “solution” map onto this model? • How much customization / interconnectedness / extensibility is possible?
Lessons learned • What kind of digital library (libraries) do we want? • Clearly defined mission, capabilities, features and institutional home/support keys to successful implementation (Blake) • The storage/management/delivery model will underpin whatever choices we make • Mission before selection? Compromise on features inevitable but fraught with risk