Introduction to Digital Libraries Digital Library Models
Information Overload … • Having so much information available that you either cannot assimilate it all or it feels too overwhelming to take any of it in
Information Overload • Overwhelmed by the amount of information • Don’t understand the available information • Desperate to know if certain information exists • Don’t know where to find information • Unable to access information
People • Library Professionals • Library Users • IT Professionals • Vendors
Types of Digital Libraries • Stand-alone Digital Library (SDL) • Federated Digital Library (FDL) • Harvested Digital Library (HDL)
Stand-alone Digital Library (SDL) This is the regular classical library implemented in a fully computerized fashion. SDL is simply a library in which the holdings are digital (scanned or digitized). The SDL is self-contained - the material is localized and centralized.
Federated Digital Library (FDL) This is a federation of several independent SDLs, organized around a common theme and coupled together on the network. An FDL comprises several autonomous SDLs that form a networked library with a transparent user interface. The different SDLs are heterogeneous and are connected via communication networks.
Bibliographic Navigation Tools for Digital Libraries • SCOPUS • ELIN • Knowledge Cite Library • Database Advisor • OCLC FirstSearch
Harvested Digital Library (HDL) • This is a virtual library providing summarized access to related material scattered over the network. An example of an HDL is the Internet Public Library (IPL). • An HDL holds only metadata, with pointers to the holdings that are "one click away" in cyberspace. • Developed by library professionals or computer scientists
Major Components • Content • Services • Technology • Socio-political culture
Real world objects, concepts, ideas Examples (these are all resources) • People (focus of biographical reference tools) • Organizations (focus of organization directories) • Events (focus of developing "event gazetteers") • Places (focus of gazetteers) • Dates • Mathematical theorems (focus of mathematical encyclopedias) • Concepts, ideas • Problems and proposed solutions • Computer programs (focus of software directories or libraries) The reference model should have a more complete list and indicate sources dealing with these
Content and Collections • Data capture, representation, preservation • Metadata • Domain specific information objects • Intellectual property rights • New economic and business models for digital libraries
Contents Images .BMP .TIF .GIF .PNG .WMF .PICT .PCD .EPS .EMF .CGM .TGA .JPG Animation .ANI .FLI .FLC Video .AVI .MOV .MPG .QT
Contents Audio .WAV .MID .SND .AUD .mp3 Web Page .HTM .HTML .DHTML .HTMLS .XML Text .DOC .TXT .RTF .PDF Programs .COM .EXE
Contents Metadata standards • Dublin Core; http://dublincore.org/ • MARC 21; http://lcweb.loc.gov/marc/ • Encoded Archival Description (EAD); http://lcweb.loc.gov/ead/
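As an illustration of the first standard on this slide, the sketch below builds a small Dublin Core description with Python's standard library. The namespace URI and the element names (title, creator, date, format, identifier) come from the Dublin Core element set; the wrapping `record` element and all field values are made up for the example.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Serialize a dict of Dublin Core element names to a small XML record."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


# Made-up values, for illustration only.
xml_text = dublin_core_record({
    "title": "Sample report",
    "creator": "A. Author",
    "date": "1997",
    "format": "application/pdf",
    "identifier": "example/0001",
})
print(xml_text)
```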
Digital Libraries • Repositories • “any computer system whose primary function is to store digital material for use in a library” • Archives • repositories that make longevity promises
Digital libraries must • Store a wide variety of often complex information objects and display these objects on different platforms. This requires modeling information objects, their internal structure, and the relationships among them. • Provide data that support discovery, interpretation, use, and management of information objects. This requires a good metadata model. • Support annotation of information objects. Annotations turn out to be surprisingly diverse. An annotation may refer to only a part of an information object. This requires an elegant model that can deal with many cases.
Key Terms • digital objects (DOs) • a unit of exchange for the DL with a particular data structure and characteristics • repository • the place where DOs live • handles • a unique, persistent name for a DO
Digital library objects • objects = metadata + data
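The key terms above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names, the handle format `example/0001`, and the in-memory store are invented here and do not describe any particular repository system.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """A digital object (DO): metadata plus data, named by a handle."""
    handle: str                                   # unique, persistent name (made-up format)
    metadata: dict = field(default_factory=dict)  # descriptive fields
    data: bytes = b""                             # the content itself


class Repository:
    """The place where DOs live: a toy in-memory store keyed by handle."""

    def __init__(self):
        self._objects = {}

    def deposit(self, obj):
        # A handle must stay unique, so refuse to overwrite an existing one.
        if obj.handle in self._objects:
            raise ValueError(f"handle already in use: {obj.handle}")
        self._objects[obj.handle] = obj

    def get(self, handle):
        return self._objects[handle]


repo = Repository()
repo.deposit(DigitalObject(
    handle="example/0001",
    metadata={"title": "Sample report", "creator": "A. Author"},
    data=b"placeholder bytes",
))
print(repo.get("example/0001").metadata["title"])
```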
Digital Library Library Users Digital Library Services Digital Library Service Providers Digital Objects out of Archives Archive 1 Archive 2 Archive N Publishers Digital Objects in Archives
Decision to build a digital repository • Building the repository will cost a lot. • Maintaining it is manageable if you have somebody on staff with basic system administration skills and you can pay for external hosting and local backup. • Comparing the repository to a new physical collection is not helpful.
Repository purpose questions • What type of resources will it contain? • How big is it supposed to grow? • Who is going to use it and how? • How can resources be protected against modification? • How will access and IP rights be managed? • What systems will it need to interact with? • What resources will be available to create and maintain it?
Names and identifiers • names != addresses • in any DL architecture diagram, (almost) anything that can be drawn can be named
identification planning • This is an important part of building an archive. • Anything that is considered a resource has to be given an identifier. • Identifiers can be dumb or intelligent. • Identification may be hierarchical, and it can then be delegated.
dumb identifiers • Dumb identifiers contain no information about the item that they identify. • For example, a number can be used. • Advantage • easy to create • Problem • not easy to relate to the resource
Intelligent identifiers • They say something about the resource. • Usually, any hierarchical identification structure has some intelligence built into it. • But there is a temptation to change the handle when the information that the handle is built on changes.
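The difference between the two kinds of identifier can be shown in a few lines. This is a sketch only: the zero-padded counter and the `collection/year/sequence` pattern are assumptions chosen for the example, not a recommended scheme.

```python
import itertools


class DumbMinter:
    """Mints opaque sequential identifiers that say nothing about the item."""

    def __init__(self):
        self._counter = itertools.count(1)

    def mint(self):
        return f"{next(self._counter):06d}"


def intelligent_identifier(collection, year, seq):
    """Builds a hierarchical identifier from facts about the resource."""
    return f"{collection}/{year}/{seq:04d}"


minter = DumbMinter()
dumb = minter.mint()
smart = intelligent_identifier("techreports", 1997, 12)

# If the recorded year turns out to be wrong, the dumb identifier is unaffected,
# but rebuilding the intelligent one yields a different name for the same item.
corrected = intelligent_identifier("techreports", 1998, 12)
print(dumb, smart, corrected)
```

The last two values show the temptation described on the slide: once the underlying fact changes, the intelligent identifier no longer matches the resource unless it is changed too.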
URLs • URLs are tightly coupled with the physical location of an object, and are thus more likely to be transient • Tricks to make URLs more durable: • plan ahead when constructing web site structure • use good DNS CNAMEs • symbolic links on filesystems • http server redirects
URNs • But even with all the tricks available, URLs are not suitable for archival use in DLs • how long will this URL be good: http://techreports.larc.nasa.gov/ltrs/PDF/1997/tm/NASA-97-tm112871.pdf ? • how to handle mirroring, replication, etc.? • mnemonic: • URL = IP address (128.82.5.173) • URN = IP name (blearg.cs.odu.edu)
Handles • Handles can be thought of as a Uniform Resource Name (URN) implementation • http://www.handle.net/ contains info about the handle system • persistence • location independence • multiple instances
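The three properties listed for handles can be illustrated with a toy resolver. This sketch does not use the actual Handle System software or protocol; the class, the handle string, and the mirror and relocation URLs are all hypothetical. Only the first URL is taken from the slides.

```python
class HandleResolver:
    """Toy resolver: maps a persistent handle to its current locations."""

    def __init__(self):
        self._locations = {}

    def register(self, handle, *urls):
        # Multiple instances: one handle may point at several copies.
        self._locations[handle] = list(urls)

    def relocate(self, handle, old_url, new_url):
        # Location independence: the copy moves, the handle stays the same.
        urls = self._locations[handle]
        urls[urls.index(old_url)] = new_url

    def resolve(self, handle):
        return list(self._locations[handle])


HANDLE = "example-prefix/NASA-97-tm112871"  # hypothetical handle
ORIGINAL = "http://techreports.larc.nasa.gov/ltrs/PDF/1997/tm/NASA-97-tm112871.pdf"
MIRROR = "http://mirror.example.org/NASA-97-tm112871.pdf"        # hypothetical
MOVED = "http://reports.example.org/1997/NASA-97-tm112871.pdf"   # hypothetical

resolver = HandleResolver()
resolver.register(HANDLE, ORIGINAL, MIRROR)
resolver.relocate(HANDLE, ORIGINAL, MOVED)
print(resolver.resolve(HANDLE))
```

Citations keep using the handle; only the resolver's table changes when a copy moves or a mirror is added.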
DL Metadata Issues • Who provides metadata? • author? “publisher”? professional cataloger? extracted from content? • Is metadata “integrated” with data? • related question: is metadata a first class object? • Formats! • which ones? • Extensible?
Digital Library • Digital Library Services • User • Functionality & Interface • Searching • Browsing • Archive • Managed sets of objects
Introduction • Digital Library Scene • Search Engines • Heterogeneous • Vertical Information Retrieval • Unique User Interface • Search engines are different • Protocols are different • Querying & Ranking • Incompatible across the sources
Search • “A repository must be structured and organized that users can readily find and use diverse types of resources.” • Users don’t search local repositories. They come in through search engines or aggregators (which are also found through search engines). Optimizing repositories for local findability is plain wrong.
searching • You usually have resources and their descriptions. • You need to extract the searchable fields from the descriptions and store them in the database so that they can be queried. • Example: find pictures shot between 2011-04 and 2011-05.
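The picture example can be sketched with Python's built-in `sqlite3` module. The records, the "shot on YYYY-MM-DD" wording, and the table layout are invented for the illustration, and the query reads "between 2011-04 and 2011-05" as 1 April through 31 May 2011, which is an assumption.

```python
import re
import sqlite3

# Made-up descriptions; the date is buried in free text.
pictures = [
    {"id": "pic-1", "description": "Harbour at dusk, shot on 2011-04-17"},
    {"id": "pic-2", "description": "Library reading room, shot on 2011-05-30"},
    {"id": "pic-3", "description": "Street market, shot on 2011-06-02"},
    {"id": "pic-4", "description": "Undated scan of a postcard"},
]

DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE picture (id TEXT PRIMARY KEY, shot_on TEXT)")

# Extract the searchable field from each description and store it in its own column.
for pic in pictures:
    match = DATE_RE.search(pic["description"])
    shot_on = match.group(1) if match else None
    db.execute("INSERT INTO picture VALUES (?, ?)", (pic["id"], shot_on))

# ISO dates sort correctly as text, so a plain range comparison works.
rows = db.execute(
    "SELECT id FROM picture WHERE shot_on BETWEEN ? AND ? ORDER BY id",
    ("2011-04-01", "2011-05-31"),
).fetchall()
found = [row[0] for row in rows]
print(found)
```

Records without an extractable date are stored with an empty value and simply never match a date range.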
browsing • Here the data has to be discrete. • Many times the same entity is referred to by different values, e.g. “Thomas Krichel” vs “Томас Крихель”, “The Magic Flute” vs “Die Zauberflöte”. • If you want to have browsing by author, composer, work, etc., you have to bring the variant forms together, most likely manually.
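One way to bring variant forms together is a hand-maintained lookup table from each variant to a preferred form. The sketch below assumes such a table; the record identifiers and the third author are made up, while the name variants are the ones from the slide.

```python
from collections import defaultdict

# Hand-maintained table: variant form -> preferred form.
AUTHORITY = {
    "Томас Крихель": "Thomas Krichel",
    "Die Zauberflöte": "The Magic Flute",
}


def preferred(value):
    """Return the preferred form of a value, or the value itself if unknown."""
    return AUTHORITY.get(value, value)


def browse_index(records, field):
    """Group record ids under the preferred form of the given field."""
    index = defaultdict(list)
    for record in records:
        index[preferred(record[field])].append(record["id"])
    return dict(index)


# Made-up records for illustration.
records = [
    {"id": "r1", "author": "Thomas Krichel"},
    {"id": "r2", "author": "Томас Крихель"},
    {"id": "r3", "author": "Another Author"},
]

by_author = browse_index(records, "author")
print(by_author)
```

Both spellings of the same author end up under one browse entry, which is what a reader of the browse list expects.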
Portal “A portal <is> a single point of access to distributed systems that provides services to support user needs to search, browse, and contribute content, often linking to shared existing functionality at other sites.”
Taming the Web: RSS • RSS is a standard XML format for delivering content that changes on a regular basis • Content is delivered in small chunks, generally a synopsis, preview, or headline
Using RSS • Look for small, orange icons (RSS or XML) • How it works …
RSS • Saves you from checking your favorite sites one at a time • Lets you know when your favorite websites have been updated through “feeds” • Through your e-mail • Through the web • Through “aggregators” like Google Reader
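What an aggregator does with a feed can be sketched with the standard library alone. The feed below is a made-up RSS 2.0 document held in a string; a real aggregator would fetch it over HTTP and remember which items it has already shown.

```python
import xml.etree.ElementTree as ET

# Made-up feed for illustration; real feeds are fetched from a site's feed URL.
FEED = """<rss version="2.0">
  <channel>
    <title>Example Library News</title>
    <link>http://www.example.org/</link>
    <description>Made-up feed for illustration</description>
    <item>
      <title>New collection digitized</title>
      <link>http://www.example.org/news/1</link>
      <description>Short synopsis of the announcement.</description>
    </item>
    <item>
      <title>Reading room hours changed</title>
      <link>http://www.example.org/news/2</link>
      <description>Preview of the new schedule.</description>
    </item>
  </channel>
</rss>"""


def headlines(feed_xml):
    """Return (title, link) pairs for every item in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


items = headlines(FEED)
for title, link in items:
    print(f"{title}: {link}")
```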