
Introduction to Digital Libraries


creda

Presentation Transcript


  1. Introduction to Digital Libraries Digital Library Models

  2. Information Overload … • Having so much information available that you either cannot assimilate it all or it feels too overwhelming to take any of it in

  3. Information Overload • Overwhelmed by the amount of information • Don’t understand the available information • Desperate to know if certain information exists • Don’t know where to find information • Unable to access information

  4. Digital Library’s Conceptual Model

  5. People • Library Professionals • Library Users • IT Professionals • Vendors

  6. Types of Digital Libraries • Stand-alone Digital Library (SDL) • Federated Digital Library (FDL) • Harvested Digital Library (HDL)

  7. Stand-alone Digital Library (SDL) This is the regular classical library implemented in a fully computerized fashion. SDL is simply a library in which the holdings are digital (scanned or digitized). The SDL is self-contained - the material is localized and centralized.

  8. The ACM Digital Library

  9. IEEE Computer Society DL

  10. Federated Digital Library (FDL) This is a federation of several independent SDLs, organized around a common theme and coupled together on the network. An FDL comprises several autonomous SDLs that form a networked library with a transparent user interface. The different SDLs are heterogeneous and are connected via communication networks.

  11. Networked Digital Library of Theses and Dissertations

  12. Bibliographic Navigation Tools for Digital Libraries • SCOPUS • ELIN • Knowledge Cite Library • Database Advisor • OCLC’s FirstSearch

  13. Harvested Digital Library (HDL) • This is a virtual library providing summarized access to related material scattered over the network. An example of an HDL is the Internet Public Library (IPL). • An HDL holds only metadata, with pointers to the holdings that are "one click away" in cyberspace. • Developed by library professionals or computer scientists

  14. Digital Library Components

  15. Components of Digital Library

  16. Major Components • Content • Services • Technology • Socio-political culture

  17. Real world objects, concepts, ideas Examples (these are all resources) • People (focus of biographical reference tools) • Organizations (focus of organization directories) • Events (focus of developing "event gazetteers") • Places (focus of gazetteers) • Dates • Mathematical theorems (focus of mathematical encyclopedias) • Concepts, ideas • Problems and proposed solutions • Computer programs (focus of software directories or libraries) The reference model should have a more complete list and indicate sources dealing with these

  18. Content and Collections • Data capture, representation, preservation • Metadata • Domain specific information objects • Intellectual property rights • New economic and business models for digital libraries

  19. Contents • Images: .BMP .TIF .GIF .PNG .WMF .PICT .PCD .EPS .EMF .CGM .TGA .JPG • Animation: .ANI .FLI .FLC • Video: .AVI .MOV .MPG .QT

  20. Contents • Audio: .WAV .MID .SND .AUD .mp3 • Web Page: .HTM .HTML .DHTML .HTMLS .XML • Text: .DOC .TXT .RTF .PDF • Programs: .COM .EXE

  21. Contents Metadata standards • Dublin Core; http://dublincore.org/ • MARC 21; http://lcweb.loc.gov/marc/ • Encoded Archival Description (EAD); http://lcweb.loc.gov/ead/
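
To make the idea of a metadata standard concrete, here is a minimal sketch of a Dublin Core description written with Python's standard library. The element names (title, type, format, language) and the namespace URI come from the Dublin Core element set; the `record` wrapper element, the `make_dc_record` helper, and the sample values are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace


def make_dc_record(fields: dict[str, str]) -> str:
    """Serialize a flat dict of Dublin Core elements to an XML string."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # hypothetical wrapper, not part of Dublin Core
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Sample values are invented for the example.
record_xml = make_dc_record({
    "title": "Introduction to Digital Libraries",
    "type": "Text",
    "format": "application/pdf",
    "language": "en",
})
print(record_xml)
```

A real repository would normally wrap such elements in whatever container its exchange protocol prescribes; the flat dict is only the simplest possible starting point.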

  22. Digital Libraries • Repositories • “any computer system whose primary function is to store digital material for use in a library” • Archives • repositories that make longevity promises

  23. Technology

  24. Digital libraries must • Store a wide variety of often complex information objects and display these objects on different platforms. This requires modeling information objects, their internal structure, and relationships among them. • Provide data that support discovery, interpretation, use, and management of information objects. This requires a good metadata model. • Support annotation of information objects. Annotations turn out to be surprisingly diverse. An annotation may refer to only a part of an information object. This requires an elegant model that can deal with many cases.

  25. Key Terms • digital objects (DOs) • a unit of exchange for the DL with a particular data structure and characteristics • repository • the place where DOs live • handles • a unique, persistent name for a DO

  26. Repositories

  27. Digital library objects • objects = metadata + data
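
The key terms above (digital object, repository, handle) and the equation "objects = metadata + data" can be sketched as a small data structure. This is an illustrative in-memory model, not the design of any particular repository system; the class names, the `deposit` and `get` methods, and the handle `example/0001` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    handle: str                      # unique, persistent name for the object
    data: bytes                      # the content itself
    metadata: dict[str, str] = field(default_factory=dict)  # description of the content


class Repository:
    """In-memory stand-in for 'the place where DOs live'."""

    def __init__(self) -> None:
        self._objects: dict[str, DigitalObject] = {}

    def deposit(self, obj: DigitalObject) -> None:
        # A handle must stay unique, so refuse to overwrite an existing one.
        if obj.handle in self._objects:
            raise ValueError(f"handle already in use: {obj.handle}")
        self._objects[obj.handle] = obj

    def get(self, handle: str) -> DigitalObject:
        return self._objects[handle]


repo = Repository()
repo.deposit(DigitalObject(
    handle="example/0001",  # hypothetical handle
    data=b"%PDF-1.4 ...",
    metadata={"title": "Sample technical report", "format": "application/pdf"},
))
report = repo.get("example/0001")
print(report.metadata["title"])
```

Keeping metadata and data in one object is only one possible answer to the question of whether metadata is "integrated" with data; a design that stores metadata as separate first-class objects is equally plausible.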

  28. Digital Library • Library Users • Digital Library Services • Digital Library Service Providers • Digital Objects out of Archives • Archive 1 • Archive 2 • Archive N • Publishers • Digital Objects in Archives

  29. Decision to build a digital repository • Building the repository will cost a lot. • Maintaining it is OK if you have somebody on staff with minimal system administration skills and you can pay for external hosting and local backup. • Comparing the repository to a new physical collection is not helpful.

  30. Repository purpose questions • What type of resources will it contain? • How big is it supposed to grow? • Who is going to use it and how? • How can resources be protected against modification? • How will access and IP rights be managed? • What systems will it need to interact with? • What resources will be available to create and maintain it?

  31. Names and identifiers • names != addresses • in any DL architecture diagram, (almost) anything that can be drawn can be named

  32. identification planning • This is an important part of building an archive. • Anything that is considered a resource has to be given an identifier. • Identifiers can be dumb or intelligent. • Identification may be hierarchical, and it can then be delegated.

  33. dumb identifiers • Dumb identifiers contain no information about the item that they identify. • For example, a number can be used. • Advantages • easy to create • Problem • not easy to relate to resource

  34. Intelligent identifiers • They say something about the resource. • Usually, any hierarchical identification structure has some intelligence built into it. • But there is a temptation to change the handle when the information that the handle is built on changes.
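
The contrast between dumb and intelligent identifiers can be shown in a few lines. This is a sketch under stated assumptions: the zero-padded counter and the `collection/year/sequence` layout are invented for illustration, as are the function names.

```python
import itertools

_counter = itertools.count(1)


def dumb_identifier() -> str:
    """Opaque sequence number: easy to create, says nothing about the item."""
    return f"{next(_counter):06d}"


def intelligent_identifier(collection: str, year: int, seq: int) -> str:
    """Hierarchical identifier that encodes facts about the item."""
    return f"{collection}/{year}/{seq:04d}"


first = dumb_identifier()
second = dumb_identifier()
report_id = intelligent_identifier("techreports", 1997, 42)  # hypothetical values

print(first, second, report_id)
```

If the report were later moved to another collection, `report_id` would no longer describe it accurately. That is the temptation to rename that the slide warns about, while the dumb identifiers stay valid because they never claimed anything.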

  35. URLs • URLs are tightly coupled with the physical location of an object, and are thus more likely to be transient • Tricks to make URLs more durable: • plan ahead when constructing web site structure • use good DNS CNAMEs • symbolic links on filesystems • http server redirects

  36. URNs • But with all the tricks available, URLs are not suitable for archival use in DLs • how long will the URL http://techreports.larc.nasa.gov/ltrs/PDF/1997/tm/NASA-97-tm112871.pdf be good? • how to handle mirroring, replication, etc.? • mnemonic: • URL = IP address (128.82.5.173) • URN = IP name (blearg.cs.odu.edu)

  37. Handles • Handles can be thought of as a Uniform Resource Name (URN) implementation • http://www.handle.net/ contains info about the handle system • persistence • location independence • multiple instances
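
The three properties listed for handles (persistence, location independence, multiple instances) can be illustrated with a toy resolver. This is not the Handle System's actual protocol or API; the `HandleResolver` class, its methods, the handle `example/tm112871`, and the example.org URLs are all hypothetical.

```python
class HandleResolver:
    """Toy resolver: a persistent name maps to one or more current locations."""

    def __init__(self) -> None:
        self._locations: dict[str, list[str]] = {}

    def register(self, handle: str, url: str) -> None:
        # Multiple instances: one handle may point at several copies.
        self._locations.setdefault(handle, []).append(url)

    def relocate(self, handle: str, old_url: str, new_url: str) -> None:
        # Location independence: the URL changes, the handle does not.
        urls = self._locations[handle]
        urls[urls.index(old_url)] = new_url

    def resolve(self, handle: str) -> list[str]:
        return list(self._locations[handle])


resolver = HandleResolver()
resolver.register("example/tm112871", "http://server-a.example.org/tm112871.pdf")
resolver.register("example/tm112871", "http://mirror.example.org/tm112871.pdf")
resolver.relocate(
    "example/tm112871",
    "http://server-a.example.org/tm112871.pdf",
    "http://server-b.example.org/reports/tm112871.pdf",
)
print(resolver.resolve("example/tm112871"))
```

Anyone who cited `example/tm112871` before the move still reaches the document afterwards, which is the persistence that a bare URL cannot promise.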

  38. DL Metadata Issues • Who provides metadata? • author? “publisher”? professional cataloger? extracted from content? • Is metadata “integrated” with data? • related question: is metadata a first class object? • Formats! • which ones? • Extensible?

  39. Services

  40. Digital Library • Digital Library Services • User • Functionality & Interface • Searching • Browsing • Archive • Managed sets of objects

  41. Introduction • Digital Library Scene • Search Engines • Heterogeneous • Vertical Information Retrieval • Unique User Interface • Search engines are different • Protocols are different • Querying & Ranking • Incompatible across the sources

  42. Search • “A repository must be structured and organized that users can readily find and use diverse types of resources.” • Users don’t search local repositories. They come in through search engines or aggregators (which are also found through search engines). Optimizing repositories for local findability is plain wrong.

  43. searching • You usually have resources and their descriptions. • You need to extract the searchable fields from the descriptions and load them into the database so that they can be queried. • Example: find pictures shot between 2011-04 and 2011-05.
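
The picture example can be sketched with SQLite from Python's standard library. The sample descriptions, the table layout, and the reading of "between 2011-04 and 2011-05" as covering both whole months are assumptions made for illustration.

```python
import sqlite3

# Hypothetical descriptions; "shot" is the field extracted for searching.
descriptions = [
    {"id": "pic-001", "title": "Harbor at dawn", "shot": "2011-03-28"},
    {"id": "pic-002", "title": "Library reading room", "shot": "2011-04-15"},
    {"id": "pic-003", "title": "Street market", "shot": "2011-05-31"},
    {"id": "pic-004", "title": "Summer festival", "shot": "2011-06-02"},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pictures (id TEXT PRIMARY KEY, title TEXT, shot TEXT)")
conn.executemany("INSERT INTO pictures VALUES (:id, :title, :shot)", descriptions)
conn.execute("CREATE INDEX idx_shot ON pictures (shot)")

# ISO 8601 dates sort correctly as plain strings, so a range test is enough.
rows = conn.execute(
    "SELECT id FROM pictures WHERE shot >= ? AND shot < ? ORDER BY shot",
    ("2011-04-01", "2011-06-01"),
).fetchall()
found = [row[0] for row in rows]
print(found)
```

The query only works because the date was stored in a consistent, sortable form when it was extracted; a free-text value such as "spring 2011" would have to be normalized first.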

  44. browsing • Here the data has to be discrete. • Many times the same entity is referred to by different values, e.g. “Thomas Krichel” vs “Томас Крихель”, “The Magic Flute” vs “Die Zauberflöte”. • If you want to have browsing by author, composer, work, etc., you have to, most likely manually, bring variant forms together.
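
Bringing variant forms together usually means maintaining a mapping from each variant to one preferred form and building the browse index on the preferred form. A minimal sketch follows; the choice of preferred forms, the record ids, and the `build_browse_index` helper are assumptions made for illustration.

```python
from collections import defaultdict

# Hand-maintained authority file: variant form -> preferred form.
AUTHORITY = {
    "Thomas Krichel": "Thomas Krichel",
    "Томас Крихель": "Thomas Krichel",
    "The Magic Flute": "Die Zauberflöte",
    "Die Zauberflöte": "Die Zauberflöte",
}


def build_browse_index(records: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group record ids under the preferred form of their heading."""
    index: dict[str, list[str]] = defaultdict(list)
    for record_id, heading in records:
        preferred = AUTHORITY.get(heading, heading)  # unknown headings pass through
        index[preferred].append(record_id)
    return dict(index)


browse = build_browse_index([
    ("rec-1", "Thomas Krichel"),
    ("rec-2", "Томас Крихель"),
    ("rec-3", "The Magic Flute"),
    ("rec-4", "Die Zauberflöte"),
])
print(browse)
```

The code is trivial; the manual work the slide refers to is building and maintaining the mapping itself.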

  45. Portal “A portal <is> a single point of access to distributed systems that provides services to support user needs to search, browse, and contribute content, often linking to shared existing functionality at other sites.”

  46. Taming the Web: RSS • RSS is a standard XML format for delivering content that changes on a regular basis • Content is delivered in small chunks, generally a synopsis, preview, or headline
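
To show what "small chunks" of content look like in practice, here is a minimal sketch that reads headlines from an RSS 2.0 document with Python's standard library. The feed text, its example.org URLs, and the `headlines` helper are invented for illustration; only the `rss`, `channel`, `item`, `title`, `link`, and `description` element names come from the RSS format.

```python
import xml.etree.ElementTree as ET

# A hypothetical feed, inlined so the example runs without network access.
FEED = """<rss version="2.0">
  <channel>
    <title>Example Library News</title>
    <link>http://library.example.org/</link>
    <description>Updates from a hypothetical library</description>
    <item>
      <title>New theses added</title>
      <link>http://library.example.org/news/1</link>
      <description>Forty new theses are now available.</description>
    </item>
    <item>
      <title>Reading room hours</title>
      <link>http://library.example.org/news/2</link>
      <description>The reading room opens earlier from Monday.</description>
    </item>
  </channel>
</rss>"""


def headlines(feed_xml: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for every item in the feed's channel."""
    channel = ET.fromstring(feed_xml).find("channel")
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in channel.findall("item")
    ]


for title, link in headlines(FEED):
    print(title, link)
```

An aggregator does essentially this on a schedule for many feeds, then shows only the items it has not seen before.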

  47. Using RSS • Look for small, orange icons (RSS or XML) • How it works …

  48. RSS • Instead of always checking your favorite sites one at a time • Lets you know when your favorite websites have been updated through “feeds” • Through your e-mail • Through the web • Through “aggregators” like Google Reader
