
CS 502: Computing Methods for Digital Libraries



  1. Lecture 13 Descriptive Metadata I: cataloguing, classification, authority files CS 502: Computing Methods for Digital Libraries

  2. Administration • Open laptop examination • Read the Course Notices for instructions • Remember, electronic communication is cheating! • Extension of wireless network • Uris Library and Olin Library (1st floor and basement) • Schedule changes • See Course Notices for next two lectures • Change of dates for future assignments

  3. Text Retrieval Conferences (TREC) • Quantitative research in digital libraries. • Compare performance of techniques, e.g., automatic thesauri, sophisticated term weighting, natural language techniques, relevance feedback, and advanced machine learning. • Corpus of several million textual documents -- 5 Gbytes. • Standard set of tasks, e.g., • Search the corpus for topics provided • Match a stream of documents against standard queries • Participants include large commercial companies, small information retrieval vendors, and university research groups.

  4. Descriptive metadata Some methods of information discovery search descriptive metadata about the objects. Metadata typically consists of one record for each object: a catalog record, an indexing record, or an abstract. • Catalog: metadata records that have a consistent structure, organized according to systematic rules. • Abstract: a free-text record that summarizes a longer document. • Indexing record: less formal than a catalog record, but more structured than a simple abstract.

  5. Descriptive metadata • Usually stored separately from the objects that it describes, but sometimes embedded in the objects. • Usually the metadata is a set of text fields. • Textual metadata can be used to describe non-textual objects, e.g., software, images, music.

  6. Library Cataloguing • Anglo-American Cataloguing Rules (AACR2): rules for what goes into each field of a catalog record • MARC format: an exchange format for catalog records • "MARC catalog": a catalog in MARC format, where the content of each field follows AACR2

  7. Example: Monograph catalog record • Citation • Caroline R. Arms, editor, Campus strategies for libraries and electronic information. Bedford, MA: Digital Press, 1990.

  8. MARC fields (tag, value) • 001 89-16879 r93 • 050 Z675.U5C16 1990 • 082 027.7/0973 20 • 245 Campus strategies for libraries and electronic information/Caroline Arms, editor. (title statement) • 260 {Bedford, Mass.} : Digital Press, c1990. (publisher) • 300 xi, 404 p. : ill. ; 24 cm. (collation) • 440 EDUCOM strategies series on information technology (series title) • 504 Includes bibliographical references (p. {373}-381). • 020 ISBN 1-55558-036-X : $34.95

  9. MARC fields (continued) • 650 Academic libraries--United States--Automation. (subject heading) • 650 Libraries and electronic publishing--United States. • 650 Library information networks--United States. • 650 Information technology--United States. • 700 Arms, Caroline R. (Caroline Ruth) • 040 DLC DLC DLC • 043 n-us--- • 955 CIP ver. br02 to SL 02-26-90 • 985 APIF/MIG
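Slides 8 and 9 show that a MARC record is a sequence of tagged fields in which a tag may repeat (four 650 subject headings here). The minimal sketch below holds the record as an ordered list of (tag, value) pairs; that in-memory layout, the name `RECORD`, and the helper `get_fields` are illustrative assumptions, not part of MARC. It also shows why a plain dictionary keyed by tag is the wrong structure: it silently keeps only the last repeated field.

```python
# Fields transcribed from slides 8 and 9 as (tag, value) pairs, in slide order.
# This list-of-pairs layout is an assumption for illustration only.
RECORD = [
    ("001", "89-16879 r93"),
    ("050", "Z675.U5C16 1990"),
    ("082", "027.7/0973 20"),
    ("245", "Campus strategies for libraries and electronic "
            "information/Caroline Arms, editor."),
    ("260", "{Bedford, Mass.} : Digital Press, c1990."),
    ("650", "Academic libraries--United States--Automation."),
    ("650", "Libraries and electronic publishing--United States."),
    ("650", "Library information networks--United States."),
    ("650", "Information technology--United States."),
    ("700", "Arms, Caroline R. (Caroline Ruth)"),
]


def get_fields(record, tag):
    """Return every value stored under `tag`, keeping repeats in order."""
    return [value for field_tag, value in record if field_tag == tag]


subjects = get_fields(RECORD, "650")
print(len(subjects))      # 4

# A dict keyed by tag keeps only the last value for each repeated tag.
flattened = dict(RECORD)
print(flattened["650"])   # Information technology--United States.
```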

  10. MARC Encoding • tag: 260 • subfield a: {Bedford, Mass.} : • subfield b: Digital Press, • subfield c: c1990. • MARC encoding: &2600#abc#{Bedford, Mass.} :#Digital Press,#c1990.%
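The encoding on slide 10 packs a tag, its subfield codes, and the subfield values into one delimited string. The sketch below encodes and decodes that string, under loudly stated assumptions: it follows the slide's illustrative notation only (`&` starts a field, `#` separates parts, `%` ends the field), and it treats the single character after the tag as an indicator, which the slide does not explain. `encode_field` and `decode_field` are hypothetical helper names.

```python
# Toy codec for the slide's illustrative notation (an assumption, not ISO 2709):
#   "&" + tag (3 chars) + one character assumed to be an indicator
#   + "#" + subfield codes run together + "#"-separated values + "%"
# Values may not contain "#", since it is the separator here.


def encode_field(tag, indicator, subfields):
    """Encode a list of (code, value) subfields in the slide's notation."""
    if any("#" in value for _, value in subfields):
        raise ValueError("'#' is reserved as a separator")
    codes = "".join(code for code, _ in subfields)
    values = "#".join(value for _, value in subfields)
    return f"&{tag}{indicator}#{codes}#{values}%"


def decode_field(encoded):
    """Invert encode_field: return (tag, indicator, [(code, value), ...])."""
    if not (encoded.startswith("&") and encoded.endswith("%")):
        raise ValueError("field must start with '&' and end with '%'")
    body = encoded[1:-1]
    tag, indicator, rest = body[:3], body[3], body[4:]
    # rest looks like "#abc#value-a#value-b#value-c"
    _, codes, *values = rest.split("#")
    if len(codes) != len(values):
        raise ValueError("number of subfield codes and values differ")
    return tag, indicator, list(zip(codes, values))


field_260 = [("a", "{Bedford, Mass.} :"), ("b", "Digital Press,"), ("c", "c1990.")]
encoded = encode_field("260", "0", field_260)
print(encoded)  # &2600#abc#{Bedford, Mass.} :#Digital Press,#c1990.%
```

Real MARC 21 exchange records (ISO 2709) work differently in detail: a leader and a directory of tags, lengths, and offsets precede the data, subfields are introduced by the delimiter character 0x1F, fields end with 0x1E, and the record ends with 0x1D.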

  11. Name authority files • Caroline R. Arms or Caroline Ruth Arms? • Which William Phillips of Cardiff? • Mark Twain or Samuel Clemens? • Epithets: • of Cardiff • doctor • Dates: • 1832 - 1876 • flourished 1860 • circa 1832 - 1876
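A name authority file maps every variant form of a name to one authorized heading, and it must keep genuinely different people apart using epithets and dates. The minimal sketch below indexes variants by a crude normalized key; a lookup that returns more than one heading is exactly the "Which William Phillips of Cardiff?" case, where a cataloguer must decide. The class names are hypothetical, the Arms heading is taken from slide 9, choosing "Twain, Mark" as authorized is purely illustrative, and the two William Phillips records are invented people built from the slide's epithet and date examples.

```python
from dataclasses import dataclass, field


def normalize(name):
    """Crude match key: lowercase, drop periods and commas, squeeze spaces."""
    return " ".join(name.lower().replace(".", " ").replace(",", " ").split())


@dataclass
class AuthorityRecord:
    heading: str                                        # authorized form
    variants: list[str] = field(default_factory=list)   # forms that refer to it


class NameAuthorityFile:
    """Hypothetical in-memory authority file."""

    def __init__(self, records):
        self._index = {}  # match key -> list of authorized headings
        for record in records:
            for form in [record.heading, *record.variants]:
                headings = self._index.setdefault(normalize(form), [])
                if record.heading not in headings:
                    headings.append(record.heading)

    def lookup(self, name):
        """Return every authorized heading that `name` could refer to."""
        return list(self._index.get(normalize(name), []))


authority = NameAuthorityFile([
    AuthorityRecord("Arms, Caroline R. (Caroline Ruth)",
                    ["Caroline R. Arms", "Caroline Ruth Arms", "Caroline Arms"]),
    # Illustrative choice of authorized form, not a cataloguing ruling.
    AuthorityRecord("Twain, Mark", ["Mark Twain", "Samuel Clemens", "Clemens, Samuel"]),
    # Two hypothetical people, distinguished only by epithet and dates.
    AuthorityRecord("Phillips, William, of Cardiff, 1832-1876", ["William Phillips"]),
    AuthorityRecord("Phillips, William, doctor, fl. 1860", ["William Phillips"]),
])

print(authority.lookup("Caroline Ruth Arms"))  # ['Arms, Caroline R. (Caroline Ruth)']
print(authority.lookup("Samuel Clemens"))      # ['Twain, Mark']
print(authority.lookup("William Phillips"))    # two candidates: a cataloguer must choose
```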

  12. Shared cataloguing • OCLC -- Large centralized transaction processing database system • When a library catalogs a book it deposits MARC record in OCLC • Other libraries can copy the record • saves duplication of cataloguing • build database of holdings • OCLC database has 42 million records
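The two benefits listed for shared cataloguing, avoiding duplicate cataloguing and building a database of holdings, can be modelled in a few lines. In the toy sketch below, the first library to catalog an item pays the cost of creating the record, and later libraries copy it and are simply added as holders. `SharedCatalog`, the library identifiers, and keying records by ISBN are all simplifying assumptions; this is not how OCLC's system is actually implemented.

```python
class SharedCatalog:
    """Toy model of shared (copy) cataloguing; all names are hypothetical."""

    def __init__(self):
        self.records = {}              # ISBN -> catalog record (tag -> value)
        self.holdings = {}             # ISBN -> set of library identifiers
        self.original_cataloguing = 0  # records created from scratch

    def catalog(self, library, isbn, create_record):
        """Copy an existing record if there is one; otherwise create it once."""
        if isbn not in self.records:
            self.records[isbn] = create_record()  # the expensive step
            self.original_cataloguing += 1
        self.holdings.setdefault(isbn, set()).add(library)
        return self.records[isbn]


def catalog_arms_book():
    # Stand-in for a cataloguer's work: a record with a single title field.
    return {"245": "Campus strategies for libraries and electronic information"}


shared = SharedCatalog()
isbn = "1-55558-036-X"
shared.catalog("LIBRARY-A", isbn, catalog_arms_book)  # deposits a new record
shared.catalog("LIBRARY-B", isbn, catalog_arms_book)  # copies the existing one
print(shared.original_cataloguing)    # 1
print(sorted(shared.holdings[isbn]))  # ['LIBRARY-A', 'LIBRARY-B']
```

Passing a function rather than a finished record means the cataloguing work only happens when no shared record exists, which is the saving the slide describes.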

  13. Subject information • Library of Congress Subject Headings: Academic libraries--United States--Automation • Hierarchical classification: • Library of Congress call number: Z675.U5C16 • Dewey Decimal Classification: 027.7 • Creation and maintenance of lists of subject headings and classifications is a never-ending task.
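Both kinds of subject information on this slide carry structure that a program can read. A Library of Congress subject heading is a main heading followed by "--" subdivisions, and a Dewey number expresses its hierarchy in its digits: main class, division, section, then each added decimal digit narrows the class. The sketch below extracts both. The function names are hypothetical, and reading broader classes straight from the digits is a simplification, since not every Dewey number follows its notational hierarchy exactly.

```python
def lcsh_parts(heading):
    """Split a subject heading into its main heading and '--' subdivisions."""
    main, *subdivisions = heading.split("--")
    return main, subdivisions


def dewey_hierarchy(number):
    """List the broader Dewey classes implied by a number such as 027.7."""
    # Drop the "/" segmentation mark seen in MARC 082 fields (e.g. 027.7/0973).
    integer, _, fraction = number.replace("/", "").partition(".")
    if len(integer) != 3 or not integer.isdigit():
        raise ValueError("expected a three-digit Dewey class")
    levels = [integer[0] + "00", integer[:2] + "0", integer]
    levels += [f"{integer}.{fraction[:i]}" for i in range(1, len(fraction) + 1)]
    # Remove repeats while keeping order, e.g. "020" is division and section.
    return list(dict.fromkeys(levels))


print(lcsh_parts("Academic libraries--United States--Automation"))
# ('Academic libraries', ['United States', 'Automation'])
print(dewey_hierarchy("027.7"))
# ['000', '020', '027', '027.7']
```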

  14. Online public access catalog (OPAC) • First stage • Library mounts its MARC records on a central computer • Provides a simple terminal interface and dedicated terminals • Boolean search -- fielded searching • [Most university libraries reached this stage about 1990] • Second stage • Library connects computer to a campus network and Internet • Converts card catalog records to MARC (retrospective conversion)
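"Boolean search, fielded searching" in a first-stage OPAC can be illustrated with an inverted index keyed by field and term, where AND, OR, and NOT become set intersection, union, and difference. The sketch below is a minimal illustration, not any particular OPAC's design; `FieldedIndex` and the simple tokenizer are assumptions, record 1 uses the title and subject from the slides, and record 2 is hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase alphanumeric words; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


class FieldedIndex:
    """Inverted index keyed by (field, term); Boolean logic via set operations."""

    def __init__(self):
        self.postings = defaultdict(set)  # (field, term) -> set of record ids

    def add(self, record_id, record):
        for field_name, value in record.items():
            for term in tokenize(value):
                self.postings[(field_name, term)].add(record_id)

    def search(self, field_name, word):
        # .get avoids inserting empty entries into the defaultdict
        return set(self.postings.get((field_name, word.lower()), set()))


index = FieldedIndex()
index.add(1, {"title": "Campus strategies for libraries and electronic information",
              "subject": "Academic libraries--United States--Automation"})
index.add(2, {"title": "Library information networks",  # hypothetical record
              "subject": "Library information networks--United States"})

# title:libraries AND subject:automation
print(index.search("title", "libraries") & index.search("subject", "automation"))  # {1}
# subject:networks OR subject:automation
print(index.search("subject", "networks") | index.search("subject", "automation"))  # {1, 2}
# subject:united AND NOT subject:automation
print(index.search("subject", "united") - index.search("subject", "automation"))  # {2}
```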

  15. Library information systems • When the catalog is online ... • Add other collections and services: • Secondary information (Inspec, Medline, Chemical Abstracts) • Reference works (dictionaries, encyclopedias) • Improve user interface • Add full text searching • Add web interface • Add connections to off-campus information sources: • Scientific journals • Databases (census, genome)

  16. Library management systems A library management system, sometimes called an integrated library system, integrates the internal processes of a library, e.g., acquisitions, cataloguing, binding, and circulation. It usually contains an online public access catalog, but does not provide integrated services to users. Library management systems are produced by small companies that lack the capital and technical expertise to develop modern digital libraries.

  17. Notes on MARC • A great achievement: • Developed in the 1960s • Magnetic tape exchange format for printing catalog records • At the dawn of computing, it supported: • mixed upper and lower case • variable-length fields • repeated fields • non-Roman scripts • 100(?) million records with standard content and format • Thousands of trained librarians (millions?)

  18. Notes on MARC • A great problem: • Not designed for computer algorithms • One record per item (poor links between records) • Tied to traditional materials and traditional practices • Not Unicode • 100 million records at $100 each = $10 billion • A classic legacy system!
