
New approaches to the catalog

New approaches to the catalog. T. Hickey http://errol.oclc.org/laf/n82-54463.html Svensk Biblioteksförening 2005 October 28. OCLC. Founded 1967 Nonprofit membership organization > 53,000 libraries 96 countries ~1,000 employees Cataloging Interlibrary Loan Preservation


Presentation Transcript


  1. New approaches to the catalog T. Hickey http://errol.oclc.org/laf/n82-54463.html Svensk Biblioteksförening 2005 October 28

  2. OCLC • Founded 1967 • Nonprofit membership organization • > 53,000 libraries • 96 countries • ~1,000 employees • Cataloging • Interlibrary Loan • Preservation • Dewey Decimal Classification • netLibrary • FirstSearch

  3. OCLC Research • Research for both • OCLC services • Membership • Metadata management • Knowledge organization • Content management • Interoperability • Systems & interaction design • ~30 employees

  4. What do users want? • The right information • with minimum effort

  5. How to give them what they want • Catch them where they are • Increase our data • Improve our data • Make the data work harder • Interconnect with other systems • Do all this efficiently

  6. What has changed • Computers and telecommunications • User expectations • Digital materials • Remoteness of our users • Huge amounts of bandwidth, storage

  7. The competition • Online booksellers • Reviews • Tables of contents • Excerpts • Inside-the-book searching • Web search engines • Speed • Full-text searching • Global coverage (of web resources) • Good enough • Ourselves • Electronic journals

  8. Current projects (my group) • Live search • Registries, PURLs • Dewey browser • Harvesting, electronic theses • VIAF, LAF • SRU/W, OpenURLs, OAI • FRBR, xISBN • Beowulf cluster • Map-reduce • Text searching • Batch loading • Open WorldCat • WorldCat Wiki • Publisher Names • MXG

  9. Other Research Projects • FictionFinder, Curiouser • Schema Transformation • Terminology Services • Digital Preservation • Collection Analysis • Dublin Core • FAST • User Studies • Data mining • Also: http://www.oclc.org/research/researchworks/

  10. Catch them where they are • Google, Yahoo, etc. • Open WorldCat • OpenURL • OAI-PMH • Creation too • WCat Wiki • Tags?

  11. Open WorldCat

  12. Editions

  13. OpenURL • OpenURL registry • Supports version 1.0 • Also registry of OpenURL servers • Used for WikiD

  14. WorldCat ‘Wiki’ • Opening up WorldCat to user annotations • Reviews • Notes • Tables of contents • Cover art? • Book lists? • Based on WikiD software • Full Wiki • Many features off for WorldCat • Uses OpenURL 1.0 protocol internally • Allows collections of pages of arbitrary XML schemas • Tools for the creation of simple collections • Doesn’t look like a Wiki

  15. Reviews

  16. Tags? • Folksonomies? • User-generated keywords • We’ve been here before • Is it different? • Is there another direction?

  17. Opening Dewey

  18. More data • Harvesting • OAI-PMH • ETDs • Batch load • 60 million records • 3 million new manifestations • Other • Cover art • Reviews • WC

  19. Better data and organization • VIAF • FRBR • Authority files in general • LAF • Publisher names • Genre • FAST • Registries • PURLs • Generalized solution? Get them nearer to creation

  20. FRBR • Work-set algorithm • Keys based on author/title • Authority files • Auxiliary authority files • xISBN • Used for • xISBN • Open WorldCat • FirstSearch (coming) • Collection analysis (coming) • Research
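
The slide says the work-set algorithm builds keys from author and title. A minimal Python sketch of that idea follows; it is an illustration under stated assumptions, not OCLC's actual algorithm. The normalization rules, the record fields, the sample records, and the placeholder ISBNs are all hypothetical, and the authority-file lookups mentioned on the slide are left out.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def work_key(record):
    """Build a hypothetical author/title key for one bibliographic record."""
    return normalize(record["author"]) + "/" + normalize(record["title"])


def group_into_work_sets(records):
    """Group records that share the same author/title key."""
    work_sets = defaultdict(list)
    for record in records:
        work_sets[work_key(record)].append(record)
    return dict(work_sets)


def related_isbns(isbn, records):
    """xISBN-style lookup: other ISBNs in the same work set as `isbn`."""
    for members in group_into_work_sets(records).values():
        isbns = [member["isbn"] for member in members]
        if isbn in isbns:
            return [other for other in isbns if other != isbn]
    return []


# Placeholder records; the ISBN values are made up for illustration.
records = [
    {"author": "Lindgren, Astrid", "title": "Pippi Långstrump", "isbn": "ISBN-A"},
    {"author": "LINDGREN, ASTRID.", "title": "Pippi Langstrump", "isbn": "ISBN-B"},
    {"author": "Lagerlöf, Selma", "title": "Gösta Berlings saga", "isbn": "ISBN-C"},
]

work_sets = group_into_work_sets(records)
print(len(work_sets), related_isbns("ISBN-A", records))
```

Matching on a normalized key alone will both over-merge and under-merge; the slide's mention of authority files suggests controlled name forms are used to reduce that.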

  21. Authority Files • LAF • http://errol.oclc.org/laf/n82-54463.html • Publisher names • Not normally controlled • Looking for variations with ISBN prefixes • Also worked with dissertations

  22. VIAF • Merge national-level files • Library of Congress (NACO) and Die Deutsche Bibliothek • Bibliographic records analyzed • 15% would be erroneous based just on names • Basic matching now completed • 435,000 matching names • < 1% mismatched • Working on • Public interface • OAI harvesting • Persistent identifiers

  23. Maj

  24. Registries • Show relationships between metadata • Often associated with an identifier • General solution? • Examples • Authority files • WorldCat • PURLs

  25. PURLs • Persistent URLs • Map one URL to another • http://purl.org/hickey/outgoing -> • http://outgoing.typepad.com/ • 500,000+ PURLs • 111 million resolutions • Port to WikiD platform? • http://www.oclc.org/research/projects/wikid/ • String of PURL servers? • Use OAI-PMH for synchronization • Spread responsibility • Generalized solution?

  26. More connectivity • OpenURL • RSS feeds • OpenSearch, SRU/W • OAI-PMH
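
"Map one URL to another" can be sketched as a lookup table plus an HTTP redirect. The Python below is a hedged illustration only: the table layout and the `resolve` and `retarget` helpers are hypothetical, and the use of status 302 is an assumption about how a resolver would typically answer, not a description of the purl.org software.

```python
# Hypothetical in-memory PURL table: persistent path -> current target URL.
purl_table = {
    "/hickey/outgoing": "http://outgoing.typepad.com/",
}


def resolve(purl_path, table):
    """Return (status, location) for a PURL path.

    A known path answers with a redirect to its current target;
    an unknown path answers 404 with no location.
    """
    target = table.get(purl_path)
    if target is None:
        return 404, None
    return 302, target


def retarget(purl_path, new_target, table):
    """Point an existing PURL at a new location; the PURL itself is unchanged."""
    if purl_path not in table:
        raise KeyError(purl_path)
    table[purl_path] = new_target


print(resolve("/hickey/outgoing", purl_table))
```

The point of the indirection is that citations keep using the persistent URL while `retarget` absorbs any move of the underlying resource.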

  27. OpenURL • Developed to address the ‘appropriate copy’ problem • Transitioning to OpenURL 1.0 • OpenURL resolver • Accepts requests specifying • Resource • Services • Generalized syntax • Specifying a resource • Services to be performed • Metadata elements specified in registry • http://purl.org/openurl/
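
A request that specifies a resource can be sketched as an OpenURL 1.0 query string in key/encoded-value (KEV) form. In the Python sketch below, the resolver address `http://resolver.example.org/openurl`, the `build_openurl` helper, and the sample ISBN and title are hypothetical; the key names follow the Z39.88-2004 KEV book format as best recalled here and should be checked against the registry before use. The service part of the request is omitted.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_openurl(resolver_base, isbn, title):
    """Build a KEV-style OpenURL 1.0 request describing a book."""
    params = [
        ("url_ver", "Z39.88-2004"),                    # OpenURL version
        ("ctx_ver", "Z39.88-2004"),                    # context object version
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:book"),  # referent metadata format
        ("rft.isbn", isbn),                            # referent: ISBN
        ("rft.btitle", title),                         # referent: book title
    ]
    return resolver_base + "?" + urlencode(params)


# Hypothetical resolver and placeholder values.
url = build_openurl("http://resolver.example.org/openurl", "0000000000", "An Example Book")
fields = parse_qs(urlsplit(url).query)
print(url)
```

Because the request carries a description of the item rather than a fixed link, each library's resolver can decide which copy or service is appropriate for its own users.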

  28. SRU • Simplified version of Z39.50 • Web based • SRW – SOAP • SRU – URL • Even simpler? • OpenSearch • No search syntax • Looking for common ground • MXG • Metasearch XML Gateway • Simplifies metasearchers’ lives
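
Since SRU carries the search in a plain URL, a request can be sketched with nothing more than query-string encoding. The Python below assumes the SRU 1.1 `searchRetrieve` parameter names; the server address `http://sru.example.org/catalog`, the `sru_search_url` helper, and the CQL query are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def sru_search_url(base_url, cql_query, start=1, maximum=10):
    """Build an SRU searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,         # CQL, the query language SRU uses
        "startRecord": start,       # 1-based position of the first record wanted
        "maximumRecords": maximum,  # page size
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and query.
url = sru_search_url("http://sru.example.org/catalog", 'dc.title = "catalog"', maximum=5)
fields = parse_qs(urlsplit(url).query)
print(url)
```

The response would be XML, which is what makes the protocol easy to layer under browser-side stylesheets or a metasearch gateway.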

  29. OAI-PMH • Method of harvesting metadata • More generally, a way of synchronizing databases • No real restriction to metadata • Becomes a repository protocol • Identifiers • Timestamps • Layered implementation • OAI • SRU • Pears
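
The view of OAI-PMH as a synchronization method rests on the two items the slide lists: identifiers and timestamps. The Python sketch below shows how a harvester might form `ListRecords` requests and fold the results into a local copy. The repository address, the helper names, and the tuple layout of harvested records are assumptions; fetching and parsing the XML responses is left out.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None,
                     resumption_token=None):
    """Build a ListRecords request, either initial/incremental or a continuation."""
    if resumption_token is not None:
        # A resumptionToken is sent alone with the verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date is not None:
            params["from"] = from_date  # harvest only records changed since then
    return base_url + "?" + urlencode(params)


def apply_harvest(local_copy, harvested):
    """Merge (identifier, datestamp, payload) tuples, keeping the newest version.

    Datestamps are assumed to be UTC ISO 8601 strings of equal granularity,
    so plain string comparison orders them correctly.
    """
    for identifier, datestamp, payload in harvested:
        current = local_copy.get(identifier)
        if current is None or datestamp > current[0]:
            local_copy[identifier] = (datestamp, payload)
    return local_copy


# Hypothetical repository and records.
url = list_records_url("http://repo.example.org/oai", from_date="2005-10-01")
local = {"oai:example:1": ("2005-09-01", "old record")}
apply_harvest(local, [
    ("oai:example:1", "2005-10-15", "updated record"),
    ("oai:example:2", "2005-10-20", "new record"),
])
print(url, sorted(local))
```

Nothing in `apply_harvest` depends on the payload being metadata, which is the slide's point about the protocol working as a general repository protocol.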

  30. Efficient processing • Beowulf cluster • Map reduce • Text searching

  31. Beowulf Cluster • 24 nodes • 2 processors, 4 gigabytes of RAM, 120 gigabytes disk • Gigabit network • Use it for • FRBR processing • Text indexing • Text searching • ~30-fold speedup on many tasks • 1 year ⇒ 2 weeks • 1 week ⇒ 1 day • 1 day ⇒ 1 hour • 1 hour ⇒ 2 minutes • Extremely cheap processing
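
The before-and-after pairs on this slide read as rounded rules of thumb. A short Python calculation makes the implied ratios explicit and compares the quoted 30-fold figure with the processor count given on the slide (24 nodes with 2 processors each). The 365-day year and the variable names are assumptions of this sketch.

```python
# Durations expressed in minutes.
MINUTE = 1
HOUR = 60 * MINUTE
DAY = 24 * HOUR
WEEK = 7 * DAY
YEAR = 365 * DAY  # assumption: a 365-day year

# (label before, minutes before, label after, minutes after)
examples = [
    ("1 year", YEAR, "2 weeks", 2 * WEEK),
    ("1 week", WEEK, "1 day", DAY),
    ("1 day", DAY, "1 hour", HOUR),
    ("1 hour", HOUR, "2 minutes", 2 * MINUTE),
]


def speedup(before, after):
    """Ratio of the original running time to the running time on the cluster."""
    return before / after


ratios = {
    f"{label_before} -> {label_after}": round(speedup(before, after), 1)
    for label_before, before, label_after, after in examples
}

PROCESSORS = 24 * 2            # 24 nodes, 2 processors per node
efficiency = 30 / PROCESSORS   # share of the ideal linear speedup

for label, ratio in ratios.items():
    print(label, ratio)
print("parallel efficiency at 30-fold:", efficiency)
```

The implied ratios run from 7 to 30, so the individual pairs are best read as order-of-magnitude illustrations; a 30-fold gain on 48 processors corresponds to 62.5% of ideal linear scaling.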

  32. Map reduce • Pioneered by Google • Petabytes of data on thousands of nodes • Adapted to our cluster • Tens of gigabytes of data on dozens of nodes • Simple functional programming paradigm • Allows batch processing across cluster
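
The "simple functional programming paradigm" can be sketched in a few lines of Python: a mapper emits key/value pairs, the pairs are grouped by key, and a reducer folds each group. This single-process version is an illustration only; the partition layout, the record fields, and the function names are hypothetical, and on a cluster each partition would be mapped on a different node.

```python
from collections import defaultdict


def run_map_reduce(partitions, mapper, reducer):
    """Run mapper over every record, group values by key, then reduce each group."""
    grouped = defaultdict(list)
    for partition in partitions:          # on a cluster: one partition per node
        for record in partition:
            for key, value in mapper(record):
                grouped[key].append(value)  # the "shuffle" step
    return {key: reducer(key, values) for key, values in grouped.items()}


def language_mapper(record):
    """Emit (language code, 1) for each bibliographic record."""
    yield record["language"], 1


def count_reducer(key, values):
    """Sum the counts emitted for one language code."""
    return sum(values)


# Hypothetical records split across two partitions.
partitions = [
    [{"language": "swe"}, {"language": "eng"}],
    [{"language": "swe"}, {"language": "ger"}, {"language": "swe"}],
]

counts = run_map_reduce(partitions, language_mapper, count_reducer)
print(counts)
```

Because the mapper and reducer are pure functions of their inputs, the same job definition can be rerun on any number of partitions, which is what makes the model convenient for batch processing.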

  33. Text Searching • Spread database across cluster • Two levels of aggregation • 3 servers/node • 24-way aggregation • Aggregators run across cluster • SRU used • HTTP based • SRW (SOAP) slowed it down • Open source software
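
The two levels of aggregation can be sketched as two rounds of merging ranked hit lists: first across the servers on one node, then across nodes. The Python below is a hedged illustration; the `(score, record_id)` hit format and the `merge_top` helper are assumptions, and the example uses 2 nodes of 3 servers rather than the 24 nodes on the slide. In the described system each list would arrive as an SRU response over HTTP.

```python
import heapq
from itertools import islice


def merge_top(hit_lists, limit):
    """Merge hit lists that are each sorted by descending score; keep the best `limit`."""
    merged = heapq.merge(*hit_lists, key=lambda hit: -hit[0])
    return list(islice(merged, limit))


def search_cluster(cluster, limit):
    """Aggregate per node first, then across nodes."""
    node_results = [merge_top(server_lists, limit) for server_lists in cluster]
    return merge_top(node_results, limit)


# Hypothetical results: cluster -> nodes -> servers -> hits as (score, record_id).
cluster = [
    [  # node 1, three servers
        [(0.91, "a"), (0.40, "b")],
        [(0.75, "c")],
        [(0.60, "d"), (0.10, "e")],
    ],
    [  # node 2, three servers
        [(0.88, "f")],
        [(0.70, "g"), (0.20, "h")],
        [],
    ],
]

top = search_cluster(cluster, limit=4)
print(top)
```

Truncating to `limit` at the node level is safe because no hit outside a node's top `limit` can appear in the overall top `limit`.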

  34. Better interfaces • More interactive • Live search • Dewey Browser • Better connected

  35. Post-coordination of Services • Systems that expose low level services • Higher level coordination of those services • Loosely coupled services • Examples from OCLC • Validation service • RSS feeds • SRU • OpenURL, OAI-PMH • xISBN • DDC Browser built this way • Very different interfaces have been built

  36. DDC Browser XML
      <?xml version="1.0" encoding="utf-8"?>
      <?xml-stylesheet type="text/xsl" href="/ddcbrowser/xsl/wcat.xsl" ?>
      <cells>
        <language>swe</language>
        <cell ddc="330" count="23" />
        <cell ddc="331" count="28" />
        <cell ddc="332" count="5" />
        <cell ddc="333" count="7" />
        <cell ddc="334" count="2" />
        <cell ddc="335" count="1" />
        <cell ddc="336" count="3" />
        <cell ddc="337" count="2" />
        <cell ddc="338" count="26" />
        <cell ddc="339" count="5" />
      </cells>
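
A response like the one on this slide is easy for another client to consume, which is the point of exposing low-level services. The Python sketch below parses the slide's sample with the standard library; the `parse_cells` helper and the choice to return a dictionary are assumptions, and the browser itself presumably relies on the referenced XSL stylesheet instead.

```python
import xml.etree.ElementTree as ET

# The sample response from the slide, as bytes so the encoding declaration applies.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="/ddcbrowser/xsl/wcat.xsl" ?>
<cells>
  <language>swe</language>
  <cell ddc="330" count="23" />
  <cell ddc="331" count="28" />
  <cell ddc="332" count="5" />
  <cell ddc="333" count="7" />
  <cell ddc="334" count="2" />
  <cell ddc="335" count="1" />
  <cell ddc="336" count="3" />
  <cell ddc="337" count="2" />
  <cell ddc="338" count="26" />
  <cell ddc="339" count="5" />
</cells>"""


def parse_cells(xml_bytes):
    """Return (language code, {DDC number: record count}) from a cells document."""
    root = ET.fromstring(xml_bytes)
    language = root.findtext("language")
    counts = {cell.get("ddc"): int(cell.get("count")) for cell in root.iter("cell")}
    return language, counts


language, counts = parse_cells(SAMPLE)
busiest = max(counts, key=counts.get)
print(language, busiest, sum(counts.values()))
```

Here the sample describes Swedish-language records in the DDC 330s, with class 331 holding the most records.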

  37. Do We Need It? • Just have Google harvest everything • Our experience with Google • Fielded searching • Reliable searching • Possibility of user-supplied metadata • Cost of good metadata • Cost of non-existent metadata

  38. Conclusions • Shift to remote users • Online availability – trend towards centralization • More flexibility in implementations • Patrons are better served • Less emphasis on physical collections

  39. Thank you T. Hickey http://errol.oclc.org/laf/n82-54463.html Swedish Library Association 2005 October 28
