New approaches to the catalog

New approaches to the catalog T. Hickey http://errol.oclc.org/laf/n82-54463.html Svensk Biblioteksförening 2005 October 28

OCLC • Founded 1967 • Nonprofit membership organization • > 53,000 libraries • 96 countries • ~1,000 employees • Cataloging • Interlibrary Loan • Preservation • Dewey Decimal Classification • netLibrary • FirstSearch

OCLC Research • Research for both • OCLC services • Membership • Metadata management • Knowledge organization • Content management • Interoperability • Systems & interaction design • ~30 employees

What do users want? • The right information • with minimum effort

How to give them what they want • Catch them where they are • Increase our data • Improve our data • Make the data work harder • Interconnect with other systems • Do all this efficiently

What has changed • Computers and telecommunications • User expectations • Digital materials • Remoteness of our users • Huge amounts of bandwidth, storage

The competition • Online booksellers • Reviews • Tables of contents • Excerpts • Inside-the-book searching • Web search engines • Speed • Full-text searching • Global coverage (of web resources) • Good enough • Ourselves • Electronic journals

Current projects (my group) • Live search • Registries, PURLs • Dewey browser • Harvesting, electronic theses • VIAF, LAF • SRU/W, OpenURLs, OAI • FRBR, xISBN • Beowulf cluster • Map-reduce • Text searching • Batch loading • Open WorldCat • WorldCat Wiki • Publisher Names • MXG

Other Research Projects • FictionFinder, Curiouser • Schema Transformation • Terminology Services • Digital Preservation • Collection Analysis • Dublin Core • FAST • User Studies • Data mining • Also: http://www.oclc.org/research/researchworks/

Catch them where they are • Google, Yahoo, etc. • Open WorldCat • OpenURL • OAI-PMH • Creation too • WorldCat Wiki • Tags?

OpenWorldCat

Editions

OpenURL • OpenURL registry • Supports version 1.0 • Also registry of OpenURL servers • Used for WikiD

WorldCat ‘Wiki’ • Opening up WorldCat to user annotations • Reviews • Notes • Tables of contents • Cover art? • Book lists? • Based on WikiD software • Full Wiki • Many features off for WorldCat • Uses OpenURL 1.0 protocol internally • Allows collections of pages of arbitrary XML schemas • Tools for the creation of simple collections • Doesn’t look like a Wiki

Reviews

Tags? • Folksonomies? • User-generated key words • We’ve been here before • Is it different? • Is there another direction?

Opening Dewey

More data • Harvesting • OAI-PMH • ETDs • Batch load • 60 million records • 3 million new manifestations • Other • Cover art • Reviews • WC

Better data and organization • VIAF • FRBR • Authority files in general • LAF • Publisher names • Genre • FAST • Registries • PURLs • Generalized solution? Get them nearer to creation

FRBR • Work-set algorithm • Keys based on author/title • Authority files • Auxiliary authority files • xISBN • Used for • xISBN • Open WorldCat • FirstSearch (coming) • Collection analysis (coming) • Research
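The slide above mentions keys based on author/title. A minimal sketch of that idea follows; it is not OCLC's actual work-set algorithm. The normalization rules, the `work_key` format, and the sample records are assumptions made for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_marks.lower())
    return " ".join(cleaned.split())


def work_key(author, title):
    """Hypothetical key format: normalized author, a slash, normalized title."""
    return normalize(author) + "/" + normalize(title)


def group_into_work_sets(records):
    """Group record ids that share the same author/title key."""
    work_sets = defaultdict(list)
    for record in records:
        key = work_key(record["author"], record["title"])
        work_sets[key].append(record["id"])
    return dict(work_sets)


# Illustrative records only; the ids are made up.
records = [
    {"id": 1, "author": "Lindgren, Astrid", "title": "Pippi Långstrump"},
    {"id": 2, "author": "LINDGREN, ASTRID.", "title": "Pippi Langstrump"},
    {"id": 3, "author": "Lindgren, Astrid", "title": "Emil i Lönneberga"},
]
work_sets = group_into_work_sets(records)
```

Records 1 and 2 land in the same set because case, punctuation, and diacritics are normalized away before the key is compared.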

Authority Files • LAF • http://errol.oclc.org/laf/n82-54463.html • Publisher names • Not normally controlled • Looking for variations with ISBN prefixes • Also worked with dissertations

VIAF • Merge national-level files • Library of Congress (NACO) and Die Deutsche Bibliothek • Bibliographic records analyzed • 15% would be erroneous based just on names • Basic matching now completed • 435,000 matching names • < 1% mismatched • Working on • Public interface • OAI harvesting • Persistent identifiers
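The slide above notes that matching on names alone would produce many errors, which is why bibliographic records were analyzed. One way to picture that, as a rough sketch and not the VIAF matching procedure itself, is to accept a name match only when the two authority records also share at least one title. The record layout, ids, names, and titles below are all invented for illustration.

```python
def normalize_name(name):
    """Lowercase and drop commas and periods so trivial variants compare equal."""
    return " ".join(name.lower().replace(",", " ").replace(".", " ").split())


def match_authorities(file_a, file_b):
    """Pair records whose names agree AND which share at least one title."""
    by_name = {}
    for record in file_b:
        by_name.setdefault(normalize_name(record["name"]), []).append(record)

    matches = []
    for a in file_a:
        for b in by_name.get(normalize_name(a["name"]), []):
            if set(a["titles"]) & set(b["titles"]):
                matches.append((a["id"], b["id"]))
    return matches


# Invented sample data: two different people share one name in file_b.
file_a = [{"id": "a-1", "name": "Andersson, Karin", "titles": ["Title A"]}]
file_b = [
    {"id": "b-1", "name": "Andersson, Karin.", "titles": ["Title A", "Title B"]},
    {"id": "b-2", "name": "Andersson, Karin", "titles": ["Title C"]},
]
matches = match_authorities(file_a, file_b)
```

Here a name-only comparison would pair `a-1` with both `b-1` and `b-2`; requiring shared title evidence keeps only the first pair.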

Maj

Registries • Show relationships between metadata • Often associated with an identifier • General solution? • Examples • Authority files • WorldCat • PURLs

PURLs • Persistent URLs • Map one URL to another • http://purl.org/hickey/outgoing -> http://outgoing.typepad.com/ • 500,000+ PURLs • 111 million resolutions • Port to WikiD platform? • http://www.oclc.org/research/projects/wikid/ • String of PURL servers? • Use OAI-PMH for synchronization • Spread responsibility • Generalized solution?
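The "map one URL to another" idea can be sketched as a lookup table plus an HTTP redirect. This is only an illustration of the concept, not the PURL server's code; the table, the `resolve` function, and the choice of status codes are assumptions. The single mapping is the one given on the slide.

```python
# Hypothetical in-memory table: PURL path -> current target URL.
PURL_TABLE = {
    "/hickey/outgoing": "http://outgoing.typepad.com/",
}


def resolve(purl_path, table=PURL_TABLE):
    """Return (HTTP status, redirect target) for a PURL path."""
    target = table.get(purl_path)
    if target is None:
        return 404, None  # unknown PURL
    return 302, target  # redirect the client to the current location


status, location = resolve("/hickey/outgoing")
```

When a resource moves, only the table entry changes; the persistent URL that users have cited stays the same.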

More connectivity • OpenURL • RSS feeds • OpenSearch, SRU/W • OAI-PMH

OpenURL • Developed to address the ‘appropriate copy’ problem • Transitioning to OpenURL 1.0 • OpenURL resolver • Accepts requests specifying • Resource • Services • Generalized syntax • Specifying a resource • Services to be performed • Metadata elements specified in registry • http://purl.org/openurl/
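To make "requests specifying a resource" concrete, here is a small sketch that builds an OpenURL 1.0 request in key/encoded-value form for a book. The resolver address and the ISBN are placeholders, and `build_openurl` is a hypothetical helper; the keys shown (`url_ver`, `rft_val_fmt`, `rft.isbn`, `rft.btitle`) come from the Z39.88-2004 book format.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_openurl(resolver_base, isbn, title):
    """Describe a book (the referent) and hand it to a resolver."""
    params = {
        "url_ver": "Z39.88-2004",                    # OpenURL 1.0
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:book",  # registered book format
        "rft.isbn": isbn,
        "rft.btitle": title,
    }
    return resolver_base + "?" + urlencode(params)


# Placeholder resolver and ISBN, for illustration only.
url = build_openurl("http://resolver.example.org/openurl", "0000000000", "Beowulf")
fields = parse_qs(urlsplit(url).query)
```

The same description can be sent to any institution's resolver, which is how the "appropriate copy" gets chosen locally.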

SRU • Simplified version of Z39.50 • Web based • SRW – SOAP • SRU – URL • Even simpler? • OpenSearch • No search syntax • Looking for common ground • MXG • Metasearch XML Gateway • Simplifies metasearchers’ lives
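Because SRU puts the whole search in a URL, a request can be built with nothing more than query-string encoding. The sketch below assumes SRU version 1.1 and a placeholder server address; `sru_search_url` is a hypothetical helper, while the parameter names are those of the `searchRetrieve` operation.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def sru_search_url(base_url, cql_query, start_record=1, maximum_records=10):
    """Build an SRU searchRetrieve request carrying a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    return base_url + "?" + urlencode(params)


# Placeholder server address, for illustration only.
url = sru_search_url("http://sru.example.org/catalog", 'dc.title = "beowulf"')
fields = parse_qs(urlsplit(url).query)
```

The response is XML, so the same URL works from a browser, a script, or another service.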

OAI-PMH • Method of harvesting metadata • More generally, a way of synchronizing databases • No real restriction to metadata • Becomes a repository protocol • Identifiers • Timestamps • Layered implementation • OAI • SRU • Pears
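The identifier-and-timestamp model is what lets OAI-PMH act as a synchronization protocol: a harvester asks for everything changed since its last visit and follows resumption tokens until the list ends. The sketch below shows the request and paging logic only, against a made-up sample response; the repository address, helper names, and sample identifiers are assumptions.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc", since=None, token=None):
    """Build a ListRecords request; a resumption token replaces other arguments."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if since:
            params["from"] = since  # only records changed since this date
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, resumption token or None) for one response."""
    root = ET.fromstring(xml_text)
    identifiers = [el.text for el in root.iter(OAI_NS + "identifier")]
    token_el = root.find(".//" + OAI_NS + "resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Made-up response, trimmed to the parts the sketch reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("http://repo.example.org/oai", since="2005-10-01")
identifiers, token = parse_page(SAMPLE)
next_url = list_records_url("http://repo.example.org/oai", token=token)
```

A real harvester would fetch `next_url`, parse it the same way, and stop when no token is returned.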

Efficient processing • Beowulf cluster • Map reduce • Text searching

Beowulf Cluster • 24 nodes • 2 processors, 4 gigabytes of RAM, 120 gigabytes disk • Gigabit network • Use it for • FRBR processing • Text indexing • Text searching • ~ 30-fold speed up on many tasks • 1 year ⇒ 2 weeks • 1 week ⇒ 1 day • 1 day ⇒ 1 hour • 1 hour ⇒ 2 minutes • Extremely cheap processing
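The before/after figures on the slide are round numbers. As a quick check, the sketch below divides each single-machine duration by the stated ~30-fold speed-up; the exact factor of 30 and the 365-day year are assumptions for the arithmetic.

```python
from datetime import timedelta

SPEEDUP = 30  # the slide's approximate figure, taken as exact here


def on_cluster(single_machine_time, speedup=SPEEDUP):
    """Estimated run time on the cluster for a job of the given length."""
    return single_machine_time / speedup


estimates = {
    "1 year": on_cluster(timedelta(days=365)),
    "1 week": on_cluster(timedelta(weeks=1)),
    "1 day": on_cluster(timedelta(days=1)),
    "1 hour": on_cluster(timedelta(hours=1)),
}
for label, estimate in estimates.items():
    print(label, "->", estimate)
```

At exactly 30-fold, a year-long job takes a little over 12 days, a week-long job under 6 hours, a day-long job 48 minutes, and an hour-long job 2 minutes, so the slide's figures are, if anything, rounded up.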

Map reduce • Pioneered by Google • Petabytes of data on thousands of nodes • Adapted to our cluster • Tens of gigabytes of data on dozens of nodes • Simple functional programming paradigm • Allows batch processing across cluster
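The "simple functional programming paradigm" comes down to two user-supplied functions: a mapper that emits key/value pairs for each record, and a reducer that combines the values for one key. The single-process sketch below shows the shape of such a job; it is not the cluster implementation, and the function names and sample records are invented. On a cluster, the map calls and the per-key reductions would be spread across nodes.

```python
from collections import defaultdict
from functools import reduce


def map_reduce(records, mapper, reducer):
    """Run mapper over every record, group by key, then fold each group."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    return {key: reduce(reducer, values) for key, values in grouped.items()}


def language_mapper(record):
    """Emit one (language, 1) pair per record."""
    yield record["language"], 1


def add(a, b):
    return a + b


# Invented sample records.
records = [{"language": "swe"}, {"language": "eng"}, {"language": "swe"}]
counts = map_reduce(records, language_mapper, add)
```

Because the mapper sees one record at a time and the reducer sees one key at a time, neither needs to know how the data is partitioned.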

Text Searching • Spread database across cluster • Two levels of aggregation • 3 servers/node • 24-way aggregation • Aggregators run across cluster • SRU used • HTTP based • SRW (SOAP) slowed it down • Open source software
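The two levels of aggregation can be pictured as the same merge step applied twice: once per node over its servers, then once over all nodes. The sketch below is an in-memory illustration only; in the system described, the levels talk to each other over SRU. The `(score, record_id)` hit format and the sample data are assumptions.

```python
import heapq


def merge_top(hit_lists, limit):
    """Merge several lists of (score, record_id) hits and keep the best `limit`."""
    all_hits = (hit for hits in hit_lists for hit in hits)
    return heapq.nlargest(limit, all_hits)


def search_cluster(cluster, limit):
    """cluster is a list of nodes; each node is a list of per-server hit lists."""
    per_node = [merge_top(node, limit) for node in cluster]  # level 1: within a node
    return merge_top(per_node, limit)                        # level 2: across nodes


# Invented hits from 2 nodes with 3 servers each (the real cluster has 24 nodes).
cluster = [
    [[(0.9, "a"), (0.2, "b")], [(0.5, "c")], []],
    [[(0.7, "d")], [(0.95, "e")], [(0.1, "f")]],
]
top_hits = search_cluster(cluster, 3)
```

Each level only ever forwards `limit` hits upward, so the top-level aggregator handles at most 24 short lists regardless of database size.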

Better interfaces • More interactive • Live search • Dewey Browser • Better connected

Post-coordination of Services • Systems that expose low level services • Higher level coordination of those services • Loosely coupled services • Examples from OCLC • Validation service • RSS feeds • SRU • OpenURL, OAI-PMH • xISBN • DDC Browser built this way • Very different interfaces have been built

DDC Browser XML
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="/ddcbrowser/xsl/wcat.xsl" ?>
<cells>
  <language>swe</language>
  <cell ddc="330" count="23" />
  <cell ddc="331" count="28" />
  <cell ddc="332" count="5" />
  <cell ddc="333" count="7" />
  <cell ddc="334" count="2" />
  <cell ddc="335" count="1" />
  <cell ddc="336" count="3" />
  <cell ddc="337" count="2" />
  <cell ddc="338" count="26" />
  <cell ddc="339" count="5" />
</cells>
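Since the browser's low-level service returns plain XML like this, any client can build its own view on top of it. The following sketch parses the response shown on the slide with the standard library; the helper name and the returned dictionary layout are assumptions.

```python
import xml.etree.ElementTree as ET

# The response from the slide, minus the stylesheet instruction.
DDC_XML = b"""<?xml version="1.0" encoding="utf-8"?>
<cells>
  <language>swe</language>
  <cell ddc="330" count="23" />
  <cell ddc="331" count="28" />
  <cell ddc="332" count="5" />
  <cell ddc="333" count="7" />
  <cell ddc="334" count="2" />
  <cell ddc="335" count="1" />
  <cell ddc="336" count="3" />
  <cell ddc="337" count="2" />
  <cell ddc="338" count="26" />
  <cell ddc="339" count="5" />
</cells>"""


def parse_cells(xml_bytes):
    """Return the language code and a {ddc number: count} mapping."""
    root = ET.fromstring(xml_bytes)
    language = root.findtext("language")
    counts = {cell.get("ddc"): int(cell.get("count")) for cell in root.iter("cell")}
    return language, counts


language, counts = parse_cells(DDC_XML)
total = sum(counts.values())
busiest = max(counts, key=counts.get)
```

For this response the counts add up to 102 items, with class 331 holding the most (28).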

Do We Need It? • Just have Google harvest everything • Our experience with Google • Fielded searching • Reliable searching • Possibility of user-supplied metadata • Cost of good metadata • Cost of non-existent metadata

Conclusions • Shift to remote users • Online availability – trend towards centralization • More flexibility in implementations • Patrons are better served • Less emphasis on physical collections

Thank you T. Hickey http://errol.oclc.org/laf/n82-54463.html Swedish Library Association 2005 October 28
