South Central Unicorn Users Group Annual Conference, October 17, 2003, Austin, Texas. MARC Content Designation Use: Implications for indexing & interoperability.
South Central Unicorn Users Group Annual Conference, October 17, 2003, Austin, Texas
MARC Content Designation Use: Implications for indexing & interoperability
William E. Moen <wemoen@unt.edu>
School of Library and Information Sciences
Texas Center for Digital Knowledge
University of North Texas
Denton, TX 72603
Overview
• Context for the analysis: interoperability
• Findings from the analysis
• Indexing and MARC
• Discussion
South Central Unicorn Users Group Annual Conference -- Austin, Texas -- October 17, 2003
Context for the analysis
• Interoperability across library online catalogs
• Indexing of MARC records to support searching
• Richness of MARC content designation available
• Indexing guidelines prepared for the Z39.50 Interoperability Testbed (Z-Interop)
• Implications for indexing guidelines and policies
Interoperability
Systems and organizations will interoperate!
One should actively be engaged in the ongoing process of ensuring that the systems, procedures and culture of an organisation are managed in such a way as to maximise opportunities for exchange and re-use of information, whether internally or externally.
Paul Miller, 2000
Factors affecting interoperability
• Multiple and disparate systems
  • Operating systems, information retrieval systems, etc.
• Multiple protocols
  • Z39.50, HTTP, SOAP, etc.
• Multiple data formats, syntax, metadata schemes
  • MARC 21, UNIMARC, XML, ISBD/AACR2-based, Dublin Core
• Multiple vocabularies, ontologies, disciplines
  • LCSH, MeSH, AAT
• Multiple languages and character sets
• Indexing, word normalization, and word extraction policies
Information communities
• Community agreements exist (e.g., standards, rules)
• Interoperability factors reduced
• Interoperability more easily achieved
• Do we need additional agreements regarding indexing policies to improve interoperability?
• Libraries as focal community
  • Relative homogeneity of data and systems
  • Standards-based MARC records
  • Content and structure prescribed by AACR
  • Commonly understood access points
  • Use of controlled vocabularies
Interoperability testbed project
Realizing the Vision of Networked Access to Library Resources: An Applied Research and Demonstration Project to Establish and Operate a Z39.50 Interoperability Testbed
• An Institute of Museum and Library Services National Leadership Grant
• Goal: Improve Z39.50 semantic interoperability among libraries for information access and resource sharing
For more information, visit the project website: http://www.unt.edu/zinterop/
Threats to Z39.50 interoperability
• Differences in implementation of the standard
• Differences in local information retrieval systems
  • Search functionality
  • Indexing policies
• These threats can be addressed by
  • Z39.50 specifications and configuration (i.e., profiles)
  • Enhancing local information retrieval systems
  • Recommendations for local indexing decisions
Components of the testbed
• Test dataset
  • 400,000+ MARC 21 records from OCLC’s WorldCat
• Z39.50 reference implementations
  • Z-client (Bookwhere), Z-server & information retrieval system (Sirsi Unicorn)
• Test scenarios & searches
  • Searches with known result records from dataset
• Benchmarks
  • Results of test searches using reference implementations
MARC
• Record structure for encoding data for machine processing
• Standard structure (ANSI/NISO Z39.2/ISO 2709)
  • Leader
  • Directory map
  • 3-digit tag to identify a field
  • 2 indicator values to provide additional processing information
  • 1 or more delimiters/codes to identify subfields
• Content designation: Semantics
  • MARC 21
  • 245 00 $a [title] $h [format] : $b [subtitle]
• Rules
  • Anglo-American Cataloguing Rules and others
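The record structure listed above can be illustrated with a minimal sketch of an ISO 2709 reader. It assumes the usual MARC 21 layout: a 24-character leader, 12-character directory entries (3-digit tag, 4-digit field length, 5-digit starting position), and the control characters 0x1E, 0x1D and 0x1F as field terminator, record terminator and subfield delimiter. The helper names `build_record` and `parse_record` are hypothetical, and `build_record` exists only to produce a small test record; this is not the testbed's software.

```python
FT = b"\x1e"  # field terminator
RT = b"\x1d"  # record terminator
SF = b"\x1f"  # subfield delimiter

def build_record(fields):
    """Hypothetical helper: assemble a tiny ISO 2709 record from (tag, data) pairs."""
    directory, body = b"", b""
    for tag, data in fields:
        data += FT
        # Directory entry: 3-digit tag, 4-digit length, 5-digit start offset
        directory += f"{tag}{len(data):04d}{len(body):05d}".encode("ascii")
        body += data
    directory += FT
    base = 24 + len(directory)          # base address of data
    total = base + len(body) + 1        # +1 for the record terminator
    leader = f"{total:05d}cam  22{base:05d}   4500".encode("ascii")
    return leader + directory + body + RT

def parse_record(raw):
    """Split a raw record into its leader and a list of (tag, indicators, content)."""
    leader = raw[:24].decode("ascii")
    base = int(leader[12:17])
    directory = raw[24:base - 1]        # drop the terminator that ends the directory
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        data = raw[base + start: base + start + length - 1]  # drop field terminator
        if tag < "010":
            # Control fields (001-009) have no indicators or subfields
            fields.append((tag, None, data.decode("utf-8")))
        else:
            indicators = data[:2].decode("ascii")
            subfields = [(chunk[:1].decode("ascii"), chunk[1:].decode("utf-8"))
                         for chunk in data[2:].split(SF)[1:]]
            fields.append((tag, indicators, subfields))
    return leader, fields

raw = build_record([
    ("001", b"ocm00000003"),
    ("245", b"10" + SF + b"aIllegitimacy and adoption in Maine :"
            + SF + b"breport of a study"),
])
leader, fields = parse_record(raw)
for field in fields:
    print(field)
```

The directory is what lets a program jump straight to a field without scanning the whole record; the tag, indicators and subfield codes recovered here are the content designation that the rest of the analysis counts.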
MARC 21 content designation
Z-Interop test dataset
• Approximately 1% sample of MARC records from OCLC’s WorldCat database
• Weighted sampling based on number of libraries “holding” the object represented by the record
• 419,657 total MARC records
• 89% of records “full level” cataloging
• Formats represented in test dataset
  • Books: 91%
  • Sound recordings: 4%
  • Serials: 3%
  • Visual materials: 1%
  • Cartographic materials: < 1%
  • Electronic resources: < 1%
  • Archival/mixed materials: < 1%
MARC record
LDR 01019cam 2200265 4500
001 ocm00000003
003 OCoLC
005 20010925133908.0
008 690414s1963 nyu b 000 0 eng
010 $a63064323
040 $aDLC $cDLC
050 04 $aHV700.5 $b.N37
082 0 $a362.7/3
110 2 $aNational Study Service.
245 10 $aIllegitimacy and adoption in Maine : $breport of a study made for the Maine Committee on Children and Youth.
260 $a[New York], $c1963.
300 $a24 p. ; $c28 cm.
500 $aCover title.
504 $aBibliographical footnotes.
650 0 $aIllegitimacy $zMaine.
650 0 $aAdoption $zMaine.
710 1 $aMaine. $bCommittee on Children and Youth.
Decomposing MARC Records
400,000 MARC 21 records = 33 million decomposed records
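One plausible reading of "decomposing" is that each subfield occurrence becomes its own row, which is why a few hundred thousand records expand into millions of rows. The sketch below assumes that reading and works on the display form used on the MARC record slide (tag, indicators, then `$`-prefixed subfields). The row layout and the function name `decompose` are assumptions, not the project's actual schema, and the sketch would mis-split a value that itself contains a `$`.

```python
from collections import Counter

def decompose(record_id, field_lines):
    """Turn display-form field lines into (record_id, tag, subfield_code, value) rows."""
    rows = []
    for line in field_lines:
        tag = line[:3]
        if tag == "LDR" or tag.startswith("00"):
            continue  # leader and control fields carry no subfields
        # Everything before the first "$" is indicators; each later chunk is a subfield
        for chunk in line[3:].split("$")[1:]:
            if chunk:
                rows.append((record_id, tag, chunk[0], chunk[1:].strip()))
    return rows

sample = [
    "001 ocm00000003",
    "245 10 $aIllegitimacy and adoption in Maine : $breport of a study made "
    "for the Maine Committee on Children and Youth.",
    "650 0 $aIllegitimacy $zMaine.",
    "650 0 $aAdoption $zMaine.",
]

rows = decompose("ocm00000003", sample)
# Count how often each field/subfield combination occurs
frequency = Counter((tag, code) for _, tag, code, _ in rows)
for key, count in frequency.most_common():
    print(key, count)
```

Once records are in this row form, frequency counts per field/subfield reduce to a simple group-and-count.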
Content designation in dataset
Summary frequency results
• Total number of field/subfield occurrences in dataset = 13,849,499
• Only 4% of all fields/subfields account for 80% of all occurrences
• Equivalently, the remaining 96% of fields/subfields account for only 20% of all occurrences
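A concentration figure like the one above can be computed by sorting field/subfield counts from most to least frequent and accumulating until a target share of occurrences is reached. The sketch below shows that calculation; the counts are invented toy numbers, not values from the Z-Interop dataset, and `smallest_covering_set` is a hypothetical name.

```python
def smallest_covering_set(counts, target=0.80):
    """Return the most frequent keys that together reach `target` share of occurrences."""
    total = sum(counts.values())
    running = 0
    chosen = []
    for key, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(key)
        running += n
        if running >= target * total:
            break
    return chosen, running / total

# Toy counts for illustration only (not from the dataset)
counts = {
    "650$a": 50, "040$d": 20, "260$a": 12, "245$a": 8,
    "500$a": 5, "504$a": 3, "710$b": 2,
}

chosen, coverage = smallest_covering_set(counts)
share_of_keys = len(chosen) / len(counts)
print(chosen, f"cover {coverage:.0%} using {share_of_keys:.0%} of keys")
```

Applied to real per-field/subfield counts, the length of `chosen` divided by the number of distinct fields/subfields gives the "4%" style figure.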
Characteristics of top 36
• Most frequently occurring: 650 $a [Subject data]
• 2nd most frequently occurring: 040 $d [Cataloging source]
• 3rd & 4th most frequently occurring: 260 $a & $b [Publication information]
• 5th most frequently occurring: 245 $a [Title]
• Contain data useful to end users: 28
• Contain control numbers, etc.: 5
• Contain data useful to catalogers: 3
Indexing & MARC
• Indexing Guidelines to Support Z39.50 Profile Searches
  • Identified all MARC 21 fields/subfields that may contain author, title, or subject data
  • Author-related fields/subfields: 119
  • AuthorTitle-related fields/subfields: 21
  • Title-related fields/subfields: 253
  • Subject-related fields/subfields: 144
• 537 fields/subfields contain author, title, or subject data
• Usefulness of indexing all possible fields?
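An indexing policy of this kind amounts to a mapping from each search access point to the fields/subfields that feed it. The sketch below assumes a small illustrative subset of MARC 21 fields (100/110/700/710 $a for author, 245 $a/$b for title, 650 $a/$z for subject); it is not the list from the Indexing Guidelines, and the names `INDEX_MAP` and `build_index` are hypothetical.

```python
import re
from collections import defaultdict

# Illustrative subset only; the actual guidelines list far more fields/subfields
INDEX_MAP = {
    "author": {("100", "a"), ("110", "a"), ("700", "a"), ("710", "a")},
    "title": {("245", "a"), ("245", "b")},
    "subject": {("650", "a"), ("650", "z")},
}

def build_index(rows, index_map):
    """Build word indexes from (record_id, tag, code, value) rows."""
    index = {name: defaultdict(set) for name in index_map}
    for record_id, tag, code, value in rows:
        for name, keys in index_map.items():
            if (tag, code) in keys:
                # Simple word extraction: lowercase, keep letters and digits
                for word in re.findall(r"[a-z0-9]+", value.lower()):
                    index[name][word].add(record_id)
    return index

rows = [
    ("r1", "245", "a", "Illegitimacy and adoption in Maine :"),
    ("r1", "650", "a", "Adoption"),
    ("r1", "710", "a", "Maine."),
    ("r2", "650", "z", "Maine."),
]

index = build_index(rows, INDEX_MAP)
print(sorted(index["subject"]["maine"]))  # records with "maine" in a subject field
```

Adding or removing a field/subfield from `INDEX_MAP` changes which records a given search can retrieve, which is the interoperability concern: two catalogs holding the same records but using different maps return different results.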
Occurrences in test dataset
• 381 occur one or more times in the Z-Interop dataset
• Author, title, or subject fields/subfields in the Z-Interop dataset
  • Author-related fields/subfields: 86
  • AuthorTitle-related fields/subfields: 16
  • Title-related fields/subfields: 178
  • Subject-related fields/subfields: 101
• 19 of the 381 (5%) account for 80% of all occurrences
  • 9 of 19 are subject-related
  • 5 of 19 are author-related
  • 5 of 19 are title-related
• The 19 fields/subfields
Implications for indexing
• What difference do indexing decisions make?
• Preliminary testing using the 19 fields/subfields:
  • 95% to 100% of correct records retrieved!
• How much time would be saved in setting up indexing policies?
• Is there a systematic method to identify the “best” fields/subfields to index?
  • Per format of materials?
  • Per user needs (librarians and end users)?
• Good enough search results?
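The "percentage of correct records retrieved" can be read as recall against the known result set of a test search. The sketch below compares a fuller index with a reduced one on invented toy rows; the field choices, the substring matching and the function names are assumptions for illustration and do not reproduce the testbed's benchmark procedure.

```python
def search(rows, indexed, term):
    """Return record ids whose indexed fields/subfields contain the term."""
    term = term.lower()
    return {
        record_id
        for record_id, tag, code, value in rows
        if (tag, code) in indexed and term in value.lower()
    }

def recall(retrieved, expected):
    """Share of the expected (known correct) records that were retrieved."""
    if not expected:
        return 1.0
    return len(retrieved & expected) / len(expected)

# Toy rows for illustration only (not from the dataset)
rows = [
    ("r1", "650", "a", "Adoption"),
    ("r2", "245", "a", "Adoption law"),
    ("r3", "653", "a", "Adoption"),
    ("r4", "650", "a", "Illegitimacy"),
]
expected = {"r1", "r2", "r3"}  # known result records for the test search

reduced = {("650", "a"), ("245", "a")}   # a short list of high-frequency fields
fuller = reduced | {("653", "a")}        # adds a rarely used field

full_recall = recall(search(rows, fuller, "adoption"), expected)
reduced_recall = recall(search(rows, reduced, "adoption"), expected)
print(f"fuller index: {full_recall:.0%}, reduced index: {reduced_recall:.0%}")
```

Running the same test searches against each candidate field list gives a direct way to judge whether a shorter list is "good enough".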
References
• Z39.50 Interoperability Testbed
  • http://www.unt.edu/zinterop/
• Indexing Guidelines to Support Z39.50 Profile Searches
  • http://www.unt.edu/zinterop/Documents/IndexingGuidelines1Feb2002.pdf