Netspeed 2002 Conference, October 25, 2002, Calgary, Alberta. Interoperability, Z39.50 Profiles & Testing. William E. Moen <wemoen@unt.edu>, School of Library and Information Sciences, Texas Center for Digital Knowledge, University of North Texas, Denton, TX 72603
Overview • Interoperability • Profiles • The Bath Profile • The U.S. National Profile • Beyond profiles • Indexing and search functionality • Interoperability testing Netspeed 2002 -- Calgary, Alberta -- October 2002
Interoperability Systems and organizations will interoperate! One should actively be engaged in the ongoing process of ensuring that the systems, procedures and culture of an organisation are managed in such a way as to maximise opportunities for exchange and re-use of information, whether internally or externally. Paul Miller, 2000
Defining interoperability System-oriented definition • The ability of two or more systems or components to exchange information and use the exchanged information without special effort on either system User-oriented definition • User’s ability to successfully search and retrieve information in a meaningful way and have confidence in the results • The condition achieved when two or more technical systems can exchange information directly in a way that is satisfactory to users of the systems (AAP)
Assessing interoperability • Binary • Interoperable • Not interoperable • Continuum • More or less interoperable • Acceptable levels of interoperability
Factors affecting interoperability • Multiple and disparate systems • operating systems, information retrieval systems, etc. • Multiple protocols • Z39.50, HTTP, SOAP, etc. • Multiple data formats, syntax, metadata schemes • MARC 21, UNIMARC, XML, ISBD/AACR2-based, Dublin Core • Multiple vocabularies, ontologies, disciplines • LCSH, MeSH, AAT • Multiple languages, multiple character sets • Indexing, word normalization, and word extraction policies
Mapping the landscape • Networked information retrieval occurs within and across communities • Information communities • Focal community (e.g., libraries) • Extended community (e.g., cultural heritage community) • Extra community • Knowledge Domains • Intra domain • Extra domain • Costs to achieve interoperability vary
Information communities [Diagram labels: Extended Community (e.g., Cultural Heritage); Focal Community (e.g., Libraries); Focal Community (e.g., Archives); Focal Community (e.g., Museum); Extended Community; Focal Community (e.g., Geospatial); Focal Community (e.g., Natural History Museums); Extra Community]
Focal community • Community agreements exist (e.g., standards, rules, etc.) • Interoperability factors reduced • Interoperability more easily achieved • Libraries as Focal Community • Relative homogeneity of data and systems • Z39.50 widely implemented • Standards-based MARC records • Content and structure prescribed by AACR • Commonly understood access points • Use of controlled vocabularies
Threats to Z39.50 interoperability • Differences in implementation of the standard • Differences in local information retrieval systems • Search functionality • Indexing policies • These threats can be addressed by • Z39.50 specifications and configuration • Enhancing local information retrieval systems • Recommendations for local indexing decisions
Virtual Catalog Application
Z39.50 Model of Resource Discovery
Profiles [Diagram labels: Complete Z39.50 Specifications; Z39.50 Profile; Z39.50 specifications] Profiles are a solution path for improving interoperability • Represent community consensus on requirements • Identify Z39.50 specifications to support those requirements • Aid in purchasing decisions • Provide specifications for vendors
Profiles • A profile defines a subset of specifications from one or more standards • The goal of profiles is to improve interoperability • Profiles are useful for: • prescribing how Z39.50 should be used in a particular application environment • solving interoperability problems with existing Z39.50 implementations within a community or across two or more communities
The Bath Profile The Bath Profile: An International Z39.50 Specification for Library Applications and Resource Discovery, Release 2 (Draft 3, Oct. 2002) • Enables effective use of Z39.50 in a range of library applications: • Search and retrieval from library catalogues • Search and retrieval of bibliographic holdings • Search and retrieval of authority records • Cross-domain searching FOR MORE INFORMATION, VISIT THE BATH MAINTENANCE AGENCY WEBSITE… http://www.nlc-bnc.ca/bath/
Structure of the profile • Modular for extensibility • Related requirements and specifications grouped in Functional Areas • Release 2 defines four Functional Areas • Functional Area A: Basic Bibliographic Search and Retrieval, with Primary Focus on Library Catalogues • Functional Area B: Bibliographic Holdings Search and Retrieval • Functional Area C: Cross-Domain Search and Retrieval • Functional Area D: Authority Record Search and Retrieval in Online Library Catalogues • Defines Conformance Levels for each area
Addressing interoperability • The Bath Profile: • Identifies searching requirements (tasks) • Defines the searches (semantics and behavior) • Specifies Z39.50 query to represent the search • Standard combination of Z39.50 attribute types and values • Clients must send all attribute type values specified for search • Servers must be able to process all values • No default behavior by client or server • Requires support for specific formats for interchanging retrieval records
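The "no default behavior" rule can be illustrated with a small check. This is a minimal sketch, not part of the Bath Profile itself: it assumes a query is held as a list of (attribute type, value) pairs and that the six Bib-1 attribute types used in this talk's example queries (types 1 through 6) are the ones a search specifies. The function name is hypothetical.

```python
# Bib-1 attribute types that appear in the example queries of this talk.
# Assumption: a profile-defined search specifies a value for each of them.
REQUIRED_ATTRIBUTE_TYPES = {
    1: "Use",
    2: "Relation",
    3: "Position",
    4: "Structure",
    5: "Truncation",
    6: "Completeness",
}


def missing_attribute_types(attributes):
    """Return the names of required attribute types absent from a query.

    `attributes` is a list of (attribute_type, value) pairs, for example
    [(1, 4), (2, 3)]. An empty result means nothing is left to a default.
    """
    present = {attr_type for attr_type, _value in attributes}
    return [
        name
        for attr_type, name in sorted(REQUIRED_ATTRIBUTE_TYPES.items())
        if attr_type not in present
    ]


if __name__ == "__main__":
    complete = [(1, 4), (2, 3), (3, 3), (4, 2), (5, 100), (6, 1)]
    partial = [(1, 4), (4, 2)]
    print(missing_attribute_types(complete))
    print(missing_attribute_types(partial))
```

A client could run such a check before sending a query, and a server could use it to reject queries that would otherwise fall back on local defaults.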
Functional Area A, Level 0 • Conformance Level 0 • Version 2 required, Version 3 recommended • Basic Bibliographic Search (Z39.50 Search Service) • Author Search — Keyword • Title Search — Keyword • Subject Search — Keyword • Any Search — Keyword • Basic Bibliographic Retrieval (Z39.50 Present Service) • Z-clients to support MARC 21 and SUTRS • Z-servers to support MARC 21
Functional Area A, Level 1 • Conformance Level 1 • Inherits search requirements from Level 0 • Requires 15 additional searches, including: • Exact Match (author, title, subject) • First Words & First Characters in Field (author, title, subject) • Keyword with Right Truncation (author, title, subject) • Standard ID, Date • Browse Indexes (Z39.50 Scan Service) • 3 scans defined • Retrieval • Z-clients to support MARC 21 and SUTRS • Z-servers to support MARC 21
Functional Areas B, C, D • Area B -- Holdings Information • Addresses the challenge of search and retrieval of bibliographic holdings information • Locations Only • Locations, Summary Information and Count if available • Summary Copy Level Holdings • Use of XML as Record Syntax • Area C -- Cross Domain Search/Retrieval • Defines two conformance levels (13 searches) • Dublin Core DTD for XML record syntax • Area D -- Authority Record Search/Retrieval • Defines one conformance level • Defines 14 searches
Level 0: title keyword search Uses: Searches for complete word in a title of a resource. Example: Title search for “woman” represented in Z query as: (1,4)(2,3)(3,3)(4,2)(5,100)(6,1) woman
Level 1: title keyword right truncation Uses: Searches for complete word beginning with the specified character string in fields that contain a title of a resource. Example: Title search for woman truncated as “wom” represented in Z query as: (1,4)(2,3)(3,3)(4,2)(5,1)(6,1) wom
Level 1: title first words in field Uses: Searches for complete word(s) in the order specified in fields that contain a title of a resource. The field must begin with the specified character string. This search is useful when the beginning words in a title are known to the user. Example: Title search for “Gone with the” represented in Z query as: (1,4)(2,3)(3,1)(4,1)(5,2)(6,1) gone with the
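The attribute combinations in the three title-search examples can be written out mechanically for a Z39.50 client. The sketch below renders them in Prefix Query Format (PQF), the query notation used by Index Data's YAZ toolkit. PQF is an assumption introduced here for illustration and is not mentioned in the slides, and the function name is hypothetical. The attribute values are copied from the examples as given.

```python
def to_pqf(attributes, term):
    """Render (attribute_type, value) pairs and a search term as a PQF string.

    Each pair becomes "@attr type=value". A term containing spaces is
    wrapped in double quotes so that it stays a single term.
    """
    parts = [f"@attr {attr_type}={value}" for attr_type, value in attributes]
    quoted_term = f'"{term}"' if " " in term else term
    return " ".join(parts + [quoted_term])


# Attribute combinations copied from the three title-search examples.
TITLE_KEYWORD = [(1, 4), (2, 3), (3, 3), (4, 2), (5, 100), (6, 1)]
TITLE_KEYWORD_RIGHT_TRUNCATION = [(1, 4), (2, 3), (3, 3), (4, 2), (5, 1), (6, 1)]
TITLE_FIRST_WORDS_IN_FIELD = [(1, 4), (2, 3), (3, 1), (4, 1), (5, 2), (6, 1)]

if __name__ == "__main__":
    print(to_pqf(TITLE_KEYWORD, "woman"))
    print(to_pqf(TITLE_KEYWORD_RIGHT_TRUNCATION, "wom"))
    print(to_pqf(TITLE_FIRST_WORDS_IN_FIELD, "gone with the"))
```

Because every search is a fixed attribute combination, a client only needs a lookup table of this kind per profile-defined search; the user-supplied term is the only variable part.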
Endorsements of Bath Profile • Atlantic Scholarly Information Network • CENL Working Group on Technical Standards • Czech and Slovak Library Information Network (CASLIN) • Committee on Institutional Cooperation (CIC) • International Coalition of Library Consortia (ICOLC) • Istituto Centrale per il Catalogo Unico delle Biblioteche Italiane e per le Informazioni Bibliografiche (ICCU) • M25 Consortium of Higher Education Libraries • National Library of Canada • OCLC • ONE2 • SmartLibrary • Standing Conference of National and University Libraries (SCONUL) • Z Texas Project
Bath as foundation profile • National, regional, and state profiles based on the Bath Profile • ONE-2 Profile • DanZIG Profile • U.S. National Z39.50 Profile • Z Texas Profile
Library application profiles • The Bath Profile: An International Z39.50 Specification for Library Applications and Resource Discovery • U.S. National Z39.50 Profile for Library Applications • Z Texas Profile: A Z39.50 Profile for Library Systems Applications in Texas [Diagram: Relationship among profiles; Bath Profile: Core Specifications For Global Interoperability]
U.S. National Profile • National Information Standards Organization (NISO) standards effort • National Profile: • Addresses cross-catalog searching and holdings information interchange • Bath Profile is foundation for U.S. National Profile • Responds to national requirements • Work initiated in November 2000 • Draft standard ready by end of 2002 FOR MORE INFORMATION, VISIT THE PROJECT WEBSITE… http://www.unt.edu/zprofile
U.S. Profile Functional Area A • Conformance Level 0 • Version 2 required, Version 3 recommended • Basic Bibliographic Search (Z39.50 Search Service) • Author Search — Keyword (NISO) • Title Search — Keyword (Bath) • Subject Search — Keyword (Bath) • Any Search — Keyword (Bath) • Basic Bibliographic Retrieval (Z39.50 Present Service) • MARC 21 supported by Z-client and Z-servers
U.S. Profile Functional Area A • Conformance Level 1 • Version 3 required • Inherits search requirements from Level 0 • Requires 20 additional searches, including: • Exact Match (author, title, subject) • First Words & First Characters in Field (author, title, subject) • Keyword with Right Truncation (author, title, subject) • ISBN, ISSN, Standard ID, Format/Type, Date, Language • Browse Indexes (Z39.50 Scan Service) • Retrieval • Z-clients support MARC 21 • Z-servers support MARC 21
U.S. Profile Functional Area A • Conformance Level 2 • 38 additional searches, including • Key Title, Series Title, Uniform Title, • Unanchored phrase searches for Title, Subject, Name, Any • Personal Author, Corporate Author, Conference Meeting • Notes, other standard number (e.g., LCCN) • Pattern searches for one or more controlled vocabularies
U.S. Profile Functional Area B • Bibliographic Holdings Information Retrieval • Use of XML as Record Syntax • Z39.50 Holdings XML Schema http://www.portia.dk/zholdings/ • Harmonized with Bath Profile
Z39.50 profiles are not enough • Profiles can: • Identify searching requirements (tasks) • Define the searches (semantics and behavior) • Specify Z39.50 query to represent the search and formats of retrieval records • Also needed are: • Agreements on indexing • Common search functionality • Methods and testbed for interoperability testing • Conformance to profiles by vendors and libraries
Indexing & search functionality • Indexing • Access points • Which MARC fields/subfields populate each index • Moving toward community agreements on common indexing policies to support profile-defined searches • Indexing guidelines available for use http://www.unt.edu/zinterop/ • Related issues: word normalization, word extraction • Search functionality • Phrase searching • Truncation • Proximity searching, etc.
Interoperability testbed project Realizing the Vision of Networked Access to Library Resources: An Applied Research and Demonstration Project to Establish and Operate a Z39.50 Interoperability Testbed • An Institute of Museum and Library Services National Leadership Grant • Goal: Improve Z39.50 semantic interoperability among libraries for information access and resource sharing FOR MORE INFORMATION, VISIT THE PROJECT WEBSITE… http://www.unt.edu/zinterop/
Z-Interop vision • Provide a technically and organizationally trusted environment for vendors and consumers to demonstrate and evaluate Z39.50 products • Develop rigorous methodologies, test scenarios & procedures to measure and assess the extent of interoperability • Demonstrate and operate a Z39.50 interoperability testbed
Z-Interop partners • Institute of Museum and Library Services • UNT’s Texas Center for Digital Knowledge • University of North Texas School of Library and Information Sciences • OCLC Online Computer Library Center • Sirsi Corporation • Sea Change Corporation, Bookwhere 2000
Components of the testbed • Test dataset • 400,000 MARC 21 records from OCLC’s WorldCat • Z39.50 reference implementations • Z-client, Z-server, information retrieval system • Test scenarios & searches • Searches with known result records from dataset • Benchmarks • Results of test searches against reference implementations
Analysis of test dataset • Determine frequency of words in dataset • Systematically select words for use in test searches • Identify records that contain selected word • Aggregate Record Group • Word appears in any fields and subfields • Identify records that contain selected word in specified fields/subfields • Candidate Record Group • For example, examine records for occurrence of word in title-related fields/subfields
Decomposed MARC records 400,000 MARC 21 records = 33 million decomposed records
Analysis logic 1. Examine Test Dataset (decomposed records) for occurrence of word “river” 2. Yields Aggregate Record Group for word “river” 3. Examine Aggregate Record Group for occurrence of word “river” in selected fields/subfields 4. Yields Candidate Record Group for word “river” in selected fields/subfields
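The two-stage analysis can be sketched in a few lines. This is an illustration under stated assumptions, not the project's actual tooling: a decomposed record is modeled as one row per subfield, the sample rows are invented, the choice of title-related fields (MARC 245 $a, 245 $b, 246 $a) is hypothetical, and a "word" is taken to be a lowercased run of word characters.

```python
import re
from collections import namedtuple

# One decomposed record: a single subfield of a single MARC record.
Row = namedtuple("Row", "record_id tag subfield value")

# Hypothetical selection of title-related fields/subfields.
TITLE_FIELDS = {("245", "a"), ("245", "b"), ("246", "a")}

# Invented sample data standing in for the decomposed test dataset.
SAMPLE_ROWS = [
    Row(1, "245", "a", "A river runs through it"),
    Row(2, "245", "a", "Freshwater habitats"),
    Row(2, "650", "a", "River ecology"),
    Row(3, "245", "a", "Mountain trails"),
]


def words(text):
    """Split text into a set of lowercased words (a simplifying assumption)."""
    return set(re.findall(r"\w+", text.lower()))


def record_groups(rows, word, selected_fields):
    """Return (aggregate, candidate) sets of record ids for a word.

    Aggregate: the word occurs in any field/subfield of the record.
    Candidate: the word occurs in one of the selected fields/subfields.
    """
    hits = [row for row in rows if word in words(row.value)]
    aggregate = {row.record_id for row in hits}
    candidate = {
        row.record_id for row in hits if (row.tag, row.subfield) in selected_fields
    }
    return aggregate, candidate


if __name__ == "__main__":
    aggregate, candidate = record_groups(SAMPLE_ROWS, "river", TITLE_FIELDS)
    print(sorted(aggregate), sorted(candidate))
```

In the sample data, record 2 contains "river" only in a subject field, so it falls in the Aggregate Record Group but not in the title Candidate Record Group.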
Some critical questions • What is a “word”? • Self-help • Self help • Normalization • Elena • Éléna • What are the appropriate Author, Title, and Subject fields to look in for the word? • Decision related to indexing policies
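These questions become concrete once a normalization and word-extraction policy is written down. The sketch below shows one possible policy, assumed here purely for illustration: fold case, strip diacritics via Unicode decomposition, and optionally treat a hyphen as a word break. The function names are hypothetical, and a different system could reasonably choose otherwise, which is exactly the interoperability problem.

```python
import unicodedata


def normalize(text):
    """Lowercase text and remove combining marks (diacritics)."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.lower()


def extract_words(text, split_on_hyphen=True):
    """Return the list of words in text under the chosen hyphen policy."""
    text = normalize(text)
    if split_on_hyphen:
        text = text.replace("-", " ")
    return text.split()


if __name__ == "__main__":
    # With normalization, both spellings index to the same word.
    print(normalize("Elena") == normalize("Éléna"))
    # The hyphen policy decides whether "Self-help" is one word or two.
    print(extract_words("Self-help"))
    print(extract_words("Self-help", split_on_hyphen=False))
```

Two systems that differ only in the `split_on_hyphen` choice would return different result sets for a keyword search on "help", even if both conform to the same profile.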
Reference implementations • Online Catalog Software • Z-Interop testbed uses SIRSI’s UNICORN system • Test dataset loaded on the system • Indexing policies based on guidelines • Z39.50 Server • SIRSI Z39.50 Module • Configured according to Bath/U.S. Profile • Z39.50 Client • Bookwhere 2000 • Configured according to Bath/U.S. Profile
Establishing benchmarks [Diagram labels: Reference Z39.50 Client (configured to support Profile specifications); Reference Z39.50 Server (configured to support Profile specifications); Test Dataset (indexed per guidelines to support Profile searches); Test searches yield Retrieval Results; Retrieval Results compared to Candidate Record Group; Benchmarks for Test Search]
Interoperability testing • Z-Interop Interoperability Testing Policies and Procedures • Test dataset loaded on participant’s system • Configured to conform with Bath/U.S. Profiles • Indexed according to participant’s policies • Testing Z-servers • Z-Interop will send test searches from reference Z-client • Report results compared with benchmarks • Analyze results to assist implementor to improve interop • Testing Z-clients • Test searches sent to reference Z-server
Testing & assessment [Diagram labels: Reference Z39.50 Client (configured to support Profile specifications); Vendor Z39.50 Server (configured by vendor for conformance to Profile); Test Dataset (loaded by vendor or library; indexed by vendor according to vendor’s specifications); Test Searches; Retrieval Results compared to Benchmarks for Test Search]
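Comparing a participant's retrieval results with the benchmark for a test search can be pictured as a set comparison. The slides do not describe how the comparison is actually carried out, so this is only a sketch: it assumes both sides are available as collections of record identifiers, and the function name and report keys are hypothetical.

```python
def compare_to_benchmark(benchmark_ids, retrieved_ids):
    """Compare retrieved record ids with the benchmark for one test search.

    Returns a dict with:
      matched    - records in both the benchmark and the retrieved set
      missing    - benchmark records the tested system did not return
      unexpected - returned records that are not in the benchmark
    """
    benchmark = set(benchmark_ids)
    retrieved = set(retrieved_ids)
    return {
        "matched": sorted(benchmark & retrieved),
        "missing": sorted(benchmark - retrieved),
        "unexpected": sorted(retrieved - benchmark),
    }


if __name__ == "__main__":
    report = compare_to_benchmark([1, 2, 3], [2, 3, 4])
    print(report)
```

Missing records often point to indexing differences (a field not indexed for that access point), while unexpected records often point to looser normalization or broader field coverage on the tested system.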
Current testing • Validate testing methodologies, procedures, policies • Bath/U.S. National Profiles Levels 0 & 1 Search & Retrieval • Title Search – Keyword • Author Search – Keyword • Subject Search – Keyword • Any Search – Keyword • Title, Author, Subject Searches – Keyword Right Truncation • Simple Keyword Boolean searches (AND, OR, NOT) • Test participants • InQuirion • OCLC • Innovative Interfaces • TLC/CARL • epixtech • Fretwell-Downing • M25 (UK) • Others expressing interest
Research questions • What are acceptable levels of interoperability? • What are appropriate measures of interoperability? • What does conformance to a Profile mean? • Conformance of vendor’s product • Conformance of your implementation of vendor’s product • To what extent are organizations willing to support common indexing practices to improve interoperability?
Critical success factors • Openness and transparency of processes • Project documents available on website • Culture of nurturing improvement • Trustworthiness • Confidentiality of participants’ results
An opportunity for Z39.50 • Z39.50 experience has shown the challenges of interoperability • Problems of interoperability are better understood within a focal community • Solution paths exist • Interoperability testing serves as platform for improvement • The pieces are finally falling into place!
References • The Bath Profile Maintenance Agency • http://www.nlc-bnc.ca/bath/ • U.S. National Profile • http://www.unt.edu/zprofile/ • Z39.50 Interoperability Testbed • http://www.unt.edu/zinterop/