Increasing Interoperability on Searching Library Collections

Sarantos Kapidakis & Michalis Sfakakis, Laboratory on Digital Libraries and Electronic Publishing, Archive and Library Sciences Department, Ionian University, Corfu, Greece (sarantos@ionio.gr). University of Cyprus.

Presentation Transcript


  1. Increasing Interoperability on Searching Library Collections. Sarantos Kapidakis & Michalis Sfakakis, Laboratory on Digital Libraries and Electronic Publishing, Archive and Library Sciences Department, Ionian University, Corfu, Greece (sarantos@ionio.gr). University of Cyprus, February 16, 2011.

  2. The General Problem. There are many sources of information around. The goal: how can we get the right results from all of them during a search? The challenge: interoperability of heterogeneous, independent sources. The obvious solution: use standards and/or a common approach; create good-practice guides, and use common approaches to semantics and to the configurable or optional parts of the standards.

  3. Application on Library Catalogues. Searching has many applications: the Web, databases, library catalogues, repositories, … Libraries describe different objects: books, journals, CDs, videos, pictures, paintings, … Libraries have worked on the interoperability of their catalogues for many decades. They use standards and common approaches, such as MARC21, UNIMARC, UKMARC, …, and Z39.50 with the Bib-1 profile for metadata.

  4. Meta-searching the Library Community. There is great variety in contents and query systems: library catalogues, bibliographic and full-text databases, repositories, typical search engines, etc. A huge number of the available information sources conform to the Z39.50 protocol. The Z39.50 protocol is a typical case of a query interface with abstract Access Points.

  5. Library Searching Model. Searching is on some fields only: the Access Points • Using the OPAC, in the local system • [MARC example] • Using a Web gateway, mostly through Z39.50 • [Z39.50 - MARC example]

  6. Z39.50. Z39.50 is not a standard for the description or exchange of information. It is a standard for dissemination, and it includes: negotiation of capabilities; agreement on a data profile (e.g. BIB-1); a communication protocol; query types and capabilities; the format of results.

  7. Metadata Profiles in Z39.50. ANSI/NISO Z39.50-1995, Appendix 3, ATR: Attribute Sets (pages 81-83), defines the following: Bib-1 = Z39.50-attributeSet 1; Exp-1 = Z39.50-attributeSet 2; Ext-1 = Z39.50-attributeSet 3; CCL-1 = Z39.50-attributeSet 4; GILS = Z39.50-attributeSet 5; STAS = Z39.50-attributeSet 6.

  8. Bib-1: Z39.50-attributeSet 1. Has attributes in the following categories: Use Attributes (e.g. Personal name); Relation Attributes (e.g. less than); Position Attributes (e.g. first in field); Structure Attributes (e.g. phrase); Truncation Attributes (e.g. Right Truncation); Completeness Attributes (e.g. incomplete subfield).

  9. BIB-1 Use Attributes. Personal name = 1; Corporate name = 2; Conference name = 3; Title = 4; Title series = 5; Title uniform = 6; ISBN = 7; ISSN = 8; …; Thematic-number = 1030; Material-type = 1031; Doc-id = 1032; Host-item = 1033; Content-type = 1034; Anywhere = 1035; Author-Title-Subject = 1036.

  10. Z39.50 Search Model & Primitives. Model: an abstract, record-based view, with no direct access to the underlying data and query methods. Query mechanism: predefined abstract Access Points combined with specific attributes (Attribute Sets), and query languages (query types). General conformance requirements: Attribute Set Bib-1 and query Type-1 must be recognized (not necessarily implemented).

  11. Z39.50 Bib-1 Access Points Semantics. The semantics of the Access Points are defined in the “Attribute Set BIB-1 (Z39.50-1995): Semantics” document, which represents consensus among the members of the Z39.50 Implementors Group (ZIG) and is maintained as an official document of the Z39.50 Maintenance Agency. It defines the semantics of the Access Points using the tag values of representative MARC bibliographic format fields.

  12. Example 1. Query: the proceedings of the IEEE's conferences, and only these; no IEEE technical reports, no records that have IEEE conferences as subject, etc. Z39.50 sources: Copac Academic & National Library Catalogue (UK); Library of Congress (US); University of Crete Library (GR). Best Z39.50 Bib-1 Access Point: Author-name-conference-1006 = {111, 411, 711, 811}, which search environments rarely offer for use.

  13. Consequences of an Unsupported Access Point. Query failures: the Z39.50 source fails the query and returns a diagnostic message (e.g. MELVYL, COPAC). Inconsistent answers: the Z39.50 source arbitrarily substitutes the unsupported Access Point with a supported one (e.g. Library of Congress). Unknown answer derivation: the user is not informed of the substitution of the unsupported Access Point.

  14. How Often Unsupported Access Points Occur. Statistical figures from IndexData for the “Ten most commonly supported Access Points”, based on 2,869 worldwide Z39.50 sources of which 1,821 support the search service, indicate that no single Access Point is universally supported by the sources. The most commonly supported Access Points are: Title, supported by 1,667 (91.54%) sources; Subject, supported by 1,634 (89.73%) sources; Author, supported by 1,629 (89.45%) sources.

  15. Unsupported Access Points in Greece. In a similar study we made of 24 academic Z39.50 sources in Greece, only one Access Point is supported by all sources: Author (use attribute 1003). Subject Heading (use attribute 21) and Title (use attribute 4) are each supported by 23 sources. The situation in Greece seems better than the average one, and the order of the supported Access Points is different.

  16. Common Approaches. Permit queries with only the Access Points common to all sources: this restricts the search capabilities of the sources. Ignore the sources that do not support the Access Point: this restricts the available sources. Leave the source to substitute the unsupported Access Point with a supported one: this results in inconsistent, unpredictable answers.

  17. Searching from the Environment “HEAL Link Search” • Restriction of the Access Points to only the common ones

  18. Searching from the Environment “η Αργώ”

  19. Searching from the Environment “Ζέφυρος”

  20. The Challenge. To substitute the unsupported Access Point with other supported Access Points, so that identical (preferably) or similar (otherwise) semantics are obeyed. A different substitution may have to be made for each source.

  21. Related Work. Information Integration Architectures deal with the problem of query rewriting, based on mapping rules between the global schema and the local schemas of the underlying sources, with no exploitation of the local schema semantics. There is more room for optimization in our specific case.

  22. Access Points Subset Relationship. An Access Point is considered a subset of another one if the set of data fields used to create the first is a subset of the set of data fields used to create the second. An example: Author-name = {100, 110, 111, 400, 410, 411, 700, 710, 711, 800, 810, 811} and Author-name-personal = {100, 400, 700, 800}, so the Access Point Author-name-personal is considered a subset of Author-name. [Figure: the field set of Author-name-personal drawn inside the field set of Author-name.]
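
The subset relationship of slide 22 can be checked directly on the field sets. The following is a minimal sketch; the function name `is_subset_access_point` is hypothetical, and the field tags are the ones given in the slide's example.

```python
# MARC field tags taken from the example on slide 22.
AUTHOR_NAME = frozenset({100, 110, 111, 400, 410, 411, 700, 710, 711, 800, 810, 811})
AUTHOR_NAME_PERSONAL = frozenset({100, 400, 700, 800})


def is_subset_access_point(first, second):
    """Return True if every field used to build `first` is also used to build `second`."""
    return first <= second


print(is_subset_access_point(AUTHOR_NAME_PERSONAL, AUTHOR_NAME))  # True
print(is_subset_access_point(AUTHOR_NAME, AUTHOR_NAME_PERSONAL))  # False
```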

  23. Access Points Semantic Graph Specification & RDF Schema. We represent the relationships between the Access Points with a directed graph G. Vertices represent Access Points and arcs represent subset relationships: <i, j> is an arc of the graph if and only if Access Point i is a subset of Access Point j. The Access Points Author-name and Author-name-personal are represented by two vertices of the graph, and their subset relationship by the arc <Author-name-personal, Author-name>. For the RDFS description: rdfs:Class maps to Access Points (vertices); rdfs:subClassOf maps to Access Point subset relationships.
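
As an illustration of the graph on slide 23, the arcs can be derived from the field sets. This is only a sketch: `FIELDS` and `build_arcs` are hypothetical names, the three field sets come from slides 12 and 22, and the proper-subset test (`<`) is an assumption made here so that two Access Points with identical field sets do not point at each other.

```python
from itertools import permutations

# Field sets from slides 12 and 22; the dictionary itself is illustrative.
FIELDS = {
    "Author-name": {100, 110, 111, 400, 410, 411, 700, 710, 711, 800, 810, 811},
    "Author-name-personal": {100, 400, 700, 800},
    "Author-name-conference": {111, 411, 711, 811},
}


def build_arcs(fields):
    """Return the arcs <i, j> where Access Point i is a proper subset of Access Point j."""
    return {(i, j) for i, j in permutations(fields, 2) if fields[i] < fields[j]}


for arc in sorted(build_arcs(FIELDS)):
    print(arc)
```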

  24. A Sample of the RDFS Graph. [Figure: RDFS graph. Metaschema: rdfs:Class, rdfs:AccessPoint, rdfs:subClassOf, rdf:type. Schema: bib1:Any_1016, bib1:Name_1002, bib1:Author-name_1003, bib1:Name-personal_1, bib1:Name-corporate_2, bib1:Name-conference_3, bib1:Author-name-personal_1004, bib1:Author-name-corporate_1005, bib1:Author-name-conference_1006. MARC field leaves: mrc:f-100, mrc:f-110, mrc:f-111, mrc:f-400, mrc:f-410, mrc:f-411, mrc:f-600, mrc:f-610, mrc:f-611, mrc:f-700, mrc:f-710, mrc:f-711, mrc:f-800, mrc:f-810, mrc:f-811.]

  25. A Representative Sample of the RDFS Graph of the Access Points

  26. The RDFS Graph Including the Supported Access Points from the Library of Congress

  27. The RDFS Graph Including the Supported Access Points from the University of Crete

  28. Access Point Substitution. There are two substitution policies (Broad, Narrow). Each produces the Minimal set (which depends on the substitution policy) by eliminating every Access Point that is an ancestor/descendant of any other selected one. This happens when more than one ancestor/descendant path hierarchy contains a supported Access Point, and the Access Point selected from one path is also a member of another path, at a higher/lower position than the Access Point selected from that path. Finally, either the Boolean AND or the Boolean OR combination of the supported Access Points substitutes the unsupported Access Point.
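
The minimal-set step of slide 28 can be sketched as follows. This is not the zSAPN implementation: the names `PARENTS`, `ancestors`, `descendants` and `substitute` are hypothetical, and the small hierarchy is a simplified assumption chosen to match the Access Points listed on slide 36. The sketch returns only the minimal set and leaves the choice of the Boolean combination to the caller.

```python
# child -> direct parents, i.e. arcs <child, parent> of the subset graph.
# Simplified, assumed hierarchy (not the full Bib-1 RDFS description).
PARENTS = {
    "Author-name-personal": {"Author-name"},
    "Author-name-corporate": {"Author-name"},
    "Author-name-conference": {"Author-name"},
    "Author-name": {"Author-Title-Subject"},
    "Title": {"Author-Title-Subject"},
    "Subject": {"Author-Title-Subject"},
}


def ancestors(ap, parents):
    """All Access Points reachable from `ap` by following subset arcs upwards."""
    seen, stack = set(), list(parents.get(ap, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def descendants(ap, parents):
    """All Access Points that have `ap` among their ancestors."""
    return {node for node in parents if ap in ancestors(node, parents)}


def substitute(requested, supported, parents, policy):
    """Return the minimal set of supported Access Points for the given policy."""
    if policy == "broad":
        related = ancestors
    elif policy == "narrow":
        related = descendants
    else:
        raise ValueError("policy must be 'broad' or 'narrow'")
    candidates = related(requested, parents) & set(supported)
    # Drop a candidate that is an ancestor (broad) or descendant (narrow)
    # of another candidate, keeping only the ones closest to the request.
    return {
        c for c in candidates
        if not any(c in related(o, parents) for o in candidates if o != c)
    }


supported = {"Title", "Subject", "Author-name", "Author-name-personal",
             "Author-name-corporate", "Author-name-conference"}
print(sorted(substitute("Author-Title-Subject", supported, PARENTS, "narrow")))
```

With the supported Access Points of slide 36, the narrow policy reduces the six selected Access Points to Title, Author-name and Subject, which is the Minimal Set shown there.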

  29. Supported Access Point: COPAC Z39.50 Source

  30. Broad Access Point Substitution: Library of Congress Z39.50 Source

  31. Broad Access Point Substitution: University of Crete Z39.50 Source

  32. Comparing the semantics of the results – I. [Figure: diagram of the MARC fields involved: 600, 610, 611, 100, 110, 400, 410, 700, 710, 800, 810, 111, 411, 711, 811.]

  33. Comparing the semantics of the results – II. Library of Congress: the substitution produces results equivalent to those of the requested Access Point; the answer has the same precision as the answer from COPAC, which supports the Access Point. University of Crete: we receive an answer with similar semantics (less precision); the answer excludes records that have the IEEE conferences as subject, but still contains other types of IEEE editions (e.g. standards).

  34. Not Desirable Record

  35. Example 2. Query: all metadata records containing the term "Malinowski" as Author, as Subject, or in the Title. Z39.50 source: Library and Archives Canada. Best Z39.50 Bib-1 Access Point: Author-Title-Subject-1036, which search environments rarely offer for use.

  36. Narrow Access Point Substitution: Library & Archives Canada Z39.50 Source. Selected Access Points • Title • Author-name-corporate • Author-name • Author-name-conference • Author-name-personal • Subject. The Minimal Set • Title • Author-name • Subject

  37. Access Points Semantic Similarity. The semantics of an Access Point are assigned from the parts of the record used to generate it (i.e. its leaf subclasses). An Access Point has semantics equivalent to those of another Access Point, or of the union or intersection of a set of Access Points, if either the sets of their underlying constituent Access Points are equal, or the unions or intersections of those sets produce equal sets. The semantic similarity of an Access Point to others is expressed mainly by its leaf subclasses. Finally, the similarity among the semantics of the Access Points influences the result sets of the queries that use them.
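
The equivalence test of slide 37 amounts to comparing sets of leaf fields. A minimal sketch follows; the function names are hypothetical, and the field set used for Author-name-corporate is an assumption (the slides list the fields of Author-name, Author-name-personal and Author-name-conference only).

```python
AUTHOR_NAME = {100, 110, 111, 400, 410, 411, 700, 710, 711, 800, 810, 811}
PERSONAL = {100, 400, 700, 800}
CONFERENCE = {111, 411, 711, 811}
CORPORATE = {110, 410, 710, 810}  # assumed: the remaining Author-name fields


def combined_fields(field_sets, combine):
    """Union or intersection of the leaf field sets of several Access Points."""
    sets = [set(s) for s in field_sets]
    if not sets:
        raise ValueError("at least one Access Point is required")
    if combine == "union":
        return set().union(*sets)
    if combine == "intersection":
        return set.intersection(*sets)
    raise ValueError("combine must be 'union' or 'intersection'")


def has_equivalent_semantics(requested, field_sets, combine="union"):
    """True if the combined leaf fields equal the leaf fields of the requested Access Point."""
    return set(requested) == combined_fields(field_sets, combine)


print(has_equivalent_semantics(AUTHOR_NAME, [PERSONAL, CORPORATE, CONFERENCE]))  # True
print(has_equivalent_semantics(CONFERENCE, [AUTHOR_NAME]))  # False
```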

  38. Substitution Policies Effects. Broad substitution increases the number of corresponding leaf (MARC) fields: it decreases the precision and does not affect the recall. Narrow substitution decreases the number of corresponding leaf (MARC) fields: it decreases the recall and does not affect the precision.

  39. Similarity Evaluation Measures. Characteristic extract: leaf subclasses (lsc), $lsc(ap, O) = \{\, ap_i \mid ap_i \in C \wedge ap_i \le^{+} ap \wedge \nexists x \in C : x \le ap_i \,\}$. Taxonomic Precision (tp): $tp(ap_s, ap_r, O) = |lsc(ap_s) \cap lsc(ap_r)| \,/\, |lsc(ap_s)|$, the proportion of the fields used in the requested Access Point $ap_r$ (relevant fields) out of the fields used in the Access Point $ap_s$ selected for the substitution (searched fields). Taxonomic Recall (tr): $tr(ap_s, ap_r, O) = |lsc(ap_s) \cap lsc(ap_r)| \,/\, |lsc(ap_r)|$, the proportion of the fields used in the Access Point $ap_s$ selected for the substitution out of the fields used in the requested Access Point $ap_r$.
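
The two measures of slide 39 can be computed directly from the leaf field sets. The sketch below is illustrative only: the function names are hypothetical, and the example reuses the field sets of slide 22, treating Author-name as the selected (broader) Access Point and Author-name-personal as the requested one.

```python
from fractions import Fraction


def taxonomic_precision(lsc_selected, lsc_requested):
    """|lsc(aps) ∩ lsc(apr)| / |lsc(aps)|; raises ZeroDivisionError for an empty selection."""
    return Fraction(len(lsc_selected & lsc_requested), len(lsc_selected))


def taxonomic_recall(lsc_selected, lsc_requested):
    """|lsc(aps) ∩ lsc(apr)| / |lsc(apr)|; raises ZeroDivisionError for an empty request."""
    return Fraction(len(lsc_selected & lsc_requested), len(lsc_requested))


AUTHOR_NAME = {100, 110, 111, 400, 410, 411, 700, 710, 711, 800, 810, 811}
AUTHOR_NAME_PERSONAL = {100, 400, 700, 800}

# Broad substitution: the requested Author-name-personal is replaced by Author-name.
print(taxonomic_precision(AUTHOR_NAME, AUTHOR_NAME_PERSONAL))  # 1/3
print(taxonomic_recall(AUTHOR_NAME, AUTHOR_NAME_PERSONAL))     # 1
```

The values agree with slide 38: the broad substitution lowers the precision (4 of the 12 searched fields are relevant) and leaves the recall at 1.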

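  40. Similarity Evaluation Measures. Broad Substitution: $lsc(ap_r) \subseteq lsc(ap_s)$, so $tp(ap_s, ap_r, O) = |lsc(ap_r)| \,/\, |lsc(ap_s)|$ (simplified form) and, for each selected Access Point $ap_{s_i}$, $tp(ap_{s_i}, ap_r, O) = |lsc(ap_r)| \,/\, |lsc(ap_{s_i})|$. Narrow Substitution: $lsc(ap_s) \subseteq lsc(ap_r)$, so $tr(ap_s, ap_r, O) = |lsc(ap_s)| \,/\, |lsc(ap_r)|$ (simplified form) and, for each selected Access Point $ap_{s_i}$, $tr(ap_{s_i}, ap_r, O) = |lsc(ap_{s_i})| \,/\, |lsc(ap_r)|$.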
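
The simplified forms on slide 40 follow from the general definitions of tp and tr on slide 39 once the subset condition of each policy is applied. A short derivation, using the slides' symbols and assuming nothing beyond those definitions:

```latex
\begin{align*}
\text{Broad: } lsc(ap_r) \subseteq lsc(ap_s)
  &\;\Rightarrow\; lsc(ap_s) \cap lsc(ap_r) = lsc(ap_r) \\
  &\;\Rightarrow\; tp(ap_s, ap_r, O) = \frac{|lsc(ap_r)|}{|lsc(ap_s)|} \le 1,
  \qquad tr(ap_s, ap_r, O) = \frac{|lsc(ap_r)|}{|lsc(ap_r)|} = 1 \\[4pt]
\text{Narrow: } lsc(ap_s) \subseteq lsc(ap_r)
  &\;\Rightarrow\; lsc(ap_s) \cap lsc(ap_r) = lsc(ap_s) \\
  &\;\Rightarrow\; tr(ap_s, ap_r, O) = \frac{|lsc(ap_s)|}{|lsc(ap_r)|} \le 1,
  \qquad tp(ap_s, ap_r, O) = \frac{|lsc(ap_s)|}{|lsc(ap_s)|} = 1
\end{align*}
```

This is the formal counterpart of slide 38: a broad substitution can only lower the precision, and a narrow substitution can only lower the recall.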

  41. tp, tr, Values Sample

  42. z39.50 Semantic Access Point Network System Architecture & Substitution Process. [Figure: architecture diagram. Labels: Z39.50 query; Bib-1 source configuration (Source 1 … Source n); Access Point Substitution Module; ICS-FORTH RDFSuite (RQL / RSSDB) with the Bib-1 RDFS; query for source 1 … query for source n; Z39.50 module / PHP YAZ; z-request source 1 … z-request source n.]

  43. z39.50 Semantic Access Point Network: A System for Semantic-Based Access Point Substitution. zSAPN attacks the problem of unsupported Access Points in the context of Z39.50 and for the Bib-1 attribute set. It substitutes the unsupported Access Point with the union or the intersection of other, supported ones. The substitution exploits the semantics of the Access Points from an RDFS description, and broadens or narrows the semantics of the unsupported Access Point according to the user's preferences. zSAPN is available as a free service at: http://dlib.ionio.gr/zSAPN
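
To make the union or intersection of substituted Access Points concrete, the sketch below builds a query string in the Prefix Query Format (PQF) used by the YAZ toolkit, where `@attr 1=N` selects a Bib-1 Use attribute and `@or` / `@and` are binary prefix operators. The helper names are hypothetical, the slides do not show how zSAPN itself builds its queries, and the mapping of "Subject" to use attribute 21 and the choice of `@or` for the Example 2 minimal set are assumptions. The term is quoted without any escaping.

```python
def pqf_term(use_attribute, term):
    """One PQF operand searching `term` under a Bib-1 Use attribute."""
    return f'@attr 1={use_attribute} "{term}"'


def pqf_combine(operator, use_attributes, term):
    """Combine the same term over several Use attributes with nested @and / @or."""
    if operator not in ("and", "or"):
        raise ValueError("operator must be 'and' or 'or'")
    parts = [pqf_term(u, term) for u in use_attributes]
    if not parts:
        raise ValueError("at least one Use attribute is required")
    query = parts[-1]
    # PQF operators are binary, so nest them from right to left.
    for part in reversed(parts[:-1]):
        query = f"@{operator} {part} {query}"
    return query


# Minimal set of Example 2: Title (4), Author-name (1003), Subject (21, assumed).
print(pqf_combine("or", [4, 1003, 21], "Malinowski"))
```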

  44. Using zSAPN – Starting a Session

  45. Using zSAPN – Requesting a Broad Substitution

  46. Results from zSAPN – Broad Substitution

  47. Results from zSAPN – No Substitution

  48. Query Results Comparison • Query: Author-name-conference_1006 = IEEE • Narrow substitution is not feasible
      Z39.50 source        | No Substitution                  | Broad Substitution
      COPAC                | 2799                             | 2799
      Library of Congress  | 8312                             | 1790 (equivalent semantics)
      University of Crete  | Error: Unsupported Use attribute | 349 (similar semantics, less precision)

  49. Conclusions - I. Semantics-based substitutions could really mitigate the effects of unsupported Access Points when meta-searching metadata repositories behind query interfaces. zSAPN, currently in the Z39.50 context, improves search consistency and eliminates query failures by exploiting the semantic information of the Access Points from an RDFS description. zSAPN substitutes the unsupported Access Point with a set of others whose proper combination either broadens or narrows the semantics of the unsupported Access Point, and it evaluates the resulting change in the precision or the recall, respectively, for the original query.

  50. Conclusions - II. The proposed substitution policies enable any mediator to decide how to modify, if necessary, the semantics of an unsupported query prior to initiating the search requests. A source using the underlying zSAPN methodology could expand its functionality instead of making arbitrary or general substitutions. The RDFS description of the Bib-1 Access Points could be a basis for deploying the library community's primitive search semantics on the Semantic Web. zSAPN is a free service at the Laboratory on Digital Libraries and Electronic Publishing of the Archive and Library Sciences Department of the Ionian University: http://dlib.ionio.gr/zSAPN
