Terminology mapping for subject cross-browsing in distributed information environments. Libo Si PhD student in the Department of Information Science, Loughborough University. Background. Users have to face different information resources using different schemes.
Terminology mapping for subject cross-browsing in distributed information environments Libo Si PhD student in the Department of Information Science, Loughborough University
Background • Users have to face different information resources using different schemes. • Library portal systems, such as MetaLib and Sirsi Rooms, provide a single access point to these resources.
Background • Keyword cross-searching • Mapping different metadata schemes. • Make them interoperable. • Subject cross-browsing • Integrate different KOSs together into a hierarchical tree. • Issues • Interoperability between different knowledge organisation systems • Interoperability between metadata standards
My Research • Aim • To develop methods to facilitate both subject cross-browsing and cross-searching for library portal systems. • Objectives • To investigate different methods to develop a cross-search service in a library portal product; • To investigate different methods to make different metadata standards interoperable; • To investigate different methods to make different knowledge organisation systems interoperable; • To identify trends in establishing ontologies that facilitate both cross-searching and cross-browsing by subject in the development of library portal systems.
Methodology • Case study: HILT, Renardus, MetaNet, ABC Ontology, OpenCyc Ontology, ePrint UK, and UMLS. • Investigate the different methods used by these projects to facilitate subject cross-browsing and cross-searching services.
Methods to cross-search (1) Federated Search (Sadeh 2006)
Methods to cross-search (1) “A cross-search service can create and maintain their own repository of resource metadata” (Sadeh 2004). • Issues: • Loss of data value • Cannot capture rich knowledge organisation systems used by different online databases due to the lack of methods to reuse different metadata schemes and controlled vocabularies (Hughs and Kamat 2005).
Methods to cross-search (2) • An alternative is … • In the semantic web community, the construction of ontologies to maximise the use of both subject classification systems and metadata schemes across different collections is possible. • Each participating resource provider can offer metadata and classification systems to any cross-search service.
Mapping semantics of different metadata standards • Derivation; • Application profile; • Crosswalk (one-to-one, and switch); • Metadata registry; • Data reuse and integration (RDF); • Aggregation. - Chan and Zeng (2006)
Derivation • One metadata scheme can be developed based on the principle and structure of an existing one (Chan and Zeng 2006a). • Ex.: TEI Lite is derived from the full Text Encoding Initiative (TEI).
Application profile • An application profile can be defined by combining a selected range of metadata elements from different metadata schemes for some application-specific purpose (Heery and Patel 2004).
Project using Application Profile • Five namespaces used by Renardus application profile http://renardus.sub.unigoettingen.de/renap/renap.html • Renardus Metadata Element Set (rmes), • Renardus Metadata Element Set Qualifiers (rmesq), • Dublin Core Metadata Element Set, version 1.1 (dc 1.1), • Dublin Core Metadata Element Set Qualifiers (dcterms), • DCMI Type Vocabulary (dcmitype).
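As a rough sketch of the idea, an application profile can be modelled as a validated selection of elements drawn from several namespaces. The prefixes below follow the Renardus slide, but the element sets and the `build_profile` helper are illustrative assumptions, not the actual Renardus profile.

```python
# Hypothetical element sets keyed by namespace prefix. The prefixes come from
# the slide above; the element names listed here are illustrative only.
SCHEMES = {
    "dc": {"title", "creator", "subject", "identifier"},
    "dcterms": {"modified", "audience"},
    "rmes": {"exampleElement"},  # placeholder, not a real Renardus element
}


def build_profile(selection):
    """Return qualified element names for a profile.

    Each (prefix, element) pair is checked against the scheme it is
    drawn from, so a profile can only reuse elements that already exist.
    """
    profile = []
    for prefix, element in selection:
        if element not in SCHEMES.get(prefix, set()):
            raise ValueError(f"{prefix}:{element} is not defined in its scheme")
        profile.append(f"{prefix}:{element}")
    return profile


profile = build_profile(
    [("dc", "title"), ("dc", "subject"), ("dcterms", "modified")]
)
print(profile)
```

The point of the validation step is that a profile selects and combines existing elements rather than inventing new ones.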
Crosswalk • “A crosswalk is a specification for mapping one metadata standard to another” (St. Pierre and LaPlant 1998). • One-to-one • Many-to-many (switch scheme)
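The two crosswalk styles can be sketched as follows, assuming each scheme is just a flat set of field names. The field pairs are simplified illustrations (the "lom" field names in particular are hypothetical), and `crosswalk` and `via_switch` are names introduced only for this example.

```python
# Mappings from each local scheme into a shared switch scheme (here, simple
# Dublin Core-like element names). All pairs are simplified illustrations.
TO_SWITCH = {
    "marc": {"245": "title", "100": "creator", "650": "subject"},
    "lom": {  # hypothetical flattened field names
        "general.title": "title",
        "lifecycle.contribute": "creator",
        "general.keyword": "subject",
    },
}


def crosswalk(record, mapping):
    """One-to-one crosswalk: rename fields; unmapped fields are dropped."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def via_switch(record, source, target):
    """Switch crosswalk: source -> switch scheme -> target."""
    to_hub = TO_SWITCH[source]
    # Invert the target's mapping so we can go from the hub back out.
    from_hub = {hub: local for local, hub in TO_SWITCH[target].items()}
    return crosswalk(crosswalk(record, to_hub), from_hub)


marc_record = {
    "245": "Patrologia Latina Database",
    "100": "Jacques Paul Migne",
    "999": "local note",  # has no equivalent, so it is lost in the crosswalk
}
dc_record = crosswalk(marc_record, TO_SWITCH["marc"])
lom_record = via_switch(marc_record, "marc", "lom")
print(dc_record)
print(lom_record)
```

With a switch scheme, each new scheme needs only one mapping to the hub instead of one mapping per pair of schemes; the dropped "999" field shows how data value can be lost along the way.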
Metadata scheme registry • A metadata registry refers to an application that provides services based on information about 'metadata terms' and about related resources (Johnston 2005). • Ex: the CORES registry lists more than 40 metadata schemes, and supports searching and browsing by metadata scheme developer, maintenance agency, element sets, elements, encoding schemes, application profiles and element usages. • (http://www.cores-eu.net/registry/)
Data reuse and integration • This refers to describing information objects by using different elements from different metadata schemes or application profiles (Chan and Zeng 2006b). • The Resource Description Framework (RDF) provides a basic platform for integrating different metadata schemes to describe web resources (Heery and Patel 2004). • RDF can facilitate the use of different application profiles.
An RDF example
<?xml version="1.0" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.0/"
         xmlns:bc="http://www.schemas-forum.org/registry/schemas/BIBLINK/1.0/bc-ap#">
  <rdf:Description about="urn:isbn:0-89887-113-1">
    <dc:title>Patrologia Latina Database</dc:title>
    <dc:creator>Jacques Paul Migne</dc:creator>
    <dc:date>1993</dc:date>
    <dc:language>la</dc:language>
    <bc:extent>2 computer laser optical disks; 4 3/4 in</bc:extent>
    <bc:systemRequirements>Multimedia PC 486x or higher, 8mb memory, CD-ROM drive, sound card, SVGA 256-colour monitor, Windows 95 or Windows 3.1</bc:systemRequirements>
    <dc:subject rdf:value="Christian literature, Early" bc:subjectScheme="LCSH" />
    <dc:identifier rdf:value="isbn:0-89887-113-1" bc:identifierScheme="URN" />
    <bc:placePublication>Cambridge</bc:placePublication>
    <dc:publisher>Chadwyck-Healey</dc:publisher>
  </rdf:Description>
</rdf:RDF>
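To show how elements from two schemes sit side by side in one description, the sketch below groups the elements of a shortened copy of the record by namespace. It treats the RDF/XML as plain XML using Python's standard library, which is a simplifying assumption; a real application would normally use an RDF parser. The helper name `elements_by_namespace` is introduced only for this example.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the record above (three elements only).
RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.0/"
         xmlns:bc="http://www.schemas-forum.org/registry/schemas/BIBLINK/1.0/bc-ap#">
  <rdf:Description about="urn:isbn:0-89887-113-1">
    <dc:title>Patrologia Latina Database</dc:title>
    <dc:publisher>Chadwyck-Healey</dc:publisher>
    <bc:placePublication>Cambridge</bc:placePublication>
  </rdf:Description>
</rdf:RDF>"""

DC = "http://purl.org/dc/elements/1.0/"
BC = "http://www.schemas-forum.org/registry/schemas/BIBLINK/1.0/bc-ap#"


def elements_by_namespace(xml_text):
    """Group the child elements of each description by namespace URI."""
    root = ET.fromstring(xml_text)
    grouped = {}
    for description in root:
        for child in description:
            # ElementTree tags look like "{namespace}localname".
            namespace, _, local = child.tag[1:].partition("}")
            grouped.setdefault(namespace, {})[local] = child.text
    return grouped


grouped = elements_by_namespace(RDF_XML)
print(grouped[DC])
print(grouped[BC])
```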
Aggregation • This refers to: • Employing a central knowledge base to gather metadata records from different online databases using different metadata standards • Converting heterogeneous metadata records into a consistent form • Developing a range of enhancement services to enrich the metadata records gathered.
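The three aggregation steps above can be sketched in miniature as gather, convert, and enrich. The scheme names, field maps, and the toy enrichment (a normalised subject key) are all assumptions made for illustration; they do not describe how ePrint UK or any other project is implemented.

```python
# Hypothetical field maps from two source schemes into one consistent form.
FIELD_MAPS = {
    "schemeA": {"ti": "title", "au": "creator", "su": "subject"},
    "schemeB": {"Title": "title", "Author": "creator", "Topic": "subject"},
}


def normalise(record, scheme):
    """Convert a record from its source scheme into the common form."""
    mapping = FIELD_MAPS[scheme]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def enrich(record):
    """Toy enhancement service: add a lowercase, trimmed subject key."""
    enriched = dict(record)
    enriched["subject_key"] = record.get("subject", "").strip().lower()
    return enriched


def aggregate(sources):
    """Gather records from every source into one central store."""
    store = []
    for scheme, records in sources.items():
        for record in records:
            store.append(enrich(normalise(record, scheme)))
    return store


store = aggregate({
    "schemeA": [{"ti": "Metadata basics", "au": "A. Author", "su": " Metadata "}],
    "schemeB": [{"Title": "Portals", "Author": "B. Author", "Topic": "Libraries"}],
})
print(store)
```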
Project using Aggregation - ePrint UK (Powell 2001)
Mapping semantics of different KOSs • Derivation • Direct mapping • Switch language • Merging • Co-occurrence mapping • Satellite and leaf node linking
Derivation • A subject-specific vocabulary is developed based on some widely-used general vocabularies. • Ex: MeSH was developed based on the structure of LCSH.
Direct mapping (Zeng and Chan 2004)
Switch language (Mai 2003)
Projects using a switch language • The HILT Project • Uses DDC as a switch language to navigate users to find relevant information. • The Renardus Project.
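A switch language can be sketched as a lookup in which every local vocabulary maps its terms to a DDC number, and equivalents are found by meeting at that number. The term-to-DDC pairs and the "LocalThesaurus" vocabulary below are illustrative assumptions, not data from HILT or Renardus.

```python
# Each vocabulary maps its own terms to a DDC class number (the switch
# language). The pairs are illustrative, not taken from any project.
TERM_TO_DDC = {
    "LCSH": {"Medicine": "610", "Agriculture": "630"},
    "MeSH": {"Medicine": "610", "Agriculture": "630"},
    "LocalThesaurus": {  # hypothetical vocabulary
        "Medical sciences": "610",
        "Farming": "630",
    },
}


def equivalents(term, source):
    """Find terms in the other vocabularies that share the term's DDC class."""
    ddc = TERM_TO_DDC[source].get(term)
    if ddc is None:
        return {}
    return {
        scheme: sorted(t for t, number in terms.items() if number == ddc)
        for scheme, terms in TERM_TO_DDC.items()
        if scheme != source
    }


result = equivalents("Medicine", "LCSH")
print(result)
```

No vocabulary is mapped directly to another here; each one only needs a mapping to DDC, which is what makes the switch approach scale to many vocabularies.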
Co-occurrence mapping (Zeng and Chan 2004)
Merging • Different vocabularies in the same domain can be merged into a super-thesaurus. • Ex: The Unified Medical Language System (UMLS) merges concepts from about fifty medical controlled vocabularies into a metathesaurus.
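A very small sketch of merging, assuming that terms from different vocabularies denote the same concept when they match after case and whitespace normalisation. This matching rule is a deliberate simplification for illustration and is not a description of how the UMLS Metathesaurus is built; the vocabulary names are hypothetical.

```python
from collections import defaultdict


def merge_vocabularies(vocabularies):
    """Group terms from several vocabularies under shared concept keys.

    Assumption: two terms are the same concept if they are equal after
    lowercasing and collapsing whitespace.
    """
    concepts = defaultdict(list)
    for source, terms in vocabularies.items():
        for term in terms:
            key = " ".join(term.lower().split())
            concepts[key].append((source, term))
    return dict(concepts)


merged = merge_vocabularies({
    "VocabA": ["Heart Attack", "Asthma"],      # hypothetical vocabularies
    "VocabB": ["heart  attack", "Diabetes"],
})
print(merged)
```

Each concept key keeps track of which source vocabulary contributed which term, so the original vocabularies remain recoverable from the merged structure.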
Satellite and leaf node linking • Editors can select and adapt parts of a general vocabulary as a subject-specific vocabulary for some particular requirements. • Ex: A number of domain-specific controlled vocabularies have been developed by selecting parts of LCSH.
Ontology mapping for subject cross-search and browsing • Current efforts within the digital library community include developing ways to map different metadata schemes, and ways to map different knowledge organisation systems. • In the semantic web community, the ways to improve semantic interoperability include the construction of ontology and ontology mapping. • There is much in common between the methods used by these two communities.
What is an ontology? • Definition: An ontology is a formal (explicit) specification of a conceptualization shared by a community of people (Studer 1998). • The difference between an ontology and other knowledge organisation systems.
Types of ontologies in digital libraries • Upper level ontology • Domain ontology.
Upper level ontology • Refers to a common vocabulary that includes basic concepts, such as things, space, events, time, and behaviour, and the relations between them (Gomez-Perez and Benjamins 1999; Ding and Foo 2004a). • Ex: OpenCyc, WordNet, and ABC ontology.
ABC Ontology • “It provides the notional basis for developing domain, role, or community specific ontologies, and it incorporates a number of basic entities and relationships common across other metadata ontologies including time and object modification, agency, places, concepts, and tangible objects. Communities wishing to build their own metadata ontologies and models may then extend the ABC entities and relationships as needed” (Lagoze and Hunter 2001). • ABC Ontology is designed to incorporate basic entities and relationships common across different metadata standards, and provide a basis to create metadata ontologies, into which different metadata schemes can be mapped.
OpenCyc Ontology • This is a universal ontology, in which "every concept one can imagine can be correctly linked into the OpenCyc Ontology in appropriate places, no matter how general or specific, no matter how arcane or prosaic, no matter what the context (nationality, age, native language, epoch, childhood experiences, current goals, etc.) of the imaginer" (Stubkjar 2001). • It provides a framework for further establishing custom and domain-specific ontologies.
WordNet Ontology • This is a “manually constructed online lexical reference system” (Noy and Hafner 1997). In WordNet, different lexical objects are organised systematically with the basic distinction between nouns, verbs, adjectives, and adverbs. Nouns are grouped by different concepts, and different concepts are organised hierarchically. In WordNet, a verb is related to a concept’s function, and an adjective is related to a concept’s property. • The WordNet ontology is often applied to offer a taxonomic tree, and also support natural language processing.
Domain ontology • A domain-specific vocabulary that encompasses the concepts in a given domain (such as medicine, agriculture, computer science, etc.) and their relationships (Gomez-Perez and Benjamins 1999; Uschold and Gruninger 1996; Guarino 1997). • In some cases, traditional KOSs can potentially be integrated to form the basis of a domain ontology.
Use of ontologies • MetaNet: • Different metadata elements from different metadata schemes have been mapped to ABC ontology. • Mappings between E-learning object metadata and OpenCyc ontology • Mappings between MeSH and OpenCyc ontology • Mappings between different subject classification systems and OpenCyc
An Ontology Library System • “An ontology library system is a library system that offers various functions for managing, adapting and standardizing groups of different ontologies” (Ding and Fensel 2001). • To support searching and browsing different ontologies.
Conclusion (1) • A library portal system should be able to maximise the reuse of existing library resources, such as metadata schemes and knowledge organisation systems. • To improve semantic interoperability, each resource provider is expected to publish its metadata schemes and knowledge organisation systems in a semantic-web-enabled format, so that these resources can be reused. • Example formats: RDF, XML
Conclusion (2) • In order to facilitate cross-searching: • Develop or apply a common metadata scheme, into which different metadata elements from different metadata schemes can be mapped. • Different metadata schemes can also be mapped into an upper level ontology. • These two ways can be developed together.
Conclusion (3) • To facilitate cross-browsing by subject: • Different knowledge organisation systems can be mapped to DDC, which then serves as a subject navigation tree. • In order to support more powerful computational semantics, all concepts, intra-relationships, and inter-relationships in different knowledge organisation systems can be mapped into an upper level ontology.
Conclusion (4) • A variety of mappings have been developed. • Each type of mapping is designed to offer specific capabilities to improve semantic interoperability, and supports only limited search or browsing functions. • A combination of the different types of mapping is required.
Thank you and questions! Libo Si l.si@lboro.ac.uk