Conceptual foundations for semantic mapping and semantic search. Dagobert Soergel Department of Library and Information Studies, University at Buffalo. Cologne Conference on Interoperability and Semantics in Knowledge Organization Cologne University of Applied Sciences
Conceptual foundations for semantic mapping and semantic search. Dagobert Soergel, Department of Library and Information Studies, University at Buffalo. Cologne Conference on Interoperability and Semantics in Knowledge Organization, Cologne University of Applied Sciences, Institute of Information Management (IIM), July 19, 2010
Mapping through a Hub
Dewey:
• 387 Water, air, space transportation
• 386 Inland waterway & ferry transportation
• 387.5 Ocean transportation
• 386.8 Inland waterway tr. > Ports
• 387.1 Ports
Hub:
• Water transport
• Inland water transport
• Ocean transport
• Traffic station ⊓ Water transport
• Traffic station ⊓ Inland water tr.
• Traffic station ⊓ Ocean transport
LCSH:
• Shipping
• Inland water transport
• Merchant marine
• Harbors
German:
• Hafen
Outline • Objective: Interoperability Plus • KOS concept hub: canonical expressions • Examples: Knowledge base and applications • Implementation: canonical expressions local, hub global; knowledge-based, computer-assisted creation of canonical expressions to represent concepts; crowdsourcing • Cross-language mapping and shades of meaning • Conclusion
Objective: Improve semantics-based search across multiple collections in multiple languages. • Interoperability between any two participating KOS (Knowledge Organization Systems) • Support for search, esp. facet-based search: for any collection indexed by a participating KOS; for search based on free text or free-form social tagging • Assistance in cataloging (metadata creation) by catalogers or users (social tagging) • Long-range goal: Web service where a KOS can be uploaded and mappings to specified target KOS are returned
KOS Concept Hub • Interoperability is achieved by representing concepts from all participating KOS through canonical expressions, such as a description logic formula using atomic concepts and relationships • The backbone of the proposed system is an extensible faceted core classification of atomic concepts together with a set of relationships • Mapping from KOS to KOS is achieved by reasoning over these canonical expressions
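As a minimal sketch of the hub idea, assume every canonical expression is a plain conjunction of atomic concepts, so that it can be stored as a set. The pairing of Dewey and LCSH terms with hub expressions below is read off the "Mapping through a Hub" slide and is an assumption, as are the names `HUB` and `map_term`. A real hub would reason with a description logic reasoner rather than with set equality.

```python
# Sketch only: canonical expressions as frozensets of atomic concepts.
# The term-to-expression assignments are assumed for illustration.
HUB = {
    "Dewey": {
        "386 Inland waterway & ferry transportation": frozenset({"Inland water transport"}),
        "387.5 Ocean transportation": frozenset({"Ocean transport"}),
        "387.1 Ports": frozenset({"Traffic station", "Water transport"}),
    },
    "LCSH": {
        "Inland water transport": frozenset({"Inland water transport"}),
        "Merchant marine": frozenset({"Ocean transport"}),
        "Harbors": frozenset({"Traffic station", "Water transport"}),
    },
}


def map_term(term, source, target, hub=HUB):
    """Return the target-KOS terms whose canonical expression equals the source term's."""
    expression = hub[source][term]
    return [t for t, e in hub[target].items() if e == expression]
```

Under these assumed assignments, `map_term("387.1 Ports", "Dewey", "LCSH")` looks up the expression for the Dewey class and returns the LCSH headings that share it. Neither KOS needs to know the other's structure; both only refer to the hub's atomic concepts.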
Mapping through a Hub
Dewey:
• 387 Water, air, space transportation
• 386 Inland waterway & ferry transportation
• 387.5 Ocean transportation
• 386.8 Inland waterway tr. > Ports
• 387.1 Ports
Hub:
• Traffic station
• Vehicle parking
• Terminal facilities
• Water transport
• Inland water transport
• Ocean transport
• Traffic station ⊓ Water transport
• By type of water transport: Traffic station ⊓ Inland water tr.; Traffic station ⊓ Ocean transport
• By component of traffic station: Vehicle parking ⊓ Water transport; Terminal facilities ⊓ Water transport
LCSH/AAT:
• Shipping
• water transport
• Inland water transport
• Merchant marine
• Harbors
• ports
• harbors
Examples from the Library of Congress Classification and the LC Subject Headings
Mapping through a Hub
LCC:
• TL681.S6 Airplanes. Soundproofing
• VM367.S6 Submarines. Soundproofing
Hub:
• L17 Vehicles ⊓ L33 Air transport ⊓ R37 Soundproofing
• L17 Vehicles ⊓ L37 Water transport ⊓ R37 Soundproofing
• L17 Vehicles ⊓ L37 Water transport ⊓ R37 Soundproofing ⊓ T73 Military ⊓ Underwater
LCSH:
• Aeroplanes-Soundproofing
• Ships-Soundproofing
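When no target term has exactly the same canonical expression, the hub can still return the closest broader term. The sketch below assumes conjunction-only expressions, so that "broader" reduces to a proper-subset test. Which hub expression belongs to which LCC class or LCSH heading is inferred from the slide layout, and `best_matches` is a hypothetical helper name.

```python
# Sketch only: a target term is broader than the source when its set of
# atomic concepts is a proper subset of the source's set.
LCC = {
    "TL681.S6 Airplanes. Soundproofing":
        frozenset({"L17 Vehicles", "L33 Air transport", "R37 Soundproofing"}),
    "VM367.S6 Submarines. Soundproofing":
        frozenset({"L17 Vehicles", "L37 Water transport", "R37 Soundproofing",
                   "T73 Military", "Underwater"}),
}
LCSH = {
    "Aeroplanes-Soundproofing":
        frozenset({"L17 Vehicles", "L33 Air transport", "R37 Soundproofing"}),
    "Ships-Soundproofing":
        frozenset({"L17 Vehicles", "L37 Water transport", "R37 Soundproofing"}),
}


def best_matches(expression, target):
    """Return (relation, terms): exact matches, else the most specific broader terms."""
    exact = [t for t, e in target.items() if e == expression]
    if exact:
        return "exact", exact
    broader = {t: e for t, e in target.items() if e < expression}
    # Keep a broader term only if no other candidate is more specific than it.
    most_specific = [t for t, e in broader.items()
                     if not any(e < other for other in broader.values())]
    return ("broader", most_specific) if most_specific else ("none", [])
```

With these assumed assignments the airplane class maps exactly, while the submarine class, which carries the extra concepts Military and Underwater, falls back to the broader ship heading.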
Mapping user queries
User query:
• Free text
• Combination of elemental concepts through facets (guided query formulation)
• Controlled term(s) from a KOS, possibly found through browsing a KOS
Hub:
• Canonical form of query (DL formula)
Final query:
• (Enriched) free text query
• Query in terms of a KOS
Examples from NALT, LCSH, DDC, and SWD • NALT National Agricultural Library Thesaurus • LCSH Library of Congress Subject Headings • DDC Dewey Decimal Classification • SWD Schlagwortnormdatei
Mapping through a Hub
LCSH:
• Air – Pollution
• Laws and regulations
• Air – Pollution – Laws and regulations
Hub:
• [isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable
• [isa] Legal rule
• [isa] Legal rule [appliedTo] {[isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable}
NALT:
• Air pollution
• Laws and regulations
• Air pollution AND Laws and regulations
Mapping through a Hub
DDC:
• 363.739 2 Air pollution
• 340 Law
• 344.046 342 Air pollution [Law]
• 363.739 26 Air pollution rights
Hub:
• [isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable
• [isa] Legal rule
• [isa] Legal rule [appliedTo] {[isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable}
• [isa] International treaty [appliedTo] {[isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable}
• [isa] Rights [appliedTo] {[isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable}
SWD:
• Luftverschmutzung
• Gesetz
• ???
• Übereinkommen über weiträumige grenzüberschreitende Luftverschmutzung
• Umweltzertifikat
Soil moisture vs. Soil water
LCSH term: Soil moisture = [isa] Water [containedIn] Soil
NALT term: Soil water = [isa] Water [containedIn] Soil
Mapping LCSH ▬► NALT: Soil moisture ▬► Soil water
Greenhouse gardening
LCSH term: Greenhouse gardening = [isa] Gardening [inEnvironment] Greenhouse [inEnvironment] Home
NALT terms:
• Home gardening = [isa] Gardening [inEnvironment] Home
• Greenhouse = [isa] Greenhouse
Mapping LCSH ▬► NALT: Greenhouse gardening ▬► Home gardening AND Greenhouse
Salad greens
LCSH term: Salad greens = [isa] Green leafy vegetable [usedFor] Salad
NALT term: Green leafy vegetables = [isa] Green leafy vegetable
Mapping LCSH ▬► NALT: Salad greens ▬► BT Green leafy vegetables
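The three LCSH-to-NALT examples (soil moisture, greenhouse gardening, salad greens) show three outcomes: an exact match, a combination of target terms joined by AND, and a broader term (BT). A minimal sketch of how one routine could produce all three follows. It makes a strong simplifying assumption: each term is reduced to the set of atomic concepts that fill its roles, and the role names themselves are ignored. The function name `map_concepts` is hypothetical.

```python
# Sketch only: role names such as [isa] or [inEnvironment] are ignored here;
# a term is represented just by the atomic concepts that fill its roles.
LCSH = {
    "Soil moisture": {"Water", "Soil"},
    "Greenhouse gardening": {"Gardening", "Greenhouse", "Home"},
    "Salad greens": {"Green leafy vegetable", "Salad"},
}
NALT = {
    "Soil water": {"Water", "Soil"},
    "Home gardening": {"Gardening", "Home"},
    "Greenhouse": {"Greenhouse"},
    "Green leafy vegetables": {"Green leafy vegetable"},
}


def map_concepts(source, target):
    """Map a concept set to (relation, sorted target terms).

    relation is 'exact', 'AND' (the target terms together cover the source),
    'BT' (the target terms cover only part of it), or 'none'.
    """
    exact = sorted(t for t, c in target.items() if c == source)
    if exact:
        return "exact", exact
    parts = {t: c for t, c in target.items() if c < source}
    # Drop partial matches that are subsumed by a more specific partial match.
    parts = {t: c for t, c in parts.items()
             if not any(c < other for other in parts.values())}
    if not parts:
        return "none", []
    covered = set().union(*parts.values())
    return ("AND" if covered == source else "BT"), sorted(parts)
```

Ignoring roles is what lets the NALT term Greenhouse, defined with `[isa]`, match the `[inEnvironment] Greenhouse` component of the LCSH term. A role-aware reasoner would need an explicit rule for that step.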
Emerging diseases
LCSH term: Emerging infectious diseases = [isa] Disease [hasProperty] Infectious [hasProperty] Emerging
NALT term: Emerging diseases = [isa] Disease [hasProperty] Infectious ??? [hasProperty] Emerging
Mapping LCSH ▬► NALT ???
• Emerging infectious diseases ▬► Emerging diseases
• Emerging infectious diseases ▬► BT Emerging diseases
Mapping through a Hub
DDC:
• 331.4 Women workers
Hub:
• [isa] Worker [hasGender] Female
• [isa] Worker [hasGender] Female [hasStatus] Employee
• [isa] Worker [hasGender] Female [hasStatus] Employee [hasPayStatus] HourlyPay
• [isa] Worker [hasGender] Female [hasStatus] Employee [hasPayStatus] HourlyPay [hasQualification] Unskilled
• [isa] Worker [hasGender] Female [hasStatus] Employee [hasPayStatus] HourlyPay [hasQualification] Skilled
• [isa] Worker [hasGender] Female [hasStatus] Employee [hasPayStatus] Salaried
• [isa] Work BeingDone [executedBy] {Worker [hasGender] Female}
SWD:
• Arbeitnehmerin
• Arbeiterin
• Ungelernte Arbeiterin
• Hilfsarbeiterin
• Facharbeiterin
• Angestellte
• Frauenarbeit
Knowledge base for query formulation
• Physician = [isa] Worker [profLevel] Doctoral [domain] Medicine
• Oncologist = [isa] Worker [profLevel] Doctoral [domain] Oncology
• Ophthalmologist = [isa] Worker [profLevel] Doctoral [domain] Ophthalmology
• Physician ST Doctor
• Ophthalmologist ST Eye doctor
• Medicine BT Health care
• [isa] Worker [profLevel] Doctoral BT Professional
• Income ST Earnings
• Income NT Compensation
• Compensation ET Pay
• Compensation NT Wages
• Fee schedule [usedBy] {Insurance company [domain] Health care} <influences> Compensation [receivedBy] Physician
Mapping user queries
User query: Doctor's pay
Hub: Compensation [receivedBy] Physician
Final query: (Enriched) free text query, see below
[(Physician OR Doctor OR Oncologist OR Ophthalmologist OR (Professional AND (Medicine OR "Health care" OR Oncology OR Ophthalmology))) AND (Pay OR Earnings OR Compensation OR Wages OR Income)] OR [("fee schedule" OR fee) AND ("health insurance" OR "Blue Cross" OR Medicare OR Medicaid)]
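The enrichment step can be illustrated with a small sketch that expands each concept of the hub query through synonym and narrower-term links and then joins the groups with AND. The relationship tables are transcribed from the knowledge-base slide, with three assumptions made here: ST and ET are both treated as synonym links, the two specialists are treated as narrower terms of Physician, and expansion starts from the broader term Income so that the result resembles the slide's term list. The function names are hypothetical, and the sketch covers only the first bracket of the slide's final query.

```python
# Sketch only: relationship tables transcribed from the knowledge-base slide.
# Treating Oncologist and Ophthalmologist as narrower than Physician is an assumption.
SYNONYMS = {
    "Physician": ["Doctor"],
    "Ophthalmologist": ["Eye doctor"],
    "Income": ["Earnings"],
    "Compensation": ["Pay"],
}
NARROWER = {
    "Physician": ["Oncologist", "Ophthalmologist"],
    "Income": ["Compensation"],
    "Compensation": ["Wages"],
}


def expand(term):
    """Collect a term, its synonyms, and all narrower terms with their synonyms."""
    terms = [term] + SYNONYMS.get(term, [])
    for child in NARROWER.get(term, []):
        terms += expand(child)
    return terms


def boolean_query(*concepts):
    """AND together one OR-group per concept, quoting multi-word terms."""
    def quote(term):
        return f'"{term}"' if " " in term else term

    groups = ["(" + " OR ".join(quote(t) for t in expand(c)) + ")" for c in concepts]
    return " AND ".join(groups)
```

Calling `boolean_query("Physician", "Income")` yields one OR-group of physician terms and one of pay terms. The second bracket of the slide's query (fee schedules and health insurers) comes from the `<influences>` relationship and is not modeled here.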
Examples from the realm of AAT Taiwan • AAT Art and Architecture Thesaurus (Getty) • AAT Taiwan TELDAP, Institute for Information Science, Academia Sinica • TGM Thesaurus of Graphic Materials, Library of Congress • E-HowNet A Lexical Knowledge Base for Semantic Composition, Academia Sinica
Mapping through a Hub
TGM:
• temples
• synagogues
• churches
• mosques
• Buddhist temples
• Taoist temples
Hub:
• Facility ⊓ Worship
• Facility ⊓ Worship ⊓ Judaism
• Facility ⊓ Worship ⊓ Christianity
• Facility ⊓ Worship ⊓ Islam
• Facility ⊓ Worship ⊓ Buddhism
• Facility ⊓ Worship ⊓ Taoism
AAT / Chinese:
• temples (buildings)
• synagogues (buildings)
• churches (buildings)
• mosques (buildings)
• 禪寺
• 道觀
Mapping to Chinese • Use E-HowNet formal semantic expressions • Use terms that already exist in E-HowNet • Add terms using computer-assisted derivation of semantic expressions as described later for English
E-HowNet ontology 廣義知識知識本體
• Building | 建築物
• Facilities | 設施
Chinese word: 廟. English: Temple. Conceptual expression: {facilities |設施: domain = {religion |宗教}}
Chinese word: 禪寺. English: Buddhist temple. Conceptual expression: {facilities |設施: domain = {Buddhist |佛教}}
Chinese word: 道觀. English: Taoist temple / Taoist quan. Conceptual expression: {facilities |設施: domain = {Taoism |道教}}
Examples of deriving canonical expressions • Creating canonical expressions is key • Start out with some examples
Distributed implementation • Key principle: Canonical expressions can be created locally; the hub places each concept in a global structure • The person or algorithm producing canonical expressions needs to know only the core classification. They need not know the structure of the often large KOS to be mapped
Distributed implementation • Ideally, use one central faceted classification of core concepts, but multiple mapped core classifications could be used • The central core classification is extensible and should be continuously updated by many contributors • The central core classification must be able to express shades of meaning and, in the long run, usage information
Distributed implementation • A KOS could assign canonical expressions to its concepts − let's call this a semantically enhanced KOS or SEKOS • It is now a simple matter to map from any SEKOS to any other (somewhat dependent on the core classifications used)
Efficient creation of canonical expressions • Apply existing knowledge: Large knowledge base ▬► less effort for processing a new KOS • Use knowledge of KOS structure for hierarchical inheritance • Use linguistic analysis of terms and captions • Eliminate redundant atomic concepts • Check or produce mapping results from assignment of concepts to the same records • Get human editors' input and verification where needed through a user-friendly interface. Crowdsourcing, one term at a time • KOS "owners" may verify and edit data pertaining to their KOS
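Two of the steps listed above, hierarchical inheritance and elimination of redundant atomic concepts, can be sketched together. The sketch assumes conjunction-only expressions and a hypothetical fragment of the core classification (`BROADER`, including an invented top concept "Transport"). A class inherits its parent's atomic concepts, and any concept that is already implied by a more specific concept in the same expression is dropped.

```python
# Sketch only: BROADER is a hypothetical fragment of the core classification,
# mapping each atomic concept to its immediate broader concept.
BROADER = {
    "Inland water transport": "Water transport",
    "Ocean transport": "Water transport",
    "Water transport": "Transport",
}


def ancestors(concept):
    """Return all broader concepts of an atomic concept."""
    found = set()
    while concept in BROADER:
        concept = BROADER[concept]
        found.add(concept)
    return found


def simplify(expression):
    """Drop atomic concepts implied by a more specific concept in the same expression."""
    implied = set().union(*(ancestors(c) for c in expression))
    return frozenset(expression) - implied


def inherit(parent_expression, own_concepts):
    """Build a child class expression: parent's concepts plus its own, then simplified."""
    return simplify(set(parent_expression) | set(own_concepts))
```

For example, a class under a "Water transport" parent whose own caption contributes "Ocean transport" and "Traffic station" ends up as Ocean transport ⊓ Traffic station, because Water transport is implied by Ocean transport.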
Knowledge base: Requires an ever larger classification and lexical knowledge base containing many kinds of data: 1. A faceted classification of atomic concepts, seeded from sources with well-developed facets such as UDC, the Alcohol and Other Drug (AOD) Thesaurus, the Harvard Business Thesaurus, the Art and Architecture Thesaurus, and various systems called ontologies
Knowledge base 2: Requires an ever larger classification and lexical knowledge base containing many kinds of data: 2. Linguistic knowledge bases such as WordNet, E-HowNet (Chinese), FrameNet, and mono-, bi-, and multilingual dictionaries and thesauri 3. Many KOS (Knowledge Organization Systems), such as LCC, UDC, DDC, DMOZ directory, LCSH, Schlagwortnormdatei, MeSH and UMLS, AGROVOC, Gene Ontology 4. These will over time be fused into one large multilingual knowledge base with many terminological and translation relationships and relationships linking terms to concepts, with an increasing number of concepts semantically represented by a canonical expression. One database: intellectual, not physical. Could be in Linked Data
Take-home message It is time to unify many disparate mapping efforts on a sound semantic footing
Dagobert Soergel dsoergel @ buffalo.edu www.dsoergel.com
Air pollution laws
LCSH term: Air – Pollution – Laws and regulations = [isa] Legal rule [appliedTo] {[isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable}
NALT terms:
• Air pollution = [isa] Condition [isConditionOf] Air [causedBy] Pollutant [property] Undesirable
• Laws and regulations = [isa] Legal rule
Mapping LCSH ▬► NALT: Air – Pollution – Laws and regulations ▬► Air pollution AND Laws and regulations
Interpretation for indexing and searching in both directions
Means: Create a comprehensive knowledge base relating many classification schemes and subject heading lists used in libraries and in other contexts (LCC, DDC, DMOZ directory, LCSH, European schemes). Use combinations of atomic concepts taken from a well-structured underlying faceted classification to represent the meaning of classes and subject headings. This project will achieve the following: • Interoperability between any two participating Knowledge Organization Systems (KOS) (to the extent the two schemes allow) • Facet-based search: for any collection indexed by a participating KOS; for free-text search • Assistance in cataloging (metadata creation) by catalogers or users (social tagging) • Long-range goal: Web service where a KOS can be uploaded and mappings to specified target KOS are returned