Integrating ontological and linguistic knowledge for Conceptual Information Extraction Roberto Basili, Michele Vindigni, Fabio Massimo Zanzotto Università di Roma “Tor Vergata” Italy
Motivation
Concepts: professor, course. Relation: teacherOf.
Instances: professor(“XYZ”), course(“Database Theory”), teacherOf(professor(“XYZ”), course(“Database Theory”))
Motivation
• Linguistic interfaces help end users deal with standardised conceptualisations (or ontologies).
• Information Extraction systems may be used for this purpose. These require conceptualisations with:
  • high domain specificity
  • high coverage
• High-coverage, domain-specific conceptualisations are difficult to build, so reuse of pre-existing knowledge is a must.
Reusing conceptualisations...
Let us then take a Domain Concept Hierarchy, e.g. Medical Subject Headings (MeSH), and a concept: Dendrite.
Paths to Dendrite in the hierarchy:
• Nervous System > Neuron > Dendrite
• Cell > Cellular Structure > Cell Surface Extension > Dendrite
• Cell > Neuron > Dendrite
Lexical labels in the figure: fiber, nerve fiber, dendrite.
Example sentences:
• None of the dendrites were cut.
• ...but the dendrites and axons are often cut.
• Researchers don't know why, but for some reason, the ends of dendrites tangle and knot.
Target Problem
• Integration of linguistic information (Lexical Knowledge Base, LKB) with domain knowledge (Domain Concept Hierarchy, DCH)
• Need to harmonise linguistic processing (i.e. feature detection in text) with the available resource (DCH)
• Need to annotate texts with semantic information, that is, to build a linguistic interface to the DCH
Target problem
[Figure: Domain Concept Hierarchy and Lexical Knowledge Base]
Inspiring Principles
(P1) Extensional Nature of Domain Concept Hierarchy: subsumption in the DCH has an extensional interpretation in the LKB.
[Figure: Domain Concept Hierarchy and Lexical Knowledge Base, with LKB nodes a1, a2, a3, a4, a5]
Inspiring Principles
(P2) Intensional Strength in Lexical Knowledge Base: given a set of words W whose senses are subsumed by a node a in the LKB, the intensional strength measures the trade-off between
• the generalisation required to model all the words
• the capability of separating individual word senses in W
Intensional Strength in LKB: Conceptual Density
$CD = \frac{\text{Mean Tree Area of } n \text{ Words}}{\text{Actual Tree Area}}$
[Figure: example tree with nodes numbered 1 to 15, with Word1, Word2 and Word3 marked]
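As a rough illustration of the ratio above, the sketch below follows the form of conceptual density usually attributed to Agirre & Rigau (1996): the expected area of a subtree holding m word senses, divided by the actual number of nodes in that subtree. The function name, the parameter names and the default smoothing exponent of 0.20 are assumptions made for this example and do not come from the slides.

```python
def conceptual_density(nhyp, m, descendants, alpha=0.20):
    """Expected area of a subtree holding m senses divided by its actual area.

    nhyp        -- mean number of hyponyms per node in the subtree (assumed given)
    m           -- number of relevant word senses found under the subtree root
    descendants -- actual number of nodes in the subtree (its "area")
    alpha       -- smoothing exponent; 0.20 is an assumed default
    """
    if descendants <= 0:
        raise ValueError("descendants must be positive")
    # Expected area: one term per sense, growing with the branching factor.
    expected_area = sum(nhyp ** (i ** alpha) for i in range(m))
    return expected_area / descendants


# Same number of senses, smaller subtree: the density goes up.
sparse = conceptual_density(nhyp=3, m=3, descendants=40)
dense = conceptual_density(nhyp=3, m=3, descendants=4)
```

For a fixed number of senses, a smaller subtree scores higher, which is the trade-off between generalisation and sense separation that the density is meant to capture.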
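Mapping Algorithm: preliminary definitions
Extension of C: $ext(C) = \{ t_{C'} \in DCH \mid C \text{ subsumes } C' \text{ in } DCH \}$
Linguistic Generalisation of C: $lgen(C) = \{ a \in LKB \mid \exists\, t \in ext(C) \text{ and } a_t \text{ is subsumed by } a \text{ in } LKB \}$
Here $t_{C'}$ is the term of concept $C'$ and $a_t$ is a sense of the term $t$ in the LKB.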
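The two definitions can be sketched on toy data as follows. Everything in this example is an assumption made for illustration: the dictionary layout, the function names, the sense identifiers such as "dendrite#1", and the choice to treat each concept name as its own term. The slides do not say whether a sense counts as subsuming itself; here it does.

```python
def reachable(node, links):
    """All nodes strictly reachable from `node` by following `links` (node -> list)."""
    seen, stack = set(), list(links.get(node, []))
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(links.get(n, []))
    return seen


def ext(concept, dch_children):
    """Terms of all concepts strictly below `concept` in the DCH."""
    return reachable(concept, dch_children)


def lgen(concept, dch_children, senses_of, lkb_hypernyms):
    """LKB nodes that subsume at least one sense of a term in ext(concept)."""
    result = set()
    for term in ext(concept, dch_children):
        for sense in senses_of.get(term, ()):
            result.add(sense)                          # assumed: reflexive subsumption
            result |= reachable(sense, lkb_hypernyms)  # walk hypernym links upward
    return result


# Toy data, invented for this sketch.
dch_children = {"Neuron": ["Dendrite", "Axon"]}
senses_of = {"Dendrite": ["dendrite#1"], "Axon": ["axon#1"]}
lkb_hypernyms = {
    "dendrite#1": ["nerve fiber#1"],
    "axon#1": ["nerve fiber#1"],
    "nerve fiber#1": ["fiber#1"],
}

extension = ext("Neuron", dch_children)
generalisation = lgen("Neuron", dch_children, senses_of, lkb_hypernyms)
```

On this toy input the extension of Neuron holds the two terms below it, and the linguistic generalisation holds their senses together with every hypernym above them.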
Mapping Algorithm
Principles applied: P1 (Extensional Nature of DCH), P2 (Intensional Strength of LKB)
merge(DCH, LKB, T): for each C in T
Step 1: Determine the linguistic generalisation lgen(C) of the extension of C, made of all descendants of C in the DCH
Step 2: Compute the optimal mapping $G(C) \subseteq lgen(C)$ by a greedy selection maximizing the conceptual density
Step 3: Attach $t_C$ to the senses in G(C)
Step 4: For each $t \in ext(C)$, attach t to $a_t$ in the LKB iff $a_t$ is a sense for t in the LKB and there is an $a \in G(C)$ such that a subsumes $a_t$ in the LKB
Mapping Algorithm: Steps 1 and 2 (example)
C = t, ext(t) = {t1, t2, t3, t4}, lgen(t) = { ,..., }, G(t) = {a5, a4}
[Figure: Domain Concept Hierarchy with t above t1, t2, t3, t4; Lexical Knowledge Base with nodes a1 to a6; Step 2 selects nodes by conceptual density]
Mapping Algorithm: Steps 3 and 4 (example)
Step 3: Attach $t_C$ to the senses in G(C)
Step 4: For each $t \in ext(C)$, attach t to the LKB
[Figure: Domain Concept Hierarchy with t above t1, t2, t3, t4; Lexical Knowledge Base with nodes a1 to a6]
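Steps 2 to 4 can be sketched for a single concept as below. This is only one possible reading of the slides: the density function reuses the Agirre & Rigau style ratio with an assumed exponent of 0.20, the mean branching factor is computed over the internal nodes of each subtree, and the greedy loop stops once every term with a sense under some candidate is covered. All names and the toy hierarchy are hypothetical.

```python
def subtree(node, hyponyms):
    """All LKB nodes subsumed by `node`, including `node` itself."""
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(hyponyms.get(n, []))
    return seen


def density(node, m, hyponyms, alpha=0.20):
    """Expected area of a subtree holding m senses divided by its actual area."""
    nodes = subtree(node, hyponyms)
    internal = [n for n in nodes if hyponyms.get(n)]
    # Assumed: nhyp is the mean branching factor over internal nodes.
    nhyp = sum(len(hyponyms[n]) for n in internal) / len(internal) if internal else 1.0
    return sum(nhyp ** (i ** alpha) for i in range(m)) / len(nodes)


def merge_concept(term_c, ext_terms, senses_of, hyponyms, candidates):
    """Steps 2 to 4 for one concept C; returns (G(C), term -> attached LKB nodes)."""
    def covered(a):
        under = subtree(a, hyponyms)
        return {t for t in ext_terms if senses_of.get(t, set()) & under}

    uncovered = set().union(*(covered(a) for a in candidates))
    g = []
    while uncovered:  # Step 2: greedily pick the densest node that still helps
        useful = [a for a in sorted(candidates) if covered(a) & uncovered]
        best = max(useful, key=lambda a: density(a, len(covered(a) & uncovered), hyponyms))
        g.append(best)
        uncovered -= covered(best)

    attached = {term_c: set(g)}  # Step 3: the term of C goes on every node in G(C)
    kept = set().union(*(subtree(a, hyponyms) for a in g))
    for t in ext_terms:  # Step 4: keep only the senses of t that G(C) subsumes
        attached[t] = senses_of.get(t, set()) & kept
    return g, attached


# Toy LKB, invented for this sketch: t1 is ambiguous between a1 and b1.
hyponyms = {"root": ["a4", "b0"], "a4": ["a1", "a2", "a3"], "b0": ["b1", "b2", "b3", "b4"]}
senses_of = {"t1": {"a1", "b1"}, "t2": {"a2"}, "t3": {"a3"}}
candidates = {"a1", "a2", "a3", "b1", "a4", "b0", "root"}  # plays the role of lgen(C)

g, attached = merge_concept("t", {"t1", "t2", "t3"}, senses_of, hyponyms, candidates)
```

In this toy run the dense node a4 is preferred over the root and over individual senses, so the ambiguous term t1 keeps only the sense a1 that lies under a4.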
A case study: mapping MeSH in WordNet
• Medical Subject Headings (MeSH) as Domain Concept Hierarchy
• WordNet as Lexical Knowledge Base
Summary
• Target Problem: Mapping a DCH in a LKB for Information Extraction
• Solution:
  • Inspiring Principles:
    • Extensional Nature of DCH
    • Intensional Strength in LKB
  • The notion of conceptual density (Agirre & Rigau, 1996)
  • A novel mapping algorithm between DCH and LKB
• A case study: MeSH in WordNet
Conclusions and future work
• If a Domain Concept Hierarchy is available, the presented method is a viable way to integrate it into WordNet. But what if it is not available? How can taxonomical relations be learned from text collections? Moreover, how can different kinds of relations between concepts be induced?