Complete and Consistent Annotation of WordNet with the Top Concept Ontology. Javier Álvez, Jordi Atserias, Jordi Carrera, Salvador Climent, Egoitza Laparra, Antoni Oliver and German Rigau. Basque Country Univ., Pompeu Fabra Univ. (Barcelona), Open Univ. of Catalonia (Barcelona).
Introduction • 4 years of work • Full annotation of WordNet’s nouns with semantic features (EWN TCO) • Aimed to be an important semantic resource for NLP (selectional preferences, synset clustering, reasoning…).
Result • 65,989 noun concepts (synsets) = 116,364 noun lexemes (variants) consistently annotated • Average of 6.47 features per synset • Features organized in a multilevel hierarchy
Structure of the talk • Methodology • Examples and Discussion • Conclusions
Methodology • Annotation of the Inter-Lingual Index (= EnWN1.6, SpaWN, mapping to other WNs...) with the nodes/features of the TCO (a shallow ontology defined in the EWN Project [Vossen et al. 1998]) • Methodology based on: • INCOMPATIBILITY OF ONTOLOGICAL INFORMATION • SUBSUMPTION BLOCKAGE POINTS
The Top Concept Ontology • Organized in three orders of entities: • 1st Order (physical entities) • 2nd Order (situations) • 3rd Order (abstract entities)
The Top Concept Ontology • 1st Order entities organized in four Qualia-like features: • Origin (Artifact, Natural..) • Form (Object, Substance…) • Composition (Group, Part) • Function (Building, Container, Vehicle…)
The Top Concept Ontology • 2nd Order Entities organized in two dimensions: • Situation Type: Dynamic (Bounded Events, Unbounded Events) & Static (Properties, Relations) • Situation Component: (Cause, Manner, Modal…) • 3rd Order Entities are not further subdivided
Methodology • We modify the structure of neither the TCO nor WN (=> future work). We just annotate. • We declared pairs of TCO properties as incompatible (e.g. Natural vs. Artifact, Substance vs. Object) • Initial annotation situation: in EWN, TCO features were manually assigned to a basic set of 1024 EWN synsets (= Base Concepts)
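A minimal sketch of how declared incompatibilities can be checked against a synset's feature set. Only the two pairs named on the slide are listed; the set name `INCOMPATIBLE` and the function `conflicts` are illustrative, not part of the released resource.

```python
from itertools import combinations

# Illustrative declarations: only the two pairs mentioned on the slide.
# The real list of incompatible TCO pairs is longer.
INCOMPATIBLE = {
    frozenset({"Natural", "Artifact"}),
    frozenset({"Substance", "Object"}),
}


def conflicts(features):
    """Return the incompatible feature pairs present in one annotation."""
    return [
        tuple(sorted(pair))
        for pair in map(frozenset, combinations(sorted(features), 2))
        if pair in INCOMPATIBLE
    ]


print(conflicts({"Natural", "Artifact", "Object"}))  # [('Artifact', 'Natural')]
```

Storing each pair as a `frozenset` makes the check independent of the order in which the two features are written.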
Methodology • We automatically annotated the rest of the Top Synsets (from the BCs up to the Top) using a table of equivalence between WordNet Semantic Files and TCO features (e.g. NounAct <=> Agentive, NounAttribute <=> Property) • We performed a fully automatic top-down expansion of this information via the WN1.6 hierarchy (feature inheritance)
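The top-down expansion can be sketched as a propagation of features from each synset to all of its descendants. The two table entries come from the slide; the three-synset hierarchy and the names `SF_TO_TCO`, `expand_top_down`, `hyponyms` and `seed` are hypothetical and only serve the illustration.

```python
from collections import deque

# The two equivalences shown on the slide; the full table is not reproduced.
SF_TO_TCO = {"NounAct": "Agentive", "NounAttribute": "Property"}


def expand_top_down(hyponyms, seed):
    """Propagate features from every synset down to all its descendants.

    hyponyms: dict mapping a synset to its direct hyponyms.
    seed: dict mapping a synset to its directly assigned features.
    """
    inherited = {synset: set(feats) for synset, feats in seed.items()}
    queue = deque(seed)
    while queue:
        parent = queue.popleft()
        for child in hyponyms.get(parent, ()):
            current = inherited.setdefault(child, set())
            new = inherited[parent] - current
            if new:  # revisit the child only when it gained something
                current |= new
                queue.append(child)
    return inherited


# Hypothetical toy hierarchy: act > activity > game.
toy_hyponyms = {"act": ["activity"], "activity": ["game"]}
toy_seed = {"act": {SF_TO_TCO["NounAct"]}}
print(expand_top_down(toy_hyponyms, toy_seed)["game"])  # {'Agentive'}
```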
Methodology • This caused feature incompatibilities to arise: • about 225,000 conflicts in 25,000 synsets • Causes: • Wrong manual annotation in EWN • Wrong TCO-SF equivalence • ... but basically: • Subsumption in WN does not always work • ISA overloading etc. • Multiple inheritance in WN
Methodology • We manually checked all feature incompatibilities in order to: • (i) add and/or delete ontological features • (ii) set inheritance blockage points. • A blockage point is an annotation in WN1.6 which breaks the ISA relation between two synsets, so that no features are inherited across it.
A simple example island city Java Bandung
A simple example island =NATURAL city =ARTIFACT Java Bandung
A simple example island =NATURAL city =ARTIFACT Java +NATURAL Bandung +NATURAL +ARTIFACT
A simple example island =NATURAL city =ARTIFACT Java +NATURAL Bandung +ARTIFACT
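The example can be replayed in a few lines. This sketch assumes that "=" marks a directly assigned feature and "+" an inherited one, and that the ISA links are Java > island, Bandung > city and Bandung > Java, which is how the slides appear to read; the names `HYPERNYMS`, `ASSIGNED` and `features` are illustrative.

```python
# Assumed ISA links, read off the slides.
HYPERNYMS = {
    "Java": ["island"],
    "Bandung": ["city", "Java"],
}
# Directly assigned features ("=" on the slides).
ASSIGNED = {"island": {"Natural"}, "city": {"Artifact"}}


def features(synset, blocked=frozenset()):
    """Collect own and inherited features, skipping blocked ISA links."""
    result = set(ASSIGNED.get(synset, ()))
    for parent in HYPERNYMS.get(synset, ()):
        if (synset, parent) not in blocked:
            result |= features(parent, blocked)
    return result


# Without a blockage point Bandung inherits both incompatible features.
print(sorted(features("Bandung")))  # ['Artifact', 'Natural']

# Blocking the Bandung > Java link leaves only the feature from city.
print(sorted(features("Bandung", blocked={("Bandung", "Java")})))  # ['Artifact']
```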
Methodology: Information used for decision making • Relational information regarding every synset and its neighbours, i.e. the WN structure • Synsets' glosses as provided by EWN • Glosses, descriptions and examples of the TCO features as provided in [Alonge et al. 1998] • Usual word-substitution tests to acknowledge hyponymy, as in [Cruse 1986]
Methodology • When all incompatibilities were fixed, a new automatic re-expansion was launched, which resulted in a new (smaller) number of conflicts. • Following this iterative and incremental approach, inheritance was re-calculated and the data were re-examined several times. • The task was finished when a new cycle of re-expansion of properties did not result in new conflicts.
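The cycle of expansion and review can be sketched as a loop that stops when an expansion produces no conflicts. In the project the review step was manual; here a callback stands in for it. All names (`annotate_until_stable`, `expand`, `find_conflicts`, `review`) and the one-synset toy data are hypothetical.

```python
def annotate_until_stable(expand, find_conflicts, review, max_rounds=100):
    """Repeat expansion and review until an expansion yields no conflicts."""
    for round_no in range(1, max_rounds + 1):
        annotation = expand()
        found = find_conflicts(annotation)
        if not found:
            return annotation, round_no
        review(found)  # stand-in for the manual correction step
    raise RuntimeError("annotation did not stabilise")


# Toy setting: one synset whose conflict disappears once a link is blocked.
blocked = set()


def expand():
    feats = {"Artifact"}
    if "Bandung>Java" not in blocked:
        feats.add("Natural")
    return {"Bandung": feats}


def find_conflicts(annotation):
    return [s for s, f in annotation.items() if {"Natural", "Artifact"} <= f]


def review(found):
    blocked.add("Bandung>Java")


annotation, rounds = annotate_until_stable(expand, find_conflicts, review)
print(rounds, annotation)  # 2 {'Bandung': {'Artifact'}}
```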
Methodology • Then, two final steps were applied: • Since the TCO is itself a hierarchy, for every synset, its annotation was expanded up-feature; e.g. Animal expands to Living, Natural, Origin and 1stOrderEntity • The whole hierarchy was checked for consistency using formal theorem provers such as Vampire and E-prover • This step resulted in a number of new conflicts, which were finally fixed.
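The up-feature expansion amounts to closing each annotation under the TCO's own parent relation. The sketch below covers only the chain named on the slide and assumes the features are listed there in parent order; `TCO_PARENT` and `expand_up` are illustrative names.

```python
# Parent links for the single chain named on the slide (assumed order).
TCO_PARENT = {
    "Animal": "Living",
    "Living": "Natural",
    "Natural": "Origin",
    "Origin": "1stOrderEntity",
}


def expand_up(features):
    """Add every TCO ancestor of each feature to the annotation."""
    closed = set()
    for feature in features:
        # Walk up until the top is reached or an already-seen node is hit.
        while feature is not None and feature not in closed:
            closed.add(feature)
            feature = TCO_PARENT.get(feature)
    return closed


print(sorted(expand_up({"Animal"})))
# ['1stOrderEntity', 'Animal', 'Living', 'Natural', 'Origin']
```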
Typology of miscategorizations (IS-A Overload) (in black: original typology of [Guarino 1998]) • Overgeneralization • Reduction of sense • Confusion of senses • Suspect Type-to-role relationship • Extensional ambiguity • 3rd Order Entities vs Mental 2nd Order Entities (TCO labels) • Technical inconsistencies
Typology of miscategorizations • Overgeneralization = Hypernym has more features than Hyponym should have • Reduction of sense = Hypernym fails to capture part of the Hyponym’s meaning • Confusion of senses = Multiple inheritance where hypernyms are incompatible
Typology of miscategorizations • Extensional ambiguity = e.g. “layer”: is it an object or a substance? • 3rd Order Entities vs Mental 2nd Order Entities (TCO labels) = e.g. “discipline” (a process, thus 2ndOrder) IS-A “knowledge domain” (3rdOrder) • Technical inconsistencies = e.g. Hyponymy-Meronymy confusion
Conclusions • WN1.6 (= ILI) fully and consistently annotated for nouns with 60 semantic features organized in a shallow ontology • 65,000 synsets, 116,000 variants • Average of 6.48 TCO features per synset • 350 inheritance-blocking points detected in WN • 28,000 synsets have at least one in their hypernymy chain [= they are affected by WN hierarchy mistakes or inadequacies] • The resource is free. It can be downloaded from our web site (see the proceedings)
object =OBJECT abstraction =CONCEPT artifact +OBJECT shape +CONCEPT art +OBJECT figure +CONCEPT impressionism +OBJECT sculpture =IMAGE_REPRESENTATION +CONCEPT +OBJECT monument +OBJECT The Statue of Liberty +OBJECT +IMAGE_REPRESENTATION +CONCEPT
object =OBJECT abstraction =CONCEPT artifact +OBJECT shape +CONCEPT art =CONCEPT figure +CONCEPT impressionism +CONCEPT sculpture =IMAGE_REPRESENTATION =OBJECT monument +OBJECT The Statue of Liberty +OBJECT +IMAGE_REPRESENTATION