1. Division of semantic labor in the Global WordNet Grid Piek Vossen, VU University Amsterdam
German Rigau, University of the Basque Country
5th Global Wordnet Conference
Mumbai, India, Jan 30 – Feb 5, 2010
2. Overview KYOTO as a domain implementation of the Global Wordnet Grid
Scope of knowledge integration
Division of linguistic labor
How to integrate resources?
How to make inferences?
3. KYOTO – some statistics European-Asian project
March 2008 – March 2011
7 countries (The Netherlands, Italy, Germany, Spain, Taiwan, Japan, Czech Republic)
12 sites
Universities & research institutes: VUA, CNR-ILC, CNR-IIT, BBAW, EHU, AS, NICT, Masaryk
Companies: Synthema, Irion
User organizations: ECNC, WWF
7 languages (English, Italian, Japanese, Dutch, Spanish, Basque, Chinese)
4. Overview of the KYOTO process
5. GWC2010, Mumbai 5 Applying ontology mappings
6. Global Wordnet Grid
7. Available repositories in KYOTO Environment domain term database: 500,000 terms per 1,000 documents per language
Open data project:
DBPedia: 2.6 million things, including at least 213,000 persons, 328,000 places, 57,000 music albums, 36,000 films, 20,000 companies. The knowledge base consists of 274 million pieces of information (RDF triples)
GeoNames: 8 million geographical names, covering 6.5 million unique features (of which 2.2 million are populated places) and 1.8 million alternate names
Domain thesauri and taxonomies: Species 2000: 2.1 million species
Wordnets for 7 languages: about 50,000 to 120,000 synsets per language
Ontologies: SUMO, DOLCE, SIMPLE
8. Here is the architecture of the KYOTO grid, or knowledge base, in which the wordnets are connected to the ontology. The nebula in the center represents the sense axis, which groups together corresponding synsets from the different wordnets and, via them, points to the ontology. This is the initial situation. At the end, the lexical and ontological bases will be extended for the environmental domain.
9. Species in the ontology
10. Should all knowledge be stored in the central ontology? Vocabularies are too large for full inferencing with current reasoners
Vocabularies are linguistically too diverse to be represented in an ontology
Inferencing capabilities of formal ontologies are not needed for all levels of knowledge
11. Modeling knowledge in a domain Knowledge needs to be divided over different lexical and ontological layers:
Precisely define the relations between lexical and ontological layers
Precisely define the inferencing based on the distributed knowledge layers
12. Division of linguistic labor principle Putnam 1975:
No need to know all the necessary and sufficient properties to determine if something is "gold"
Assume that there is a way to determine these properties and that domain experts know how to recognize instances of these concepts.
Speakers can still use the word "gold" and communicate useful information
13. Division of semantic labor principle Digital version of Putnam (1975):
Computer does not need to have all the necessary and sufficient properties to determine if something is a "European tree frog"
Computer assumes that there is a way to determine this and that domain experts (people) know how to recognize instances of these concepts.
Computers can still reason with semantics and do useful stuff with textual data
14. What does the computer need to know? Distinction between rigid and non-rigid (Welty & Guarino 2002):
being a "cat" is essential to an individual's existence and therefore rigid
being a "pet" is a temporary role and therefore non-rigid; a cat can become a pet and stop being a pet without ceasing to exist
Felix is born as a cat and will always be a cat, but during some period Felix can become a pet and stop being a pet while he continues to exist as a cat
All 2.1 million species are rigid concepts
15. What does the computer need to know? Roles and processes in documents have more information value than the defining properties of species:
Species defined in terms of physical properties already known to expert;
Roles such as "invasive species", "migration species", "threatened species" express THE important properties of instances of species
Roles are typically the terms we learn from the text not the species!
16. Wordnet-ontology-relations Rigid synset relations to ontology:
Synset:Endurant(Object); Synset:Perdurant(Event); Synset:Quality:
sc_equivalenceOf (= relation in WN-SUMO) or sc_subclassOf (+ relation in WN-SUMO)
Non-rigid synset relations to ontology:
Synset:Role; Synset:Endurant(Object); Synset:Perdurant(Event)
sc_domainOf: range of ontology types that restricts a role
sc_playRole: role that is being played
sc_participantOf: the process in which the role is played
Rigidity can be detected automatically (Rudify, 80% precision, IAG 80%) and is stored in wordnets as attributes to synsets
17. Global Wordnet Grid Model
18. Global Wordnet Grid Model
19. Wordnet to ontology mappings {create, produce, make}Verb, English
-> sc_equivalenceOf construction
{artifact, artefact}Noun, English
-> sc_domainOf physical_object
-> sc_playRole result-existence
-> sc_participantOf construction
{kunststof}Noun, Dutch // lit. artifact substance
-> sc_domainOf amount_of_matter
-> sc_playRole result-existence
-> sc_participantOf construction
20. Wordnet to ontology mappings {teacher}Noun, English
-> sc_domainOf human
-> sc_playRole done-by
-> sc_participantOf teach
{leraar}Noun, Dutch // lit. male teacher
-> sc_domainOf man
-> sc_playRole done-by
-> sc_participantOf teach
{lerares}Noun, Dutch // lit. female teacher
-> sc_domainOf woman
-> sc_playRole done-by
-> sc_participantOf teach
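The role mappings on these two slides can be read as small records with three fields. The sketch below is only an illustration of that reading: the `RoleMapping` class and the `cross_lingual_matches` helper are hypothetical names, and treating "same role in the same process" as a sign of cross-lingual relatedness is an assumption made here, not a procedure stated in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleMapping:
    """A non-rigid synset mapped to the ontology by three relations."""
    synset: str
    language: str
    domain_of: str        # sc_domainOf: type that restricts who can play the role
    play_role: str        # sc_playRole: the role being played
    participant_of: str   # sc_participantOf: the process in which it is played


# Values copied from the teacher / leraar / lerares slide.
MAPPINGS = [
    RoleMapping("teacher", "English", "human", "done-by", "teach"),
    RoleMapping("leraar", "Dutch", "man", "done-by", "teach"),
    RoleMapping("lerares", "Dutch", "woman", "done-by", "teach"),
]


def cross_lingual_matches(mapping, candidates):
    """Return candidates that play the same role in the same process.

    The domain restriction may differ (human vs. man vs. woman); that
    difference is what keeps the language-specific lexicalization visible.
    """
    key = (mapping.play_role, mapping.participant_of)
    return [
        c for c in candidates
        if c is not mapping and (c.play_role, c.participant_of) == key
    ]


if __name__ == "__main__":
    for match in cross_lingual_matches(MAPPINGS[0], MAPPINGS):
        print(match.synset, match.language, match.domain_of)
```

In this toy reading, {leraar} and {lerares} both line up with {teacher} through the shared role and process, while their narrower sc_domainOf values record the gender restriction that English does not lexicalize.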
21. Wordnet-LMF
<LexicalEntry id="footmark">
<Lemma writtenForm="footmark" partOfSpeech="n"/>
<Sense id="footmark_1" synset="eng-30-06645039-n">
<MonolingualExternalRefs>
<MonolingualExternalRef externalSystem="Wordnet3.0" externalReference="" />
</MonolingualExternalRefs>
</Sense>
</LexicalEntry>
<Synset/>
<SenseAxis/>
<SenseAxis id="sa_ita16-eng30_001" relType="eq_synonym">
<Target ID="ita-16-1251-n" />
<Target ID="eng-30-13480848-n"/>
</SenseAxis>
22. WN-LMF Synset relations
<Synset id="eng-30-06645039-n" baseConcept="0"> <!-- footprint -->
<Definition gloss="mark of a foot or shoe on a surface">
<Statement example="the police made casts of the footprints in the soft earth outside the window" />
</Definition>
<OntologicalMetaProperties rigidValue="1">
<rigid score="0.57" author="Rudify1.0" date="2008-07-01"/>
<non-rigid score="0.09" author="Rudify1.0" date="2008-07-01"/>
</OntologicalMetaProperties>
<SynsetRelations/>
<MonolingualExternalRefs>
<MonolingualExternalRef externalSystem="SUMO" reference="superficialPart" relType="at"/>
<MonolingualExternalRef externalSystem="KYO" reference="mark" relType="sc_subclassOf"/>
</MonolingualExternalRefs>
</Synset>
23. WN-LMF Synset relations
<Synset id="eng-30-02356039-n" baseConcept="0"> <!-- migration bird -->
<Definition gloss="birds that migrate in winter to warmer regions"/>
<OntologicalMetaProperties rigidValue="0">
<rigid score="0.00" author="Rudify1.0" date="2008-07-01"/>
<non-rigid score="0.69" author="Rudify1.0" date="2008-07-01"/>
</OntologicalMetaProperties>
<SynsetRelations/>
<MonolingualExternalRefs>
<Statement>
<MonolingualExternalRef externalSystem="KYO" reference="bird" relType="sc_domainOf"/>
<MonolingualExternalRef externalSystem="KYO" reference="done-by" relType="sc_playRole"/>
<MonolingualExternalRef externalSystem="KYO" reference="migration" relType="sc_participantOf"/>
</Statement>
</MonolingualExternalRefs>
</Synset>
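As a rough illustration of how the ontology mappings could be read back out of such a synset record, the sketch below parses a trimmed copy of the fragment above with Python's standard `xml.etree.ElementTree`. The function name `ontology_mappings` is hypothetical, and the fragment is abbreviated (the rigidity scores are left out); it is not the KYOTO project's own tooling.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the "migration bird" synset from the slide.
SYNSET_XML = """
<Synset id="eng-30-02356039-n" baseConcept="0">
  <Definition gloss="birds that migrate in winter to warmer regions"/>
  <MonolingualExternalRefs>
    <Statement>
      <MonolingualExternalRef externalSystem="KYO" reference="bird" relType="sc_domainOf"/>
      <MonolingualExternalRef externalSystem="KYO" reference="done-by" relType="sc_playRole"/>
      <MonolingualExternalRef externalSystem="KYO" reference="migration" relType="sc_participantOf"/>
    </Statement>
  </MonolingualExternalRefs>
</Synset>
"""


def ontology_mappings(synset_xml, system="KYO"):
    """Return (relType, reference) pairs for one external system, in document order."""
    root = ET.fromstring(synset_xml.strip())
    return [
        (ref.get("relType"), ref.get("reference"))
        for ref in root.iter("MonolingualExternalRef")
        if ref.get("externalSystem") == system
    ]


if __name__ == "__main__":
    for rel_type, reference in ontology_mappings(SYNSET_XML):
        print(rel_type, "->", reference)
```

Filtering on `externalSystem` keeps SUMO references and KYOTO ontology references apart when a synset carries both, as the footprint example does.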
24. Division of labor in knowledge sources
25. How to make inferences? SPARQL queries to large Virtuoso databases: Aligned Species 2000, DBPedia
SQL queries to term database
Graph matching on wordnets stored in DebVisDic
Reasoning on a small ontology
26. KYOTO Project meeting, Jan 13-14th 2010, PolyU Hong Kong 26 Ontotagger applied to KAF
Apply WSD to every term in the KAF representation of a text
For each term in the KAF representation of the text:
a. If the term has a wordnet synset (from WSD), check for ontology mappings; if there are none, traverse the wordnet hierarchy to find the first mapping
b. Else check the SKOS database for a wordnet mapping; if necessary, traverse broader relations up to the first wordnet mapping and go to a.
c. Else check the term database for wordnet mappings; if necessary, traverse parent relations up to the first wordnet mapping and go to a.
Collect all mappings from the ontology and all (relevant) ontological implications and insert them into the KAF representation of the text.
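The fallback order described above (wordnet first, then SKOS, then the term database, each time climbing until a mapping is found) can be sketched as follows. Everything in this block is an assumption for illustration: the dictionaries stand in for the real wordnet, SKOS and term databases, the names `first_hit` and `ontotag` are hypothetical, and the toy entries (for example mapping `Anura` to `frog:1`) are invented sample data, not project results.

```python
def first_hit(start, parents, table):
    """Climb `parents` from `start`; return (node, table[node]) for the first node in `table`."""
    node, seen = start, set()
    while node is not None and node not in seen:
        if node in table:
            return node, table[node]
        seen.add(node)            # guard against cycles in the parent relation
        node = parents.get(node)
    return None, None


def ontotag(term, wsd_synset, res):
    """Return the ontology mappings for one term, following steps a, b and c."""
    synset = wsd_synset
    if synset is None:
        # b. SKOS database: climb broader relations to the first wordnet mapping
        _, synset = first_hit(term, res["skos_broader"], res["skos_to_wn"])
    if synset is None:
        # c. term database: climb parent relations to the first wordnet mapping
        _, synset = first_hit(term, res["term_parent"], res["term_to_wn"])
    if synset is None:
        return []
    # a. climb the wordnet hierarchy to the first synset with ontology mappings
    _, mappings = first_hit(synset, res["wn_hypernym"], res["wn_to_onto"])
    return mappings or []


# Toy resources (hypothetical sample data).
RES = {
    "skos_broader": {
        "Eleutherodactylus augusti": "Eleutherodactylus",
        "Eleutherodactylus": "Leptodactylidae",
        "Leptodactylidae": "Anura",
    },
    "skos_to_wn": {"Anura": "frog:1"},
    "term_parent": {"urban land": "land"},
    "term_to_wn": {"land": "land:5"},
    "wn_hypernym": {"frog:1": "amphibian:3", "amphibian:3": "animal:1"},
    "wn_to_onto": {"animal:1": [("sc_subclassOf", "animal")]},
}

if __name__ == "__main__":
    print(ontotag("Eleutherodactylus augusti", None, RES))
```

The final collection step (inserting mappings and their implications into KAF) is left out here; the sketch only covers the lookup cascade.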
27. Examples Migration birds in the Humber Estuary.
The migration of birds to the Humber Estuary
Bird migration in the Humber Estuary
Birds that migrate to the Humber Estuary
28. Annotation of ontological implications in KAF
<!-- Migration birds in the Humber Estuary -->
<term lemma="migration bird" pos="N.pl">
<externalReference> <!-- Tagging terms with ontological implications based on wordnet mappings -->
<externalRef resource="ontology" relation="sc_domainOf" reference="bird"/>
<externalRef resource="ontology" relation="sc_participantOf" reference="migration"/>
<externalRef resource="ontology" relation="sc_playRole" reference="done-by"/>
<externalRef resource="ontology" relation="implied" reference="done-by" some="physical-plurality"/>
<externalRef resource="ontology" relation="implied" reference="has-destination" some="particular"/>
<externalRef resource="ontology" relation="implied" reference="has-source" some="particular"/>
<externalRef resource="ontology" relation="implied" reference="has-path" some="particular"/>
</externalReference>
</term>
<term lemma="in" pos="P"/>
<term lemma="Humber Estuary"><externalRef resource="ontology" reference="location"/></term>
29. Annotation of ontological implications in KAF
<!-- Bird migration in the Humber Estuary -->
<term lemma="bird" pos="N.pl">
<externalReference> <!-- Tagging terms with ontological implications based on wordnet mappings -->
<externalRef resource="ontology" relation="sc_equivalenceOf" reference="bird"/>
</externalReference>
</term>
<term lemma="migration" pos="N">
<externalReference> <!-- Tagging terms with ontological implications based on wordnet mappings -->
<externalRef resource="ontology" relation="sc_equivalenceOf" reference="migration"/>
<externalRef resource="ontology" relation="implied" reference="done-by" some="physical-plurality"/>
<externalRef resource="ontology" relation="implied" reference="has-destination" some="particular"/>
<externalRef resource="ontology" relation="implied" reference="has-source" some="particular"/>
<externalRef resource="ontology" relation="implied" reference="has-path" some="particular"/>
</externalReference>
</term>
<term lemma="in"/>
<term lemma="Humber Estuary"><externalRef resource="ontology" reference="location"/></term>
30. Annotation of ontological implications in KAF
<!-- Birds that migrate to the Humber Estuary -->
<term lemma="bird" pos="N.pl">
<externalReference> <!-- Tagging terms with ontological implications based on wordnet mappings -->
<externalRef resource="ontology" relation="sc_equivalenceOf" reference="bird"/>
</externalReference>
</term>
<term lemma="migrate" pos="V">
<externalReference> <!-- Tagging terms with ontological implications based on wordnet mappings -->
<externalRef resource="ontology" relation="sc_equivalenceOf" reference="migration"/>
<externalRef resource="ontology" relation="implied" reference="done-by" some="physical-plurality"/>
<externalRef resource="ontology" relation="implied" reference="has-destination" some="particular"/>
<externalRef resource="ontology" relation="implied" reference="has-source" some="particular"/>
<externalRef resource="ontology" relation="implied" reference="has-path" some="particular"/>
</externalReference>
</term>
<term lemma="to"/>
<term lemma="Humber Estuary"><externalRef resource="ontology" reference="location"/></term>
31. Kybot profiles
IF <! bird migration to HE>
T1 + to + T2 &
T1.impliedType="change_of_location" & T1.impliedRole="has-target" &
T2.Type="location"
THEN
<location-target, T1, T2>
IF <! species migration from HE>
T1 + from + T2 &
T1.impliedType="change_of_location" & T1.impliedRole="has-source" &
T2.Type="location"
THEN
<location-source, T1, T2>
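One way to read the two profiles above is as pattern rules over a sliding window of three terms (T1, preposition, T2). The sketch below follows that reading under stated assumptions: the `Term` class, its field names and `apply_profiles` are hypothetical, and how a real Kybot represents implied types and roles on KAF terms is not specified in the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    lemma: str
    type: str = ""                       # e.g. "location"
    implied_type: str = ""               # e.g. "change_of_location"
    implied_roles: set = field(default_factory=set)


# (preposition, required implied role on T1, relation to emit)
PROFILES = [
    ("to", "has-target", "location-target"),
    ("from", "has-source", "location-source"),
]


def apply_profiles(terms):
    """Emit (relation, T1, T2) for each T1 + preposition + T2 window that satisfies a profile."""
    facts = []
    for t1, prep, t2 in zip(terms, terms[1:], terms[2:]):
        for word, role, relation in PROFILES:
            if (
                prep.lemma == word
                and t1.implied_type == "change_of_location"
                and role in t1.implied_roles
                and t2.type == "location"
            ):
                facts.append((relation, t1.lemma, t2.lemma))
    return facts


if __name__ == "__main__":
    sentence = [
        Term("bird"),
        Term("migration", implied_type="change_of_location",
             implied_roles={"has-target", "has-source"}),
        Term("to"),
        Term("Humber Estuary", type="location"),
    ]
    print(apply_profiles(sentence))
```

Because the conditions test implied types and roles rather than surface lemmas, the same rule would fire for any term whose ontological implications include a change of location, which is the point of tagging terms before profiling.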
32. Kybot Knowledge Patterns
<events>
<event eid="e1" target="t2" lemma="feed" pos="V" tense="PAST"
aspect="NONE" polarity="POS"/>
<event eid="e2" target="t20" lemma="migrate" pos="V" tense="PRESENT"
aspect="NONE" polarity="POS"/>
<role rid="r1" event="e1" target="t1" rtype="agent"/>
<role rid="r2" event="e1" target="t3" rtype="patient"/>
<role rid="r3" event="e1" target="t9" rtype="theme"/>
<role rid="r4" event="e2" target="t21" rtype="agent"/>
<role rid="r5" event="e2" target="t22" rtype="source"/>
<role rid="r6" event="e2" target="t24" rtype="goal"/>
</events>
33. Conclusion: Should all knowledge be stored in the central ontology? Vocabularies are too large for full inferencing
Vocabularies are linguistically too diverse to be represented in an ontology
Inferencing capabilities of formal ontologies are not needed for all levels of knowledge
A model of division of labor (along the lines of Putnam 1975) in which knowledge is stored in 3 layers:
SKOS vocabularies and term databases
wordnet (WN-LMF)
ontology (OWL-DL)
Each layer supports different types of inferencing, ranging from SPARQL queries and graph algorithms to reasoning.
Mapping relations support the division of labor and the different types of inferencing, and allow for the encoding of language-specific lexicalizations and restrictions.
34. Conclusions Ontologies are abstract and minimal and lexicons are large and rich
Semantic relations in lexicons are complementary to ontological relations
Semantic relations expressed in a language system should be compatible with ontologies
Large vocabularies of types (rigid things in the world) can be mapped to the ontology through combinations of lexical relations and basic ontological mappings
Lexicalizations of contextual and subjective concepts need to be expressed through more complex relations
Equivalences across languages are expressed partially through ontological expressions and partially across lexicons
35. Applying WSD to terms
36. How to integrate the data? Species 2000 vocabulary: 2,171,281 concepts in MySQL database with parent relations:
Kingdom -> Phylum -> Class -> Order -> Family -> Genus -> Species -> Infra species
Animalia -> Chordata -> Amphibia -> Anura -> Leptodactylidae -> Eleutherodactylus -> Eleutherodactylus augusti
Converted to SKOS format
Aligned with DBPedia for language labels
Aligned with Wordnet using vocabulary and relation mappings
Published in Virtuoso, accessed with SPARQL queries
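The conversion of parent relations to SKOS can be pictured as turning each database row into a concept with a label and a `skos:broader` link to its parent. The sketch below assumes a simplified row layout `(concept_id, label, parent_id)` and invented identifiers such as `sp2000:7`; the actual Species 2000 schema and the project's conversion scripts are not shown in the slides, and plain tuples are used here instead of an RDF library.

```python
def to_skos_triples(rows):
    """Turn (concept_id, label, parent_id) rows into SKOS-style triples."""
    triples = []
    for concept_id, label, parent_id in rows:
        triples.append((concept_id, "rdf:type", "skos:Concept"))
        triples.append((concept_id, "skos:prefLabel", label))
        if parent_id is not None:
            triples.append((concept_id, "skos:broader", parent_id))
    return triples


def broader_chain(concept_id, triples):
    """Follow skos:broader links upward and return the labels that are passed."""
    broader = {s: o for s, p, o in triples if p == "skos:broader"}
    labels = {s: o for s, p, o in triples if p == "skos:prefLabel"}
    chain, seen = [], {concept_id}
    while concept_id in broader and broader[concept_id] not in seen:
        concept_id = broader[concept_id]
        seen.add(concept_id)
        chain.append(labels[concept_id])
    return chain


# The example chain from the slide, with hypothetical identifiers.
ROWS = [
    ("sp2000:1", "Animalia", None),
    ("sp2000:2", "Chordata", "sp2000:1"),
    ("sp2000:3", "Amphibia", "sp2000:2"),
    ("sp2000:4", "Anura", "sp2000:3"),
    ("sp2000:5", "Leptodactylidae", "sp2000:4"),
    ("sp2000:6", "Eleutherodactylus", "sp2000:5"),
    ("sp2000:7", "Eleutherodactylus augusti", "sp2000:6"),
]

if __name__ == "__main__":
    triples = to_skos_triples(ROWS)
    print(" -> ".join(broader_chain("sp2000:7", triples)))
```

Once published in a triple store, the same upward walk would be expressed as a query over `skos:broader` rather than as a Python loop.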
37. How to integrate data? Extending language labels using DBPedia
38. How to integrate data? Alignment of Species 2000 with wordnet Vocabulary match with Wordnet synsets
If polysemous then SSI-Dijkstra weighting of senses based on the hyperonym chain
Results still to be evaluated:
Animalia (animal:1) -> Chordata (chordate:1) -> Amphibia (amphibian:3) -> Anura -> Leptodactylidae -> Eleutherodactylus -> Eleutherodactylus augusti (barking frog:1)
39. How to integrate data? Alignment of terms with wordnet Word-sense-disambiguation is applied to terms in KAF (Kyoto Annotation Format)
Term hierarchy is extracted from KAF:
land:5
grassland:1 -> biome:1
woodland:1 -> biome:1
cropland
urban land
Results still to be evaluated: SemEval2010