
KYOTO (ICT - 211423) Overview

KYOTO (ICT-211423): Yielding Ontologies for Transition-Based Organization. FP7: Intelligent Content and Semantics. http://www.kyoto-project.eu/ Piek Vossen. Tienjarig jubileum NL-TERM (tenth anniversary of NL-TERM), October 2008, Amsterdam.


Presentation Transcript


  1. KYOTO (ICT-211423): Yielding Ontologies for Transition-Based Organization. FP7: Intelligent Content and Semantics. http://www.kyoto-project.eu/ Piek Vossen. Tienjarig jubileum NL-TERM (tenth anniversary of NL-TERM), October 2008, Amsterdam

  2. KYOTO (ICT-211423) Overview • Title: Yielding Ontologies for Transition-Based Organization • Funded: • 7th Framework Program-ICT of the European Union: Intelligent Content and Semantics • Taiwan and Japan funded by national grants • Goal: • Platform for knowledge sharing across languages and cultures • Enables knowledge transition and information search across different target groups, crossing linguistic, cultural and geographic boundaries • Open text mining and deep semantic search • Wiki environment that allows people in the field to maintain their knowledge and agree on meaning without knowledge engineering skills • URL: http://www.kyoto-project.eu/ • Duration: • March 2008 – March 2011 • Effort: • 364 person-months of work. Tienjarig jubileum NL-Term, 25 October 2008, Amsterdam

  3. Consortium • Vrije Universiteit Amsterdam (Amsterdam, The Netherlands) • Consiglio Nazionale delle Ricerche (Pisa, Italy) • Berlin-Brandenburg Academy of Sciences and Humanities (Berlin, Germany) • Euskal Herriko Unibertsitatea (San Sebastian, Spain) • Academia Sinica (Taipei, Taiwan) • National Institute of Information and Communications Technology (Kyoto, Japan) • Irion Technologies (Delft, The Netherlands) • Synthema (Rome, Italy) • European Centre for Nature Conservation (Tilburg, The Netherlands) • Subcontractors: • World Wide Fund for Nature (Zeist, The Netherlands) • Masaryk University (Brno, Czech Republic)

  4. KYOTO (ICT-211423) Overview • Languages: • English, Dutch, Italian, Spanish, Basque, Chinese, Japanese • Domain: • Environmental domain, BUT usable in any domain • Global: • Both European and non-European languages • Available: • Free: as open source system and data (GPL) • Future perspective: • Content standardization that supports worldwide communication • Global Wordnet Grid -> database that interlinks all wordnets in the world to a shared ontology of meaning

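  5. Wordnet = network of semantic relations between words in a language. [Diagram: fragment of the Dutch wordnet. Synsets: zieke, patiënt (sick person, patient); arts, dokter (physician, doctor); ziekte, stoornis (disease, disorder); chronisch zieke, langdurig zieke (chronically ill person, long-term ill person); psychisch/geestelijk zieke (mentally ill person); kinderarts (paediatrician); kind (child); genezen (to cure); behandelen (to treat); fysiotherapie, medicijnen, etc. (physiotherapy, medicines); ziekenhuis, etc. (hospital); maagaandoening, nieraandoening, keelpijn (stomach disorder, kidney disorder, sore throat). Relation labels: HYPONYM, STATE, ρ-AGENT, ρ-PATIENT, ρ-CAUSE, ρ-LOCATION, ρ-PROCEDURE, co-ρ-AGENT-PATIENT.]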

  6. [Diagram: KYOTO overview. User groups: citizens, governors, companies, environmental organizations. Components: Wikyoto, Capture, Concept Mining (Tybots), Fact Mining (Kybots), Index, Search, Dialogue. Sources: docs, URLs, experts, images. Knowledge resources: wordnets in the Global Wordnet Grid; universal ontology with levels Top, Middle and Domain and nodes Abstract, Physical, Process, Substance, water, CO2, water pollution, CO2 emission. Example fact: "Sudden increase of CO2 emissions in 2008 in Europe".]

  7. Lexicon versus Ontology • Ecosystem services • Nature as a resource • Nature for waste absorption • State of nature • Threats to nature [Diagram: ontology nodes Abstract, Physical, Artifacts, Organism, Element, Process, Spider, Roof, H2O, CO2, Physical Change; lexicon terms alien invasive species, green house gas, green roof, species migration, rural products, ecosystem-based drinking water production, sustainable products; edge labels: qualifies, type.]

  8. Concepts & Facts • Conceptual knowledge: general & generic knowledge about • ClimateChange • physical change • affecting the climate => definition of climate • in a region • during a period of time • caused by another change • causing yet other changes

  9. Concepts & Facts • Fact: • A case of ClimateChange has been observed: • factual and significant change in the climate (temperature, humidity, wind direction, rainfall, etc.) • in a particular region, e.g. the Alps • Time period • Caused by CO2 emissions, North Atlantic Gulf Stream • Causes decrease of biodiversity measured in specific populations: fish, birds, insects => counts of populations

  10. System architecture

  11. System components • Wikyoto = wiki environment for a social group: • to model the terms and concepts of a domain and agree on their meaning, within a group, across languages and cultures • to define the types of knowledge and facts of interest • Tybots = Term yielding robots, extract term data from a text corpus • Kybots = Knowledge yielding robots, extract facts from a text corpus • Linguistic processors: • tokenizers, segmenters, taggers, grammars • named entity recognition • word sense disambiguation • generate a layered text annotation in Kyoto Annotation Format (KAF)

  12. [Diagram: system architecture. Labels: Capture Server; Document Base (Linear KAF); Concept User; Tybot Server (TermExtraction); Semantic Annotation; Document Base (Linear KAF); ExtractedTerms (Generic K-TMF); TermEditor (Wikyoto); Domain Wordnet (K-LMF); Domain Ontology (OWL-DL); Fact User; KybotEditor; Kybot Profiles; Kybot Server (FactExtraction); Document Base (Linear Generic KAF).]

  13. What Tybots do... • Input: text documents • “Green house gases, such as CO2” • “CO2 and other green house gases” • Linguistic processors generate KAF annotation (sequential): • morpho-syntactic analysis • semantic roles • named entities • wordnet and ontology mappings • Output: term hierarchies in TMF (generic): • structural parent relations: “CO2 is a green house gas is a gas” • quantified structural and semantic relations • statistical data • generalized semantic mappings
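The structural parent relations on this slide can be illustrated with a small sketch. It is not the Tybot implementation: real Tybots operate on KAF annotation, whereas this example works on raw strings. The two surface patterns, the naive plural handling and the function names `extract_parent_relations` and `chain` are all assumptions made for illustration.

```python
import re

# Hypothetical surface patterns modelled on the two example phrases of the slide.
SUCH_AS = re.compile(r"^(?P<parent>.+?), such as (?P<child>.+)$", re.IGNORECASE)
AND_OTHER = re.compile(r"^(?P<child>.+?) and other (?P<parent>.+)$", re.IGNORECASE)


def normalise(term):
    """Lower-case the term and strip a plural '-es' after 's' (naive assumption)."""
    term = term.strip().lower()
    return term[:-2] if term.endswith("ses") else term


def head_parent(term):
    """Structural parent: the last word of a multi-word term, else None."""
    words = term.split()
    return words[-1] if len(words) > 1 else None


def extract_parent_relations(phrases):
    """Return a set of (child, parent) pairs found in the phrases."""
    relations = set()
    for phrase in phrases:
        match = SUCH_AS.match(phrase) or AND_OTHER.match(phrase)
        if not match:
            continue
        child = normalise(match.group("child"))
        parent = normalise(match.group("parent"))
        relations.add((child, parent))
        # Add the head-based parent of each multi-word term.
        for term in (child, parent):
            head = head_parent(term)
            if head:
                relations.add((term, head))
    return relations


def chain(term, relations):
    """Follow parent links upwards, assuming at most one parent per term."""
    parents = dict(relations)
    path = [term]
    while path[-1] in parents and parents[path[-1]] not in path:
        path.append(parents[path[-1]])
    return path


phrases = ["Green house gases, such as CO2", "CO2 and other green house gases"]
relations = extract_parent_relations(phrases)
print(" is a ".join(chain("co2", relations)))
```

Both example phrases yield the same child-parent pair, and the head of the multi-word term supplies the second link of the chain.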

  14. Generic algorithm • Extraction of a structural term hierarchy • Advantage: conceptual coherence • Steps: • extraction of potential terms using the morpho-syntactic structure • statistical selection of salient terms • conceptual selection of dominant terms • contextual selection of terms

  15. Terms from morpho-syntactic structure • Words that are the syntactic head of an NP, e.g.: card, wing-player • Word combinations (excluding determiners and adverbs) that include the syntactic head, e.g.: yellow card, yellow card for wing-player • The head of a compound: player as the head of wing-player, name as the head of username
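A minimal sketch of these three extraction rules, assuming the NP arrives as a list of (word, part-of-speech) pairs with a known head position. The tag names, the rule that a term may not start or end with a preposition, and the hyphen-only compound splitting are assumptions; a compound such as username would need a lexicon that is not modelled here.

```python
SKIP_POS = {"DET", "ADV"}   # excluded from word combinations, as on the slide
NO_EDGE_POS = {"ADP"}       # assumption: a term may not start or end with a preposition


def candidate_terms(tokens, head_index):
    """Return candidate terms for one NP.

    tokens: list of (word, pos) pairs; head_index: position of the syntactic head.
    """
    head = tokens[head_index][0]
    terms = [head]  # rule 1: the syntactic head itself

    # Rule 2: contiguous word combinations that include the head,
    # after dropping determiners and adverbs.
    kept = [(i, w, p) for i, (w, p) in enumerate(tokens) if p not in SKIP_POS]
    h = next(k for k, (i, _, _) in enumerate(kept) if i == head_index)
    for start in range(h + 1):
        for end in range(h, len(kept)):
            span = kept[start:end + 1]
            if len(span) < 2:
                continue
            if span[0][2] in NO_EDGE_POS or span[-1][2] in NO_EDGE_POS:
                continue
            terms.append(" ".join(w for _, w, _ in span))

    # Rule 3: head of a hyphenated compound.
    if "-" in head:
        terms.append(head.rsplit("-", 1)[1])
    return terms


np_tokens = [("the", "DET"), ("yellow", "ADJ"), ("card", "NOUN"),
             ("for", "ADP"), ("wing-player", "NOUN")]
print(candidate_terms(np_tokens, head_index=2))
print(candidate_terms([("wing-player", "NOUN")], head_index=0))
```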

  16. Statistical extraction of terms • Frequency of terms by distribution over reference corpus: • Salience = normFreq * normRef • Where normFreq = normalized frequency of terms on the website and normRef = normalized count of website occurrence in the reference corpus: • normFreq = nTermFrequencynWords / nPages • normRef = 1 - ((nWebsitesnWords) / (referenceCorpusSize))
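The salience product can be sketched as follows. The exact normalizations are not recoverable from the transcript, so this example assumes one plausible reading: normFreq is the term count divided by the number of words on the website, and normRef is one minus the term's share of the reference corpus. All parameter names are hypothetical.

```python
def norm_freq(term_count, site_words):
    """Assumed reading: relative frequency of the term on the website."""
    return term_count / site_words


def norm_ref(ref_count, ref_corpus_size):
    """Assumed reading: one minus the term's share of the reference corpus."""
    return 1.0 - ref_count / ref_corpus_size


def salience(term_count, site_words, ref_count, ref_corpus_size):
    """Salience = normFreq * normRef, as stated on the slide."""
    return norm_freq(term_count, site_words) * norm_ref(ref_count, ref_corpus_size)


def salient_terms(stats, site_words, ref_corpus_size, threshold):
    """Keep terms whose salience reaches the threshold, most salient first.

    stats maps each term to (count on the website, count in the reference corpus).
    """
    scored = {
        term: salience(tc, site_words, rc, ref_corpus_size)
        for term, (tc, rc) in stats.items()
    }
    kept = [term for term, score in scored.items() if score >= threshold]
    return sorted(kept, key=lambda term: -scored[term])


# Invented counts, for illustration only.
stats = {"co2 emission": (50, 100), "page": (50, 500_000)}
print(salient_terms(stats, site_words=10_000, ref_corpus_size=1_000_000, threshold=0.003))
```

With equal website frequency, the term that is rarer in the reference corpus receives the higher salience under this reading.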

  17. Statistical extraction of terms • Table 2: Salience filtering of terms

  18. Structure to relation table for terms

  19. [Diagram: concept mining by Tybots. Stages: Source Documents, Linguistic Processors (morpho-syntactic analysis), TYBOT Concept Miners, Synthesize, Ontologize, Axiomatize, Conceptual modeling. Source phrase: [[the emission]NP [of greenhouse gases]PP [in agricultural areas]PP]NP. Term hierarchy: area, agricultural area; emission, gas emission, CO2 emission; gas, greenhouse gas, CO2. English Wordnet synsets: location:3, region:3, geographical area:1, area:1, rural area:1, farmland:2; substance:1, gas:1, greenhouse gas:1; naturalprocess:1, emission:3, CO2 emission:2. Ontology nodes: Abstract, Physical, Process, Substance, Chemical Reaction, H2O, CO2, GreenhouseGas, GlobalWarming, CO2Emission, WaterPollution. Axioms: (instance s1 Substance) (instance e1 Warming) (katalyist s1 e1).]

  20. Wikyoto

  21. Facts in RDF • Wordnets in LMF • Ontologies in OWL-DL [Diagram: Wikyoto overview. Resources: G-WN, G-KON, SUMO, DOLCE, GEO, WIKIPEDIA, FRAMENET (plugins), pdf; DE-WN, DE-KON, DE-TN; Kyoto Server with Tybots (KAF) and Kybots (KAF, FactAF). Simplified Term Fragment: population, marine species, terrestrial species. Simplified Ontology Fragment: Group, ?Population. Smart KyText: ".... populations such as terrestrial and marine species ....."; ".... populations declined ....."; "terrestrial and marine species .. in forests ..... declined". Interview questions: Do populations consist of marine species? Are terrestrial species a type of populations? Do populations always consist of marine species? Are terrestrial species never marine species? Other labels: Hidden, Shown; A.. ... decline ... population ... ..Z.]

  22. Editing the domain wordnet

  23. G-WN: Synset: ENG20-07682918-n {population:2} a group of organisms of the same species populating a given area • SUMO: +inhabits -> +Group • Wiki: http://en.wikipedia.org/wiki/Population "In sociology and biology a population is the collection of inter-breeding organisms of a particular species." • 1. Validate Term Hierarchy: • Defining phrases: • document • domain corpus • Google • Other phrases • Wiki classes • Generic-WN classes • Smart KyText: ".... populations such as terrestrial and marine species ....." • Interview questions: Are terrestrial species a type of populations? Are terrestrial species never marine species? [Diagram: term hierarchy with the nodes group, population, species population, population of vertebrate species, terrestrial species population and marine species population, each marked as WN & DOC, WN & ¬DOC, ¬WN & DOC or ¬WN & ¬DOC; other labels: DE-WN, people, A.. ... decline ... population ... ..Z.]

  24. Difficult wordnet mapping • Wordnet synsets: • real property:1, real estate:1, realty:1 • object:1, physical object:1 • region:3 • administrative district:1, administrative division:1, territorial division:1 • land:1 • land:2, ground:7, soil:3 • land:3, dry land:1, earth:3, ground:1, solid ground:1, terra firma:1 • domain:2, demesne:2, land:4 • country:1, state:6, land:5 • biome:1 [Diagram: term hierarchy with the nodes land, cropland, urbanland, grassland, woodland, agricultural urban land and Mediterranean woodland, each marked as Wordnet & Doc, Wordnet & ¬Doc, ¬Wordnet & Doc or ¬Wordnet & ¬Doc.]
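The four markings used in the two preceding diagrams combine two membership tests: is the term in the wordnet, and does it occur in the document. A minimal sketch, in which the set contents are invented for illustration and do not reproduce the classification shown on the slides:

```python
def classify(term, wordnet_terms, document_terms):
    """Label a term by wordnet membership and document occurrence."""
    wn = "WN" if term in wordnet_terms else "not WN"
    doc = "DOC" if term in document_terms else "not DOC"
    return f"{wn} & {doc}"


# Hypothetical memberships, for illustration only.
wordnet_terms = {"land", "biome", "woodland"}
document_terms = {"land", "cropland"}

for term in ["land", "woodland", "cropland", "urban land"]:
    print(term, "->", classify(term, wordnet_terms, document_terms))
```

Terms that are absent from the wordnet but present in the document are the natural candidates for extending the domain wordnet.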

  25. Editing the domain ontology

  26. Ontologization of terms • If a domain term is a disjoint hyponym in the domain wordnet, it is propagated to the domain ontology as a new Type. • If a domain term is not a disjoint hyponym, no new ontology extension is proposed, but the term must still be mapped to the ontology, i.e. its ontological constraint is made explicit.
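The two cases can be written as a single decision function. This is a sketch under stated assumptions: the `Proposal` record, the type-naming helper and the `term -> +Type` mapping string are hypothetical, and the disjointness judgement is passed in as a flag rather than computed.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    kind: str        # "new_type" or "mapping"
    statement: str   # text shown to the editor (hypothetical format)


def type_name(term):
    """Turn 'CO2 emission' into 'CO2Emission' (assumed naming convention)."""
    return "".join(word[:1].upper() + word[1:] for word in term.split())


def ontologize(term, parent_type, is_disjoint_hyponym):
    """Propose a new ontology Type or a mapping to an existing Type."""
    if is_disjoint_hyponym:
        # Case 1: the term becomes a new Type below its ontology parent.
        return Proposal("new_type", f"(subclass {type_name(term)} {parent_type})")
    # Case 2: no new Type; only the constraint to the existing Type is recorded.
    return Proposal("mapping", f"{term} -> +{parent_type}")


print(ontologize("population", "Group", True))
print(ontologize("threatened species", "Species", False))  # invented example term
```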

  27. • 1. Validate Implied Ontological Constraints: • Generalize semantic relations • Interpret relation given ontology parent • Formulate interview using highlighted text • 2. Validate additional constraints: • Select dominant relations • Formulate interviews using highlighted text • SUMO axiom for Group (Hidden Data): (=> (and (instance ?GROUP Group) (member ?MEMB ?GROUP)) (instance ?MEMB Agent)) • Smart KyText: ".... populations of marine species ....."; ".... populations declined ....."; "terrestrial and marine species .. in forests ..... declined" • Interview questions: Do populations consist of marine species? Do populations always consist of marine species? Can populations decline? Do populations always decline? Are populations located in forests? Are populations always located in forests? [Diagram: term hierarchy group, population, species population, population of vertebrate species, terrestrial species population, marine species population; ontology nodes Group, ?Population; other labels: DE-ON, DE-WN, people, A.. ... decline ... population ... ..Z.]
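The interview questions on this slide follow a regular pattern: one question asks whether a relation is possible, a second asks whether it always holds. A template-based sketch, in which the relation kinds and the template table are assumptions and not the Wikyoto implementation:

```python
# Hypothetical templates: (possibility question, necessity question) per relation kind.
TEMPLATES = {
    "member": ("Do {s} consist of {o}?", "Do {s} always consist of {o}?"),
    "location": ("Are {s} located in {o}?", "Are {s} always located in {o}?"),
    "process": ("Can {s} {o}?", "Do {s} always {o}?"),
}


def interview_questions(subject_plural, relation_kind, obj):
    """Return the pair of interview questions for one candidate relation."""
    possible, always = TEMPLATES[relation_kind]
    return [possible.format(s=subject_plural, o=obj),
            always.format(s=subject_plural, o=obj)]


candidates = [
    ("populations", "member", "marine species"),
    ("populations", "process", "decline"),
    ("populations", "location", "forests"),
]
for candidate in candidates:
    for question in interview_questions(*candidate):
        print(question)
```

A "yes" to the first question and a "no" to the second would correspond to a possible relation, which the later slides mark with an asterisk.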

  28. Derived hidden structures • New constraint Population in DE-ON: (subclass Population Group) (=> (and (instance ?POP Population) (member ?MEMB ?POP)) (instance ?MEMB Species)) • Extended constraint Population in DE-ON, where * indicates a possible relation: (subclass Population Group) (=> (and (instance ?POP Population) (member ?MEMB ?POP)) (and (instance ?MEMB Species) (*instance ?REGION Region) (*inhabits ?MEMB ?REGION) (*location ?MEMB ?REGION)))

  29. Cross-lingual validation • Population is added by Group-1, with constraints derived from language L1 • Group-2 uses language L2 and observes a domain Type in the domain ontology with an English gloss, description -> possibly proposed through WSD • Select/confirm existing domain type as a candidate for validation • Smart KyText in language L2 and the Term hierarchy are used to generate questions in L2 • Group-2 can confirm or deny constraints for L2 and add new constraints • Cross-lingual and cross-group validation is added to the constraints in the ontology

  30. Cross validated structures • Population in DE-ON: (subclass Population Group) (=> (and (instance ?POP Population) (member ?MEMB ?POP)) (and (instance ?MEMB Species (xval G1-ENG G2-NLD G3-NLD G4-ITA)) (instance ?REGION Region (xval G1-ENG G2-NLD)) (*inhabits ?MEMB ?REGION (xval G3-NLD)) (*location ?MEMB ?REGION (xval G1-ENG G4-ITA))))
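One way to picture the xval annotations is as a list of validating groups attached to each constraint. The sketch below assumes group identifiers of the form G1-ENG (group number, language code) as on the slide; the `Constraint` class and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Constraint:
    relation: str
    args: tuple
    possible: bool = False                 # rendered with a leading '*'
    validated_by: list = field(default_factory=list)

    def confirm(self, group):
        """Record that a group has confirmed this constraint."""
        if group not in self.validated_by:
            self.validated_by.append(group)

    def languages(self):
        """Languages in which the constraint was confirmed."""
        return sorted({group.split("-", 1)[1] for group in self.validated_by})

    def render(self):
        """Render in the notation used on the slide."""
        star = "*" if self.possible else ""
        text = f"({star}{self.relation} {' '.join(self.args)}"
        if self.validated_by:
            text += f" (xval {' '.join(self.validated_by)})"
        return text + ")"


inhabits = Constraint("inhabits", ("?MEMB", "?REGION"), possible=True)
inhabits.confirm("G3-NLD")
print(inhabits.render())

species = Constraint("instance", ("?MEMB", "Species"))
for group in ["G1-ENG", "G2-NLD", "G3-NLD", "G4-ITA"]:
    species.confirm(group)
print(species.render(), species.languages())
```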

  31. [Diagram: system architecture, repeated from slide 12.]

  32. What Kybots do • Input: • KAF annotations of text: sequential & encoded by language • Conceptual frame from the ontology • Expression rules for frame to language mapping: • Wordnet in a language • Morpho-syntactic mapping rules • Output: a database of facts in KAF/FactAF (generic): • aggregated facts • inferred facts • language neutral

  33. Fact mining • KYBOT = Knowledge Yielding Robot • Logical expression • (instance, e1, Burn) (instance, e2, Warming) (cause, e1, e2) • (instance, s1, CO2) (instance, e1, GlobalWarming) (katalyist, s1, e1) • Expression rules per language: • [N[s1]V[e1]]S • [N[e1]N[s1]]N • [[N[e1]][prep][N[s2]]]NP • Ontology * Wordnets • Capabilities • Conditions: WNT -> adjectives, WNT -> nouns • Causes: WNT -> verbs, WNT -> nouns • Process: DamageProcess, ProduceProcess • Kybot compiler • kybots = logical pattern + ontology + WN[Lx] + ER[Lx]
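The composition of a kybot from a logical pattern, an expression rule and a word-to-ontology mapping can be sketched as below. The data layout is an assumption: the expression rule [N[s1]V[e1]]S is encoded as a list of (part of speech, variable) pairs, the lexicon stands in for the wordnet-to-ontology mapping, and its entries are invented. The function `apply_kybot` is hypothetical and not part of the KYOTO system.

```python
# Expression rule [N[s1]V[e1]]S as (part of speech, variable) pairs.
RULE = [("N", "s1"), ("V", "e1")]

# Logical pattern taken from the slide (relation, argument, argument or type).
PATTERN = [
    ("instance", "s1", "CO2"),
    ("instance", "e1", "GlobalWarming"),
    ("katalyist", "s1", "e1"),
]

# Invented stand-in for wordnet + ontology mappings of one language.
LEXICON = {"co2": "CO2", "warms": "GlobalWarming"}


def apply_kybot(tokens, rule, pattern, lexicon):
    """Return variable bindings for the first window matching rule and pattern.

    tokens: list of (word, pos) pairs. Returns None when nothing matches.
    """
    for start in range(len(tokens) - len(rule) + 1):
        window = tokens[start:start + len(rule)]
        # The window must have the parts of speech required by the rule.
        if any(pos != rule_pos for (_, pos), (rule_pos, _) in zip(window, rule)):
            continue
        bindings = {var: word for (word, _), (_, var) in zip(window, rule)}
        # Every 'instance' condition must agree with the lexicon.
        if all(lexicon.get(bindings[var].lower()) == onto_type
               for relation, var, onto_type in pattern
               if relation == "instance"):
            return bindings
    return None


print(apply_kybot([("Today", "ADV"), ("CO2", "N"), ("warms", "V")], RULE, PATTERN, LEXICON))
print(apply_kybot([("Water", "N"), ("flows", "V")], RULE, PATTERN, LEXICON))
```

Swapping in a different lexicon and rule set for another language while keeping the same logical pattern mirrors the formula kybots = logical pattern + ontology + WN[Lx] + ER[Lx].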

  34. Domain fact mining by Kybots • Source Documents -> Linguistic Processors -> Morpho-syntactic analysis (KAF): [[the emission]NP [of greenhouse gases]PP [in agricultural areas]PP]NP • Logical Expressions -> Generic Fact analysis: • [the emission]NP = Process: e1 • [of greenhouse gases]PP = Patient: s2 • [in agricultural areas]PP = Location: a3 • Resources: Ontology, Wordnets & Linguistic Expressions [Diagram: ontology nodes Abstract, Physical, Process, Substance, Chemical Reaction, H2O, CO2, with the terms CO2 emission and water pollution linked through Patient relations.]
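The role assignment for this example phrase can be imitated with a small lookup on the leading preposition of each chunk. This is only a sketch: the preposition-to-role table is invented for this one phrase, and the variable prefixes e, s and a simply copy the labels e1, s2 and a3 from the slide.

```python
# Hypothetical mapping from the leading preposition to (role, variable prefix).
PREPOSITION_ROLES = {"of": ("Patient", "s"), "in": ("Location", "a")}


def analyse(chunks):
    """Assign a role and a variable to each chunk of a chunked noun phrase.

    chunks: list of (text, label) pairs, where label is "NP" or "PP".
    """
    facts = []
    for number, (text, label) in enumerate(chunks, start=1):
        if label == "NP":
            role, prefix = "Process", "e"   # assumption: the head NP names the process
        else:
            first_word = text.split()[0].lower()
            if first_word not in PREPOSITION_ROLES:
                continue                     # no rule for this preposition
            role, prefix = PREPOSITION_ROLES[first_word]
        facts.append((role, f"{prefix}{number}", text))
    return facts


chunks = [("the emission", "NP"),
          ("of greenhouse gases", "PP"),
          ("in agricultural areas", "PP")]
for fact in analyse(chunks):
    print(fact)
```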

  35. [Diagram: summary of the processing chain. Text corpus (linear text): "Sudden increase of green house gases in 2003 ........ CO2 emission in European countries .... Green house gases such as CO2, ....". Concept Mining by Tybots yields the term database (generic, text based): gas; green house gas -> gas; CO2 -> green house gas. Text mining by Kybots yields: increase (AG), in 2003 (TIME); emission (PA), in European countries (LO). Synthesize: lexical database (wordnet; language neutral integrity) with substance:1, gas:1, greenhouse gas:1, natural process:1, emission:3, CO2 emission:2. Ontologize: ontology (maximal abstraction & integrity) with Abstract, Physical, Process, Substance, ChemicalReaction, H2O, CO2, Greenhouse Gas, CO2 Emission, Global Warming. Axiomatize: (instance s1 Substance) (instance e1 Warming) (katalyist s1 e1).]

  36. Thank you for your attention
