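KYOTO (ICT-211423): Knowledge Yielding Ontologies for Transition-Based Organization. FP7: Intelligent Content and Semantics. http://www.kyoto-project.eu/ Chu-Ren Huang 黃居仁, Academia Sinica. Information Session on Calls for Proposals in the Humanities and Social Sciences Area of the EU Framework Programme (EU-FP7 SSH), 2009.12.30. National Sun Yat-sen University, National Contact Point for Humanities and Social Sciences in the EU Framework Programme. December 30 2009, NSYSU, Kaohsiung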
Overview • History: Background information • What is KYOTO • Personal Journey: Building an internationally recognized career on Taiwan-based research • Key Perspectives: • Global View • Integrative Thinking/Opposable Mind
History: Background information
Pre-History • Pioneering Chinese Language Resources and Language Processing: since 1988 • Construction of WordNet – Since 2000 • Organized COLING: 2002 • ISLE: International Standards in Language Engineering 2000-2002 • EC (ISLE – IST-1999-10647)+NSF+Asia
Brief History of KYOTO • January 2006: Concept of Global WordNet Grid • 2006: discussion of possibilities • January 2007: Meeting in Kyoto (Amsterdam, Princeton/Berlin, Pisa, Kyoto, Taipei) • Identify the FP7 call to submit to • Identify ecology/environment as the domain
Application Timeline I (2007) • feb-15: General comments • feb-15: Contact end-users • feb-22: Find out the possibilities for non-European partners • feb-22: Determine the final consortium, based among other things on the outcome of 2. • feb-28: Determine the details (part A of the proposal) required from the EU for each partner
Application Timeline III (2007) • mar-apr: Revising and finalizing the proposal • May 10: Formal submission acknowledged • July 15: Review result • 8 out of 45 projects passed review • Call ID: FP7-ICT-2007-1 • Instrument: CP-FP-INFSO • Title: Knowledge Yielding Ontologies for Transition-based Organization
Application Timeline II (2007) • apr-13: Collect all forms (part A of the proposal) and signatures from the partners (PISA, AMSTERDAM) • apr-13: Finalize the proposal part B (PISA, AMSTERDAM) • may-02: Submit proposal part A and B (PISA, AMSTERDAM)
What is KYOTO
KYOTO (ICT-211423) Overview • Title: Knowledge Yielding Ontologies for Transition-Based Organization • Funded: • 7th Framework Program-ICT of the European Union: Intelligent Content and Semantics • Taiwan and Japan funded by national grants • Goal: • Open and free platform for knowledge sharing across languages and cultures • Wiki environment that allows people in the field to maintain their knowledge and agree on meaning without knowledge engineering skills • Bootstrap through open text mining & concept learning • Enables knowledge transition and information search across different target groups, transgressing linguistic, cultural and geographic boundaries. • Enables deep semantic search for facts and knowledge • URL: http://www.kyoto-project.eu/ • Duration: • March 2008 – March 2011 • Effort: • 364 person months of work.
Consortium • Vrije Universiteit Amsterdam (Amsterdam, The Netherlands), • Consiglio Nazionale delle Ricerche (Pisa, Italy), • Berlin-Brandenburg Academy of Sciences and Humanities (Berlin, Germany), • Euskal Herriko Unibertsitatea (San Sebastian, Spain), • Academia Sinica (Taipei, Taiwan), • National Institute of Information and Communications Technology (Kyoto, Japan), • Irion Technologies (Delft, The Netherlands), • Synthema (Rome, Italy), • European Centre for Nature Conservation (Tilburg, The Netherlands), • Subcontractors: • World Wide Fund for Nature (Zeist, The Netherlands), • Masaryk University (Brno, Czech Republic)
KYOTO (ICT-211423) Overview • Languages: • English, Dutch, Italian, Spanish, Basque, Chinese, Japanese • Domain: • Environmental domain, BUT usable in any domain • Global: • Both European and non-European languages • Available: • Free: as open source system and data (GPL) • Future perspective: • Content standardization that supports world wide communication
The Taiwan Team • PI: Chu-Ren Huang • Co-I: Jason S. Chang (NTHU), Shu-Kai Hsieh (NTNU), Sue-jin Ker (SCU) • Other Participants: Kathleen Ahrens (NTU), Ya-min Chou (MCU), Shu-chuan Tseng (AS) • Funded: by NSC
Background: Multilingualism’s Challenges to Human Language Technology (HLT) • The scaling up of language resources in a complex and distributed environment • Language resources are inherently distributed • Language resources are best created and updated where the language is spoken and by people who speak it: human expertise, keeping up with linguistic change • Impractical to maintain all language resources at the same site: huge quantity, rights issues
Multilingualism: Challenges to HLT II • The scaling up of language resources in a complex and distributed environment • To overcome linguistic diversity in order to support shared tasks and applications: web search etc. • To create synergy of information from different languages • To function as a foundation of inter-cultural collaboration
Proposed Answer to the Challenge: Wordnet as shared language resource • Wordnet: a concept-driven and relation-based lexical knowledge base • About 40 language wordnets have been built • Sharing basic representation of meaning (synset indexes), which is mapped to an upper ontology (SUMO, among others) • Sharing a (universal) set of lexical semantic relations • Information can be exchanged using the same format regardless of source language
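A minimal sketch of why a shared synset index lets information be exchanged regardless of source language. The two toy wordnets, the synset identifiers (`syn-frog`, `syn-water`) and the function name `translate` are invented for illustration and are not taken from any actual wordnet or from the KYOTO system.

```python
# Toy wordnets: each maps a word to the set of shared synset IDs it expresses.
# All identifiers and entries are hypothetical.
ENGLISH = {"frog": {"syn-frog"}, "toad": {"syn-frog"}, "water": {"syn-water"}}
DUTCH = {"kikker": {"syn-frog"}, "water": {"syn-water"}}


def translate(word, source, target):
    """Return target-language words that share at least one synset with `word`."""
    synsets = source.get(word, set())
    return sorted(w for w, ids in target.items() if ids & synsets)


print(translate("kikker", DUTCH, ENGLISH))
```

Because both wordnets point at the same synset identifiers, neither needs to know anything about the other's vocabulary; the lookup goes through the shared index only.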
Proposed Answer to the Challenge: Wordnets as Web Services • Wordnets are distributed, just like grid nodes • Each wordnet site will be a grid node • Each will be a natural host for language-related information services based on wordnet • Including any meta-NLP task: bootstrapping wordnets, harmonizing ontologies, building bilingual lexica, supporting cross-lingual alignments, etc. • And applications: multilingual query expansion, second language e-learning, machine translation, etc.
The Global Wordnet Grid • First discussed at the 3rd GWA conference at Jeju, Korea in February 2006, by Chu-Ren Huang, Adam Pease, and Piek Vossen, among others • A call for contributions can be found on the GWA website http://www.globalwordnet.org/gwa/gwa_grid.htm • Small-scale experiment being carried out by the ILC-CNR (Italy) and Academia Sinica (Taiwan) teams • Soria et al. (2006) • Planned strategic session in January 2007 in Kyoto
Baseline retrieval results • 6 persons, 30 high-level questions
KYOTO's Solution • Text mining: • Massive and accurate indexing of facts from vast amounts of text; • In any language/culture from scattered sources; • Again and again to detect trends and changes; • Direct relation between knowledge modeling effort and text mining • Knowledge modeling: • automatic learning of terms and concepts from text in any language; • formalization of knowledge in computer usable format -> wordnets & ontologies • Community software: • For experts in the field and not knowledge engineers • Continuous and collaborative effort: • adapt to the changing domain; • consensus in the field; • consensus across languages and cultures • Produce interoperable, formal, standardized knowledge structures; • Relate knowledge structure to expressions in languages
Distributed, diverse & dynamic data [Figure: KYOTO workflow in six numbered steps] • Sources and users: citizens, governments, companies, environmental organizations • Capture text: "Sudden increase of CO2 emissions in 2008 in Europe" • Tybot (term yielding robot) extracts terms, e.g. "CO2 emission" • Wikyoto: users maintain terms & concepts, linked to wordnets and an ontology with Top, Middle and Domain levels (labels shown: Abstract, Physical, Process, Substance, H2O, CO2, Pollution, Greenhouse Gas, CO2 Emission) • Kybot (knowledge yielding robot) indexes facts: Process: Emission; Involves: CO2; Property: increase, sudden; When: 2008; Where: Europe • Text & fact index • Semantic search
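The fact that Kybot indexes for the example sentence can be pictured as a small structured record. The sketch below is only an illustration: the `Fact` class, its field names and the `search` helper are hypothetical, and the record is entered by hand rather than mined from text.

```python
from dataclasses import dataclass, field


@dataclass
class Fact:
    # Field names mirror the labels on the slide; the class itself is hypothetical.
    process: str
    involves: str
    properties: list = field(default_factory=list)
    when: str = ""
    where: str = ""


# The fact for "Sudden increase of CO2 emissions in 2008 in Europe", typed in by hand.
FACTS = [Fact("Emission", "CO2", ["increase", "sudden"], "2008", "Europe")]


def search(facts, **criteria):
    """Return the facts whose fields equal every given criterion."""
    return [f for f in facts
            if all(getattr(f, key) == value for key, value in criteria.items())]


print(search(FACTS, process="Emission", where="Europe"))
```

Searching over fields such as `process` and `where`, rather than over the raw sentence, is what distinguishes a fact index from a plain text index.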
Available data repositories • Open data project: • DBPedia: 2.6 million things, including at least 213,000 persons, 328,000 places, 57,000 music albums, 36,000 films, 20,000 companies. The knowledge base consists of 274 million pieces of information (RDF triples). • GeoNames • Domain database Species 2000: 2.1 million species • Term database: 500,000 terms per 10,000 documents per language • Wordnets for 7 languages: about 50,000 to 120,000 synsets per language • Ontologies: EuroWordNet top ontology, SUMO, DOLCE
How to integrate the data? • Species 2000 vocabulary: 2,171,281 concepts in a MySQL database with parent relations: • Kingdom -> Phylum -> Class -> Order -> Family -> Genus -> Species -> Infra species • Animalia -> Chordata -> Amphibia -> Anura -> Leptodactylidae -> Eleutherodactylus -> Eleutherodactylus augusti • Converted to SKOS format • Aligned with DBPedia for language labels • Aligned with Wordnet using vocabulary and relation mappings • Published in Virtuoso, accessed with SPARQL queries
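As a rough sketch of the conversion step, the lineage from the slide can be turned into SKOS statements, with one `skos:prefLabel` per concept and one `skos:broader` link to its parent. The namespace `http://example.org/species/` and the helper names are assumptions made for illustration; the project's actual URIs and conversion code are not shown in the slides.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/species/"  # hypothetical namespace, not the project's

LINEAGE = ["Animalia", "Chordata", "Amphibia", "Anura",
           "Leptodactylidae", "Eleutherodactylus", "Eleutherodactylus augusti"]


def uri(name):
    """Build a bracketed URI for a taxon name (spaces become underscores)."""
    return "<" + BASE + name.replace(" ", "_") + ">"


def lineage_to_triples(lineage):
    """Emit N-Triples lines: a prefLabel per concept, skos:broader to its parent."""
    triples = []
    for i, name in enumerate(lineage):
        triples.append(f'{uri(name)} <{SKOS}prefLabel> "{name}" .')
        if i > 0:
            triples.append(f"{uri(name)} <{SKOS}broader> {uri(lineage[i - 1])} .")
    return triples


for line in lineage_to_triples(LINEAGE):
    print(line)
```

Once the parent relations are expressed as `skos:broader`, a triple store can answer ancestry questions with SPARQL instead of repeated SQL joins.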
How to integrate data? Extending language labels using DBPedia
Kyoto Knowledge Base [Figure: diagram with nodes labelled Terms (T, 500K), Vocabularies (V, 2,100K), Domain wordnets (Wn), DBPedia, Base concepts, and Ontology (DOLCE/OntoWordnet)]
Should all knowledge be stored in the central ontology? • Vocabularies are too large for full inferencing • Vocabularies are linguistically too diverse to be represented in an ontology • The inferencing capabilities of formal ontologies are not needed for all levels of knowledge • A model of division of labor (along the lines of Putnam 1975) in which knowledge is stored in 3 layers: • SKOS vocabularies and term databases • wordnet (WN-LMF) • ontology (OWL-DL) • Each layer supports different types of inferencing, ranging from SPARQL queries and graph algorithms to reasoning. • Mapping relations support the division of labor and the different types of inferencing, and allow for the encoding of language-specific lexicalizations and restrictions.
What does the computer need to know? • Distinction between rigid and non-rigid (Welty & Guarino 2002): • being a "cat" is essential to an individual's existence and therefore rigid • being a "pet" is a temporary role and therefore non-rigid; a cat can become a pet and stop being a pet without ceasing to exist • Felix is born as a cat and will always be a cat, but during some period Felix can become a pet and stop being a pet while it continues to exist • All 2.1 million species are rigid concepts
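One way to picture the rigid/non-rigid distinction in code, as a sketch only: the rigid type is fixed when the individual is created and cannot be reassigned, while roles can be added and dropped freely. The `Individual` class and its attribute names are hypothetical and are not part of any ontology formalism mentioned in the slides.

```python
class Individual:
    """Toy model: one fixed (rigid) type plus a changeable set of roles."""

    def __init__(self, name, rigid_type):
        self.name = name
        self._rigid_type = rigid_type  # fixed for the individual's whole existence
        self.roles = set()             # non-rigid: may change over time

    @property
    def rigid_type(self):
        # Read-only: no setter is defined, so reassignment raises AttributeError.
        return self._rigid_type


felix = Individual("Felix", "cat")
felix.roles.add("pet")      # Felix becomes a pet
felix.roles.discard("pet")  # Felix stops being a pet but still exists
print(felix.rigid_type, felix.roles)
```

Felix remains a `cat` throughout, which is the sense in which species concepts are rigid and roles such as "pet" are not.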
What does the computer need to know? • Roles and processes in documents have more information value than the defining properties of species: • Species are defined in terms of physical properties already known to experts; • Roles such as "invasive species", "migration species", "threatened species" express THE important properties of instances of species • Telicity: Roles, not species, are typically the terms we learn from the text!
Division of labor in knowledge sources • SKOS database (2.1 million species): Animalia -> Chordata -> Amphibia -> Anura -> Leptodactylidae -> Eleutherodactylus -> Eleutherodactylus atrabracus, Eleutherodactylus augusti (barking frog) • Term database (500,000 terms): endemic frog, endangered frog, poisonous frog, alien frog • Wordnet (100,000 synsets): animal:1 (Base Concept) -> chordate:1 -> {vertebrate:1, craniate:1} -> amphibian:3 -> {frog:1, toad:1, toad frog:1, anuran:1, batrachian:1, salientian:1} • Ontology (1,000 types): endurant -> physical-endurant -> physical-object
Wordnet-ontology-relations • Rigid synsets: • Synset:Endurant; Synset:Perdurant; Synset:Quality: • sc_equivalenceOf or sc_subclassOf • Non-rigid synsets: • Synset: Role • sc_domainOf: range of ontology types that restricts a role • sc_playRole: role that is being played
Lexicalization of process-related concepts • {create, produce, make} Verb, English -> sc_equivalenceOf ConstructionProcess • {artifact, artefact} Noun, English -> sc_domainOf PhysicalObject -> sc_playRole ConstructedRole • {kunststof} Noun, Dutch (lit. artifact substance) -> sc_domainOf AmountOfMatter -> sc_playRole ConstructedRole • {meat} Noun, English -> sc_domainOf Cow, Sheep, Pig -> sc_playRole EatenRole • {名 肉, 食物, 餐} Noun, Chinese -> sc_domainOf Cow, Sheep, Pig, Rat, Mole -> sc_playRole EatenRole • {غذاء, لحم, طعام} Noun, Arabic -> sc_domainOf Cow, Sheep -> sc_playRole EatenRole
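The three "meat" entries can be read as language-specific restrictions on one shared role. The sketch below encodes them as plain dictionaries and checks whether an ontology type falls within a language's domain for `EatenRole`. The dictionary layout and the function `can_play_role` are assumptions for illustration; the check is an exact type match and does no subclass reasoning.

```python
# Language-specific role synsets, copied from the slide's "meat" examples.
# The dictionary layout is hypothetical; only the relation names come from the slide.
MEAT_SYNSETS = {
    "English": {"sc_domainOf": {"Cow", "Sheep", "Pig"},
                "sc_playRole": "EatenRole"},
    "Chinese": {"sc_domainOf": {"Cow", "Sheep", "Pig", "Rat", "Mole"},
                "sc_playRole": "EatenRole"},
    "Arabic": {"sc_domainOf": {"Cow", "Sheep"},
               "sc_playRole": "EatenRole"},
}


def can_play_role(language, ontology_type, role="EatenRole"):
    """True if the language's synset lets `ontology_type` play `role`."""
    entry = MEAT_SYNSETS[language]
    return entry["sc_playRole"] == role and ontology_type in entry["sc_domainOf"]


for lang in MEAT_SYNSETS:
    print(lang, can_play_role(lang, "Pig"))
```

The role itself is shared across languages; only the `sc_domainOf` sets differ, which keeps culture-specific knowledge in the wordnet layer rather than in the ontology.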
How to make inferences? • SPARQL queries to large Virtuoso databases: Aligned Species 2000, DBPedia • SQL queries to term database • Graph matching on wordnets • Reasoning on a small ontology
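As an illustration of the wordnet layer only, the simplest graph operation is to follow hypernym links upward. The sketch below uses the synset labels from the frog example in this deck; the `HYPERNYM` table and the two helper functions are hypothetical stand-ins, not KYOTO's actual graph-matching code.

```python
# Hypernym links for the frog example; each synset points to its parent synset.
HYPERNYM = {
    "frog:1": "amphibian:3",
    "amphibian:3": "vertebrate:1",
    "vertebrate:1": "chordate:1",
    "chordate:1": "animal:1",
}


def hypernym_chain(synset):
    """Follow hypernym links upward until a synset has no recorded parent."""
    chain = [synset]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def is_a(synset, ancestor):
    """True if `ancestor` is the synset itself or lies on its hypernym chain."""
    return ancestor in hypernym_chain(synset)


print(hypernym_chain("frog:1"))
print(is_a("frog:1", "animal:1"))
```

A lookup like this needs no reasoner, which is the point of the layered design: the heavier OWL-DL reasoning is reserved for the small ontology.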
Semantic Search (skipped)
The core Kyoto system is distributed under the free open source license (GPL)
Personal Journey: Building an internationally recognized career on Taiwan-based research
Pre-History • Pioneering Chinese Language Resources and Language Processing: since 1988 • Construction of WordNet – Since 2000 • Organized COLING: 2002 • ISLE: International Standards in Language Engineering 2000-2002 • EC (ISLE – IST-1999-10647)+NSF+Asia
Key Perspectives: • Global View • Integrative Thinking/Opposable Mind
Global View • Think and Act Globally • Put what is good for the world before what is good for Taiwan • What is good for the world must be good for Taiwan, but what is good for Taiwan (thinking parochially) may not be good for the world • Hence it cannot be supported by other partners -> CANNOT be done -> NOT GOOD for Taiwan
Think Globally • Research Direction: Think of Global Impact • Not of local ranking • Find your own niche: better to be the head of a chicken than the tail of an ox (寧為雞首,不為牛後) • Think of the scale of Taiwan • And act strategically • Contributing Team Partner vs. Team Leader: Choose the RIGHT team, NOT my team
Integrative Thinking/Opposable Mind • Create a Win-Win Situation out of a Zero-Sum Game • The Opposable Mind (Roger Martin 2007) • The Design of Business: Why Design Thinking is the Next Competitive Advantage (Martin 2009)
In Sum: befriend the upright, the trustworthy, and the well-informed (友直, 友諒, 友多聞)