Text-based Construction and Comparison of Domain Ontology: A study based on classical poetry


Presentation Transcript


  1. Text-based Construction and Comparison of Domain Ontology: A study based on classical poetry Chu-Ren Huang Academia Sinica Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

  2. Outline • Motivation and Framework: Laying the foundation • Basic Resources: The building blocks • From General Ontology to Specific Ontology: Study of Su Shi's Poems • Epilogue: From Specific Ontology to General Ontology • Conclusion

  3. Motivation and Framework: Laying the foundation • Knowledge Structure Discovery • Issues and Significance

  4. Knowledge and Knowledge Structure Variation • Knowledge is structured information • The most salient factors dictating variation in knowledge structures are time, space, and domain • Language is both the product and the conduit of the conceptual structure of its speakers

  5. Knowledge and Structure Mismatch: a historical example • 盧家少婦鬱金香,海燕雙棲玳瑁梁。 (from 300 Tang Poems) • Tulips (鬱金香) in Tang? • No, the text refers to the fragrance of a ginger-like herb, 鬱金 • ‘Young lady Lu, as fresh and fragrant as ginger grass, looks on the pair of seagulls resting on the beam inlaid with sea turtle shells.’

  6. Accessing Knowledge Structure • In order to become sharable and reusable knowledge, all extracted information must first be correctly situated in a knowledge structure • The situated information must be allowed to transfer from knowledge structure to knowledge structure without losing its meaningful content

  7. Research Goal • Knowledge Structure Discovery • Knowledge as situated information • Language endows information with structure • Text-based and Lexicon-driven Knowledge Structure Discovery • General Ontology: the upper ontology shared by all domains (such as SUMO) • Specific Ontology: an ontology specific to a domain, a historical period, an author, etc.

  8. Research Issues • Identification of Conceptual Atoms • Re-construction and Verification of Conceptual Structure • Knowledge Processing with Mismatched Knowledge Structures

  10. Knowledge Inferred from the Ontology of Tang Animals • No marsupials: found only in Australia, and discovered only much later • No marine mammals: Tang civilization's activities stayed mainly on land, as the dominance of hoofed animals also suggests (fascination with horses?) • Large number of birds among vertebrates, and dominance of insects 昆蟲 among invertebrates 無脊椎動物: this suggests Tang civilization's fascination with flying [Birds fly, and insects are the invertebrates that have wings.]

  11. Research Methodology • The Mental Lexicon Approach • The Shakespearean-garden Approach • The Ontology-merging as Ontology-discovery Approach

  12. The Mental Lexicon Approach • Concepts are stored in the mental lexicon • The basic unit of mental lexicon organization and access is the lexical entry • A complete list of lexical entries covers the complete list of conceptual atoms • Lexical semantic relations mirror conceptual relations • Each word is a conceptual atom

  13. The Shakespearean-garden Approach • A Shakespearean garden collects all the plants referred to in Shakespearean texts • The garden is used to illustrate the flora of Shakespearean England and gives scholars a context in which to interpret his work • There is a knowledge structure behind each corpus (i.e. a collection of texts with design criteria), for instance the complete set of texts by an author, from a certain period, or in a certain domain • Lexicon as a structured inventory of conceptual atoms

  14. The Ontology-merging as Ontology-discovery Approach I • An ontology provides a structure in which knowledge can be situated • However, constructing a new ontology poses a dilemma • If no existing ontology is referred to: the wheel is reinvented, and it is difficult to start a structure from scratch without rules • If an existing ontology is referred to: the new ontology may be misled by an existing structure that is mismatched or erroneous

  15. The Ontology-merging as Ontology-discovery Approach II: The Solution • Map conceptual atoms to two (or more) reference ontologies • Merge the two resultant ontologies • Matched Mapping: confirms the knowledge structure • Mismatched Mapping: only one mapping, or neither, is correct; may lead to the discovery of a new knowledge structure • Complementary Mapping: increases coverage
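
The mapping step on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `classify_mappings`, the concept labels, and the toy mappings are hypothetical, and the sketch assumes the two reference ontologies have already been aligned to a shared set of concept labels so that the two mappings can be compared directly.

```python
def classify_mappings(atoms, map_a, map_b):
    """Label each conceptual atom by how its two ontology mappings compare."""
    labels = {}
    for atom in atoms:
        a = map_a.get(atom)  # concept assigned by reference ontology A, or None
        b = map_b.get(atom)  # concept assigned by reference ontology B, or None
        if a is None and b is None:
            labels[atom] = "unmapped"
        elif a is None or b is None:
            labels[atom] = "complementary"  # only one ontology covers the atom
        elif a == b:
            labels[atom] = "matched"        # both agree: structure confirmed
        else:
            labels[atom] = "mismatched"     # disagreement: inspect manually
    return labels


# Hypothetical toy mappings; the labels are NOT real SUMO or WordNet entries.
map_a = {"馬": "Horse", "鬱金香": "Tulip"}
map_b = {"馬": "Horse", "鬱金香": "Herb", "椰葉": "Palm"}

labels = classify_mappings(["馬", "鬱金香", "椰葉"], map_a, map_b)
print(labels)
```

In this toy run the first atom is matched, the second is mismatched and would be flagged for inspection, and the third is complementary because only one ontology covers it.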

  16. Further Developments • The Ontology of Chinese Characters: a common knowledge structure for East Asian cultures • In contrast to the earlier studies, which constructed specific ontologies from a general ontology, the Chinese character ontology will be a crucial general ontology built from a specific ontology

  17. Basic Resources: The building blocks • From Text to Lexicon • From Lexicon to Ontology

  18. Resources used • WordNet • SUMO Ontology • Academia Sinica Bilingual Ontological Wordnet (Sinica BOW) • Domain Lexicon Management System: Segmentation, New Word Detection, Lexical Database

  19. Resources • Sinica BOW (SUMO + WordNet): http://bow.sinica.edu.tw • SUMO: http://www.ontologyportal.org or http://ontology.teknowledge.com • WordNet: http://www.cogsci.princeton.edu/~wn/ • Segmentation program etc.: http://LingAnchor.sinica.edu.tw/

  20. SUMO: Suggested Upper Merged Ontology • SUMO atoms • Concepts: around 1000 (note that concepts are not necessarily linguistically realized) • Relations (ISA): see the SUMO graph • Axioms: for inference • Open resource created under an initiative of the IEEE Standard Upper Ontology Working Group

  21. Methodology • From lexicon to ontology (from items to structure) • Ontology discovery through ontology merging

  22. WHY? • We do not have the knowledge structure (ontology) of a new domain (historical period, field, etc.) • But typical ontology discovery needs a framework to map to • To solve the dilemma, we map the conceptual atoms to both SUMO and WordNet (as a linguistic ontology)

  23. From General Ontology to Specific Ontology: Study of Su Shi's Poems • Research in collaboration with Feng-ju Luo, Sue-ming Chang, and Ru-Yng Chang

  24. Opus Su Shi 蘇軾 • Who is Su Shi (A.D. 1036-1101)? • One of the most prominent scholars of the Song dynasty, very knowledgeable and well-traveled • 45 volumes (out of 50) of his work have already been digitized and segmented (by Feng-ju Luo)

  25. How to build a domain ontology • Word segmentation • Match WordNet synset and SUMO concept automatically (resources: WordNet, SUMO) • Use WordNet information to check results and extend concepts • Transform into ontology browser format
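
The first two steps of this pipeline can be sketched as follows. This is a minimal sketch under stated assumptions: forward maximum matching stands in for the actual segmentation program, and the names `segment`, `map_words`, `WORD_TO_SYNSET`, and `SYNSET_TO_SUMO`, together with the toy lexicon and mappings, are hypothetical illustrations rather than real Sinica BOW data.

```python
def segment(text, lexicon, max_len=4):
    """Forward maximum matching: take the longest lexicon word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no lexicon word matches.
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def map_words(words, word_to_synset, synset_to_sumo):
    """Return (suggestions, missing): mapped words and words with no synset."""
    suggestions, missing = [], []
    for word in words:
        synset = word_to_synset.get(word)
        if synset is None:
            missing.append(word)
        else:
            suggestions.append((word, synset, synset_to_sumo.get(synset)))
    return suggestions, missing


# Hypothetical toy resources, for illustration only.
LEXICON = {"置酒", "椰葉", "桄榔"}
WORD_TO_SYNSET = {"椰葉": "coconut_palm"}
SYNSET_TO_SUMO = {"coconut_palm": "FloweringPlant"}

words = segment("置酒椰葉桄榔間", LEXICON)
suggestions, missing = map_words(words, WORD_TO_SYNSET, SYNSET_TO_SUMO)
print(words)
print(suggestions)
print(missing)
```

Words that receive a synset and a concept form a suggestion list for a human to check, while unmapped words go to a missing list for manual treatment.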

  27. Distribution of the Su Shi lexicon • 98,430 words in volumes 1-45

  28. The distribution of animal, plant, and artifact concepts in Su Shi’s poems

  32. Comparing Two Ontologies: 300 Tang Poems and Collection of Su Shi’s Poems • One conceptual node missing in both ontologies: 有袋類 (marsupial) • Concepts found in Su Shi’s poems but not in 300 Tang Poems: palm 棕櫚科植物 (plant -> woody plant -> tree -> palm), e.g. 椰葉 (coconut palm) and 檳榔* (betel palm) • Attestations (word > poem > line): 無枝林 > 食檳榔 > 月照無枝林 • 椰葉 > 追餞正輔表兄至博羅,賦詩為別 > 置酒椰葉桄榔間 • Geographic source: Guangdong and Hainan Island
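
A comparison of this kind reduces to set operations over the concept nodes each corpus populates. The sketch below is a hedged illustration: the node sets are tiny hypothetical stand-ins for the two real ontologies, and the variable names are invented for this example.

```python
# Concept nodes of a shared reference taxonomy (toy subset, hypothetical).
reference_nodes = {"marsupial", "palm", "bird", "insect"}

# Nodes populated by at least one word in each corpus (toy data).
tang_nodes = {"bird", "insect"}
su_shi_nodes = {"bird", "insect", "palm"}

# Nodes that neither corpus populates.
missing_in_both = reference_nodes - (tang_nodes | su_shi_nodes)

# Nodes populated in Su Shi's poems but not in 300 Tang Poems.
only_in_su_shi = su_shi_nodes - tang_nodes

print(sorted(missing_in_both))
print(sorted(only_in_su_shi))
```

With this toy data the first set contains only the marsupial node and the second only the palm node, mirroring the two observations on the slide.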

  33. Comparing Two Ontologies: 300 Tang Poems and Collection of Su Shi’s Poems • A word may stand for multiple concepts within the same source

  34. Example of WordNet lexical relations: 杜鵑 DuJuan • bird -> cuculiform_bird -> cuckoo -> ani, roadrunner, coucal (-> Centropus_sinensis, pheasant_coucal) • shrub, bush -> rhododendron -> azalea

  35. SUMO + WordNet: 杜鵑 DuJuan • SUMO: organism -> plant, animal • plant -> flowering plant • animal -> invertebrate, vertebrate • vertebrate -> warm blooded vertebrate -> mammal, bird • WordNet: bird -> cuculiform_bird -> cuckoo -> ani, roadrunner, coucal (-> Centropus_sinensis, pheasant_coucal) • shrub, bush -> rhododendron -> azalea
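
The 杜鵑 example can be illustrated by walking up hypernym links from each sense of the word. This is a sketch with hand-coded toy data taken from the nodes named on these two slides; the names `HYPERNYM`, `SENSES`, `hypernym_chain`, and `sense_chains` are hypothetical, and no real WordNet API is used.

```python
# Child -> parent links, restricted to nodes named on the slides.
HYPERNYM = {
    "cuckoo": "cuculiform_bird",
    "cuculiform_bird": "bird",
    "azalea": "rhododendron",
    "rhododendron": "shrub",
}

# One written word, two senses (assumed sense inventory for illustration).
SENSES = {"杜鵑": ["cuckoo", "azalea"]}


def hypernym_chain(synset):
    """Follow parent links upward until no parent is recorded."""
    chain = [synset]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def sense_chains(word):
    """Return one hypernym chain per sense of the word."""
    return [hypernym_chain(sense) for sense in SENSES.get(word, [])]


chains = sense_chains("杜鵑")
roots = {chain[-1] for chain in chains}
print(chains)
print(sorted(roots))
```

The two chains end in different branches, one under animals and one under plants, which is why a single word form has to be attached to more than one node of the merged ontology.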

  36. What We Learned about Specific Ontology: constructing an ontology from a larger corpus and comparing two specific ontologies • Local information can be effectively mapped • Global information offers deeper insights into the knowledge structure ☆ Human conceptualization of animals and plants has been relatively stable, but that of artifacts has NOT ☆ Regardless of the criteria for classification, genetically determined features (behaviors, appearances, etc.) do not vary greatly ☆ However, human technology is highly fluid: our conceptualization of artifacts depends heavily on the development of engineering and on our varying societal needs

  37. Towards a Workbench for Specific Ontology: Browser and Editor • User login -> Function menu (personal ontologies list) -> Browse an ontology, Edit an ontology, Add an ontology, or Logout • Browse an ontology: SUMO; SUMO + WordNet + concept map with lexicon • Edit an ontology: update lexical concepts; update mapping between WordNet synset and lexicon; edit other information in lexicon • Add an ontology: import text or import lexicon -> word segmentation -> match concept and synset automatically -> suggestion list and missing list

  38. Constructing a Specific Ontology • Import text or domain lexicon • Select style of writing • Select category of word list for word segmentation • Select reference ontologies to match SUMO and lexicon • Information in the suggestion list: candidate synset; synonyms of the candidate synset; explanation of the candidate synset; concept of the candidate synset

  39. Example of SUMO concept

  40. http://bow.sinica.edu.tw/ont/SuShi_ont.html

  43. Summary and Future Work • Ontologies represent the knowledge structure of a domain or historical period • We have provided an online interface to browse ontologies and lexica • In the future, we will complete the online ontology editor and browser, which will • Map lexicon, WordNet and SUMO • Integrate ontologies based on different texts • Facilitate comparative studies of various domain ontologies

  44. From Specific Ontology to General Ontology: 漢字知識本體, an introduction to Hanzi ontology • Research in collaboration with and conducted by Ya-Ming Zhou 周亞民

  45. Outline • Introduction • The logographic features of Hanzi • Semantic symbols of Hanzi • The structure of lexical relations • The structure of Hanzi ontology • Summary

  46. Introduction (1/2) • Ideograph: each Chinese character (kanji) is a writing unit which also represents a pre-defined concept. The represented concept is independent of phonological variation, including language change and cross-lingual adaptation • The complete Han writing system is expected to consist of 40,000-70,000 characters, each representing one or more concepts

  47. Logographic Features of Hanzi • 馬 is the semantic symbol for horse • Examples: • 驩:馬名 a kind of horse • 驫:眾馬 many horses • 騎:騎馬 riding a horse • 驍:良馬 a good horse • 驚:馬驚 a scared horse

  48. Semantic Symbols in Hanzi (1/3) • The characteristics of Hanzi mainly come from semantic symbols • According to Xu Shen’s ShuoWenJieZi (100 A.D.), there are 540 semantic classes (radicals) • These radicals represent the knowledge structure of Hanzi

  49. Semantic Symbols in Hanzi • 540 radicals are used to classify all Chinese characters • Semantic symbols for animals: 鳥 (bird), 隹 (bird), 犬 (dog), 馬 (horse), 羊 (sheep), 虫 (insect)… • Semantic symbols for plants: 艸, 木, 竹, 禾… • Semantic symbol for religion: 示
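
The idea that a radical assigns a character to a semantic class can be sketched as a simple lookup. This is an illustrative toy, not the Hanzi ontology itself: the tables contain only the radicals and example characters listed on these slides, and the names `RADICAL_GLOSS`, `CHAR_RADICAL`, and `group_by_class` are hypothetical.

```python
from collections import defaultdict

# Radical -> semantic class, as glossed on the slide.
RADICAL_GLOSS = {
    "鳥": "bird", "隹": "bird", "犬": "dog",
    "馬": "horse", "羊": "sheep", "虫": "insect",
}

# Character -> radical, limited to the horse examples given earlier.
CHAR_RADICAL = {"驩": "馬", "驫": "馬", "騎": "馬", "驍": "馬", "驚": "馬"}


def group_by_class(chars):
    """Group characters by the semantic class of their radical."""
    groups = defaultdict(list)
    for ch in chars:
        radical = CHAR_RADICAL.get(ch)  # None if the character is not in the toy table
        groups[RADICAL_GLOSS.get(radical, "unclassified")].append(ch)
    return dict(groups)


groups = group_by_class("驩驫騎驍驚葉")
print(groups)
```

Characters missing from the toy table, such as 葉 here, fall into an "unclassified" bucket; a full system would need a complete character-to-radical table.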

  50. The Classification of Hanzi with 艸 (艹): Plants • Name: 蕉蘭芒蒙菌蔓苦菊茱范荷茅蕈蔚菲草 • Description: 茲蒼芳落茸茂荒薄芬蒸莊 • Usage: 蕃藥蔬菜薪苑藩藉茭 • Parts: 萌莖芽茄苗蓮葉
