
A Domain Ontology Engineering Tool with General Ontologies and Text Corpus

A Domain Ontology Engineering Tool with General Ontologies and Text Corpus. Naoki Sugiura, Masaki Kurematsu, Naoki Fukuta, Naoki Izumi, & Takahira Yamaguchi. DODDLE and DODDLE II: Domain Ontology rapiD DeveLopment Environment. Builds taxonomic and non-taxonomic relationships.


Presentation Transcript


  1. A Domain Ontology Engineering Tool with General Ontologies and Text Corpus Naoki Sugiura, Masaki Kurematsu, Naoki Fukuta, Naoki Izumi, & Takahira Yamaguchi

  2. DODDLE and DODDLE II • Domain Ontology rapiD DeveLopment Environment • Builds taxonomic and non-taxonomic relationships • Uses a dictionary-based approach and a text corpus (body of text) to build relationships

  3. DODDLE & DODDLE II • Large ontologies are difficult to build by hand • Locates relationships between words based on context similarity, even when the words are separated in the text • Disadvantages • Human interaction is still required • Low success rate

  4. DODDLE vs DODDLE II • DODDLE only works on taxonomic relationships • DODDLE II • Extension of DODDLE • Finds non-taxonomic relationships

  5. Outline • Overview • Taxonomic Relationships • Non-Taxonomic Relationships • Case Studies • Problems/Future Work • Conclusion • Assessment

  6. Overview • Domain Terms • Domain Specific Text Corpus • Concept Extraction Module • TRA Module • NTRL Module

  7. Overview • TRA Module • Matched Result Analysis • MRD (WordNet) • Trimmed Result Analysis • Modification using syntactic strategies • Taxonomic Relationship

  8. Overview • NTRL Module • Extraction of frequent words • WordSpace creation • Domain Specific Text Corpus • Extraction of similar concept pairs • Concept specification templates • Non-Taxonomic Relationship

  9. Overview • Taxonomic Relationship • Non-Taxonomic Relationship • Interaction Module

  10. TRA Module • Matched Result Analysis • MRD (WordNet) • Trimmed Result Analysis • Modification using syntactic strategies • Taxonomic Relationship

  11. TRA • Matched Result Analysis • Constructs PAB and STM • Trimmed Result Analysis • Remove unnecessary nodes • Modification using statistical strategies • Allows for human input
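The "remove unnecessary nodes" step of trimmed result analysis can be illustrated with a small sketch. It assumes that a node is unnecessary when it is not itself a matched domain term and has exactly one surviving child, so removing it changes neither parent-child nor sibling relations among the matched terms. The function name `trim_hierarchy`, the parent-map representation, and the toy hierarchy are hypothetical and are not taken from the DODDLE implementation or from WordNet.

```python
def trim_hierarchy(parent, matched):
    """Return a trimmed child -> parent map covering the matched terms.

    parent  : dict mapping each node to its parent (None for the root)
    matched : set of nodes that correspond to domain terms
    """
    # Keep only nodes that lie on a path from a matched node up to the root.
    kept = set()
    for node in matched:
        while node is not None and node not in kept:
            kept.add(node)
            node = parent.get(node)

    # Collect the surviving children of every kept node.
    children = {node: [] for node in kept}
    for node in kept:
        p = parent.get(node)
        if p is not None:
            children[p].append(node)

    def retained(node):
        # Assumption: keep the root, matched terms, and branching nodes.
        return (node in matched
                or parent.get(node) is None
                or len(children[node]) != 1)

    # Reattach every retained node to its nearest retained ancestor.
    trimmed = {}
    for node in kept:
        if not retained(node):
            continue
        p = parent.get(node)
        while p is not None and not retained(p):
            p = parent.get(p)
        trimmed[node] = p
    return trimmed


# Hypothetical hierarchy, for illustration only.
parent = {
    "entity": None,
    "object": "entity",
    "artifact": "object",
    "commodity": "artifact",
    "goods": "commodity",
    "vehicle": "artifact",
    "abstraction": "entity",
    "price": "abstraction",
}
matched = {"goods", "vehicle", "price"}
trimmed = trim_hierarchy(parent, matched)
```

In this toy case `object`, `commodity`, and `abstraction` are dropped, while `artifact` survives because it keeps `goods` and `vehicle` as siblings. The result would still be handed to a human for modification, as the slide notes.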

  12. PAB and STM

  13. TRA

  14. NTRL Module • Extraction of frequent words • WordSpace creation • Domain Specific Text Corpus • Extraction of similar concept pairs • Concept specification templates • Non-Taxonomic Relationship

  15. NTRL • Extraction of key words • Primitive: 4-grams (sequences of 4 words) • Collocation matrix • $a_{i,j}$ = number of times 4-gram $f_i$ appears before 4-gram $f_j$ • Example 4-gram sequence: … f8 f4 f3 f7 f8 f4 f1 f3 f4 f9 f2 f5 f1 f7 f1 f5 …
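A minimal sketch of building the collocation matrix from the sample sequence on this slide. It assumes "before" means "immediately before" and that each entry is a simple count; the function name `collocation_matrix` is hypothetical.

```python
def collocation_matrix(sequence, vocab):
    """Count how often each 4-gram occurs immediately before each other 4-gram."""
    index = {gram: k for k, gram in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    # Walk over adjacent pairs (left, right) and count left -> right.
    for left, right in zip(sequence, sequence[1:]):
        matrix[index[left]][index[right]] += 1
    return matrix


# The sample 4-gram sequence from the slide (each f_k stands for one 4-gram).
sequence = "f8 f4 f3 f7 f8 f4 f1 f3 f4 f9 f2 f5 f1 f7 f1 f5".split()
vocab = sorted(set(sequence), key=lambda gram: int(gram[1:]))
matrix = collocation_matrix(sequence, vocab)

row, col = vocab.index("f8"), vocab.index("f4")
print(matrix[row][col])  # 2: f8 is followed by f4 twice in the sample
```

Each row (or column) of this matrix can then serve as the vector for one 4-gram.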

  16. NTRL • WordSpace Creation • Context Vectors • Word Vectors • Sum of Context Vectors • $\tau(w) = \sum_{i \in C(w)} \Bigl( \sum_{f \text{ close to } i} \varphi(f) \Bigr)$ • $\varphi(f)$: the 4-gram vector of a 4-gram $f$ • $\tau(w)$: the vector representation of a word or phrase $w$ • $C(w)$: the appearance places of a word or phrase $w$ • WordSpace is the collection of vectors $\tau(w)$
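The double sum on this slide can be sketched as follows: a context vector is the sum of the 4-gram vectors near one occurrence, and a word vector is the sum of the context vectors over all occurrences. The sketch assumes "close to" means "within a fixed window of positions"; the `window` parameter, the function names, and the toy vectors are hypothetical.

```python
def context_vector(position, grams, gram_vectors, window=1):
    """Sum the vectors of all 4-grams within `window` positions of `position`."""
    dim = len(next(iter(gram_vectors.values())))
    total = [0.0] * dim
    lo = max(0, position - window)
    hi = min(len(grams), position + window + 1)
    for gram in grams[lo:hi]:
        for k, value in enumerate(gram_vectors[gram]):
            total[k] += value
    return total


def word_vector(occurrences, grams, gram_vectors, window=1):
    """Sum the context vectors over every place where the word appears."""
    dim = len(next(iter(gram_vectors.values())))
    total = [0.0] * dim
    for position in occurrences:
        ctx = context_vector(position, grams, gram_vectors, window)
        total = [a + b for a, b in zip(total, ctx)]
    return total


# Toy data: a 4-gram sequence, one vector per 4-gram, and the positions
# at which some word w appears.
grams = ["f1", "f2", "f3", "f1"]
gram_vectors = {"f1": [1, 0], "f2": [0, 1], "f3": [1, 1]}
occurrences = [1, 3]
vec = word_vector(occurrences, grams, gram_vectors, window=1)
print(vec)  # [4.0, 3.0]
```

A WordSpace is then just a mapping from each word or phrase to such a vector.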

  17. NTRL • Extraction of Concept Pairs • Each input term has a best-matched “synset” • Synset: collection of word vectors • The sum of the word vectors in the synset is assigned to the concept that corresponds to each input term • Inner product is computed for all combinations of concept pairs • A match is determined by a user-set threshold • Case study threshold: 0.87
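A minimal sketch of the pair-extraction step: sum the word vectors of each synset into a concept vector, take the inner product of every pair, and keep the pairs that reach the threshold. It assumes the vectors are normalized to unit length first, so that a threshold such as 0.87 is comparable across concepts; the slides do not state this. All names and the toy vectors are hypothetical.

```python
import math
from itertools import combinations


def concept_vector(synset_words, word_vectors):
    """Sum the word vectors of all words in a synset."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    for word in synset_words:
        total = [a + b for a, b in zip(total, word_vectors[word])]
    return total


def normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def similar_concept_pairs(concept_vectors, threshold):
    """Return (a, b, score) for every concept pair whose score >= threshold."""
    unit = {name: normalize(vec) for name, vec in concept_vectors.items()}
    pairs = []
    for a, b in combinations(sorted(unit), 2):
        score = sum(x * y for x, y in zip(unit[a], unit[b]))
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


# Toy word vectors and synsets, for illustration only.
word_vectors = {
    "goods": [2, 1, 0], "merchandise": [1, 0, 0],
    "delivery": [2, 1, 0],
    "offer": [0, 0, 5],
}
concepts = {
    "goods": concept_vector(["goods", "merchandise"], word_vectors),
    "delivery": concept_vector(["delivery"], word_vectors),
    "offer": concept_vector(["offer"], word_vectors),
}
pairs = similar_concept_pairs(concepts, threshold=0.87)
```

Here only the pair (`delivery`, `goods`) passes the threshold; raising or lowering `threshold` directly changes how many candidate pairs the user has to review.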

  18. NTRL • Finding Association Rules • Locates Rules of the form:
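A rough sketch of association-rule finding, assuming the standard single-antecedent form X ⇒ Y, with each sentence treated as one transaction of input terms. Both of these are assumptions, as are the function name and the toy data. Support is the fraction of transactions containing both terms, and confidence is the fraction of transactions containing X that also contain Y.

```python
from itertools import permutations


def association_rules(transactions, min_support, min_confidence):
    """Return (x, y, support, confidence) for every rule x => y that passes."""
    sets = [set(t) for t in transactions]
    n = len(sets)
    terms = sorted(set().union(*sets))
    rules = []
    for x, y in permutations(terms, 2):
        with_x = sum(1 for t in sets if x in t)
        both = sum(1 for t in sets if x in t and y in t)
        support = both / n
        confidence = both / with_x if with_x else 0.0
        if support >= min_support and confidence >= min_confidence:
            rules.append((x, y, support, confidence))
    return rules


# Toy transactions: the input terms found in each of five sentences.
transactions = [
    ["buyer", "goods"],
    ["buyer", "goods", "price"],
    ["seller", "price"],
    ["buyer", "goods"],
    ["seller"],
]
rules = association_rules(transactions, min_support=0.4, min_confidence=0.8)
```

With these toy thresholds, only `buyer ⇒ goods` and `goods ⇒ buyer` survive. On a real corpus the support threshold would typically be far lower, since any given pair of terms co-occurs in only a small share of sentences.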

  19. NTRL • Constructing Concept Specification Templates • Set of Similar concept pairs and association rules • DODDLE sets priorities between concept pairs • Based on TRA Module and Co-occurrence information

  20. Case Study • Law: “Contract for International Sale of Goods” • Business: “XML Common Business Library” • Association rule thresholds: support 0.4%, confidence 80%

  21. Law Case Study • Given 46 concepts • WordSpace: 77 concept pairs • Association between input terms: 55 pairs of terms • Templates

  22. Business Case Study • Input: 57 terms • WordSpace: 40 pairs • Association between input terms: 39 pairs

  23. Taxonomic Results

  24. Non-taxonomic Results

  25. Problems/ Future Work • Threshold • Changes with each domain • Specification of a Concept Relation • Still need to specify relationships • Ambiguity of Multiple Terminology • “transmission” • Semantic specialization of multi-definition words needed. • DODDLE-R • Uses RDF tags

  26. Conclusion • Uses MRD and text corpus • Two strategies for taxonomic: matched result analysis and trimmed result analysis • Non-Taxonomic: extracted by co-occurrence information in text corpus • Concept Specification: a way to eliminate concept pairs to build an ontology

  27. Assessment • Designed to be a tool • No time results • Determining thresholds is plug-and-guess.

  28. Questions ?
