
Doozer

Doozer: An Expand and Reduce approach to Automatic domain model creation. Christopher Thomas, Pankaj Mehra, Roger Brooks, Amit Sheth. Knowledge Enabled Information and Services Science. Motivation: The Black Swan. What will you want to know tomorrow? Wisdom of the Crowds.


Presentation Transcript


  1. Doozer: An Expand and Reduce approach to Automatic domain model creation. Christopher Thomas, Pankaj Mehra, Roger Brooks, Amit Sheth. Knowledge Enabled Information and Services Science

  2. Motivation: The Black Swan. What will you want to know tomorrow?

  3. Wisdom of the Crowds • The most comprehensive and up to date account of the present state of knowledge is given by Everybody • The Web in general • Blogs • Wikipedia

  4. Collecting Knowledge • Wikipedia • Concise concept descriptions • Large, highly connected, sparsely annotated graph structure that connects named entities • Category hierarchy

  5. Goal: Harness the Wisdom of the Crowds to Automatically define a domain with up-to-date concepts

  6. Don’t overshoot • Generating a broad topic hierarchy on the basis of Wikipedia is easy • The Problem is to restrict it to the actual domain of interest

  7. Simplifying assumptions • We do not create a formal domain ontology • Topic hierarchy will be used for classification and IR, not for reasoning Ontology Edition

  8. … Hence • We can safely take advantage of existing (semi)structured knowledge sources Ontohazard

  9. Requirements for the classifier Find pertinent terms

  10. Requirements for the classifier • Classify the terms in a topic hierarchy

  11. Key Concepts • Focus • What is the narrow domain of interest? • E.g. FTP Servers • Domain • What is the broader domain of interest? • E.g. Networking, File Transfer • World View • Where is our world view centered? • E.g. information sciences • Different World View → Different connections

  12. Overview - conceptual • Expand and Reduce approach • Start with ‘high recall’ methods • Exploration - Full text search • Exploitation – Node Similarity Method • Category growth • End with “high precision” methods • Apply restrictions on the concepts found • Remove unwanted terms and categories

  13. Expand - conceptually Full text search on Article texts Delete results with low Lucene score Graph-based expansion
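
The expand step on slide 13 can be illustrated with a minimal sketch. It assumes the full-text search has already returned a score per article title (standing in for the Lucene score) and that article links are available as an adjacency map. The function name `expand`, the `min_score` cutoff, and the `max_hops` limit are hypothetical and are not taken from the presentation.

```python
from collections import deque


def expand(search_hits, links, min_score=0.5, max_hops=1):
    """Return seed articles plus their graph neighbours.

    search_hits: dict mapping article title -> search score (assumed input).
    links:       dict mapping article title -> set of linked article titles.
    min_score and max_hops are illustrative values, not from the slides.
    """
    # Delete results with a low search score; the rest become seeds.
    seeds = {title for title, score in search_hits.items() if score >= min_score}

    # Graph-based expansion: breadth-first walk up to max_hops from the seeds.
    found = set(seeds)
    frontier = deque((title, 0) for title in sorted(seeds))
    while frontier:
        title, hops = frontier.popleft()
        if hops == max_hops:
            continue
        for neighbour in sorted(links.get(title, ())):
            if neighbour not in found:
                found.add(neighbour)
                frontier.append((neighbour, hops + 1))
    return found


hits = {"FTP server": 0.9, "File Transfer Protocol": 0.8, "Server (band)": 0.1}
links = {
    "FTP server": {"vsftpd", "ProFTPD"},
    "vsftpd": {"Linux"},
    "Server (band)": {"Rock music"},
}
expanded = expand(hits, links)
```

With these toy inputs the low-scoring hit and its neighbours are never visited, and the one-hop limit keeps the walk from drifting to articles two links away from a seed.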

  14. Link types • “See Also” link - Inverse “See Also” link - Double link - Link and common category - Inverse link and common category - Regular link - Inverse link
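
One way the link types on slide 14 could feed a node similarity score is to give each type a weight and average the weights of the links that connect a candidate article to the seed set. The sketch below assumes exactly that. The weights, the label names, and the averaging formula are all hypothetical; the transcript does not give the actual computation used by Doozer.

```python
# Hypothetical weights: the slide lists the link types but not their values.
LINK_WEIGHTS = {
    "see_also": 1.0,
    "inverse_see_also": 0.9,
    "double_link": 0.8,
    "link_common_category": 0.7,
    "inverse_link_common_category": 0.6,
    "regular_link": 0.3,
    "inverse_link": 0.2,
}


def node_similarity(candidate, seeds, typed_links):
    """Average link weight between a candidate and the seed articles.

    typed_links: dict mapping (seed, candidate) -> link type label.
    Seeds without a link to the candidate contribute zero.
    """
    if not seeds:
        return 0.0
    total = sum(
        LINK_WEIGHTS[typed_links[(seed, candidate)]]
        for seed in seeds
        if (seed, candidate) in typed_links
    )
    return total / len(seeds)


seeds = ["FTP server", "File Transfer Protocol"]
typed_links = {
    ("FTP server", "vsftpd"): "see_also",
    ("File Transfer Protocol", "vsftpd"): "regular_link",
    ("FTP server", "Linux"): "inverse_link",
}
sim_vsftpd = node_similarity("vsftpd", seeds, typed_links)
sim_linux = node_similarity("Linux", seeds, typed_links)
```

In this toy example the candidate connected by a "See Also" link and a regular link scores higher than the one reachable only through an inverse link, which is the kind of ranking a threshold could then cut.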

  15. Node Similarity computation

  16. Collecting Instances

  17. Creating a Hierarchy

  18. Creating a Hierarchy

  19. Hierarchy Creation - summary

  20. Reduce steps • Remove all terms that have low pertinence to the domain • Intersect hierarchy with broader focus domain • Reduce hierarchy depth
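
The three reduce steps on slide 20 can be sketched as successive filters over a category-to-terms map. This is an illustration under stated assumptions: pertinence scores are taken as given, the broader domain is a plain set of category names, and depth is reduced by merging small categories into their parent. The names `reduce_hierarchy`, `min_pertinence`, and `min_terms` are hypothetical.

```python
def reduce_hierarchy(hierarchy, parent, pertinence, domain_categories,
                     min_pertinence=0.5, min_terms=2):
    """Apply the three reduce steps to a category -> set-of-terms map.

    parent:            dict mapping category -> parent category.
    pertinence:        dict mapping term -> score in [0, 1] (assumed input).
    domain_categories: categories belonging to the broader domain.
    Both thresholds are illustrative values, not from the slides.
    """
    reduced = {}
    for category, terms in hierarchy.items():
        # Intersect the hierarchy with the broader domain.
        if category not in domain_categories:
            continue
        # Remove terms with low pertinence to the domain.
        kept = {t for t in terms if pertinence.get(t, 0.0) >= min_pertinence}
        if kept:
            reduced[category] = kept

    # Reduce depth: fold categories with few terms into their parent.
    for category in sorted(reduced):
        up = parent.get(category)
        if len(reduced[category]) < min_terms and up in reduced:
            reduced[up] |= reduced.pop(category)
    return reduced


hierarchy = {
    "File transfer": {"FTP", "Telnet"},
    "FTP servers": {"vsftpd"},
    "Rock bands": {"Server (band)"},
}
parent = {"FTP servers": "File transfer"}
pertinence = {"FTP": 0.9, "Telnet": 0.2, "vsftpd": 0.8, "Server (band)": 0.9}
result = reduce_hierarchy(hierarchy, parent, pertinence,
                          {"File transfer", "FTP servers"})
```

Here the out-of-domain category is dropped even though its term has a high score, the low-pertinence term is removed, and the single-term subcategory is merged upward.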

  21. Remove unwanted individuals

  22. Remove unwanted categories

  23. Flatten categories

  24. Snapshot of final Topic Hierarchy

  25. Evaluation wrt. Search and Set Expansion

  26. Evaluation wrt. other term expansion tools F-measures, computed against a reduced glossary, for the lists of terms generated by various mining tools
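
For reference, the F-measure of a generated term list against a glossary is the weighted harmonic mean of precision and recall. The sketch below shows the standard computation on toy data; the term sets are invented for illustration and are not results from the presentation.

```python
def f_measure(generated, glossary, beta=1.0):
    """F-measure of a generated term list against a reference glossary."""
    generated, glossary = set(generated), set(glossary)
    true_positives = len(generated & glossary)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(generated)
    recall = true_positives / len(glossary)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy data: two of four generated terms appear in a three-term glossary.
generated_terms = {"ftp", "sftp", "vsftpd", "telnet"}
glossary_terms = {"ftp", "sftp", "scp"}
score = f_measure(generated_terms, glossary_terms)
```

With precision 2/4 and recall 2/3, the balanced F-measure works out to 4/7.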

  27. Evaluation wrt. expert model Evaluation wrt. MeSH versions of 2004 (04) and 2008 (08) for both the restricted and the full set of MeSH terms

  28. Thank you
