
Learning the Semantic Meaning of a Concept from the Web

This study explores automating the training data collection process for text classification by leveraging the web. The approach aims to reduce manual effort, using ontology mapping, exemplars, and a prototype system. Experimental results are discussed, focusing on semantic concepts such as living things and weapons.


Presentation Transcript


  1. Learning the Semantic Meaning of a Concept from the Web • Yang Yu and Yun Peng • May 30, 2007 • yangyu1@umbc.edu, ypeng@umbc.edu

  2. The Problem • Manually preparing training data (exemplars) for each concept in text-classification-based ontology mapping is expensive. • [Figure: LIVING_THINGS ontology with classes ANIMAL, PLANT, HUMAN, CAT, TREE, GRASS, MAN, WOMAN, ARBOR, FRUTEX, annotated with exemplars]

  3. Our Approach • Automatically collecting training data from the web (http://www.google.com/) • Benefits • Reduce the amount of human work

  4. Overview • Background • The semantic Web and ontology • Ontology Mapping • Approach • Prototype System • Experimental Results • WEAPONS ontology • LIVING_THINGS ontology • Limitations and Conclusions

  5. Semantic Web and Ontology Mapping • The Semantic Web • “an extension of the current web” • ontology files and programs that use them • Ontology Mapping • Interoperability problem • Mapping • r = f(Ci, Cj), where i = 1, …, n and j = 1, …, m • r ∈ {equivalent, subClassOf, superClassOf, complement, overlapped, other}

  6. Approaches to Ontology Mapping • Manual mapping • String Matching • Text classification • the semantic meaning of a concept can be reflected in the training data (exemplars) that use the concept • Probabilistic feature model • Classification • Results highly dependent on the quality of exemplars
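The text-classification approach on this slide can be illustrated with a minimal sketch: a multinomial Naive Bayes model is trained on a few exemplars per concept and then scores a new text. The toy exemplars, the function names `train` and `posterior`, and the add-one smoothing are assumptions made for illustration; this is not the classifier used in the presentation.

```python
import math
from collections import Counter, defaultdict

def train(exemplars):
    """Build word and document counts from {concept: [exemplar texts]}."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for concept, texts in exemplars.items():
        for text in texts:
            tokens = text.lower().split()
            word_counts[concept].update(tokens)
            doc_counts[concept] += 1
            vocab.update(tokens)
    return word_counts, doc_counts, vocab

def posterior(model, text):
    """Return P(concept | text) for every concept, using add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for concept in doc_counts:
        total_words = sum(word_counts[concept].values())
        score = math.log(doc_counts[concept] / total_docs)
        for token in text.lower().split():
            if token in vocab:  # words never seen in training are ignored
                count = word_counts[concept][token]
                score += math.log((count + 1) / (total_words + len(vocab)))
        log_scores[concept] = score
    # Normalize in log space to avoid underflow on long texts.
    top = max(log_scores.values())
    unnormalized = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(unnormalized.values())
    return {c: v / z for c, v in unnormalized.items()}

# Hypothetical exemplars, invented for this sketch.
EXEMPLARS = {
    "CAT": ["the cat purrs and chases mice", "a kitten is a young cat"],
    "TREE": ["the tree has leaves and branches", "an oak is a tall tree"],
}

model = train(EXEMPLARS)
probs = posterior(model, "my cat chases mice")
```

With exemplars this small the scores mostly reflect word overlap, which is one way to see why the slide stresses that results depend on exemplar quality.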

  7. Motivation and Proposal • Preparing exemplars manually is costly • Billions of documents available on the web • Search engines

  8. The Proposal • Using the concept defined in an ontology and the semantic information to form a query and processing the search results to obtain exemplars • Verification • Build a prototype system • Check ontology mapping results

  9. System overview – Part I • [Diagram] Ontology A → Parser → Queries → Retriever → Search Engine → Links to Web Pages → Retriever → WWW → HTML Docs → Processor → Text Files • Processor output options: 1. Whole file 2. Only sentences containing search keywords
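The two processor options named on this slide (whole file, or only sentences containing search keywords) could look roughly like the sketch below. The sentence splitter, the whole-word matching rule, and the names `keyword_sentences` and `process` are assumptions; the slide does not say how the prototype segments or matches text.

```python
import re

def keyword_sentences(text, keywords):
    """Return the sentences of `text` that contain at least one keyword."""
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    patterns = [
        re.compile(r"\b" + re.escape(k) + r"\b", re.IGNORECASE)
        for k in keywords
    ]
    return [s for s in sentences if any(p.search(s) for p in patterns)]

def process(text, keywords, mode="whole"):
    """Produce exemplar text in one of the two modes listed on the slide."""
    if mode == "whole":
        return text
    if mode == "sentences":
        return " ".join(keyword_sentences(text, keywords))
    raise ValueError("mode must be 'whole' or 'sentences'")

# Hypothetical page text, invented for this sketch.
PAGE = (
    "A tank is an armored vehicle. "
    "Click here to subscribe. "
    "The light tank is faster."
)

filtered = process(PAGE, ["tank"], mode="sentences")
```

Here the keyword-sentence mode drops the navigation sentence, which shows how that mode can trim off-topic page content at the cost of discarding context.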

  10. System overview – Part II • [Diagram components: Ontology A, Ontology B, Text Files (A), Text Files (B), Model Builder, Rainbow (shown twice), Feature Model, Calculator, Mapping Results]

  11. The model builder • Mutually exclusive and exhaustive • Leaf classes • C+ and C- • [Figure: two copies of the LIVING_THINGS ontology tree with classes ANIMAL, PLANT, HUMAN, CAT, TREE, GRASS, MAN, WOMAN, ARBOR, FRUTEX]
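One possible reading of the leaf-class and C+/C- bullets is sketched below: the leaf classes form the mutually exclusive and exhaustive label set, and for a concept C the leaves under C are treated as C+ and all other leaves as C-. That interpretation, the parent links (inferred from the query paths listed on the "Different Queries" slide), and the function names are assumptions and are not confirmed by the slide.

```python
# Child -> parent links; assumed from the query paths on the later slide.
PARENT = {
    "ANIMAL": "LIVING_THINGS", "PLANT": "LIVING_THINGS",
    "HUMAN": "ANIMAL", "CAT": "ANIMAL",
    "TREE": "PLANT", "GRASS": "PLANT",
    "MAN": "HUMAN", "WOMAN": "HUMAN",
    "ARBOR": "TREE", "FRUTEX": "TREE",
}

def leaf_classes(parent):
    """Classes that are never a parent of another class."""
    parents = set(parent.values())
    return sorted(c for c in parent if c not in parents)

def is_under(cls, ancestor, parent):
    """True if `cls` equals `ancestor` or descends from it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = parent.get(cls)
    return False

def split_leaves(concept, parent):
    """Return (C+, C-): leaves under `concept` and all remaining leaves."""
    leaves = leaf_classes(parent)
    positive = [leaf for leaf in leaves if is_under(leaf, concept, parent)]
    negative = [leaf for leaf in leaves if leaf not in positive]
    return positive, negative

human_pos, human_neg = split_leaves("HUMAN", PARENT)
```

Under this reading, exemplars for a non-leaf class such as HUMAN would be pooled from its leaves (MAN and WOMAN), so every exemplar belongs to exactly one leaf label.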

  12. The calculator • A Naïve Bayes text classifier tends to give extreme values (1/0) • Conditional probabilities are calculated from the raw classification data by taking the average

  13. An Example of the Calculator • Ontology for Weapons • 200 exemplars of APC are passed to the classifier
      • Categories in WeaponsA.n3 and number of exemplars assigned: TANK-VEHICLE 170; AIR-DEFENSE-GUN 20; SAUDI-NAVAL-MISSILE-CRAFT 10
      • P(TANK-VEHICLE | APC) = 170 / 200 = 0.85
      • P(AIR-DEFENSE-GUN | APC) = 20 / 200 = 0.10
      • P(SAUDI-NAVAL-MISSILE-CRAFT | APC) = 10 / 200 = 0.05
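The arithmetic in this example can be reproduced with a short sketch: each APC exemplar receives one hard label from the classifier, and the conditional probability of a category is the fraction of exemplars assigned to it. The function name `conditional_probabilities` and the list-of-labels representation are assumptions; only the counts come from the slide.

```python
from collections import Counter

def conditional_probabilities(predicted_labels):
    """Estimate P(category | new class) as the share of exemplars per label."""
    counts = Counter(predicted_labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# One predicted WeaponsA.n3 category per APC exemplar (counts from the slide).
predictions = (
    ["TANK-VEHICLE"] * 170
    + ["AIR-DEFENSE-GUN"] * 20
    + ["SAUDI-NAVAL-MISSILE-CRAFT"] * 10
)

probs = conditional_probabilities(predictions)
```

Averaging hard 1/0 decisions over many exemplars is what turns the classifier's extreme per-document outputs into graded values such as 0.85.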

  14. Experiments with WEAPONS ontology • WeaponsA.n3 and WeaponsB.n3 • Information Interoperation and Integration Conference (http://www.atl.lmco.com/projects/ontology/i3con.html) • Both have over 80 classes defined • More than 60 classes are leaf classes

  15. Part of WeaponsA.n3 • [Figure: class tree with nodes WEAPON, CONVENTIONAL-WEAPON, ARMORED-COMBAT-VEHICLE, MODERN-NAVAL-SHIP, WARPLANE, AIRCRAFT-CARRIER, PATROL-CRAFT, SUPER-ETENDARD, TANK-VEHICLE]

  16. Part of WeaponsB.n3 • [Figure: class tree with nodes WEAPON, CONVENTIONAL-WEAPON, ARMORED-COMBAT-VEHICLE, MODERN-NAVAL-SHIP, WARPLANE, FIGHTER-PLANE, AIRCRAFT-CARRIER, PATROL-WATERCRAFT, TANK-VEHICLE, FIGHTER-ATTACK-PLANE, LIGHT-TANK, APC, LIGHT-AIRCRAFT-CARRIER, PATROL-BOAT-RIVER, PATROL-BOAT, SUPER-ETENDARD-FIGHTER]

  17. Expected Results • [Figure: classes from WeaponsA.n3 linked to classes from part of WeaponsB.n3] • WeaponsA.n3: AIRCRAFT-CARRIER, SUPER-ETENDARD, PATROL-CRAFT, TANK-VEHICLE • WeaponsB.n3: FIGHTER-PLANE, LIGHT-AIRCRAFT-CARRIER, PATROL-WATERCRAFT, APC, FIGHTER-ATTACK-PLANE, LIGHT-TANK, SUPER-ETENDARD-FIGHTER, PATROL-BOAT-RIVER, PATROL-BOAT

  18. A Typical Report • P(APC | Ci), where i = 1, …, 63

  19. Classes with highest conditional probability • Columns: new class; best match and probability using whole files; best match and probability using only sentences with keywords
      • LIGHT-AIRCRAFT-CARRIER: AIRCRAFT-CARRIER 0.65; AIRCRAFT-CARRIER 0.57
      • APC: SILKWORM-MISSILE-MOD 0.46; SELF-PROPELLED-ARTILLERY 0.36
      • SUPER-ETENDARD-FIGHTER: SILKWORM-MISSILE-MOD 0.66; MRBM 0.51
      • FIGHTER-ATTACK-PLANE: SILKWORM-MISSILE-MOD 0.83; MRBM 0.38
      • PATROL-WATERCRAFT: SILKWORM-MISSILE-MOD 0.28; PATROL-CRAFT 0.52
      • PATROL-BOAT-RIVER: SILKWORM-MISSILE-MOD 0.65; PATROL-CRAFT 0.54
      • PATROL-BOAT: SILKWORM-MISSILE-MOD 0.51; PATROL-CRAFT 0.66
      • LIGHT-TANK: SILKWORM-MISSILE-MOD 0.56; TANK-VEHICLE 0.3
      • FIGHTER-PLANE: AIRCRAFT-CARRIER 0.49; MRBM 0.38

  20. Experiment with LIVING_THINGS ontology • P(MAN | HUMAN) • P(WOMAN | HUMAN) • Find a mapping for GIRL • [Figure: HUMAN with subclasses MAN and WOMAN]

  21. Experiment Results (1) • Results of experiment (1) • P(MAN | HUMAN) = 0.62 • P(WOMAN | HUMAN) = 0.38 • [Figure: HUMAN with subclasses MAN and WOMAN]

  22. Experiment Results (2)
      • With clustering on exemplars (clusty.com): P(ANIMAL | GIRL) = 0.83, P(PLANT | GIRL) = 0.17; P(HUMAN | GIRL) = 0.92, P(CAT | GIRL) = 0.08; P(WOMAN | GIRL) = 0.63, P(MAN | GIRL) = 0.37
      • Without clustering on exemplars: P(ANIMAL | GIRL) = 0.76, P(PLANT | GIRL) = 0.23; P(HUMAN | GIRL) = 0.70, P(CAT | GIRL) = 0.30; P(WOMAN | GIRL) = 1, P(MAN | GIRL) = 0
      • With additional classes: P(DOG | GIRL) = 0.56, P(HUMAN | GIRL) = 0.43, P(CAT | GIRL) = 0.01, P(PYCNOGONID | GIRL) = 0

  23. Additional Experiments: Different Queries • Queries augmented with class properties • Concept: query
      • living+things: Living+things
      • animal: Living+things+animal+Animalia
      • plant: Living+things+plant+Plantae
      • cat: Living+things+animal+Animalia+cat+Felidae
      • human: Living+things+animal+Animalia+human+intelligent
      • man: Living+things+animal+Animalia+human+intelligent+man+male
      • woman: Living+things+animal+Animalia+human+intelligent+woman+female
      • tree: Living+things+plant+Plantae+tree
      • grass: Living+things+plant+Plantae+grass
      • frutex: Living+things+plant+Plantae+tree+Frutex
      • arbor: Living+things+plant+Plantae+tree+arbor
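The queries on this slide follow a visible pattern: the root term, then each class on the path down to the concept, each followed by its class properties. A minimal sketch of that pattern is shown below. The dictionaries `PARENT` and `PROPERTIES` and the function `build_query` are hypothetical names, and the data structures are inferred from the listed queries rather than taken from the prototype.

```python
# Child -> parent links, inferred from the query paths on the slide.
PARENT = {
    "animal": "living things", "plant": "living things",
    "cat": "animal", "human": "animal",
    "man": "human", "woman": "human",
    "tree": "plant", "grass": "plant",
    "frutex": "tree", "arbor": "tree",
}

# Class properties appended after each class name, as seen in the queries.
PROPERTIES = {
    "animal": ["Animalia"], "plant": ["Plantae"], "cat": ["Felidae"],
    "human": ["intelligent"], "man": ["male"], "woman": ["female"],
}

def build_query(concept, parent, properties, root_term="Living things"):
    """Join the root term, the path to `concept`, and each class's properties."""
    path = []
    node = concept
    while node in parent:  # climb until the root, which has no parent entry
        path.append(node)
        node = parent[node]
    terms = [root_term]
    for cls in reversed(path):
        terms.append(cls)
        terms.extend(properties.get(cls, []))
    return "+".join(term.replace(" ", "+") for term in terms)

woman_query = build_query("woman", PARENT, PROPERTIES)
```

Including ancestor classes and properties narrows the search toward the intended sense of a word, for example "man" as a kind of human rather than as a Unix command.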

  24. Experiment Results (3) • Values are given as whole file / keyword sentences
      • Results of experiment (1) with new queries: P(MAN | HUMAN) = 0.91 / 0.93; P(WOMAN | HUMAN) = 0.09 / 0.07
      • Results of experiment (2) with new queries: P(ANIMAL | GIRL) = 0.9 / 0.83; P(PLANT | GIRL) = 0.1 / 0.17; P(HUMAN | GIRL) = 0.78 / 0.83; P(CAT | GIRL) = 0.22 / 0.17; P(MAN | GIRL) = 0.14 / 0.16; P(WOMAN | GIRL) = 0.86 / 0.84
      • [Figure: HUMAN with subclasses MAN and WOMAN]

  25. Limitation 1: Relevancy != similarity • Search results for concept A contain: • Text related to concept A • Text against concept A • Text for concept A, i.e. desired exemplars • Text for related concept B

  26. Limitation 2: “Conditional Probability” • An exemplar is a combination of strings that represent some usage of a concept. • An exemplar is not an instance of a concept. • The way we calculate conditional probability is an estimation. • [Figure: HUMAN with subclasses MAN and WOMAN]

  27. Limitation 3: Popularity != relevancy • Limited by a search engine’s algorithm • PageRank™ • Popularity does not equal relevancy • Weight cannot be specified for words in a search query

  28. Related Research • UMBC OntoMapper • Sushama Prasad, Peng Yun and Finin Tim, A Tool for Mapping between Two Ontologies Using Explicit Information, AAMAS 2002 Workshop on Ontologies and Agent Systems, 2002. • CAIMEN • Lacher S. Martin and Groh Georg, Facilitating the Exchange of Explicit Knowledge through Ontology Mappings, Proc. of the Fourteenth International FLAIRS Conference, 2001. • GLUE • Doan Anhai, Madhavan Jayant, Dhamankar Robin, Domingos Pedro, and Halevy Alon, Learning to Match Ontologies on the Semantic Web, WWW2002, May 2002. • Google Conditional Probability • P(HUMAN | MAN) = 1.77 billion / 2.29 billion = 0.77 • P(HUMAN | WOMAN) = 0.6 billion / 2.29 billion = 0.26 • Wyatt D., Philipose M., and Choudhury T., Unsupervised Activity Recognition Using Automatically Mined Common Sense, Proceedings of AAAI-05, pp. 21-27.

  29. Conclusion and Future Work • Text retrieved from the web can be used as exemplars for text-classification-based ontology mapping • Many parameters affect the quality of the exemplars • The processed documents contain noise • Future work • Clustering • Restrict search to highly relevant sites and web resources

  30. Questions • Thank you  • yangyu1@umbc.edu • ypeng@umbc.edu
