
MEASURING THE SIMILARITY BETWEEN IMPLICIT SEMANTIC RELATIONS USING WEB SEARCH ENGINES [2009]



  1. MEASURING THE SIMILARITY BETWEEN IMPLICIT SEMANTIC RELATIONS USING WEB SEARCH ENGINES [2009] Danushka Bollegala, Yutaka Matsuo, Mitsuru Ishizuka Presented by: Rucha Samant

  2. OUTLINE • Introduction • Method • Retrieving Contexts • Extracting Lexical Patterns • Identifying Semantic Relations • Measuring Relational Similarity • Experiments • Conclusions

  3. INTRODUCTION • Implicit semantic relations between two words, e.g. (bird, cat): "Ostrich is a large bird", "Lion is a large cat" • Relations between word pairs: "Google acquired YouTube"

  4. • Relation Extraction • Information Retrieval • Analogy Detection

  5. OUTLINE OF THE SIMILARITY METHOD

  6. OUTLINE OF THE SIMILARITY METHOD • Web search component: queries a Web search engine to find the contexts • Pattern extraction component: extracts lexical patterns that express semantic relations • Pattern clustering component: clusters the patterns to identify particular relations • Similarity computation component: computes the relational similarity between two word-pairs

  7. RETRIEVING CONTEXTS • Snippets: brief summaries provided by Web search engines along with the search results • A snippet containing the two words captures their local context • Query: “Google * * * YouTube”

  8. RETRIEVING CONTEXTS • “ * ”: wildcard operator, matches one word or none • To retrieve snippets for a word pair (A, B), issue the queries “A * B”, “B * A”, “A * * B”, “B * * A”, “A * * * B”, “B * * * A”, and A B • The query words must co-occur within a maximum of three words • Quotation marks ensure that the two words appear in the given order • Duplicate snippets containing the exact same sequence of words are removed
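The query set on this slide can be generated mechanically. Below is a hypothetical sketch; the function name `make_queries` is an assumption, not code from the paper:

```python
def make_queries(a: str, b: str, max_gap: int = 3) -> list:
    """Build the wildcard queries for a word pair (a, b): quoted phrases
    with 1..max_gap wildcards in both orders, plus the plain unquoted query."""
    queries = []
    for n in range(1, max_gap + 1):
        gap = " ".join(["*"] * n)
        queries.append(f'"{a} {gap} {b}"')  # ensures a..b order
        queries.append(f'"{b} {gap} {a}"')  # ensures b..a order
    queries.append(f"{a} {b}")  # unordered co-occurrence query
    return queries

queries = make_queries("Google", "YouTube")
# seven queries: six quoted wildcard phrases plus the plain "Google YouTube"
```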

  9. EXTRACTING LEXICAL PATTERNS • Pattern extraction consists of three steps • Step 1: replace the two words with variables X and Y; replace all numeric values by D; do not remove punctuation marks; expand all negation contractions (didn’t → did not) • Step 2: extract word subsequences such that exactly one X and one Y exist in each subsequence; the maximum length of a subsequence is L words; no gap exceeds g words; the total length of all gaps does not exceed G words • Step 3: select subsequences with frequency greater than N
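The steps above can be sketched in simplified form as follows. Function names are assumptions, the negation-contraction handling is deliberately crude, and the paper's actual subsequence extraction may differ in detail:

```python
import itertools
import re

def normalize(snippet: str, a: str, b: str) -> list:
    """Step 1 (simplified): replace the word pair with X/Y, numerics with D,
    and expand negation contractions."""
    text = snippet.replace(a, "X").replace(b, "Y")
    text = re.sub(r"\d+", "D", text)
    text = text.replace("n't", " not")  # didn't -> did not (crude)
    return text.split()

def subsequence_patterns(tokens: list, L: int = 5, g: int = 2, G: int = 4) -> set:
    """Step 2: all token subsequences with exactly one X and one Y,
    at most L tokens, each gap <= g, total gaps <= G.
    (L, g, G defaults taken from the experiments slide.)"""
    patterns = set()
    for length in range(2, L + 1):
        for idx in itertools.combinations(range(len(tokens)), length):
            gaps = [idx[i + 1] - idx[i] - 1 for i in range(len(idx) - 1)]
            if any(gap > g for gap in gaps) or sum(gaps) > G:
                continue
            sub = [tokens[i] for i in idx]
            if sub.count("X") == 1 and sub.count("Y") == 1:
                patterns.add(" ".join(sub))
    return patterns
```

For example, `normalize("Google officially acquired YouTube in 2006", "Google", "YouTube")` yields the tokens `X officially acquired Y in D`, from which patterns such as `X acquired Y` are extracted.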

  10. IDENTIFYING SEMANTIC RELATIONS • A semantic relation can be expressed using more than one pattern • If many related patterns are shared between two word-pairs, we can expect a high relational similarity • Cluster lexical patterns using their distributions over word-pairs to identify semantically related patterns

  11. • Each pattern p is represented by its word-pair frequency vector • Patterns are sorted in descending order of their occurrence • A cluster is represented by the vector sum of the word-pair frequency vectors of the patterns that belong to it
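One way to realize this clustering is a greedy sequential pass, sketched below under assumptions: patterns are processed in descending order of total occurrences, and each pattern joins the cluster whose centroid (the vector sum) is most cosine-similar, provided that similarity reaches a threshold `theta`. The threshold value and all names here are assumptions, not from the slide:

```python
import math

def cosine(u: dict, v: dict) -> float:
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(f * v.get(k, 0) for k, f in u.items())
    nu = math.sqrt(sum(f * f for f in u.values()))
    nv = math.sqrt(sum(f * f for f in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def greedy_cluster(pattern_vectors: dict, theta: float = 0.5) -> list:
    """pattern_vectors: {pattern: {word_pair: frequency}}.
    Returns clusters as dicts with a pattern list and a centroid
    (the running vector sum of member vectors)."""
    clusters = []
    ordered = sorted(pattern_vectors.items(),
                     key=lambda kv: -sum(kv[1].values()))
    for pattern, vec in ordered:
        best, best_sim = None, theta  # only join if similarity >= theta
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"patterns": [pattern], "centroid": dict(vec)})
        else:
            best["patterns"].append(pattern)
            for k, f in vec.items():
                best["centroid"][k] = best["centroid"].get(k, 0) + f
    return clusters
```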

  12. MEASURING RELATIONAL SIMILARITY • Given a set of similar and dissimilar word pairs, the relational similarity can be measured using the “Mahalanobis distance” as follows: dA(xi, xj) = (xi − xj)^T A (xi − xj) (where A is a positive definite matrix) • Advantages of using the Mahalanobis distance: • Does not assume that features are independent • Can be learned from few data points
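A minimal pure-Python sketch of this distance; the function name is an assumption. Note that with A set to the identity matrix it reduces to the squared Euclidean distance:

```python
def mahalanobis_sq(xi: list, xj: list, A: list) -> float:
    """Squared Mahalanobis-style distance dA(xi, xj) = (xi - xj)^T A (xi - xj).
    xi, xj are feature vectors (lists of floats); A is a positive
    definite matrix given as a list of rows."""
    d = [a - b for a, b in zip(xi, xj)]
    Ad = [sum(A[r][c] * d[c] for c in range(len(d))) for r in range(len(d))]
    return sum(d[r] * Ad[r] for r in range(len(d)))
```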

  13. EXPERIMENTS • five relation types • ACQUIRER-ACQUIREE • PERSON-BIRTHPLACE • CEO-COMPANY • COMPANY-HEADQUARTERS • PERSON-FIELD

  14. EXPERIMENTS – LEXICAL PATTERNS • Run the pattern extraction algorithm with L = 5, g = 2, and G = 4 • The total number of unique patterns is 473,910 • Only the 148,655 patterns that occur at least twice are selected
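The frequency cut-off on this slide amounts to a simple count-and-filter step; a minimal sketch (the function name is an assumption):

```python
from collections import Counter

def frequent_patterns(all_patterns: list, min_count: int = 2) -> set:
    """Keep only patterns occurring at least min_count times
    (the slide keeps 148,655 of 473,910 patterns with count >= 2)."""
    counts = Counter(all_patterns)
    return {p for p, c in counts.items() if c >= min_count}
```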

  15. EXPERIMENTS – RELATION CLASSIFICATION • The top 10 clusters with the largest number of lexical patterns • The top four patterns that occur in the most word-pairs

  16. RELATIONAL SIMILARITY MEASURES • Compare the proposed measure with existing relational similarity measures • VSM: each word-pair is represented by a vector of pattern frequencies; the relational similarity between two word-pairs is computed as the cosine similarity between their vectors • LRA (Latent Relational Analysis): create a matrix in which the rows represent word-pairs and the columns represent lexical patterns, then apply singular value decomposition (SVD)
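The VSM baseline reduces to a cosine between pattern-frequency vectors; a minimal sketch, assuming each word-pair's vector is stored as a sparse pattern→frequency dict (the function name is an assumption):

```python
import math

def vsm_similarity(vec_a: dict, vec_b: dict) -> float:
    """VSM baseline: cosine similarity between the pattern-frequency
    vectors of two word-pairs."""
    dot = sum(f * vec_b.get(p, 0) for p, f in vec_a.items())
    na = math.sqrt(sum(f * f for f in vec_a.values()))
    nb = math.sqrt(sum(f * f for f in vec_b.values()))
    return dot / (na * nb) if na and nb else 0.0
```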

  17. RELATIONAL SIMILARITY MEASURES • EUC: set A in the distance formula to the identity matrix and compute relational similarity using pattern clusters • PROP: the proposed relational similarity measure

  18. RELATIONAL SIMILARITY MEASURES

  19. CONCLUSIONS • We proposed a method to compute the similarity between implicit semantic relations in two word-pairs • Requires only a few Web search queries, so it can quickly compute relational similarity for unseen word-pairs • Provides a general framework: designing relational similarity measures can be modeled as searching for a matrix A

  20. THANK YOU
