MEASURING THE SIMILARITY BETWEEN IMPLICIT SEMANTIC RELATIONS USING WEB SEARCH ENGINES [2009]

Danushka Bollegala, Yutaka Matsuo, Mitsuru Ishizuka
Presented by: Rucha Samant

OUTLINE
• Introduction
• Method
• Retrieving Contexts
• Extracting Lexical Patterns
• Identifying Semantic Relations
• Measuring Relational Similarity
• Experiments
• Conclusions

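INTRODUCTION
• Implicit semantic relations hold between the two words of a word pair, e.g. (ostrich, bird) and (lion, cat)
• "Ostrich is a large bird" and "Lion is a large cat" express the same relation, so the two word pairs are relationally similar
• "Google acquired YouTube" expresses an ACQUIRER-ACQUIREE relation between the words of the pair (Google, YouTube)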

Application areas:
• Relation extraction
• Information retrieval
• Analogy detection

OUTLINE OF THE SIMILARITY METHOD
• Web search component: query a Web search engine to find the contexts in which the two words of a word pair co-occur
• Pattern extraction component: extract lexical patterns that express semantic relations
• Pattern clustering component: cluster the patterns to identify particular semantic relations
• Similarity computation component: compute the relational similarity between two word pairs

RETRIEVING CONTEXTS
• Snippets are the brief summaries that Web search engines provide along with the search results
• A snippet that contains both words of a word pair captures the local context in which they co-occur
• Example query: “Google * * * YouTube”

RETRIEVING CONTEXTS
• “*” is the wildcard operator; it matches one word or none
• To retrieve snippets for a word pair (A, B), the queries “A * B”, “B * A”, “A * * B”, “B * * A”, “A * * * B”, “B * * * A”, and A B are issued
• The wildcard queries return snippets in which the two query words co-occur within a maximum of three words
• The quotation marks ensure that the two words appear in the specified order
• Two snippets are treated as duplicates, and one is removed, if they contain exactly the same sequence of words
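The query set and the duplicate rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `build_queries` and `remove_duplicates` are hypothetical, the sketch assumes a search engine that honours the `*` wildcard as described, and "same sequence of words" is interpreted as an identical whitespace-separated token sequence.

```python
from typing import List


def build_queries(a: str, b: str) -> List[str]:
    """Build the wildcard queries for the word pair (a, b).

    Each quoted query fixes the word order; since "*" matches one word or
    none, the number of wildcards bounds the distance between a and b.
    """
    queries = []
    for n_wildcards in (1, 2, 3):
        gap = " ".join(["*"] * n_wildcards)
        queries.append(f'"{a} {gap} {b}"')
        queries.append(f'"{b} {gap} {a}"')
    # Unquoted query: assumed to match pages containing both words anywhere.
    queries.append(f"{a} {b}")
    return queries


def remove_duplicates(snippets: List[str]) -> List[str]:
    """Drop snippets whose word sequence exactly matches an earlier one."""
    seen = set()
    unique = []
    for snippet in snippets:
        key = tuple(snippet.split())  # word sequence, ignoring extra whitespace
        if key not in seen:
            seen.add(key)
            unique.append(snippet)
    return unique
```

For example, `build_queries("Google", "YouTube")` yields the six quoted wildcard queries followed by the plain query `Google YouTube`.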

EXTRACTING LEXICAL PATTERNS
Lexical patterns are extracted from the snippets in three steps:
• Step 1: replace the two words with two variables X and Y, and replace all numeric values by D; punctuation marks are not removed
• Step 2: generate the subsequences of each snippet that satisfy the following conditions: exactly one X and one Y must occur in the subsequence; the maximum length of a subsequence is L words; no gap may exceed g words; the total length of all gaps must not exceed G words. All negation contractions are expanded (didn’t → did not)
• Step 3: select the subsequences with frequency greater than N as lexical patterns
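The three steps can be sketched as a recursive subsequence enumeration. This is an illustrative reconstruction under stated assumptions: the tokenizer, the handling of only the regular "-n't" contraction, counting each pattern at most once per snippet, and all function names are hypothetical choices. The default parameters follow the experiment settings (L = 5, g = 2, G = 4), and N = 1 corresponds to keeping patterns that occur at least twice.

```python
import re
from collections import Counter
from typing import Iterable, List, Set

# Numbers such as "1.65" or "1,000" are matched first so they stay one token.
TOKEN_RE = re.compile(r"\d+(?:[.,]\d+)*|\w+|[^\w\s]")
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")


def normalize(snippet: str, a: str, b: str) -> List[str]:
    """Step 1: tokenize (keeping punctuation), map a -> X, b -> Y, numbers -> D."""
    # Assumption: only the regular "-n't" form is expanded (didn't -> did not);
    # irregular forms such as "can't" are not handled.
    text = re.sub(r"n't\b", " not", snippet)
    tokens = []
    for tok in TOKEN_RE.findall(text):
        if tok.lower() == a.lower():
            tokens.append("X")
        elif tok.lower() == b.lower():
            tokens.append("Y")
        elif NUMBER_RE.fullmatch(tok):
            tokens.append("D")
        else:
            tokens.append(tok)
    return tokens


def subsequences(tokens: List[str], L: int, g: int, G: int) -> Set[str]:
    """Step 2: subsequences with exactly one X and one Y, at most L words,
    every gap at most g words, and all gaps totalling at most G words."""
    found: Set[str] = set()  # assumption: each pattern counted once per snippet

    def extend(path: List[int], gaps_used: int) -> None:
        words = [tokens[i] for i in path]
        if words.count("X") > 1 or words.count("Y") > 1:
            return  # adding words can never remove the extra X or Y
        if "X" in words and "Y" in words:
            found.add(" ".join(words))
        if len(path) == L:
            return
        last = path[-1]
        for nxt in range(last + 1, min(len(tokens), last + g + 2)):
            gap = nxt - last - 1  # number of skipped words
            if gaps_used + gap > G:
                break  # later candidates only have larger gaps
            extend(path + [nxt], gaps_used + gap)

    for start in range(len(tokens)):
        extend([start], 0)
    return found


def extract_patterns(snippets: Iterable[str], a: str, b: str,
                     L: int = 5, g: int = 2, G: int = 4, N: int = 1) -> Counter:
    """Step 3: keep the patterns whose frequency is greater than N."""
    counts: Counter = Counter()
    for snippet in snippets:
        counts.update(subsequences(normalize(snippet, a, b), L, g, G))
    return Counter({p: c for p, c in counts.items() if c > N})
```

With the snippets "Google acquired YouTube for 1.65 billion dollars" and "Google has acquired YouTube.", the pattern `X acquired Y` occurs in both (the second via a one-word gap) and survives the N = 1 filter, while `X has acquired Y` occurs only once and is dropped.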

IDENTIFYING SEMANTIC RELATIONS
• A semantic relation can be expressed using more than one pattern
• If two word pairs share many related patterns, we can expect a high relational similarity between them
• Lexical patterns are clustered according to their distributions over word pairs in order to identify semantically related patterns
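Clustering patterns by their distributions over word pairs requires, for every pattern, a vector recording how often it occurs with each word pair. A minimal sketch, assuming the per-pair pattern counts are already available as nested dictionaries (a hypothetical input format), simply inverts that mapping:

```python
from collections import defaultdict
from typing import Dict, Tuple

WordPair = Tuple[str, str]


def pattern_vectors(pair_patterns: Dict[WordPair, Dict[str, int]]
                    ) -> Dict[str, Dict[WordPair, int]]:
    """Invert word pair -> {pattern: count} into pattern -> {word pair: count}.

    Each inner dict is a sparse word-pair frequency vector for one pattern.
    """
    vectors: Dict[str, Dict[WordPair, int]] = defaultdict(dict)
    for pair, counts in pair_patterns.items():
        for pattern, count in counts.items():
            vectors[pattern][pair] = count
    return dict(vectors)
```

Sparse dictionaries are used because most patterns co-occur with only a small fraction of all word pairs.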

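Pattern clustering algorithm:
• Each pattern p is represented by its word-pair frequency vector, i.e. the number of times p occurs with each word pair
• The patterns are sorted in descending order of their total occurrence
• Each cluster is represented by the vector sum of the word-pair frequency vectors of all patterns that belong to that cluster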
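The three annotated steps can be combined into a greedy, single-pass clustering. The sketch below is a reconstruction under assumptions, not the authors' code: each pattern, taken in descending order of total frequency, joins the cluster whose centroid is most cosine-similar to it if that similarity exceeds a threshold `theta` (a hypothetical parameter), and otherwise starts a new cluster. Centroids are kept as vector sums, as described above; word pairs are encoded as strings such as `"google:youtube"` for simplicity.

```python
import math
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse vector: word pair -> frequency


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def vector_sum(u: Vector, v: Vector) -> Vector:
    total = dict(u)
    for k, x in v.items():
        total[k] = total.get(k, 0.0) + x
    return total


def cluster_patterns(vectors: Dict[str, Vector],
                     theta: float) -> Tuple[List[List[str]], List[Vector]]:
    # Sort patterns in descending order of their total occurrence.
    order = sorted(vectors, key=lambda p: sum(vectors[p].values()), reverse=True)
    clusters: List[List[str]] = []
    centroids: List[Vector] = []  # centroid = vector sum of member patterns
    for pattern in order:
        x = vectors[pattern]
        sims = [cosine(x, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=-1)
        if best >= 0 and sims[best] > theta:
            clusters[best].append(pattern)
            centroids[best] = vector_sum(centroids[best], x)
        else:
            clusters.append([pattern])  # no similar cluster: start a new one
            centroids.append(dict(x))
    return clusters, centroids
```

Processing frequent patterns first lets the most reliable patterns seed the clusters, so rarer patterns are compared against well-populated centroids.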

MEASURING RELATIONAL SIMILARITY
• Given a set of similar and dissimilar word pairs, the relational similarity can be measured using a Mahalanobis distance:
  $d_A(x_i, x_j) = (x_i - x_j)^\top A (x_i - x_j)$
  where x_i and x_j are the feature vectors representing two word pairs and A is a positive definite matrix learned from the similar and dissimilar pairs
• Advantages of using a Mahalanobis distance:
• It does not assume that features are independent
• It can be learned from only a few data points
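A minimal NumPy sketch of the distance formula. It only evaluates d_A for a given matrix A; learning A from similar and dissimilar word pairs is not shown. The positive-definiteness check via Cholesky factorization and the function names are added assumptions for illustration.

```python
import numpy as np


def is_positive_definite(A: np.ndarray) -> bool:
    """A symmetric matrix is positive definite iff a Cholesky factor exists."""
    if not np.allclose(A, A.T):
        return False
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False


def mahalanobis_distance(xi, xj, A: np.ndarray) -> float:
    """d_A(x_i, x_j) = (x_i - x_j)^T A (x_i - x_j)."""
    if not is_positive_definite(A):
        raise ValueError("A must be symmetric positive definite")
    diff = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return float(diff @ A @ diff)
```

With A equal to the identity this reduces to the squared Euclidean distance; off-diagonal entries in A let correlated features interact, which is why the measure does not need to assume feature independence.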

EXPERIMENTS
• Five relation types: ACQUIRER-ACQUIREE, PERSON-BIRTHPLACE, CEO-COMPANY, COMPANY-HEADQUARTERS, PERSON-FIELD

EXPERIMENTS: LEXICAL PATTERNS
• The pattern extraction algorithm was run with L = 5, g = 2, and G = 4
• The total number of unique patterns is 473,910
• Only the 148,655 patterns that occur at least twice are selected

EXPERIMENTS: RELATION CLASSIFICATION
• Table: the top 10 clusters with the largest number of lexical patterns, showing for each cluster the four patterns that occur in the largest number of word pairs
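The selection rule behind the table can be sketched as follows, assuming clusters are lists of pattern strings and each pattern has a sparse word-pair frequency vector (both input formats and the function name are hypothetical): rank clusters by number of patterns, then rank the patterns inside each cluster by the number of distinct word pairs they occur with.

```python
from typing import Dict, List


def top_clusters(clusters: List[List[str]],
                 vectors: Dict[str, Dict[str, float]],
                 n_clusters: int = 10, n_patterns: int = 4) -> List[List[str]]:
    """For the n_clusters largest clusters, return their n_patterns patterns
    that occur with the largest number of word pairs."""

    def n_word_pairs(pattern: str) -> int:
        # Count only word pairs with a non-zero frequency.
        return sum(1 for c in vectors.get(pattern, {}).values() if c > 0)

    largest = sorted(clusters, key=len, reverse=True)[:n_clusters]
    return [sorted(cluster, key=n_word_pairs, reverse=True)[:n_patterns]
            for cluster in largest]
```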

RELATIONAL SIMILARITY MEASURES
The following relational similarity measures are compared:
• VSM (vector space model): each word pair is represented by a vector of pattern frequencies, and the relational similarity between two word pairs is computed as the cosine similarity of their vectors
• LRA (Latent Relational Analysis): a matrix is created in which the rows represent word pairs and the columns represent lexical patterns, and its dimensionality is reduced using singular value decomposition (SVD)
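Both baselines can be sketched with NumPy on a small word-pair × pattern frequency matrix `M`. This is a simplification: LRA as originally proposed also involves additional steps (for example generating alternate word pairs and weighting the frequencies) that are omitted here. Only the SVD projection followed by cosine similarity is shown, and the rank `k` is a free, hypothetical parameter.

```python
import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def vsm_similarity(M: np.ndarray, i: int, j: int) -> float:
    """VSM: cosine of the raw pattern-frequency rows of word pairs i and j."""
    return cosine(M[i], M[j])


def lra_similarity(M: np.ndarray, i: int, j: int, k: int) -> float:
    """Simplified LRA: project rows onto the top-k singular directions, then cosine."""
    U, S, _ = np.linalg.svd(M, full_matrices=False)
    reduced = U[:, :k] * S[:k]  # rows of U_k S_k: word pairs in the latent space
    return cosine(reduced[i], reduced[j])
```

The SVD step lets two word pairs score as similar even when they share few exact patterns, provided their patterns load on the same latent dimensions.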

RELATIONAL SIMILARITY MEASURES
• EUC: set A in the distance formula to the identity matrix and compute the relational similarity using the pattern clusters
• PROP: the proposed relational similarity measure
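To compute EUC, each word pair needs a feature vector over the pattern clusters. A minimal sketch, assuming (as a hypothetical choice) that feature c of a word pair is the total frequency of cluster c's patterns with that pair, scaled to unit length. With A set to the identity, the Mahalanobis distance becomes the squared Euclidean distance; PROP would use a learned A instead.

```python
import numpy as np
from typing import Dict, List


def cluster_features(pair_patterns: Dict[str, int],
                     clusters: List[List[str]]) -> np.ndarray:
    """Feature c = total frequency of cluster c's patterns for this word pair,
    scaled to unit length (the normalization is an assumption)."""
    x = np.array([sum(pair_patterns.get(p, 0) for p in cluster)
                  for cluster in clusters], dtype=float)
    norm = np.linalg.norm(x)
    return x / norm if norm > 0 else x


def euc_distance(x1: np.ndarray, x2: np.ndarray) -> float:
    """Mahalanobis distance with A = I, i.e. the squared Euclidean distance."""
    A = np.eye(len(x1))
    diff = x1 - x2
    return float(diff @ A @ diff)
```

For example, (Google, YouTube) with counts for "X acquired Y" and "X bought Y" lands at distance 0 from a pair seen only with "X bought Y" when both patterns fall in the same cluster.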


CONCLUSIONS
• We proposed a method to compute the similarity between the implicit semantic relations in two word pairs
• The method needs only a few search queries, so it can quickly compute relational similarity for unseen word pairs
• It provides a general framework: designing a relational similarity measure can be modeled as searching for a matrix A

THANK YOU
