MEASURING THE SIMILARITY BETWEEN IMPLICIT SEMANTIC RELATIONS USING WEB SEARCH ENGINES [2009]
Danushka Bollegala, Yutaka Matsuo, Mitsuru Ishizuka
Presented by: Rucha Samant
OUTLINE
• Introduction
• Method
• Retrieving Contexts
• Extracting Lexical Patterns
• Identifying Semantic Relations
• Measuring Relational Similarity
• Experiments
• Conclusions
INTRODUCTION
• Implicit semantic relations hold between the two words of a word pair, e.g. (ostrich, bird), (lion, cat), (Google, YouTube).
• Ostrich is a large bird
• Lion is a large cat
• Google acquired YouTube
Applications:
• Relation Extraction
• Information Retrieval
• Analogy Detection
OUTLINE OF THE SIMILARITY METHOD
• Web search component: queries a Web search engine to find the contexts of a word pair.
• Pattern extraction component: extracts lexical patterns that express semantic relations.
• Pattern clustering component: clusters the patterns to identify particular relations.
• Similarity computation component: computes the relational similarity between two word-pairs.
RETRIEVING CONTEXTS
• Snippets are brief summaries that Web search engines provide along with the search results.
• A snippet that contains both words captures the local context of the word pair.
• Example query: “Google * * * YouTube”
RETRIEVING CONTEXTS
• “*” is the wildcard operator; it matches one word or none.
• To retrieve snippets for a word pair (A, B), use the queries “A * B”, “B * A”, “A * * B”, “B * * A”, “A * * * B”, “B * * * A”, and A B.
• With these queries the two words co-occur within a maximum of three words.
• The quotation marks ensure that the two words appear in the specified order.
• Duplicate snippets, i.e. those that contain exactly the same sequence of words, are removed.
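The query set above can be generated mechanically. The following is a minimal sketch, not the authors' code: the function names `build_queries` and `drop_duplicate_snippets` are hypothetical, and treating two snippets as duplicates when their whitespace-separated word sequences match is an assumption about how "the exact sequence of all words" is checked.

```python
def build_queries(a, b, max_wildcards=3):
    """Return the seven queries listed on the slide for the word pair (a, b)."""
    queries = []
    for n in range(1, max_wildcards + 1):
        stars = " ".join(["*"] * n)
        # Quoted queries fix the order of the two words.
        queries.append(f'"{a} {stars} {b}"')
        queries.append(f'"{b} {stars} {a}"')
    # The last query is the plain word pair, as written on the slide.
    queries.append(f"{a} {b}")
    return queries


def drop_duplicate_snippets(snippets):
    """Keep the first snippet for every distinct word sequence (assumed duplicate test)."""
    seen = set()
    unique = []
    for snippet in snippets:
        key = tuple(snippet.split())
        if key not in seen:
            seen.add(key)
            unique.append(snippet)
    return unique


if __name__ == "__main__":
    for query in build_queries("Google", "YouTube"):
        print(query)
```

Sending the queries to an actual search engine is left out on purpose, since the slides do not say which search API is used.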
EXTRACTING LEXICAL PATTERNS
Pattern extraction consists of the following three steps.
Step 1:
• Replace the two words of the pair with the variables X and Y.
• Replace all numeric values with D.
• Do not remove punctuation marks.
Step 2: generate subsequences under these conditions.
• Exactly one X and one Y must exist in a subsequence.
• The maximum length of a subsequence is L words.
• No gap may exceed g words.
• The total length of all gaps may not exceed G words.
• Expand all negation contractions, e.g. didn’t → did not.
Step 3:
• Select the subsequences with frequency greater than N.
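The three steps can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, tokens are lowercased, each word of the pair is assumed to be a single token, gaps are not marked in the resulting pattern string, a pattern is counted once per snippet, and the expansion of negation contractions is omitted.

```python
import re
from collections import Counter


def normalise(snippet, a, b):
    """Step 1: replace the pair with X and Y, numbers with D, keep punctuation."""
    tokens = re.findall(r"\w+|[^\w\s]", snippet)
    out = []
    for tok in tokens:
        low = tok.lower()
        if low == a.lower():
            out.append("X")
        elif low == b.lower():
            out.append("Y")
        elif tok.isdigit():
            out.append("D")
        else:
            out.append(low)
    return out


def subsequences(tokens, L=5, g=2, G=4):
    """Step 2: subsequences with one X, one Y, length <= L, gaps <= g, total gap <= G."""
    found = set()

    def extend(indices, total_gap):
        seq = [tokens[i] for i in indices]
        if seq.count("X") > 1 or seq.count("Y") > 1:
            return  # can never become valid again
        if seq.count("X") == 1 and seq.count("Y") == 1:
            found.add(" ".join(seq))
        if len(indices) == L:
            return
        last = indices[-1]
        # The next token may skip at most g words.
        for nxt in range(last + 1, min(len(tokens), last + g + 2)):
            gap = nxt - last - 1
            if total_gap + gap <= G:
                extend(indices + [nxt], total_gap + gap)

    for start in range(len(tokens)):
        extend([start], 0)
    return found


def frequent_patterns(examples, N=1, **limits):
    """Step 3: keep patterns whose frequency over all snippets is greater than N."""
    counts = Counter()
    for a, b, snippet in examples:
        counts.update(subsequences(normalise(snippet, a, b), **limits))
    return {pattern for pattern, count in counts.items() if count > N}
```

For example, `normalise("Google acquired YouTube in 2006", "Google", "YouTube")` yields the tokens `X acquired Y in D`, from which subsequences such as `X acquired Y` are generated.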
IDENTIFYING SEMANTIC RELATIONS
• A semantic relation can be expressed using more than one pattern.
• If two word-pairs share many related patterns, we can expect a high relational similarity between them.
• Cluster the lexical patterns using their distributions over word-pairs to identify semantically related patterns.
• Each pattern p is represented by its word-pair frequency vector.
• The clustering algorithm sorts the patterns in descending order of their occurrence.
• A cluster is represented by the vector sum of the word-pair frequency vectors of all patterns that belong to that cluster.
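A greedy, single-pass clustering that matches these three points could look like the sketch below. The slides do not state how a pattern is compared with a cluster or when a new cluster is opened, so cosine similarity and the threshold `theta` are assumptions, and the function names and toy frequencies are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_patterns(vectors, theta=0.5):
    """vectors maps each pattern to its word-pair frequency vector."""
    # Process patterns in descending order of total occurrence.
    order = sorted(vectors, key=lambda p: sum(vectors[p]), reverse=True)
    clusters = []  # each cluster: {"patterns": [...], "sum": vector sum}
    for pattern in order:
        vec = vectors[pattern]
        best, best_sim = None, -1.0
        for cluster in clusters:
            sim = cosine(vec, cluster["sum"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim > theta:
            # Join the most similar cluster and update its vector sum.
            best["patterns"].append(pattern)
            best["sum"] = [a + b for a, b in zip(best["sum"], vec)]
        else:
            clusters.append({"patterns": [pattern], "sum": list(vec)})
    return clusters


# Toy frequencies over four hypothetical word-pairs.
toy = {
    "X acquired Y": [5, 4, 0, 0],
    "X bought Y": [3, 2, 0, 0],
    "X was born in Y": [0, 0, 6, 5],
    "Y native X": [0, 0, 2, 3],
}
clusters = cluster_patterns(toy)
```

With these toy vectors the acquisition patterns and the birthplace patterns end up in two separate clusters, because their frequency vectors have no word-pairs in common.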
MEASURING RELATIONAL SIMILARITY
• Given a set of similar and dissimilar word pairs, the relational similarity can be measured using the Mahalanobis distance:
d_A(x_i, x_j) = (x_i - x_j)^T A (x_i - x_j)
(where A is a positive definite matrix)
Advantages of using the Mahalanobis distance:
• Does not assume that features are independent
• Can be learned from few data points
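The distance formula is easy to check numerically. The sketch below evaluates (x_i - x_j)^T A (x_i - x_j) with plain Python lists; the function name and the example vectors and matrices are hypothetical, and learning A from similar and dissimilar word pairs is not shown.

```python
def mahalanobis_sq(x_i, x_j, A):
    """Return (x_i - x_j)^T A (x_i - x_j) for a square matrix A given as nested lists."""
    diff = [a - b for a, b in zip(x_i, x_j)]
    n = len(diff)
    # Matrix-vector product A * diff.
    a_diff = [sum(A[r][c] * diff[c] for c in range(n)) for r in range(n)]
    return sum(d * ad for d, ad in zip(diff, a_diff))


identity = [[1, 0], [0, 1]]
correlated = [[2, 1], [1, 2]]  # positive definite; off-diagonal terms couple the features

print(mahalanobis_sq([1, 2], [3, 5], identity))    # squared Euclidean distance
print(mahalanobis_sq([1, 2], [3, 5], correlated))
```

With A set to the identity matrix the value reduces to the squared Euclidean distance, while non-zero off-diagonal entries let the distance account for dependent features.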
EXPERIMENTS
• Five relation types
• ACQUIRER-ACQUIREE
• PERSON-BIRTHPLACE
• CEO-COMPANY
• COMPANY-HEADQUARTERS
• PERSON-FIELD
EXPERIMENTS - LEXICAL PATTERNS
• The pattern extraction algorithm is run with L = 5, g = 2, and G = 4.
• The total number of unique patterns is 473910.
• Only the 148655 patterns that occur at least twice are selected.
EXPERIMENTS - RELATION CLASSIFICATION
[Table: the top 10 clusters with the largest number of lexical patterns, with the top four patterns that occur in the largest number of word-pairs.]
RELATIONAL SIMILARITY MEASURES
The following relational similarity measures are compared.
• VSM: each word-pair is represented by a vector of pattern frequencies; the relational similarity between two word-pairs is computed as the cosine similarity of their vectors.
• LRA (Latent Relational Analysis): create a matrix in which the rows represent word-pairs and the columns represent lexical patterns, then apply singular value decomposition (SVD).
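The VSM baseline can be illustrated with sparse pattern-frequency vectors. This is a minimal sketch with made-up frequencies for the word pairs used in the introduction; the function name `vsm_similarity` is hypothetical, and the LRA variant with SVD is not shown.

```python
import math
from collections import Counter


def vsm_similarity(u, v):
    """Cosine similarity between two pattern-frequency vectors stored as Counters."""
    dot = sum(u[p] * v[p] for p in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Illustrative counts only, not taken from the experiments.
ostrich_bird = Counter({"X is a large Y": 5})
lion_cat = Counter({"X is a large Y": 3, "X , a big Y": 1})
google_youtube = Counter({"X acquired Y": 8, "X bought Y": 3})

print(vsm_similarity(ostrich_bird, lion_cat))
print(vsm_similarity(ostrich_bird, google_youtube))
```

Because VSM compares raw patterns, two word-pairs that express the same relation through different patterns share no dimensions and score zero, which is the gap that pattern clustering is meant to close.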
RELATIONAL SIMILARITY MEASURES
• EUC: set A in the distance formula to the identity matrix and compute relational similarity using pattern clusters.
• PROP: the proposed relational similarity measure.
CONCLUSIONS
• We proposed a method to compute the similarity between the implicit semantic relations in two word-pairs.
• The method needs only a few queries, so relational similarity for unseen word-pairs can be computed quickly.
• It provides a general framework: designing relational similarity measures can be modeled as searching for a matrix A.