
MEASURING THE SIMILARITY BETWEEN IMPLICIT SEMANTIC RELATIONS USING WEB SEARCH ENGINES



  1. MEASURING THE SIMILARITY BETWEEN IMPLICIT SEMANTIC RELATIONS USING WEB SEARCH ENGINES Danushka Bollegala, Yutaka Matsuo, Mitsuru Ishizuka (WSDM’09) Speaker: Yi-Ling Tai Date: 2009/11/23

  2. OUTLINE • Introduction • Method • Retrieving Contexts • Extracting Lexical Patterns • Identifying Semantic Relations • Measuring Relational similarity • Experiments • Conclusions

  3. INTRODUCTION • Implicit semantic relations between two words • Google, YouTube (acquisition) • Ostrich, bird (is a large) • Similar semantic relations between two word-pairs • Google, YouTube → Yahoo, Inktomi • Ostrich, bird → lion, cat • This paper proposes a method to compute the similarity between the implicit semantic relations in two word-pairs.

  4. OUTLINE OF THE SIMILARITY METHOD

  5. OUTLINE OF THE SIMILARITY METHOD • Web search component • query a Web search engine to find the contexts of a word-pair • Pattern extraction component • extract lexical patterns that express semantic relations • Pattern clustering component • cluster the patterns to identify particular relations • Similarity computation component • compute the relational similarity between two word-pairs

  6. RETRIEVING CONTEXTS • Snippets - brief summaries provided by Web search engines along with the search results. • a snippet containing the two words captures their local context • query “Google * * YouTube”

  7. RETRIEVING CONTEXTS • “ * ” - wildcard operator, matches one word or none. • To retrieve snippets for a word pair (A, B), issue the queries • “A * B”, “B * A”, “A * * B”, “B * * A”, “A * * * B”, “B * * * A”, and A B • the query words must co-occur within a maximum of three words • the quotation marks ensure that the two words appear in the given order • remove duplicate snippets • two snippets are duplicates if they contain the exact same sequence of words
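The query set described on this slide is easy to reproduce. Below is a minimal sketch of how the wildcard queries for a word pair could be generated; the function name and exact string formatting are illustrative choices, not taken from the paper.

```python
def build_queries(a, b):
    """Generate the wildcard queries for a word pair (a, b).

    Each "*" matches one word or none, so the quoted phrase queries cover
    co-occurrences within a window of up to three intervening words, in
    both orders; the final query is a plain (unquoted) conjunctive query.
    """
    queries = []
    for gap in range(1, 4):                  # one to three wildcards
        stars = " ".join(["*"] * gap)
        queries.append(f'"{a} {stars} {b}"')
        queries.append(f'"{b} {stars} {a}"')
    queries.append(f"{a} {b}")
    return queries

print(build_queries("Google", "YouTube"))
# ['"Google * YouTube"', '"YouTube * Google"', '"Google * * YouTube"', ...]
```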

  8. EXTRACTING LEXICAL PATTERNS • a shallow lexical pattern extraction algorithm • extracts the semantic relations between two words from Web snippets • does not require language preprocessing • consists of the following three steps • Step 1: • replace the two words with the variables X and Y • replace all numeric values by D • do not remove punctuation marks
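A minimal sketch of Step 1, assuming whitespace tokenisation and a simple regular expression for numeric values (neither is specified on the slides):

```python
import re

def normalise_snippet(snippet, word_a, word_b):
    """Step 1: replace the word pair with X/Y and numbers with D,
    keeping punctuation and all other tokens unchanged."""
    out = []
    for token in snippet.split():
        if token == word_a:
            out.append("X")
        elif token == word_b:
            out.append("Y")
        elif re.fullmatch(r"\d+([.,]\d+)*", token):
            out.append("D")                  # collapse numeric values to D
        else:
            out.append(token)
    return out

print(normalise_snippet("Google to acquire YouTube for 1650 million",
                        "Google", "YouTube"))
# ['X', 'to', 'acquire', 'Y', 'for', 'D', 'million']
```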

  9. EXTRACTING LEXICAL PATTERNS • Step 2: • exactly one X and one Y must exist in a subsequence • the maximum length of a subsequence is L words • gaps should not exceed g words • the total length of all gaps should not exceed G words • expand all negation contractions, didn’t → did not • Step 3: • select subsequences with frequency greater than N

  10. EXTRACTING LEXICAL PATTERNS • a modified PrefixSpan algorithm • considers all the words in a snippet • not limited to extracting patterns only from the mid-fix (the text between X and Y) • X to acquire Y, X acquire Y, X to acquire Y for.
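As a rough illustration of Steps 2 and 3, the brute-force sketch below enumerates constrained subsequences directly; the paper uses the modified PrefixSpan algorithm mentioned above, which finds the same patterns far more efficiently. The defaults mirror the settings reported in the experiments (L = 5, g = 2, G = 4), and N = 1 corresponds to keeping patterns that occur at least twice.

```python
from collections import Counter
from itertools import combinations

def extract_patterns(token_seqs, L=5, g=2, G=4, N=1):
    """Count subsequences containing exactly one X and one Y, with at most
    L words, each gap <= g, and total gap length <= G, then keep those
    whose frequency exceeds N (Step 3)."""
    counts = Counter()
    for tokens in token_seqs:
        n = len(tokens)
        for length in range(2, min(L, n) + 1):
            for idx in combinations(range(n), length):
                sub = [tokens[i] for i in idx]
                if sub.count("X") != 1 or sub.count("Y") != 1:
                    continue                              # exactly one X and one Y
                gaps = [b - a - 1 for a, b in zip(idx, idx[1:])]
                if any(gap > g for gap in gaps) or sum(gaps) > G:
                    continue                              # per-gap and total-gap limits
                counts[" ".join(sub)] += 1
    return {p: c for p, c in counts.items() if c > N}

snippets = [["X", "to", "acquire", "Y", "for", "D", "million"],
            ["X", "to", "acquire", "Y", "in", "D"]]
print(extract_patterns(snippets))
# e.g. {'X to acquire Y': 2, 'X acquire Y': 2, 'X Y': 2, ...}
```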

  11. IDENTIFYING SEMANTIC RELATIONS • A semantic relation can be expressed using more than one pattern. • If there are many related patterns between two word-pairs, we can expect a high relational similarity. • cluster lexical patterns using their distributions over word-pairs, to identify semantically related patterns.

  12. IDENTIFYING SEMANTIC RELATIONS • p: word-pair frequency vector of pattern p • each element of p is the frequency with which pattern p occurs with the corresponding word-pair • SORT: sorts the patterns in descending order of their total occurrence over all word-pairs • c: the cluster centroid, the vector sum of all word-pair frequency vectors corresponding to the patterns that belong to that cluster • ⊕: denotes vector addition • θ: the clustering similarity threshold
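The clustering described above is a greedy, one-pass procedure: patterns are processed in order of total frequency, and each pattern either joins the most similar existing cluster (when the cosine similarity with that cluster's centroid reaches the threshold θ) or starts a new cluster. The sketch below follows that description; the sparse-dict representation and function names are my own, not the paper's.

```python
import math

def cosine(u, v):
    """Cosine similarity between sparse vectors represented as dicts."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def cluster_patterns(pattern_vectors, theta):
    """Greedy one-pass clustering of patterns by their word-pair vectors."""
    # SORT: process patterns in descending order of total occurrence
    order = sorted(pattern_vectors,
                   key=lambda p: sum(pattern_vectors[p].values()),
                   reverse=True)
    clusters = []                            # each cluster: (member patterns, centroid c)
    for p in order:
        v = pattern_vectors[p]
        best, best_sim = None, -1.0
        for cluster in clusters:
            sim = cosine(v, cluster[1])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= theta:
            members, c = best
            members.append(p)
            for k, f in v.items():           # c <- c (+) v : add the pattern's vector
                c[k] = c.get(k, 0.0) + f
        else:
            clusters.append(([p], dict(v)))  # start a new cluster
    return clusters

patterns = {"X acquires Y": {("Google", "YouTube"): 3, ("Yahoo", "Inktomi"): 2},
            "X buys Y":     {("Google", "YouTube"): 2, ("Yahoo", "Inktomi"): 1},
            "X is a Y":     {("ostrich", "bird"): 4}}
for members, _ in cluster_patterns(patterns, theta=0.9):
    print(members)
# ['X acquires Y', 'X buys Y'] then ['X is a Y'] for this toy input
```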

  13. MEASURING RELATIONAL SIMILARITY • each word-pair is represented by a feature vector • the elements of the feature vector are the total frequencies of the word-pair in each pattern cluster • the relational similarity between two word-pairs is computed from their feature vectors (Formula 2) • M: an inter-cluster correlation matrix

  14. MEASURING RELATIONAL SIMILARITY • the correlation between clusters ci and cj is given by the element Mij of the correlation matrix • ci ∪ cj denotes the union of the two clusters
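A sketch of the similarity computation in Formula 2 as read from these two slides: the cluster-frequency feature vectors of the two word-pairs are combined through the inter-cluster correlation matrix M. The cosine-style length normalisation is my assumption and may differ from the paper's exact formula.

```python
import numpy as np

def relational_similarity(f1, f2, M):
    """Inner product of two cluster-frequency feature vectors, weighted by
    the inter-cluster correlation matrix M; normalised by vector lengths
    (normalisation is an assumption, not stated on the slides)."""
    f1, f2 = np.asarray(f1, dtype=float), np.asarray(f2, dtype=float)
    score = f1 @ M @ f2
    norm = np.linalg.norm(f1) * np.linalg.norm(f2)
    return float(score) / norm if norm else 0.0
```

Setting M to the identity matrix removes the inter-cluster correlations, which corresponds to the IP baseline described on a later slide.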

  15. EXPERIMENTS • Dataset • 100 instances (word or named-entity pairs) • five relation types • ACQUIRER-ACQUIREE • PERSON-BIRTHPLACE • CEO-COMPANY • COMPANY-HEADQUARTERS • PERSON-FIELD

  16. EXPERIMENTS • manually select 20 instances for each type • Wikipedia • online newspapers • company reviews • For each instance, download snippets using the Yahoo BOSS API

  17. EXPERIMENTS - LEXICAL PATTERNS • Lexical Patterns • run the pattern extraction algorithm • L = 5, g = 2, and G = 4 • total number of unique patterns is 473,910 • we only select the 148,655 patterns that occur at least twice.

  18. EXPERIMENTS - PATTERN CLUSTERS • Ratio: singletons to total number of clusters

  19. EXPERIMENTS - RELATION CLASSIFICATION • We evaluate the proposed relational similarity measure in a relation classification task. • k-nearest neighbor classification • classification accuracy • average precision • Rel(r): a binary-valued function that returns 1 if the word-pair at rank r has the same relation as the query word-pair
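The average precision measure can be written directly in terms of the Rel(r) function mentioned above. The sketch below uses the standard ranked-retrieval definition (precision at each relevant rank, averaged over the relevant pairs); the exact variant used in the paper is not shown on the slides, so this is an assumption.

```python
def average_precision(rel):
    """rel[r-1] = Rel(r): 1 if the word-pair at rank r has the same
    relation as the query pair, 0 otherwise."""
    hits, ap = 0, 0.0
    for r, relevant in enumerate(rel, start=1):
        if relevant:
            hits += 1
            ap += hits / r                   # precision at rank r
    return ap / hits if hits else 0.0

print(average_precision([1, 0, 1, 1, 0]))    # (1/1 + 2/3 + 3/4) / 3 ≈ 0.806
```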

  20. EXPERIMENTS - RELATION CLASSIFICATION • clustering threshold θ = 0.955 • 2629 non-singleton clusters • 6930 singletons

  21. EXPERIMENTS - RELATION CLASSIFICATION • the top 10 clusters with the largest number of lexical patterns • the top four patterns that occur in the largest number of word-pairs

  22. RELATIONAL SIMILARITY MEASURES • compare the following relational similarity measures • VSM: • each word-pair is represented by a vector of pattern frequencies • the relational similarity between two word-pairs is computed as the cosine similarity between their vectors • LRA: • Latent Relational Analysis • create a matrix in which the rows represent word-pairs and the columns represent lexical patterns • apply singular value decomposition (SVD) to this matrix
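A rough sketch of the two baselines, assuming the word-pair by pattern frequency matrix is available as a NumPy array; the choice of rank k and the omission of LRA's pattern-alternative generation step are simplifications.

```python
import numpy as np

def vsm_similarity(x1, x2):
    """VSM baseline: cosine similarity between raw pattern-frequency vectors."""
    denom = np.linalg.norm(x1) * np.linalg.norm(x2)
    return float(x1 @ x2) / denom if denom else 0.0

def lra_project(X, k=100):
    """LRA-style step: project the word-pair x pattern matrix onto its
    top-k singular vectors; similarity is then the cosine between rows
    of the returned matrix."""
    U, s, Vt = np.linalg.svd(np.asarray(X, dtype=float), full_matrices=False)
    k = min(k, len(s))
    return U[:, :k] * s[:k]
```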

  23. RELATIONAL SIMILARITY MEASURES • IP: • set the correlation matrix in Formula 2 to the identity matrix • compute relational similarity using pattern clusters • CORR: • the proposed relational similarity measure.

  24. RELATIONAL SIMILARITY MEASURES

  25. CONCLUSIONS • We proposed a method to compute the similarity between implicit semantic relations in two word-pairs. • only a few Web search queries are required • quickly compute relational similarity for unseen word-pairs • a general framework: designing relational similarity measures can be modeled as searching for a suitable matrix
