MEASURING THE SIMILARITY BETWEEN IMPLICIT SEMANTIC RELATIONS USING WEB SEARCH ENGINES

Danushka Bollegala, Yutaka Matsuo, Mitsuru Ishizuka (WSDM’09) Speaker: Yi-Ling Tai Date: 2009/11/23

OUTLINE • Introduction • Method • Retrieving Contexts • Extracting Lexical Patterns • Identifying Semantic Relations • Measuring Relational Similarity • Experiments • Conclusions

INTRODUCTION • Implicit semantic relations between two words • Google, YouTube (acquisition) • Ostrich, bird (is a large) • Similar semantic relations between two word-pairs • Google, YouTube → Yahoo, Inktomi • Ostrich, bird → lion, cat • This paper proposes a method to compute the similarity between the implicit semantic relations in two word-pairs.

OUTLINE OF THE SIMILARITY METHOD • Web search component • queries a Web search engine to find the contexts in which the two words co-occur • Pattern extraction component • extracts lexical patterns that express semantic relations • Pattern clustering component • clusters the patterns to identify particular relations • Similarity computation component • computes the relational similarity between two word-pairs

RETRIEVING CONTEXTS • Snippets: brief summaries provided by Web search engines along with the search results • a snippet containing both words captures the local context of the pair • example query: “Google * * YouTube”

RETRIEVING CONTEXTS • “*”: wildcard operator, matches one word or none • To retrieve snippets for a word-pair (A, B), issue the queries • “A * B”, “B * A”, “A * * B”, “B * * A”, “A * * * B”, “B * * * A”, and A B • the wildcard queries require the two words to co-occur within a maximum of three words • the quotation marks ensure that the two words appear in the specified order • remove duplicate snippets • snippets count as duplicates if they contain exactly the same sequence of words
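The query set and the duplicate filter on this slide can be sketched in a few lines of Python. This is only an illustration: the function names `build_queries` and `dedupe_snippets` are hypothetical, whitespace tokenization is an assumption, and no real search API is called.

```python
def build_queries(a, b, max_wildcards=3):
    """Build the seven queries listed on the slide for the word-pair (a, b)."""
    queries = []
    for n in range(1, max_wildcards + 1):
        stars = " ".join(["*"] * n)
        # Quoted queries fix the order of the two words.
        queries.append(f'"{a} {stars} {b}"')
        queries.append(f'"{b} {stars} {a}"')
    # Final unquoted query containing both words.
    queries.append(f"{a} {b}")
    return queries


def dedupe_snippets(snippets):
    """Keep the first snippet for each exact word sequence (assumed: whitespace split)."""
    seen = set()
    unique = []
    for snippet in snippets:
        key = tuple(snippet.split())
        if key not in seen:
            seen.add(key)
            unique.append(snippet)
    return unique


if __name__ == "__main__":
    for q in build_queries("Google", "YouTube"):
        print(q)
```

Each returned string would be sent to the search engine as one query, and the snippets from all seven result lists would be pooled before deduplication.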

EXTRACTING LEXICAL PATTERNS • Shallow lexical pattern extraction algorithm • extracts the semantic relations between two words from Web snippets • requires no linguistic preprocessing • Consists of the following three steps • Step 1: • Replace the two words with two variables X and Y • Replace all numeric values with D • Do not remove punctuation marks

EXTRACTING LEXICAL PATTERNS • Step 2: generate subsequences of each snippet under the following constraints • Exactly one X and one Y must exist in a subsequence • The maximum length of a subsequence is L words • Each gap should not exceed g words • The total length of all gaps should not exceed G words • Expand all negation contractions, e.g. didn’t → did not • Step 3: • Select subsequences with frequency greater than N

EXTRACTING LEXICAL PATTERNS • a modified PrefixSpan algorithm generates the subsequences • considers all the words in a snippet • not limited to extracting patterns from only the mid-fix (the words between X and Y) • example patterns: X to acquire Y, X acquire Y, X to acquire Y for
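The three extraction steps can be illustrated with a small brute-force sketch. It is not the modified PrefixSpan algorithm named on the slide: it simply enumerates every subsequence that meets the L, g, and G constraints. The function names, the regex tokenizer, exact-match replacement of single-token words, and counting each pattern once per snippet are all assumptions, and the negation-contraction expansion is left out.

```python
import re
from collections import Counter


def normalize(snippet, a, b):
    """Step 1: replace the two words with X and Y, and numeric tokens with D."""
    tokens = re.findall(r"\w+|[^\w\s]", snippet)  # punctuation is kept as tokens
    out = []
    for tok in tokens:
        if tok == a:
            out.append("X")
        elif tok == b:
            out.append("Y")
        elif tok.isdigit():
            out.append("D")
        else:
            out.append(tok)
    return out


def subsequences(tokens, L=5, g=2, G=4):
    """Step 2: all subsequences with exactly one X and one Y, at most L words,
    no single gap above g words, and a total gap length of at most G words."""
    results = set()
    n = len(tokens)

    def extend(chosen, last, total_gap):
        xs, ys = chosen.count("X"), chosen.count("Y")
        if xs > 1 or ys > 1:
            return
        if xs == 1 and ys == 1:
            results.add(" ".join(chosen))
        if len(chosen) == L:
            return
        # The next token may skip at most g words after the last chosen one.
        for nxt in range(last + 1, min(n, last + g + 2)):
            gap = nxt - last - 1
            if total_gap + gap <= G:
                extend(chosen + [tokens[nxt]], nxt, total_gap + gap)

    for start in range(n):
        extend([tokens[start]], start, 0)
    return results


def count_patterns(examples, L=5, g=2, G=4):
    """Count, over (snippet, a, b) triples, how many snippets yield each pattern."""
    counts = Counter()
    for snippet, a, b in examples:
        counts.update(subsequences(normalize(snippet, a, b), L, g, G))
    return counts


def frequent_patterns(counts, N):
    """Step 3: keep patterns whose frequency is greater than N."""
    return {p: c for p, c in counts.items() if c > N}
```

For the snippet "Google to acquire YouTube for 1650 million" and the pair (Google, YouTube), the generated set includes the slide's three example patterns, among others.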

IDENTIFYING SEMANTIC RELATIONS • A semantic relation can be expressed using more than one pattern. • If two word-pairs share many related patterns, we can expect a high relational similarity between them. • Cluster lexical patterns using their distributions over word-pairs to identify semantically related patterns.

IDENTIFYING SEMANTIC RELATIONS • p: the word-pair frequency vector of pattern p • each element of p: the frequency with which pattern p occurs with a given word-pair • SORT: sorts the patterns in descending order of their total occurrence over all word-pairs • c: the cluster centroid, i.e. the vector sum of the word-pair frequency vectors of all patterns that belong to the cluster • centroids are updated by vector addition • a similarity threshold controls the clustering
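A minimal sketch of a sequential clustering pass built from the quantities on this slide follows. The slide's algorithm listing did not survive, so several details are assumptions: cosine similarity as the measure between a pattern vector and a centroid, merging into the most similar cluster only when that similarity exceeds the threshold, and the names `sequential_cluster` and `theta`.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def sequential_cluster(pattern_vectors, theta):
    """Cluster patterns in one pass over their word-pair frequency vectors.

    pattern_vectors: dict mapping a pattern string to its frequency vector.
    theta: similarity threshold (assumed: merge only if similarity > theta).
    Returns (members, centroids), two parallel lists.
    """
    # SORT: descending total occurrence over all word-pairs.
    order = sorted(pattern_vectors, key=lambda p: sum(pattern_vectors[p]), reverse=True)
    members, centroids = [], []
    for p in order:
        vec = [float(x) for x in pattern_vectors[p]]
        best, best_sim = -1, -1.0
        for i, c in enumerate(centroids):
            sim = cosine(vec, c)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim > theta:
            # Centroid update by vector addition.
            centroids[best] = [a + b for a, b in zip(centroids[best], vec)]
            members[best].append(p)
        else:
            # No sufficiently similar cluster: start a new one.
            centroids.append(vec)
            members.append([p])
    return members, centroids
```

Because each pattern is compared only against the current centroids, the pass needs no pairwise pattern comparisons, and a higher threshold yields more singleton clusters.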

MEASURING RELATIONAL SIMILARITY • each word-pair is represented by a feature vector • the elements of the feature vector are the total frequencies of the word-pair in each cluster • the relational similarity between two word-pairs is computed from their feature vectors and a correlation matrix

MEASURING RELATIONAL SIMILARITY • the correlation between two clusters is given by the corresponding element of the correlation matrix • the definition of this element involves the union of the two clusters
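The similarity formula itself is missing from the transcript, so the following is only a sketch under a stated assumption: relational similarity is taken to be the bilinear form x^T M y over the two feature vectors, divided by the product of their Euclidean norms, where M is the cluster correlation matrix. The function name and the normalization are hypothetical. The matrix is passed in as given, since its definition is not recoverable here.

```python
import math


def relational_similarity(x, y, corr):
    """Assumed form: (x^T * corr * y) / (|x| * |y|).

    x, y: feature vectors (total frequency of a word-pair in each cluster).
    corr: square matrix of inter-cluster correlations, as nested lists.
    """
    norm_x = math.sqrt(sum(v * v for v in x))
    norm_y = math.sqrt(sum(v * v for v in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    bilinear = sum(
        x[i] * corr[i][j] * y[j]
        for i in range(len(x))
        for j in range(len(y))
    )
    return bilinear / (norm_x * norm_y)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    correlated = [[1.0, 0.5], [0.5, 1.0]]
    # Two word-pairs that fall into different clusters.
    print(relational_similarity([1, 0], [0, 1], identity))
    print(relational_similarity([1, 0], [0, 1], correlated))
```

With the identity matrix this assumed form reduces to cosine similarity over cluster frequencies, so word-pairs with no shared cluster score zero. Off-diagonal entries let related clusters contribute.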

EXPERIMENTS • Dataset • 100 instances (word or named-entity pairs) • five relation types • ACQUIRER-ACQUIREE • PERSON-BIRTHPLACE • CEO-COMPANY • COMPANY-HEADQUARTERS • PERSON-FIELD

EXPERIMENTS • manually select 20 instances for each relation type, from • Wikipedia • online newspapers • company reviews • For each instance, download snippets using the Yahoo BOSS API

EXPERIMENTS - LEXICAL PATTERNS • Lexical patterns • run the pattern extraction algorithm with L = 5, g = 2, and G = 4 • the total number of unique patterns is 473910 • only the 148655 patterns that occur at least twice are selected

EXPERIMENTS - PATTERN CLUSTERS • Ratio: singletons to total number of clusters

EXPERIMENTS - RELATION CLASSIFICATION • We evaluate the proposed relational similarity measure in a relation classification task. • k-nearest neighbor classification • evaluation measures • classification accuracy • average precision • Rel(r): a binary-valued function that returns 1 if the word-pair at rank r has the same relation as the word-pair being classified, and 0 otherwise
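Average precision over a ranked list can be sketched as below. The slide's own formula is missing, so this assumes the standard definition: the sum over ranks r of Pre(r) × Rel(r), divided by the number of relevant word-pairs, where Pre(r) is the precision at rank r. The function name and argument layout are hypothetical.

```python
def average_precision(rel, num_relevant):
    """Average precision for one ranked list of neighbours.

    rel: list of Rel(r) values (1 or 0) for ranks r = 1, 2, ..., in order.
    num_relevant: total number of relevant word-pairs (the denominator).
    """
    if num_relevant == 0:
        return 0.0
    hits = 0
    total = 0.0
    for r, is_relevant in enumerate(rel, start=1):
        if is_relevant:
            hits += 1
            total += hits / r  # Pre(r): precision at rank r
    return total / num_relevant


if __name__ == "__main__":
    # Relevant word-pairs at ranks 1 and 3, two relevant pairs in total.
    print(average_precision([1, 0, 1], 2))
```

In the example, precision is 1 at rank 1 and 2/3 at rank 3, so the score is (1 + 2/3) / 2.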

EXPERIMENTS - RELATION CLASSIFICATION • similarity threshold = 0.955 • 2629 non-singleton clusters • 6930 singletons

EXPERIMENTS - RELATION CLASSIFICATION • the top 10 clusters with the largest number of lexical patterns • the top four patterns that occur with the largest number of word-pairs

RELATIONAL SIMILARITY MEASURES • Relational similarity measures compared • VSM: • each word-pair is represented by a vector of pattern frequencies • the relational similarity between two word-pairs is computed as the cosine similarity between their vectors • LRA: • Latent Relational Analysis • creates a matrix in which the rows represent word-pairs and the columns represent lexical patterns • applies singular value decomposition (SVD) to the matrix

RELATIONAL SIMILARITY MEASURES • IP: • set the correlation matrix in Formula 2 to the identity matrix • compute relational similarity using pattern clusters • CORR: • the proposed relational similarity measure

CONCLUSIONS • We proposed a method to compute the similarity between the implicit semantic relations in two word-pairs. • requires only a few search queries to compute • can quickly compute relational similarity for unseen word-pairs • a general framework: designing relational similarity measures can be modeled as searching for a matrix
