A Random Graph Walk based Approach to Computing Semantic Relatedness Using Knowledge from Wikipedia
Presenter: Ziqi Zhang, OAK Research Group, Department of Computer Science, University of Sheffield
Authors: Ziqi Zhang, Anna Lisa Gentile, Lei Xia, José Iria, Sam Chapman
In this presentation…
• Introduction to semantic relatedness
• Motivation for this research
• Methodology: random walk, Wikipedia, semantic relatedness
• Experiment and evaluation: computing semantic relatedness, semantic relatedness for named entity disambiguation
> Introduction Semantic Relatedness
Semantic relatedness (SR) measures how much words or concepts are related, encompassing all kinds of relations between them
• It captures a broader sense than semantic similarity
• It enables many complex NLP tasks, e.g., sense disambiguation, lexicon construction
[Figure: example pairs of related terms, e.g. "Volcano"–"ashes", "Malta"–"Airline", "COLING"–"LREC"–"ACL", "computational linguistics"–"computer science"]
> Introduction Method and Literature
Typically, lexical resources (e.g., WordNet, Wikipedia) are needed to provide structural and content information about concepts
• Relatedness is computed by aggregating and balancing these "semantic" elements using mathematical formulas
• Some of the best-known works: Resnik (1995), Leacock & Chodorow (1998), Strube & Ponzetto (2006), Zesch et al. (2008), Gabrilovich & Markovitch (2007)
• Recent trend: towards using collaboratively built lexical resources, such as Wikipedia and Wiktionary
> Motivation Another SR measure, why?
Wikipedia contains rich and diverse structural and content information about concepts and entities
• On a Wiki page: infobox, content words, title, links, categories, redirects, lists
• Which of these are useful for SR? Which are more useful than others?
• Can we combine them? How should they be combined? Can we gain more by combining them?
> Motivation The Research
This paper aims to answer these questions by proposing a method that naturally integrates diverse features in a balanced way, and by studying the importance of the different features
> Methodology Overview of the method
[Figure: pipeline for computing relatedness between "NLP" and "Computational Linguistics" — Wiki Page Retrieval → Feature Extraction (feature sets F.1–F.3 and F'.1–F'.3, each feature type carrying its own weight) → Random Walk → relatedness score]
> Methodology Wiki Page Retrieval
Objective: given two words/phrases, find the corresponding information pages in Wikipedia that they refer to
• Problem: ambiguity of the input words (surfaces)
• Solution: collect all pages (sense pages), compute pair-wise relatedness between all senses, and choose the pair with the maximum score
[Figure: candidate senses — "NLP" → Natural Language Processing, National Liberal Party; "Computational Linguistics" → Computational Linguistics (science), Computational Linguistics (journal)]
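A minimal sketch of this sense-selection step (the function and variable names are illustrative assumptions; the `relatedness` scoring function itself comes from the random-walk method described later):

```python
from itertools import product

def select_senses(senses_a, senses_b, relatedness):
    """Pick the pair of candidate senses with the highest relatedness.

    senses_a, senses_b: lists of candidate Wikipedia sense pages for the
    two input surfaces; relatedness: a function scoring one sense pair.
    """
    return max(product(senses_a, senses_b),
               key=lambda pair: relatedness(*pair))

# e.g. select_senses(
#     ["Natural Language Processing", "National Liberal Party"],
#     ["Computational Linguistics (science)", "Computational Linguistics (journal)"],
#     relatedness)
```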
> Methodology Feature Extraction
Objective: identify useful features to represent each sense of a surface for algorithmic consumption
• Page title and redirect target
• Content words from the first section, or the top n frequent words from the entire page
• Page categories (search depth = 2)
• Outgoing link targets in list structures
• Other outgoing link targets
• Descriptive/definitive noun (first noun phrase after "be" in the first sentence)
• All features are formulated at the word level
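As an illustration, the word-level features extracted from one sense page could be held in a structure like the following (field names are hypothetical, not the paper's identifiers):

```python
from dataclasses import dataclass, field

@dataclass
class SenseFeatures:
    """Word-level features extracted from one Wikipedia sense page."""
    title_words: set[str] = field(default_factory=set)       # title + redirect target
    content_words: set[str] = field(default_factory=set)     # first section / top-n frequent words
    category_words: set[str] = field(default_factory=set)    # categories, search depth <= 2
    list_link_words: set[str] = field(default_factory=set)   # outgoing link targets in lists
    other_link_words: set[str] = field(default_factory=set)  # remaining outgoing link targets
    definitive_noun: set[str] = field(default_factory=set)   # first NP after "be" in sentence 1
```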
> Methodology Random Walk – Graph Construction
Objective: plot surfaces, their senses and their features on a single graph, so that senses are connected by shared features
[Figure: graph linking the sense nodes "Natural Language Processing" and "Computational Linguistics (science)" to title nodes (T1, T2), link nodes (L1–L5) and category nodes (C1–C3) via has_title, has_link and has_category edges]
• Intuition: a walker takes n steps; in each step a random route is taken
• Starting from a node, in n steps one can reach only a limited set of other nodes
• The more routes connecting the desired end nodes, and the more likely those routes are to be taken, the more related the two senses are
• Routes are established by feature extraction and graph construction
• "Likelihood" is modelled by the importance of each type of feature, to be studied by experiments
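A minimal sketch of the graph-construction idea (the edge-list representation and the `build_graph` name are illustrative, not the paper's implementation): senses that share a feature word end up connected through the same feature node.

```python
def build_graph(senses):
    """Build a graph linking sense nodes to their feature nodes.

    senses: dict mapping a sense page name to a dict of
    feature type -> set of feature words (e.g. the fields of
    SenseFeatures above). Edges are labelled with the feature type so
    that each type can later receive its own weight in the walk.
    """
    edges = []  # (sense_node, feature_node, feature_type)
    for sense, features in senses.items():
        for ftype, words in features.items():
            for w in words:
                # the feature node (ftype, w) is shared between any
                # senses that exhibit the same feature word
                edges.append((sense, (ftype, w), ftype))
    return edges
```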
> Methodology Random Walk – The Math
Random walk is simulated via matrix calculation and transformation
• An adjacency matrix models the distribution of weights across the different features
• A t-step random walk is achieved by matrix calculation
• The resulting probability is translated into relatedness
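A minimal sketch of the t-step walk via matrix powers, assuming a weighted adjacency matrix has already been built from the graph and every node has at least one outgoing edge (NumPy-based; the symmetrisation at the end is one plausible way to turn probabilities into a relatedness score, not necessarily the paper's exact transformation):

```python
import numpy as np

def t_step_walk(A, t):
    """Simulate a t-step random walk over a weighted adjacency matrix.

    A[i, j] holds the weight of the edge i -> j (weights depend on the
    feature type of the edge). Rows are normalised into transition
    probabilities; the t-th matrix power then gives the probability of
    reaching node j from node i in exactly t steps.
    """
    P = A / A.sum(axis=1, keepdims=True)  # row-normalise to probabilities
    return np.linalg.matrix_power(P, t)

# relatedness between sense nodes i and j can then be read off the
# walk matrix, e.g. symmetrised as (P_t[i, j] + P_t[j, i]) / 2
```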
> Experiment Experiment & Evaluation
The experiments are designed to achieve three objectives
• Analyse the importance of each proposed feature
• Evaluate the effectiveness of the random walk method for computing semantic relatedness
• Evaluate the usefulness of the method for solving other NLP problems – named entity disambiguation (NED)
> Experiment Feature Analysis
The simulated annealing optimisation method (Nie et al., 2005) is used to perform the analysis, in which
• 200 pairs of words from WordSim353 are used
• To begin with, every feature is treated equally by assigning it the same weight (the weight model)
• SR is computed using the weight model and evaluated against the gold standard
• Hundreds of iterations are run; in each turn, a different weight model is generated randomly
• We manually analyse the weight models that contribute to the highest performance on this dataset, eliminating the least important features or merging them into other, semantically similar features
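A generic sketch of such a weight search (a simplified annealing loop, not Nie et al.'s exact schedule; `evaluate` stands in for scoring a weight model against the gold standard):

```python
import random

def search_weights(feature_types, evaluate, iterations=500, temp=1.0, cooling=0.99):
    """Simulated-annealing-style search over feature-weight models.

    evaluate(weights) -> score against the gold standard (higher is better).
    """
    current = {f: 1.0 / len(feature_types) for f in feature_types}  # equal weights to start
    current_score = evaluate(current)
    best, best_score = dict(current), current_score
    for _ in range(iterations):
        # perturb the current weight model and renormalise
        cand = {f: max(1e-6, w + random.gauss(0, temp)) for f, w in current.items()}
        total = sum(cand.values())
        cand = {f: w / total for f, w in cand.items()}
        score = evaluate(cand)
        # accept improvements always; accept worse models with a
        # probability that shrinks as the temperature cools
        if score > current_score or random.random() < temp:
            current, current_score = cand, score
        if score > best_score:
            best, best_score = dict(cand), score
        temp *= cooling
    return best, best_score
```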
> Experiment Feature Analysis – Findings
Achieved a best score of 0.45 on the data, compared to the best in the literature of 0.5 by Zesch et al. (2008)
> Experiment Feature Analysis - findings This setting is then used for further evaluation
> Experiment Evaluating Computation of SR
Three datasets are chosen: a different set of 153 pairs of words from WordSim353; 65 pairs from Rubenstein & Goodenough (1965), RG65; 30 pairs from Miller & Charles (1991), MC30
Compared against: a collection of WordNet-based algorithms and other state-of-the-art methods for SR
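Evaluation on these word-pair datasets is conventionally done by correlating computed scores with the human judgements; a minimal sketch assuming Spearman rank correlation as the metric (the paper may report a different statistic):

```python
from scipy.stats import spearmanr

def evaluate_dataset(pairs, gold_scores, relatedness):
    """Correlate computed relatedness with human gold-standard judgements."""
    computed = [relatedness(a, b) for a, b in pairs]
    rho, _ = spearmanr(computed, gold_scores)
    return rho
```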
> Experiment Evaluating Usefulness of SR for NED
The NED method in a nutshell (details: Gentile et al., 2009)
• Identify surfaces of NEs that occur in a text passage and that are defined by Wikipedia; retrieve the corresponding sense pages
• Compute the SR of each pair of their underlying senses
• The sense of a surface is determined collectively by the senses of the other surfaces found in the text (the context)
• Three functions are defined to capture this collective context
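A minimal sketch of the collective idea, using a single summed-relatedness context function as a stand-in (the paper defines three such functions; names are illustrative):

```python
def disambiguate(candidates, relatedness):
    """Collectively pick one sense per surface.

    candidates: dict mapping a surface to its list of candidate senses.
    Each sense is scored by its summed relatedness to the candidate
    senses of all other surfaces in the passage.
    """
    chosen = {}
    for surface, senses in candidates.items():
        context = [s for other, ss in candidates.items()
                   if other != surface for s in ss]
        chosen[surface] = max(
            senses,
            key=lambda s: sum(relatedness(s, c) for c in context),
        )
    return chosen
```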
> Experiment Evaluating Usefulness of SR for NED
Dataset: 20 news stories by Cucerzan (2007), each containing 10–50 NEs
Conclusion
Computing SR isn't an easy task
• The different kinds of structural and content information in Wikipedia all contribute to the task, but with different weights
• Combining these different features in a uniform measure can improve performance
In future
• Can we use simpler similarity functions to obtain the same results?
• Can we integrate different lexical resources?
• How do we compute the relatedness/similarity of longer text passages?
Thank you!
References (complete list can be found in the paper)
• Cucerzan, S. (2007). Large-Scale Named Entity Disambiguation Based on Wikipedia Data. In EMNLP'07
• Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E., Solan, Z., Wolfman, G., and Ruppin, E. (2002). Placing search in context: the concept revisited. In ACM Transactions on Information Systems, 20(1), pp. 116–131
• Gabrilovich, E., Markovitch, S. (2007). Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In Proceedings of IJCAI'07, pp. 1606–1611
• Gentile, A., Zhang, Z., Xia, L., Iria, J. (2009). Graph-based semantic relatedness for named entity disambiguation. In S3T
• Leacock, C., Chodorow, M. (1998). Combining local context and WordNet similarity for word sense identification. In C. Fellbaum (Ed.), WordNet: An Electronic Lexical Database, Chapter 11, pp. 265–283
• Miller, G., Charles, W. (1991). Contextual correlates of semantic similarity. In Language and Cognitive Processes, 6(1): 1–28
• Nie, Z., Zhang, Y., Wen, J., Ma, W. (2005). Object-level ranking: bringing order to web objects. In Proceedings of WWW'05
• Resnik, P. (1995). Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of IJCAI-95, pp. 448–453
• Rubenstein, H., Goodenough, J. (1965). Contextual correlates of synonymy. In Communications of the ACM, 8(10): 627–633
• Strube, M., Ponzetto, S. (2006). WikiRelate! Computing semantic relatedness using Wikipedia. In AAAI'06
• Zesch, T., Müller, C., Gurevych, I. (2008). Using Wiktionary for computing semantic relatedness. In Proceedings of AAAI'08