Overview of Peter D. Turney’s Work on Similarity From 2001-2008
Similarity • Attributional similarity (2001 - 2003) • the degree to which two words are synonymous • also known as semantic relatedness and semantic association • Relational similarity (2005 - 2008) • the degree to which two relations are analogous
Objective evaluation of the approaches • Attributional similarity • 80 TOEFL synonym questions • Relational similarity • 374 SAT analogy questions
2001: Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In Proceedings of the 12th European Conference on Machine Learning, pages 491–502, Springer, Berlin, 2001.
1 Introduction • Synonym recognition: • given a problem word and a set of candidate words, choose the candidate whose meaning is closest to the given word • Core idea: based on co-occurrence • “a word is characterized by the company it keeps”
1 Introduction: idea • Given a problem word and a set of candidates {choice1, choice2, …, choicen} • compute score(choicei) for each candidate; the highest-scoring candidate is taken as the synonym • uses Pointwise Mutual Information (PMI) • to analyze statistical data collected by Information Retrieval (IR)
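For reference, pointwise mutual information between two words w1 and w2 is defined as PMI(w1, w2) = log( p(w1, w2) / ( p(w1) p(w2) ) ); PMI-IR estimates these probabilities from search-engine hit counts.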
2 Formula • Score 1 • Score 2: the NEAR operator matches words within ten words of each other
2 Formula • Score 3: avoids antonyms such as big vs. small • Score 4: brings in the surrounding context • choice of context word: only one is used (to keep the sample count high)
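A minimal Python sketch of the first two scores, assuming a hypothetical hits() helper that returns a search engine's hit count for a query string; as I recall the paper, Score 1 uses the engine's AND operator and Score 2 the NEAR operator, while Score 3 and Score 4 further constrain the queries with antonym filtering and a context word.

def score1(problem, choice, hits):
    # Score 1: co-occurrence anywhere in the same document (AND operator)
    return hits(f"{problem} AND {choice}") / hits(choice)

def score2(problem, choice, hits):
    # Score 2: co-occurrence within ten words (NEAR operator)
    return hits(f"{problem} NEAR {choice}") / hits(choice)

def best_choice(problem, choices, hits, score=score2):
    # the highest-scoring candidate is taken as the synonym
    return max(choices, key=lambda c: score(problem, c, hits))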
3 Experiments • Compare with • LSA: Latent Semantic Analysis • initial matrix X built from an encyclopedia: 61,000 × 30,473 • document chunks: whole documents • dimensionality reduction: SVD • Element: tf-idf weight • Similarity: cosine • students’ TOEFL scores (human baseline)
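For comparison, a rough LSA-style sketch with scikit-learn; the toy corpus and the dimension cap are placeholders, while the paper built its matrix from an encyclopedia and compressed it to 300 dimensions.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [  # placeholder corpus; the paper used encyclopedia text
    "the levied tax was imposed on all imports",
    "the imposed duty raised prices sharply",
    "birds sing in the quiet garden",
]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)          # documents x terms, tf-idf weighted
k = min(300, X.shape[1] - 1)                # the paper kept 300 dimensions
svd = TruncatedSVD(n_components=k)          # SVD-based compression
svd.fit(X)
term_vectors = svd.components_.T            # one latent vector per term

def word_similarity(w1, w2):
    # cosine between the two words' latent vectors
    i, j = vectorizer.vocabulary_[w1], vectorizer.vocabulary_[w2]
    return cosine_similarity(term_vectors[[i]], term_vectors[[j]])[0, 0]

print(word_similarity("levied", "imposed"))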
Dataset: • 80 TOEFL questions • 50 ESL questions
3 Experiments: PMI-IR vs. LSA • Time efficiency • PMI-IR: simple program, fast • 2 s/query × 8 queries, almost all of it spent on network interaction • run in parallel: 2 s • LSA: time-consuming • compressing 61,000 × 30,473 down to 61,000 × 300 takes about three hours on a UNIX workstation
3 Experiments • 80 TOEFL questions, 50 ESL questions • PMI-IR: 73.75% (59/80), 74% (37/50) • foreign students: 64.5% (51.6/80) • LSA: 64.4% (51.5/80) • Performance: PMI-IR wins by about 10% • Reasons • the use of NEAR, smaller chunk size • LSA 64.4% • PMI-IR with AND 62.5% • PMI-IR with NEAR 72.5%
4 Conclusion • combines PMI and IR • uses co-occurrence to measure how strongly words are related • PMI • estimated by sending queries to a search engine • which alleviates the data sparseness problem
2003: Combining independent modules in lexical multiple-choice problems. In RANLP-03, pages 482–489, Borovets, Bulgaria (RANLP: Recent Advances in Natural Language Processing).
1 Introduction • There are several approaches to natural language problems • No single approach is best for all problem instances • Why not combine them?
1 Introduction • Two main contributions • introduces and evaluates several new modules • for answering multiple-choice synonym questions and analogy questions • three merging rules • presents a novel product rule • compares it with two other, similar merging rules
2 Merging rules: the parameters • The rules are parameterized by a weight vector w • p_ihj >= 0 is the probability assigned by the i-th module (1 <= i <= n) to the j-th choice (1 <= j <= k) of the h-th instance (1 <= h <= m) • D^w_h,j is the probability assigned by the merging rule to choice j of training instance h when the weights are set to w • a(h), with 1 <= a(h) <= k, is the correct answer for instance h
2 Merging rules: existing • mixture rule: very common; the weighted sum is normalized over the choices • logarithmic rule
2 Merging rules: novel • product rule
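A sketch of the three rules as I read them; p[i][j] is module i's probability for choice j, w[i] its weight, and the exact forms (especially the uniform blend in the product rule) are my paraphrase rather than a verbatim reproduction of the paper.

import math

def normalize(scores):
    total = sum(scores)
    return [s / total for s in scores]

def mixture_rule(p, w):
    # weighted arithmetic mean of the modules' probabilities for each choice
    k = len(p[0])
    return normalize([sum(w[i] * p[i][j] for i in range(len(p))) for j in range(k)])

def logarithmic_rule(p, w):
    # weighted geometric mean: exp(sum_i w_i * ln p_ihj); requires p > 0
    k = len(p[0])
    return normalize([math.exp(sum(w[i] * math.log(p[i][j]) for i in range(len(p))))
                      for j in range(k)])

def product_rule(p, w):
    # each module is blended with the uniform distribution (weight w_i on the
    # module, 1 - w_i on uniform), then the blends are multiplied
    k = len(p[0])
    return normalize([math.prod(w[i] * p[i][j] + (1 - w[i]) / k for i in range(len(p)))
                      for j in range(k)])

For example, with p = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]] and w = [1.0, 0.5], all three rules favour the first choice.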
3 Synonym: dataset • a collection of 431 4-choice synonym questions • randomly divided into 331 training questions and 100 testing questions • w is optimized on the training set
3 Synonym: Modules • LSA • PMI-IR • Thesaurus • queries Wordsmyth (www.wordsmyth.net) • creates synonym lists for both the stem and the choices • scores each choice by the overlap of the lists • Connector • uses summary pages returned by querying Google with a pair of words • weighted sum of • the number of times the words appear separated by one of the symbols [, ”, :, ,, =, /, ( or by one of the words means, defined, equals, synonym, whitespace, and • the number of times “dictionary” or “thesaurus” appears
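A toy sketch of the Thesaurus module's overlap score, with get_synonyms() standing in for the Wordsmyth lookup (a hypothetical helper, not part of the paper).

def thesaurus_score(stem, choice, get_synonyms):
    # get_synonyms(word) is a hypothetical lookup returning a set of synonyms;
    # the choice is scored by how much its list overlaps with the stem's list
    return len(get_synonyms(stem) & get_synonyms(choice))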
3 Synonym: combine results • 3 rules’ accuracies are nearly identical • the product and logarithmic rules assign higher probabilities to correct answers • as evidenced by the mean likelihood.
4 Analogies: dataset • 374 5-choice instances • randomly split into 274 training instances and 100 testing instances • e.g. cat:meow :: (a) mouse:scamper, (b) bird:peck, (c) dog:bark, (d) horse:groom, (e) lion:scratch
4 Analogies: modules • Phrase vectors • create a vector r to represent the relationship between X and Y • phrases built from 128 patterns • e.g. “X for Y”, “Y with X”, “X in the Y”, “Y on X” • query each phrase and record the number of hits • measure similarity by cosine • Thesaurus paths (WordNet) • degree of similarity between the paths connecting the two words
4 Analogies: modules (continued) • Lexical relation modules • a set of more specific modules using WordNet • 9 modules, each checking one relationship • Synonym, Antonym, Hypernym, Hyponym, Meronym:substance, Meronym:part, Meronym:member, Holonym:substance, Holonym:member • check the stem pair first, then the choices • Similarity modules • make use of dictionary definitions • Similarity:dict uses dictionary.com and Similarity:wordsmyth uses wordsmyth.net • Given A:B::C:D, similarity = sim(A, C) + sim(B, D)
5 Conclusion • applied the three trained merging rules to the TOEFL questions • Accuracy: 97.5% • provided the first results on a challenging analogy task, using a set of novel modules that draw on both lexical databases and statistical information • Accuracy: 45% • the popular mixture rule was consistently weaker than the logarithmic and product rules at assigning high probabilities to correct answers
2005: Corpus-based Learning of Analogies and Semantic Relations. In IJCAI 2005, Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence, Edinburgh, Scotland, UK, July 30 - August 5, 2005.
1 Introduction • Verbal analogy: VSM • A:B :: C:D • The novelty of the paper is the application of the VSM to measuring the similarity between relationships • Noun-modifier relations: supervised nearest-neighbour algorithm • Dataset: Nastase and Szpakowicz (2003), 600 noun-modifier pairs
1 Introduction: examples • Analogy • Noun-modifier relation • laser printer • Relation: instrument
2 Solving Analogy Problems • assign scores to candidate analogies A:B::C:D • for multiple-choice questions, guess the highest-scoring choice • Sim(R1, R2) • the difficulty is that R1 and R2 are implicit • attempt to learn R1 and R2 by unsupervised learning from a very large corpus
2 Solving Analogy Problems: Vector Space Model • create vectors, r1 and r2, that represent features of R1 and R2 • measure the similarity of R1 and R2 by the cosine of the angle θ between r1 and r2
2 Solving Analogy Problems: schematic view • generate a vector for each word pair A:B • joining terms: “X for Y”, “Y with X”, “X in the Y”, “Y on X” • 64 joining terms, each used in both orders, give 128 search phrases per pair • query each phrase, record the number of hits, and take the log • vector = [ log(hit1), log(hit2), …, log(hit128) ]
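A sketch of the vector construction, again with hits() as a hypothetical search-engine query helper and PATTERNS standing in for the joining terms (only four of the 128 phrase templates are shown).

import math

# hypothetical phrase templates; the paper instantiates 64 joining terms
# in both orders, giving 128 phrases per word pair
PATTERNS = ["{X} for {Y}", "{Y} with {X}", "{X} in the {Y}", "{Y} on {X}"]

def pair_vector(X, Y, hits):
    # hits(phrase) returns the search-engine hit count for the phrase;
    # the +1 avoids log(0) and is a sketch-level tweak, not from the paper
    return [math.log(hits(p.format(X=X, Y=Y)) + 1) for p in PATTERNS]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def analogy_score(A, B, C, D, hits):
    # similarity of relation A:B to relation C:D is the cosine of their vectors
    return cosine(pair_vector(A, B, hits), pair_vector(C, D, hits))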
3 Noun-Modifier Semantic Relations • First attempt to classify semantic relations without a lexicon.
3 Noun-Modifier Semantic Relations: algorithm • nearest-neighbour supervised learning • nearest neighbour measured by cosine • cosine(training pair vector, testing pair vector) • vectors of 128 elements, using the same joining terms as before
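A minimal sketch of the nearest-neighbour step, reusing cosine from the sketch above; training_data is a hypothetical list of (vector, relation label) pairs built from the labelled noun-modifier pairs.

def classify(test_vector, training_data):
    # 1-nearest neighbour: the test pair inherits the relation label of the
    # training pair whose vector has the highest cosine with it
    best_label, best_sim = None, -1.0
    for vector, label in training_data:
        sim = cosine(test_vector, vector)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label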
3 Noun-Modifier Semantic Relations: Experiment for the 30 Classes
30 Semantic Relations • F when precision and recall are balanced • 26.5% • F for random guessing • 3.3% • much better than random guessing • but still much room for improvement • 30 classes is hard • too many possibilities for confusing classes • try 5 classes instead • group classes together
5 Semantic Relations • F when precision and recall are balanced • 43.2% • F for random guessing • 20.0% • better than random guessing • better than 30 classes • 26.5% • but still room for improvement
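The balanced F reported above is the harmonic mean of precision and recall: F = 2 * precision * recall / (precision + recall).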
Execution Time • experiments presented here required 76,800 queries to AltaVista • 600 word pairs • × 128 queries per word pair • = 76,800 queries • as courtesy to AltaVista, inserted a five second delay between each query • processing 76,800 queries took about five days
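The courtesy delay alone accounts for most of that time: 76,800 queries × 5 s = 384,000 s, roughly 4.4 days, which is consistent with the reported five days once the query and processing time are added.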
Conclusion • The cosine metric in the VSM is used to • solve analogy problems • classify semantic relations • It performs much better than random guessing, but below human levels.
2006a: Similarity of Semantic Relations. Computational Linguistics, 32(3):379–416.
1 Introduction • Latent Relational Analysis (LRA) • LRA extends the VSM approach of Turney and Littman (2005) in three ways: • The connecting patterns are derived automatically from the corpus, instead of using a fixed set of patterns. • Singular Value Decomposition (SVD) is used to smooth the frequency data. • automatically generated synonyms are used to explore variations of the word pairs.
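A rough NumPy sketch of the SVD smoothing step, assuming a pair-by-pattern frequency matrix M has already been built (pattern discovery and the synonym-based variations are not shown here).

import numpy as np

def smooth(M, k):
    # truncated SVD: keep the top-k singular values and reconstruct,
    # which smooths the sparse frequency matrix M
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]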