Lecture 24 Distributional based Similarity II. CSCE 771 Natural Language Processing. Topics: Distributional based word similarity. Readings: NLTK book Chapter 2 (wordnet), Text Chapter 20. April 10, 2013. Overview. Last Time (Programming): Examples of thesaurus based word similarity
Lecture 24 Distributional based Similarity II • CSCE 771 Natural Language Processing • Topics • Distributional based word similarity • Readings: • NLTK book Chapter 2 (wordnet) • Text Chapter 20 • April 10, 2013
Overview • Last Time (Programming) • Examples of thesaurus based word similarity • path-similarity – memory fault; sim-path(c1, c2) = -log pathlen(c1, c2); Resnik, Lin • extended Lesk – glosses of words need to include hypernyms • Today • Distributional methods • Readings: • Text 19, 20 • NLTK Book: Chapter 10 • Next Time: Distributional based Similarity II
Figure 20.8 Summary of Thesaurus Similarity measures • Elderly moment IS-A memory fault IS-A mistake • sim-path correct in table
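The sim-path measure from last time can be checked by hand on the IS-A chain above. The following is a minimal sketch, not the textbook's or NLTK's implementation: the `IS_A` dictionary, the extra sibling node "typo", and the helper names are hypothetical, and it assumes pathlen counts the edges on the shortest path plus one, so that a concept compared with itself gets pathlen 1 and similarity 0.

```python
import math
from collections import deque

# Hypothetical fragment of an IS-A hierarchy (child -> parent).
# The first two links come from the slide; "typo" is made up for contrast.
IS_A = {
    "elderly moment": "memory fault",
    "memory fault": "mistake",
    "typo": "mistake",
}

def neighbors(node):
    """Return the concepts exactly one IS-A edge away from node."""
    linked = {child for child, parent in IS_A.items() if parent == node}
    if node in IS_A:
        linked.add(IS_A[node])
    return linked

def pathlen(c1, c2):
    """Assumed convention: 1 + number of edges on the shortest path."""
    queue = deque([(c1, 0)])
    seen = {c1}
    while queue:
        node, edges = queue.popleft()
        if node == c2:
            return 1 + edges
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, edges + 1))
    raise ValueError(f"no path between {c1!r} and {c2!r}")

def sim_path(c1, c2):
    """sim-path(c1, c2) = -log pathlen(c1, c2)."""
    return -math.log(pathlen(c1, c2))

print(sim_path("elderly moment", "mistake"))  # -log(3), about -1.10
```

Under this convention, closer concepts get values nearer to 0 and more distant ones get more negative values.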
Example computing PPMI • We need counts, so let's make some up • The table needs to be edited to contain counts
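A possible way to work the PPMI example with made-up counts is sketched below. The counts are invented for illustration and are not taken from the slide's table; the function name `ppmi_table` is also an assumption. PPMI here is PMI with negative values (and zero counts) replaced by 0.

```python
import math

# Made-up word-by-context co-occurrence counts (illustrative only).
counts = {
    "apricot":     {"computer": 0, "data": 0, "pinch": 1, "result": 0, "sugar": 1},
    "pineapple":   {"computer": 0, "data": 0, "pinch": 1, "result": 0, "sugar": 1},
    "digital":     {"computer": 2, "data": 1, "pinch": 0, "result": 1, "sugar": 0},
    "information": {"computer": 1, "data": 6, "pinch": 0, "result": 4, "sugar": 0},
}

def ppmi_table(counts):
    """Return PPMI(w, c) = max(0, log2(P(w,c) / (P(w) P(c)))) for every cell."""
    total = sum(sum(row.values()) for row in counts.values())
    word_totals = {w: sum(row.values()) for w, row in counts.items()}
    ctx_totals = {}
    for row in counts.values():
        for c, n in row.items():
            ctx_totals[c] = ctx_totals.get(c, 0) + n

    table = {}
    for w, row in counts.items():
        table[w] = {}
        for c, n in row.items():
            if n == 0:
                table[w][c] = 0.0  # log of zero is undefined; PPMI uses 0
                continue
            # P(w,c) / (P(w) P(c)) simplifies to n * total / (row sum * column sum)
            pmi = math.log2(n * total / (word_totals[w] * ctx_totals[c]))
            table[w][c] = max(pmi, 0.0)
    return table

ppmi = ppmi_table(counts)
print(round(ppmi["information"]["data"], 3))
```

With these counts, the total is 19, P(information, data) = 6/19, P(information) = 11/19, and P(data) = 7/19, so the cell is log2(114/77).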
Associations • PMI-assoc • assoc_PMI(w, f) = log2( P(w, f) / (P(w) P(f)) ) • Lin-assoc – f is composed of r (relation) and w’ • assoc_Lin(w, f) = log2( P(w, f) / (P(w) P(r|w) P(w’|w)) ) • t-test-assoc (20.41)
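The association measures can be written as small functions of probabilities that have already been estimated. This is a sketch under stated assumptions: the function and parameter names are hypothetical, the Lin denominator is taken to be P(w) P(r|w) P(w’|w), and the t-test form (P(w,f) - P(w)P(f)) / sqrt(P(w)P(f)) is assumed, since the slide only cites the equation number.

```python
import math

def assoc_pmi(p_wf, p_w, p_f):
    """PMI association: log2 of observed joint over expected-if-independent."""
    return math.log2(p_wf / (p_w * p_f))

def assoc_lin(p_wf, p_w, p_r_given_w, p_w2_given_w):
    """Lin association: the feature f is split into relation r and word w'.

    Assumed denominator: P(w) * P(r|w) * P(w'|w).
    """
    return math.log2(p_wf / (p_w * p_r_given_w * p_w2_given_w))

def assoc_ttest(p_wf, p_w, p_f):
    """t-test association (assumed form): difference between observed and
    expected probability, scaled by the square root of the expected value."""
    expected = p_w * p_f
    return (p_wf - expected) / math.sqrt(expected)

# Toy probabilities, chosen only so the arithmetic is easy to follow.
print(assoc_pmi(0.25, 0.5, 0.25))       # log2(0.25 / 0.125) = 1.0
print(assoc_lin(0.25, 0.5, 0.5, 0.5))   # log2(0.25 / 0.125) = 1.0
print(assoc_ttest(0.25, 0.5, 0.25))
```

All three compare the observed P(w, f) with what independence would predict; they differ in how the independence estimate is built and in whether the comparison is a log ratio or a scaled difference.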
Figure 20.10 Co-occurrence vectors • Dependency based parser – special case of shallow parsing • identify from “I discovered dried tangerines.” (20.32) • discover (subject I) • I (subject-of discover) • tangerine (obj-of discover) • tangerine (adj-mod dried)
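The dependency features listed above can be collected into sparse co-occurrence vectors. The sketch below assumes the parse is already available as hand-written (head, relation, dependent) triples; no parser is called, and the helper name and the "-of" naming for inverse relations are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hand-written dependency triples for "I discovered dried tangerines."
# (head, relation, dependent); a real system would get these from a parser.
triples = [
    ("discover", "subject", "I"),
    ("discover", "obj", "tangerine"),
    ("tangerine", "adj-mod", "dried"),
]

def cooccurrence_vectors(triples):
    """Map each word to a Counter over (relation, other word) features."""
    vectors = defaultdict(Counter)
    for head, rel, dep in triples:
        vectors[head][(rel, dep)] += 1          # e.g. discover (subject I)
        vectors[dep][(rel + "-of", head)] += 1  # e.g. I (subject-of discover)
    return vectors

vectors = cooccurrence_vectors(triples)
for word, features in vectors.items():
    print(word, dict(features))
```

Each word's vector is indexed by relation-word pairs rather than by plain neighboring words, so running this over a large parsed corpus would give the counts that the association measures are computed from.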
vectors review • dot-product • length • sim-cosine
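As a quick refresher, the three operations can be sketched in plain Python. The function names and the example vectors are made up for illustration; a library such as NumPy would normally be used for real data.

```python
import math

def dot(v, w):
    """Dot product: sum of element-wise products."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(v, w))

def length(v):
    """Euclidean length: square root of the dot product of v with itself."""
    return math.sqrt(dot(v, v))

def sim_cosine(v, w):
    """Cosine similarity: dot product normalized by both vector lengths."""
    denom = length(v) * length(w)
    if denom == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot(v, w) / denom

v = [1, 0, 1]
w = [1, 1, 0]
print(dot(v, w))         # 1
print(sim_cosine(v, w))  # approximately 0.5
```

Because the dot product is divided by both lengths, the cosine does not grow just because a word is frequent and has large counts in every dimension.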
How to do this in NLTK: http://www.cs.ucf.edu/courses/cap5636/fall2011/nltk.pdf • NLTK 3.0a1 released: February 2013 • This version adds support for NLTK’s graphical user interfaces. http://nltk.org/nltk3-alpha/ • Which similarity function in nltk.corpus.wordnet is appropriate for finding the similarity of two words? • I want to use a function for word clustering and the Yarowsky algorithm to find similar collocations in a large text. • http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Linguistics • http://en.wikipedia.org/wiki/Portal:Linguistics • http://en.wikipedia.org/wiki/Yarowsky_algorithm • http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html