
Lecture 24 Distributional based Similarity II

Lecture 24 Distributional based Similarity II. CSCE 771 Natural Language Processing. Topics: Distributional based word similarity. Readings: NLTK book Chapter 2 (WordNet); Text Chapter 20. April 10, 2013. Overview. Last Time (Programming): Examples of thesaurus based word similarity

oliver-gill

Presentation Transcript


  1. Lecture 24 Distributional based Similarity II CSCE 771 Natural Language Processing • Topics • Distributional based word similarity • Readings: • NLTK book Chapter 2 (WordNet) • Text Chapter 20 • April 10, 2013

  2. Overview • Last Time (Programming) • Examples of thesaurus based word similarity • path-similarity – memory fault; sim-path(c1,c2) = -log pathlen(c1,c2); Resnik, Lin • extended Lesk – glosses of words need to include hypernyms • Today • Distributional methods • Readings: • Text 19,20 • NLTK Book: Chapter 10 • Next Time: Distributional based Similarity II

  3. Figure 20.8 Summary of Thesaurus Similarity measures • Elderly moment IS-A memory fault IS-A mistake • sim-path correct in table
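The sim-path measure on the slide can be sketched on a toy IS-A hierarchy. Only the chain "elderly moment IS-A memory fault IS-A mistake" comes from the slide; the nodes `blunder` and `error` are made up for illustration, and taking pathlen as 1 + the number of edges (so that identical concepts give -log 1 = 0) is an assumed convention, not necessarily the one used in the table.

```python
import math
from collections import deque

# Hypothetical IS-A edges (child -> parent). Only the first two entries come
# from the slide; "blunder" and "error" are invented for illustration.
IS_A = {
    "elderly moment": "memory fault",
    "memory fault": "mistake",
    "blunder": "mistake",
    "mistake": "error",
}

def neighbors(node):
    """Return the nodes one IS-A edge away (the parent and any children)."""
    out = set()
    if node in IS_A:
        out.add(IS_A[node])
    out.update(child for child, parent in IS_A.items() if parent == node)
    return out

def pathlen(c1, c2):
    """1 + number of edges on the shortest path (assumed convention)."""
    seen = {c1}
    queue = deque([(c1, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == c2:
            return dist + 1
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("no path between %r and %r" % (c1, c2))

def sim_path(c1, c2):
    """sim-path(c1, c2) = -log pathlen(c1, c2)."""
    return -math.log(pathlen(c1, c2))

print(sim_path("elderly moment", "memory fault"))
print(sim_path("elderly moment", "mistake"))
```

Closer concepts get a higher (less negative) score, so the first value printed is larger than the second.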

  4. Example computing PPMI • Need counts, so let's make up some • we need to edit this table to have counts
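In the spirit of "making up some counts", here is a minimal PPMI sketch. The words, contexts, and counts below are invented and are not the table from the slide. PPMI is taken as max(PMI, 0), with zero counts mapped to 0.

```python
import math

# Made-up co-occurrence counts: counts[word][context] (hypothetical data).
counts = {
    "apricot": {"sugar": 3, "data": 1},
    "digital": {"sugar": 1, "data": 3},
}

def ppmi(counts):
    """Return {(word, context): PPMI} computed from raw co-occurrence counts."""
    total = sum(sum(row.values()) for row in counts.values())
    word_tot = {w: sum(row.values()) for w, row in counts.items()}
    ctx_tot = {}
    for row in counts.values():
        for c, n in row.items():
            ctx_tot[c] = ctx_tot.get(c, 0) + n
    table = {}
    for w, row in counts.items():
        for c, n in row.items():
            if n == 0:
                # log2(0) is undefined; PPMI treats such cells as 0.
                table[(w, c)] = 0.0
                continue
            p_wc = n / total
            p_w = word_tot[w] / total
            p_c = ctx_tot[c] / total
            table[(w, c)] = max(math.log2(p_wc / (p_w * p_c)), 0.0)
    return table

for pair, value in sorted(ppmi(counts).items()):
    print(pair, round(value, 3))
```

With these counts, P(apricot, sugar) = 3/8 and P(apricot) P(sugar) = 1/4, so the PMI is log2(1.5); the pair (apricot, data) has a negative PMI and is clipped to 0.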

  5. Associations • PMI-assoc • assocPMI(w, f) = log2 [ P(w,f) / (P(w) P(f)) ] • Lin-assoc – f composed of r (relation) and w’ • assocLin(w, f) = log2 [ P(w,f) / (P(w) P(r|w) P(w’|w)) ] • t-test-assoc (20.41)
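A small sketch comparing the PMI association with the t-test association, which the slide names without giving a formula. The t-test form used here, (P(w,f) - P(w)P(f)) / sqrt(P(w)P(f)), is the commonly cited version and is assumed to match Eq. 20.41; the probabilities are made-up inputs.

```python
import math

def assoc_pmi(p_wf, p_w, p_f):
    """PMI association: log2 of P(w,f) / (P(w) P(f))."""
    return math.log2(p_wf / (p_w * p_f))

def assoc_ttest(p_wf, p_w, p_f):
    """t-test association (assumed form): how far P(w,f) is from independence."""
    return (p_wf - p_w * p_f) / math.sqrt(p_w * p_f)

# Hypothetical probabilities for a word w and a feature f.
p_wf, p_w, p_f = 0.375, 0.5, 0.5
print(assoc_pmi(p_wf, p_w, p_f))
print(assoc_ttest(p_wf, p_w, p_f))
```

Both measures are 0 when w and f are independent, that is, when P(w,f) = P(w) P(f).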

  6. Figure 20.10 Co-occurrence vectors • Dependency based parser – special case of shallow parsing • identify from “I discovered dried tangerines.” (20.32) • discover(subject I) I(subject-of discover) • tangerine(obj-of discover) tangerine(adj-mod dried)
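The relation features listed above can be sketched as follows. The triples are written by hand for the example sentence and are not the output of a real parser. Adding an inverse "-of" feature for every dependent is an assumption generalized from the slide's subject / subject-of pair.

```python
from collections import Counter, defaultdict

# Hand-written (head, relation, dependent) triples for
# "I discovered dried tangerines." (not produced by a parser).
triples = [
    ("discover", "subject", "I"),
    ("discover", "obj", "tangerine"),
    ("tangerine", "adj-mod", "dried"),
]

def cooccurrence_vectors(triples):
    """Map each word to a Counter over (relation, other word) features."""
    vectors = defaultdict(Counter)
    for head, rel, dep in triples:
        vectors[head][(rel, dep)] += 1          # e.g. discover(subject I)
        vectors[dep][(rel + "-of", head)] += 1  # e.g. I(subject-of discover)
    return vectors

vectors = cooccurrence_vectors(triples)
for word, feats in vectors.items():
    print(word, dict(feats))
```

Over a large parsed corpus, the counters for each word become the sparse co-occurrence vectors that the association and similarity measures operate on.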

  7. Figure 20.11 Objects of the verb drink Hindle 1990

  8. vectors review • dot-product • length • sim-cosine
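The three operations in this review can be written in a few lines. This is a plain-Python sketch over equal-length lists of numbers; the example vectors are arbitrary.

```python
import math

def dot(v, w):
    """Dot product: sum of element-wise products."""
    return sum(a * b for a, b in zip(v, w))

def length(v):
    """Euclidean length: square root of the dot product of v with itself."""
    return math.sqrt(dot(v, v))

def sim_cosine(v, w):
    """Cosine similarity: dot product normalized by both lengths."""
    return dot(v, w) / (length(v) * length(w))

print(dot([1, 2], [3, 4]))
print(length([3, 4]))
print(sim_cosine([1, 2], [2, 4]))
```

Cosine depends only on direction, so `[1, 2]` and `[2, 4]` come out as maximally similar even though their lengths differ.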

  9. Figure 20.12 Similarity of Vectors

  10. Fig 20.13 Vector Similarity Summary

  11. Figure 20.14 Hand-built patterns for hypernyms Hearst 1992
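One of the best-known hand-built patterns is "NP such as NP, NP, (and|or) NP". The sketch below approximates it with a regular expression over single-word nouns. That is a strong simplification, since a real system would match noun-phrase chunks, and the example sentence is invented.

```python
import re

# A noun, then "such as", then a list of nouns separated by commas/and/or.
SUCH_AS = re.compile(
    r"(\w+),? such as ((?:\w+(?:, (?:and |or )?| and | or ))*\w+)"
)
SEPARATOR = re.compile(r", (?:and |or )?| and | or ")

def hearst_such_as(text):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group(1)
        for hyponym in SEPARATOR.split(match.group(2)):
            pairs.append((hyponym, hypernym))
    return pairs

print(hearst_such_as("She grows fruits such as apples, pears, and plums."))
```

Because single words stand in for noun phrases, a continuation such as "fruits such as apples, which ..." would be over-matched; the sketch only shows the shape of the pattern.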

  12. Figure 20.15

  13. Figure 20.16

  14. http://www.cs.ucf.edu/courses/cap5636/fall2011/nltk.pdf How to do this in NLTK • NLTK 3.0a1 released: February 2013 • This version adds support for NLTK’s graphical user interfaces. http://nltk.org/nltk3-alpha/ • Which similarity function in nltk.corpus.wordnet is appropriate for finding the similarity of two words? • I want to use a function for word clustering and the Yarowsky algorithm to find similar collocations in a large text. • http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Linguistics • http://en.wikipedia.org/wiki/Portal:Linguistics • http://en.wikipedia.org/wiki/Yarowsky_algorithm • http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html
