


1. Chapter 8: Lexical Acquisition. February 19, 2007. Additional notes to Manning's slides.

2. Slide 2 notes
• Language is constantly evolving
• NLP properties of interest, such as a word's frequency or probability of occurrence, are not available in dictionary form
• Need to constantly learn and acquire new terms and usages
• Focus areas for this chapter:
• Attachment ambiguity
• A. The children ate the cake with their hands. ("with their hands" attaches to the verb ate)
• B. The children ate the cake with blue icing. ("with blue icing" attaches to the noun cake)
• Semantic characterization of a verb's arguments (selectional preferences)

3. Slide 3 notes
• Evaluation measures discussion
• tp: true positives
• fp: false positives (Type I errors)
• fn: false negatives (Type II errors)
• tn: true negatives

4. Slide 5 notes
• A trade-off exists between precision and recall
• One can simply return all possible documents and get 100% recall (no false negatives)
• But precision will be low, since there will be many false positives; see the sketch below
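As a quick illustration of these measures, here is a minimal Python sketch (the F-measure is the weighted harmonic mean of precision and recall; the counts and function names are illustrative):

```python
def precision(tp, fp):
    """Fraction of selected items that are correct: tp / (tp + fp)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of correct items that are selected: tp / (tp + fn)."""
    return tp / (tp + fn)

def f_measure(p, r, alpha=0.5):
    """Weighted harmonic mean of precision and recall."""
    return 1.0 / (alpha / p + (1.0 - alpha) / r)

# "Return everything": no false negatives, so recall is perfect,
# but almost everything returned is a false positive.
tp, fp, fn = 50, 9950, 0
p, r = precision(tp, fp), recall(tp, fn)
print(p, r, f_measure(p, r))  # 0.005 1.0 ~0.00995
```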

5. Slide 9 notes

6. Slide 11 notes
• tell has the subcategorization frame NP NP S (subject, object, clause), as in "She told the man where Peter grew up"
• find lacks such a frame, but has NP NP (subject, object); in "She found the man where Peter grew up" the clause can only be a relative clause modifying "the man"

7. Slide 12 notes
• Cues for the frame NP NP:
• (OBJ|SUBJ_OBJ|CAP)(PUNC|CC)
• OBJ: object personal pronouns like "me" and "him"
• SUBJ_OBJ: pronouns that can be subject or object, such as "you" and "it"
• CAP: a capitalized word; PUNC: punctuation
• CC: subordinating conjunctions like "if", "before" or "as"
• Error-rate determination uses the binomial distribution: each occurrence of the verb is treated as an independent coin flip in which the cue appears spuriously (fails to indicate the frame) with probability e_j, the cue's error rate, and works correctly with probability 1 − e_j. A sketch of the resulting test follows below.
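Under that binomial model, a verb is assigned the frame only if the probability of seeing so many cue occurrences by error alone is small. A minimal sketch (the function name and the 0.02 threshold are illustrative, not taken from the slides):

```python
from math import comb

def p_false_cues(n, m, e_j):
    """P(at least m of n verb occurrences show cue j even though the
    verb does not take frame j), under a binomial error model."""
    return sum(comb(n, k) * e_j ** k * (1.0 - e_j) ** (n - k)
               for k in range(m, n + 1))

# Verb seen n=100 times, cue observed m=10 times, cue error rate 2%:
# 10 spurious cues out of 100 is wildly unlikely, so the cue
# occurrences are taken as real evidence for the frame.
if p_false_cues(100, 10, 0.02) < 0.02:
    print("assign frame NP NP to the verb")
```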

8. Slide 13 notes
• Brent's (1993) Lerner algorithm has high precision but low recall.
• Manning (1993): by combining it with part-of-speech tagging and looking for patterns such as the following, one can increase reliability (a toy version of the pattern match is sketched below):
• (OBJ|SUBJ_OBJ|CAP)(PUNC|CC)
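A toy sketch of matching this cue against the two tokens that follow a verb; the word lists here are small illustrative samples, not the full sets used by either system:

```python
OBJ      = {"me", "him", "her", "us", "them"}  # object-only pronouns
SUBJ_OBJ = {"you", "it"}                       # subject-or-object pronouns
PUNC     = {",", ".", ";", ":", "!", "?"}      # punctuation
CC       = {"if", "before", "as"}              # subordinating conjunctions

def np_np_cue(tok1, tok2):
    """True if the two tokens after a verb match (OBJ|SUBJ_OBJ|CAP)(PUNC|CC)."""
    first = tok1 in OBJ or tok1 in SUBJ_OBJ or tok1[:1].isupper()
    second = tok2 in PUNC or tok2 in CC
    return first and second

print(np_np_cue("me", "if"))     # True:  "... told me if ..."
print(np_np_cue("Paul", ","))    # True:  "... greeted Paul, ..."
print(np_np_cue("the", "cake"))  # False: determiner opens a longer NP
```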

9. Slide 32 notes
• For instance, the verb "eat" strongly prefers something edible as its object.
• Exceptions include metaphorical uses of the word:
• "eating one's words" or "fear eats the soul".

10. Slide 33 notes: Kullback-Leibler divergence
• Relative entropy, or KL (Kullback-Leibler) divergence
• Example for A(v,n) with an ambiguous noun like "chair": in "Susan interrupted the chair", the verb's preference for human objects selects the "chairperson" sense. The definitions are written out below.
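For reference, the definitions behind the slide, in Resnik's notation as presented in the chapter (C ranges over noun classes; the slide's notation may differ slightly):

```latex
% Relative entropy (KL divergence) between distributions p and q:
D(p \,\|\, q) = \sum_x p(x) \log \frac{p(x)}{q(x)}

% Selectional preference strength of a verb v:
S(v) = D\bigl(P(C \mid v) \,\|\, P(C)\bigr)
     = \sum_c P(c \mid v) \log \frac{P(c \mid v)}{P(c)}

% Selectional association of v with a noun class c,
% i.e. the class's share of the preference strength:
A(v, c) = \frac{P(c \mid v) \log \frac{P(c \mid v)}{P(c)}}{S(v)}
```

For a noun n that belongs to several classes, A(v, n) is taken as the highest A(v, c) over the classes c containing n.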

11. Slide 38 notes
• X = {1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1}
• Y = {1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0}
• Each vector has 10 non-zero entries; the vectors overlap only in the first position
• Matching coefficient = |X ∩ Y| = 1
• Dice coefficient = (2 × 1)/(10 + 10) = 0.1
• Jaccard coefficient = 1/(10 + 10 − 1) = 1/19 ≈ 0.053
• Overlap coefficient = 1/min(10, 10) = 0.1
• Cosine coefficient = 1/√(10 × 10) = 0.1
• Cosine is very useful for comparing vectors with widely varying numbers of non-zero entries: if one vector has a single non-zero entry and the other has 1000 (with the single entry overlapping), Dice gives 2/1001 ≈ 0.002 while cosine gives 1/√1000 ≈ 0.03. The computation is sketched below.
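A minimal Python sketch of these measures, treating each binary vector as the set of its non-zero positions (it reproduces the numbers above):

```python
from math import sqrt

def coefficients(x, y):
    """Binary similarity measures over the sets of non-zero positions."""
    X = {i for i, v in enumerate(x) if v}
    Y = {i for i, v in enumerate(y) if v}
    inter = len(X & Y)
    return {
        "matching": inter,
        "dice":     2 * inter / (len(X) + len(Y)),
        "jaccard":  inter / len(X | Y),
        "overlap":  inter / min(len(X), len(Y)),
        "cosine":   inter / sqrt(len(X) * len(Y)),
    }

X = [1, 0] * 9 + [1]   # ones at positions 0, 2, ..., 18 (10 ones)
Y = [1] + [1, 0] * 9   # ones at positions 0, 1, 3, ..., 17 (10 ones)
print(coefficients(X, Y))
# {'matching': 1, 'dice': 0.1, 'jaccard': 0.0526..., 'overlap': 0.1, 'cosine': 0.1}
```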
