
WORDS Lab

WORDS Lab. CSC 9010: Special Topics. Natural Language Processing. Paula Matuszek, Mary-Angela Papalaskari Spring, 2005. Examples taken from the Bird, Klein and Loper: NLTK Tutorial, Tagging, nltk.sourceforge.net/tutorial/tagging/index.html. Words, Words, Words.


Presentation Transcript


  1. WORDS Lab
     CSC 9010: Special Topics. Natural Language Processing.
     Paula Matuszek, Mary-Angela Papalaskari. Spring, 2005.
     Examples taken from the Bird, Klein and Loper: NLTK Tutorial, Tagging, nltk.sourceforge.net/tutorial/tagging/index.html

  2. Words, Words, Words
     • So far we have covered methods that largely operate on tokens.
        • Tokenizing text
        • Stemming words and determining lemmas
        • POS-tagging
        • Language models based on n-gram frequencies
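The token-level techniques listed on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the NLTK code used in the course: the helper names (`tokenize`, `crude_stem`, `bigram_model`) are hypothetical, the suffix stripper is a toy stand-in for a real stemmer such as Porter's, and POS-tagging is left out because it needs a trained model or a tagged corpus.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def crude_stem(token):
    """Strip a few common English suffixes; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of stem remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def bigram_model(tokens):
    """Return a function giving the maximum-likelihood estimate P(next | prev)."""
    # Count each token as a context only where it has a successor.
    contexts = Counter(tokens[:-1])
    bigrams = Counter(zip(tokens, tokens[1:]))

    def prob(prev, nxt):
        if contexts[prev] == 0:
            return 0.0  # unseen context: no smoothing in this sketch
        return bigrams[(prev, nxt)] / contexts[prev]

    return prob


text = "The dog barked. The dog walked. The cat walked."
tokens = tokenize(text)
stems = [crude_stem(t) for t in tokens]
prob = bigram_model(stems)

print(stems)
print(prob("the", "dog"))
```

The bigram estimate here is unsmoothed, so any word pair not seen in the text gets probability zero; a usable language model would add smoothing and train on far more text.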

  3. "Every time I fire a linguist, my performance goes up"¹
     • None of this has much of what could be considered "linguistic" knowledge or "understanding".
        • No parsing
        • Not much domain knowledge or "meaning"
     • For the next two sections of the course we will talk extensively about syntax and semantics.
     ¹ Hirschberg, Julia. 1998. "Every time I fire a linguist, my performance goes up," and other myths of the statistical natural language processing revolution. Invited talk, Fifteenth National Conference on Artificial Intelligence (AAAI-98).

  4. What's In a Word?
     • For this lab, we will focus on some of the things that can be done by applying the techniques we have already studied.
     • Format will be
        • Try a demo
        • Discuss what techniques were needed to implement it
        • Discuss some of what would be needed to improve it

  5. Gender Genie
     • www.bookblog.net/gender/genie.html
     • Techniques:
     • How good is it? What might improve it?
     • Reference: www.cs.biu.ac.il/~koppel/papers/male-female-text-final.pdf

  6. Pearson Knowledge Technologies Text Classification Demo
     • www.k-a-t.com:8080/classify/
     • Techniques:
     • How good is it? What might improve it?
     • Reference: www.k-a-t.com/publications.shtml

  7. Google Sets
     • labs.google.com/sets
     • Techniques:
     • How good is it? What might improve it?
     • Reference: If you find one, let me know. Possibly something like this: www.arxiv.org/pdf/cs.CL/0412098

  8. AT&T Text to Speech
     • www.research.att.com/projects/tts/demo.html
     • Techniques:
     • How good is it? What might improve it?
     • Reference: www.research.att.com/projects/tts/pubs.html
