WORDS Lab
CSC 9010: Special Topics. Natural Language Processing.
Paula Matuszek, Mary-Angela Papalaskari
Spring, 2005
Examples taken from the Bird, Klein and Loper: NLTK Tutorial, Tagging, nltk.sourceforge.net/tutorial/tagging/index.html
CSC 9010: Special Topics, Natural Language Processing. Spring, 2005. Matuszek & Papalaskari

Words, Words, Words
• So far we have covered methods that largely operate on tokens.
  • Tokenizing text
  • Stemming words and determining lemmas
  • POS-tagging
  • Language models based on n-gram frequencies
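
As a reminder of what "operating on tokens" looks like in practice, here is a minimal sketch of two of the techniques listed above: tokenizing text and building a language model from bigram frequencies. It uses only the Python standard library rather than NLTK, the tokenizer is deliberately crude, and the names tokenize and bigram_model are illustrative assumptions, not functions from the course materials or the NLTK tutorial.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bigram_model(tokens):
    """Return a function estimating P(word | prev) by maximum likelihood."""
    # Count each token only where it starts a bigram (every position but the last).
    first_counts = Counter(tokens[:-1])
    bigram_counts = Counter(zip(tokens, tokens[1:]))

    def prob(prev, word):
        if first_counts[prev] == 0:
            return 0.0  # unseen context: no estimate without smoothing
        return bigram_counts[(prev, word)] / first_counts[prev]

    return prob


tokens = tokenize("Words, words, words. The words of the lab are words.")
prob = bigram_model(tokens)
print(tokens)
print(prob("words", "words"))  # relative frequency of "words" following "words"
```

Note that the model has no smoothing, so any bigram absent from the sample text gets probability zero; stemming and POS-tagging are left out to keep the sketch short.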

Every time I fire a linguist, my performance goes up¹
• None of this has much of what could be considered "linguistic" knowledge or "understanding".
  • No parsing
  • Not much domain knowledge or "meaning"
• For the next two sections of the course we will talk extensively about syntax and semantics.
1. Hirschberg, Julia. 1998. "Every time I fire a linguist, my performance goes up," and other myths of the statistical natural language processing revolution. Invited talk, Fifteenth National Conference on Artificial Intelligence (AAAI-98).

What's In a Word?
• For this lab, we will focus on some of the things that can be done by applying the techniques we have already studied.
• Format will be:
  • Try a demo
  • Discuss what techniques were needed to implement it
  • Discuss some of what would be needed to improve it

Gender Genie
• www.bookblog.net/gender/genie.html
• Techniques:
• How good is it? What might improve it?
• Reference:
  • www.cs.biu.ac.il/~koppel/papers/male-female-text-final.pdf

Pearson Knowledge Technologies Text Classification Demo
• www.k-a-t.com:8080/classify/
• Techniques:
• How good is it? What might improve it?
• Reference: www.k-a-t.com/publications.shtml

Google Sets
• labs.google.com/sets
• Techniques:
• How good is it? What might improve it?
• Reference: if you find one, let me know. Possibly something like this: www.arxiv.org/pdf/cs.CL/0412098

AT&T Text to Speech
• www.research.att.com/projects/tts/demo.html
• Techniques:
• How good is it? What might improve it?
• Reference: www.research.att.com/projects/tts/pubs.html