Linguistically Rich Statistical Models of Language Joseph Smarr M.S. Candidate Symbolic Systems Program Advisor: Christopher D. Manning December 5th, 2002
Grand Vision • Talk to your computer like another human • HAL, Star Trek, etc. • Ask your computer a question, it finds the answer • “Who’s speaking at this week’s SymSys Forum?” • Computer can read and summarize text for you • “What’s the cutting edge in NLP these days?”
We’re Not There (Yet) • Turns out behaving intelligently is difficult • What does it take to achieve the grand vision? • General Artificial Intelligence problems • Knowledge representation, common sense reasoning, etc. • Language-specific problems • Complexity, ambiguity, and flexibility of language • Always underestimated because language is so easy for us!
Are There Useful Sub-Goals? • Grand vision is still too hard, but we can solve simpler problems that are still valuable • Filter news for stories about new tech gadgets • Take the SSP talk email and add it to my calendar • Dial my cell phone by speaking my friend’s name • Automatically reply to customer service e-mails • Find out which episode of The Simpsons is tonight • Two approaches to understanding language: • Theory-driven: Theoretical Linguistics • Task-driven: Natural Language Processing
Theoretical Linguistics vs. NLP

Theoretical Linguistics
• Goal: understand people's Knowledge of language
• Method: rich logical representations of language's hidden structure and meaning
• Guiding principles:
  • Separation of (hidden) knowledge of language and (observable) performance
  • Grammaticality is categorical (all or none)
  • Describe what are possible and impossible utterances

Natural Language Processing
• Goal: develop practical tools for analyzing speech / text
• Method: simple, robust models of everyday language use that are sufficient to perform tasks
• Guiding principles:
  • Exploit (empirical) regularities and patterns in examples of language in text collections
  • Sentence "goodness" is gradient (better or worse)
  • Deal with the utterances you're given, good or bad
Theoretical Linguistics vs. NLP [diagram contrasting Linguistics and NLP]
Linguistic Puzzle • When dropping an argument, why do some verbs keep the subject and some keep the object? • John sang the song → John sang • John broke the vase → The vase broke • Not just "quirkiness of language" • Similar patterns show up in other languages • Seems to involve deep aspects of verb meaning • Rules to account for this phenomenon: • Two classes of verbs (unergative & unaccusative) • Remaining argument must be realized as subject
Exception: Imperatives • "Open the pod bay doors, HAL" • Different goals lead to the study of different problems. In NLP... • Need to recognize this as a command • Need to figure out what specific action to take • Irrelevant how you'd say it in French • Describing language vs. working with language • But both tasks clearly share many sub-problems
Theoretical Linguistics vs. NLP • Potential for much synergy between linguistics and NLP • However, historically they have remained quite distinct • Chomsky (founder of generative grammar): • "It must be recognized that the notion 'probability of a sentence' is an entirely useless one, under any known interpretation of this term." • Karttunen (founder of finite state technologies at Xerox): • Linguists' reaction to NLP: "Not interested. You do not understand Theory. Go away you geek." • Jelinek (former head of IBM speech project): • "Every time I fire a linguist, the performance of our speech recognition system goes up."
Potential Synergies • Lexical acquisition (unknown words) • Statistically infer new lexical entries from context • Modeling “naturalness” and “conventionality” • Use corpus data to weight constructions • Dealing with ungrammatical utterances • Find “most similar / most likely” correction • Richer patterns for finding information in text • Use argument structure / semantic dependencies • More powerful models for speech recognition • Progressively build parse tree while listening
Finding Information in Text • US Government has sponsored lots of research in “information extraction” from news articles • Find mentions of terrorists and which locations they’re targeting • Find which companies are being acquired by which others and for how much • Progress driven by simplifying the models used • Early work used rich linguistic parsers • Unable to robustly handle natural text • Modern work is mainly finite state patterns • Regular expressions are very practical and successful
Web Information Extraction • How much does that textbook cost on Amazon? • Learn patterns for finding relevant fields, e.g. "Our Price: $##.##" (a sketch of such a pattern follows)
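As a toy illustration of the kind of finite-state pattern this slide has in mind, here is a minimal regex that pulls a price field off a product page. The HTML snippet and the regex itself are invented for illustration, not taken from any real page or system:

```python
import re

# Toy finite-state pattern of the kind used in web IE: pull the price
# field off a product page. The HTML snippet and regex are invented
# for illustration.
html = '<b>Our Price:</b> <span class="price">$42.99</span>'

price_pattern = re.compile(r'Our Price:.*?\$(\d+\.\d{2})')
match = price_pattern.search(html)
if match:
    print(match.group(1))  # -> 42.99
```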
Improving IE Performance on Natural Text Documents • How can we scale IE back up for natural text? • Need to look elsewhere for regularities to exploit • Idea: Consider grammatical structure • Run shallow parser on each sentence • Flatten output into sequence of "typed chunks" • Example of tagged sentence: "Uba2p is located largely in the nucleus." → NP_SEG VP_SEG PP_SEG NP_SEG (see the sketch below)
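A minimal sketch of the flattening step, using NLTK's regexp chunker as a stand-in for the shallow parser used in the talk. The chunk grammar is invented for illustration, and the example sentence is pre-tagged to keep the snippet self-contained:

```python
import nltk

# Flatten a shallow parse into a sequence of "typed chunks".
# The grammar below is a toy; a real system would use a trained
# shallow parser rather than hand-written chunk rules.
grammar = r"""
  NP_SEG: {<DT>?<JJ>*<NN.*>+}   # noun phrase segment
  VP_SEG: {<VB.*>+<RB>*}        # verb group segment
  PP_SEG: {<IN>}                # preposition segment
"""
chunker = nltk.RegexpParser(grammar)

tagged = [('Uba2p', 'NN'), ('is', 'VBZ'), ('located', 'VBN'),
          ('largely', 'RB'), ('in', 'IN'), ('the', 'DT'), ('nucleus', 'NN')]
tree = chunker.parse(tagged)

# Flatten the chunk tree into the sequence of segment types.
segments = [st.label() for st in tree.subtrees() if st.label() != 'S']
print(segments)  # ['NP_SEG', 'VP_SEG', 'PP_SEG', 'NP_SEG']
```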
Power of Linguistic Features [chart: adding grammatical-structure features yields performance increases of 21%, 45%, and 65%]
Linguistically Rich(er) IE • Exploit more grammatical structure for patterns • e.g. Tim Grow's work on IE with PCFGs • Example parse with semantic role annotations (pur = purchaser, acq = acquired, amt = amount): [S{pur,acq,amt} [NP{pur} First Union Corp] [VP{acq,amt} [MD will] [VP{acq,amt} [VB acquire] [NP{acq} Sheland Bank Inc] [PP{amt} [IN for] [NP{amt} three million dollars]]]]]
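To make the idea concrete, here is a toy reconstruction of the annotated parse above, with the role labels folded into the node labels and a traversal that reads the role fillers off the tree. This is illustrative only (nltk.Tree is used just as a convenient container), not the original system's code:

```python
from nltk import Tree

# Toy annotated parse: semantic roles (pur = purchaser, acq = acquired
# company, amt = amount) are appended to the syntactic node labels.
tree = Tree.fromstring(
    "(S (NP-pur First Union Corp)"
    "   (VP (MD will)"
    "       (VP (VB acquire) (NP-acq Sheland Bank Inc)"
    "           (PP (IN for) (NP-amt three million dollars)))))")

# Collect the yield of each role-bearing constituent.
roles = {}
for subtree in tree.subtrees():
    if '-' in subtree.label():                  # e.g. 'NP-acq'
        role = subtree.label().split('-', 1)[1]
        roles[role] = ' '.join(subtree.leaves())

print(roles)
# {'pur': 'First Union Corp', 'acq': 'Sheland Bank Inc',
#  'amt': 'three million dollars'}
```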
Classifying Unknown Words • Which of the following is the name of a city? • Cotrimoxazole • Wethersfield • Alien Fury: Countdown to Invasion • Most linguistic grammars assume a fixed lexicon • How do humans learn to deal with new words? • Context ("I spent a summer living in Wethersfield") • Makeup of the word itself ("phonesthetics") • Idea: Learn distinguishing letter sequences
What's in a Name? [figure: distinguishing letter sequences learned by the model, e.g. "oxa" (drug names) vs. "field" (place names)]
Generative Model of PNPs • Length n-gram model and word model: P(pnp | c) = P_n-gram(word-lengths(pnp)) · ∏_{w_i ∈ pnp} P(w_i | word-length(w_i)) • Word model: mixture of a character n-gram model and a common-word model: P(w_i | len) = λ_len · P_n-gram(w_i | len)^{k/len} + (1 − λ_len) · P_word(w_i | len) • N-gram models: deleted interpolation: P_0-gram(symbol | history) = uniform distribution; P_n-gram(s | h) = λ_C(h) · P_empirical(s | h) + (1 − λ_C(h)) · P_(n−1)-gram(s | h) (a runnable sketch follows)
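A minimal sketch of the deleted-interpolation recursion for a character n-gram model: the n-gram estimate mixes the empirical estimate with the (n−1)-gram model, bottoming out in a uniform 0-gram distribution. The lambda schedule count/(count + 5) is an invented stand-in for the held-out estimation a real system would use:

```python
from collections import defaultdict

ALPHABET = 'abcdefghijklmnopqrstuvwxyz'

class InterpolatedCharNgram:
    def __init__(self, n, words):
        # counts[history][symbol] for every history length 0..n-1
        self.counts = defaultdict(lambda: defaultdict(int))
        for word in words:
            padded = '^' * (n - 1) + word        # '^' marks word start
            for i in range(n - 1, len(padded)):
                for k in range(n):
                    self.counts[padded[i - k:i]][padded[i]] += 1

    def prob(self, s, h):
        """P(s | h) with deleted interpolation down to the uniform model."""
        total = sum(self.counts[h].values())
        lam = total / (total + 5.0)              # invented lambda_{C(h)} schedule
        empirical = self.counts[h][s] / total if total else 0.0
        backoff = (1.0 / len(ALPHABET)) if not h else self.prob(s, h[1:])
        return lam * empirical + (1 - lam) * backoff

model = InterpolatedCharNgram(3, ['wethersfield', 'springfield', 'fairfield'])
print(model.prob('d', 'el'))   # high: 'eld' is common in these place names
print(model.prob('z', 'el'))   # low, but nonzero thanks to interpolation
```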
Experimental Results [charts: classification accuracy on the pairwise, 1-vs-all, and n-way tasks]
Knowledge of Frequencies • Linguistics traditionally assumes Knowledge of Language doesn’t involve counting • Letter frequencies are clearly an important source of knowledge for unknown words • Similarly, we saw before that there are regular patterns to exploit in grammatical information Take home point: • Combining Statistical NLP methods with richer linguistic representations is a big win!
Language is Ambiguous! • Ban on Nude Dancing on Governor’s Desk – from a Georgia newspaper column discussing current legislation • Lebanese chief limits access to private parts – talking about an Army General’s initiative • Death may ease tension – an article about the death of Colonel Jean-Claude Paul in Haiti • Iraqi Head Seeks Arms • Juvenile Court to Try Shooting Defendant • Teacher Strikes Idle Kids • Stolen Painting Found By Tree
Language is Ambiguous! • Local HS Dropouts Cut in Half • Obesity Study Looks for Larger Test Group • British Left Waffles on Falkland Islands • Red Tape Holds Up New Bridges • Man Struck by Lightning Faces Battery Charge • Clinton Wins on Budget, but More Lies Ahead • Hospitals Are Sued by 7 Foot Doctors • Kids Make Nutritious Snacks
Coping With Ambiguity • Categorical grammars like HPSG provide many possible analyses for sentences • 455 parses for "List the sales of the products produced in 1973 with the products produced in 1972." (Martin et al., 1987) • In most cases, only one interpretation is intended • Initial solution was hand-coded preferences among rules • Hard to manage as the number of rules increases • Need to capture interactions among rules
Statistical HPSG Parse Selection • HPSG provides deep analyses of sentence structure and meaning • Useful for NLP tasks like question answering • Need to solve the disambiguation problem to make using these richer representations practical • Idea: Learn statistical preferences among constructions from a hand-disambiguated collection of sentences • Result: Correct analysis chosen >80% of the time • StatNLP methods + linguistic representation = win
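A toy sketch of the selection step: describe each candidate analysis by feature counts and pick the highest-scoring one under a linear model. The features and weights here are invented; a real system (e.g. a log-linear model) learns its weights from a hand-disambiguated treebank:

```python
# Invented feature weights; in practice these are learned from
# hand-disambiguated training sentences.
weights = {
    'pp_attaches_to_verb': 0.9,
    'pp_attaches_to_noun': -0.3,
    'subject_is_pronoun': 0.2,
}

def score(features):
    """Linear score: weighted sum of the parse's feature counts."""
    return sum(weights.get(f, 0.0) * count for f, count in features.items())

# Two candidate analyses of the same sentence, as feature-count dicts.
candidates = [
    ('analysis_1', {'pp_attaches_to_noun': 2}),
    ('analysis_2', {'pp_attaches_to_verb': 1, 'pp_attaches_to_noun': 1}),
]
best_name, _ = max(candidates, key=lambda c: score(c[1]))
print(best_name)  # analysis_2 under these invented weights
```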
Towards Semantic Extraction • HPSG provides representation of meaning • Who did what to whom? • Computers need meaning to do inference • Can we extend information extraction methods to extract meaning representations from pages? • Current project: IE for the semantic web • Large project to build rich ontologies to describe the content of web pages for intelligent agents • Use IE to extract new instances of concepts from web pages (as opposed to manual labeling) • student(Joseph), univ(Stanford), at(Joseph, Stanford)
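A toy sketch of the final step, turning IE output into predicate instances like those on the slide. The extracted fields are hard-coded here; a real system would produce them by running learned patterns over the page text:

```python
# Pretend these fields came out of the IE patterns run over a web page.
extracted = {'person': 'Joseph', 'role': 'student', 'institution': 'Stanford'}

# Emit ontology predicate instances in the style shown on the slide.
facts = [
    f"{extracted['role']}({extracted['person']})",
    f"univ({extracted['institution']})",
    f"at({extracted['person']}, {extracted['institution']})",
]
print(facts)  # ['student(Joseph)', 'univ(Stanford)', 'at(Joseph, Stanford)']
```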
Towards the Grand Vision? • Collaboration between Theoretical Linguistics and NLP is an important step forward • Practical tools with sophisticated language power • How can we ever teach computers enough about language and the world? • Hawking: Moore's Law is sufficient • Moravec: mobile robots must learn like children • Kurzweil: reverse-engineer the human brain • The experts agree: Symbolic Systems is the future!
Upcoming Convergence Courses • Ling 139M Machine Translation (Winter) • Ling 239E Grammar Engineering (Winter) • CS 276B Text Information Retrieval (Winter) • Ling 239A Parsing and Generation (Spring) • CS 224N Natural Language Processing (Spring) • Get Involved!!