Linguistically Rich Statistical Models of Language Joseph Smarr M.S. Candidate Symbolic Systems Program Advisor: Christopher D. Manning December 5th, 2002
Grand Vision • Talk to your computer like another human • HAL, Star Trek, etc. • Ask your computer a question, it finds the answer • “Who’s speaking at this week’s SymSys Forum?” • Computer can read and summarize text for you • “What’s the cutting edge in NLP these days?”
We’re Not There (Yet) • Turns out behaving intelligently is difficult • What does it take to achieve the grand vision? • General Artificial Intelligence problems • Knowledge representation, common sense reasoning, etc. • Language-specific problems • Complexity, ambiguity, and flexibility of language • Always underestimated because language is so easy for us!
Are There Useful Sub-Goals? • Grand vision is still too hard, but we can solve simpler problems that are still valuable • Filter news for stories about new tech gadgets • Take the SSP talk email and add it to my calendar • Dial my cell phone by speaking my friend’s name • Automatically reply to customer service e-mails • Find out which episode of The Simpsons is tonight • Two approaches to understanding language: • Theory-driven: Theoretical Linguistics • Task-driven: Natural Language Processing
Theoretical Linguistics vs. NLP

Theoretical Linguistics
• Goal: understand people's Knowledge of language
• Method: rich logical representations of language's hidden structure and meaning
• Guiding principles:
  • Separation of (hidden) knowledge of language and (observable) performance
  • Grammaticality is categorical (all or none)
  • Describe what are possible and impossible utterances

Natural Language Processing
• Goal: develop practical tools for analyzing speech / text
• Method: simple, robust models of everyday language use that are sufficient to perform tasks
• Guiding principles:
  • Exploit (empirical) regularities and patterns in examples of language in text collections
  • Sentence "goodness" is gradient (better or worse)
  • Deal with the utterances you're given, good or bad
Theoretical Linguistics vs. NLP [diagram contrasting Linguistics and NLP]
Linguistic Puzzle • When dropping an argument, why do some verbs keep the subject and some keep the object? • John sang the song → John sang • John broke the vase → The vase broke • Not just "quirkiness of language" • Similar patterns show up in other languages • Seems to involve deep aspects of verb meaning • Rules to account for this phenomenon: • Two classes of verbs (unergative & unaccusative) • Remaining argument must be realized as subject
Exception: Imperatives • "Open the pod bay doors, HAL" • Different goals lead to the study of different problems. In NLP... • Need to recognize this as a command • Need to figure out what specific action to take • Irrelevant how you'd say it in French • Describing language vs. working with language • But both tasks clearly share many sub-problems
Theoretical Linguistics vs. NLP • Potential for much synergy between linguistics and NLP • However, historically they have remained quite distinct • Chomsky (founder of generative grammar): • "It must be recognized that the notion 'probability of a sentence' is an entirely useless one, under any known interpretation of this term." • Karttunen (founder of finite state technologies at Xerox): • Linguists' reaction to NLP: "Not interested. You do not understand Theory. Go away you geek." • Jelinek (former head of IBM speech project): • "Every time I fire a linguist, the performance of our speech recognition system goes up."
Potential Synergies • Lexical acquisition (unknown words) • Statistically infer new lexical entries from context • Modeling “naturalness” and “conventionality” • Use corpus data to weight constructions • Dealing with ungrammatical utterances • Find “most similar / most likely” correction • Richer patterns for finding information in text • Use argument structure / semantic dependencies • More powerful models for speech recognition • Progressively build parse tree while listening
Finding Information in Text • US Government has sponsored lots of research in “information extraction” from news articles • Find mentions of terrorists and which locations they’re targeting • Find which companies are being acquired by which others and for how much • Progress driven by simplifying the models used • Early work used rich linguistic parsers • Unable to robustly handle natural text • Modern work is mainly finite state patterns • Regular expressions are very practical and successful
Web Information Extraction • How much does that textbook cost on Amazon? • Learn patterns for finding relevant fields, e.g. "Our Price: $##.##" (a sketch of such a pattern follows)
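As a toy illustration of the kind of finite-state pattern this slide has in mind, here is a minimal regex that pulls a price field off a product page. The HTML snippet and the regex itself are invented for illustration, not taken from any real page or system:

```python
import re

# Toy finite-state pattern of the kind used in web IE: pull the price
# field off a product page. The HTML snippet and regex are invented
# for illustration.
html = '<b>Our Price:</b> <span class="price">$42.99</span>'

price_pattern = re.compile(r'Our Price:.*?\$(\d+\.\d{2})')
match = price_pattern.search(html)
if match:
    print(match.group(1))  # -> 42.99
```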
Improving IE Performance on Natural Text Documents • How can we scale IE back up for natural text? • Need to look elsewhere for regularities to exploit • Idea: Consider grammatical structure • Run shallow parser on each sentence • Flatten output into sequence of "typed chunks" • Example of tagged sentence: "Uba2p is located largely in the nucleus." → NP_SEG VP_SEG PP_SEG NP_SEG (see the sketch below)
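A minimal sketch of the flattening step, using NLTK's regexp chunker as a stand-in for the shallow parser used in the talk. The chunk grammar is invented for illustration, and the example sentence is pre-tagged to keep the snippet self-contained:

```python
import nltk

# Flatten a shallow parse into a sequence of "typed chunks".
# The grammar below is a toy; a real system would use a trained
# shallow parser rather than hand-written chunk rules.
grammar = r"""
  NP_SEG: {<DT>?<JJ>*<NN.*>+}   # noun phrase segment
  VP_SEG: {<VB.*>+<RB>*}        # verb group segment
  PP_SEG: {<IN>}                # preposition segment
"""
chunker = nltk.RegexpParser(grammar)

tagged = [('Uba2p', 'NN'), ('is', 'VBZ'), ('located', 'VBN'),
          ('largely', 'RB'), ('in', 'IN'), ('the', 'DT'), ('nucleus', 'NN')]
tree = chunker.parse(tagged)

# Flatten the chunk tree into the sequence of segment types.
segments = [st.label() for st in tree.subtrees() if st.label() != 'S']
print(segments)  # ['NP_SEG', 'VP_SEG', 'PP_SEG', 'NP_SEG']
```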
Power of Linguistic Features [chart: adding grammatical-structure features yields performance increases of 21%, 45%, and 65%]
Linguistically Rich(er) IE • Exploit more grammatical structure for patterns • e.g. Tim Grow's work on IE with PCFGs • Example parse with semantic role annotations (pur = purchaser, acq = acquired, amt = amount): [S{pur,acq,amt} [NP{pur} First Union Corp] [VP{acq,amt} [MD will] [VP{acq,amt} [VB acquire] [NP{acq} Sheland Bank Inc] [PP{amt} [IN for] [NP{amt} three million dollars]]]]]
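To make the idea concrete, here is a toy reconstruction of the annotated parse above, with the role labels folded into the node labels and a traversal that reads the role fillers off the tree. This is illustrative only (nltk.Tree is used just as a convenient container), not the original system's code:

```python
from nltk import Tree

# Toy annotated parse: semantic roles (pur = purchaser, acq = acquired
# company, amt = amount) are appended to the syntactic node labels.
tree = Tree.fromstring(
    "(S (NP-pur First Union Corp)"
    "   (VP (MD will)"
    "       (VP (VB acquire) (NP-acq Sheland Bank Inc)"
    "           (PP (IN for) (NP-amt three million dollars)))))")

# Collect the yield of each role-bearing constituent.
roles = {}
for subtree in tree.subtrees():
    if '-' in subtree.label():                  # e.g. 'NP-acq'
        role = subtree.label().split('-', 1)[1]
        roles[role] = ' '.join(subtree.leaves())

print(roles)
# {'pur': 'First Union Corp', 'acq': 'Sheland Bank Inc',
#  'amt': 'three million dollars'}
```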
Classifying Unknown Words • Which of the following is the name of a city? • Cotrimoxazole • Wethersfield • Alien Fury: Countdown to Invasion • Most linguistic grammars assume a fixed lexicon • How do humans learn to deal with new words? • Context ("I spent a summer living in Wethersfield") • Makeup of the word itself ("phonesthetics") • Idea: Learn distinguishing letter sequences
What's in a Name? [figure: distinguishing letter sequences learned by the model, e.g. "oxa" (drug names) vs. "field" (place names)]
Generative Model of PNPs • Length n-gram model and word model: P(pnp | c) = P_n-gram(word-lengths(pnp)) · ∏_{w_i ∈ pnp} P(w_i | word-length(w_i)) • Word model: mixture of a character n-gram model and a common-word model: P(w_i | len) = λ_len · P_n-gram(w_i | len)^{k/len} + (1 − λ_len) · P_word(w_i | len) • N-gram models: deleted interpolation: P_0-gram(symbol | history) = uniform distribution; P_n-gram(s | h) = λ_C(h) · P_empirical(s | h) + (1 − λ_C(h)) · P_(n−1)-gram(s | h) (a runnable sketch follows)
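A minimal sketch of the deleted-interpolation recursion for a character n-gram model: the n-gram estimate mixes the empirical estimate with the (n−1)-gram model, bottoming out in a uniform 0-gram distribution. The lambda schedule count/(count + 5) is an invented stand-in for the held-out estimation a real system would use:

```python
from collections import defaultdict

ALPHABET = 'abcdefghijklmnopqrstuvwxyz'

class InterpolatedCharNgram:
    def __init__(self, n, words):
        # counts[history][symbol] for every history length 0..n-1
        self.counts = defaultdict(lambda: defaultdict(int))
        for word in words:
            padded = '^' * (n - 1) + word        # '^' marks word start
            for i in range(n - 1, len(padded)):
                for k in range(n):
                    self.counts[padded[i - k:i]][padded[i]] += 1

    def prob(self, s, h):
        """P(s | h) with deleted interpolation down to the uniform model."""
        total = sum(self.counts[h].values())
        lam = total / (total + 5.0)              # invented lambda_{C(h)} schedule
        empirical = self.counts[h][s] / total if total else 0.0
        backoff = (1.0 / len(ALPHABET)) if not h else self.prob(s, h[1:])
        return lam * empirical + (1 - lam) * backoff

model = InterpolatedCharNgram(3, ['wethersfield', 'springfield', 'fairfield'])
print(model.prob('d', 'el'))   # high: 'eld' is common in these place names
print(model.prob('z', 'el'))   # low, but nonzero thanks to interpolation
```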
Experimental Results [charts: classification accuracy on the pairwise, 1-vs-all, and n-way tasks]
Knowledge of Frequencies • Linguistics traditionally assumes Knowledge of Language doesn’t involve counting • Letter frequencies are clearly an important source of knowledge for unknown words • Similarly, we saw before that there are regular patterns to exploit in grammatical information Take home point: • Combining Statistical NLP methods with richer linguistic representations is a big win!
Language is Ambiguous! • Ban on Nude Dancing on Governor’s Desk – from a Georgia newspaper column discussing current legislation • Lebanese chief limits access to private parts – talking about an Army General’s initiative • Death may ease tension – an article about the death of Colonel Jean-Claude Paul in Haiti • Iraqi Head Seeks Arms • Juvenile Court to Try Shooting Defendant • Teacher Strikes Idle Kids • Stolen Painting Found By Tree
Language is Ambiguous! • Local HS Dropouts Cut in Half • Obesity Study Looks for Larger Test Group • British Left Waffles on Falkland Islands • Red Tape Holds Up New Bridges • Man Struck by Lightning Faces Battery Charge • Clinton Wins on Budget, but More Lies Ahead • Hospitals Are Sued by 7 Foot Doctors • Kids Make Nutritious Snacks
Coping With Ambiguity • Categorical grammars like HPSG provide many possible analyses for sentences • 455 parses for "List the sales of the products produced in 1973 with the products produced in 1972." (Martin et al., 1987) • In most cases, only one interpretation is intended • Initial solution was hand-coded preferences among rules • Hard to manage as the number of rules increases • Need to capture interactions among rules
Statistical HPSG Parse Selection • HPSG provides deep analyses of sentence structure and meaning • Useful for NLP tasks like question answering • Need to solve the disambiguation problem to make using these richer representations practical • Idea: Learn statistical preferences among constructions from a hand-disambiguated collection of sentences • Result: Correct analysis chosen >80% of the time • StatNLP methods + linguistic representation = win
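A toy sketch of the selection step: describe each candidate analysis by feature counts and pick the highest-scoring one under a linear model. The features and weights here are invented; a real system (e.g. a log-linear model) learns its weights from a hand-disambiguated treebank:

```python
# Invented feature weights; in practice these are learned from
# hand-disambiguated training sentences.
weights = {
    'pp_attaches_to_verb': 0.9,
    'pp_attaches_to_noun': -0.3,
    'subject_is_pronoun': 0.2,
}

def score(features):
    """Linear score: weighted sum of the parse's feature counts."""
    return sum(weights.get(f, 0.0) * count for f, count in features.items())

# Two candidate analyses of the same sentence, as feature-count dicts.
candidates = [
    ('analysis_1', {'pp_attaches_to_noun': 2}),
    ('analysis_2', {'pp_attaches_to_verb': 1, 'pp_attaches_to_noun': 1}),
]
best_name, _ = max(candidates, key=lambda c: score(c[1]))
print(best_name)  # analysis_2 under these invented weights
```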
Towards Semantic Extraction • HPSG provides representation of meaning • Who did what to whom? • Computers need meaning to do inference • Can we extend information extraction methods to extract meaning representations from pages? • Current project: IE for the semantic web • Large project to build rich ontologies to describe the content of web pages for intelligent agents • Use IE to extract new instances of concepts from web pages (as opposed to manual labeling) • student(Joseph), univ(Stanford), at(Joseph, Stanford)
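A toy sketch of the final step, turning IE output into predicate instances like those on the slide. The extracted fields are hard-coded here; a real system would produce them by running learned patterns over the page text:

```python
# Pretend these fields came out of the IE patterns run over a web page.
extracted = {'person': 'Joseph', 'role': 'student', 'institution': 'Stanford'}

# Emit ontology predicate instances in the style shown on the slide.
facts = [
    f"{extracted['role']}({extracted['person']})",
    f"univ({extracted['institution']})",
    f"at({extracted['person']}, {extracted['institution']})",
]
print(facts)  # ['student(Joseph)', 'univ(Stanford)', 'at(Joseph, Stanford)']
```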
Towards the Grand Vision? • Collaboration between Theoretical Linguistics and NLP is an important step forward • Practical tools with sophisticated language power • How can we ever teach computers enough about language and the world? • Hawking: Moore's Law is sufficient • Moravec: mobile robots must learn like children • Kurzweil: reverse-engineer the human brain • The experts agree: Symbolic Systems is the future!
Upcoming Convergence Courses • Ling 139M Machine Translation (Winter) • Ling 239E Grammar Engineering (Winter) • CS 276B Text Information Retrieval (Winter) • Ling 239A Parsing and Generation (Spring) • CS 224N Natural Language Processing (Spring) • Get Involved!!