Lecture 13: Corpus Linguistics I (CS 4705)
From Knowledge-Based to Corpus-Based Linguistics • A paradigm shift begins in the 1980s • Seeds planted in the 1950s (Harris, Firth) • Cut off by Chomsky's critique of statistical approaches • Renewal due to • Interest in practical applications (ASR, MT, …) • Availability of powerful machines and large amounts of storage at major industrial labs • Increasing availability of large online texts and speech data • Crossover efforts with the ASR community, fostered by DARPA
For many practical tasks, statistical methods perform better • They also require less hand-coded knowledge from researchers
Next Word Prediction • An ostensibly artificial task: predicting the next word in a sequence. • From a NY Times story... • Stocks plunged this …. • Stocks plunged this morning, despite a cut in interest rates • Stocks plunged this morning, despite a cut in interest rates by the Federal Reserve, as Wall ... • Stocks plunged this morning, despite a cut in interest rates by the Federal Reserve, as Wall Street began
Stocks plunged this morning, despite a cut in interest rates by the Federal Reserve, as Wall Street began trading for the first time since last … • Stocks plunged this morning, despite a cut in interest rates by the Federal Reserve, as Wall Street began trading for the first time since last Tuesday's terrorist attacks.
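This kind of prediction can be approximated with simple co-occurrence statistics. Here is a minimal Python sketch of a bigram next-word predictor; the toy corpus and function names are illustrative assumptions, not part of the lecture:

from collections import Counter, defaultdict

# Toy training text; a real model would be trained on millions of words of news.
corpus = ("stocks plunged this morning despite a cut in interest rates "
          "stocks plunged this afternoon as wall street began trading").split()

# Count bigrams: how often each word follows each preceding word.
bigrams = defaultdict(Counter)
for prev, curr in zip(corpus, corpus[1:]):
    bigrams[prev][curr] += 1

def predict_next(word):
    """Return the most frequent continuation of `word` in the training data."""
    followers = bigrams[word]
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("stocks"))   # -> 'plunged'
print(predict_next("plunged"))  # -> 'this'

Trained on a realistic news corpus, the same table lookup would rank continuations much like the ones a human supplies in the example above.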
Human Word Prediction • Clearly, at least some of us have the ability to predict future words in an utterance. • How? • Domain knowledge • Syntactic knowledge • Lexical knowledge
Claim • A useful part of the knowledge needed to allow Word Prediction (guessing the next word) can be captured using simple statistical techniques. • In particular, we'll rely on the notion of the probability of a sequence (e.g., sentence) and the likelihood of words co-occurring
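Concretely, the chain rule decomposes the probability of a sequence into conditional probabilities, which an n-gram model then approximates from co-occurrence counts:

P(w1 w2 … wn) = P(w1) · P(w2 | w1) · P(w3 | w1 w2) · … · P(wn | w1 … wn-1) ≈ P(w1) · P(w2 | w1) · … · P(wn | wn-1)   (bigram approximation)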
Why would we want to do this? • Why would anyone want to predict a word? • If you can predict the next word, you can rank the likelihood of sequences containing various alternative words, i.e., alternative hypotheses • You can assess the likelihood/goodness of a hypothesis
Many NLP problems can be modeled as mapping from one string of symbols to another. • In statistical language applications, knowledge of the source (e.g., a statistical model of word sequences) is referred to as a Language Model or a Grammar
Why is this useful? • Example applications that employ language models: • Speech recognition • Handwriting recognition • Spelling correction • Machine translation • Optical character recognition
Real Word Spelling Errors • Each sentence below contains an error that is itself a valid English word: • They are leaving in about fifteen minuets to go to her house. • The study was conducted mainly be John Black. • The design an construction of the system will take more than a year. • Hopefully, all with continue smoothly in my absence. • Can they lave him my messages? • I need to notified the bank of…. • He is trying to fine out.
Handwriting Recognition • Assume a note is given to a bank teller, which the teller reads as I have a gub. (cf. Woody Allen) • NLP to the rescue …. • gub is not a word • gun, gum, Gus, and gull are words, but gun has a higher probability in the context of a bank
For Spell Checkers • Collect a list of commonly substituted words • piece/peace, whether/weather, their/there ... • Whenever one of these words occurs in a sentence, construct the alternative sentence as well • Assess the goodness of each and choose the word that yields the more likely sentence • E.g. • On Tuesday, the whether • On Tuesday, the weather
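A sketch of this procedure, reusing the `corpus` and `bigrams` from the earlier bigram sketch; the confusion set, the add-one smoothing, and the function names are my assumptions:

import math
from collections import Counter

# Confusion pairs from the slide; a real list would be much longer.
CONFUSION = {"whether": "weather", "weather": "whether",
             "their": "there", "there": "their",
             "piece": "peace", "peace": "piece"}

# Unigram counts for the smoothed conditional estimates.
unigrams = Counter(corpus)
vocab_size = len(unigrams)

def sentence_logprob(words):
    """Log-probability of a word sequence under an add-one-smoothed bigram model."""
    return sum(math.log((bigrams[prev][curr] + 1) / (unigrams[prev] + vocab_size))
               for prev, curr in zip(words, words[1:]))

def correct(sentence):
    """Swap in each confusable alternative; keep whichever sentence is likelier."""
    words = sentence.lower().split()
    for i, w in enumerate(words):
        if w in CONFUSION:
            alt = words[:i] + [CONFUSION[w]] + words[i + 1:]
            if sentence_logprob(alt) > sentence_logprob(words):
                words = alt
    return " ".join(words)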
The Noisy Channel Model • A probabilistic model developed by Claude Shannon to model communication (as over a phone line) • The intended input I passes through a noisy channel, producing the observed output O; decoding recovers the most likely input:

Î = argmax_I Pr(I | O) = argmax_I Pr(I) · Pr(O | I)

• Î: the most likely input • Pr(I): the prior probability of the input • Pr(I | O): the probability of input I given the observed output O • Pr(O | I): the probability that O is the output if I is the input
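Applied to the bank-note example, decoding just scores each candidate input and takes the argmax. The probabilities below are invented purely for illustration; real systems estimate both distributions from data:

# Hypothetical numbers for illustration only.
prior = {"gun": 0.5, "gum": 0.2, "Gus": 0.2, "gull": 0.1}    # Pr(I) in a bank hold-up context
channel = {"gun": 0.3, "gum": 0.3, "Gus": 0.2, "gull": 0.2}  # Pr(O = "gub" | I)

best = max(prior, key=lambda w: prior[w] * channel[w])
print(best)  # -> 'gun'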
Review: Basic Probability • Prior Probability (or unconditional probability) • P(A), where A is some event • Possible events: it raining, the next person you see being Scandinavian, a child getting the measles, the word ‘warlord’ occurring in the newspaper • Conditional Probability • P(A | B) • the probability of A, given that we know B • E.g. it raining, given that we know it’s October; the next person you see being Scandinavian, given that you’re in Sweden; the word ‘warlord’ occurring in a story about Afghanistan
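The two notions are linked by the definition of conditional probability, which also yields Bayes' rule, the identity behind the noisy channel model above:

P(A | B) = P(A, B) / P(B), and hence P(A | B) = P(B | A) · P(A) / P(B)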
Example • (Diagram: ten people, six labeled F for Finns and four labeled I; five of the ten are skiers, four of them Finns.) • P(Finn) = .6 • P(skier) = .5 • P(skier|Finn) = .67 • P(Finn|skier) = .8
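These numbers are consistent with Bayes' rule: P(Finn | skier) = P(skier | Finn) · P(Finn) / P(skier) = (.67 × .6) / .5 ≈ .8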
Next class • Midterm • After the midterm: • Hindle & Rooth 1993 • Begin studying semantics, Ch. 14