It is the best of times (and the worst of times)

Kenneth Church, Microsoft (church@microsoft.com)


Presentation Transcript


  1. It is the best of times (and the worst of times) • Kenneth Church, Microsoft, church@microsoft.com

  2. Responsibility; Attribute Dangerous Positions to Others • Interesting & Controversial • Wow! (What a difference a decade makes) • Lonely • Preaching to Choir • Empiricism has come of age • Radical Fringe → Mainstream • 1993: Workshop on Very Large Corpora (WVLC) • Intended to be a one-time event • But so successful that it evolved into a series of EMNLP conferences • EMNLP-2004 received so many submissions that the program committee had to be expanded at the last minute • Success/Catastrophe • EMNLP-2004 & Senseval-2004

  3. The Structure of Scientific Revolutions (1962) – Kuhn (p.10) • Paradigms • Examples from Physics • Aristotle’s Physica • Ptolemy’s Almagest • Newton’s Principia and Optics • Franklin’s Electricity • Lavoisier’s Chemistry • Lyell’s Geology • Two characteristics: • Sufficiently unprecedented to attract an enduring group of adherents from competing modes of scientific activity • Simultaneously, sufficiently open-ended to leave all sorts of problems for the redefined group of practitioners to resolve

  4. Organizational Innovations (Radical → Mainstream) • Late Submission Deadline • Immediately after ACL notifications • ACL was rejecting good papers for bad reasons • Short review cycles → Freshness • Invest in the Future: Encourage Innovation • Chair (Energetic, Promising, Source of new ideas) • Co-chair (Established, Knows how it has been done) • Avoid incremental papers • Reviewers prefer boring papers over radical ones • Reviewers do what reviewers do; chairs → correction • Inclusiveness: Diversity → Growth (Sales) • Thankless chores → Marketing carrots • 1/3 promising, 1/3 stability, 1/3 outreach • Hold conferences in Europe, Asia & America • Innovation • Checks & Balances • Short term ≠ Long term

  5. What Worked and What Didn’t? • Data • Stay on message: It is data, stupid! • WVLC (Very Large) >> EMNLP (Empirical Methods) • If you have a lot of data, • Then you don’t need a lot of methodology • Empiricism means different things to different people: • (1) Machine Learning (Self-organizing Methods) • (2) Exploratory Data Analysis (EDA) • (3) Corpus-Based Lexicography • Lots of papers on (1) • EMNLP-2004 theme (error analysis) → (2) • Senseval grew out of (3) • Methodology • Kucera & Francis gave a great invited talk (but they couldn’t follow the submitted talks)

  6. Word Sense Disambiguation (WSD) History • Bar-Hillel (1960): Abandoned Machine Translation (MT) • Couldn’t see how to make progress on WSD (pen) • Can’t translate without disambiguating: bank (money) → banque; bank (river) → banc • 1990s: Parallel text ≈ labeled corpus for supervised training and testing • Isn’t it great that the translators have WSD-labeled all this data for us! • Yarowsky: Parallel corpus → encyclopedia + thesaurus • Bilingual ≠ Monolingual: interest, wear • ML: Co-training • Supervised → Unsupervised • Lexicography: Hector • Joint collaboration: Oxford University Press & DEC • flagging → flogging • Senseval
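
The 1990s idea on this slide, that a translator's word choice acts as a free sense label, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the sentence pairs are invented, the two-entry sense inventory simply follows the slide's banque/banc example, word alignment is assumed to have been done already, and the helper name `label_from_parallel` is hypothetical.

```python
# Hypothetical sense inventory for English "bank", keyed by the French word
# the translator chose (following the slide: banque vs. banc).
TRANSLATION_TO_SENSE = {
    "banque": "bank/money",
    "banc": "bank/river",
}


def label_from_parallel(pairs):
    """Turn (english_context, aligned_french_word) pairs into sense-labeled
    training examples. Pairs whose translation is not in the inventory are skipped."""
    labeled = []
    for context, french_word in pairs:
        sense = TRANSLATION_TO_SENSE.get(french_word.lower())
        if sense is not None:
            labeled.append((context, sense))
    return labeled


# Toy aligned data, invented for illustration (alignment assumed done).
aligned = [
    ("deposited the check at the bank", "banque"),
    ("sat on the bank of the river", "banc"),
    ("the bank raised its rates", "Banque"),
    ("you can bank on it", "compter"),  # translation not in inventory: skipped
]

examples = label_from_parallel(aligned)
for context, sense in examples:
    print(f"{sense:12} <- {context}")
```

The labeled pairs can then feed any supervised WSD learner, which is the point the slide makes: the human translators did the annotation.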

  7. A Road Rarely Taken: Tukey’s Exploratory Data Analysis (EDA) • Linear Regression • Standard practice: • Plug data into off-the-shelf package • Publish (if “significant”) • Better: • Check for outliers • Bowed residuals • Evidence of a positive or negative derivative • Deviations from assumptions (normality) • Fanout • Slocum’s Thesis (1981) • “Proof” that CKY takes linear time • No Result • Standard texts (e.g., Aho)… consider … worst case… This assumption clearly fails to apply to natural language… Our experiments have shown that average-case time performance… is approximately linear (p. 102)
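
The "Better" column can be made concrete with a toy case. The sketch below, in plain Python with no statistics package, fits a least-squares line to data that is actually quadratic (an assumption chosen to make the point obvious): R² comes out high enough to look "significant," yet the residuals are bowed, positive at both ends and negative in the middle. The helper names and the crude `looks_bowed` test are hypothetical illustrations, not a standard EDA procedure.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(xs, ys, slope, intercept):
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


def r_squared(ys, res):
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum(r ** 2 for r in res)
    return 1 - ss_res / ss_tot


def looks_bowed(res):
    """Crude, hypothetical check (residuals must be ordered by x): the outer
    thirds have the opposite average sign from the middle third."""
    k = len(res) // 3
    ends = res[:k] + res[-k:]
    middle = res[k:len(res) - k]
    return (sum(ends) / len(ends)) * (sum(middle) / len(middle)) < 0


# Invented data: the true relationship is quadratic, not linear.
xs = list(range(1, 13))
ys = [x * x for x in xs]

slope, intercept = fit_line(xs, ys)
res = residuals(xs, ys, slope, intercept)
print(f"slope={slope:.2f}  R^2={r_squared(ys, res):.3f}  bowed={looks_bowed(res)}")
```

A high R² alone would pass the "standard practice" filter; only looking at the residual pattern reveals that the linear model is the wrong shape.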

  8. Many Machine Learning (ML) Techniques (SVMs, Perceptrons) are Similar to (Logistic) Regression; Rarely See EDA (Robust Statistical) Methods in ML • The Elements of Statistical Learning, Hastie, Tibshirani & Friedman (2001), p. 380

  9. Historical Context • Empiricists feel lonely • Rationalists feel lonely • 1950s: • Rigorous methodology • Information theory • Behaviorism • Unfulfilled, unrealistic expectations • ALPAC report • Whither Speech Recognition? • 1970s: • Let it all hang out • Artificial Intelligence • Cognitive Psychology • 1990s: • Revival of empiricism • Kuhn Crisis • Kuhn Crisis

  10. Borrowed Slide: Jelinek (LREC) • “Whither Speech Recognition?” Pierce, JASA 1969 • Also, ALPAC (chair) & Bell Labs exec • …ASR is attractive to money. The attraction is perhaps similar to the attraction of schemes for turning water into gasoline, extracting gold from the sea, or going to the moon. • Most recognizers behave not like scientists, but like mad inventors or untrustworthy engineers. • …performance will continue to be very limited unless the recognizing device understands what is being said with something of the facility of a native speaker (that is, better than a foreigner fluent in the language) • Any application of the foregoing discussion to work in the general area of pattern recognition is left as an exercise for the reader.

  11. ALPAC (1966): the (in)famous report • John Hutchins • The best known event in the history of MT is … • Automatic Language Processing Advisory Committee (ALPAC) • Its effect was to bring to an end the substantial funding of MT research in US for some twenty years. • More significantly was the clear message to the general public and the rest of the scientific community that MT was hopeless. • For years afterwards, an interest in MT was something to keep quiet about; it was almost shameful. • To this day, the 'failure' of MT is still repeated by many as an indisputable fact. • The impact of ALPAC is undeniable • While the fame or notoriety of ALPAC is familiar, • What the report actually said is now becoming less familiar and often forgotten or misunderstood…

  12. ALPAC Recommendations • The committee recommends expenditures in two distinct areas: • Theory: Computational linguistics as part of linguistics: Studies of parsing, generation… including experiments in translation… Linguistics should be supported as science, and should not be judged by any immediate or foreseeable contribution to practical translation • Practice: Improvement of translation: practical methods for evaluation of translations; means for speeding up the human translation process; evaluation of quality and cost of various sources of translations; investigation of the utilization of translations, to guard against production of translations that are never read; study of delays in the over-all translation process, and means for eliminating them, both in journals and in individual items; evaluation of the relative speed and cost of various sorts of machine-aided translation; adaptation of existing mechanized editing and production processes in translation; the over-all translation process; and production of adequate reference works for the translator, including the adaptation of glossaries that now exist primarily for automatic dictionary look-up in machine translation

  13. Outline • Best of Times • We’re making consistent progress, or • We’re running around in circles, or • Don’t worry; be happy • We’re going off a cliff…

  14. Where have we been and where are we going? • Moore’s Law: Ideal Answer • Moores: Bob ≠ Gordon ≠ Roger

  15. Borrowed Slide: Audrey Le (NIST) • [Figure: error rate vs. date, 15 years] • Moore’s Law Time Constant: • 10x improvement per decade
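
The time constant on this slide is easy to turn into arithmetic. Assuming error rates fall by a factor of 10 every 10 years, the yearly factor is 10^(1/10) ≈ 1.26, i.e. roughly a 20.6% relative error reduction per year, and a 15-year span corresponds to about 10^1.5 ≈ 32x. The sketch below simply encodes that assumption; the function names and the 50% starting error in the usage lines are invented.

```python
def error_after(initial_error, years, years_per_tenfold=10.0):
    """Error rate after `years`, assuming it falls 10x every
    `years_per_tenfold` years (the slide's Moore's-law time constant)."""
    return initial_error * 10 ** (-years / years_per_tenfold)


def annual_reduction(years_per_tenfold=10.0):
    """Relative error reduction per year under the same assumption."""
    return 1 - 10 ** (-1 / years_per_tenfold)


print(f"per-year reduction: {annual_reduction():.1%}")  # per-year reduction: 20.6%
# Hypothetical 50% starting error, followed across a 15-year span.
for year in (0, 5, 10, 15):
    print(year, f"{error_after(0.50, year):.3%}")
```

Changing `years_per_tenfold` shows how sensitive any extrapolation is to the assumed time constant.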

  16. Charles Wayne’s Challenge: Demonstrate Consistent Progress Over Time • Managing Expectations • Controversial in the 1980s • But not in the 1990s • Though, grumbling • Benefits • Agreement on what to do • Limits endless discussion • Helps sell the field • Manage expectations • Fund raising • Risks (similar to benefits) • All our eggs are in one basket (lack of diversity) • Not enough discussion • Hard to change course • Methodology → Burden

  17. Hockey Stick Business Case • [Figure: Last Year, This Year, Next Year]

  18. Where have we been and where are we going? • Consistent Progress over Time • Manage Expectations • Extrapolation/Prediction is Applicable • Extrapolation/Prediction is Not Applicable

  19. When will we see the last non-statistical paper? 2010?

  20. Top Ten Metrics of Success • Search • Value Creation (Reality) • Stock Prices (Belief) • Startup Companies Raise Venture Capital (Excitement) • Prototype Applications (Plausibility) • Grand-Students (Survive the Test of Time) • Students Get Good Jobs • Students Finish PhD Theses • Citations • Conference Registrations • Publications (Quantity) • Speech • Senseval wants to be here • We are here

  21. Outline • We’re making consistent progress, or • We’re running around in circles, or • Don’t worry; be happy • We’re going off a cliff… • Best of Times (Not!) • Been there; Done that

  22. Progress (or Oscillating Fads)? • It has been claimed that recent progress was made possible by Empiricism • 1950s: Empiricism was at its peak • Dominating a broad set of fields • Ranging from psychology (Behaviorism) • To electrical engineering (Information Theory) • Psycholinguistics: Word frequency norms (correlated with reaction time, errors) • Word association norms (priming): bread and butter, doctor / nurse • Linguistics/psycholinguistics: focus on distribution (correlate of meaning) • Firth: “You shall know a word by the company it keeps” • Collocations: Strong tea v. powerful computers • 1970s: Rationalism was at its peak • with Chomsky’s criticism of n-grams in Syntactic Structures (1957) • and Minsky and Papert’s criticism of neural networks in Perceptrons (1969). • 1990s: Revival of Empiricism • Availability of massive amounts of data (popular argument, even before the web) • “More data is better data” • Quantity >> Quality (balance) • Pragmatic focus: • What can we do with all this data? • Better to do something than nothing at all • Empirical methods (and focus on evaluation): Speech → Language • 2010s: Revival of Rationalism (?)

  25. Progress (or Oscillating Fads)? • It has been claimed that recent progress was made possible by Empiricism • Periodic signals are continuous • Support extrapolation/prediction • Progress? Consistent progress? • 1950s: Empiricism was at its peak • Dominating a broad set of fields • Ranging from psychology (Behaviorism) • To electrical engineering (Information Theory) • Psycholinguistics: Word frequency norms (correlated with reaction time, errors) • Word association norms (priming): bread and butter, doctor / nurse • Linguistics/psycholinguistics: focus on distribution (correlate of meaning) • Firth: “You shall know a word by the company it keeps” • Collocations: Strong tea v. powerful computers • 1970s: Rationalism was at its peak • with Chomsky’s criticism of n-grams in Syntactic Structures (1957) • and Minsky and Papert’s criticism of neural networks in Perceptrons (1969). • 1990s: Revival of Empiricism • Availability of massive amounts of data (popular argument, even before the web) • “More data is better data” • Quantity >> Quality (balance) • Pragmatic focus: • What can we do with all this data? • Better to do something than nothing at all • Empirical methods (and focus on evaluation): Speech → Language • 2010s: Revival of Rationalism (?) • Consistent progress? • Extrapolation/Prediction: Applicable?

  26. Speech → Language: Has the pendulum swung too far? • What happened between TMI-1992 and TMI-2002 (if anything)? • Have empirical methods become too popular? • Has too much happened since TMI-1992? • I worry that the pendulum has swung so far that • We are no longer training students for the possibility • that the pendulum might swing the other way • We ought to be preparing students with a broad education including: • Statistics and Machine Learning • as well as Linguistic Theory • History repeats itself: Mark Twain; bad idea then and still a bad idea now • 1950s: empiricism • 1970s: rationalism (empiricist methodology became too burdensome) • 1990s: empiricism • 2010s: rationalism (empiricist methodology is burdensome, again)

  27. Speech → Language: Has the pendulum swung too far? • What happened between TMI-1992 and TMI-2002 (if anything)? • Have empirical methods become too popular? • Has too much happened since TMI-1992? • I worry that the pendulum has swung so far that • We are no longer training students for the possibility • that the pendulum might swing the other way • We ought to be preparing students with a broad education including: • Statistics and Machine Learning • as well as Linguistic Theory • History repeats itself: Mark Twain; bad idea then and still a bad idea now • 1950s: empiricism • 1970s: rationalism (empiricist methodology became too burdensome) • 1990s: empiricism • 2010s: rationalism (empiricist methodology is burdensome, again) • Plays well at Machine Translation conferences

  29. Speech → Language: Has the pendulum swung too far? • What happened between TMI-1992 and TMI-2002 (if anything)? • Have empirical methods become too popular? • Has too much happened since TMI-1992? • I worry that the pendulum has swung so far that • We are no longer training students for the possibility • that the pendulum might swing the other way • We ought to be preparing students with a broad education including: • Statistics and Machine Learning • as well as Linguistic Theory • History repeats itself: • 1950s: empiricism • 1970s: rationalism (empiricist methodology became too burdensome) • 1990s: empiricism • 2010s: rationalism (empiricist methodology is burdensome, again) • Plays well at Machine Translation conferences • Grandparents and grandchildren have a natural alliance…

  31. Covering all the Bases • It is hard to make predictions (especially about the future) • When will we see the last non-statistical paper? • 2010? • Revival of rationalism: • 2010? • The answer to any question: 6 years!

  32. Outline • We’re making consistent progress, or • We’re running around in circles, or • Don’t worry; be happy • We’re going off a cliff… • Rising tide of data lifts all boats • No matter what happens, it’s goin’ be great!

  33. Rising Tide of Data Lifts All Boats • If you have a lot of data, then you don’t need a lot of methodology • 1985: “There is no data like more data” • Fighting words uttered by radical fringe elements (Mercer at Arden House) • 1993 Workshop on Very Large Corpora • Perfect timing: Just before the web • Couldn’t help but succeed • Fate • 1995: The Web changes everything • All you need is data (magic sauce) • No linguistics • No artificial intelligence (representation) • No machine learning • No statistics • No error analysis

  34. “It never pays to think until you’ve run out of data” – Eric Brill • Moore’s Law Constant: Data Collection Rates → Improvement Rates • Banko & Brill: Mitigating the Paucity-of-Data Problem (HLT 2001) • No consistently best learner • More data is better data! • Quoted out of context • Fire everybody and spend the money on data

  35. Borrowed Slide: Jelinek (LREC) • Benefit of Data • LIMSI: Lamel (2002), Broadcast News • [Figure: WER vs. hours] • Supervised: transcripts • Lightly supervised: closed captions

  36. The rising tide of data will lift all boats! • TREC Question Answering & Google: What is the highest point on Earth?

  37. The rising tide of data will lift all boats! • Acquiring Lexical Resources from Data: Dictionaries, Ontologies, WordNets, Language Models, etc. • http://labs1.google.com/sets

  38. Rising Tide of Data Lifts All Boats • If you have a lot of data, then you don’t need a lot of methodology • More data → better results • TREC Question Answering • Remarkable performance: Google and not much else • Norvig (ACL-02) • AskMSR (SIGIR-02) • Lexical Acquisition • Google Sets • We tried similar things • but with tiny corpora • which we called large

  39. Applications • Don’t worry; Be happy • What good is word sense disambiguation (WSD)? • Information Retrieval (IR) • Salton: Tried hard to find ways to use NLP to help IR • but failed to find much (if anything) • Croft: WSD doesn’t help because IR is already using those methods • Sanderson (next two slides) • Machine Translation (MT) • Original motivation for much of the work on WSD • But IR arguments may apply just as well to MT • What good is POS tagging? Parsing? NLP? Speech? • Commercial Applications of Natural Language Processing, CACM 1995 • $100M opportunity (worthy of government/industry’s attention) • Search (Lexis-Nexis) • Word Processing (Microsoft) • Warning: premature commercialization is risky • 5 Ian Andersons • ALPAC

  40. Sanderson (SIGIR-94) http://dis.shef.ac.uk/mark/cv/publications/papers/my_papers/SIGIR94.pdf • Not much? • Could WSD help IR? • Answer: no • Introducing ambiguity by pseudo-words doesn’t hurt (much) • [Figure: F vs. query length (words); 5 Ian Andersons] • Short queries matter most, but are hardest for WSD
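
Pseudo-words, the device behind the result on this slide, create artificial ambiguity with a known answer: every occurrence of two unrelated words is replaced by a single merged token, and the original word is kept as the gold "sense." The sketch below shows only that corpus-rewriting step; the function name, the slash-joined token format and the example word pair are assumptions for illustration, not Sanderson's exact setup.

```python
def make_pseudowords(tokens, pairs):
    """Replace each word belonging to a pseudo-word pair with the joined token.

    Returns the rewritten tokens and a {position: original_word} map; the
    original word serves as the gold 'sense' when scoring a disambiguator.
    """
    lookup = {}
    for a, b in pairs:
        joined = f"{a}/{b}"  # hypothetical format for the merged token
        lookup[a] = joined
        lookup[b] = joined
    rewritten, gold = [], {}
    for i, tok in enumerate(tokens):
        if tok in lookup:
            rewritten.append(lookup[tok])
            gold[i] = tok
        else:
            rewritten.append(tok)
    return rewritten, gold


tokens = "the banana was ripe but the door was locked".split()
rewritten, gold = make_pseudowords(tokens, [("banana", "door")])
print(" ".join(rewritten))
print(gold)
```

Because the gold labels come for free, one can measure retrieval with the artificial ambiguity left in, resolved perfectly, or resolved at a chosen error rate.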

  41. Sanderson (SIGIR-94) http://dis.shef.ac.uk/mark/cv/publications/papers/my_papers/SIGIR94.pdf • Resolving ambiguity badly is worse than not resolving it at all • 75% accurate WSD degrades performance • 90% accurate WSD: breakeven point • Soft WSD? • [Figure: F vs. query length (words)]

  42. Senseval++: Some Promising Suggestions (Generate lots of conference papers, but may not support the field) • Two Languages are Better than One • For many classic hard NLP problems: Word Sense Disambiguation (WSD), PP-attachment, Conjunction, Predicate-argument relationships, Japanese and Chinese word breaking • Parallel corpora → plenty of annotated (labeled) testing and training data • Don’t need unsupervised magic (data >> magic) • Demonstrate that NLP is good for something • Statistical methods (IR & WSD) focus on bags of nouns, ignoring verbs, adjectives, predicates, intensifiers, etc. • Hypothesis: Ignored because perceptrons can’t model XOR • Task: classify “comments” into “good,” “bad” and “neutral” • Lots of terms associated with just one category • Some associated with two, depending on argument • Good & Bad, but not neutral: Mickey Mouse, Rinky Dink • Bad: Mickey Mouse(us) • Good: Mickey Mouse(them) • Current IR/WSD methods don’t capture predicate-argument relationships • An example of Error Analysis/Representation
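
The XOR hypothesis on this slide can be checked with a textbook perceptron. The sketch below assumes two binary features, x1 = "the target is us" and x2 = "the term is derogatory (e.g. Mickey Mouse)", and further assumes, for illustration only, that praise of a competitor also counts as bad for us; under that encoding "good for us" is exactly x1 XOR x2. A linear perceptron cannot converge on those four cases, while adding the conjunction feature x1·x2 (a crude, hypothetical stand-in for a predicate-argument feature) makes them separable. The feature encoding and function names are assumptions.

```python
def train_perceptron(data, epochs=50):
    """Classic perceptron; labels are +1/-1. Returns (weights, bias, converged)."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified, or exactly on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            return w, b, True
    return w, b, False


def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical encoding: x1 = target is "us", x2 = term is derogatory.
# +1 = good for us, -1 = bad for us; the label is x1 XOR x2.
xor_data = [
    ((1, 0), +1),  # praise of us
    ((0, 1), +1),  # Mickey Mouse(them)
    ((1, 1), -1),  # Mickey Mouse(us)
    ((0, 0), -1),  # praise of them (assumed bad for us)
]
_, _, ok = train_perceptron(xor_data)
print("features alone converge:", ok)  # XOR is not linearly separable

# Add the conjunction x1*x2, standing in for a predicate-argument feature.
conj_data = [((x1, x2, x1 * x2), y) for (x1, x2), y in xor_data]
w, b, ok = train_perceptron(conj_data)
print("with conjunction converges:", ok)
print([predict(w, b, x) for x, _ in conj_data])
```

The point is representational: the learner is unchanged, and only the feature that ties the term to its argument makes the distinction learnable.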

  43. Supervision >> Magic > Baseline • http://www.sle.sharp.co.uk/senseval2/Results/all_graphs.xls • [Figure: Supervision, Baseline, Magic] • Bragging Rights

  44. Breakdown by Systems & Words • Spelling correction task • Golding & Schabes (1996) • Some methods work better on some words • and other methods work better on other words • Should break down Senseval results by both systems and words • Discover opportunities for hybrids across systems • Error analysis • POS distinctions (easy) • Local context (trigrams) • Larger contexts (IR)
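
The per-word breakdown suggests a simple hybrid: for each word (or confusion set), route to whichever system scored best on it. The sketch below assumes a table of per-system, per-word accuracies; the system names, confusion sets and numbers are invented. To avoid an optimistic estimate, the routing choice should be made on held-out data rather than on the test set whose scores are then reported.

```python
def per_word_hybrid(scores):
    """scores[system][word] = held-out accuracy. Returns {word: best system}."""
    words = sorted({w for by_word in scores.values() for w in by_word})
    return {
        word: max(scores, key=lambda s: scores[s].get(word, float("-inf")))
        for word in words
    }


def mean_accuracy(by_word):
    return sum(by_word.values()) / len(by_word)


# Invented held-out accuracies for two hypothetical systems.
scores = {
    "local_trigram": {"peace/piece": 0.95, "desert/dessert": 0.70},
    "wide_context": {"peace/piece": 0.80, "desert/dessert": 0.90},
}

choice = per_word_hybrid(scores)
hybrid = {word: scores[system][word] for word, system in choice.items()}

for system, by_word in scores.items():
    print(f"{system:14} {mean_accuracy(by_word):.3f}")
print(f"{'hybrid':14} {mean_accuracy(hybrid):.3f}  {choice}")
```

With these made-up numbers neither system wins everywhere, so the per-word router beats both; a single aggregate score per system would have hidden that opportunity.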

  45. Goals of Shared Evaluations • Marketing & Sales • Scores going up and up → Funding goes up and up • Rising tide lifts all boats • Shared learnings • Compare and contrast • What works and what doesn’t? • Error analysis • Benchmarking: • How hard are various problems? • What makes problems easier or harder? • Rate of progress? • Not bragging rights: • Mirror, mirror on the wall, who’s the smartest of them all…

  46. Outline • We’re making consistent progress, or • We’re running around in circles, or • Don’t worry; be happy • We’re going off a cliff… • According to unnamed sources: Speech Winter → Language Winter • Dot Boom → Dot Bust

  47. Kuhn Crisis: Early Warning Signs for the Future • Senseval feels the need to demonstrate applications of their stuff (and maybe there aren’t any) • Complacency (don’t worry; be happy) • Too little dissent: students aren’t rebelling against their teachers • I get uncomfortable when • There is so much agreement on what to do and so much optimism • And so few worries and so little dissent/controversy. • Mindless Metrics • Whatever you measure, you get… • Scores go up and up and up, but are we really doing better? • According to the scores, parsing is doing well without words, • But you can’t solve classic problems (PPs) without words! • Burdensome Methodology → Exclusiveness • Can’t play (in speech) unless you work in a big lab • Following Speech off a Cliff • Empirical methods: Speech → Language • Speech Winter → Language Winter (Dot Boom → Dot Bust) • What goes up, (usually) comes down… • Campbell (ACL-04): Rules >> ML • Been great, but…

  50. Sample of 20 Survey Questions (Strong Emphasis on Applications) • When will • More than 50% of new PCs have dictation on them, either at purchase or shortly after. • Most telephone Interactive Voice Response (IVR) systems accept speech input. • Automatic airline reservation by voice over the telephone is the norm. • TV closed-captioning (subtitling) is automatic and pervasive. • Telephones are answered by an intelligent answering machine that converses with the calling party to determine the nature and priority of the call. • Public proceedings (e.g., courts, public inquiries, parliament, etc.) are transcribed automatically. • Two surveys of ASRU attendees: 1997 & 2003
