What’s in store for question-answering? Prognostications based on corpus analysis of several hundred million questions

John B. Lowe
Vice President for Language Engineering and Chief Linguist
Ask Jeeves, Inc., Emeryville, CA
October 7, 2000
Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, Hong Kong

Overview • “Take-home” messages when considering the Q-A task: • Make sure you understand the question • Know what constitutes an “answer” • Robustness, Robustness, Robustness • Some anecdotes and a few statistics • Query types (keywords, questions, stories, etc.) • From both the consumer side of AJ (i.e. ask.com) and the corporate side • Prognostications • The best systems will be hybrids • The knowledge cliff is as tall as ever, if not taller, and it will be some time before it is climbed. Set expectations accordingly!

Another View of an Overview • This presentation contains: • 12 actual or nearly actual user queries (UQs); indeed, all queries cited are real, unless cited from the literature or specifically marked • 7 rhetorical questions • 5 summary statistics covering a subcorpus of approximately 1B UQs

Definitions: What is an answer? (listed in order of increasing length and complexity) • Short, coherent, responsive snippet of text • Result of a Computation or Deduction • The Trace of the process of arriving at a result • Longer snippet of text (a Passage) • Reference to a document or part of a document • Summary or extract from a document • Document • Set of documents • Audio, video, etc. • Some combination of the above

Definitions: What is a query? (listed in order of increasing length and complexity) • One or more Keywords • Keywords with Boolean Operators or additional user-supplied structure • [numeric or other Parametric Values set via UI] • Phrases (keywords with linguistic coherence) • (Grammatical) Sentences with interrogative or imperative syntax • Short Discourses, usually concluded with a question • Audio, video, etc. • Some combination of the above

Question-Answering vs. IR • Classically, question-answering systems provided answers in response to questions • In contrast, IR systems provided documents in response to queries, normally composed of keywords Corollaries: • Providing documents in response to questions is not question-answering • Providing answers in response to queries is not IR However, “the world is not black and white. More like black and grey” – Graham Greene

Question-Answering vs. IR [Diagram: four configurations, labeled “Classical” Question-Answering, “Classical” Information Retrieval, TREC-8-like, and Hybrid. Each maps an input (Question or Keywords) through a Q-A System or an IR System to an output (Answer(s) or Document(s)).]

Query types arranged on a “Difficulty Scale” (listed in order of increasing difficulty) • Keywords and “Keywords Plus” • Short, factual (TREC-8-like) Questions • “Hard” Lookup Questions • Questions that look hard but aren’t • Questions that look hard and are • “Story Problems” – Two Flavors • These will usually be all jumbled together!

An Anecdotal Analysis of the Question-Answering Task, Based on User Behavior and Expectations, as Reflected by What They Ask NB: Most of these UQs are from ask.com, the open-domain consumer-oriented web site. Some UQs from corporate implementations have been modified to protect the anonymity of customers

Users may not be trainable • Users often expect the system to derive or otherwise obtain appropriate context • At minimum, POS, WSD and other basic linguistic and semantic distinctions are expected. • Users may attempt to provide such context if they feel it is important or unavailable to the system • In which case, watch out! • Users may evaluate the system to determine how best to provide input (and context) • Often this is done by “experimental input” • Muddies the user log and challenges adaptive approaches

Intentions of users are complex NB: this data is for ask.com!

Short, factual questions • Where is Greenwich, CT? (Lehnert 1982) • 42N, 80W • About 90 miles north of New York City • … • Where is the Taj Mahal? (TREC-8) • Atlantic City, New Jersey, USA • Agra, Uttar Pradesh, India • NB: answer reflects cultural bias of corpus from which answer was obtained • What is the meaning of life? • For Ask Jeeves, it is (found in) a URL

Answers to “Hard” Lookup Questions “Can my gynacologist [sic] tell my parents if I’m pregnant?” • Answers can be very, very short! • In this case, however, though the length of the answer is a single bit, an authoritative document is probably the best response

“Hard” Lookup questions (cont’d) “Can my gynacologist [sic] tell my parents if I’m pregnant?” • Sometimes spelling errors reflect language competence (and therefore indirectly age) • Sex of asker is (nominally) clear. This bit of personalization is of course based on real world knowledge • Use of “if” rather than “that” prevents presupposing the asker is pregnant • Utility of the answer is very different depending on whether the presupposition is true or not!

Another “Hard” Lookup Question • What did Tom Hanks_i say to Private Ryan as he_i was dying? • “Answer” is a 34-second snippet which occurs at about 2:36:00 out of 2:48:00 total duration • Soundtrack is complex at this moment; hard to pick out even for native speakers, but the utterance seems to be: “earn this … earn it”

How do we “get the answer”? • Assumptions: • We have the film and permission to use it. • We have time-aligned markup of text and video • We have the tools to handle such multimedia access • All of these [technical] issues are still a challenge… • …But the really tough part is still the relationship between the language and the real world (i.e. primarily linguistics) • Does the markup indicate the states of people -- alive, dead, or in between (i.e. “dying”)? • More importantly, interpreting the question seems to require Mental Spaces (Fauconnier 1985, 1988, &c).

Mental Spaces Required? [Diagram: Real World (Tom Hanks, Matt Damon) alongside Movie Space (Capt. Miller, Private Ryan)] In reality, Tom Hanks never said anything to Private Ryan. Conclusion: while IR may bring one within striking distance of the “answer”, high-level NLU is potentially required to determine whether you really got the right one.

Having said all that… Purely serendipitously, there is an IR solution to this PARTICULAR question (using the “encyclopedic” aspect of the web): Some search engines retrieve discussions about this apparently important moment (which in some ways is the climax of the movie)

Some “hard” questions are easy • Sometimes, Big Differences are not important [ the following two sentences have quite different syntax, but share most of their answers in common. But note: both cannot be answered “Oh, not far!” ] English: What is the distance from Tokyo to Yokohama? How far is it from Tokyo to Yokohama? Japanese: 東京と横浜の間の距離は、どのぐらいですか？ toukyou to yokohama no aida no kyori ha, dono gurai desu ka? 東京から横浜までは､どのぐらい離れていますか？ toukyou kara yokohama made ha, dono gurai hanarete imasu ka?

Some “easy” questions are hard • Small Differences may be important • “Books by kids” • “Books for kids” [these differ only by “stop words”] • “Books for under $20” • “Books about kids” (Riloff et al. (1994), Pustejovsky, Lexeme, NPR 6/2000) In this case the “stop words” are critical
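
A minimal sketch of why the stop words matter here, assuming a keyword pipeline that discards a fixed stop word list before matching. The list `STOPWORDS` and the function `strip_stopwords` are hypothetical names introduced for illustration; they are not taken from Ask Jeeves or any particular IR system.

```python
# Hypothetical stop word list for illustration only.
STOPWORDS = {"by", "for", "about", "under", "the", "a", "of"}

def strip_stopwords(query):
    """Lowercase the query and drop every token found in STOPWORDS."""
    return [tok for tok in query.lower().split() if tok not in STOPWORDS]

queries = ["Books by kids", "Books for kids", "Books about kids", "Books for under $20"]
reduced = {q: strip_stopwords(q) for q in queries}

for q, toks in reduced.items():
    print(f"{q!r} -> {toks}")
```

Under these assumptions, the three "kids" queries reduce to the same two tokens, so the system can no longer tell author, audience, and topic apart. The price query loses the words that mark "$20" as an upper bound.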

Story Problem #1 (“conventional”) • After Bobrow 1967 (and Dreyfus 1972, 1992) • NB: requires you to “Show your work”! (i.e. display trace) “Elizabeth, Brian, Dean and Leslie want to cross a bridge. They all begin on the same side and have only 17 minutes to get everyone across to the other side. It is night and there is only one flashlight. A max of two people can cross at one time. Any party who crosses, either 1 or 2 people, must have the flashlight with them. The flashlight must be walked back and forth; it cannot be thrown. Each student walks at a different speed - Elizabeth 1 minute, Brian 2 minutes, Dean, 5 minutes, and Leslie 10 minutes. A pair must walk together at the rate of the slower students pace. How can they get everyone across in 17 minutes?”
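
Once the story has been understood, the computation itself is small. The sketch below assumes the problem has already been translated by hand into crossing times, and uses a uniform-cost search over states to produce both the result and the trace. The names `TIMES` and `solve` are illustrative. The hard part, getting from the English text to this representation, is not addressed.

```python
import heapq
from itertools import combinations

# Crossing times in minutes, as stated in the story problem.
TIMES = {"Elizabeth": 1, "Brian": 2, "Dean": 5, "Leslie": 10}

def solve(times):
    """Uniform-cost search. A state is (people on the near side, flashlight on near side)."""
    everyone = frozenset(times)
    counter = 0  # unique tie-breaker so the heap never has to compare states
    heap = [(0, counter, (everyone, True), [])]
    best = {}
    while heap:
        cost, _, (near, light_near), trace = heapq.heappop(heap)
        if not near:
            return cost, trace  # everyone is across
        if best.get((near, light_near), float("inf")) <= cost:
            continue  # this state was already expanded at equal or lower cost
        best[(near, light_near)] = cost
        side = near if light_near else everyone - near
        for k in (1, 2):  # one or two people walk with the flashlight
            for group in combinations(sorted(side), k):
                step = max(times[p] for p in group)  # a pair moves at the slower pace
                new_near = near - set(group) if light_near else near | set(group)
                direction = "across" if light_near else "back"
                counter += 1
                heapq.heappush(
                    heap,
                    (cost + step, counter, (new_near, not light_near),
                     trace + [(direction, group, step)]),
                )
    return None

total, trace = solve(TIMES)
print("total minutes:", total)
for direction, group, step in trace:
    print(f"{' and '.join(group)} walk {direction} ({step} min)")
```

The search finds a five-trip plan totalling 17 minutes, and the printed trace is the "show your work" part of the answer.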

Story Problem #2, (“a user’s lament”) • Typically, a customer support problem • Often, these are not really questions… • “PLEASE HELP ME! I don't know who to ask. I want to mail merge a specific category from [address book] in [email program] and all I can figure out how to do is merge the entire […] mailing list. If you can't help me, please tell me who can. Thank you”

Story Problem #3, (“I’ve almost got it…”) • Sentence punctuation is poor • Identification and tokenization of NEs is a challenge I cant install Age of Empires now that I 've upgraded to win98 from Win 95 computer says I have 1GB of hard drive space but the installation failed after taking 30 minutes with the words not enough hard drive space. Should I update the drive to FAT 32 and try again?

Story Problem #4, (“share my misery”) “I have SuperOS 1776 and a Hogwarts Color 999 printer. I had to reformat my computer and now I haven't been able to find a driver to reinstall the printer.. I've only found a driver for SuperOS 1492 and 1789. I tried it anyway, and big surprise, it didn't work.” • Deep NLP is going to have some trouble here (even people do!) • Would IR work better?

Story Problem #5, (“a poem”) i want to customize my mouse and keyboard by having mouse in 3-dimensional and having mouse trails i also want to slow down the cursor blink rate am also having trouble ith the left mouse butoon as i am left handed which steps do i take to change these things • The e e cummings approach to keyboard entry… • The point: tremendous stylistic variation exists in typography, orthography, conceptualization, and so on.

Statistical Properties of AJ UQs

Building REs from NEs • …for “Britney Spears” • Br.*n.*y +Sp.*r.*s • How to build these “from scratch”? • But see Brill et al. in this workshop for a solution!
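
A small sketch of how the pattern above behaves, using Python's standard `re` module. The case-insensitive flag and the candidate spellings are assumptions made for illustration; the spellings are invented and not drawn from the actual query log.

```python
import re

# The pattern from the slide; compiling it case-insensitively is an assumption.
BRITNEY = re.compile(r"Br.*n.*y +Sp.*r.*s", re.IGNORECASE)

# Invented spellings for illustration only.
candidates = [
    "Britney Spears",
    "brittany spears",
    "Brintey Speers",
    "britny  spers",
    "Bob Spears",
    "Brandy Sprinklers",
]

# fullmatch requires the whole string to fit the pattern.
matches = [c for c in candidates if BRITNEY.fullmatch(c)]
print(matches)
```

The loose `.*` gaps absorb the misspellings, but they also let through unrelated strings such as "Brandy Sprinklers", which is one reason hand-built patterns of this kind are hard to get right.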

Distribution of query lengths

Zipf’s Law applies to user queries
• Rank-frequency distribution of UQs with 3600 > f > 1100 (for a day or so)

…
3524 where can i browse lyrics?
2713 where can i find online airfare specials?
2532 is jeeves gay?
2216 how can i find someone?
2190 where can i find a reverse phone directory?
2120 where can i find information on captech?
1980 (filtered)
1934 (filtered)
1852 where can i find erotica from white shadows?
1787 where can i get driving directions between cities?
1585 am i in love?
1567 how do i make a web page?
1544 cars
1520 where can i find a reverse email directory?
1485 how do i use the internet to find a job?
1420 (filtered)
1336 where can i find the lyrics to songs by eminem?
1323 where can i listen to music online?
1318 where can i find pictures of the latest hairstyles?
1313 where can i find a metric conversion table?
1278 where can i find arcade games online?
…
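
A rank-frequency table like this one can be built from a raw query log in a few lines. The sketch below is illustrative only: `rank_frequency` and `zipf_slope` are hypothetical helper names, and the toy log reuses query strings from the slide with made-up counts chosen so that frequency is exactly 12 divided by rank. It does not reproduce the actual Ask Jeeves counts, whose true ranks are not given in the excerpt.

```python
import math
from collections import Counter

def rank_frequency(queries):
    """Return (rank, frequency, query) triples, most frequent first."""
    counts = Counter(q.strip().lower() for q in queries)
    return [(rank, freq, q)
            for rank, (q, freq) in enumerate(counts.most_common(), start=1)]

def zipf_slope(table):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank, _, _ in table]
    ys = [math.log(freq) for _, freq, _ in table]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator

# Toy log with invented counts: frequency = 12 / rank.
toy_log = (["cars"] * 12 + ["am i in love?"] * 6
           + ["where can i browse lyrics?"] * 4
           + ["how do i make a web page?"] * 3)

table = rank_frequency(toy_log)
slope = zipf_slope(table)
print(table)
print("log-log slope:", round(slope, 3))
```

A slope near -1 on the log-log plot is the classic Zipf signature. On a real log the fit would be noisier, and the normalization step (here just lowercasing and trimming) strongly affects which queries count as "the same".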

Conclusions

Sobering Insights (or Nothing New)? • The bar has been considerably raised! “Communication is not accomplished by the exchange of symbolic expressions. Communication is, rather, the successful interpretation by an addressee of a speaker’s intent in performing a linguistic act.” (Green 1996) • Hybrid approaches will be de rigueur for many practical applications; need to work on combining outputs from: • Search engines • QA systems • Other inferencing engines (decision tree, CBR, etc.) • We must make friends with our users (and provide cognitively appealing UIs) • “If you ask the same question, you get the same answer!” (a distinctly unhuman behavior, e.g. “Where?”) • Knowing when you don’t know: understanding failure modes of the system and communicating this to the user

Thank You!
