Talk Schedule Question Answering from Email Bryan Klimt July 28, 2005
Project Goals • To build a practical working question answering system for personal email • To learn about the technologies that go into QA (IR, IE, NLP, MT) • To discover which techniques work best and when
Dataset • 18 months of email (Sept 2003 to Feb 2005) • 4799 emails in total • 196 are talk announcements • hand-labelled and annotated • 478 questions and answers
A new email arrives… • Is it a talk announcement? • If so, we should index it.
Email Classifier • [Diagram: email data feeds two logistic regression classifiers, whose outputs are combined ("Combo") to produce the classification decision]
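A rough sketch of this classification step (an illustration, not the project's actual code): two logistic regression models are trained on different feature views of the email text and their probabilities are averaged for the final decision. scikit-learn and the averaging "combo" scheme are assumptions here.

# Illustrative only: the slides' "combo" of two logistic regression models
# is approximated by averaging predicted probabilities (scikit-learn assumed).
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def train_combo_classifier(emails, labels):
    # Two feature views of the same email data (labels: 1 = talk announcement, 0 = other)
    bow = CountVectorizer(stop_words="english")
    tfidf = TfidfVectorizer(stop_words="english")
    clf_bow = LogisticRegression(max_iter=1000).fit(bow.fit_transform(emails), labels)
    clf_tfidf = LogisticRegression(max_iter=1000).fit(tfidf.fit_transform(emails), labels)

    def is_talk_announcement(email_text):
        p1 = clf_bow.predict_proba(bow.transform([email_text]))[0, 1]
        p2 = clf_tfidf.predict_proba(tfidf.transform([email_text]))[0, 1]
        return (p1 + p2) / 2 >= 0.5   # combined decision

    return is_talk_announcement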
Classification Performance • precision = 0.81 • recall = 0.66 • (previous work reported better performance) • Top features (stemmed tokens): • abstract, bio, speaker, copeta, multicast, esm, donut, talk, seminar, cmtv, broadcast, speech, distinguish, ph, lectur, ieee, approach, translat, professor, award
Annotator • Use Information Extraction techniques to identify certain types of data in the emails • speaker names and affiliations • dates and times • locations • lecture series and titles
Rule-based Annotator • Combine regular expressions and dictionary lookups • defSpanType date =: ...[re('^\d\d?$') ai(dayEnd)? ai(month)]...; • matches “23rd September”
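The rule above appears to be written in Minorthird's Mixup language. A rough Python analogue (an assumption for illustration, not the actual annotator) is:

import re

# Match a 1-2 digit day, an optional ordinal suffix, and a month name,
# mirroring the date rule above (illustrative only).
MONTH = r"(January|February|March|April|May|June|July|August|September|October|November|December)"
DATE_RE = re.compile(r"\b(\d{1,2})(st|nd|rd|th)?\s+" + MONTH + r"\b", re.IGNORECASE)

def find_dates(text):
    # Return (start, end, matched_text) spans for date mentions
    return [(m.start(), m.end(), m.group(0)) for m in DATE_RE.finditer(text)]

# find_dates("The talk is on 23rd September at 5:30pm")
#   -> [(15, 29, '23rd September')]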
Conditional Random Fields • Probabilistic framework for labelling sequential data • Known to outperform HMMs (by relaxing their independence assumptions) and MEMMs (by avoiding the "label bias" problem) • Allow for multiple output features at each node in the sequence
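A minimal sketch of CRF sequence labelling, using the sklearn-crfsuite package as a modern stand-in for whatever CRF toolkit the system actually used (an assumption), with BIO tags such as B-speaker / I-speaker / O:

# Each token becomes a feature dict; labels are BIO tags (e.g. B-speaker, I-speaker, O).
import sklearn_crfsuite

def token_features(tokens, i):
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_title": tok.istitle(),   # capitalised tokens often start names
        "is_digit": tok.isdigit(),   # digits often appear in dates and times
        "prev": tokens[i - 1].lower() if i > 0 else "<START>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>",
    }

def train_crf(token_seqs, label_seqs):
    # token_seqs: list of token lists; label_seqs: list of BIO tag lists
    X = [[token_features(seq, i) for i in range(len(seq))] for seq in token_seqs]
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
    crf.fit(X, label_seqs)
    return crf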
Rule-based vs. CRFs • Both results are much higher than in the previous study • For dates, times, and locations, rules are easy to write and perform extremely well • For names, titles, affiliations, and series, rules are very difficult to write, and CRFs are preferable
Template Filler • Creates a database record for each talk announced in the email • This database is used by the NLP answer extractor
Filled Template
Seminar {
  title = "Keyword Translation from English to Chinese for Multilingual QA"
  name = Frank Lin
  time = 5:30pm
  date = Thursday, Sept. 23
  location = 4513 Newell Simon Hall
  affiliation =
  series =
}
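A sketch of how such a record might be assembled from the annotators' output (field names follow the slide; the selection logic and confidence scores are assumptions):

# Collect the highest-confidence annotation of each type into one record
# per announcement email (illustrative; not the project's actual filler).
SLOTS = ["title", "name", "time", "date", "location", "affiliation", "series"]

def fill_template(annotations):
    # annotations: list of (slot, text, confidence) tuples from the annotators
    record = {slot: "" for slot in SLOTS}
    for slot, text, confidence in sorted(annotations, key=lambda a: a[2]):
        if slot in record:
            record[slot] = text   # higher-confidence values overwrite lower ones
    return record

# fill_template([("name", "Frank Lin", 0.9), ("time", "5:30pm", 0.8)])
#   -> {"title": "", "name": "Frank Lin", "time": "5:30pm", ...}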
Search Time • Now the email is indexed • The user can ask questions
IR Answer Extractor • Performs a traditional IR (TF-IDF) search using the question as a query • Determines the answer type from simple heuristics ("Where" -> LOCATION)
Where is Frank Lin’s talk?
0.5055 3451.txt
  search[468:473]: "frank"
  search[2025:2030]: "frank"
  search[474:477]: "lin"
0.1249 2547.txt
  search[580:583]: "lin"
0.0642 2535.txt
  search[2283:2286]: "lin"
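The two steps of this extractor, sketched with scikit-learn's TF-IDF as a stand-in for the actual search engine (the answer-type table below is illustrative):

# Rank indexed announcement emails against the question and guess the
# answer type from the question word (sketch only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

ANSWER_TYPE = {"where": "LOCATION", "when": "DATE_TIME", "who": "NAME"}

def ir_search(question, documents):
    vec = TfidfVectorizer(stop_words="english")
    doc_matrix = vec.fit_transform(documents)
    scores = cosine_similarity(vec.transform([question]), doc_matrix)[0]
    answer_type = ANSWER_TYPE.get(question.split()[0].lower(), "UNKNOWN")
    ranked = sorted(zip(scores, range(len(documents))), reverse=True)
    return answer_type, ranked   # e.g. ("LOCATION", [(0.5055, 12), (0.1249, 3), ...])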
NL Question Analyzer • Uses the Tomita Parser to fully parse questions and translate them into a structured query language • "Where is Frank Lin's talk?" • ((FIELD LOCATION) (FILTER (NAME "FRANK LIN")))
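A very rough stand-in for the parser-based analyzer, mapping a question to the (FIELD ...) (FILTER ...) form with simple patterns (the real analyzer fully parses the question; the possessive-name pattern here is an assumption):

import re

FIELD_BY_WH = {"where": "LOCATION", "when": "DATE", "who": "NAME", "what": "TITLE"}

def analyze_question(question):
    # Pick the field from the question word and look for a "<Name>'s" filter
    wh = question.split()[0].lower()
    field = FIELD_BY_WH.get(wh, "TITLE")
    m = re.search(r"([A-Z][a-z]+ [A-Z][a-z]+)'s", question)
    filters = [("NAME", m.group(1).upper())] if m else []
    return (("FIELD", field), ("FILTER", filters))

# analyze_question("Where is Frank Lin's talk?")
#   -> (("FIELD", "LOCATION"), ("FILTER", [("NAME", "FRANK LIN")]))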
NL Answer Extractor • Simply executes the structured query produced by the Question Analyzer • ((FIELD LOCATION) (FILTER (NAME "FRANK LIN"))) • select LOCATION from seminar_templates where NAME="FRANK LIN";
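The translation-and-execution step, sketched with Python's sqlite3 (the slides do not say what database backend was used; the file name is hypothetical, and the table and column names follow the example above):

import sqlite3

def answer(structured_query, db_path="seminars.db"):
    # structured_query as produced by the analyzer sketch above;
    # "seminars.db" is a hypothetical path for illustration
    (_, field), (_, filters) = structured_query
    where = " AND ".join(f"{col}=?" for col, _ in filters) or "1=1"
    sql = f"SELECT {field} FROM seminar_templates WHERE {where};"
    with sqlite3.connect(db_path) as conn:
        return [row[0] for row in conn.execute(sql, [val for _, val in filters])]

# answer((("FIELD", "LOCATION"), ("FILTER", [("NAME", "FRANK LIN")])))
#   -> e.g. ["4513 Newell Simon Hall"]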
Results • NL Answer Extractor -> 0.870 • IR Answer Extractor -> 0.755
Results • Both answer extractors have similar (good) performance • IR based extractor • easy to implement (1-2 days) • better on questions w/ titles and names • very bad on yes/no questions • NLP based extractor • more difficult to implement (4-5 days) • better on questions w/ dates and times
Examples • “Where is the lecture on dolphin language?” • NLP Answer Extractor: Fails to find any talk • IR Answer Extractor: Finds the correct talk • Actual Title: “Natural History and Communication of Spotted Dolphin, Stenella Frontalis, in the Bahamas” • “Who is speaking on September 10?” • NLP Extractor: Finds the correct record(s) • IR Extractor: Extracts the wrong answer • A talk “10 am, November 10” ranks higher than one on “Sept 10th”
Future Work • Add an annotation “feedback loop” for the classifier • Add a planner module to decide which answer extractor to apply to each individual question • Tune parameters for classifier and TF-IDF search engine • Integrate into a mail client!
Conclusions • Overall performance is good enough for the system to be helpful to end users • Both rule-based and automatic annotators should be used, but for different types of annotations • Both IR-based and NLP-based answer extractors should be used, but for different types of questions