

  1. Web Logs and Question Answering Richard Sutcliffe (1), Udo Kruschwitz (2), Thomas Mandl (3) 1 - University of Limerick, Ireland 2 - University of Essex, UK 3 - University of Hildesheim, Germany

  2. Outline • Question Answering (QA) • Query Log Analysis (QLA) • Characteristics of QA and QLA • QA & QLA: 8 Key Questions • Workshop Papers • Key Questions addressed by Papers • Conclusions

  3. Question Answering (QA) • A Question Answering (QA) system takes as input a short natural language question and a document collection and produces an exact answer to the question, taken from the collection • Origins go back to TREC-8 (Voorhees and Harman, 1999)
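A minimal sketch of this input/output contract (the function name, types, and keyword-matching heuristic below are illustrative assumptions, not taken from any system discussed at the workshop):

```python
from dataclasses import dataclass

@dataclass
class Answer:
    text: str    # the exact answer string
    doc_id: str  # document the answer was drawn from

def answer_question(question: str, collection: dict[str, str]) -> Answer:
    """Toy factoid QA: return the first sentence containing every content word
    of the question; real systems use far richer analysis than this."""
    keywords = {w.lower().strip("?.,") for w in question.split() if len(w) > 3}
    for doc_id, text in collection.items():
        for sentence in text.split("."):
            words = {w.lower().strip("?.,;") for w in sentence.split()}
            if keywords and keywords <= words:
                return Answer(text=sentence.strip(), doc_id=doc_id)
    return Answer(text="NIL", doc_id="")  # "NIL" is the TREC convention for "no answer found"
```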

  4. Query Log Analysis (QLA) • A Query Log is a record of a person’s internet search • Log comprises query plus related information • Query Log Analysis looks at Logs mainly in order to improve search engines • Early study was Spink and Saracevic (2000)
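As an illustration of "query plus related information", one log entry might be modelled as follows (the field names and the tab-separated format are assumptions; real search engine logs differ in detail):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class LogEntry:
    session_id: str              # anonymised id linking the queries of one session
    timestamp: datetime          # when the query was issued
    query: str                   # the query string as typed by the user
    clicked_rank: Optional[int]  # rank of the clicked result, if any (click-through)
    clicked_url: Optional[str]   # URL of the clicked result, if any

def parse_line(line: str) -> LogEntry:
    """Parse one tab-separated line, e.g.
    'a1b2c3\\t2009-06-15 10:31:02\\tcheap flights dublin\\t3\\thttp://example.com'."""
    sid, ts, query, rank, url = line.rstrip("\n").split("\t")
    return LogEntry(sid,
                    datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"),
                    query,
                    int(rank) if rank else None,
                    url or None)
```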

  5. Strengths & Weaknesses of QA • Following TREC, CLEF and NTCIR, we know how to build efficient monolingual factoid QA systems • However, the range of questions asked is extremely narrow • Also, the work is based on fixed document collections • Most evaluation is offline, using artificial queries • Real users and real information needs have been ignored • Thus, QA is not a solved problem

  6. Strengths & Weaknesses of QLA • Potentially there is a huge amount of data, increasing all the time • Queries entered are ‘naturally occurring’ because users do not know they are monitored! • On the other hand, huge data sets pose problems; manual analysis is infeasible, so machine learning and similar automatic techniques must be used • We must infer from behaviour what users were thinking, what they wanted and whether a search succeeded • Also, logs are mostly owned by search engine companies

  7. QA & QLA – 8 Key Questions • 1. Can the meaning of queries in logs be deduced? • 2. Can NLP techniques such as Named Entity Recognition be applied in QLA? • 3. Can QLA tell us new types of questions for QA research? • 4. Can queries within a session be interpreted as a dialogue with the user giving the questions and the system providing the answers?

  8. QA & QLA – 8 Key Questions Cont. • 5. What can logs from real QA systems like lexxe.com or questions from sites like answers.com tell us? • 6. Are QA logs different from IR logs? • 7. Can click-through data enable us to deduce new QA question types? • 8. What analysis could be done on logs from telephone QA systems (e.g. cinema booking)?

  9. Papers - 1 • Bernardi and Kirschner: From artificial questions to real user interaction logs • Real user logs vs. the artificial questions used at TREC etc. • Three sets (TREC, Bertomeu & BoB) analysed as dialogues • TREC differs significantly from BoB (query length, no. of anaphora) • Conclusion: future TREC-style evaluation should take these differences into account to make the task more realistic

  10. Papers - 2 • Leveling: QA evaluation queries vs. real world queries • Compares queries submitted to a search engine, to answers.com, and those used at TREC and CLEF (six sets) • Infers the QA question type of a bare IR query (keywords) and converts it back into a syntactically well-formed QA question • Conclusion: This process could be used to answer IR queries properly with a QA system
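A toy sketch of the keyword-to-question step, using hand-written cue words and templates (these rules are illustrative assumptions, not Leveling's actual method):

```python
# Map cue words in a bare keyword query to an expected answer type,
# then wrap the query in a question template a QA system could parse.
TYPE_CUES = {
    "person":   {"president", "ceo", "author", "inventor", "who"},
    "location": {"capital", "country", "city", "where"},
    "date":     {"year", "born", "founded", "when"},
}

TEMPLATES = {
    "person":     "Who is {q}?",
    "location":   "Where is {q}?",
    "date":       "When was {q}?",
    "definition": "What is {q}?",   # fallback type
}

def keywords_to_question(query: str) -> str:
    words = set(query.lower().split())
    qtype = next((t for t, cues in TYPE_CUES.items() if words & cues), "definition")
    return TEMPLATES[qtype].format(q=query)

# keywords_to_question("capital latvia")  ->  "Where is capital latvia?"
```

The reconstructed question is crude, but it is enough to route the query to the question-type-specific machinery of a QA system.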

  11. Papers - 3 • Zhu et al.: Question Answering based on Community QA • Considers whether Q-A pairs from Yahoo Answers can be used as a log-like resource to improve QA • Given an input query, similar queries are identified in the logs. Sentences from the answers to these are selected by a summarisation algorithm to use as the response to the query
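A rough sketch of that pipeline, with simple word-overlap scoring standing in for the paper's summarisation algorithm (the function names and the Jaccard measure are assumptions):

```python
def similarity(a: str, b: str) -> float:
    """Jaccard overlap between word sets, as a stand-in similarity measure."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def answer_from_community_qa(query: str, qa_pairs: list[tuple[str, str]], k: int = 3) -> str:
    """qa_pairs are (question, answer) pairs harvested from a community QA site."""
    # 1. retrieve the k stored questions most similar to the input query
    similar = sorted(qa_pairs, key=lambda p: similarity(query, p[0]), reverse=True)[:k]
    # 2. pool the sentences of their answers and keep the best-matching ones
    sentences = [s.strip() for _, ans in similar for s in ans.split(".") if s.strip()]
    best = sorted(sentences, key=lambda s: similarity(query, s), reverse=True)[:2]
    return " ".join(best)
```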

  12. Papers - 4 • Momtazi and Klakow: Yahoo! Answers for sentence retrieval • Two statistical frameworks developed for capturing relationships between words in Q-A pairs in Yahoo! Answers • These were then used in sentence selection task based on TREC 2006 queries • Conclusion: Best results exceeded the baseline

  13. Papers - 5 • Small and Strzalkowski: Collaborative QA using web trails • Logs were made of users in an interactive QA study • Information stored includes the documents users saved • Documents are placed in a standard order to allow comparison between users; documents saved by different users overlap • When a previously observed sequence of documents is saved by a user, the rest of that sequence could be presented to the user
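A small sketch of the trail-completion idea, assuming each trail is stored as an ordered list of document identifiers (illustrative only, not the authors' code):

```python
def suggest_rest_of_trail(saved_so_far: list[str], past_trails: list[list[str]]) -> list[str]:
    """If the documents saved so far match the start of a trail saved by an
    earlier user, suggest the remaining documents of that trail."""
    for trail in past_trails:
        if trail[:len(saved_so_far)] == saved_so_far and len(trail) > len(saved_so_far):
            return trail[len(saved_so_far):]   # documents the current user has not seen yet
    return []
```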

  14. Papers - 6 • Sutcliffe, White and Kruschwitz: NE recognition in an intranet query log • A log of queries to a university web site was first analysed by hand • This resulted in a list of topic types and a list of Named Entity types • Training data for NEs was extracted from web pages and used to train a maximum entropy recogniser • The NE recogniser was evaluated; uses of NEs in answering queries were discussed
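Since a maximum entropy classifier is essentially multiclass logistic regression over token features, a minimal per-token NE classifier could look like the following (the features, labels and use of scikit-learn are assumptions for illustration, not details of the paper's recogniser):

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def token_features(tokens, i):
    """Simple contextual features for the token at position i."""
    return {
        "word": tokens[i].lower(),
        "is_capitalised": tokens[i][:1].isupper(),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

def train(sentences):
    """sentences: lists of (token, label) pairs, e.g.
    [("Department", "ORG"), ("of", "ORG"), ("History", "ORG"), ("term", "O"), ...]"""
    X, y = [], []
    for sent in sentences:
        tokens = [tok for tok, _ in sent]
        for i, (_, label) in enumerate(sent):
            X.append(token_features(tokens, i))
            y.append(label)
    model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(X, y)
    return model
```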

  15. Papers - 7 • Mandl and Schulz: Log-based evaluation resources for QA • Concerned with the link between query logs and the well-formed questions answered by QA systems • Proposes a system switching between IR-mode and QA-mode • Discusses the log resources available and related tracks at CLEF • Presents a preliminary analysis of question-like queries in the MSN log

  16. Papers vs. Workshop Goals - 1 • Bernardi and Kirschner investigate Question 6 • Leveling investigates Question 1 and Question 3 • Momtazi and Klakow look at Question 5 • Zhu et al. also look at Question 5 • Small and Strzalkowski investigate Question 4

  17. Papers vs. Workshop Goals - 2 • Sutcliffe et al. look at Question 2 • Mandl and Schulz also look at Question 3 • Only Questions 7 and 8 are not addressed at the workshop!

  18. Conclusions • It looks like an interesting field • We look forward to your papers • There will be time at the end for discussion
