Using a Trie-based Structure for Question Analysis Luiz Augusto Sangoi Pizzato pizzato@ics.mq.edu.au http://www.ics.mq.edu.au/~pizzato
Outline
• Question analysis
• Trie structure
• Question trie
• Building and retrieving using the trie
• Evaluation of the technique
• Further work
Pizzato, Luiz Augusto Sangoi. Using a Trie-based Structure for Question Analysis. In: ALTA Workshop 2004. Macquarie University, Sydney. 8 December 2004.
Question on question
• Our question analyser tries to answer two meta-questions:
  • What kind of answer do I have to provide?
    • Define the expected answer type (EAT).
  • What is the subject of the question?
    • Define the question focus.
Some approaches
• EAT
  • Handcrafted rules
    • Normally by the use of regular expressions (RE)
  • WordNet top concepts (Moldovan et al., 2003)
    • High quality results
  • Support Vector Machines (SVM) (Zhang and Lee, 2003)
    • Good results using a large training set
• Focus
  • Discard the question's stopwords.
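The two baseline approaches listed above (handcrafted regular-expression rules for the EAT, stopword removal for the focus) can be sketched as follows. This is only an illustration: the rule patterns, the EAT labels `LOC` and `NUMBER`, the stopword list, and the function names are assumptions made for the example and do not come from the slides.

```python
import re

# Hypothetical handcrafted rules: (pattern, expected answer type).
# Patterns and labels are illustrative, not taken from the presentation.
EAT_RULES = [
    (re.compile(r"^who\b", re.IGNORECASE), "NAME"),
    (re.compile(r"^where\b", re.IGNORECASE), "LOC"),
    (re.compile(r"^how (far|tall|long)\b", re.IGNORECASE), "NUMBER"),
]

# A tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"who", "where", "how", "what", "is", "are", "the", "of", "a", "an"}


def expected_answer_type(question):
    """Return the EAT of the first rule that matches, or None if none does."""
    for pattern, eat in EAT_RULES:
        if pattern.search(question):
            return eat
    return None


def question_focus(question):
    """Approximate the focus by discarding stopwords and punctuation."""
    tokens = re.findall(r"[A-Za-z0-9]+", question)
    return [t for t in tokens if t.lower() not in STOPWORDS]
```

For instance, `question_focus("Who is the dean of ICS?")` keeps only `dean` and `ICS`, while a question that matches no rule gets no EAT at all, which is the usual weakness of a fixed rule set.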
Trie structure
[Figure: character trie. Each node is an array of letter slots a|b|c|...|z, and the words "car" and "zebra" are stored along letter paths.]
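A minimal character trie holding the two words from the figure might look like the sketch below. The class and method names are assumptions for illustration, and a dictionary per node is used here instead of the fixed a..z array drawn on the slide.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # letter -> TrieNode
        self.is_word = False  # True if a stored word ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """Create (or reuse) one node per letter and mark the last one."""
        node = self.root
        for letter in word:
            node = node.children.setdefault(letter, TrieNode())
        node.is_word = True

    def contains(self, word):
        """Follow the letters of `word`; succeed only on a marked node."""
        node = self.root
        for letter in word:
            node = node.children.get(letter)
            if node is None:
                return False
        return node.is_word


trie = Trie()
for w in ("car", "zebra"):
    trie.insert(w)
```

Lookup cost depends on the length of the word rather than on the number of stored words. A prefix such as `ca` reaches a node but is not reported as a word because that node is not marked.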
Patterns
Question Trie
[Figure: question trie with nodes numbered 1 to 26. Edges are labelled with tokens: ^ (boq, beginning of question); the words who, where, how, is, the, of, far, tall; the entity placeholders !NAME, !ORG, !LOC, !POS; and $ (eoq, end of question).]
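One plausible way to realise such a question trie is to use whole tokens (words and entity placeholders) as edge labels and to count, at every node, how often each EAT passed through it during training. The sketch below follows that reading. The class names, the `classify` strategy of answering with the most frequent EAT at the deepest node reached, and the labels `LOC` and `NUMBER` are assumptions, not details confirmed by the slides.

```python
from collections import Counter

BOQ, EOQ = "^", "$"  # beginning-of-question and end-of-question markers


class QNode:
    def __init__(self):
        self.children = {}         # token -> QNode
        self.eat_freq = Counter()  # EAT label -> training questions seen here


class QuestionTrie:
    def __init__(self):
        self.root = QNode()

    def add(self, tokens, eat):
        """Insert one tagged training question and update the EAT counts."""
        node = self.root
        for tok in [BOQ, *tokens, EOQ]:
            node = node.children.setdefault(tok, QNode())
            node.eat_freq[eat] += 1

    def classify(self, tokens):
        """Follow exact token matches as far as possible, then return the
        most frequent EAT stored at the deepest node that was reached."""
        node = self.root
        for tok in [BOQ, *tokens, EOQ]:
            nxt = node.children.get(tok)
            if nxt is None:
                break
            node = nxt
        if not node.eat_freq:
            return None
        return node.eat_freq.most_common(1)[0][0]


qtrie = QuestionTrie()
qtrie.add(["who", "is", "!NAME"], "DESC")
qtrie.add(["who", "is", "the", "!POS", "of", "!ORG"], "NAME")
qtrie.add(["where", "is", "!LOC"], "LOC")
qtrie.add(["how", "far", "is", "!LOC"], "NUMBER")
qtrie.add(["how", "tall", "is", "!NAME"], "NUMBER")
```

With this toy training set, an unseen pattern such as `how old is !NAME` stops at the `how` node and still receives the EAT that dominates there, which is the kind of graceful degradation a prefix structure offers.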
Look-ahead process
[Figure: the untagged questions "^ who is John Smith of Macquarie University $" and "^ who is Madonna $" are matched against the trie branch labelled ^ (boq), who, is, !NAME, of, !ORG and $ (eoq) (nodes 1, 6, 7 and 13 to 17); "?" marks appear beside the tokens that have no literal match.]
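The slide does not spell out the look-ahead algorithm, so the following is only a guess at how it could work: when a token has no literal child, an entity placeholder child absorbs one or more tokens, and the entity is taken to end at the first position from which the rest of the question can still be matched. The nested-dictionary trie, the backtracking strategy, and all function names are assumptions.

```python
BOQ, EOQ = "^", "$"


def build_trie(patterns):
    """Build a nested-dict trie from token patterns (lists of tokens)."""
    root = {}
    for tokens in patterns:
        node = root
        for tok in [BOQ, *tokens, EOQ]:
            node = node.setdefault(tok, {})
    return root


def match(node, tokens):
    """Return the list of trie labels covering `tokens`, or None."""
    if not tokens:
        return []
    head, rest = tokens[0], tokens[1:]
    # 1. Prefer a literal match on the current token.
    if head in node:
        tail = match(node[head], rest)
        if tail is not None:
            return [head] + tail
    # 2. Otherwise let an entity placeholder (label starting with "!")
    #    absorb tokens; look ahead to find where the entity must end.
    for label, child in node.items():
        if not label.startswith("!"):
            continue
        for end in range(1, len(tokens)):  # the final "$" is never absorbed
            tail = match(child, tokens[end:])
            if tail is not None:
                return [label] + tail
    return None


def analyse(trie, question):
    """Tokenise an untagged question and match it against the trie."""
    tokens = [BOQ, *question.rstrip("?").split(), EOQ]
    return match(trie, tokens)


trie = build_trie([
    ["who", "is", "!NAME"],
    ["who", "is", "!NAME", "of", "!ORG"],
    ["where", "is", "!LOC"],
])
```

In this sketch, "who is John Smith of Macquarie University?" resolves to the pattern `^ who is !NAME of !ORG $` because the word "of" is the first point at which the remainder can continue, whereas "who is Madonna?" resolves to `^ who is !NAME $`.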
MQ Questions
• JustAsk logs:
  • 4.8% natural-language (NL) questions
  • 60,732 of 1,275,116 were NL questions
  • 47,844 unique NL questions
• 23% with some language problems:
  • Why this search not word?
• Unusual language:
  • Do u offer any scholarships 4 physiotherapy?
• Speculative questions:
  • Will I get a job in Australia after finishing my MBA?
Training Set
JustAsk questions were randomly selected and semi-automatically tagged according to an XML-like structure:
• <Q AT='DESC'>Who is <ENAMEX type="NAME">Luiz Pizzato</ENAMEX>?</Q>
• Total number of questions: 1385
  • 233 – Who
  • 212 – What
  • 208 – Where
  • 203 – How
  • 529 – Other types:
    • Am I, Are there, Can I, Do you, Is there, I want, I need, Which, Does, Tell me, Why, Have you, Could you, May I, Will I, Was I, Would you, Whom
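A tagged training question of this form can be turned into an EAT label plus a token sequence with entity placeholders, ready for insertion into a question trie. The sketch below is an assumption about how that conversion might be done: it expects straight quotes around attribute values, lowercases ordinary words, drops punctuation, and uses a hypothetical function name.

```python
import re

# <Q AT='...'> wrapper and <ENAMEX type="..."> entity markup.
Q_RE = re.compile(r"<Q AT='(?P<eat>[^']+)'>(?P<body>.*)</Q>", re.DOTALL)
ENAMEX_RE = re.compile(r'<ENAMEX type="(?P<type>[^"]+)">.*?</ENAMEX>')


def parse_tagged_question(line):
    """Return (EAT, tokens) for one tagged question.

    Entities become placeholders such as !NAME; the sequence is wrapped
    in the ^ (boq) and $ (eoq) boundary markers.
    """
    m = Q_RE.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"not a tagged question: {line!r}")
    # Replace every marked-up entity by its placeholder.
    body = ENAMEX_RE.sub(lambda e: f" !{e.group('type')} ", m.group("body"))
    # Keep placeholders as they are and lowercase ordinary words.
    tokens = [t if t.startswith("!") else t.lower()
              for t in re.findall(r"![A-Z]+|\w+", body)]
    return m.group("eat"), ["^", *tokens, "$"]


example = "<Q AT='DESC'>Who is <ENAMEX type=\"NAME\">Luiz Pizzato</ENAMEX>?</Q>"
eat, tokens = parse_tagged_question(example)
```

For the example question from the slide this yields the EAT `DESC` and the token sequence `^ who is !NAME $`.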
Evaluation – EAT
Evaluation – Focus
Question Trie without Entities
[Figure: question trie built without entity markup, nodes numbered 1 to 27. Edges carry ^ (boq), the words who, where, how, is, the, of, far, tall, the literal words Athens, Chile, ICS, dean, J., Smith and Sting in place of entity placeholders, and $ (eoq).]
Evaluation – TREC-2003
Comparison with SVM (Zhang and Lee, 2003)
Concluding remarks
• The developed technique offers reasonable results using no linguistic resources.
• Future developments
  • Define guidelines for the EAT markup and review the markup of the MQ questions
  • Adding POS and semantic information from WordNet may replace entity markup
Further Work
• Combine lexical and POS information
[Figure: trie for "Who is John Smith?" in which each node carries a token, its POS tag and a table of EAT frequencies: ^ (NAME 1, DESC 1), Who/WP (NAME 1, DESC 1), is/VBZ (NAME 1, DESC 1), John/NNP (NAME 1), Smith/NNP (NAME 1), John Smith/NNP (NAME 1), $ (NAME 1).]
References
Dell Zhang and Wee Sun Lee. 2003. Question classification using support vector machines. In Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR-03), pages 26–32. ACM Press.
Dan Moldovan, Marius Paşca, Sanda Harabagiu, and Mihai Surdeanu. 2003. Performance issues and error analysis in an open-domain question answering system. ACM Trans. Inf. Syst., 21(2):133–154.
Acknowledgments
• My supervisors
  • Dr. Diego Mollá-Aliod
  • Dr. Rolf Schwitter
  • Dr. Cecile Paris