Cross-Language French-English Question Answering using the DLT System at CLEF 2003 Aoife O’Gorman Igal Gabbay Richard F.E. Sutcliffe Documents and Linguistic Technology Group University of Limerick
Outline • Objectives • System architecture • Key components • Task performance evaluation • Findings
Objectives • Learn the issues involved in multilingual QA • Combine the components of our existing English and French monolingual QA systems
System architecture Query classification Query translation (Google) & re-formulation Named entity recognition Text retrieval (dtSearch) Answer entity selection
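A minimal sketch of how these five stages might be chained, written in Python purely for illustration (the actual DLT components are not Python); every callable passed in is a hypothetical stand-in for the corresponding module:

```python
# Illustrative wiring of the five pipeline stages listed above. Each callable
# is a hypothetical stand-in for the corresponding DLT component.
def answer_question(french_query, classify, translate, retrieve, recognise, select):
    category = classify(french_query)            # query classification
    search_terms = translate(french_query)       # Google translation + re-formulation
    passages = retrieve(search_terms)            # dtSearch text retrieval
    candidates = recognise(passages, category)   # named entity recognition
    return select(candidates, search_terms)      # answer entity selection
```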
Query classification • Categories based on translated TREC 2002 queries • Keyword-based classification • Example category: what_country • De quel pays le jeu de croquet est-il originaire ? (From which country does the game of croquet originate?) • De quelle nation... ? (From which nation...?) • Queries that match no category are classified as Unknown
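As a rough Python illustration of this kind of keyword-based classification (the patterns, category list, and classify_query name below are assumptions, not the actual DLT rules):

```python
import re

# Hypothetical keyword rules: a French surface pattern maps the query to a
# category; anything that matches no rule falls back to the Unknown category.
CATEGORY_PATTERNS = [
    ("what_country", re.compile(r"\bde quel(le)? (pays|nation)\b", re.IGNORECASE)),
    ("who",          re.compile(r"^\s*qui\b", re.IGNORECASE)),
    ("when",         re.compile(r"^\s*quand\b", re.IGNORECASE)),
]

def classify_query(query):
    for category, pattern in CATEGORY_PATTERNS:
        if pattern.search(query):
            return category
    return "unknown"

print(classify_query("De quel pays le jeu de croquet est-il originaire ?"))  # what_country
```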
Query translation and re-formulation • Submitting the French query in its original form on the Google Language Tools page • Tokenisation • Selective removal of stopwords • Example: • Qui a été élu gouverneur de la California? • Who was elected governor of California? • [ ‘elected’, ‘governor’, ‘California’]
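A small Python sketch of the re-formulation step; the English translation is assumed to have been obtained already from the Google Language Tools page, and the stopword list below is illustrative rather than the actual DLT list:

```python
# Assumed stopword list; the real system's selective list may differ.
STOPWORDS = {"who", "was", "the", "of", "a", "an", "in", "on", "to", "did", "what"}

def reformulate(translated_query):
    tokens = translated_query.replace("?", " ").split()        # tokenisation
    return [t for t in tokens if t.lower() not in STOPWORDS]   # selective stopword removal

print(reformulate("Who was elected governor of California?"))
# ['elected', 'governor', 'California']
```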
Text Retrieval: Submitting queries to dtSearch • dtSearch indexed the document collection based on <DOC> tags • Inserting a w/1 connector between two capitalised words • Submitting untranslated quotations for exact match • Inserting an AND connector between all other terms (Boolean) • Limited verb expansion based on common verbs used in TREC questions
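The connector rules can be sketched as follows; the w/1 and AND connectors come from the slide, while the assembly function itself (build_boolean_query) and its handling of edge cases are assumptions:

```python
def build_boolean_query(terms):
    """Join adjacent capitalised terms with w/1 and everything else with AND."""
    parts = []
    i = 0
    while i < len(terms):
        if terms[i][0].isupper():
            # Collect a run of adjacent capitalised words and link them with w/1.
            run = [terms[i]]
            while i + 1 < len(terms) and terms[i + 1][0].isupper():
                i += 1
                run.append(terms[i])
            parts.append(" w/1 ".join(run))
        else:
            parts.append(terms[i])
        i += 1
    return " AND ".join(parts)

print(build_boolean_query(["elected", "governor", "California"]))
# elected AND governor AND California
print(build_boolean_query(["Robert", "Frost", "born"]))
# Robert w/1 Frost AND born
```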
Named Entity Recognition: General Names • Captures any instance of a general name in cases where we are not sure what to look for • A general_name is defined in our system as up to five capitalised terms interspersed with optional prepositions • Examples: Limerick City • University of Limerick
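One way to approximate the general_name pattern in Python (the exact grammar of the DLT recogniser is not given on the slide, so the regular expression and preposition list below are a hedged approximation):

```python
import re

# Up to five capitalised terms, optionally joined by a short preposition;
# the preposition list is an assumption.
GENERAL_NAME = re.compile(
    r"\b[A-Z][a-z]+(?:\s+(?:of|de|du|la|the)\s+[A-Z][a-z]+|\s+[A-Z][a-z]+){0,4}\b"
)

text = "A talk given at the University of Limerick, just outside Limerick City."
print(GENERAL_NAME.findall(text))
# ['University of Limerick', 'Limerick City']
```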
Answer entity selection • highest_scoring • What year was Robert Frost born? • in entity(date,[1,8,7,5],[[],[],[],[],[1,8,7,5]],[],[],[]), poet target([Robert]) target([Frost]) was target([born]) in San Francisco • most_frequent • When did “The Simpsons” first appear on television? • When target([The]) target([Simpsons]) was target([first]) broadcast in entity(date,[1,9,8,9],[[],[],[],[],[],[1,9,8,9],[],[]])
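A brief Python sketch of the most_frequent strategy (the candidate representation and function name are assumptions; highest_scoring would instead rank candidates by a passage score):

```python
from collections import Counter

def most_frequent(candidates):
    """Pick the candidate value that occurs most often across retrieved passages.

    candidates: list of (entity_type, value) pairs of the expected answer type.
    """
    counts = Counter(value for _, value in candidates)
    value, _ = counts.most_common(1)[0]
    return value

print(most_frequent([("date", "1989"), ("date", "1990"), ("date", "1989")]))  # 1989
```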
Task performance evaluation Adapted from Magnini (2003)
Findings • Query classification: unexpected formulations of queries; too few categories • Translation: problems with names and titles • Better query-specific translation is needed • Localisation of names and titles • Possibly limit translation to the search terms
Findings • Text retrieval: allow relaxation and more sophisticated expansion of search queries • Named entity recognition: find better alternatives to answer questions of type Unknown • Answer entity selection: take into account distance and density of query terms • Usability issue: answers may need to be translated back to French