Implementation of a QA system in a real context Carlos Amaral (Priberam, Portugal) Dominique Laurent (Synapse Développement, France) Workshop TellMeMore, November 24, 2006, C.Amaral, D.Laurent
1. The Question-Answering system
• What is a QA system?
• A system that extracts one or more answers to a request (a question) from a corpus
• The problem of the "type of the question"
• One answer or several, possibly a list drawn from one or several documents, or an answer of the Yes/No type…
• Over a corpus in one or several languages…
1.1. QA and Language Processing
• A QA system appears to be a language processing (LP) application "par excellence"
• However, certain systems are based solely on pattern matching (cf. Soubotine & Soubotine, TREC 2003)
• These systems seem to have reached their limits
• While they can handle anything factual, complex questions/queries are far beyond their capabilities
• The best systems validated at TREC and CLEF are based on automated language processing
1.2. Our QA system
• First developed (1999-2001) within a French innovation project (Anvar)
• Then (end of 2001 to end of 2003) within the European project TRUST (FP5)
• Currently (2005/06) within the European project M-CAST (FP6)
• Main features: targets B2B and B2C, multilingual, intensively NLP-based
A modular design
• Language modules: Italian, Portuguese, Polish, Czech, English, French
• Indexing engine
• Text extraction engine
• Documents
• Index
• Visualization of results
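The modular layout above can be illustrated with a minimal sketch. It assumes that each language module exposes the same small interface and that a shared engine owns the index. All names (`LanguageModule`, `Engine`, `analyze`, the stop-word lists) are hypothetical and are not taken from the actual system, whose language modules perform full linguistic analysis rather than simple tokenization.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class LanguageModule:
    """Hypothetical per-language module: tokenizes and drops stop words."""
    code: str
    stop_words: frozenset

    def analyze(self, text: str) -> List[str]:
        tokens = [t.strip(".,;:!?").lower() for t in text.split()]
        return [t for t in tokens if t and t not in self.stop_words]


@dataclass
class Engine:
    """Shared engine: one inverted index, pluggable language modules."""
    modules: Dict[str, LanguageModule] = field(default_factory=dict)
    index: Dict[str, Set[str]] = field(default_factory=dict)

    def register(self, module: LanguageModule) -> None:
        self.modules[module.code] = module

    def index_document(self, doc_id: str, text: str, lang: str) -> None:
        for term in self.modules[lang].analyze(text):
            self.index.setdefault(term, set()).add(doc_id)

    def search(self, query: str, lang: str) -> Set[str]:
        terms = self.modules[lang].analyze(query)
        if not terms:
            return set()
        # Keep only documents that contain every query term.
        postings = [self.index.get(t, set()) for t in terms]
        return set.intersection(*postings)


engine = Engine()
engine.register(LanguageModule("en", frozenset({"who", "the", "a"})))
engine.register(LanguageModule("fr", frozenset({"qui", "a", "le", "la"})))
engine.index_document("doc1", "Shakespeare wrote Hamlet.", "en")
engine.index_document("doc2", "Molière a écrit Tartuffe.", "fr")
hits = engine.search("Who wrote Hamlet?", "en")
```

Under this assumption, supporting a new language means registering one more module, while the indexing and search code stays unchanged.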
1.3. Evaluations of the QA system
• Professional benchmarking contests and campaigns such as EQueR (2004) and CLEF (2005 & 2006)
• Evaluations of the French, English, Portuguese and Spanish language modules, in both monolingual and multilingual tasks
CLEF 2005
CLEF 2006
• In CLEF 2005 and CLEF 2006, the best monolingual engines were our systems for Portuguese and French, and the best multilingual engines were our systems for English-French, Portuguese-French, Spanish-Portuguese and Portuguese-Spanish
• Synapse Développement and Priberam are now partners in the Quaero project
2. Implementation in the M-CAST project
• Tests carried out on books from the Czech National Library and the Torun library in Poland
• Processes several million digitized documents
• Manages metadata and UDC classification
• Accommodates questions and answers in English, French, Italian, Portuguese, Polish and Czech
• Implemented on both library portals
2.1. Adaptation to digital library resources
• Scanned texts: poor quality
  > A spell checker improves the quality of the documents
• One book, many pages:
  > Multi-part documents are managed during semantic analysis
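As an illustration of the two adaptations above, the following sketch corrects OCR errors against a word list and keeps each page attached to its position in the book. It assumes a simple similarity-based correction using Python's standard `difflib`. The slides do not describe the actual spell checker, so the lexicon, the `cutoff` value and the function names are hypothetical.

```python
import difflib
from typing import List, Sequence, Tuple

# Hypothetical lexicon; a real system would use a full dictionary.
LEXICON = ["book", "library", "national", "the"]


def correct_word(word: str, lexicon: Sequence[str], cutoff: float = 0.8) -> str:
    """Return the closest lexicon entry, or the word itself if none is close."""
    if word in lexicon:
        return word
    matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_text(text: str, lexicon: Sequence[str]) -> str:
    return " ".join(correct_word(w, lexicon) for w in text.lower().split())


def correct_book(pages: List[str], lexicon: Sequence[str]) -> List[Tuple[int, str]]:
    """Correct every page of one book, keeping 1-based page numbers
    so that an answer can still be traced back to its page."""
    return [(n, correct_text(p, lexicon)) for n, p in enumerate(pages, start=1)]


corrected = correct_text("the natlonal libraiy", LEXICON)
```

Words with no sufficiently close lexicon entry are left unchanged, which avoids "correcting" proper nouns into unrelated words.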
2.2. Integration of documents' Dublin Core attributes
• Storage of Dublin Core attributes as metadata
• QA example: Who is the author of Hamlet?
• Adaptation of the system to search in metadata
• Use of these metadata as filters
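A minimal sketch of how a question such as "Who is the author of Hamlet?" could be answered from metadata follows. It assumes that records are stored as dictionaries keyed by Dublin Core element names (`title`, `creator`, `language`). The catalog content, the regular expression and the function names are hypothetical. The pattern is only a stand-in for the linguistic question analysis of the real system.

```python
import re
from typing import Dict, List, Optional

Record = Dict[str, str]

# Hypothetical catalog keyed by Dublin Core element names.
CATALOG: List[Record] = [
    {"title": "Hamlet", "creator": "William Shakespeare", "language": "en"},
    {"title": "Pan Tadeusz", "creator": "Adam Mickiewicz", "language": "pl"},
]

# Stand-in for real question analysis: recognizes one question form only.
AUTHOR_QUESTION = re.compile(
    r"who (?:is|was) the author of (.+?)\s*\?*$", re.IGNORECASE
)


def answer_from_metadata(question: str, catalog: List[Record]) -> Optional[str]:
    """Answer an author question from the 'creator' element, if possible."""
    match = AUTHOR_QUESTION.match(question.strip())
    if not match:
        return None
    wanted = match.group(1).strip().lower()
    for record in catalog:
        if record.get("title", "").lower() == wanted:
            return record.get("creator")
    return None


def filter_records(catalog: List[Record], **criteria: str) -> List[Record]:
    """Use metadata as a filter: keep records matching every criterion."""
    return [r for r in catalog if all(r.get(k) == v for k, v in criteria.items())]


author = answer_from_metadata("Who is the author of Hamlet ?", CATALOG)
polish = filter_records(CATALOG, language="pl")
```

The same metadata serve two roles here, as on the slide: as a source of answers and as a filter that narrows the set of documents searched.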
2.3. Universal Decimal Classification
• Storage of UDC codes for each document
• Search through UDC codes
• Filtering through UDC codes
• Semantic disambiguation through UDC codes
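Because UDC notation is hierarchical, filtering and a simple form of disambiguation can be sketched with prefix matching on the stored codes. The sketch assumes simple codes only and ignores compound UDC notation (for example codes joined with ":" or "+"). The document codes, the sense inventory and the function names are illustrative and hypothetical, not taken from the M-CAST implementation.

```python
from typing import Dict, List, Optional

# Illustrative document codes: 599 (mammals), 004.3 (computing), 821.111 (English literature).
DOCS: Dict[str, str] = {"doc1": "599", "doc2": "004.3", "doc3": "821.111"}

# Hypothetical sense inventory: UDC class prefix -> sense of the word.
SENSES: Dict[str, Dict[str, str]] = {
    "mouse": {"59": "rodent", "004": "pointing device"},
}


def filter_by_udc(docs: Dict[str, str], udc_class: str) -> List[str]:
    """Keep the documents whose code falls under the given UDC class."""
    return sorted(d for d, code in docs.items() if code.startswith(udc_class))


def disambiguate(word: str, doc_code: str,
                 senses: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Pick the sense whose class prefix matches the document's UDC code.
    The longest matching prefix wins; None means no sense applies."""
    candidates = senses.get(word, {})
    best = max(
        (prefix for prefix in candidates if doc_code.startswith(prefix)),
        key=len,
        default=None,
    )
    return candidates[best] if best is not None else None


zoology_docs = filter_by_udc(DOCS, "59")
sense = disambiguate("mouse", DOCS["doc2"], SENSES)
```

The idea is that the subject class of a document is evidence for the sense of an ambiguous word it contains: "mouse" in a zoology book and in a computing manual are unlikely to mean the same thing.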
End of presentation. I would appreciate your questions! Thank you - Merci!