The 3rd International Conference on Arabic Natural Language Processing
Three-level approach for Passage Retrieval in Arabic Question/Answering Systems
Lahsen Abouenour (1), Karim Bouzoubaa (1), Paolo Rosso (2)
Mohammadia School of Engineers, Rabat, Morocco - May 2009
Arabic Question/Answering Systems
Classical IR
1. User query (keywords)
2. List of documents/links
3. User checking
4. Answer to user query
Question/Answering
1. User query (question = keywords + structure)
2. List of documents/links, user checking
3. Answer to user query
Existing Arabic Q/A Systems
• QARAB (based on the Al-Raya corpus)
• AQAS (extracts answers only from structured texts)
• ArabiQA (deals with factoid questions, embeds an NER module)
• QASAL (semi-automatic Q/A system for factoid questions)
Three Modules
• Question Analysis: question type, keywords, named entities, …
• Passage Retrieval: candidate passages, passage ranking, …
• Answer Extraction: answer identification, answer construction, …
Challenges of Arabic Q/A Systems
• short vowels
• absence of capital letters
• complex morphology
• etc.
Question/Answering: User Query (question = keywords + structure)
Natural language: أين توجد مدينة مراكش ؟ (Where is the city of Marrakech?)
-- Keywords: أين | توجد | مدينة | مراكش (Where | is | the | city | of | Marrakech)
-- Structure: أين توجد مدينة مراكش ؟ (Where is the city of Marrakech?) ≠ هل مراكش مدينة ؟ (Is Marrakech a city?)
Question/Answering: Passage Retrieval
Question: أين توجد مدينة مراكش ؟ (Where is the city of Marrakech?)
Passage 1: xxxxx مراكش (Marrakech) xxxxxx xx xxx xxxx xx xxx xxxxx xxx xxxx xxx xxxx xxxxx مدينة (city) xxxxx xx xxx توجد (exists in) xxx
Result: no answer
Passage N: المغرب (Morocco) xxx يوجد إقليم مراكش (the region of Marrakech exists in) xxx xx xxx xxxxx xxx xxxx xxx xxxx xxxxx xx xxxxx xx xxx xx xxx
Result: the answer
Question/Answering: Passage Retrieval
Question: أين توجد مدينة مراكش ؟ (Where is the city of Marrakech?)
Passage 1: xxxxx مراكش (Marrakech) xxxxxx xx xxx xxxx xx xxx xxxxx xxx xxxx xxx xxxx xxxxx مدينة (city) xxxxx xx xxx توجد (exists in) xxx
Matched terms in Passage 1: توجد | مراكش | مدينة (is in | Marrakech | city)
Passage N: المغرب (Morocco) xxx يوجد إقليم مراكش (the region of Marrakech exists in) xxx xx xxx xxxxx xxx xxxx xxx xxxx xxxxx xx xxxxx xx xxx xx xxx
Matched terms in Passage N: يوجد | مراكش | إقليم (is in | Marrakech | region)
Links between the two sets of terms: morphological relation (توجد / يوجد) and hyponymy/semantic relation (مدينة / إقليم)
Question/Answering: Passage Retrieval
Question: أين توجد مدينة مراكش ؟ (Where is the city of Marrakech?)
Passage 1: xxxxx مراكش xxxxxx xx xxx xxxx xx xxx xxxxx xxx xxxx xxx xxxx xxxxx مدينة xxxxx xx xxx توجد xxx
Passage N: المغرب xxx يوجد إقليم مراكش xxx xx xxx xxxxx xxx xxxx xxx xxxx xxxxx xx xxxxx xx xxx xx xxx
Passage 1 vs Passage N: with respect to the morphological and semantic relations, relevance(P1) = relevance(PN).
What about the question structure?
Question/Answering: Passage Retrieval
Question: أين توجد مدينة مراكش ؟ (Where is the city of Marrakech?)
Expected answer: توجد مدينة مراكش في … (The city of Marrakech is in …)
Compared against: Passage 1 structures, Passage N structures
Our Passage Retrieval Approach: Presentation
Levels
• Semantic Query Expansion (extending the list of keywords related to the user question)
• Keyword-based level (candidate passages with related keywords)
• Structure-based level (candidate passages with related structure)
• Semantic reasoning level (comparing conceptual graph (CG) representations)
Our Passage Retrieval Approach: Presentation
Resources & Tools
• Semantic Query Expansion (Arabic WordNet, Amine Platform)
• Keyword-based PR (Yahoo API)
• Structure-based PR (the Java Information Retrieval System, JIRS)
• Semantic reasoning level (Amine Platform)
Our Passage Retrieval Approach: Presentation
Semantic Query Expansion: Ontology
• Arabic WordNet (AWN) is a free lexical resource
• AWN contains more than 20,000 Arabic words grouped into synsets
• AWN is connected with SUMO (Suggested Upper Merged Ontology)
• SUMO has about 2,000 general concepts
• SUMO defines many relations between concepts (hyponymy, hypernymy, ...)
Our Passage Retrieval Approach: Presentation
Semantic Query Expansion: Amine Platform
• Amine is a multi-layer platform dedicated to the development of intelligent systems and multi-agent systems
• Amine is an open source platform
• Amine is implemented entirely in Java
• Amine provides a set of operations related to ontologies
Our Passage Retrieval Approach: Presentation
Semantic Query Expansion
[Diagram: Arabic WordNet (content, structure, link with SUMO) -> temporary database (MySQL) -> Java program using the Amine Platform API -> Amine AWN ontology]
Our Passage Retrieval Approach: Presentation
Semantic Query Expansion
Concept/Term: Global Expansion
• Morphological Expansion
• AAWN Ontology Expansion
1 - By synonyms
2 - By supertypes
3 - By definition
4 - By subtypes
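The four ontology expansion steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `TOY_ONTOLOGY`, `expand_term` and `expand_query` are hypothetical names, the English entries stand in for Arabic WordNet content, and nothing here uses the Amine Platform API or the morphological expansion step.

```python
from typing import Dict, List

# Hypothetical toy ontology: NOT real Arabic WordNet / SUMO data.
TOY_ONTOLOGY: Dict[str, Dict[str, List[str]]] = {
    "city": {
        "synonyms": ["town"],
        "supertypes": ["region"],
        "definition": ["large", "settlement"],
        "subtypes": ["capital"],
    },
}

# Order follows the slide: synonyms, supertypes, definition, subtypes.
RELATIONS = ("synonyms", "supertypes", "definition", "subtypes")


def expand_term(term: str, ontology: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Return the term followed by its related terms, without duplicates."""
    expanded = [term]
    entry = ontology.get(term, {})
    for relation in RELATIONS:
        for related in entry.get(relation, []):
            if related not in expanded:
                expanded.append(related)
    return expanded


def expand_query(keywords: List[str], ontology: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Expand every keyword and merge the results, keeping first-seen order."""
    merged: List[str] = []
    for keyword in keywords:
        for term in expand_term(keyword, ontology):
            if term not in merged:
                merged.append(term)
    return merged


print(expand_query(["city", "marrakech"], TOY_ONTOLOGY))
```

With such an expansion, a passage that mentions "region" rather than "city" can still be matched by the keyword-based level.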
Our Passage Retrieval Approach: Presentation
Structure-based PR: the Java Information Retrieval System (JIRS)
• a language-independent PR system
• adapted for many non-agglutinative European languages (English, French, Spanish, Italian, ...)
• adapted for the Arabic language
• re-ranking of the retrieved passages is based on a distance density n-gram model
URL: http://sourceforge.net/projects/jirs/
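To see why structure can separate passages that keywords cannot, here is a deliberately simplified sketch. It is not the JIRS distance density model: the hypothetical `structure_score` only measures the longest question n-gram found contiguously in a passage, and the English token lists are made-up stand-ins for the Arabic example.

```python
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def keyword_score(question: List[str], passage: List[str]) -> float:
    """Fraction of question terms that occur anywhere in the passage."""
    if not question:
        return 0.0
    passage_terms = set(passage)
    return sum(1 for term in question if term in passage_terms) / len(question)


def structure_score(question: List[str], passage: List[str]) -> float:
    """Length of the longest question n-gram found in the passage,
    normalised by the question length (simplified stand-in, not JIRS)."""
    for n in range(len(question), 0, -1):
        passage_ngrams = set(ngrams(passage, n))
        if any(gram in passage_ngrams for gram in ngrams(question, n)):
            return n / len(question)
    return 0.0


# Expected-answer structure and two made-up passages.
QUESTION = ["the", "city", "of", "marrakech", "is", "in"]
PASSAGE_1 = ["marrakech", "is", "old", "the", "city", "walls", "of", "clay", "in"]
PASSAGE_N = ["the", "city", "of", "marrakech", "is", "in", "morocco"]

for name, passage in (("P1", PASSAGE_1), ("PN", PASSAGE_N)):
    print(name, keyword_score(QUESTION, passage), structure_score(QUESTION, passage))
```

Both passages contain every question term, so the keyword score ties them; only the structure score ranks the passage that preserves the word order of the expected answer higher.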
Our Passage Retrieval Approach: Evaluation Process
[Diagram: CLEF and TREC questions feed (1) a manual process and (2) an automatic process. Configurations shown: Google, Yahoo and JIRS, each with and without semantic QE, grouped under keyword-based and structure-based retrieval.]
Our Passage Retrieval Approach: Evaluation Process
The Questions
• a set of 82 CLEF and TREC questions
• factoid questions seeking named entities (NE)
• significant coverage: questions classified into different domains
Our Passage Retrieval Approach: Evaluation Process
Keyword-based evaluation
Both accuracy and mean reciprocal rank (MRR) improved after applying semantic query expansion (QE).
Our Passage Retrieval Approach: Evaluation Process
Structure-based evaluation
Both accuracy and MRR improved after applying semantic QE.
Compared to the keyword-based PR, the structure-based PR gives the best accuracy and MRR.
Our Passage Retrieval Approach: Evaluation Process
Summary
                      Without semantic QE        With semantic QE
Keyword-based PR      Acc. 1.22%, MRR 0.99       Acc. 7.32%, MRR 3.25
Structure-based PR    Acc. 15.85%, MRR 5.46      Acc. 19.51%, MRR 7.85
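For readers unfamiliar with the two measures, the following sketch shows one common way to compute them from per-question ranks. The slides do not say how accuracy was counted, so the cutoff `k` in the hypothetical `accuracy_at_k` is an assumption, and the ranks are made-up values, not the experimental data.

```python
from typing import List, Optional

# For each question: rank (1-based) of the first passage containing the
# answer, or None when no returned passage contains it.
Ranks = List[Optional[int]]


def mean_reciprocal_rank(ranks: Ranks) -> float:
    """Average of 1/rank over all questions; unanswered questions count as 0."""
    if not ranks:
        return 0.0
    return sum(1.0 / rank for rank in ranks if rank is not None) / len(ranks)


def accuracy_at_k(ranks: Ranks, k: int = 1) -> float:
    """Fraction of questions answered within the top k passages
    (the cutoff k is an assumption, not taken from the slides)."""
    if not ranks:
        return 0.0
    return sum(1 for rank in ranks if rank is not None and rank <= k) / len(ranks)


# Made-up ranks for four questions.
EXAMPLE_RANKS: Ranks = [1, 2, None, 4]

print(mean_reciprocal_rank(EXAMPLE_RANKS))
print(accuracy_at_k(EXAMPLE_RANKS, k=1))
```

Multiplying either value by 100 gives a percentage.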
Our Passage Retrieval Approach: The semantic reasoning level
Presentation
[Diagram: Question -> Expected Answer -> CG-EA. Each passage Pi -> sub-passage -> CGi. Generalization(CG-Pi, CG-EA) -> Semantic score (pi), shown for P1 through Pi.]
Our Passage Retrieval Approach: The semantic reasoning level
Example
TREC question: أين تقع أعلى نقطة على سطح الأرض؟ (Where is the highest point on the surface of the earth?)
>> Using the Google search engine
Our Passage Retrieval Approach: The semantic reasoning level
Example
>> Passage ranks after LEVEL 1 (keyword-based) and LEVEL 2 (structure-based)
Our Passage Retrieval Approach: The semantic reasoning level
Example
The expected answer is: تقع أعلى نقطة على سطح الأرض في ... (The highest point on the surface of the earth is located in ...)
CG-EA:
[نقطة]-
    -attr->[أعلى],
    -ala->[الأرض],
    <-agnt-[تقع]-fi->[مفهوم عام]
Glosses: نقطة = point, أعلى = highest, الأرض = the earth, تقع = is located, مفهوم عام = general concept
Our Passage Retrieval Approach: The semantic reasoning level
Example
Semantic Score Formula
\[
\mathrm{SemanticScore}(P) = \frac{\sum_{c_i \in C} \mathrm{weight}(c_i)\,\beta\bigl(c_i, \pi(c_i)\bigr)}{\sum_{c_i \in C} \mathrm{weight}(c_i)}
\]
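The formula is a weighted average, which the sketch below computes directly. The slides do not define C, weight, β or π, so the reading used here is an assumption: C is the set of concepts in the expected-answer graph, π maps each concept to its counterpart in the passage graph, and β scores that pair in [0, 1]. `toy_beta`, `RELATED` and the example values are hypothetical and do not come from the Amine Platform.

```python
from typing import Callable, Dict, List, Optional


def semantic_score(
    concepts: List[str],
    weight: Dict[str, float],
    projection: Dict[str, str],
    beta: Callable[[str, Optional[str]], float],
) -> float:
    """Weighted average of beta(c, pi(c)) over the concepts c in C."""
    total_weight = sum(weight[c] for c in concepts)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(weight[c] * beta(c, projection.get(c)) for c in concepts)
    return weighted_sum / total_weight


# Hypothetical similarity: 1.0 for identical concepts, 0.5 for pairs listed
# as related, 0.0 otherwise (including concepts with no projection).
RELATED = {("earth", "planet")}


def toy_beta(concept: str, projected: Optional[str]) -> float:
    if projected is None:
        return 0.0
    if concept == projected:
        return 1.0
    return 0.5 if (concept, projected) in RELATED else 0.0


# Made-up example loosely following the "highest point on earth" question.
EXAMPLE_CONCEPTS = ["point", "highest", "earth"]
EXAMPLE_WEIGHTS = {"point": 2.0, "highest": 1.0, "earth": 1.0}
EXAMPLE_PROJECTION = {"point": "point", "highest": "highest", "earth": "planet"}

print(semantic_score(EXAMPLE_CONCEPTS, EXAMPLE_WEIGHTS, EXAMPLE_PROJECTION, toy_beta))
```

Dividing by the total weight keeps the score between 0 and 1 whenever β stays in that range, so passages with graphs of different sizes remain comparable.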
Conclusion & Future Work
• The keyword-based and structure-based levels of our Arabic PR approach have improved the accuracy and the MRR in the context of Q/A systems
• A semantic reasoning level on top of the first and second levels could improve the performance even further
• Covering all CLEF and TREC questions
• Automating the semantic reasoning level module
• Conducting the corresponding experiments
• Integrating richer releases of Arabic WordNet
Thank you for your attention >> Questions