
ASQA: Academia Sinica Question Answering System for CLQA (IASL)

Cheng-Wei Lee, Cheng-Wei Shih, Min-Yuh Day, Tzong-Han Tsai, Tian-Jian Jiang, Chia-Wei Wu, Cheng-Lung Sung, Yu-Ren Chen, Shih-Hung Wu, Wen-Lian Hsu. Academia Sinica, Taipei. aska@iis.sinica.edu.tw



  1. ASQA: Academia Sinica Question Answering System for CLQA (IASL) • Cheng-Wei Lee, Cheng-Wei Shih, Min-Yuh Day, Tzong-Han Tsai, Tian-Jian Jiang, Chia-Wei Wu, Cheng-Lung Sung, Yu-Ren Chen, Shih-Hung Wu, Wen-Lian Hsu • Academia Sinica, Taipei • aska@iis.sinica.edu.tw

  2. Outline • Design Principles • System Architecture • Question Processing • Passage Retrieval • Answer Extraction • Answer Ranking • Performance • Conclusion

  3. The Design Principles of ASQA • Reduce cost by adopting existing components • InfoMap: a knowledge representation framework • Mencius: an NER engine • AutoTag: a Chinese word segmentation tool • Lucene: an open source IR engine • SVMLight and opennlp.maxent: machine learning packages • Minimize system complexity • Only shallow NLP techniques are used • We want to see how a Chinese QA system performs without deep NLP techniques • Incorporate human knowledge into machine learning methods • Knowledge editing tool • Knowledge as machine learning features • Knowledge as the dominant strategy

  4. Chinese Word Segmentation • Chinese text lacks explicit word boundaries • Word segmentation is a necessary step in many Chinese applications • Word segmentation tools exist, but they are not designed for QA • Combination rules are applied to form meaningful words for our QA system • Example: 第一銅鐵公司 (First Copper Iron Corp.) • Segmenter output: 第一(Neu) 銅(Na) 鐵(Na) 公司(Nc) • After combination rules: 第一(Neu) 銅鐵(Na) 公司(Nc)
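The slide does not list the actual combination rules, so the following is only a minimal sketch of what one such rule might look like. It assumes a hypothetical rule that attaches a single-character `Na` token to the `Na` token directly before it, which is enough to reproduce the 第一銅鐵公司 example; the function name and the rule itself are illustrative assumptions, not ASQA's implementation.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (surface form, POS tag)


def merge_adjacent_nouns(tokens: List[Token]) -> List[Token]:
    """Attach a single-character Na token to a directly preceding Na token.

    Hypothetical combination rule, chosen only to illustrate the idea.
    """
    merged: List[Token] = []
    for surface, tag in tokens:
        previous_is_na = bool(merged) and merged[-1][1] == "Na"
        if previous_is_na and tag == "Na" and len(surface) == 1:
            previous_surface, _ = merged[-1]
            merged[-1] = (previous_surface + surface, "Na")
        else:
            merged.append((surface, tag))
    return merged


# Segmenter output from the slide.
segmented = [("第一", "Neu"), ("銅", "Na"), ("鐵", "Na"), ("公司", "Nc")]
combined = merge_adjacent_nouns(segmented)
```

With the slide's input, `combined` holds three tokens, with 銅 and 鐵 joined into a single `Na` token.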

  5. Architecture of ASQA [architecture diagram: four modules (Question Processing, Passage Retrieval, Answer Extraction, Answer Ranking); components shown: AutoTag, InfoMap, Mencius, SVM, ME, QType Filter, Lucene; indexes built over the documents: word index, char index; data passed between modules: Segments, QFocus, QLimitations, Passages, Answer Candidates, Answers]

  6. Question Processing [architecture diagram repeated]

  7. Question Processing • Capture what the user wants • Question classification • Goal: accurately classify a Chinese question into a question type • Chinese question: 奧運的發源地在哪裡? (Where did the Olympics originate?) • Question type: Q_LOCATION|地 • QFocus analysis • Goal: capture other detailed information about the question, such as QFocus, NE, Time, and QFDescription

  8. Taxonomy of Question Types

  9. A Hybrid Approach for Chinese Question Classification • Hybrid Approach • SVM: machine learning • binary classifiers for each question type • InfoMap: knowledge representation framework • syntactic templates for classifying questions • Features for SVM QC model • Character • Character bigram • HowNet Main Definition
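As a rough illustration of the first two feature types, the sketch below extracts character and character-bigram features from a question string. The `C=` and `B=` prefixes and the function name are assumptions made for this example; the HowNet Main Definition feature is left out because it needs the HowNet lexicon, which is not reproduced here.

```python
from typing import List


def char_features(question: str) -> List[str]:
    """Return character unigram and bigram features for one question."""
    chars = [c for c in question if not c.isspace()]
    unigrams = [f"C={c}" for c in chars]
    # Each bigram pairs a character with its right-hand neighbour.
    bigrams = [f"B={a}{b}" for a, b in zip(chars, chars[1:])]
    return unigrams + bigrams


# The example question from the Question Processing slide, without punctuation.
features = char_features("奧運的發源地在哪裡")
```

A question of n characters yields n unigram features and n - 1 bigram features, which would then be mapped to a sparse vector for each binary SVM classifier.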

  10. A Hybrid Approach for Chinese Question Classification • InfoMap and SVM are integrated according to their individual advantages • The InfoMap templates for matching question types are designed for high precision • The SVM model uses the HowNet Main Definition semantic feature and has better recall • The InfoMap approach is the dominant strategy • Fall back to SVM only if no InfoMap template matches
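The dominant-strategy-with-fallback control flow can be sketched as follows. Everything concrete here is an assumption for illustration: the regular expressions stand in for InfoMap templates (which the slides do not show), `Q_PERSON` and `Q_OTHER` are hypothetical labels, and `stub_svm` is a placeholder for a trained SVM model.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical high-precision templates standing in for InfoMap templates.
TEMPLATES: List[Tuple[re.Pattern, str]] = [
    (re.compile(r"在哪裡"), "Q_LOCATION"),
    (re.compile(r"哪位|是誰"), "Q_PERSON"),
]


def match_template(question: str) -> Optional[str]:
    """Return the question type of the first matching template, if any."""
    for pattern, qtype in TEMPLATES:
        if pattern.search(question):
            return qtype
    return None


def classify(question: str, fallback: Callable[[str], str]) -> str:
    """Use template matching first; call the fallback only when nothing matches."""
    qtype = match_template(question)
    return qtype if qtype is not None else fallback(question)


def stub_svm(question: str) -> str:
    """Placeholder for the SVM classifier; always answers with a default label."""
    return "Q_OTHER"


location_type = classify("奧運的發源地在哪裡", stub_svm)
fallback_type = classify("今天天氣如何", stub_svm)
```

Putting the template matcher first means its precision decides every question it covers, while the statistical model only affects questions no template recognizes.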

  11. QFocus Analysis • QFocus analysis is a tagging problem, unlike QType classification • QFocus analysis extracts several types of information • QFocus (QF): the category name of the answers • Time (TI): time or date expressions • Named Entities (NE): PERSON, LOCATION, ORGANIZATION • QF Description (QFD): other descriptions of the answer

  12. A Hybrid Approach of QFocus Analysis • Combine syntactic rules and an ME model • ME model • Tagging problem • The ME features are context words, context POS, and previous tags • Trained on 718 tagged question sentences • Syntactic rule examples • A noun string located after “的” or “之” → QF • A noun string located before “是”, “為”, “於”, or “在” → QF • A string quoted by “「」” or “( )” → QFD
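The three syntactic rule examples can be approximated on POS-tagged tokens as in the sketch below. It is a simplification under stated assumptions: it looks at single noun tokens rather than full noun strings, treats any tag starting with `N` as a noun, handles only the 「」 quoting style, and uses illustrative POS tags for the example sentence. The function names are hypothetical.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (surface form, POS tag)

AFTER_MARKERS = {"的", "之"}             # a noun after one of these -> QF
BEFORE_MARKERS = {"是", "為", "於", "在"}  # a noun before one of these -> QF


def qfocus_candidates(tokens: List[Token]) -> List[str]:
    """Return nouns directly after 的/之 or directly before 是/為/於/在."""
    found: List[str] = []
    for i, (surface, tag) in enumerate(tokens):
        if not tag.startswith("N"):
            continue
        after_marker = i > 0 and tokens[i - 1][0] in AFTER_MARKERS
        before_marker = i + 1 < len(tokens) and tokens[i + 1][0] in BEFORE_MARKERS
        if after_marker or before_marker:
            found.append(surface)
    return found


def quoted_descriptions(question: str) -> List[str]:
    """Return strings quoted by 「」, which the rule tags as QFD."""
    return re.findall(r"「([^」]+)」", question)


# Illustrative tagging of the example question from the Question Processing slide.
tagged = [("奧運", "Nb"), ("的", "DE"), ("發源地", "Na"), ("在", "P"), ("哪裡", "Ncd")]
qf = qfocus_candidates(tagged)
qfd = quoted_descriptions("請問台灣童謠「天黑黑」是由哪位作曲家所創作")
```

In this example 發源地 satisfies both noun rules at once (it follows 的 and precedes 在), and 天黑黑 is picked up as a QF Description.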

  13. Passage Retrieval with Lucene [architecture diagram repeated]

  14. Passage Retrieval with Lucene • The required operator • Initial Query (IQ) sets quoted terms and nouns as required • Relaxed Query (RQ) does not set any term as required • The boosting operator • Quoted terms: 2 • Nouns: 1.2 • Verbs: 0.7 • Example question: 請問台灣童謠「天黑黑」是由哪位作曲家所創作? (Which composer wrote the Taiwanese children's song 「天黑黑」?) • Initial query example: +"作曲家"^1.2 +"台灣"^1.2 "創作"^0.7 +"童謠"^1.2 +"天黑黑"^2 • Relaxed query example: "作曲家"^1.2 "台灣"^1.2 "創作"^0.7 "童謠"^1.2 "天黑黑"^2 • Passage retrieval runtime workflow [flowchart, W-idx: word index, C-idx: char index; nodes: "Q by IQ with W-idx", "Q by IQ with C-idx", "Q by RQ with W-idx", "Q by RQ with C-idx"; decision "Any result?" with YES leading to Sort and End, NO leading to another query]
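The query construction and fallback loop described on this slide might be sketched as below. The query strings follow standard Lucene query syntax (`+` for required, `^` for boost). Two things are assumptions: the order in which the four query and index combinations are tried, which the flattened flowchart does not make fully clear, and the `search` callable, which is a hypothetical stand-in for a call into Lucene. Sorting of the returned passages is omitted.

```python
from typing import Callable, List, Tuple

Term = Tuple[str, str]  # (text, kind), where kind is "quoted", "noun", or "verb"

BOOSTS = {"quoted": 2, "noun": 1.2, "verb": 0.7}  # boost values from the slide
REQUIRED_IN_IQ = {"quoted", "noun"}               # kinds marked required in the IQ


def build_query(terms: List[Term], relaxed: bool) -> str:
    """Build a Lucene query string; the relaxed query marks no term as required."""
    clauses = []
    for text, kind in terms:
        prefix = "+" if (not relaxed and kind in REQUIRED_IN_IQ) else ""
        clauses.append(f'{prefix}"{text}"^{BOOSTS[kind]}')
    return " ".join(clauses)


def retrieve(terms: List[Term], search: Callable[[str, str], List[str]]) -> List[str]:
    """Try each query/index combination in turn and stop at the first with results.

    The order used here (IQ before RQ, word index before char index) is an assumption.
    """
    for relaxed in (False, True):
        for index in ("word", "char"):
            hits = search(build_query(terms, relaxed), index)
            if hits:
                return hits
    return []


terms = [("作曲家", "noun"), ("台灣", "noun"), ("創作", "verb"),
         ("童謠", "noun"), ("天黑黑", "quoted")]
initial_query = build_query(terms, relaxed=False)
relaxed_query = build_query(terms, relaxed=True)
```

For the example question, `initial_query` and `relaxed_query` reproduce the two query strings shown on the slide, with a space between every clause.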

  15. Answer Extraction [architecture diagram repeated]

  16. Answer Extraction • The top 5 passages are sent to the answer extraction module • Named entity recognition (Mencius) • PERSON, LOC, and ORG are recognized by the ME-based NER engine • Fine-grained and other coarse-grained types are identified by the taxonomy and templates in InfoMap • Answer filtering • Answers that are incompatible with the QType are filtered out • Compatibility of question and answer types is defined by a mapping table
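A mapping-table filter of the kind described here could look like the following sketch. The table entries, the `Q_PERSON` and `Q_ORGANIZATION` labels, and the candidate answers are hypothetical; the slides do not show the real mapping table.

```python
from typing import Dict, List, Set, Tuple

Candidate = Tuple[str, str]  # (answer text, named-entity type)

# Hypothetical mapping from question type to compatible answer types.
COMPATIBLE: Dict[str, Set[str]] = {
    "Q_LOCATION": {"LOCATION"},
    "Q_PERSON": {"PERSON"},
    "Q_ORGANIZATION": {"ORGANIZATION"},
}


def filter_candidates(qtype: str, candidates: List[Candidate]) -> List[Candidate]:
    """Keep only candidates whose entity type is compatible with the question type."""
    allowed = COMPATIBLE.get(qtype, set())
    return [candidate for candidate in candidates if candidate[1] in allowed]


# Illustrative candidates for a Q_LOCATION question.
candidates = [("希臘", "LOCATION"), ("顧拜旦", "PERSON"), ("國際奧會", "ORGANIZATION")]
kept = filter_candidates("Q_LOCATION", candidates)
```

In this sketch a question type missing from the table keeps no candidates; a real system might instead pass all candidates through in that case.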

  17. Answer Ranking [architecture diagram repeated]

  18. Answer Ranking • Rank answer candidates by ranking score • A ranking score is calculated according to the QFocus analysis results • Score components shown: QFocus Scores, Cue Score, NE Score

  19. System Performance of CLQA Chinese to Chinese Task

  20. Performance [Fig. 1, Fig. 2, Fig. 3, Fig. 4]

  21. Conclusions • We have demonstrated that an effective Chinese QA system can be created by • Using shallow NLP techniques • Integrating knowledge templates (InfoMap) and machine learning methods (SVM, ME) • An open source IR engine is usable for Chinese QA • In future work, we would like to include deeper NLP techniques • Parsing • Event structure/Relation
