1 / 1

ASQA – Academia Sinica Question Answering System

Question Processing. InfoMap. SVM. AutoTag. Passage Retrieval. Lucene. AutoTag. ASQA – Academia Sinica Question Answering System A Hybrid Architecture for Answering Chinese Factoid Questions.

taro
Download Presentation

ASQA – Academia Sinica Question Answering System

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Question Processing InfoMap SVM AutoTag Passage Retrieval Lucene AutoTag ASQA – Academia Sinica Question Answering System A Hybrid Architecture for Answering Chinese Factoid Questions Cheng-Wei Lee, Min-Yuh Day, Tzorg-Han Tsai, Tian-Jian Jiang, Cheng-Wei Shih, Chia-Wei Wu, Yu-Ren Chen, Shih-Hung Wu, Wen-Lian Hsu We propose an architecture for answering Chinese factoid questions in NTCIR CLQA C-C Task. Our system outputs exact answers for factoid questions of six types. Questions are analyzed to obtain question types, keywords, and focuses. Through a simple mapping table, question types are used to constrain possible answer types. Documents are segmented and indexed with both characters and words. After question analysis, queries are constructed from keywords to retrieve possible document passages, which are then sent to a named entity recognition system to obtain answer candidates. Finally, answers are ranked based on the needs of question focuses. Our proposed ASQA successfully combines machine learning and knowledge-based approaches, achieving approximately 40% and 50% accuracy for the Top 1 and Top 5 answers respectively. 1. Answering Factoid Questions 3. Question Analysis 2. Architecture questions When ASQA receives a question, it is analyzed by the Question Processing module to obtain question segments, question types, and question focuses. There are 6 coarse-grained question types and 62 fine-grained question types, which are identified by both InfoMap and a Support Vector Machines (SVM) model. 請問台灣童謠「天黑黑」是由哪位作曲家所創作? Answer Extraction 郭芝苑 Mencius types focus documents segments, pos answer candidates passages Answer Ranking char index word index answers 5. Answering Extraction and Ranking 4. Passage Retrievals Passages are sent to Mencius, our Chinese named entity recognition (NER) engine, to retrieve answer candidates. Mencius use maximum entropy (ME) models to discriminate coarse-grained NE types. In addition, Mencius encodes templates and knowledge in InfoMap to recognize fine-grained NEs and other coarse-grained NEs, such as TIME, NUMBER, and ARTIFACT. After filtering by a QType-to-AType mapping table, answer candidates are given ranking We split documents into small passages using punctuation marks “,!?” and index them by Lucene, a high-performance, full-featured text search engine library. We use DotLucene, the C# version of Lucene, to index our documents. Two indices are created in ASQA. One is indexed by Chinese characters; the other is by Chinese words. Queries are sent to the indices separately, and the results are combined and sorted by the scores returned by Lucene. In the query, different weights are given to keywords with different POS. If there are not sufficient returned passages of the original query, a relaxed version of query will be sent to obtain more passages. 6. Evaluation There are two metrics reported from NTCIR CLQA Organizer about our system (ASQA) that we achieve 37.5% and 44.5% accuracy of correct answers and correct+unsupported answers respectively. scores according to question focus or cue of the question. The answer candidate with the highest score is the one that fits most clues about the question and become the system output. Original Query: +"作曲家"^1.2 +"台灣"^1.2 "創作"^0.7 +"童謠"^1.2 +"天黑黑"^2 Relaxed Version: 作曲家"^1.2 "台灣"^1.2 "創作"^0.7 "童謠"^1.2 "天黑黑"^2 Acknowledgement We would like to thank CKIP for providing us AutoTag for Chinese word segmentation.

More Related