Predicting Answer Location Using Shallow Semantic Analogical Reasoning in a Factoid Question Answering System Hapnes Toba, Mirna Adriani, and Ruli Manurung Faculty of Computer Science Universitas Indonesia
What is QAS • Question answering system (QAS): • Input: a natural language question • Output: a single answer
What is Factoid QAS • Factoid QAS: • Input: an open-domain fact-based question • Output: an answer • E.g.: • Question: • “Where was an Oviraptor fossil sitting on a nest discovered?” • Answer: • ‘Mongolia’s Gobi Desert’
A Typical pipeline architecture Factoid QAS • Question analysis • Query formulation • Information retrieval • Answer selection
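To make the pipeline concrete, below is a minimal sketch of how the four stages could be chained; all function names and the naive keyword ranking are hypothetical placeholders, not the authors' implementation.

```python
def analyze_question(question: str) -> dict:
    # Question analysis: derive keywords and an expected answer type (EAT).
    return {"keywords": question.lower().rstrip("?").split(), "eat": "LOCATION"}

def formulate_query(analysis: dict) -> str:
    # Query formulation: here simply a bag of keywords.
    return " ".join(analysis["keywords"])

def retrieve(query: str, corpus: list, top_k: int = 5) -> list:
    # Information retrieval: naive keyword-overlap ranking of passages.
    return sorted(corpus, key=lambda p: -sum(w in p.lower() for w in query.split()))[:top_k]

def select_answer(passages: list, analysis: dict) -> str:
    # Answer selection: placeholder that returns the top-ranked passage.
    return passages[0] if passages else ""

question = "Where was an Oviraptor fossil sitting on a nest discovered?"
corpus = ["An Oviraptor fossil sitting on a nest was discovered in Mongolia's Gobi Desert."]
analysis = analyze_question(question)
print(select_answer(retrieve(formulate_query(analysis), corpus), analysis))
```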
A Typical pipeline architecture Factoid QAS Question analysis • Determine the type of a given question, which in turn provides the expected answer type (EAT) • E.g.: person, organization, location. • A named-entity recognizer (NER) is usually used to identify answer candidates that match the EAT
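As an illustration of EAT determination only, here is a toy rule set mapping question cues to answer types; it is not the authors' question-analysis component.

```python
# Toy mapping from question cues to an expected answer type (EAT);
# illustrative only, not the authors' classifier.
EAT_RULES = {
    "who": "PERSON",
    "where": "LOCATION",
    "when": "TIME",
    "how many": "MEASURE",
    "how much": "MEASURE",
}

def expected_answer_type(question: str) -> str:
    q = question.lower()
    for cue, eat in EAT_RULES.items():
        if q.startswith(cue):
            return eat
    return "OTHER"

print(expected_answer_type("Where was an Oviraptor fossil sitting on a nest discovered?"))  # LOCATION
```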
Semantic Analogical Reasoning (SAR) • SAR predicts the location of the final answer in a textual passage by employing the analogical reasoning (AR) framework of Silva et al. (2010). • The authors hypothesize that similar questions give similar answers.
Analogical Reasoning (AR) • AR focuses on the similarity between functions that map object pairs to links.
Analogical Reasoning (AR) • Lij ∈ {0, 1}: • an indicator of the existence of a relation between two related objects i and j. • Consider then that we also have K-dimensional vectors of features relating the objects i and j: xij = [x1 . . . xK]^T. • This vector represents the presence or absence of particular relations between the two objects.
Analogical Reasoning (AR) • Given the vector of features xij, the strength of the relation between two objects i and j is computed by logistic regression estimation as follows: P(Lij | xij, Θ) = logistic(Θ^T xij), where logistic(x) is defined as 1 / (1 + e^(-x)) and Θ is the vector of feature weights.
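A minimal numeric sketch of this link-strength computation (NumPy; the feature values and weights below are made up for illustration):

```python
import numpy as np

def logistic(x: float) -> float:
    # logistic(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

def link_probability(theta: np.ndarray, x_ij: np.ndarray) -> float:
    # P(L_ij | x_ij, Theta) = logistic(Theta^T x_ij)
    return logistic(theta @ x_ij)

theta = np.array([0.8, -0.3, 1.2])    # hypothetical learned weights
x_ij = np.array([1.0, 0.0, 1.0])      # hypothetical binary pair features
print(link_probability(theta, x_ij))  # probability that a link exists between i and j
```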
Analogical Reasoning (AR) • During the AR training phase, the framework learns the weight (prior) of each feature using the following equation:
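The training equation itself appears only as an image on the slide. Purely as an illustration, and assuming a plain maximum-likelihood logistic-regression fit stands in for the framework's actual estimator, weight learning might look like this:

```python
import numpy as np

def train_weights(X: np.ndarray, L: np.ndarray, lr: float = 0.1, epochs: int = 200) -> np.ndarray:
    # Stand-in estimator (gradient ascent on the logistic log-likelihood);
    # the AR framework's actual prior/weight estimation may differ.
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))   # predicted link probabilities
        theta += lr * X.T @ (L - p) / len(L)     # log-likelihood gradient step
    return theta

# Hypothetical training data: rows are pair feature vectors, L marks linked pairs.
X = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0], [0.0, 0.0, 1.0]])
L = np.array([1, 0, 1, 0])
print(train_weights(X, L))
```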
Analogical Reasoning (AR) • During the AR retrieval phase, a final score that indicates the rank of the predicted relation between two new objects i and j (the query) and the related objects that have been learnt in a given set S is computed as follows:
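The ranking equation is likewise only shown as an image. As a rough sketch, under the simplifying assumption that a new pair is scored by its link probability under the weights learnt from the set S, ranking candidate answer locations could look like this:

```python
import numpy as np

def retrieval_score(theta_S: np.ndarray, x_query: np.ndarray) -> float:
    # Simplified stand-in: the link probability of the query pair under the
    # weights learnt from the set S; the paper's actual ranking formula may
    # combine additional terms.
    return 1.0 / (1.0 + np.exp(-(theta_S @ x_query)))

theta_S = np.array([0.8, -0.3, 1.2])             # weights learnt from set S (hypothetical)
candidates = {"chunk_A": np.array([1.0, 0.0, 1.0]),
              "chunk_B": np.array([0.0, 1.0, 0.0])}
ranked = sorted(candidates, key=lambda c: retrieval_score(theta_S, candidates[c]), reverse=True)
print(ranked)  # candidate answer locations ordered by predicted link strength
```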
Experiments and Evaluation Objectives of the experiments: • find the importance level of the feature set • evaluate the potential of our approach to locate factoid answers in snippet and document retrieval scenarios without using any NER tool • To meet these objectives we ran two kinds of experiments.
Experiments and Evaluation • We use the question-answer pairs from the CLEF English monolingual tasks of 2006, 2007, and 2008.
Experiments and Evaluation Importance of the feature set
Experiments and Evaluation Gold Standard Snippets • Assumption: • the IR process is performed perfectly and returns the best snippet, which covers the final answer.
Experiments and Evaluation Gold Standard Snippets
Experiments and Evaluation Gold Standard Snippets: • improve TIME and MEASURE • TIME: • dd/mm/yy • dd-mmm-yy • a single year number • hh:mm a.m./p.m. • sometimes the chunker recognizes these variations as numbers or as nouns. • MEASURE: • a measurement can be written as a number (for example: “40”) or as text (“forty”)
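Purely to illustrate the surface variations listed above (not the authors' chunker or feature extractor), a few regular expressions that flag TIME and MEASURE candidates:

```python
import re

# Illustrative surface patterns for TIME and MEASURE candidates;
# not the authors' chunker or feature set.
TIME_PATTERNS = [
    r"\b\d{1,2}/\d{1,2}/\d{2,4}\b",         # dd/mm/yy(yy)
    r"\b\d{1,2}-[A-Za-z]{3}-\d{2,4}\b",     # dd-mmm-yy(yy)
    r"\b(1[0-9]{3}|20[0-9]{2})\b",          # a single year number
    r"\b\d{1,2}:\d{2}\s*(a\.m\.|p\.m\.)",   # hh:mm a.m./p.m.
]
MEASURE_PATTERNS = [
    r"\b\d+(\.\d+)?\b",                     # numeric measurement, e.g. "40"
    r"\b(one|two|three|forty|hundred)\b",   # spelled-out numbers (sample list only)
]

def tag_candidates(text: str) -> list:
    hits = [("TIME", m.group()) for p in TIME_PATTERNS for m in re.finditer(p, text)]
    hits += [("MEASURE", m.group()) for p in MEASURE_PATTERNS for m in re.finditer(p, text, re.I)]
    return hits

print(tag_candidates("The expedition returned on 12-Jul-1923 with forty fossils."))
```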
Experiments and Evaluation Gold Standard Snippets • ADVP = Adverb phrase • NP = Noun phrase • PP = Prepositional phrase • O = Begin/end of a sentence or a coordinating conjunction
Experiments and Evaluation Indri Document Retrieval • In a real situation, we will not have any information about the semantic chunk of the final answer. • We assume that the best pair (i.e. the top-1 pair after the re-ranking process) of the AR answer features will supply us with that information.
Experiments and Evaluation Indri Document Retrieval • We performed the IR process using the Indri search engine to retrieve the top-5 documents and passed them on to OpenEphyra and our system. • We use the same AR feature set as in the first experiment, • but only the question feature set. • Due to the lack of answer features, we need to adjust the re-ranking process.
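A rough, hypothetical sketch of the adjusted re-ranking (question-side AR features only, no answer features or NER; not the authors' exact procedure):

```python
import numpy as np

def rerank_chunks(theta_q: np.ndarray, chunk_features: dict) -> list:
    # Re-rank candidate answer chunks using question-side AR features only;
    # hypothetical stand-in for the adjusted re-ranking step.
    score = lambda x: 1.0 / (1.0 + np.exp(-(theta_q @ x)))
    return sorted(chunk_features, key=lambda c: score(chunk_features[c]), reverse=True)

theta_q = np.array([0.5, 1.1, -0.2])                 # weights for question features only (hypothetical)
chunks = {"doc1_chunk3": np.array([1.0, 1.0, 0.0]),  # hypothetical feature vectors of chunks
          "doc4_chunk1": np.array([0.0, 1.0, 1.0])}  # taken from the top-5 retrieved documents
print(rerank_chunks(theta_q, chunks))                # best chunk first -> predicted answer location
```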
Experiments and Evaluation Indri Document Retrieval
Experiments and Evaluation Indri Document Retrieval • ADVP = Adverb phrase • NP = Noun phrase • PP = Prepositional phrase • O = Begin/end of a sentence or a coordinating conjunction
Experiments and Evaluation Indri Document Retrieval
Experiments and Evaluation Indri Document Retrieval
Conclusion • In this paper we have shown that by learning the analogical linkages of question-answer pairs we can predict the location of factoid answers within a given snippet or document. • Our approach achieves very good accuracy on the OTHER answer type.