Use of Patterns for Detection of Answer Strings. Soubbotin and Soubbotin
Essentials of Approach • A certain shift from deep text analysis and NLP methods to surface techniques • Use of formulas describing the structure of strings likely bearing certain semantic information
Example • FBI Director Louis Freeh • A person is represented by his/her first and last names • The person occupies a post in an organization
The formula • A word composed of capital letters • An item from a list of posts in an organization • An item from a list of first names • A capitalized word
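The four elements of this formula can be sketched as a regular expression whose middle slots are filled from word lists. This is only an illustration: the list contents and the names `POSTS`, `FIRST_NAMES`, `build_person_post_pattern` and `find_person_post` are assumptions, not taken from the slides.

```python
import re

# Hypothetical word lists; the slides mention such lists but not their contents.
POSTS = ["Director", "President", "Chairman"]
FIRST_NAMES = ["Louis", "John", "Mary"]


def build_person_post_pattern(posts, first_names):
    """Compile: CAPITALS-word, post from list, first name from list, capitalized word."""
    post_alt = "|".join(re.escape(p) for p in posts)
    name_alt = "|".join(re.escape(n) for n in first_names)
    return re.compile(
        r"\b(?P<org>[A-Z]{2,})\s+"      # a word composed of capital letters
        rf"(?P<post>{post_alt})\s+"     # an item from the list of posts
        rf"(?P<first>{name_alt})\s+"    # an item from the list of first names
        r"(?P<last>[A-Z][a-z]+)\b"      # a capitalized word
    )


PATTERN = build_person_post_pattern(POSTS, FIRST_NAMES)


def find_person_post(text):
    """Return one dict per matching string, keyed by formula element."""
    return [m.groupdict() for m in PATTERN.finditer(text)]


print(find_person_post("Yesterday FBI Director Louis Freeh testified."))
```

The named groups keep the semantic role of each element, so a match can answer either "Who is Louis Freeh?" or "Who is the director of the FBI?".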
Patterns • Formulas of such kind were called “patterns” • First used at TREC-10 QA track • Each pattern is characterized by a certain generalized semantics
Steps (Overview) • Identify strings corresponding to a formula • Identify the question terms (types) • Check for expressions negating the semantics of the found strings • Apply the set of formulas (for a particular question type) to match the strings in question-relevant passages
A Surface Approach • No need to distinguish linguistic entities • Formulas for strings look like regular expressions • But patterns include elements referring to lists of predefined words/phrases
Patterns and Question Types • Who is person X? • Who occupies post Y in organization Z? • A relationship is established between two or more entities: person, post, organization, etc. • Where-questions suggest geographical items as answers; construct formulas like: item from list of cities/towns/counties, countries/states
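A where-question formula of this kind could look like the sketch below. It assumes one particular reading of the formula (a city-like item, a comma, then a country or state); the gazetteer entries and the names `CITIES`, `COUNTRIES_OR_STATES` and `find_locations` are hypothetical.

```python
import re

# Hypothetical gazetteer lists; real lists would be far larger.
CITIES = ["Caracas", "Paris", "Springfield"]
COUNTRIES_OR_STATES = ["Venezuela", "France", "Illinois"]


def _alternation(items):
    # Longest entries first so multi-word items win over their prefixes.
    return "|".join(re.escape(i) for i in sorted(items, key=len, reverse=True))


# Assumed formula: <item from city list>, <item from country/state list>
WHERE_PATTERN = re.compile(
    rf"\b(?P<city>{_alternation(CITIES)}),\s+"
    rf"(?P<region>{_alternation(COUNTRIES_OR_STATES)})\b"
)


def find_locations(text):
    """Return (city, region) pairs for every string matching the formula."""
    return [(m.group("city"), m.group("region")) for m in WHERE_PATTERN.finditer(text)]


print(find_locations("The summit was held in Caracas, Venezuela last year."))
```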
Examples • “In what year” questions: find strings with a sequence of 4 digits • Questions regarding length, area, weight, speed, etc.: digits plus units of measurement • “What is the area of Venezuela?”: 340,569 square miles (a simple pattern match)
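Both simple patterns can be sketched in a few lines. The unit list and the names `YEAR`, `MEASURE`, `find_years` and `find_measurements` are illustrative assumptions; the slides only state "4 digits" and "digits plus units of measurement".

```python
import re

# A sequence of exactly four digits, for "In what year" questions.
YEAR = re.compile(r"\b\d{4}\b")

# Hypothetical list of units; longest first so "square miles" beats "miles".
UNITS = ["square miles", "square kilometers", "miles", "kilometers", "pounds", "mph"]
_unit_alt = "|".join(re.escape(u) for u in sorted(UNITS, key=len, reverse=True))

# Digits (optionally with thousands separators or decimals) plus a unit.
MEASURE = re.compile(
    r"\b(?P<number>\d{1,3}(?:,\d{3})*(?:\.\d+)?|\d+(?:\.\d+)?)\s+"
    r"(?P<unit>" + _unit_alt + r")\b"
)


def find_years(text):
    return YEAR.findall(text)


def find_measurements(text):
    return [(m.group("number"), m.group("unit")) for m in MEASURE.finditer(text)]


print(find_years("The bureau was founded in 1908 and renamed in 1935."))
print(find_measurements("Venezuela covers 340,569 square miles."))
```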
Complex Patterns • Strings expressing relationship between several semantic entities • The more complex a pattern is, the higher its reliability
Names and Dates • People names: items from a first-name list; capitalized words; specific name elements (bin, van, etc.); abbreviations like Sr. and Jr. • Dates: prepositions, articles, digits, month names, commas, dashes, brackets, phrases like “early,” “in the period of,” “years ago,” “B.C.”
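A rough sketch of how these building blocks might combine into name and date patterns follows. The lists are tiny placeholders and the date pattern covers only a few of the listed elements; `NAME`, `DATE`, `find_names` and `find_dates` are hypothetical names.

```python
import re

# Hypothetical lists standing in for the predefined word lists.
FIRST_NAMES = ["Louis", "Vincent", "Osama", "Martin"]
PARTICLES = ["bin", "van", "de", "von"]      # specific name elements
SUFFIXES = ["Sr.", "Jr."]                    # abbreviations
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

first = "|".join(map(re.escape, FIRST_NAMES))
particle = "|".join(map(re.escape, PARTICLES))
suffix = "|".join(map(re.escape, SUFFIXES))
month = "|".join(MONTHS)

# First name, optional particle, one or more capitalized words, optional suffix.
# The lookahead keeps "Jr."/"Sr." from being consumed as a capitalized word.
NAME = re.compile(
    rf"\b(?:{first})"
    rf"(?:\s+(?:{particle}))?"
    rf"(?:\s+(?!(?:{suffix}))[A-Z][a-z]+)+"
    rf"(?:\s+(?:{suffix}))?"
)

# A few date shapes: "July 4, 1976", "early 1990s", "40 years ago", "300 B.C."
DATE = re.compile(
    r"\b(?:"
    r"(?:" + month + r")\s+\d{1,2},\s+\d{4}"
    r"|(?:early|late)\s+\d{4}s?"
    r"|\d+\s+years\s+ago"
    r"|\d+\s+B\.C\."
    r")"
)


def find_names(text):
    return [m.group(0) for m in NAME.finditer(text)]


def find_dates(text):
    return [m.group(0) for m in DATE.finditer(text)]


print(find_names("Painter Vincent van Gogh met Martin Luther King Jr. there"))
print(find_dates("Signed on July 4, 1976, revised in the early 1990s."))
```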
Pattern-Matching Strings and Question Semantics • Check how question words are located relative to the pattern-matching string (distance, left/right side, position relative to other matching strings, etc.) • Simplicity of a pattern’s structure is compensated by complexity of rules • Without applying heuristic rules, sufficiently reliable results cannot be ensured • A rank is assigned to question words/phrases and a score is assigned to candidate answers
QA Process • Define question types for all questions • Order the questions so that those with more reliable patterns come first • Form and rank queries from question terms • Modify queries (if the score is below a threshold) • Identify pattern-matching strings (apply complex patterns first, then simple ones) • Check correlation between patterns and question semantics • Identify exact answers and calculate their scores
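A heavily simplified sketch of part of this process is given below: question typing, applying complex patterns before simple ones, and scoring candidates. Query formation, query modification and question ordering are left out, passages are assumed to be already retrieved, and every name, pattern and scoring choice here (`PATTERNS`, `classify`, `answer`, term overlap plus a pattern bonus) is a hypothetical stand-in.

```python
import re

# Hypothetical patterns per question type, listed complex first, then simple.
PATTERNS = {
    "year": [
        re.compile(r"\bin\s+(?P<answer>\d{4})\b"),  # preposition + year
        re.compile(r"\b(?P<answer>\d{4})\b"),       # any four-digit string
    ],
    "area": [
        re.compile(r"\b(?P<answer>\d{1,3}(?:,\d{3})*\s+square\s+(?:miles|kilometers))\b"),
    ],
}

STOPWORDS = {"in", "what", "was", "the", "is", "of", "a", "an"}


def classify(question):
    """Assign a question type from surface cues; None if no type applies."""
    q = question.lower()
    if q.startswith("in what year") or q.startswith("what year"):
        return "year"
    if "area of" in q:
        return "area"
    return None


def words(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def answer(question, passages):
    """Return the best-scoring pattern-matching string, or None for "no answer"."""
    qtype = classify(question)
    if qtype is None:
        return None
    terms = words(question) - STOPWORDS
    patterns = PATTERNS[qtype]
    best = None
    for passage in passages:
        overlap = len(terms & words(passage))  # crude question/passage correlation
        for level, pattern in enumerate(patterns):
            bonus = len(patterns) - level      # earlier (complex) patterns score higher
            for m in pattern.finditer(passage):
                score = overlap + bonus
                if best is None or score > best[0]:
                    best = (score, m.group("answer"))
    return best[1] if best else None


print(answer("In what year was the FBI founded?",
             ["The FBI was founded in 1908.", "Room 1234 is closed."]))
```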
Analysis of Results (TREC 2002) • Confidence-weighted score = 0.691 • 271 right answers, 209 wrong answers, 148 “no answer” • First 29 correct answers belonged to question types with highly reliable patterns • Incorrectly identified answer strings = 13.6% (excluding NIL answers)
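For context, the confidence-weighted score used in the TREC 2002 QA main task ranks a system's answers from most to least confident and averages, over all positions i, the fraction of correct answers among the first i. The sketch below computes it for a list of judgments; the function name and the toy inputs are illustrative and are not the run data reported above.

```python
def confidence_weighted_score(judgments):
    """Compute (1/Q) * sum over i of (correct answers in first i) / i.

    judgments: list of booleans, ordered from most to least confident answer.
    """
    if not judgments:
        raise ValueError("at least one judged answer is required")
    correct_so_far = 0
    total = 0.0
    for i, is_correct in enumerate(judgments, start=1):
        if is_correct:
            correct_so_far += 1
        total += correct_so_far / i
    return total / len(judgments)


# Same two correct answers, ranked first versus ranked last.
print(confidence_weighted_score([True, True, False, False]))
print(confidence_weighted_score([False, False, True, True]))
```

Because early positions weigh most, placing answers from highly reliable patterns at the top of the ranking raises the score even when the total number of correct answers is unchanged.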