
Finding Needles in the Haystack: Search and Candidate Generation

A presentation by Everett Coraor. Hypothesis generation is the process of producing possible answers to a given question: searching potential documents and passages, then creating a candidate list.


Presentation Transcript


  1. Finding Needles in the Haystack: Search and Candidate Generation
     A presentation by Everett Coraor

  2. What is Hypothesis Generation?
     • Hypothesis generation is the process of producing possible answers to a given question.
     • Search of potential documents and passages
     • Creation of candidate list
     • Scoring of candidates
     • Balance between casting a wide net and efficiency

  3. Question Analysis Overview
     • ESG (English Slot Grammar) is used for parsing
     • Each parse tree contains a headword and a list of modifiers
     • Recognition of relations such as actorIn and authorOf
     • Identification of the LAT (lexical answer type)
     • Example: “Robert Redford and Paul Newman starred in this depression-era grifter flick”
       - Relations: actorIn(Robert Redford, flick : focus) + actorIn(Paul Newman, flick : focus)

  4. Searching Unstructured Resources
     • Three different question/answer pair relationships
     • Document-based searches
       - The correct answer is the title of the justifying document
       - “This country singer was imprisoned for robbery in 1972 and pardoned by Ronald Reagan”
     • TIC (title in clue) passage searches
       - The title of the justifying document is present within the question
       - “Aleksander Kwasniewski became the president of this country in 1995”
     • The title is in neither the question nor the answer

  5. Search Query Generation
     • A full query is constructed, weighting terms in relations to the focus higher
       - (2.0 “Robert Redford”) (2.0 “Paul Newman”) star depression era grifter (1.5 flick)
     • A LAT-only query is generated to narrow the candidate answer list
       - depression era grifter flick
     • Unique entity identification
       - first 20th century US president
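The weighted full query on slide 5 can be sketched as a list of terms rendered into a query string. This is only an illustration: the `Term` class and `render_query` function are hypothetical names, and the output format simply mirrors the notation on the slide rather than any particular search engine's syntax.

```python
from dataclasses import dataclass


@dataclass
class Term:
    """One query term (hypothetical helper type, not from the slides)."""
    text: str
    weight: float = 1.0   # 1.0 means "no explicit weight"
    phrase: bool = False  # multi-word entity, quoted as a unit


def render_query(terms):
    """Render terms in the slide's notation: (weight term) for weighted terms."""
    parts = []
    for t in terms:
        body = f'"{t.text}"' if t.phrase else t.text
        parts.append(f"({t.weight} {body})" if t.weight != 1.0 else body)
    return " ".join(parts)


# Arguments of relations that involve the focus get higher weights,
# as in the slide's example.
full_query = render_query([
    Term("Robert Redford", 2.0, phrase=True),
    Term("Paul Newman", 2.0, phrase=True),
    Term("star"), Term("depression"), Term("era"), Term("grifter"),
    Term("flick", 1.5),
])

# The LAT-only query keeps just the answer-type description.
lat_query = render_query(
    [Term("depression"), Term("era"), Term("grifter"), Term("flick")]
)
```

Keeping the terms as structured objects until the last step makes it easy to emit both the full query and the LAT-only query from the same question analysis.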

  6. Document and Passage Search
     • Title-based document search
       - The Indri search engine is used
       - Separate searches for long and short documents
       - The size of the relevant document list is determined empirically, weighing candidate recall against efficiency
     • Passage search
       - Indri search engine
       - Lucene search engine
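One way to picture the recall-versus-efficiency trade-off on slide 6 is to pick the smallest result-list size that still reaches a target recall on a development set. The sketch below is an assumption about how such a choice could be made, not the method used in the presentation; the function names and the rank data are made up for illustration.

```python
def recall_at_k(ranks, k):
    """Fraction of questions whose justifying document appears in the top k.

    `ranks` holds the 1-based rank of the correct document per question,
    or None when the search never returned it.
    """
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


def smallest_k(ranks, candidates, target):
    """Smallest list size in `candidates` whose recall reaches `target`."""
    for k in sorted(candidates):
        if recall_at_k(ranks, k) >= target:
            return k
    return None  # no candidate size reaches the target


# Illustrative ranks only (not data from the presentation).
example_ranks = [1, 3, None, 12, 2]
chosen_k = smallest_k(example_ranks, [1, 5, 10, 20, 50], target=0.8)
```

A larger list can only raise recall, but every extra document adds downstream processing cost, so the smallest size that meets the target is the cheap choice.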

  7. Indri Passage Search
     • #passage[X : Y]
       - X-word window
       - Shifted Y words at a time
     • 20-word passages are scored
       - Passages containing a wider range of search terms score higher
     • Each passage is treated as a “mini-document” and scored using the document scoring system
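The sliding-window idea on slide 7 can be sketched in plain Python. This does not call Indri: `coverage_score` is a simplified stand-in that rewards passages covering more distinct query terms, and the 6-word step is an assumed value (the slide gives only the 20-word window).

```python
def passage_windows(tokens, size, step):
    """Yield (start, window): size-word windows shifted step words at a time.

    A trailing remainder shorter than one step is not covered by its own
    window in this simplified version.
    """
    if len(tokens) <= size:
        yield 0, tokens
        return
    for start in range(0, len(tokens) - size + 1, step):
        yield start, tokens[start:start + size]


def coverage_score(window, query_terms):
    """Fraction of distinct query terms present (stand-in for Indri scoring)."""
    if not query_terms:
        return 0.0
    present = {w.lower() for w in window}
    return sum(1 for q in query_terms if q in present) / len(query_terms)


def best_passage(text, query_terms, size=20, step=6):
    """Return (start, window) for the highest-scoring passage in text."""
    tokens = text.split()
    terms = sorted({q.lower() for q in query_terms})
    return max(
        passage_windows(tokens, size, step),
        key=lambda p: coverage_score(p[1], terms),
    )
```

Because every window is scored independently, each one behaves like the “mini-document” described on the slide, and overlapping windows reduce the chance that a good passage is split across a boundary.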

  8. Lucene Passage Search
     • Lucene scores each passage according to query-independent features
     • Sentence offset
       - Proximity to the beginning of the document
     • Sentence length
     • Number of named entities
       - Passages with more named entities are scored higher
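The three query-independent features on slide 8 can be sketched as follows. This is not Lucene code: the capitalized-run regex is a crude, assumed stand-in for a real named-entity recognizer, and the weights in `passage_score` are hypothetical. The slide does not say whether longer or shorter sentences are preferred, so the length weight defaults to zero.

```python
import re


def passage_features(sentence, sentence_index):
    """Compute offset, length, and a rough named-entity count for a sentence."""
    tokens = sentence.split()
    # Crude NER stand-in (assumption): runs of capitalized words, skipping
    # the first token, which is capitalized in any sentence.
    rest = " ".join(tokens[1:])
    entities = re.findall(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*", rest)
    return {
        "offset": sentence_index,   # 0 = first sentence of the document
        "length": len(tokens),
        "entities": len(entities),
    }


def passage_score(features, w_offset=1.0, w_entities=0.5, w_length=0.0):
    """Combine the features linearly; all weights are hypothetical."""
    return (
        w_offset / (1 + features["offset"])   # earlier sentences score higher
        + w_entities * features["entities"]   # more entities score higher
        + w_length * features["length"]       # direction not given on the slide
    )
```

None of these features look at the query, so they could be computed once per sentence at indexing time rather than per search.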

  9. Searching Structured Resources
     • Answer lookup
     • ???
     • PRISMATIC search
       - Large-scale lexicalized relation resource
       - Gathers aggregate statistics of syntactic or semantic relations
       - “Unlike most sea animals, in the Sea Horse this pair of sense organs can move independently of one another”

  10. Generating Candidates from Search Results
     • Structured search candidates
       - The listed word relations
     • Three methods to obtain unstructured search candidates
     • Title-of-document candidate generation
       - Titles of the relevant documents
     • Wikipedia title candidate generation
       - Extracts all noun phrases that are exclusively Wikipedia titles
     • Anchor text candidate generation
       - “Neapolitan pizzas are made with ingredients like San Marzano tomatoes, which grow on the volcanic plains south of Mount Vesuvius…”
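Wikipedia title candidate generation from slide 10 can be approximated by matching word spans in a passage against a set of known titles. The sketch below makes several assumptions: it uses greedy longest-match over word n-grams instead of real noun-phrase extraction, and the small `titles` set is a hypothetical stand-in for a full index of Wikipedia titles.

```python
import re


def title_candidates(passage, titles, max_len=4):
    """Return known titles found in passage, preferring the longest match."""
    tokens = re.findall(r"[\w']+", passage)
    lookup = {t.lower(): t for t in titles}  # case-insensitive matching
    found, i = [], 0
    while i < len(tokens):
        # Try the longest span starting at token i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n]).lower()
            if span in lookup:
                found.append(lookup[span])
                i += n  # skip past the matched title
                break
        else:
            i += 1  # no title starts at this token
    return found


# Hypothetical title set standing in for a Wikipedia title index.
titles = {"San Marzano", "Mount Vesuvius", "Naples"}
passage = (
    "Neapolitan pizzas are made with ingredients like San Marzano tomatoes, "
    "which grow on the volcanic plains south of Mount Vesuvius"
)
candidates = title_candidates(passage, titles)
```

Anchor text candidate generation could reuse the same loop by swapping the title set for the link text found in the source passage.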

  11. Empirical Results
