Continued QA System Development

Arjun Bhalla, Laurel Hart, Kathleen Kamali. Overview: major restructuring of the system; enhanced query processing and analysis; things to focus on for the next deliverable.


Presentation Transcript


  1. Continued QA System Development
     Arjun Bhalla, Laurel Hart, Kathleen Kamali

  2. Overview
     • Major Restructuring of System
     • Enhanced Query Processing, Analysis
     • Things to Focus on for Next Deliverable

  3. Major Restructuring
     • The code was messy, classes were disorganized, and data flow was difficult to track
     • Split the code into smaller, more straightforward classes, and grouped code for related tasks into the same class
     • Renamed methods and changed their arguments so that data is passed in a form that is easier and more efficient to process
     • Each portion of the system now has a discernible purpose/subtask, and the flow of data is much easier to trace (good for future debugging)

  4. Enhanced Query Analysis
     • New stopword list trained on the training question corpus
     • First attempt at basic question classification: formed an array of question words and a parallel array of answer categories
       {"who", "what", "when", "where", "which", "how", "why"}
       {"person", "thing", "time", "place", "thing", "number", "reason"}
     • Mapped the arrays onto each other by index, and classified each question into a basic category according to which question word it contains
     • Additionally, used the LingPipe API to build a POS tagger that operates on a question string
       - Not yet used; in place for further experiments
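
The parallel-array classification on this slide can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class name `QuestionClassifier`, the method `classify`, and the `"unknown"` fallback are hypothetical. The slide does not say how a question word is located, so the sketch assumes whole-token matching and returns the category of the first question word encountered.

```java
import java.util.Locale;

class QuestionClassifier {
    // Parallel arrays from the slide: QUESTION_WORDS[i] maps to ANSWER_CATEGORIES[i].
    static final String[] QUESTION_WORDS =
        {"who", "what", "when", "where", "which", "how", "why"};
    static final String[] ANSWER_CATEGORIES =
        {"person", "thing", "time", "place", "thing", "number", "reason"};

    // Hypothetical fallback for questions with no recognized question word.
    static final String UNKNOWN = "unknown";

    // Returns the category of the first question word found in the question.
    // Assumption: matching is done on whole lowercase tokens, not substrings.
    static String classify(String question) {
        String[] tokens = question.toLowerCase(Locale.ROOT).split("[^a-z]+");
        for (String token : tokens) {
            for (int i = 0; i < QUESTION_WORDS.length; i++) {
                if (QUESTION_WORDS[i].equals(token)) {
                    return ANSWER_CATEGORIES[i];
                }
            }
        }
        return UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(classify("Who wrote Hamlet?"));              // person
        System.out.println(classify("In which year did the war end?")); // thing
    }
}
```

Matching on whole tokens keeps a word such as "somehow" from being read as "how"; a substring search would be simpler but would misfire in such cases.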

  5. New Answer Extraction Strategy
     • The strategy from D2 was to read the document line by line, extract ngrams of length queryLength*2, and rank them as possible answers by their similarity to the query words
     • The new strategy is to extract full sentences/paragraphs from the document and compare each against every token of the query string (except the first one, which is most likely the question word)
     • If a sentence contains all of the query words (not including the presumed question word), it is kept as a possible answer
     • As a result, the system returns full sentences containing the query words as possible answers
     • The rationale is to obtain a fully grammatical string that is clearly related to the query and is therefore more likely to contain the true answer; the ngram-extraction strategy from D2 achieved neither property
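
The sentence-level filter described on this slide might look something like the sketch below. It is an illustration under stated assumptions, not the system's implementation: the class `SentenceAnswerExtractor` and its methods are hypothetical names, sentences are split naively on end punctuation, and the query is assumed to have had stopwords removed already.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class SentenceAnswerExtractor {
    // Lowercases the text and splits it into alphanumeric tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Keeps every sentence that contains all query tokens except the first,
    // which is presumed to be the question word.
    static List<String> extractCandidates(String document, String query) {
        List<String> queryTokens = tokenize(query);
        List<String> required = new ArrayList<>();
        for (int i = 1; i < queryTokens.size(); i++) {
            required.add(queryTokens.get(i));
        }
        List<String> candidates = new ArrayList<>();
        if (required.isEmpty()) {
            return candidates; // nothing to match beyond the question word
        }
        // Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
        for (String sentence : document.split("(?<=[.!?])\\s+")) {
            Set<String> words = new HashSet<>(tokenize(sentence));
            if (words.containsAll(required)) {
                candidates.add(sentence.trim());
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        String doc = "Shakespeare wrote Hamlet around 1600. "
                   + "Marlowe wrote Faustus. Hamlet is a tragedy.";
        // Prints [Shakespeare wrote Hamlet around 1600.]
        System.out.println(extractCandidates(doc, "who wrote hamlet"));
    }
}
```

Requiring every remaining query token is strict: a sentence that uses "written" instead of "wrote" is rejected, so this filter trades recall for candidates that are always complete sentences.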

  6. Next steps
     • Indexing has not yet been a focus; it must be improved before the system as a whole can show improvement
     • For query processing, the pieces have been put into place but are not yet being used; we must find ways to connect POS tagging with the answer categorizer, and use tagging to match queries with reasonable answer strings more accurately
