
Data Frame Augmentation of Free Form Queries for Constraint Based Document Filtering


Presentation Transcript


  1. Data Frame Augmentation of Free Form Queries for Constraint Based Document Filtering
     Andrew Zitzelberger

  2. Problem

  3. Constraint Based Queries

  4. Queries
     Test Queries
      1) Find me a Wii game.
      2) Find me a Honda for under 15 thousand dollars.
      3) Roller Coaster more than 150 feet high
      4) mountains at least 15K feet
      5) games under $25
      6) mountains less than 4 km
      7) ps games < $40
      8) coasters longer than 1000 feet
      9) car for under 5 grand newer than 1990 with less than 115K miles
     10) more than 15K miles under 5 grand newer than 2004

  5. Keywords + Semantics
     • Semantic queries are computationally expensive
     • Keyword queries are fast and simple
     • People are used to keyword queries
     • Synergistic solution:
        • extract numerical constraints from the query
        • use keywords to quickly narrow the search space
        • use constraints as a filter

  6. Data Frames
     Price
         internal representation: Double
         external representation: \$[1-9]\d{0,2}(,\d{3})*|...
         ...
         right units: (K)?\s*(cents|dollars|[Gg]rand|...)
         canonicalization method: toUSDollars
         comparison methods:
             LessThan(p1: Price, p2: Price) returns (Boolean)
             external representation: (less than|<|under|...)\s*{p2}|...
             ...
     end
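A data frame bundles recognizers, a canonicalization method, and comparison methods for one kind of value. The following is a minimal Python sketch of the Price frame, not the presenter's implementation. The names `PRICE`, `UNIT_FACTOR`, `to_us_dollars`, `less_than`, and `less_than_bound` are hypothetical, and the recognizer is loosened relative to the slide: the unit words "thousand dollars" and "bucks" are assumed stand-ins for the elided "..." alternatives.

```python
import re

# Recognizer for price mentions. The "$" form and the unit words
# cents/dollars/grand follow the slide; "thousand dollars" and "bucks"
# are assumptions standing in for the elided "..." alternatives.
PRICE = re.compile(
    r"\$\s*(?P<dollars>\d[\d,]*(?:\.\d+)?)"
    r"|(?P<number>\d[\d,]*(?:\.\d+)?)\s*(?P<k>[Kk])?\s*"
    r"(?P<unit>cents|dollars|thousand dollars|[Gg]rand|bucks)"
)

# Multiplier that converts each recognized unit to US dollars.
UNIT_FACTOR = {
    "cents": 0.01,
    "dollars": 1.0,
    "bucks": 1.0,
    "grand": 1000.0,
    "thousand dollars": 1000.0,
}


def to_us_dollars(text):
    """Canonicalize the first price mention in text to a float, or None."""
    m = PRICE.search(text)
    if m is None:
        return None
    if m.group("dollars"):
        return float(m.group("dollars").replace(",", ""))
    value = float(m.group("number").replace(",", ""))
    if m.group("k"):
        value *= 1000
    return value * UNIT_FACTOR[m.group("unit").lower()]


# Comparison recognizer: a "less than" phrase followed by a price (p2).
LESS_THAN = re.compile(r"(?:less than|<|under)\s*(?P<p2>" + PRICE.pattern + r")")


def less_than(p1, p2):
    """Comparison method: both arguments are canonical dollar amounts."""
    return p1 < p2


def less_than_bound(text):
    """Return the canonical bound p2 of a 'less than' phrase, or None."""
    m = LESS_THAN.search(text)
    return None if m is None else to_us_dollars(m.group("p2"))
```

Under these assumptions, `less_than_bound("car under 6 grand")` yields the bound for the condition (Price < 6000), and `less_than(to_us_dollars("$4,995"), 6000.0)` checks one extracted value against it.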

  7. Data Frame Library

  8. Free Form Query • Car under 6 grand newer than 1990 with less than 115K miles

  9. Step 1: Condition Extraction
     • Car under 6 grand newer than 1990 with less than 115K miles
     • Extracted Conditions
        • (Price < 6000)
        • (Year > 1990)
        • (Distance < 115000)
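The extraction step can be illustrated with a single combined regular expression. This is a rough sketch under stated assumptions, not the data frame library itself: `CONDITION`, `OPS`, `UNITS`, and `extract_conditions` are hypothetical names, only three attribute types are covered, and a number with no unit and no "newer/older than" phrase is skipped as ambiguous.

```python
import re

# Comparison phrase, then a number with optional "$", "K" suffix, and unit.
CONDITION = re.compile(
    r"(?P<op>less than|more than|newer than|older than|under|over|<|>)\s*"
    r"(?P<dollar>\$)?(?P<num>\d[\d,]*)\s*(?P<k>[Kk])?\s*"
    r"(?P<unit>grand|dollars|bucks|miles)?",
    re.IGNORECASE,
)

# Comparison phrase -> operator symbol.
OPS = {
    "less than": "<", "under": "<", "<": "<", "older than": "<",
    "more than": ">", "over": ">", ">": ">", "newer than": ">",
}

# Unit word -> (attribute, multiplier into the canonical unit).
UNITS = {
    "grand": ("Price", 1000),
    "dollars": ("Price", 1),
    "bucks": ("Price", 1),
    "miles": ("Distance", 1),
}


def extract_conditions(query):
    """Return a list of (attribute, operator, value) found in the query."""
    conditions = []
    for m in CONDITION.finditer(query):
        op = m.group("op").lower()
        value = int(m.group("num").replace(",", ""))
        if m.group("k"):
            value *= 1000
        unit = (m.group("unit") or "").lower()
        if op in ("newer than", "older than"):
            attribute, factor = "Year", 1
        elif unit in UNITS:
            attribute, factor = UNITS[unit]
        elif m.group("dollar"):
            attribute, factor = "Price", 1
        else:
            continue  # bare number without a unit: ambiguous, so skip it
        conditions.append((attribute, OPS[op], value * factor))
    return conditions


print(extract_conditions(
    "Car under 6 grand newer than 1990 with less than 115K miles"
))
```

For the query on this slide, the sketch yields the same three conditions listed above: Price < 6000, Year > 1990, and Distance < 115000.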

  10. Step 2: Remove Condition Values • Car under newer than with less than

  11. Step 3: Remove Stopwords • Car
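Steps 2 and 3 reduce the free form query to its keywords. A minimal sketch, assuming a simple numeric-value pattern and a small hand-picked stopword list (both hypothetical; the presentation does not give the actual lists):

```python
import re

# A number with optional "$", "K" suffix, and unit word (assumed unit list).
VALUE = re.compile(
    r"\$?\d[\d,]*\s*[Kk]?\s*(?:grand|dollars|bucks|miles|feet|ft|km)?\b",
    re.IGNORECASE,
)

# Assumed stopword list; it includes the comparison words left behind
# once the condition values have been removed.
STOPWORDS = {
    "a", "an", "the", "for", "me", "find", "with", "than",
    "under", "over", "less", "more", "newer", "older", "at", "least",
}


def remove_condition_values(query):
    """Step 2: drop numeric values and their units, keep everything else."""
    return " ".join(VALUE.sub(" ", query).split())


def remove_stopwords(text):
    """Step 3: drop stopwords, leaving the keywords for the search engine."""
    return " ".join(w for w in text.split() if w.lower() not in STOPWORDS)


query = "Car under 6 grand newer than 1990 with less than 115K miles"
reduced = remove_condition_values(query)
print(reduced)                    # Car under newer than with less than
print(remove_stopwords(reduced))  # Car
```

The two printed strings match the intermediate results shown on the Step 2 and Step 3 slides.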

  12. Step 4: Perform Keyword Search

  13. Step 5: Filter Document on Constraints • Keep page if every constraint is satisfied by at least one extracted value
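The filter rule stated on this slide can be written as a nested all/any test. The sketch below assumes that conditions are (attribute, operator, bound) tuples and that each page has already been reduced to a dictionary of extracted values per attribute; the names `COMPARE` and `keep_page` are hypothetical.

```python
import operator

# Operator symbol -> comparison function.
COMPARE = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def keep_page(constraints, extracted):
    """Keep a page if every constraint is met by at least one extracted value.

    constraints: list of (attribute, operator, bound) tuples
    extracted:   dict mapping attribute name -> list of values on the page
    """
    return all(
        any(COMPARE[op](value, bound) for value in extracted.get(attribute, []))
        for attribute, op, bound in constraints
    )


constraints = [("Price", "<", 6000), ("Year", ">", 1990), ("Distance", "<", 115000)]
page = {"Price": [4500.0], "Year": [1998], "Distance": [98000, 250000]}
print(keep_page(constraints, page))  # True
```

Note that with an empty constraint list the function keeps every page, so a query without usable conditions falls back to plain keyword search in this sketch.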

  14. Experimental Setup
      • 300 web documents
         • 100 car+trucks pages from http://provo.craigslist.org
         • 100 video gaming pages from http://provo.craigslist.org
         • 50 mountain pages from http://en.wikipedia.org
         • 50 roller coaster pages from http://en.wikipedia.org
      • 10 queries
         • 8 with usable conditions
      • 2 data sets
         • test-development
         • blind test

  15. Results Summary

      Precision@3/Query Type   Keyword Queries   Reduced Queries   Data Frame Augmented Queries
      Dev-Test Queries         33%               40%               60%
      Blind-Test Queries       50%               46%               63%
      Overall                  42%               43%               62%

      • Precision increase for 56% of queries
         • 75% for test-dev, 50% for blind-test
      • Precision never worse than keyword query
      • Most effective for short, focused documents
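Precision@3 is the fraction of the top three returned documents that are relevant. A minimal sketch, assuming relevance judgments are given as a ranked list of booleans and that the divisor stays at 3 even when fewer documents are returned (an assumption; the slides do not say how short result lists were scored). The name `precision_at_k` is hypothetical.

```python
def precision_at_k(ranked_relevance, k=3):
    """Fraction of the top-k ranked results that are relevant.

    ranked_relevance: list of booleans, best-ranked result first.
    Results missing from a short list count as not relevant.
    """
    top = ranked_relevance[:k]
    return sum(1 for relevant in top if relevant) / k


# Two of the first three results are relevant; the fourth is ignored.
print(round(precision_at_k([True, False, True, True]), 2))  # 0.67
```

With k = 3 the only possible scores are 0.00, 0.33, 0.67, and 1.00, which is why the per-query result tables contain only those values.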

  16. Discussion
      • Issues:
         • inadequate narrowing or ranking of search space
         • noise caused by other numbers
            • Distance < 115000

  17. Future Work
      • Scalability
      • Indexing data frame extracted terms
      • Precision vs Recall trade-offs
      • Pay-as-you-go search construction

  18. Related Work
      • Question-Answering Systems
      • Keyword search over databases and semantic stores

  19. Questions?

  20. Results (Test-Dev Set)

      Query                                                            Keyword   Condition Removed Keyword   Data Frame Augmentation
      Find me a Wii game.                                              0.33      0.33                        0.33
      Find me a Honda for under 15 thousand dollars.                   0.67      1.00                        1.00
      roller coaster more than 150ft high                              0.33      0.33                        0.67
      mountains at least 15K ft                                        1.00      0.67                        1.00
      games under $25                                                  0.00      0.33                        0.67
      mountains less than 4 km                                         0.00      0.00                        0.33
      ps games < 40 bucks                                              0.33      0.00                        0.33
      coasters longer than 1000 feet                                   0.33      1.00                        1.00
      car for under 6 grand newer than 1990 with less than 115K miles  0.33      0.33                        0.67
      more than 15K miles under 10 grand newer than 2000               0.00      0.00                        0.00
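As a consistency check, averaging each column of the Test-Dev table reproduces the Dev-Test row of the results summary (33%, 40%, 60%). The sketch below only copies the per-query values from the table; the names `test_dev`, `mean_precision`, and `averages` are hypothetical.

```python
# Per-query Precision@3 values copied from the Test-Dev results table,
# in the same query order as the table rows.
test_dev = {
    "Keyword": [0.33, 0.67, 0.33, 1.00, 0.00, 0.00, 0.33, 0.33, 0.33, 0.00],
    "Condition Removed Keyword": [0.33, 1.00, 0.33, 0.67, 0.33, 0.00, 0.00, 1.00, 0.33, 0.00],
    "Data Frame Augmentation": [0.33, 1.00, 0.67, 1.00, 0.67, 0.33, 0.33, 1.00, 0.67, 0.00],
}


def mean_precision(scores):
    """Arithmetic mean of the per-query scores."""
    return sum(scores) / len(scores)


averages = {name: round(mean_precision(s), 2) for name, s in test_dev.items()}
for name, value in averages.items():
    print(f"{name}: {value:.2f}")
```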

  21. Results (Blind Test Set)

      Query                                                            Keyword   Condition Removed Keyword   Data Frame Augmentation
      Find me a Wii game.                                              0.67      0.67                        0.67
      Find me a Honda for under 15 thousand dollars.                   0.67      1.00                        1.00
      roller coaster more than 150ft high                              0.67      0.67                        0.67
      mountains at least 5K ft                                         0.33      0.33                        0.67
      games under $25                                                  0.67      0.67                        1.00
      mountains less than 4 km                                         0.00      0.00                        0.00
      ps games < 40 bucks                                              0.33      0.33                        0.33
      coasters longer than 1000 feet                                   0.67      0.67                        0.67
      car for under 6 grand newer than 1990 with less than 115K miles  0.67      0.00                        1.00
      more than 15K miles under 10 grand newer than 2000               0.33      0.33                        0.33
