
Query Segmentation and Structured Annotation via NLP

Rifat Reza Joye, Panagiotis Papadimitriou. Problem: Caloricious.com, a semantic search engine for food items; free-text queries over structured data. Query: gluten free high protein bars


Presentation Transcript


  1. Query Segmentation and Structured Annotation via NLP
     Rifat Reza Joye, Panagiotis Papadimitriou

  2. Problem
     • Caloricious.com:
        • Semantic search engine for food items
        • Free-text queries over structured data
     • Query: gluten free high protein bars
     • Data: each food item is a database record with attributes name, brand, category, nutrients, allergens, ...
     • Query segmentation and structured annotation:
        [gluten free] ALLERGEN  [high protein] NUTRIENT  [bars] CATEGORY
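
To make the target output concrete, here is a minimal sketch of how an annotated query could act as a structured filter over food records. The `FoodItem` fields mirror the attributes listed on the slide, but the class itself, the `matches` helper, the "<allergen> free" and "high <nutrient>" phrase conventions, and the 10 g cut-off are hypothetical choices for illustration, not the Caloricious.com implementation.

```python
from dataclasses import dataclass, field

@dataclass
class FoodItem:
    """Hypothetical record layout mirroring the attributes on the slide."""
    name: str
    brand: str
    category: str
    nutrients: dict = field(default_factory=dict)  # nutrient -> grams per serving (assumed unit)
    allergens: set = field(default_factory=set)    # allergens the item contains

HIGH_THRESHOLD_G = 10.0  # hypothetical cut-off for "high <nutrient>"

def matches(item, annotated_query):
    """True if the item satisfies every (segment, attribute) pair.

    Phrasings other than the ones handled below add no constraint in this sketch.
    """
    for segment, attribute in annotated_query:
        words = segment.lower().split()
        if attribute == "ALLERGEN" and words[-1] == "free":
            # "gluten free": reject items that contain gluten
            if " ".join(words[:-1]) in item.allergens:
                return False
        elif attribute == "NUTRIENT" and words[0] == "high":
            # "high protein": require at least the (assumed) threshold amount
            if item.nutrients.get(" ".join(words[1:]), 0.0) < HIGH_THRESHOLD_G:
                return False
        elif attribute == "CATEGORY" and segment.lower() != item.category.lower():
            return False
        elif attribute == "BRAND" and segment.lower() != item.brand.lower():
            return False
        elif attribute == "NAME" and segment.lower() not in item.name.lower():
            return False
    return True

# Toy records (hypothetical data).
items = [
    FoodItem("Peanut Crunch", "Acme", "bars", {"protein": 15.0}, set()),
    FoodItem("Oat Bite", "Acme", "bars", {"protein": 12.0}, {"gluten"}),
    FoodItem("Fruit Chew", "Acme", "bars", {"protein": 2.0}, set()),
]
query = [("gluten free", "ALLERGEN"), ("high protein", "NUTRIENT"), ("bars", "CATEGORY")]
print([i.name for i in items if matches(i, query)])  # → ['Peanut Crunch']
```

Once each segment carries an attribute label, answering the query reduces to ordinary attribute filtering; the hard part, which the following slides address, is producing those labels from free text.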

  3. 1st Approach: MEMM with Synthetic Training Data
     • Looks like an instance of named-entity recognition (NER)
     • Problem: no labeled queries to train an MEMM
     • Solution: generate synthetic labeled queries
     • Study of 100 queries:
        • 96% of queries contain 1–3 segments
        • In 98% of queries, one of the segments refers to Name, Category, or Brand
     • Algorithm:
        • Pick a food item at random
        • Pick 1–3 of its attributes and generate a query
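
The generation algorithm above can be sketched as follows, assuming each food item has already been flattened into a dict that maps an attribute name to a list of phrases (for example `"ALLERGEN": ["gluten free"]`). Tokens are labeled with BIO tags, a common encoding for NER-style sequence taggers such as an MEMM. The tag scheme, the `synthesize_query` name, and the rule that one segment always comes from Name, Category, or Brand (motivated by the 98% figure) are assumptions of this sketch, not details from the slides.

```python
import random

# Attributes that, per the 100-query study, almost always appear in a query.
ANCHOR_ATTRS = ("NAME", "CATEGORY", "BRAND")

def synthesize_query(items, rng):
    """Return one synthetic query as a list of (token, BIO-tag) pairs.

    Assumes every item has at least one non-empty NAME/CATEGORY/BRAND list.
    """
    item = rng.choice(items)
    available = [a for a, vals in item.items() if vals]
    anchors = [a for a in available if a in ANCHOR_ATTRS]
    chosen = [rng.choice(anchors)]                 # always one anchor segment
    others = [a for a in available if a not in chosen]
    k = rng.randint(1, min(3, len(others) + 1))    # total segments: 1 to 3
    chosen += rng.sample(others, k - 1)
    rng.shuffle(chosen)                            # random segment order
    tagged = []
    for attr in chosen:
        phrase = rng.choice(item[attr])
        for i, tok in enumerate(phrase.split()):
            tagged.append((tok, ("B-" if i == 0 else "I-") + attr))
    return tagged

# Toy flattened records (hypothetical data).
items = [
    {"NAME": ["peanut crunch"], "BRAND": ["acme"], "CATEGORY": ["bars"],
     "NUTRIENT": ["high protein"], "ALLERGEN": ["gluten free"]},
    {"NAME": ["oat bites"], "BRAND": [], "CATEGORY": ["cereal"],
     "NUTRIENT": ["high fiber", "low sugar"], "ALLERGEN": []},
]
print(synthesize_query(items, random.Random(0)))
```

Each generated sequence is a fully labeled training example, so a tagger can be trained on as many of them as needed without any hand-labeled queries.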

  4. 2nd Approach: Segmentation & MaxEnt Classification
     • Query segmentation
        • Train a language model on the text of the structured data
        • Use the model to estimate segment probabilities
        • Find the maximum-likelihood (ML) segmentation through dynamic programming (DP)
        • gluten free high protein bars → [gluten free] [high protein] [bars]
     • Segment annotation
        • Annotate each segment with an attribute using a MaxEnt classifier
        • Training: for each attribute, the training examples come from the corresponding entries of the database products
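
The segmentation step can be illustrated with a small dynamic program. This sketch assumes the simplest possible language model: segments are independent, and a segment's probability is its n-gram count in the database text divided by the total number of tokens. Multi-word segments must occur in the data, and unseen single words get a small hypothetical floor count (`unseen=0.1`). The slides do not specify the actual model, so treat this as an illustration of the DP recurrence, not the authors' method.

```python
import math
from collections import Counter

def train_ngram_counts(texts, max_len=3):
    """Count every n-gram (n <= max_len) inside each database text field."""
    counts = Counter()
    total_tokens = 0
    for text in texts:
        toks = text.lower().split()
        total_tokens += len(toks)
        for i in range(len(toks)):
            for n in range(1, max_len + 1):
                if i + n <= len(toks):
                    counts[tuple(toks[i:i + n])] += 1
    return counts, total_tokens

def segment(query, counts, total_tokens, max_len=3, unseen=0.1):
    """Maximum-likelihood segmentation under an independent-segment model.

    best[j] = max over i of best[i] + log P(words[i:j]).
    """
    words = query.lower().split()
    n = len(words)
    best = [0.0] + [-math.inf] * n   # best log-probability of words[:j]
    back = [0] * (n + 1)              # start index of the last segment ending at j
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            seg = tuple(words[i:j])
            c = counts.get(seg, 0)
            if c == 0 and len(seg) > 1:
                continue              # multi-word segments must appear in the data
            logp = math.log((c if c else unseen) / total_tokens)
            if best[i] + logp > best[j]:
                best[j], back[j] = best[i] + logp, i
    segments, j = [], n
    while j > 0:                      # follow back-pointers from the end
        segments.append(" ".join(words[back[j]:j]))
        j = back[j]
    return segments[::-1]

# Toy stand-in for the text of database fields (hypothetical values).
texts = ["gluten free", "gluten free crackers", "high protein",
         "high protein shake", "protein bars", "bars", "energy bars"]
counts, total = train_ngram_counts(texts)
print(segment("gluten free high protein bars", counts, total))
# → ['gluten free', 'high protein', 'bars']
```

With these counts, P(gluten free) = 2/15 while P(gluten)·P(free) = (2/15)², so the DP keeps "gluten free" as one segment; the same comparison makes "high protein" + "bars" beat "high" + "protein bars".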

  5. Results

  6. Conclusions – Future Work
     • The combination of a language model, dynamic programming, and MaxEnt classification provides very good accuracy without labeled data
     • It would be interesting to compare with an NER model trained on a large labeled set
     • We also plan to compare with the state-of-the-art algorithm in the context of a research submission
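
The claim that no labeled queries are needed rests on the annotation step from slide 4, where each attribute's training examples are taken straight from database entries. A MaxEnt classifier is multinomial logistic regression; the sketch below trains one with stochastic gradient ascent over bag-of-words features. The feature set, learning rate, L2 strength, and toy training phrases are hypothetical, chosen only to keep the example small.

```python
import math
from collections import defaultdict

def features(text):
    """Bag-of-words features plus a bias feature (assumed feature set)."""
    return ["bias"] + ["w=" + tok for tok in text.lower().split()]

def predict_proba(weights, labels, feats):
    """Softmax over per-label linear scores: P(label | segment)."""
    scores = {lab: sum(weights.get((f, lab), 0.0) for f in feats) for lab in labels}
    top = max(scores.values())                 # subtract max for numerical stability
    exps = {lab: math.exp(s - top) for lab, s in scores.items()}
    z = sum(exps.values())
    return {lab: e / z for lab, e in exps.items()}

def train_maxent(examples, epochs=50, lr=0.5, l2=0.01):
    """Fit multinomial logistic regression by stochastic gradient ascent.

    examples: (text, attribute) pairs taken from database attribute values.
    """
    labels = sorted({y for _, y in examples})
    weights = defaultdict(float)               # key: (feature, label)
    for _ in range(epochs):
        for text, y in examples:
            feats = features(text)
            probs = predict_proba(weights, labels, feats)
            for lab in labels:
                # Gradient of the log-likelihood w.r.t. the score of `lab`.
                grad = (1.0 if lab == y else 0.0) - probs[lab]
                for f in feats:
                    # L2 shrinkage applied only to active features (a common shortcut).
                    weights[(f, lab)] += lr * (grad - l2 * weights[(f, lab)])
    return weights, labels

def annotate(weights, labels, segment):
    """Return the most probable attribute for a query segment."""
    probs = predict_proba(weights, labels, features(segment))
    return max(probs, key=probs.get)

# Toy training data standing in for database attribute values (hypothetical).
examples = [
    ("gluten free", "ALLERGEN"), ("dairy free", "ALLERGEN"), ("nut free", "ALLERGEN"),
    ("high protein", "NUTRIENT"), ("low sugar", "NUTRIENT"), ("high fiber", "NUTRIENT"),
    ("bars", "CATEGORY"), ("cereal", "CATEGORY"), ("energy bars", "CATEGORY"),
    ("acme", "BRAND"),
]
weights, labels = train_maxent(examples)
print([annotate(weights, labels, s) for s in ["gluten free", "high protein", "bars"]])
# → ['ALLERGEN', 'NUTRIENT', 'CATEGORY']
```

Feeding each segment produced by the segmentation step into `annotate` yields the structured annotation shown on slide 2, with every training example drawn from the database rather than from labeled queries.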

  7. More Results…
     • Evangelos
     • March 12, 2011 @ 9.14am
     • 19.5 inches
     • 6lbs 11oz
