HyKSS: Hybrid Keyword and Semantic Search. Andrew Zitzelberger. 1. Keyword Search. 2. Form Based Search. 3. What about? over 8,000 meters in elevation, less than 100K miles, faster than 100 mph. 4. 5. HyKSS. Hybrid Keyword and Semantic Search
HyKSS: Hybrid Keyword and Semantic Search Andrew Zitzelberger 1
What about? • over 8,000 meters in elevation • less than 100K miles • faster than 100 mph 4
HyKSS • Hybrid Keyword and Semantic Search • Semantics – extracted annotations • Multiple ontologies • Keywords – text 6
Thesis Statement • HyKSS (hybrid search) • Outperforms keyword and semantic search • Dynamic query weighting outperforms various other hybrid search approaches • Allows queries over multiple ontologies • Allows pay-as-you-go improvement 7
Indexing Architecture • Document Collection → Keyword Indexer → Keyword Index • Document Collection → Semantic Indexer → Semantic Index 10
Indexing Architecture Implementation • Document Collection • Keyword Indexer • Semantic Indexer • Keyword Index • Semantic Index • Components: Ontology Library, Lucene, OntoES, Sesame 11
Query Processing • Free Form Query • Keyword Processing: Pre-Process Query → Execute Query → Post-Process Query • Semantic Processing: Pre-Process Query → Execute Query → Post-Process Query • Combine Results 12
Keyword Query Pre-Processing • Remove Lucene special characters (except quotes) • Remove (inequality) comparison constraints • Remove non-phrase stopwords • Example: hondas in "excellent condition" in orem for under 12 grand → hondas "excellent condition" orem 13
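The three pre-processing steps can be sketched in Python as follows. This is a minimal illustration, not the HyKSS implementation: the stopword list, the regular expression that detects comparison constraints, and the function name `preprocess_keyword_query` are all assumptions made for the example (in HyKSS the constraints are presumably recognized by the ontologies rather than by a fixed pattern).

```python
import re

# Lucene query-syntax characters, with the double quote deliberately left out
# so that phrases survive. (Character list is an approximation.)
LUCENE_SPECIALS = re.compile(r'[+\-!(){}\[\]^~*?:\\/&|]')

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "in", "for", "of", "on", "at", "to", "and", "or"}

# Hypothetical pattern for inequality constraints such as "under 12 grand".
COMPARISON = re.compile(
    r'\b(?:under|over|less than|more than|faster than)\s+'
    r'\d[\d,.]*k?(?:\s+(?:grand|miles|mph|meters))?',
    re.IGNORECASE,
)

def preprocess_keyword_query(query: str) -> str:
    """Return the keyword part of a free-form query."""
    query = query.replace("\u201c", '"').replace("\u201d", '"')  # normalize curly quotes
    query = COMPARISON.sub(" ", query)        # drop comparison constraints
    query = LUCENE_SPECIALS.sub(" ", query)   # drop special characters, keep quotes
    # Split into quoted phrases and bare words.
    tokens = re.findall(r'"[^"]*"|[^\s"]+', query)
    # Stopwords are removed only when they occur outside a quoted phrase.
    kept = [t for t in tokens if t.startswith('"') or t.lower() not in STOPWORDS]
    return " ".join(kept)

print(preprocess_keyword_query(
    'hondas in "excellent condition" in orem for under 12 grand'))
```

Removing the constraint before the stopwords keeps words such as "under" from being treated as ordinary keywords.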
Keyword Query Execution and Post-Processing • Executed by Lucene • Empty Post-Processing step 14
Semantic Query Pre-Processing: Individual Ontology Scoring • Example query: hondas in "excellent condition" in orem for under 12 grand 15
Semantic Query Pre-Processing: Ontology Set Creation • For each ontology sorted by score: • For each remaining ontology: • Add point for each new or subsuming match • If added points > 0 add ontology • Completely subsumed ontologies are removed during query generation 16
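One possible reading of this loop is sketched below. The slide does not define how matches are represented, so the sketch assumes each ontology's matches are character spans in the query. It treats a match as "new" if it overlaps nothing already covered and as "subsuming" if it strictly contains a covered span. The function names, the span offsets, and the scores are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]  # [start, end) character offsets in the query (assumed representation)

def added_points(candidate: Set[Span], covered: Set[Span]) -> int:
    """One point for each candidate match that is new or subsumes a covered match."""
    points = 0
    for (s, e) in candidate:
        is_new = all(e <= cs or s >= ce for (cs, ce) in covered)
        subsumes = any(s <= cs and ce <= e and (s, e) != (cs, ce)
                       for (cs, ce) in covered)
        if is_new or subsumes:
            points += 1
    return points

def build_set(seed: str, ranked: List[str],
              matches: Dict[str, Set[Span]],
              scores: Dict[str, float]) -> Tuple[List[str], float]:
    """Grow an ontology set from a seed, adding ontologies that contribute points."""
    members, covered, total = [seed], set(matches[seed]), scores[seed]
    for other in ranked:
        if other == seed:
            continue
        pts = added_points(matches[other], covered)
        if pts > 0:
            members.append(other)
            covered |= matches[other]
            total += pts
    return members, total

def best_ontology_set(matches: Dict[str, Set[Span]],
                      scores: Dict[str, float]) -> Tuple[List[str], float]:
    """Try every ontology as seed (highest score first) and keep the best set."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return max((build_set(seed, ranked, matches, scores) for seed in ranked),
               key=lambda result: result[1])

# Hypothetical matches for: hondas in "excellent condition" in orem for under 12 grand
matches = {
    "Vehicle": {(0, 6), (44, 58)},        # "hondas", "under 12 grand"
    "ContractualServices": {(44, 58)},    # "under 12 grand"
    "Location": {(35, 39)},               # "orem"
}
scores = {"Vehicle": 2.0, "ContractualServices": 1.0, "Location": 1.0}
print(best_ontology_set(matches, scores))
```

With these made-up inputs, Location contributes one new match to the Vehicle set, while ContractualServices contributes nothing new and is left out.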
Semantic Query Pre-Processing: Ontology Set Creation • Vehicle Location Price < 12000 US_City="orem" Vehicle Price < 12000 Vehicle_Score + 1 • ContractualServices Location ContractualServices Price < 12000 US_City="orem" ContractualServices_Score + 1 Vehicle_Score 17
Semantic Query Pre-Processing: Structured Query Generation • Open world assumption • SPARQL query 18
Semantic Query Execution and Post-Processing • Sesame query execution • Semantic ranking: • 1 point for each requested projection satisfied • Normalized by # of projections requested • Example: hondas in "excellent condition" in orem for under 12 grand • Projections on Make, Price and US_City 19
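The semantic ranking rule can be illustrated with a short Python sketch. It assumes a query result is a mapping from projection name to the extracted value, with missing values absent or `None`. The function name and the sample record are hypothetical.

```python
from typing import Mapping, Optional, Sequence

def semantic_score(requested: Sequence[str],
                   result: Mapping[str, Optional[str]]) -> float:
    """One point per requested projection the result satisfies, normalized to [0, 1]."""
    if not requested:
        return 0.0  # assumption: no projections requested means no semantic evidence
    satisfied = sum(1 for name in requested if result.get(name) is not None)
    return satisfied / len(requested)

# Projections from the example query; the record itself is made up.
requested = ["Make", "Price", "US_City"]
record = {"Make": "Honda", "Price": "9500"}   # no US_City value
print(semantic_score(requested, record))      # two of three projections satisfied
```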
Hybrid Query Processing • Linear interpolation: • (kw_weight * kw_score) + (sm_weight * sm_score) • Dynamic solution: • # keywords remaining (#kw) • concept match score (cms) = ½ * (selections + projections) • kw_weight = #kw/(#kw + cms) • sm_weight = cms/(#kw + cms) 20
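The dynamic weighting formulas translate directly into code. The sketch below follows the slide's definitions. The equal-weight fallback for a query with neither keywords nor concept matches is an assumption, and the input counts in the usage lines are made up.

```python
from typing import Tuple

def dynamic_weights(num_keywords: int, selections: int,
                    projections: int) -> Tuple[float, float]:
    """Return (kw_weight, sm_weight) from the remaining keywords and concept matches."""
    cms = 0.5 * (selections + projections)   # concept match score
    total = num_keywords + cms
    if total == 0:
        return 0.5, 0.5  # assumption: the slide does not cover the empty case
    return num_keywords / total, cms / total

def hybrid_score(kw_score: float, sm_score: float,
                 kw_weight: float, sm_weight: float) -> float:
    """Linear interpolation of the keyword and semantic scores."""
    return kw_weight * kw_score + sm_weight * sm_score

# Made-up counts: 2 remaining keywords, 1 selection, 3 projections.
kw_w, sm_w = dynamic_weights(num_keywords=2, selections=1, projections=3)
print(kw_w, sm_w)
print(hybrid_score(kw_score=0.8, sm_score=0.4, kw_weight=kw_w, sm_weight=sm_w))
```

Because the two weights always sum to 1, a query dominated by recognized constraints leans on the semantic score, and a query made mostly of unrecognized words leans on the keyword score.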
Basic Search 21
Experimental Setup – Ontology Libraries • 5 Ontology Levels • Number • Generic Units • Vehicle Units • Vehicle • Vehicle+ 25
Experimental Setup – Query Sets • 113 syntactically unique queries from database students • 60 syntactically unique queries from linguistics students 26
Experimental Setup – Document Collection • 250 vehicle advertisements (Craigslist) • 100 training, 50 validation, 100 test • 318 mountain pages (Wikipedia) • 66 roller coaster pages (Wikipedia) • 88 video game advertisements (Craigslist) 27
Experiments • Training queries over test vehicle documents • Test queries over test vehicle documents • Training queries over test vehicle documents + additional noise • Test queries over test vehicle documents + additional noise • 5 queries over noisy data (Generic Units only) 28
Experiments - Metric • Mean Average Precision 29
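For reference, mean average precision can be computed as in the sketch below, which uses the standard definition of the metric. The document identifiers and relevance judgments are made up and are not taken from the experiments.

```python
from typing import List, Sequence, Set, Tuple

def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@k over the ranks k at which a relevant document appears."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average of the per-query average precision values."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Two made-up queries, each with a ranked result list and a set of relevant documents.
runs = [
    (["d1", "d2", "d3"], {"d1", "d3"}),
    (["d2", "d1"], {"d1"}),
]
print(mean_average_precision(runs))
```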
Conclusions • Hybrid search outperforms keyword and semantic search • HyKSS's dynamic query weighting approach outperforms various other weighting techniques • Using multiple ontologies does not outperform selecting and using a single ontology 33
External Image Citations • Slide 2 Google search screenshot: http://www.google.com (07/30/11) • Slide 3 partial car search form screenshots: http://autotrader.com/fyc (07/30/11) • Slide 4 mountain image: http://en.wikipedia.org/wiki/Lhotse (04/26/11) • Slide 4 car image: http://en.wikipedia.org/wiki/Honda (04/26/11) • Slide 4 roller coaster image: http://en.wikipedia.org/wiki/Kingda_Ka (04/26/11) • Slide 4 Wikipedia logo: http://en.wikipedia.org/wiki/Main_Page (04/26/11) • Slide 4 craigslist logo: http://provo.craigslist.org/ (04/26/11) 34