Query Models. CSCI 572: Information Retrieval and Search Engines Summer 2010. Outline. Discovering your data General Approaches Forms-based (fielded) Facet/Guided Navigation Free-text Advanced approaches Clustering/Concept Map Geospatial/Local Search What’s out there.
Query Models CSCI 572: Information Retrieval and Search Engines Summer 2010
Outline • Discovering your data • General Approaches • Forms-based (fielded) • Facet/Guided Navigation • Free-text • Advanced approaches • Clustering/Concept Map • Geospatial/Local Search • What’s out there
So, you’ve got your data indexed • …what do you do with it? • [Figure labels: Free-text; Facet/Guided]
Another example • [Figure label: Facets]
Yet another example • Fill out form fields and click submit
Generic Query Models • Forms-based • Initially, forms-based approaches were extremely common because of their appeal to a particular domain • They usually expose the interesting parts of the data that are known only to that domain's experts • In other words, the users know what they are interested in searching for • Type specificity • If it’s a date, show a calendar; if it’s a number, limit its input box size; etc.
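The type-specificity rule on this slide can be sketched as a lookup from a field's declared type to an input widget. This is only an illustration: the `Field` class, the `widget_for` function, the type names, and the 4-character limit are all assumptions made for the example, not part of any particular forms toolkit.

```python
from dataclasses import dataclass


@dataclass
class Field:
    """Hypothetical description of one indexed field."""
    name: str
    type: str  # assumed vocabulary: "date", "integer", "string"


def widget_for(field: Field) -> dict:
    """Pick a form widget from the field's declared type."""
    if field.type == "date":
        # Dates get a calendar picker instead of a raw text box.
        return {"name": field.name, "widget": "calendar"}
    if field.type == "integer":
        # Numbers get a short text box; 4 characters is an arbitrary limit.
        return {"name": field.name, "widget": "text", "maxlength": 4}
    # Everything else falls back to an unrestricted text box.
    return {"name": field.name, "widget": "text"}


form = [widget_for(f) for f in (Field("published", "date"),
                                Field("year", "integer"),
                                Field("title", "string"))]
```

Because the form is generated from the field types, the user never has to state what kind of value they are entering; the form already knows.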
Generic Query Models • Free-text • Google popularized free-text search • Though plenty of other companies have offered it since the 1990s • Recall the lecture on Search Engines and the evolution of the web • Type specificity • Free-text search is harder, since it’s difficult to detect parameter types • “cars 2006” • Is the user searching for the movie with the title field “cars 2006”, or is the user looking for cars with the year field (a YYYY formatted integer) set to 2006?
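The “cars 2006” ambiguity can be made concrete with a small sketch that lists the competing readings of a free-text query. The `interpretations` function, the field names `title`, `text`, and `year`, and the four-digit-year pattern are assumptions made for illustration; a real engine would rank the readings rather than just enumerate them.

```python
import re

# Assumption: a "year" is any four-digit token starting with 19 or 20.
YEAR = re.compile(r"(19|20)\d{2}")


def interpretations(query: str) -> list:
    """Enumerate plausible fielded readings of a free-text query."""
    # Reading 1: the whole string is a title.
    readings = [{"title": query}]
    tokens = query.split()
    years = [t for t in tokens if YEAR.fullmatch(t)]
    if years:
        # Reading 2: the year-like token is a typed value, the rest is text.
        rest = " ".join(t for t in tokens if t not in years)
        readings.append({"text": rest, "year": int(years[0])})
    return readings


print(interpretations("cars 2006"))
# [{'title': 'cars 2006'}, {'text': 'cars', 'year': 2006}]
```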
Generic Query Models • Facet-based • Also called “guided navigation”, the model was popularized by eBay and Yahoo! early on, and then by Google and others later • The data is “bucketed” into groups, which are essentially views into the value space of indexed attributes • Example: index 4 documents, with an “author” field • Doc1, author=Chris • Doc2, author=Sam • Doc3, author=Sam • Doc4, author=Bob • Resulting facet: author: Chris (1), Sam (2), Bob (1)
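The four-document example can be reproduced in a few lines. This is a minimal sketch: the documents are held in a plain Python list rather than a real index, and the helper names `facet_counts` and `refine` are made up for the example.

```python
from collections import Counter

docs = [
    {"id": "Doc1", "author": "Chris"},
    {"id": "Doc2", "author": "Sam"},
    {"id": "Doc3", "author": "Sam"},
    {"id": "Doc4", "author": "Bob"},
]


def facet_counts(docs, field):
    """Bucket documents by the value of one indexed attribute."""
    return Counter(d[field] for d in docs if field in d)


def refine(docs, field, value):
    """Selecting a facet value restricts the result set to that bucket."""
    return [d for d in docs if d.get(field) == value]


for value, count in facet_counts(docs, "author").items():
    print(f"{value} ({count})")
# Chris (1)
# Sam (2)
# Bob (1)
```

Calling `refine(docs, "author", "Sam")` returns Doc2 and Doc3, which is the "click a facet to narrow the results" step of guided navigation.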
Faceted-Navigation example • Usually combined with a free-text element as well • Type specificity • Implied • [Figure labels: Selected facets; Further refinement]
Hybrid Approaches • Typically, the aforementioned three general query models are combined to form powerful hybrid approaches • Guided navigation/faceting typically includes a free-text element • Sometimes it even includes forms-based elements
Hybrid Approaches • All 3 combined, another example
Query languages • Usually specific to a particular model (1:1) • KEV models (keyword=value) • Forms-based • attribute:value AND attribute2:value2… • Logical operators (AND/OR/NOT, others) • Comparators (>, >=, <=, <, etc.) • Use of “ ” denotes an entire phrase rather than tokenized terms • Use of attribute:[startrange TO endrange] indicates a range • Facet-based • In many ways, just refinements and restrictions of the KEV-type models above
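A forms-based front end typically turns submitted fields into a KEV query string of the shape shown on this slide. The sketch below assembles one; the helper names `term`, `range_term`, and `all_of` and the example field names are hypothetical, and no escaping of special characters is attempted.

```python
def term(attribute, value):
    """attribute:value, quoting the value when it is a multi-word phrase."""
    value = str(value)
    if " " in value:
        value = f'"{value}"'  # quotes keep the phrase from being tokenized
    return f"{attribute}:{value}"


def range_term(attribute, start, end):
    """attribute:[start TO end] range clause."""
    return f"{attribute}:[{start} TO {end}]"


def all_of(*clauses):
    """Join clauses with the AND logical operator."""
    return " AND ".join(clauses)


query = all_of(
    term("author", "Sam"),
    term("title", "query models"),
    range_term("year", 2006, 2010),
)
print(query)
# author:Sam AND title:"query models" AND year:[2006 TO 2010]
```

A facet selection fits the same shape: clicking "author: Sam" simply adds another `term("author", "Sam")` clause to the existing query.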
Query languages • Free-text • You are given so little information and have to sense so much richness, so this is where information retrieval techniques come in • IR query models • Must understand Analyzers • Must understand Stopwords, Tokenizers, Lexical analysis, Language analysis (and detection) • Must understand (somehow) underlying field types • Default attributes • Inclusion/Exclusion of terms
IR Example • “cars 2006” • Inclusion of “ ” indicates whole string match • Default field is searchableTxt, which is made up of • page text description • link alt text • page title • … • Type: string field • Tokenization (“ ”, “/”, etc.) • Stop-word removal • Eventual query: searchableTxt:"cars 2006" OR page text description:cars OR link alt text:cars OR page title:cars
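The analysis steps on this slide (phrase detection, tokenization on spaces and slashes, stop-word removal, default field) can be sketched as follows. This is a simplified illustration, not the behavior of any specific search engine: the `analyze` and `to_query` functions and the tiny stop-word list are assumptions, and only the default field `searchableTxt` comes from the slide.

```python
import re

# Assumption: a deliberately tiny stop-word list for the example.
STOPWORDS = {"a", "an", "and", "of", "the", "to"}


def analyze(text):
    """Lowercase, split on whitespace and '/', then drop stop words."""
    tokens = re.split(r"[\s/]+", text.lower())
    return [t for t in tokens if t and t not in STOPWORDS]


def to_query(user_input, default_field="searchableTxt"):
    """Turn raw user input into a fielded query on the default field."""
    user_input = user_input.strip()
    quoted = (len(user_input) >= 2
              and user_input[0] == '"' and user_input[-1] == '"')
    if quoted:
        # Quotes request a whole-string match, so skip tokenization.
        return f"{default_field}:{user_input}"
    return " OR ".join(f"{default_field}:{t}" for t in analyze(user_input))


print(to_query('"cars 2006"'))       # searchableTxt:"cars 2006"
print(to_query("the cars of 2006"))  # searchableTxt:cars OR searchableTxt:2006
```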
Advanced Query Model Approaches • Clustering • Facets sensed automatically based on text analysis • TF-IDF to find the most frequent terms
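One way to sense candidate facet labels from text is to score each term by TF-IDF, which weighs how often a term occurs in a document against how many documents in the collection contain it. The sketch below uses plain whitespace tokenization and the basic tf × log(N/df) weighting; the function name `tfidf_top_terms` and the sample documents are made up, and real systems usually add smoothing and a proper analyzer.

```python
import math
from collections import Counter


def tfidf_top_terms(docs, k=3):
    """Return the k highest-scoring TF-IDF terms for each document."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(t for tokens in tokenized for t in set(tokens))
    result = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {t: (c / len(tokens)) * math.log(n / df[t])
                  for t, c in tf.items()}
        # Sort by descending score, breaking ties alphabetically.
        result.append(sorted(scores, key=lambda t: (-scores[t], t))[:k])
    return result


docs = ["red car fast car", "red bike", "red boat sail"]
print(tfidf_top_terms(docs, k=1))
# [['car'], ['bike'], ['boat']]
```

Note that "red" occurs in every document, so its score is zero and it never becomes a label, even though it is the most frequent term overall.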
Local Search • GIS methods • Point/radius • Bounding box • Polygon • Combine with other approaches • Free-text • Facet
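The point/radius and bounding-box methods can be sketched with a linear scan over a list of places. This is an illustration under stated assumptions: the place records and function names are hypothetical, the Earth is treated as a sphere of radius 6371 km, and the box test assumes the box does not cross the ±180° meridian. A production system would use a spatial index instead of scanning.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometers."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(places, lat, lon, radius_km):
    """Point/radius search: keep places no farther than radius_km."""
    return [p for p in places
            if haversine_km(lat, lon, p["lat"], p["lon"]) <= radius_km]


def within_box(places, south, west, north, east):
    """Bounding-box search (assumes west <= east, i.e. no wrap-around)."""
    return [p for p in places
            if south <= p["lat"] <= north and west <= p["lon"] <= east]


places = [
    {"name": "Los Angeles", "lat": 34.05, "lon": -118.24},
    {"name": "New York", "lat": 40.71, "lon": -74.01},
]
nearby = within_radius(places, 34.02, -118.29, 50)
```

Combining with the other models is then a matter of intersecting result sets, for example applying a free-text match or a facet filter to `nearby`.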
GIS search • Overlay “layers” to navigate and search through the information space • Typically used with the Local approach to deliver search
GIS search challenges • Sometimes the data isn’t annotated with lat and lon • How to discover this? • Even when the data is annotated with spatial information, computation of, e.g., a bounding box around the poles is difficult • Efficiency and speed are difficult since the data is at scale
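The pole problem shows up as soon as a point/radius query is converted to a bounding box: near a pole the box must cover every longitude, and near ±180° the west edge ends up numerically greater than the east edge. The sketch below handles both cases on a spherical Earth; the function names and the 6371 km radius are assumptions for the example, and it is not meant as a complete geodesic treatment.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def bounding_box(lat, lon, radius_km):
    """Return (south, west, north, east) in degrees around a point/radius."""
    ang = radius_km / EARTH_RADIUS_KM  # angular radius in radians
    south = math.radians(lat) - ang
    north = math.radians(lat) + ang
    if south <= -math.pi / 2 or north >= math.pi / 2:
        # The circle contains a pole: clamp latitude, cover all longitudes.
        return (max(math.degrees(south), -90.0), -180.0,
                min(math.degrees(north), 90.0), 180.0)
    dlon = math.degrees(
        math.asin(math.sin(ang) / math.cos(math.radians(lat))))
    # Normalize longitudes back into [-180, 180).
    west = (lon - dlon + 180.0) % 360.0 - 180.0
    east = (lon + dlon + 180.0) % 360.0 - 180.0
    return (math.degrees(south), west, math.degrees(north), east)


def in_box(lat, lon, box):
    """Containment test that tolerates boxes wrapping across +/-180."""
    south, west, north, east = box
    if not south <= lat <= north:
        return False
    if west <= east:
        return west <= lon <= east
    return lon >= west or lon <= east  # box wraps around the antimeridian


polar = bounding_box(89.9, 0.0, 100)    # within 100 km of the North Pole
dateline = bounding_box(0.0, 179.5, 100)  # straddles the +/-180 meridian
```

A naive `west <= lon <= east` test would reject every point for the `dateline` box, since its west edge (about 178.6) is greater than its east edge (about -179.6).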
Wrapup • Plenty of query models out there for discovering the data you’ve indexed • Typically combined to form powerful, rich “hybrid” query interfaces • Important to understand underlying data complexity • Type specificity • Structure • Query languages are guided by, but not always 1:1 with, the general query models