Modern Information Retrieval, Chapter 4: Query Languages
The type of query the user might formulate is largely dependent on the underlying information retrieval model
Keyword-based querying • single-word queries • text documents are long sequences of words • ranking of results by term frequency and inverse document frequency • exact positions where the query word appears may need to be output
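As a rough illustration of the single-word case, the sketch below ranks documents by term frequency times inverse document frequency and reports the positions where the query word occurs. The function name `rank_single_word`, the whitespace tokenization, and the particular weighting (relative frequency times log(N/df)) are assumptions made for this example; the slides do not fix a specific formula.

```python
import math

def rank_single_word(query, docs):
    """Rank docs for a one-word query; returns (doc_id, score, positions) tuples."""
    # Assumption: documents are tokenized by lowercasing and splitting on whitespace.
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing the query word.
    df = sum(1 for words in tokenized if query in words)
    if df == 0:
        return []
    idf = math.log(n_docs / df)
    results = []
    for doc_id, words in enumerate(tokenized):
        positions = [i for i, w in enumerate(words) if w == query]
        if positions:
            tf = len(positions) / len(words)  # relative term frequency
            results.append((doc_id, tf * idf, positions))
    # Highest score first.
    results.sort(key=lambda r: r[1], reverse=True)
    return results

docs = ["the query language", "query query model", "retrieval model"]
for doc_id, score, positions in rank_single_word("query", docs):
    print(doc_id, round(score, 3), positions)
```

For a single-word query the idf factor is the same for every document, so it scales the scores without changing their order; it starts to matter once several query words are combined.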
Context queries • search for words near other words • phrase query: a sequence of single words • enhances retrieval • proximity query: a sequence of single words with a maximum allowed distance between them • enhances the power of retrieval • the words may or may not be required to appear in the same order as in the query
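The two kinds of context query can be sketched over word positions. In this minimal example, distance is measured in words between consecutive query terms, and a phrase is treated as the special case of an ordered proximity query with distance 1. The names `find_proximity` and `phrase_match` and this way of measuring distance are assumptions for illustration only.

```python
def find_proximity(words, terms, max_distance, ordered=True):
    """True if the terms occur with at most max_distance positions between neighbors."""
    if not terms:
        return False
    # Positions of each query term in the document.
    positions = [[i for i, w in enumerate(words) if w == t] for t in terms]

    def extend(k, prev):
        # Try to place term k within max_distance of the previously placed term.
        if k == len(terms):
            return True
        for p in positions[k]:
            gap = p - prev if ordered else abs(p - prev)
            if 0 < gap <= max_distance and extend(k + 1, p):
                return True
        return False

    return any(extend(1, p) for p in positions[0])

def phrase_match(words, terms):
    """A phrase is an ordered proximity query with adjacent words."""
    return find_proximity(words, terms, max_distance=1, ordered=True)

words = "enhance the power of retrieval".split()
print(phrase_match(words, ["power", "of"]))
print(find_proximity(words, ["retrieval", "power"], 2, ordered=False))
```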
Boolean queries • e1 BUT e2: selects the documents that satisfy e1 but not e2
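One way to picture the BUT operator is as a set difference over the document sets returned by its operands. The sketch below assumes a toy inverted index mapping each term to a set of document ids; the index contents and the helper names are hypothetical, and AND and OR are included only for comparison.

```python
# Hypothetical inverted index: term -> set of ids of documents containing it.
index = {
    "query": {1, 2, 3},
    "language": {2, 3, 5},
    "boolean": {3, 4},
}

def docs_for(term):
    """Documents containing the term (empty set if the term is unknown)."""
    return index.get(term, set())

def e_or(a, b):
    return a | b   # documents satisfying e1 or e2

def e_and(a, b):
    return a & b   # documents satisfying both e1 and e2

def e_but(a, b):
    return a - b   # documents satisfying e1 but not e2

# (query AND language) BUT boolean
result = e_but(e_and(docs_for("query"), docs_for("language")), docs_for("boolean"))
print(sorted(result))
```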
Pattern matching • types of patterns • word: computer • prefix of a word: comput • suffix of a word: ter • substring of a word: ute • range: any word lexicographically between two given words, e.g. between communication and computer
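The pattern types listed above map directly onto simple string tests. The following sketch applies each one to a small vocabulary; the vocabulary and the function name `match_pattern` are made up for the example, and the range is taken to include both endpoints, which is an assumption.

```python
def match_pattern(vocabulary, kind, pattern):
    """Return the vocabulary words matched by one of the basic pattern types."""
    tests = {
        "word": lambda w: w == pattern,
        "prefix": lambda w: w.startswith(pattern),
        "suffix": lambda w: w.endswith(pattern),
        "substring": lambda w: pattern in w,
        # For a range, pattern is a (low, high) pair compared lexicographically.
        "range": lambda w: pattern[0] <= w <= pattern[1],
    }
    return [w for w in vocabulary if tests[kind](w)]

vocab = ["communication", "company", "compute", "computer",
         "computing", "counter", "minute"]

print(match_pattern(vocab, "prefix", "comput"))
print(match_pattern(vocab, "range", ("communication", "computer")))
```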
Word with an error threshold • edit distance: minimum number of character insertions, deletions, and replacements needed to make the query and the target equal • example: Computeers • unit-cost edit distance • W(a→b) = 1 for a ≠ b (replacement) • W(a→ε) = W(ε→b) = 1 (deletion and insertion)
Example: given two strings S1 = abac and S2 = aaccb • evaluated by the dynamic programming method • the edit distance is 3: abac → abaccb (2 insertions: c, b) → aaccb (1 deletion: b)
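The dynamic programming evaluation can be sketched as follows, using unit costs for insertion, deletion, and replacement. This is the standard textbook table-filling formulation, not necessarily the exact variant shown on the original slide; the function name `edit_distance` is chosen for this example.

```python
def edit_distance(s1, s2):
    """Unit-cost edit distance between s1 and s2 via dynamic programming."""
    m, n = len(s1), len(s2)
    # d[i][j] = edit distance between the first i chars of s1 and first j chars of s2.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # delete all i characters of s1
    for j in range(n + 1):
        d[0][j] = j          # insert all j characters of s2
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or replacement
            )
    return d[m][n]

print(edit_distance("abac", "aaccb"))  # 3
```

A word matches a query under an error threshold k when this distance is at most k; for instance, Computeers is one deletion away from Computers.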
Regular expression • a regular expression is a pattern built up from simple strings and the union, concatenation, and repetition operators • example: pro(blem|tein)(s|ε)(0|1|2)*
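The example pattern can be tried out with Python's `re` module. In this sketch the empty string ε is written as an empty alternative, `(s|)`, and `fullmatch` is used so that the whole word must match; both choices are assumptions about how the slide's notation translates to Python syntax.

```python
import re

# pro(blem|tein)(s|ε)(0|1|2)*  with ε written as an empty alternative.
PATTERN = re.compile(r"pro(blem|tein)(s|)(0|1|2)*")

def matches(word):
    """True if the entire word is described by the regular expression."""
    return PATTERN.fullmatch(word) is not None

for word in ["problem", "proteins", "problems02", "problem3", "pro"]:
    print(word, matches(word))
```

Here union corresponds to `|`, concatenation to writing sub-patterns side by side, and repetition to `*`.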