Boolean Model
• A document is represented as a set of keywords.
• Queries are Boolean expressions of keywords, connected by AND, OR, and NOT, with brackets to indicate scope.
• Example: [[[Rio & Brazil] | [Hilo & Hawaii]] & hotel & !Hilton]
• Output: a document is either relevant or not. There are no partial matches and no ranking.
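To make the set-of-keywords view concrete, here is a minimal sketch that evaluates the example query against a handful of documents. The document names and keyword sets are hypothetical sample data, and the query is read as ((Rio AND Brazil) OR (Hilo AND Hawaii)) AND hotel AND NOT Hilton.

```python
# Each document is modeled as a set of keywords (hypothetical sample data).
docs = {
    "d1": {"rio", "brazil", "hotel", "beach"},
    "d2": {"hilo", "hawaii", "hotel", "hilton"},
    "d3": {"hilo", "hawaii", "hotel"},
    "d4": {"rio", "brazil", "hostel"},
}


def matches(doc):
    """True if doc satisfies ((Rio & Brazil) | (Hilo & Hawaii)) & hotel & !Hilton."""
    place = ("rio" in doc and "brazil" in doc) or ("hilo" in doc and "hawaii" in doc)
    return place and "hotel" in doc and "hilton" not in doc


# The output is a plain yes/no decision per document: no scores, no ordering.
relevant = sorted(name for name, doc in docs.items() if matches(doc))
print(relevant)  # ['d1', 'd3']
```

Here d2 is rejected only because it mentions Hilton, and d4 only because it lacks "hotel"; neither gets partial credit.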
Boolean Model
• Simple model based on set theory.
• Queries are specified as Boolean expressions:
  • precise semantics;
  • neat formalism;
  • example: $q = k_a \wedge (k_b \vee \neg k_c)$.
• Terms are either present or absent, so $w_{ij} \in \{0,1\}$.
• Consider:
  • $q = k_a \wedge (k_b \vee \neg k_c)$
  • $\vec{q}_{dnf} = (1,1,1) \vee (1,1,0) \vee (1,0,0)$
  • $\vec{q}_{cc} = (1,1,0)$ is one conjunctive component.
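The disjunctive normal form can be checked by brute force. The sketch below assumes the query is ka AND (kb OR NOT kc), which is the reading consistent with the three conjunctive components listed on the slide, and enumerates every 0/1 assignment of (ka, kb, kc).

```python
from itertools import product


def q(ka, kb, kc):
    """Assumed query: ka AND (kb OR NOT kc)."""
    return bool(ka and (kb or not kc))


# Try all 2^3 assignments of (ka, kb, kc) and keep the ones that satisfy q.
# Each surviving tuple is one conjunctive component of the DNF.
q_dnf = [bits for bits in product((1, 0), repeat=3) if q(*bits)]
print(q_dnf)  # [(1, 1, 1), (1, 1, 0), (1, 0, 0)]
```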
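Boolean Model
[Figure: Venn diagram of the sets for $k_a$, $k_b$, and $k_c$, with regions labeled (1,1,0), (1,0,0), and (1,1,1).]
• $q = k_a \wedge (k_b \vee \neg k_c)$
• $sim(q, d_j) = 1$ if $\exists\, \vec{q}_{cc} \mid (\vec{q}_{cc} \in \vec{q}_{dnf}) \wedge (\forall k_i,\ g_i(\vec{d}_j) = g_i(\vec{q}_{cc}))$, and $0$ otherwise.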
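The similarity rule says that a document matches when its term vector agrees with at least one conjunctive component of the query's DNF on every query term. A minimal sketch, assuming g_i simply reads the i-th weight of a vector and that document vectors are already restricted to the three query terms (ka, kb, kc):

```python
def sim(q_dnf, doc_vec):
    """Return 1 if doc_vec equals some conjunctive component term by term, else 0."""
    return 1 if any(tuple(doc_vec) == tuple(cc) for cc in q_dnf) else 0


# DNF of the assumed query ka AND (kb OR NOT kc), as listed on the slide.
q_dnf = [(1, 1, 1), (1, 1, 0), (1, 0, 0)]

print(sim(q_dnf, (1, 1, 0)))  # 1: contains ka and kb, not kc
print(sim(q_dnf, (0, 1, 0)))  # 0: ka is missing, so no component matches
```

The result is always exactly 0 or 1, which is why the model cannot express partial matches.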
Boolean Retrieval Model
• Popular retrieval model because:
  • It is easy to understand for simple queries.
  • It has a clean formalism.
• Boolean models can be extended to include ranking.
• Reasonably efficient implementations are possible for normal queries.
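The slides do not say how an efficient implementation works. One common approach, shown here only as an assumption, is an inverted index that maps each term to the set of documents containing it, so that AND, OR, and NOT become set intersection, union, and difference. The documents below are hypothetical sample data.

```python
from collections import defaultdict

# Hypothetical sample collection: document id -> set of keywords.
docs = {
    1: {"rio", "brazil", "hotel"},
    2: {"hilo", "hawaii", "hotel", "hilton"},
    3: {"hilo", "hawaii", "hotel"},
}

# Inverted index: term -> set of ids of the documents that contain it.
index = defaultdict(set)
for doc_id, terms in docs.items():
    for term in terms:
        index[term].add(doc_id)

all_ids = set(docs)

# ((Rio & Brazil) | (Hilo & Hawaii)) & hotel & !Hilton, expressed as set algebra.
result = (
    ((index["rio"] & index["brazil"]) | (index["hilo"] & index["hawaii"]))
    & index["hotel"]
    & (all_ids - index["hilton"])
)
print(sorted(result))  # [1, 3]
```

With this layout the query touches only the posting sets of the terms it names, rather than scanning every document.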
Boolean Model Problems
• Retrieval is based on a binary decision criterion, with no notion of partial matching.
• No ranking of the documents is provided (there is no grading scale).
• Very rigid: AND means all; OR means any.
• The information need has to be translated into a Boolean expression, which most users find awkward.
• The Boolean queries that users formulate are most often too simplistic.
• It is difficult to express complex user requests.
Boolean Model Problems
• As a consequence, the Boolean model frequently returns either too few or too many documents in response to a user query.
• Difficult to control the number of documents retrieved.
  • All matched documents will be returned.
• Difficult to rank output.
  • All matched documents logically satisfy the query.
• Difficult to perform relevance feedback.
  • If a document is identified by the user as relevant or irrelevant, how should the query be modified?
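The "too few or too many" effect is easy to reproduce. In this sketch (hypothetical documents and query terms), joining three terms with AND keeps a single document, while joining the same terms with OR returns the entire collection as one unordered set.

```python
# Hypothetical sample collection: document name -> set of keywords.
docs = {
    "d1": {"boolean", "retrieval", "model"},
    "d2": {"boolean", "algebra"},
    "d3": {"retrieval", "evaluation"},
    "d4": {"vector", "model"},
}
query_terms = {"boolean", "retrieval", "model"}

# AND: every query term must be present (query_terms is a subset of the document).
and_hits = sorted(d for d, terms in docs.items() if query_terms <= terms)

# OR: at least one query term must be present (non-empty intersection).
or_hits = sorted(d for d, terms in docs.items() if query_terms & terms)

print(and_hits)  # ['d1']
print(or_hits)   # ['d1', 'd2', 'd3', 'd4']
```

In the OR result, d1 contains all three terms and d2 only one, yet both simply satisfy the query, so nothing in the output distinguishes them.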