On a cognitive search strategy Per Ahlgren
Overview • Background • Ingwersen’s example • A cognitive search strategy • Search formulation construction • Four situations but two search formulations • A remedy • Concluding remarks
Background • Important to provide users of the digital library with tools that help them to retrieve information relevant to their needs • Search strategies - approaches for search problems • Example. The building blocks strategy
Ingwersen’s example • Two terms, A and B • Two fields, title field (TI) and descriptor/identifier field (DE,ID) • Four atomic search formulations: • A/TI; A/DE,ID; B/TI; B/DE,ID • Assumed situation with regard to document frequency (df): df(A/TI) < df(B/TI) < df(A/DE,ID) < df(B/DE,ID).
Principle (P) - atomic search formulations with lower frequencies should be combined before formulations with higher frequencies • Idea behind P: a term’s value for retrieval purposes is inversely proportional to the number of documents in which the term occurs
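Principle P can be sketched as a simple sort on document frequency. The df values below are hypothetical and chosen only to respect the ordering assumed in Ingwersen’s example; they do not come from the presentation.

```python
# Hypothetical document frequencies (assumption). Only their relative order,
# df(A/TI) < df(B/TI) < df(A/DE,ID) < df(B/DE,ID), is taken from the example.
df = {
    "A/TI": 40,
    "B/TI": 75,
    "A/DE,ID": 120,
    "B/DE,ID": 300,
}

# Principle P: atomic formulations with lower frequencies are combined first,
# so sort the atomic formulations by ascending document frequency.
combination_order = sorted(df, key=df.get)
print(combination_order)
```

With any df values that respect the assumed ordering, A/TI is considered first and B/DE,ID last.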
A cognitive search strategy • Different cognitive agents - author (TI) and indexer (DE,ID) - are involved with regard to the assignment of terms to documents • Occurrence of different cognitive agents • Cognitive overlap • When constructing Boolean search formulations in a two term TI/DE,ID search, consider the factors occurrence of different cognitive agents and cognitive overlap when combining atomic formulations with the AND operator
Optimal situation: both terms are present in both fields (the cognitive agents involved agree about the two access points). Expressed by the following formulation: A/TI*B/TI*A/DE,ID*B/DE,ID. • A multiple evidence approach - the strategy combines evidence for the relevance of a document
Search formulation construction • Purpose: stepwise retrieval of a number of subsets of the set D of documents that is retrieved by A+B. First formulation: S1 A/TI*B/TI*A/DE,ID*B/DE,ID. • Two methods • (1) NOTPRESET (Ingwersen’s method) • A new formulation is obtained by (1) combining atomic formulations by the AND operator, considering the factors (a) presence of A and B, (b) occurrence of different cognitive agents and (c) document frequency, (2) excluding all the preceding formulations by the NOT and OR operators, and (3) ANDing the results of (1) and (2).
Should be fairly easy for the user to grasp • Example. S2 A/TI*B/TI*A/DE,ID NOT S1 • (2) NOTATOMIC • A new formulation is obtained by (1) combining as many atomic formulations as possible (in the light of earlier formulations) by the AND operator, considering the factors (a) presence of A and B, (b) occurrence of different cognitive agents and (c) document frequency, (2) excluding by the NOT and OR operators all the atomic formulations that are not part of the result of (1), and (3) ANDing the results of (1) and (2). • Yields, in most cases, shorter formulations than NOTPRESET
Should be fairly easy for the user to grasp • Is abandoned in step 10 and step 11 • Example. (2) A/TI*B/TI*A/DE,ID NOT B/DE,ID • Presence of A and B the most important factor • Occurrence of different cognitive agents more important than document frequency
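The two example formulations for the second step can be compared on a toy collection. This is a minimal sketch, assuming a hypothetical collection with one document per presence/absence pattern of the four atomic formulations, and modelling * (AND) as set intersection and NOT as set difference.

```python
from itertools import product

ATOMS = ["A/TI", "B/TI", "A/DE,ID", "B/DE,ID"]

# Hypothetical collection (assumption): one document for each of the 16
# presence/absence patterns of the four atomic formulations.
docs = {
    doc_id: frozenset(a for a, on in zip(ATOMS, bits) if on)
    for doc_id, bits in enumerate(product([False, True], repeat=len(ATOMS)))
}

def hits(atom):
    """Return the ids of the documents retrieved by one atomic formulation."""
    return {doc_id for doc_id, present in docs.items() if atom in present}

a_ti, b_ti, a_de, b_de = (hits(a) for a in ATOMS)

# S1: A/TI*B/TI*A/DE,ID*B/DE,ID
s1 = a_ti & b_ti & a_de & b_de

# NOTPRESET, step 2: A/TI*B/TI*A/DE,ID NOT S1
s2_notpreset = (a_ti & b_ti & a_de) - s1

# NOTATOMIC, step 2: A/TI*B/TI*A/DE,ID NOT B/DE,ID
s2_notatomic = (a_ti & b_ti & a_de) - b_de

print(s2_notpreset == s2_notatomic)
```

In this sketch both methods retrieve the same documents at step 2; they differ in how the exclusion is written, not in what is excluded.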
Four situations but two search formulations • Consider the (NOTATOMIC) formulations (10) A/TI NOT (B/TI+B/DE,ID) and (11) B/TI NOT (A/TI+A/DE,ID). • (10) and (11) are indefinite with respect to A/DE,ID and B/DE,ID, respectively.
The searcher may want to distinguish between (1) A is present in TI but not in DE,ID, and B is absent from both fields and (2) A is present in both fields, and B is absent from both fields, or between (3) B is present in TI but not in DE,ID, and A is absent from both fields and (4) B is present in both fields, and A is absent from both fields.
We then need four formulations instead of (10) A/TI NOT (B/TI+B/DE,ID) and (11) B/TI NOT (A/TI+A/DE,ID) (instead of S10 and S11), four formulations that express the four situations. • Ingwersen’s formulations express only 11 of the 16 possible situations with regards to the presence of A and B in the two fields.
Figure 1: The 16 possible situations with regard to the presence of A and B in the two fields.
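The 16 situations can be enumerated mechanically, which also shows why formulations (10) and (11) are indefinite. This is a sketch only: a situation is modelled, as an assumption, as the set of atomic formulations a document satisfies, and the helper names are illustrative.

```python
from itertools import product

ATOMS = ("A/TI", "A/DE,ID", "B/TI", "B/DE,ID")

# Each situation is the set of atomic formulations that hold for a document.
situations = [
    frozenset(a for a, on in zip(ATOMS, bits) if on)
    for bits in product((False, True), repeat=len(ATOMS))
]

def matches_10(s):
    # (10) A/TI NOT (B/TI+B/DE,ID)
    return "A/TI" in s and not ("B/TI" in s or "B/DE,ID" in s)

def matches_11(s):
    # (11) B/TI NOT (A/TI+A/DE,ID)
    return "B/TI" in s and not ("A/TI" in s or "A/DE,ID" in s)

covered_by_10 = [s for s in situations if matches_10(s)]
covered_by_11 = [s for s in situations if matches_11(s)]

print(len(situations))
print([sorted(s) for s in covered_by_10])
print([sorted(s) for s in covered_by_11])
```

Each of (10) and (11) matches two situations, one with and one without the term in DE,ID, so two formulations stand in for four situations.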
A remedy • NOTATOMIC Substitute the following four formulations (in the given order) • 9a A/TI*A/DE,ID NOT (B/TI+B/DE,ID) • 9b B/TI*B/DE,ID NOT (A/TI+A/DE,ID) • 10* A/TI NOT (A/DE,ID+B/TI+B/DE,ID) • 11* B/TI NOT (B/DE,ID+A/TI+A/DE,ID) • for (10) A/TI NOT (B/TI+B/DE,ID) and (11) B/TI NOT (A/TI+A/DE,ID).
The new set of formulations expresses 15 of the 16 possible cases with regard to the presence of A and B in the two fields, not just 11. • NOTATOMIC is not abandoned.
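A quick check of the four substituted formulations, under the same modelling assumption (a situation is the set of atomic formulations a document satisfies). The helper `formulation` is hypothetical and only mirrors the "AND, then NOT an OR" shape of the NOTATOMIC formulations.

```python
from itertools import product

ATOMS = ("A/TI", "A/DE,ID", "B/TI", "B/DE,ID")
situations = [
    frozenset(a for a, on in zip(ATOMS, bits) if on)
    for bits in product((False, True), repeat=len(ATOMS))
]

def formulation(required, excluded):
    """Build a predicate: AND together `required`, NOT the OR of `excluded`."""
    def predicate(s):
        return all(a in s for a in required) and not any(a in s for a in excluded)
    return predicate

remedy = {
    "9a": formulation(["A/TI", "A/DE,ID"], ["B/TI", "B/DE,ID"]),
    "9b": formulation(["B/TI", "B/DE,ID"], ["A/TI", "A/DE,ID"]),
    "10*": formulation(["A/TI"], ["A/DE,ID", "B/TI", "B/DE,ID"]),
    "11*": formulation(["B/TI"], ["B/DE,ID", "A/TI", "A/DE,ID"]),
}

# For every substituted formulation, list the situations it matches.
coverage = {
    name: [s for s in situations if matches(s)]
    for name, matches in remedy.items()
}
for name, matched in coverage.items():
    print(name, [sorted(s) for s in matched])
```

Each substituted formulation matches exactly one situation, so the four situations that (10) and (11) could not tell apart are now expressed separately.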
NOTPRESET • It is also possible to use a special case of NOTPRESET, say NOTPRESET*, to construct formulations that express the four situations in question. When constructing a new formulation, the first step in NOTPRESET* is identical with the first step in NOTATOMIC: combine as many atomic formulations as possible (in the light of earlier formulations) by the AND operator, considering the factors presence of A and B, occurrence of different cognitive agents and document frequency.
The new set of formulations expresses 15 of the 16 possible cases with regard to the presence of A and B in the two fields, not just 11.
Concluding remarks • Ingwersen’s set of formulations should be modified to correspond to 15 of the 16 possible situations with regards to the presence of the terms A and B in the two fields. • Ingwersen’s approach gives the Boolean searcher a hint concerning the order in which the parts of a (possibly large) document set should be retrieved.
If the command language does not admit abbreviation of an OR formulation, NOTATOMIC is in my opinion preferable to NOTPRESET*.