Mutual Information and Choice of AND and OR
Dayu, 18 Nov 2005
An Example
• Query No. 605: Great Britain health care
• We chose it because it consists of 4 terms
• Performance in MAP
Using Two Terms
• Based on the performance (in MAP) of combining two terms with “AND” versus “OR”, we infer how the two terms jointly affect relevance
• Abbreviations: Great = G, Britain = B, health = H, care = C
What does “Yes” mean?
• If “Yes” (i.e. MAP_AND > MAP_OR), the two terms complement or disambiguate each other and together retrieve more relevant information.
• Denoted by term1-term2
• If “No” (i.e. MAP_AND < MAP_OR), the two terms either
• (1) seldom co-occur, or
• (2) are more or less synonyms.
• Denoted by (term1,term2)
• If MAP_AND ≈ MAP_OR, the two terms always co-occur.
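The decision rule on this slide can be sketched in Python. This is only an illustration: the function name `classify_pair` and the relative tolerance `tol` used to decide when MAP_AND ≈ MAP_OR are assumptions, since the slides do not state a threshold.

```python
def classify_pair(map_and: float, map_or: float, tol: float = 0.05) -> str:
    """Label a term pair from the MAP of its AND query and its OR query.

    tol is a hypothetical relative tolerance for "approximately equal";
    it is not taken from the slides.
    """
    larger = max(map_and, map_or)
    # MAP_AND ≈ MAP_OR: the two terms (almost) always co-occur.
    if larger == 0 or abs(map_and - map_or) <= tol * larger:
        return "always co-occur"
    # "Yes": AND beats OR, the terms complement or disambiguate each other.
    if map_and > map_or:
        return "term1-term2"
    # "No": OR beats AND, the terms seldom co-occur or are near-synonyms.
    return "(term1,term2)"
```

With made-up scores, `classify_pair(0.30, 0.10)` gives the "Yes" notation `term1-term2`, and `classify_pair(0.10, 0.30)` gives the "No" notation `(term1,term2)`.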
Overall Relationships
In conclusion, the pairwise relationships among the four terms are consistent with one another. The overall structure is (G,B)-(H,C).
Advanced Boolean Operation
• (G,B)-(H,C)
• Could we use (G OR B) AND (H OR C)?
• Performance: MAP = 0.0762
• Compared with:
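As a rough sketch of how the notation (G,B)-(H,C) maps to a Boolean query, the Python below ORs the terms inside each parenthesised group and ANDs the groups together. The helper name `build_query` and the generic `AND`/`OR` query syntax are assumptions for illustration, not the syntax of any particular retrieval engine.

```python
def build_query(groups):
    """OR the terms inside each group, then AND the groups together."""
    clauses = []
    for group in groups:
        clause = " OR ".join(group)
        # Parenthesise multi-term groups so OR binds before AND.
        if len(group) > 1:
            clause = f"({clause})"
        clauses.append(clause)
    return " AND ".join(clauses)


# (G,B)-(H,C) for the query "Great Britain health care"
query = build_query([["Great", "Britain"], ["health", "care"]])
print(query)  # (Great OR Britain) AND (health OR care)
```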
A Method to Estimate the Relationship Using MI
• By mutual information.
• MI = P(A,B) / (P(A) P(B))
• P(A,B) = # of IUs containing both A and B / total # of IUs
• P(A) = # of IUs containing A / total # of IUs
• P(B) = # of IUs containing B / total # of IUs
Hypothesis: the larger the MI, the more confidence we have in using OR.
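The counts above translate directly into code. The following is a minimal sketch, assuming each IU is represented as a Python set of terms; the function name `mutual_information`, the toy data, and the choice to return 0 when a term never occurs are all assumptions for illustration.

```python
def mutual_information(ius, a, b):
    """Return P(A,B) / (P(A) * P(B)) over a list of IUs (each a set of terms)."""
    total = len(ius)
    n_a = sum(1 for iu in ius if a in iu)
    n_b = sum(1 for iu in ius if b in iu)
    n_ab = sum(1 for iu in ius if a in iu and b in iu)
    if total == 0 or n_a == 0 or n_b == 0:
        return 0.0  # assumption: an undefined ratio is reported as 0
    p_ab = n_ab / total
    p_a = n_a / total
    p_b = n_b / total
    return p_ab / (p_a * p_b)


# Toy IUs, not data from the slides.
ius = [
    {"health", "care"},
    {"health"},
    {"care", "britain"},
    {"great", "britain"},
]
mi_hc = mutual_information(ius, "health", "care")     # 0.25 / (0.5 * 0.5)
mi_gb = mutual_information(ius, "great", "britain")   # 0.25 / (0.25 * 0.5)
```

Under this definition a value of 1 means the two terms occur independently, values above 1 mean they co-occur more often than chance, and values below 1 mean they co-occur less often.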
Relationship between MI and (MAP_OR - MAP_AND) / min(MAP_AND, MAP_OR)
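The quantity compared against MI here is the gap between the OR and AND scores, normalised by the smaller of the two. A small sketch, with the hypothetical name `relative_map_gap` and an assumed error when the smaller score is zero:

```python
def relative_map_gap(map_and: float, map_or: float) -> float:
    """Return (MAP_OR - MAP_AND) / min(MAP_AND, MAP_OR).

    Positive values mean OR performed better, negative values mean AND did.
    """
    smaller = min(map_and, map_or)
    if smaller == 0:
        # Assumption: the slides do not say how a zero MAP is handled.
        raise ValueError("relative gap is undefined when the smaller MAP is 0")
    return (map_or - map_and) / smaller
```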
Terms: Social, Tax, Securities
• MI on the three pairwise edges: a = 1.07, b = 0.78, c = 0.79
• Variance of MI = 0.019
Query: SDI Star Wars
• MI on the three pairwise edges: a = 1.1, b = 0.8, c = 0.4
• Variance of MI = 0.076
Query: college education advantage
• MI on the three pairwise edges: a = 0.41, b = 0.56, c = 0.23
• Variance of MI = 0.017
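The variance of the pairwise MI values for a three-term query can be computed as sketched below. The slides do not say whether a population or sample variance was used, so the population variance here is an assumption, as is the name `mi_variance`. Results computed from the rounded edge values shown on the slides need not match the reported variances exactly.

```python
from statistics import pvariance


def mi_variance(pairwise_mi):
    """Population variance of the MI values on a query's pairwise edges.

    Assumption: population (divide by n) rather than sample variance.
    """
    return pvariance(pairwise_mi)


# Edge values a, b, c as displayed for the terms Social, Tax, Securities.
v = mi_variance([1.07, 0.78, 0.79])
```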
Future Work
• Investigate a wider range of queries.
• Does the variance of MI across the term pairs affect whether to use AND or OR?
• Should we additionally bring the MI of two terms into the computation of the allo-T edge?