1 / 11

Advanced Information Retreival

Explore how the extended Boolean model introduces ranking using partial matching, term weighting, and combines Vector model features with Boolean algebra properties. Generalize the model to consider Euclidean distances in t-dimensional space through p-norms.

tgatlin
Download Presentation

Advanced Information Retreival

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Advanced Information Retreival Chapter. 02: Modeling (Extended Boolean Model)

  2. Extended Boolean Model • Boolean model is simple and elegant. • But, no provision for a ranking • As with the fuzzy model, a ranking can be obtained by relaxing the condition on set membership • Extend the Boolean model with the notions of partial matching and term weighting • Combine characteristics of the Vector model with properties of Boolean algebra

  3. The Idea • The extended Boolean model (introduced by Salton, Fox, and Wu, 1983) is based on a critique of a basic assumption in Boolean algebra • Let, • q = kx  ky • wxj = fxj * idf(x) associated with [kx,dj] max(idf(i)) • Further, wxj = x and wyj = y

  4. The Idea: • qand = kx  ky; wxj = x and wyj = y AND 2 2 sim(qand,dj) = 1 - sqrt( (1-x) + (1-y) ) 2 (1,1) ky dj+1 y = wyj dj (0,0) x = wxj kx

  5. The Idea: • qor = kx  ky; wxj = x and wyj = y OR 2 2 sim(qor,dj) = sqrt( x + y ) 2 (1,1) ky dj+1 dj y = wyj (0,0) x = wxj kx

  6. Generalizing the Idea • We can extend the previous model to consider Euclidean distances in a t-dimensional space • This can be done using p-norms which extend the notion of distance to include p-distances, where 1  p  is a new parameter • A generalized conjunctive query is given by • qor = k1 k2 . . . kt • A generalized disjunctive query is given by • qand = k1 k2 . . . kt p p p    p p p   

  7. Generalizing the Idea 1 p p p p • sim(qor,dj) = (x1 + x2 + . . . + xm ) m 1 p • sim(qand,dj) = 1 - ((1-x1) + (1-x2) + . . . + (1-xm) ) m p p p

  8. Properties 1 p p p p • sim(qor,dj) = (x1 + x2 + . . . + xm ) m • If p = 1 then (Vector like) • sim(qor,dj) = sim(qand,dj) = x1 + . . . +xm m • If p =  then (Fuzzy like) • sim(qor,dj) = max (wxj) • sim(qand,dj) = min (wxj)

  9. Properties • By varying p, we can make the model behave as a vector, as a fuzzy, or as an intermediary model • This is quite powerful and is a good argument in favor of the extended Boolean model • (k1 k2) k3 • k1 and k2 are to be used as in a vectorial retrieval while the presence of k3 is required.  2  

  10. Properties   • q = (k1 k2) k3 • sim(q,dj) = ( (1 - ( (1-x1) + (1-x2) ) ) + x3 ) 2__________________ 2 2  1 1 p p p p p

  11. Conclusions • Model is quite powerful • Properties are interesting and might be useful • Computation is somewhat complex • However, distributivity operation does not hold for ranking computation: • q1 = (k1  k2)  k3 • q2 = (k1  k3)  (k2  k3) • sim(q1,dj)  sim(q2,dj)

More Related