Evaluating Different Methods of Estimating Retrieval Quality for Resource Selection H. Nottelmann, N. Fuhr Presented by Tao Tao February 12, 2004
Distributed IR • [Figure: a query, Engine 1 to Engine 4, resource selection, and fusion of the results]
Resource selection • Heuristic methods (GlOSS, CORI, others): good performance, but poor theoretical foundation • Decision-theoretic framework (DTF): solid foundation, and good performance as well
Cost: every factor is a type of cost (cost i for engine i) • Effectiveness: C+ to view a relevant document, C- to view a non-relevant one • Time: Ct • Money: Cm • Others
Problems that need addressing • [Figure: ranked list of server i, Rank 1 … Rank si … Rank n, with the costs Cirel, Cit, Cim] • Where to cut the ranking, i.e. how to choose si?
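The cost model and the cut-off question above can be sketched as follows. This is a minimal illustration, not the authors' algorithm: it uses only the effectiveness costs C+ and C-, treats each engine's estimate of ri(si,q) as a given function, and finds the cut-offs by brute force. The names `effectiveness_cost` and `best_allocation`, the fixed total `n`, and the toy estimators are all assumptions made for this example.

```python
from itertools import product


def effectiveness_cost(s, r, c_plus=0.0, c_minus=1.0):
    """Cost of viewing s documents, r of which are expected to be relevant."""
    return r * c_plus + (s - r) * c_minus


def best_allocation(estimators, n, c_plus=0.0, c_minus=1.0):
    """Try every split of n documents across the engines; return (cost, cutoffs)."""
    best = None
    for cutoffs in product(range(n + 1), repeat=len(estimators)):
        if sum(cutoffs) != n:
            continue
        cost = sum(
            effectiveness_cost(s, estimate(s), c_plus, c_minus)
            for s, estimate in zip(cutoffs, estimators)
        )
        if best is None or cost < best[0]:
            best = (cost, cutoffs)
    return best


# Toy estimators (assumed, not from the paper): engine 1 holds 5 good
# documents at precision 0.8, engine 2 has a flat precision of 0.2.
engines = [lambda s: 0.8 * min(s, 5), lambda s: 0.2 * s]
cost, cutoffs = best_allocation(engines, n=6)
print(cutoffs, round(cost, 2))
```

With these toy numbers the search takes five documents from engine 1 and one from engine 2. Brute force is only workable for a handful of engines; it is used here to keep the sketch short.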
Three methods to estimate ri(si,q) • DTF-rp • DTF-sample • DTF-normal
DTF-rp • Assume a linear precision-recall curve: Pi = Pi0 (1 - Ri) • [Figure: precision P plotted against recall R, starting at P0]
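A short sketch of what the linear precision-recall assumption implies for r(s,q). It assumes, beyond what the slide states, that the expected number of relevant documents in the collection is known; it is called `expected_rel` (E) here. Writing precision as P = r/s and recall as R = r/E and substituting into P = P0 (1 - R) gives r/s = P0 (1 - r/E), which solves to r = P0·E·s / (E + P0·s). The function name and parameters are hypothetical.

```python
def expected_relevant_rp(s, expected_rel, p0=1.0):
    """Expected relevant documents among the top s under P = p0 * (1 - R).

    expected_rel is the (assumed known) expected number of relevant
    documents in the whole collection; p0 is the precision at recall 0.
    """
    if s <= 0 or expected_rel <= 0:
        return 0.0
    return p0 * expected_rel * s / (expected_rel + p0 * s)


# Retrieving 10 documents from a collection with 10 expected relevant ones.
r = expected_relevant_rp(s=10, expected_rel=10, p0=1.0)
precision = r / 10
recall = r / 10
print(r, precision, recall)
```

In this example precision and recall are both 0.5, which lies on the assumed line P = 1 - R. The estimate never exceeds either s or E, so it stays a valid count for any cut-off.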
DTF-normal • Four steps • Modeling the distribution of Pr(t←d) • Computing Pr(q←d) • Deriving Pr(rel|q,d) • Estimating r(si,q)
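The four steps can be sketched roughly as below. This is an illustration under strong assumptions, not necessarily the authors' exact formulation: term weights Pr(t←d) are taken to be independent and normally distributed per term, the query score Pr(q←d) is a weighted sum of them (and so is itself normal), and Pr(rel|q,d) is a simple linear function `rel_scale * score`. The names `score_distribution`, `expected_relevant_normal`, `term_stats`, and `rel_scale` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def score_distribution(query_weights, term_stats):
    """Steps 1-2: combine per-term normals (mean, stdev) into a score normal."""
    mean = sum(w * term_stats[t][0] for t, w in query_weights.items())
    variance = sum((w * term_stats[t][1]) ** 2 for t, w in query_weights.items())
    return NormalDist(mean, sqrt(variance))


def expected_relevant_normal(s, n_docs, scores, rel_scale=1.0):
    """Steps 3-4: expected relevant documents among the s best of n_docs.

    Uses the assumed linear mapping Pr(rel|q,d) = rel_scale * score and the
    partial expectation of a normal above the score threshold of rank s.
    """
    if s <= 0:
        return 0.0
    if s >= n_docs:
        return n_docs * rel_scale * scores.mean
    threshold = scores.inv_cdf(1.0 - s / n_docs)
    alpha = (threshold - scores.mean) / scores.stdev
    # E[X; X > threshold] = mean * P(X > threshold) + stdev * phi(alpha)
    tail_mean = scores.mean * (s / n_docs) + scores.stdev * NormalDist().pdf(alpha)
    return n_docs * rel_scale * tail_mean


# Toy statistics (assumed): (mean, stdev) of Pr(t<-d) for each query term.
stats = {"retrieval": (0.2, 0.1), "quality": (0.4, 0.1)}
scores = score_distribution({"retrieval": 0.5, "quality": 0.5}, stats)
print(scores.mean, scores.stdev)
print(expected_relevant_normal(s=10, n_docs=1000, scores=scores))
```

A normal distribution can produce scores below 0 or above 1, so the linear mapping is only a rough stand-in for a real probability of relevance.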
Experiments • DTF-rp • DTF-sample • DTF-normal • CORI-all: ALL done by CORI • CORI-rs: RS done by CORI, but fusion by DTF
Fixed number of selected DLs • DTF-rp and DTF-normal are very close • They perform similarly to CORI-all on mid-length queries, but worse on short and long queries.
Sensitivity to query length • Short queries: stable • Mid-length queries: good only when learning from mid-length queries • Long queries: learning from short queries performs worst, learning from mid-length queries performs best
Additional cost • DTF-rp and DTF-normal perform similarly • Both are better than DTF-sample and the two CORI variants
Conclusions • Good theoretical foundation • Dynamic selection of the number of libraries and the number of documents • Performs competitively with the CORI variants • Can include other types of cost
Problems? • The authors claim the framework can solve the redundancy problem in the same way. • But I think it CANNOT. Why? • The estimation doesn’t work if the selected documents do not come from contiguous rank positions.