Collection Fusion
-Parallel retrieval on different information sources (e.g. different search engines, or different collections)
-Merger of results
• MetaCrawler - University of Washington (Selberg + Etzioni, 1995)
• Towell + Voorhees
Collection Fusion
Merging results from different search engines (Web brokers): Lycos, Altavista, Infoseek, Excite, Joe's Bot
[Figure: ranked result lists (ranks 1 to 12) returned by the engines, scored on different scales, e.g. .99 .98 .96 .94 .94 .92 .92; 4 4 4 3.5 3.2 3.0 2.1; .99 .97 .97 .95 .95 .92. Labels: Bayes nets, bag of words]
-Different methods (good thing)
-Merge by downloading all and rerank using private relevance scheme
Collection Fusion
• Issues:
  • Different weighting and relevance scales (logarithmic, linear, different ranges…)
  • No ranking or weighting in some cases
  • Different sizes of response set
  • Different biases of collections
  • Duplicate identification and removal
  • Cost (money) or latency/bandwidth (time) as factor in "relevance" ranking
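The first two issues (incompatible score scales, and engines that return no scores at all) can be illustrated with a small sketch. This is only one possible approach, assuming min-max rescaling onto [0, 1] and a linear rank-derived fallback score; the function names `min_max_normalize` and `scores_from_ranks` are hypothetical and do not come from the slides.

```python
def min_max_normalize(scores):
    """Rescale raw engine scores onto [0, 1] (assumed normalization scheme).

    A list where every score is identical maps to all 1.0.
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def scores_from_ranks(n):
    """Fallback for an engine that returns n ordered results without scores.

    The top result gets 1.0 and scores fall linearly (an assumption).
    """
    return [(n - i) / n for i in range(n)]


# An engine reporting scores on a 0-4 style scale, as in the earlier example.
engine_scores = [4, 4, 4, 3.5, 3.2, 3.0, 2.1]
normalized = min_max_normalize(engine_scores)
unscored = scores_from_ranks(4)
```

Min-max rescaling does not correct for logarithmic versus linear scales; a log transform would have to be applied before rescaling in that case.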
Goal:
• Learn:
  • Ranking scale
  • Ranking reliability
  • Relevance ratio
• Function:
  $\mathrm{Rank}(CF) = a_1 f(\mathrm{Rank}(A_1)) + a_2 f(\mathrm{Rank}(A_2)) + \dots$, with $a_i = 1/k$, where $k$ = number of collections
  May need log transform and/or scale shift
Issues
-Duplicate identification and removal
-Link checking (reliability)
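Duplicate identification across engines is often approached by comparing canonicalized URLs. The sketch below is a simple heuristic under stated assumptions: it ignores the scheme, lowercases the host, drops a leading `www.`, and strips a trailing slash. The helper names `canonical_url` and `remove_duplicates` are hypothetical, and real duplicate detection (mirrors, redirects, near-identical content) needs more than this.

```python
from urllib.parse import urlsplit


def canonical_url(url):
    """Build a comparison key for a URL (heuristic, not a standard)."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    query = "?" + parts.query if parts.query else ""
    return host + path + query


def remove_duplicates(urls):
    """Keep the first occurrence of each canonical URL, preserving order."""
    seen = set()
    unique = []
    for url in urls:
        key = canonical_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


merged = remove_duplicates([
    "http://www.Example.com/a/",
    "http://example.com/a",
    "http://example.com/b",
])
```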
Impact on Service Provider
-Charge per access
-Advertising
Solutions?
Rank-Driven Collection Fusion
$\mathrm{Rank}(CF, d_i) = \sum_{j \in \text{collections}} a_j \, f(\mathrm{Rank}(\text{collection}_j, d_i))$
• $f$: may need log transform or scale shift
• $\mathrm{Rank}(\text{collection}_j, d_i)$: rank of document $i$ in collection $j$
• $a_j$: will depend on collection's overall relevance and reliability of rankings
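The weighted sum of transformed ranks can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular system: the function name `fuse_ranks` is hypothetical, the default transform `math.log1p` is one possible log transform, the default weights are the uniform 1/k, and a document missing from a collection is assigned a rank one past the end of that collection's list (an assumption the slides do not address). Lower fused values rank first.

```python
import math


def fuse_ranks(rankings, weights=None, transform=math.log1p):
    """Fuse ranked lists into one ordering by a weighted sum of ranks.

    rankings: dict mapping collection name -> list of doc ids, best first.
    weights:  dict mapping collection name -> a_j; defaults to 1/k each.
    transform: the function f applied to each rank.
    """
    k = len(rankings)
    if weights is None:
        weights = {name: 1.0 / k for name in rankings}

    all_docs = {doc for docs in rankings.values() for doc in docs}
    fused = {}
    for doc in all_docs:
        total = 0.0
        for name, docs in rankings.items():
            # Assumption: an unreturned document ranks just below the list.
            rank = docs.index(doc) + 1 if doc in docs else len(docs) + 1
            total += weights[name] * transform(rank)
        fused[doc] = total

    # Smaller fused rank is better; ties are broken by document id.
    return sorted(fused, key=lambda d: (fused[d], d))


rankings = {"A": ["d1", "d2", "d3"], "B": ["d1", "d3", "d2"]}
uniform = fuse_ranks(rankings)
weighted = fuse_ranks(rankings, weights={"A": 0.8, "B": 0.2},
                      transform=lambda r: r)
```

With collection A trusted more heavily (0.8 versus 0.2), its ordering of `d2` above `d3` wins in the fused list.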
[Figure: Collection 1 and Collection 2 merged into CF]
Assuming:
-Relevance $\propto$ rank
-Collection sizes are equal
-Smaller returned set $\Rightarrow$ more selective
[Figure: Collection 1 and Collection 2 merged into CF]
Assuming:
-Relevance $\propto \dfrac{\text{rank}}{\text{total returned}}$
-Equal selectivity
-Smaller returned set $\Rightarrow$ smaller collection
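One reading of the rank over total returned assumption is that results are compared by their fractional position within each returned set, so that rank 2 of 4 and rank 1 of 2 are treated alike. The sketch below interleaves results on that basis; the function name `merge_by_fractional_rank` and the tie-breaking by collection name are assumptions made for illustration only.

```python
def merge_by_fractional_rank(result_lists):
    """Interleave results by rank / total returned (smaller fraction first).

    result_lists: dict mapping collection name -> list of doc ids, best first.
    Returns a list of (doc, collection, fraction) tuples.
    """
    scored = []
    for name, docs in result_lists.items():
        total = len(docs)
        for rank, doc in enumerate(docs, start=1):
            scored.append((rank / total, name, doc))
    # Sort by fraction, then collection name, then doc id (assumed tie-break).
    scored.sort()
    return [(doc, name, frac) for frac, name, doc in scored]


merged = merge_by_fractional_rank({
    "c1": ["a", "b", "c", "d"],
    "c2": ["x", "y"],
})
order = [doc for doc, _, _ in merged]
```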
Merge using relevance judgments of different search engines
$\lambda_{\text{Excite}} = .01$, $\lambda_{\text{Altavista}} = .50$, where $\lambda = f(\text{correlation with my judgements})$
My rank or relevance $= f(\cdot)$ of:
-Service provider
-Their rankings + relevance judgments
-Their past performance
-Nature of scale used
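A per-engine weight based on correlation with a user's own judgments could be computed as follows. This is a sketch under assumptions: it uses Spearman's rank correlation (the slides do not name a correlation measure), expects tie-free ranks over the same sample of documents, and clips negative correlations to zero. The names `spearman_rho` and `engine_weight` are hypothetical.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank correlation for two tie-free rankings of the same items.

    Each argument lists the rank (1..n) assigned to item 0, item 1, ...
    """
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("need two rankings of the same length (at least 2)")
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d2 / (n * (n * n - 1))


def engine_weight(engine_ranks, my_ranks):
    """Weight for an engine: its correlation with my ranks, floored at 0."""
    return max(0.0, spearman_rho(engine_ranks, my_ranks))


my_ranks = [1, 2, 3, 4]
agreeing = engine_weight([2, 1, 3, 4], my_ranks)
reversed_engine = engine_weight([4, 3, 2, 1], my_ranks)
```

An engine that orders the sample almost as the user does receives a weight near 1, while an engine that reverses the user's order receives 0.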
Sample-Based Relevance
[Figure: a sample is drawn from Collection 1 and Collection 2, each listing scores .99 .96 .96 .95 .95 .94 .94 .93 .92 .91. Plot labels: System Ranking, User Ranking, User (Collective Judgment), ideal, Systems 0 to 4; axis ticks run 0 to 100 and 0.0 to 1.00]