Collection Fusion

-Parallel retrieval on different information sources (e.g. different search engines, or different collections)
-Merger of results
• MetaCrawler - University of Washington (Selberg + Etzioni, 1995)
• Towell + Voorhees

Collection Fusion: Merging results from different search engines (Web brokers)
Engines: Lycos, Altavista, Infoseek, Excite, Joe's Bot
[Figure: result lists returned by the engines on incompatible scales: rank positions only (1 to 12), scores from .99 down to .92, and scores from 4 down to 2.1. Method labels in the figure: Bayes nets, bag of words]
-Different methods (good thing)
-Merge by downloading all and rerank using private relevance scheme

Collection Fusion: Issues
• Different weighting and relevance scales (logarithmic, linear, different ranges…)
• No ranking or weighting in some cases
• Different sizes of response set
• Different biases of collections
• Duplicate identification and removal
• Cost (money) or latency/bandwidth (time) as factor in "relevance" ranking
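The first issue above, incompatible score scales, can be illustrated with a minimal sketch. Min-max normalization is used here only as one common choice, not as the method the slides prescribe; the function name `min_max_normalize` and the two score lists are illustrative assumptions.

```python
def min_max_normalize(scores):
    """Map raw scores linearly onto [0, 1].

    Assumes a non-empty list. A list whose scores are all equal
    maps to all 1.0, since no score is worse than another.
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


# Illustrative scores from two engines that use different ranges.
engine_a = [0.99, 0.98, 0.96, 0.94]
engine_b = [4.0, 3.5, 3.2, 2.1]

norm_a = min_max_normalize(engine_a)
norm_b = min_max_normalize(engine_b)
```

After normalization both lists run from 1.0 (best) to 0.0 (worst), so they can be compared directly. A logarithmic scale would need a log transform first, which this sketch does not attempt.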

Goal:
• Learn:
  -Ranking scale
  -Ranking reliability
  -Relevance ratio
• Function:
  $\mathrm{Rank}(CF) = a_1 f(\mathrm{Rank}(A_1)) + a_2 f(\mathrm{Rank}(A_2)) + \dots$, with $a_i = 1/k$, $k$ = number of collections
  $f$ may need a log transform and/or scale shift

Issues
-Duplicate identification and removal
-Link checking (reliability)
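Duplicate identification can be approximated by comparing canonicalized URLs. The sketch below is one simple heuristic under stated assumptions, not a method given in the slides: the helper names `canonical_url` and `remove_duplicates` are hypothetical, and the rules (lowercase host, drop default port, drop fragment, drop trailing slash) will miss mirrors and near-duplicate pages.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url):
    """Return a normalized form of url for duplicate comparison."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep the port only when it is explicit and not the scheme default.
    default_port = {"http": 80, "https": 443}.get(scheme)
    port = parts.port
    netloc = host if port in (None, default_port) else f"{host}:{port}"
    # Treat "/a/" and "/a" as the same page; an empty path becomes "/".
    path = parts.path.rstrip("/") or "/"
    # The fragment never reaches the server, so it is dropped.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def remove_duplicates(urls):
    """Keep the first occurrence of each canonical URL, preserving order."""
    seen = set()
    unique = []
    for url in urls:
        key = canonical_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


merged = remove_duplicates([
    "http://example.com/a",
    "HTTP://Example.com:80/a/#top",
    "http://example.com/b",
])
```

Here `merged` keeps the first and third entries, because the second canonicalizes to the same URL as the first.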

Impact on Service Provider
-Charge per access
-Advertising
Solutions?

Rank-Driven Collection Fusion
$\mathrm{Rank}(CF, d_i) = \sum_{j \in \text{collections}} a_j \, f(\mathrm{Rank}(\text{collection}_j, d_i))$
-$f$: may need log transform or scale shift
-$\mathrm{Rank}(\text{collection}_j, d_i)$: rank of document $i$ in collection $j$
-$a_j$: will depend on collection's overall relevance and reliability of rankings
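The weighted sum above can be sketched as follows. Several choices here are assumptions rather than anything the slides specify: $f$ is taken to be the reciprocal of the rank, a document missing from a collection contributes nothing for that collection, and the fused value is treated as a score where larger is better. The names `reciprocal_rank` and `fuse` are hypothetical.

```python
def reciprocal_rank(rank):
    """One possible f: rank 1 scores 1.0, rank 2 scores 0.5, and so on."""
    return 1.0 / rank


def fuse(rankings, weights=None, f=reciprocal_rank):
    """Fuse per-collection rankings into one ordered list of documents.

    rankings: dict mapping collection name -> list of doc ids, best first.
    weights:  dict mapping collection name -> a_j; defaults to 1/k each.
    """
    k = len(rankings)
    if weights is None:
        weights = {name: 1.0 / k for name in rankings}
    fused = {}
    for name, docs in rankings.items():
        for position, doc in enumerate(docs, start=1):
            fused[doc] = fused.get(doc, 0.0) + weights[name] * f(position)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))


rankings = {"A1": ["d1", "d2", "d3"], "A2": ["d2", "d4"]}
equal_weight_order = fuse(rankings)
trusted_a1_order = fuse(rankings, weights={"A1": 0.9, "A2": 0.1})
```

With equal weights, `d2` comes first because both collections return it. With most of the weight on `A1`, the fused order follows `A1`'s own ranking.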

[Figure: Collection 1 and Collection 2 merged into CF]
Assuming:
-Relevance ∝ rank
-Collection sizes are equal
-Smaller returned set → more selective

[Figure: Collection 1 and Collection 2 merged into CF]
Assuming:
-Relevance ∝ rank / total returned
-Equal selectivity
-Smaller returned set → smaller collection

Merge using relevance judgments of different search engines
$\lambda$ = f(correlation with my judgments), e.g. $\lambda_{\text{Excite}} = .01$, $\lambda_{\text{Altavista}} = .50$
My rank or relevance = f(service provider):
-Their rankings + relevance judgments
-Their past performance
-Nature of scale used
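One way to derive a per-engine weight from "correlation with my judgments" is sketched below. The slides do not say which correlation measure is meant, so both the Pearson correlation and the clipping of negative values to zero are assumptions; `pearson` and `engine_weight` are hypothetical names.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-empty sequences.

    Returns 0.0 when either sequence is constant (correlation undefined).
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def engine_weight(engine_scores, my_judgments):
    """Weight for an engine: its correlation with my judgments, floored at 0."""
    return max(0.0, pearson(engine_scores, my_judgments))


# Illustrative sample: engine scores and my judgments for the same documents.
agreeing = engine_weight([1, 2, 3], [2, 4, 6])
disagreeing = engine_weight([1, 2, 3], [3, 2, 1])
```

An engine whose scores track the user's judgments gets a weight near 1, while one that ranks in the opposite order gets 0 and is effectively ignored in the merge.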

Sample-Based Relevance
[Figure: for a relevance sample drawn from Collection 1 and Collection 2 (system scores from .99 down to .91), system ranking is plotted against user ranking (collective judgment) for Systems 1 to 4, with the ideal system marked; axes run from 0 to 100 and from 0.0 to 1.00]
