Rank Aggregation Methods II Experiments

This lecture discusses experiments on distance measures in the rank aggregation problem, including Spearman footrule distance, Kendall tau distance, induced footrule distance, and scaled footrule distance.

hollyn


Presentation Transcript


  1. Rank Aggregation Methods II: Experiments (CS728 Lecture 12)

  2. Recall the Rank Aggregation Problem
     • m candidates (a.k.a. "alternatives")
     • M = {1,…,m}: set of candidates
     • n voters (a.k.a. "agents" or "judges")
     • N = {1,…,n}: set of voters
     • Each voter i has a ranking σ_i on M
     • σ_i(a) < σ_i(b) means the i-th voter prefers a to b
     • A ranking may be a total or partial order
     • The rank aggregation problem: combine σ_1,…,σ_n into a single ranking σ on M that represents the "social choice" of the voters
     • Rank aggregation function: f(σ_1,…,σ_n) = σ
     • σ may be a total or partial order

  3. Experiments: Distance Measures
     Goal: quantitatively compare different rank aggregation methods.
     Performance measures (S is the set of ranked elements):
     (1) The Spearman footrule distance is the sum, over all elements, of the absolute difference between the element's positions in the two lists. It is normalized by dividing by the maximum value (1/2)|S|^2, giving a value between 0 and 1.
     (2) The Kendall tau distance counts the number of pairwise disagreements between the two lists. Dividing by the maximum possible value (1/2)|S|(|S| − 1) gives a normalized version with a value between 0 and 1.
     (3) The induced footrule distance is obtained by projecting a full list s onto the elements of each partial list and taking the footrule distance between the projection and that partial list. The induced Kendall tau distance is defined in the same way.
     (4) The scaled footrule distance weights the contributions of elements based on the lengths of the lists they appear in. If s is a full list and t is a partial list, then SF(s, t) = Σ_{i ∈ t} | s(i)/|s| − t(i)/|t| |. SF is normalized by dividing by |t|/2.
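
The four measures on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the experiments: the function names are hypothetical, rankings are assumed to be Python lists ordered from best to worst, positions are 1-based, and every element of a partial list t is assumed to appear in the full list s.

```python
from itertools import combinations

def positions(ranking):
    """Map each item to its 1-based position in the ranking."""
    return {item: rank for rank, item in enumerate(ranking, start=1)}

def footrule(s, t):
    """Spearman footrule: sum of absolute position differences (same item set)."""
    ps, pt = positions(s), positions(t)
    return sum(abs(ps[x] - pt[x]) for x in ps)

def kendall_tau(s, t):
    """Kendall tau: number of item pairs ordered differently in s and t."""
    ps, pt = positions(s), positions(t)
    return sum(1 for a, b in combinations(s, 2)
               if (ps[a] - ps[b]) * (pt[a] - pt[b]) < 0)

def normalized_footrule(s, t):
    """Footrule divided by the slide's maximum value (1/2)|S|^2."""
    return footrule(s, t) / (len(s) ** 2 / 2)

def normalized_kendall_tau(s, t):
    """Kendall tau divided by the maximum value (1/2)|S|(|S| - 1)."""
    n = len(s)
    return kendall_tau(s, t) / (n * (n - 1) / 2)

def project(s, t):
    """Restrict the full list s to the items of the partial list t."""
    members = set(t)
    return [x for x in s if x in members]

def induced_footrule(s, t):
    """Footrule between the projection of s onto t and t itself."""
    return footrule(project(s, t), t)

def scaled_footrule(s, t):
    """Scaled footrule SF(s, t), normalized by |t|/2 as on the slide."""
    ps, pt = positions(s), positions(t)
    total = sum(abs(ps[x] / len(s) - pt[x] / len(t)) for x in t)
    return total / (len(t) / 2)
```

For example, with s = [1, 2, 3, 4] and its reversal, both normalized distances reach their maximum of 1; with the partial list t = [3, 1], the projection of s is [1, 3] and the induced footrule distance is 2.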

  4. Experiments: Distance Measures
     • For each aggregation method and each distance measure we get a vector of values, each component being the distance from the aggregation to one voter list
     • Simplest is to take the average (or 1-norm)
     • Other norms are interesting
     • Mean square distance (2-norm)
     • Max distance (∞-norm)
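
As a rough sketch of how such a vector of per-voter distances might be summarized, the following computes the three quantities named on the slide. The function name `summarize` and the dictionary keys are hypothetical, and "mean square distance" is interpreted here as the root of the mean squared distance.

```python
import math

def summarize(distances):
    """Summarize a vector of distances from one aggregation to each voter list."""
    n = len(distances)
    return {
        "mean": sum(distances) / n,                            # average (1-norm / n)
        "rms": math.sqrt(sum(d * d for d in distances) / n),   # 2-norm variant
        "max": max(distances),                                 # infinity-norm
    }
```

An aggregation with a low mean but a high max is close to most voters and far from at least one, which is why the different norms can rank methods differently.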

  5. Experiments: Minimizing Average
     Search engines: Altavista (AV), Alltheweb (AW), Excite (EX), Google (GG), Hotbot (HB), Lycos (LY), and Northernlight (NL)
     K = Kendall distance; SF = scaled footrule distance; IF = induced footrule distance; LK = Local Kemenization

  6. Experiments in Spam Filtering
     • Define spam to be web pages that are ranked low by majority opinion (machine and human; a simplifying assumption), although they may be ranked highly by some search engines
     • Intuition: if a page spams most search engines for a particular query, then no combination of these search engines can filter the spam (garbage in, garbage out)
     • Spam pages are the Condorcet losers and will occupy the bottom of any ranking that satisfies the extended Condorcet criterion
     • Similarly, good pages will be among the Condorcet winners and will rank above the losers

  7. Condorcet Criteria
     • Condorcet Criterion: a candidate of M that beats every other candidate in pairwise simple majority voting should be ranked first.
     • Extended Condorcet Criterion (XCC):
     • Version 1: If most voters prefer candidate a to candidate b (i.e., the number of i such that σ_i(a) < σ_i(b) is at least n/2), then σ should also prefer a to b (i.e., σ(a) < σ(b)).
     • Version 2: If there is a partition (W, L) of M such that for any x in W and y in L the majority prefers x to y, then every x in W must be ranked above every y in L. The elements of W are called Condorcet winners and the elements of L Condorcet losers.
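
The criteria above can be checked mechanically. The sketch below assumes full rankings given as Python lists (best first) and uses a strict majority (more than n/2 voters); both the function names and the strict-majority convention are assumptions made for illustration.

```python
def majority_prefers(lists, x, y):
    """True if more than half of the voter lists rank x above y."""
    wins = sum(1 for t in lists if t.index(x) < t.index(y))
    return wins > len(lists) / 2

def condorcet_winner(lists):
    """Return the candidate that beats every other one pairwise, or None."""
    candidates = lists[0]
    for x in candidates:
        if all(majority_prefers(lists, x, y) for y in candidates if y != x):
            return x
    return None

def respects_partition(ranking, lists, winners, losers):
    """Check XCC Version 2 for one given partition (winners, losers)."""
    if not all(majority_prefers(lists, x, y) for x in winners for y in losers):
        raise ValueError("not a valid Condorcet partition for these lists")
    pos = {item: i for i, item in enumerate(ranking)}
    return all(pos[x] < pos[y] for x in winners for y in losers)
```

With the three voter lists [1, 2, 3], [1, 3, 2], [2, 1, 3], candidate 1 is the Condorcet winner, and ({1, 2}, {3}) is a valid partition, so a ranking such as [1, 3, 2] fails Version 2. The cyclic profile [1, 2, 3], [2, 3, 1], [3, 1, 2] has no Condorcet winner.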

  8. XCC(2) and SPAM Filtering
     • Note that XCC(1) implies XCC(2), so Version 1 is stronger
     • But XCC(1) is not always realizable
     • As we will see, XCC(2) is always realizable via Local Kemenization
     • Hence using rank aggregation with XCC(2) should assist in SPAM filtering, since Condorcet losers will be ranked lowest
     • Let us look at where spam pages (human determined) are ranked by good aggregation methods

  9. Experiments: Filtering SPAM

  10. Experiment: Word association
     • Different search engines and portals have different (default) semantics for handling a multi-word query.
     • Some use OR semantics (documents contain at least one of the given query terms), while Google uses AND semantics (all the query words must appear). Both are inconvenient in many situations.
     • Consider searching for a software engineering job in an on-line job database. The user lists a number of skills and a number of potential keywords in the job description, for example, "Silicon Valley C++ Java CORBA TCP-IP algorithms start-up pre-IPO stock options". It is clear that the "AND" rule might produce no document or SPAM, and the "OR" rule is equally disastrous.
     • Experiment with rank aggregation using multiple queries based on small subsets of terms.
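
One way the sub-queries for such an experiment might be generated is sketched below. The slide does not say how the subsets were chosen, so the subset size of 2 and the function name `subqueries` are assumptions; each generated query would then be sent to a search engine and the resulting lists aggregated.

```python
from itertools import combinations

def subqueries(terms, size=2):
    """Build one query string per subset of `size` terms, keeping term order."""
    return [" ".join(subset) for subset in combinations(terms, size)]

queries = subqueries(["madras", "madurai", "coimbatore", "vellore"])
```

Four terms with subsets of size 2 give six sub-queries, each of which yields one (partial) ranked list for the aggregation step.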

  11. Results for query: madras madurai coimbatore vellore (cities in the state of Tamil Nadu, India)
     • Google
       www.mssrf.org/Fris9809/location-tamilnadu.html
       www.indiaplus.com/Info/schools.html
       www.focustamilnadu.com/tamilnadu/Policy%20Note ...Forests.html
       www.tn.gov.in/policy/environ.htm
       www.indiacolleges.com/Tamil_Nadu.htm
     • SFO with LK
       www.madurai.com
       www.ozemail.com.au/clday/locations.htm
       www.utoledo.edu/homepages/speelam/coimbatore.html
       www.ozemail.com.au/clday/madras.htm
       www.madurai.com/around.htm
       www.indiatraveltimes.com/tamilnadu/tamil1.html
     • MC4 with LK
       www.madurai.com
       www.surfindia.com/omsakthi/tourism.htm
       www.indiatraveltimes.com/tamilnadu/tamil1.html
       www.indiatraveltimes.com/tamilnadu/tamil2.html
       www.indiatravels.com/forts/vellore_fort.htm
       www.india-tourism.de/english/south/tamil_nadu.html

  12. Locally Kemeny optimal aggregation and XCC(2)
     • Many existing aggregation methods do not satisfy XCC(1) or XCC(2).
     • It is possible to use your favorite aggregation method to obtain a full list, then apply local Kemenization to realize XCC(2), which filters Condorcet losers.

  13. Locally Kemeny optimal
     • Recall that computing a Kemeny optimal aggregation is NP-hard
     • Definition of locally optimal: a permutation p is a locally Kemeny optimal (LKO) aggregation of partial lists t1, t2, ..., tk if there is no permutation p' that can be obtained from p by a single transposition of an adjacent pair of elements and for which the Kendall distance satisfies K(p', t1, t2, ..., tk) < K(p, t1, t2, ..., tk). In other words, it is impossible to reduce the total distance to the t's by flipping an adjacent pair.

  14. Example of LKO but not KO
     • Example 1
     • t1 = (1,2), t2 = (2,3), t3 = t4 = t5 = (3,1).
     • p = (1,2,3) satisfies the definition of LKO with K(p, t1, t2, ..., t5) = 3: each adjacent transposition raises the sum to 4. However, transposing 1 and 3 (a non-adjacent pair) decreases the sum to 2, so p is not Kemeny optimal.
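
The example can be verified with a short script. This is a sketch under the assumption that K(p, t1, ..., tk) is the sum of induced Kendall distances, i.e. for each partial list it counts the pairs that the list orders one way and p orders the other way; the function names are hypothetical.

```python
from itertools import combinations

def total_kendall(p, lists):
    """Sum over lists of the pairs (a before b in the list) that p reverses."""
    pos = {item: i for i, item in enumerate(p)}
    return sum(1 for t in lists
                 for a, b in combinations(t, 2)   # a precedes b in t
                 if pos[a] > pos[b])

def is_locally_kemeny_optimal(p, lists):
    """True if no single adjacent transposition lowers the total distance."""
    base = total_kendall(p, lists)
    for i in range(len(p) - 1):
        q = list(p)
        q[i], q[i + 1] = q[i + 1], q[i]
        if total_kendall(q, lists) < base:
            return False
    return True

lists = [(1, 2), (2, 3), (3, 1), (3, 1), (3, 1)]
p = (1, 2, 3)
```

Here `total_kendall(p, lists)` is 3 and `is_locally_kemeny_optimal(p, lists)` holds, while `total_kendall((3, 2, 1), lists)` is 2, matching the slide.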

  15. LKO satisfies XCC(2)
     • Proof by contradiction. If the result is false, then there exist partial lists t1, t2, ..., tk, an LKO aggregation p, and a partition (W, L) for which p violates XCC(2); that is, there is some pair c in W and d in L such that p(d) < p(c). Let (d, c) be the closest such pair in p.
     • Consider the immediate successor of d in p, call it e. If e = c, then c is adjacent to d in p. Since a majority prefers c to d, transposing this adjacent pair produces a p' such that K(p', t1, t2, ..., tk) < K(p, t1, t2, ..., tk), contradicting the assumption that p is LKO.
     • If e does not equal c, then either e is in W, in which case (d, e) is a closer pair in p than (d, c) and also violates XCC(2), or e is in L, in which case (e, c) is a closer pair than (d, c) that violates XCC(2). Both cases contradict the choice of (d, c).

  16. Local Kemenization procedure
     • A local Kemenization of a full list μ with respect to the preference lists t1, t2, ..., tk computes a locally Kemeny optimal aggregation that is maximally consistent with μ. This approach:
     (1) preserves the strengths of the initial aggregation
     (2) ranks non-spam above spam
     (3) gives a result that disagrees with μ on a pair (i, j) only if a majority endorses this disagreement
     (4) for every d, 1 ≤ d ≤ |μ|, the restriction of the output to the top d elements of μ is a local Kemenization of those elements

  17. Local Kemenization procedure
     • A simple inductive construction.
     • Assume inductively that we have constructed p, a local Kemenization of the projection of the t's onto the elements 1, ..., l-1.
     • Insert the next element x into the lowest-ranked "permissible" position in p: just below the lowest-ranked element y in p such that
     • (a) no majority among the (original) t's prefers x to y, and
     • (b) for all successors z of y in p there is a majority that prefers x to z.
     • In other words, we try to insert x at the end (bottom) of the list p; we bubble it up toward the top of the list as long as a majority of the t's insists that we do.

  18. Example local Kemenization procedure
     [Figure: worked example of local Kemenization; only the following text is recoverable]
     • Voter lists: A B F E C D; B C A E F D; A C F D E B; B F D C A E; C A B F E D
     • Initial list: B A D C E F
     • Pairwise tallies shown: A>B: 3, A<B: 2; B>D: 4, B<D: 1
     • Resulting list: A B C F E D
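
The inductive construction from slide 17 can be sketched as follows, using the lists that appear on slide 18 as input. This is an illustrative reading of the procedure, not the lecture's own code: the function names are hypothetical, and "majority" is assumed to mean that strictly more of the lists containing both elements rank x above y than rank y above x.

```python
def majority_prefers(lists, x, y):
    """True if more lists rank x above y than y above x (lists lacking either are skipped)."""
    for_x = against_x = 0
    for t in lists:
        if x in t and y in t:
            if t.index(x) < t.index(y):
                for_x += 1
            else:
                against_x += 1
    return for_x > against_x

def local_kemenization(initial, lists):
    """Insert each element of `initial` at the bottom and bubble it up
    while a majority prefers it to the element directly above."""
    result = []
    for x in initial:
        pos = len(result)
        while pos > 0 and majority_prefers(lists, x, result[pos - 1]):
            pos -= 1
        result.insert(pos, x)
    return result

voters = [list("ABFECD"), list("BCAEFD"), list("ACFDEB"),
          list("BFDCAE"), list("CABFED")]
final = local_kemenization(list("BADCEF"), voters)
```

With these inputs A moves above B (3 votes to 2), D stays below B (1 vote to 4), and the output is A B C F E D, which matches the final list visible on the slide.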

  19. RA and Searching Workplace Web
     • Axiom 1: Intranet documents are not spam
     • Axiom 2: Queries usually have unique answers (not broad topic based)
     • Axiom 3: Intranet docs are not search engine friendly (docs are accessed through portals and database queries)
     • Rank aggregation allows us to combine a number of heuristic alternatives: static and dynamic, query dependent and independent
