Don't Compare Averages
WEA 2005, May 10 – May 13, Santorini Island, Greece
Holger Bast, Max-Planck-Institut für Informatik (MPII), Saarbrücken, Germany
Joint work with Ingmar Weber

Two famous quotes
• "There are three kinds of lies: lies, damn lies, and statistics." Benjamin Disraeli, 1804 – 1881 (reported by Mark Twain)
• "Never believe any statistics you haven't forged yourself." Winston Churchill, 1874 – 1965

A typical figure
[Figure: two curves, "Theirs" and "Ours". X-axis: input size. Y-axis: some cost measure. Each point represents an average over a number of iterations.]

Changing the cost measure ...
• … by a monotone function, say from c to 2^c
[Figure: the same comparison plotted for cost measure c and for cost measure 2^c.]
This is from authentic data!

No deep mathematics here
• Even for strictly monotone f
  • certainly E f(X) ≠ f(E X) in general
  • but also E X ≤ E Y does not in general imply E f(X) ≤ E f(Y)
• Example, with f(c) = 2^c
  • X : 4 , 4 → average 4
  • Y : 1 , 5 → average 3
  • 2^X : 2^4 , 2^4 → average 16
  • 2^Y : 2^1 , 2^5 → average 17

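The example on this slide can be replayed in a few lines. This is only a minimal sketch: the names `X`, `Y`, and `f` follow the slide, while the variable names for the averages are chosen here for illustration.

```python
from statistics import mean

def f(c):
    # The monotone cost transformation used on the slide: c -> 2^c.
    return 2 ** c

X = [4, 4]  # samples of X, as on the slide
Y = [1, 5]  # samples of Y, as on the slide

mean_x = mean(X)
mean_y = mean(Y)
mean_fx = mean([f(v) for v in X])
mean_fy = mean([f(v) for v in Y])

# Y has the smaller average before the transformation,
# but the larger average after it.
print(mean_x, mean_y)    # averages of X and Y
print(mean_fx, mean_fy)  # averages of 2^X and 2^Y
```

The order of the averages flips although `f` is strictly increasing, because the single large value of Y dominates once the exponential is applied.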
Examples of multiple cost measures
• Language modeling
  • for a given probability distribution p_1, …, p_n
  • find a distribution q_1, …, q_n from a constrained class that
    • minimizes cross-entropy Σ_i p_i log(p_i / q_i)
    • minimizes perplexity Π_i (p_i / q_i)^(p_i) = 2^(cross-entropy)
• Algorithm A uses algorithm B as a subroutine
  • B produces a result of average quality q
  • complexity of A depends on, say, q^2

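As a quick sanity check of the relation between the two language-modeling measures, the sketch below evaluates both formulas as written on the slide and confirms that the second equals 2 raised to the first. The base-2 logarithm is an assumption (suggested by the factor 2 in the slide's formula), and the distributions `p` and `q` are made-up illustration values.

```python
import math

def cross_entropy(p, q):
    # Sum of p_i * log2(p_i / q_i): the quantity the slide calls cross-entropy.
    # Terms with p_i == 0 contribute nothing and are skipped.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def perplexity(p, q):
    # Product of (p_i / q_i)^(p_i), as written on the slide.
    return math.prod((pi / qi) ** pi for pi, qi in zip(p, q) if pi > 0)

# Hypothetical distributions, chosen only for illustration.
p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

ce = cross_entropy(p, q)
pp = perplexity(p, q)
print(ce, pp, 2 ** ce)
```

Since perplexity is a strictly increasing but nonlinear function of this cross-entropy, averaging one or the other over several runs is exactly the situation in which the order of two averages can flip.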
Can this also happen with error bars?
• error bars for c don't overlap, yet reversal for f(c)?
[Figure: error bars for the two methods, plotted once for c and once for f(c).]
Yes, this can also happen!

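One way to see that this can happen is a small constructed example. The sketch below is not the authors' data: the samples are invented, f(c) = 2^c is assumed as the monotone function, and the error bar is taken to be the mean plus or minus the absolute deviation E|Z – E Z|, the measure used on the later slides. Exact fractions avoid rounding questions.

```python
from fractions import Fraction

def mean(values):
    # Exact average of a list of integers.
    return Fraction(sum(values), len(values))

def abs_dev(values):
    # Absolute deviation E|Z - E Z| of the empirical distribution.
    m = mean(values)
    return sum(abs(v - m) for v in values) / len(values)

def f(c):
    return 2 ** c

# Invented samples: X is constant, Y is almost always 0 with one rare outlier.
X = [1] * 100
Y = [0] * 99 + [20]

lower_x = mean(X) - abs_dev(X)  # lower end of X's error bar
upper_y = mean(Y) + abs_dev(Y)  # upper end of Y's error bar

mean_fx = mean([f(v) for v in X])
mean_fy = mean([f(v) for v in Y])

print(float(lower_x), float(upper_y))  # X's bar lies entirely above Y's bar
print(float(mean_fx), float(mean_fy))  # yet the average of f(X) is far below that of f(Y)
```

The error bars for c are disjoint with X above Y, but after applying f the expectations are in the opposite order.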
• complete reversal with error bars?
[Figure: error bars plotted for c and for f(c). For c, the bar ends E X – δ X and E Y + δ Y are marked; for f(c), the bar ends E f(Y) – δ f(Y) and E f(X) + δ f(X) are marked.]
• δ Z = E |Z – E Z| (absolute deviation) ≤ σ Z = sqrt( E (Z – E Z)^2 ) (standard deviation)

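The two deviation measures can be computed side by side for a sample. This is a minimal sketch on an invented sample `Z`, treating the sample as the whole distribution (so the standard deviation is the population version, `statistics.pstdev`).

```python
from statistics import mean, pstdev

def abs_deviation(z):
    # delta Z = E|Z - E Z| for the empirical distribution of the sample.
    m = mean(z)
    return mean([abs(v - m) for v in z])

# Invented sample, for illustration only.
Z = [1, 5, 2, 8, 4]

delta = abs_deviation(Z)  # absolute deviation
sigma = pstdev(Z)         # standard deviation sqrt(E (Z - E Z)^2)

print(delta, sigma)
```

The absolute deviation never exceeds the standard deviation, so error bars built from δ are at most as wide as those built from σ.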
Theorem: complete reversal can never happen! For monotone (increasing) f,
if E X – δ X ≥ E Y + δ Y then E f(X) + δ f(X) ≥ E f(Y) – δ f(Y)

• if only one of the four δ is dropped, the theorem no longer holds in general

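The statement that complete reversal cannot happen, read as "E X – δ X ≥ E Y + δ Y implies E f(X) + δ f(X) ≥ E f(Y) – δ f(Y)" (the form that follows from the median argument), can be spot-checked by brute force. The sketch below is an empirical check, not a proof: it assumes f(c) = 2^c, draws small random integer samples with a fixed seed, and uses exact fractions so that boundary cases are not disturbed by rounding.

```python
import random
from fractions import Fraction

def mean(z):
    return Fraction(sum(z), len(z))

def abs_dev(z):
    # Absolute deviation E|Z - E Z| of the empirical distribution.
    m = mean(z)
    return sum(abs(v - m) for v in z) / len(z)

def f(c):
    # Assumed monotone increasing transformation.
    return 2 ** c

def check(x, y):
    """Return (premise, conclusion) of the theorem for samples x and y."""
    fx = [f(v) for v in x]
    fy = [f(v) for v in y]
    premise = mean(x) - abs_dev(x) >= mean(y) + abs_dev(y)
    conclusion = mean(fx) + abs_dev(fx) >= mean(fy) - abs_dev(fy)
    return premise, conclusion

rng = random.Random(0)  # fixed seed so the run is repeatable
premises = 0
violations = 0
for _ in range(2000):
    x = [rng.randint(0, 10) for _ in range(rng.randint(1, 5))]
    y = [rng.randint(0, 10) for _ in range(rng.randint(1, 5))]
    premise, conclusion = check(x, y)
    if premise:
        premises += 1
        if not conclusion:
            violations += 1

print("premise held:", premises, "violations:", violations)
```

No violation should ever be counted; the run only shows that the inequality is not vacuous on these samples.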
The canonical proof
• The medians M X and M Y do commute with f …
  • Prob(X ≤ M X) = ½ = Prob( f(X) ≤ f(M X) )
  • f(M X) = M f(X) and f(M Y) = M f(Y)
• … and hence cannot reverse their order
  • M X ≤ M Y
    → f(M X) ≤ f(M Y) because f is monotone
    → M f(X) ≤ M f(Y) because M and f commute
• Expectation and median are related as
  • |E X – M X| ≤ δ X = E|X – E X|
  • |E Y – M Y| ≤ δ Y = E|Y – E Y|
Nothing new, but hardly any computer scientist seems to know it.

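Both ingredients of the proof can be observed numerically. The sketch below assumes f(c) = 2^c and uses an invented odd-length sample `Z`, so that the median is a single sample value and the commutation f(M Z) = M f(Z) can be tested with plain equality.

```python
from statistics import mean, median

def f(c):
    # Assumed monotone increasing transformation.
    return 2 ** c

def abs_dev(z):
    # Absolute deviation E|Z - E Z| of the empirical distribution.
    m = mean(z)
    return mean([abs(v - m) for v in z])

# Invented sample with an odd number of values, so the median is unique.
Z = [1, 2, 3, 10, 50]
fZ = [f(v) for v in Z]

# The median commutes with the monotone function f ...
commutes = f(median(Z)) == median(fZ)

# ... and lies within one absolute deviation of the expectation,
# both before and after applying f.
bound_Z = abs(mean(Z) - median(Z)) <= abs_dev(Z)
bound_fZ = abs(mean(fZ) - median(fZ)) <= abs_dev(fZ)

print(commutes, bound_Z, bound_fZ)
```

The expectation of f(Z) is dominated by the largest value, while the median simply moves along with f; this is why the medians, unlike the averages, keep their order.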
The canonical proof
• now assume complete reversal did happen
[Figure: error bars with E X – δ X ≥ E Y + δ Y for c, yet E f(Y) – δ f(Y) > E f(X) + δ f(X) for f(c).]
• then M Y ≤ M X, yet M f(Y) > M f(X)
• this contradicts the fact that the medians cannot reverse their order

Conclusion
• Average comparison is a deceptive thing
  • even with error bars!
• There are more effects of this kind …
  • e.g. non-overlapping error bars are not statistically significant for a particular order of the expectations (or medians)
  • e.g. for normally distributed X, Y, Prob( X + δ X ≤ Y – δ Y | E X > E Y ) is up to 8%
Better always look at the complete histogram, and at least check maximum and minimum.

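In practice, the closing advice could look like the following sketch, which reports more than the average for each set of measurements. The helper name `summarize` and the choice of fields are illustrative assumptions; the data are the toy samples X and Y from the earlier example.

```python
from collections import Counter
from statistics import mean, median

def summarize(samples):
    # Report the extremes, the median, and a value histogram next to the mean.
    m = mean(samples)
    return {
        "min": min(samples),
        "max": max(samples),
        "mean": m,
        "median": median(samples),
        "abs_dev": mean([abs(v - m) for v in samples]),
        "histogram": sorted(Counter(samples).items()),  # (value, count) pairs
    }

X = [4, 4]
Y = [1, 5]

for name, samples in (("X", X), ("Y", Y)):
    print(name, summarize(samples))
```

Seeing the spread of Y next to its mean makes it clear at a glance that a nonlinear cost measure could change the ranking.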
Thank you!