
Don't Compare Averages



Presentation Transcript


  1. Don't Compare Averages WEA 2005, May 10 – May 13, Santorini Island, Greece. Holger Bast, Max-Planck-Institut für Informatik (MPII), Saarbrücken, Germany. Joint work with Ingmar Weber.

  2. Two famous quotes • "There are three kinds of lies: lies, damn lies, and statistics" (Benjamin Disraeli, 1804–1881; reported by Mark Twain) • "Never believe any statistics you haven't forged yourself" (Winston Churchill, 1874–1965)

  3. A typical figure • X-axis: input size • Y-axis: some cost measure • Each point represents an average over a number of iterations • [plot comparing two curves, "Theirs" vs. "Ours"]

  4. Changing the cost measure … • … by a monotone function, say from c to 2^c • This is from authentic data! • [side-by-side plots of the same averages under cost c and under cost 2^c]

  5. No deep mathematics here • Even for strictly monotone f • certainly E f(X) ≠ f(E X) in general • but also E X ≤ E Y does not in general imply E f(X) ≤ E f(Y) • Example, with f(x) = 2^x • X: 4, 4 → average 4 • Y: 1, 5 → average 3 • 2^X: 2^4, 2^4 → average 16 • 2^Y: 2^1, 2^5 → average 17
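The example on slide 5 can be checked in a few lines of Python (assuming, as the slide suggests, that the averages are plain unweighted means):

```python
# Reproduces slide 5: E X > E Y, and yet E 2**X < E 2**Y.
def mean(values):
    return sum(values) / len(values)

def f(x):          # a strictly monotone function
    return 2 ** x

X = [4, 4]         # average 4
Y = [1, 5]         # average 3

assert mean(X) > mean(Y)                                  # 4 > 3
assert mean([f(x) for x in X]) < mean([f(y) for y in Y])  # 16 < 17: order reversed
```

So a strictly monotone change of cost measure can flip which method "wins on average".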

  6. Examples of multiple cost measures • Language modeling • for a given probability distribution p1, …, pn • find a distribution q1, …, qn from a constrained class that • minimizes the cross-entropy Σ pi log(pi/qi) • or minimizes the perplexity Π (pi/qi)^pi = 2^cross-entropy • Algorithm A uses algorithm B as a subroutine • B produces a result of average quality q • the complexity of A depends on, say, q²
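The two language-modeling cost measures on slide 6 are related by exactly such a monotone map. A small sketch (the distributions p and q below are made-up values, not from the talk):

```python
import math

p = [0.5, 0.3, 0.2]   # hypothetical target distribution
q = [0.4, 0.4, 0.2]   # hypothetical model distribution

# cross-entropy as written on the slide, with base-2 logarithms
cross_entropy = sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))

# perplexity = product of (pi/qi)**pi, which equals 2**cross_entropy
perplexity = math.prod((pi / qi) ** pi for pi, qi in zip(p, q))

assert abs(perplexity - 2 ** cross_entropy) < 1e-12
```

Since perplexity is a strictly monotone function of cross-entropy, the two measures rank any single pair of models identically; the talk's point is that their *averages* over several test sets may still rank the models differently.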

  7. Can this also happen with error bars? • error bars for c don't overlap, yet reversal for f(c)? • Yes, this can also happen! • [plots of the error bars under c and under f(c)]

  8. Can this also happen with error bars? • complete reversal with error bars? • [build-up of the error-bar figure]

  9. Can this also happen with error bars? • complete reversal with error bars? • [build-up of the error-bar figure, continued]

  10. Can this also happen with error bars? • complete reversal with error bars, i.e., E f(Y) – δ f(Y) ≥ E f(X) + δ f(X) although E X – δ X ≥ E Y + δ Y? • here δ Z = E |Z – E Z| is the absolute deviation, • which is at most σ Z = sqrt( E (Z – E Z)² ), the standard deviation
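The inequality δ Z ≤ σ Z quoted on slide 10 follows from Jensen's inequality; a quick numeric check (the sample values are arbitrary, and the "distribution" is the empirical one over the sample):

```python
import math

def mean(values):
    return sum(values) / len(values)

def abs_dev(values):   # delta Z = E |Z - E Z|
    m = mean(values)
    return mean([abs(v - m) for v in values])

def std_dev(values):   # sigma Z = sqrt(E (Z - E Z)**2)
    m = mean(values)
    return math.sqrt(mean([(v - m) ** 2 for v in values]))

data = [1, 5, 4, 4, 10, 2]
assert abs_dev(data) <= std_dev(data)   # absolute deviation never exceeds std dev
```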

  11. Can this also happen with error bars? • complete reversal with error bars? • Theorem: a complete reversal can never happen! • if E X – δ X ≥ E Y + δ Y, • then E f(X) + δ f(X) ≥ E f(Y) – δ f(Y)

  12. Can this also happen with error bars? • if E X – δ X ≥ E Y + δ Y, then E f(X) + δ f(X) ≥ E f(Y) – δ f(Y) • if only one of the four δ is dropped, the theorem no longer holds in general
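A randomized sanity check of the theorem on slides 11 and 12: whenever the c-bars are separated with X on top, the f-bars are never completely reversed. The sampling setup below (uniform draws, f(x) = 2^x, empirical means and deviations) is an illustrative assumption, not the talk's experiment:

```python
import random

random.seed(0)

def mean(values):
    return sum(values) / len(values)

def abs_dev(values):
    m = mean(values)
    return mean([abs(v - m) for v in values])

def f(x):                      # any strictly monotone map will do
    return 2.0 ** x

separated = 0
for _ in range(10000):
    X = [random.uniform(0, 6) for _ in range(3)]
    Y = [random.uniform(0, 6) for _ in range(3)]
    if mean(X) - abs_dev(X) >= mean(Y) + abs_dev(Y):   # premise: bars separated
        separated += 1
        fX = [f(x) for x in X]
        fY = [f(y) for y in Y]
        # a complete reversal would mean E f(Y) - delta >= E f(X) + delta
        assert mean(fY) - abs_dev(fY) <= mean(fX) + abs_dev(fX)

print(separated, "separated cases checked, no complete reversal found")
```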

  13. Our first proof

  14. The canonical proof • The medians M X and M Y do commute with f … • Prob( X ≤ M X ) = ½ = Prob( f(X) ≤ f(M X) ), • hence f(M X) = M f(X) and f(M Y) = M f(Y) • … and hence cannot reverse their order: • M X ≤ M Y → f(M X) ≤ f(M Y), because f is monotone → M f(X) ≤ M f(Y), because M and f commute • Expectation and median are related by • |E X – M X| ≤ δ X = E |X – E X| and |E Y – M Y| ≤ δ Y = E |Y – E Y| • nothing new, but hardly any computer scientist seems to know it
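The key commuting step on slide 14 is easy to see on a concrete sample. A minimal check (my own toy data; note that `statistics.median` interpolates for even-length samples, so the exact commuting below uses an odd-length sample, where the median is an actual data point):

```python
import statistics

def f(x):                      # strictly monotone
    return 2 ** x

X = [1, 2, 9]                  # odd length: median is the middle data point, 2
assert statistics.median(X) == 2

# f(M X) = M f(X): applying f and taking the median commute
assert f(statistics.median(X)) == statistics.median([f(x) for x in X])  # both 4
```

This is exactly why medians, unlike means, can never reverse their order under a monotone change of cost measure.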

  15. The canonical proof • now assume a complete reversal would happen: E f(Y) – δ f(Y) ≥ E f(X) + δ f(X) although E X – δ X ≥ E Y + δ Y • then M Y ≤ M X, yet M f(Y) > M f(X) • this contradicts the fact that the medians cannot reverse their order

  16. Conclusion • Average comparison is a deceptive thing • even with error bars! • There are more effects of this kind … • e.g. non-overlapping error bars are not statistically significant for a particular order of the expectations (or medians) • e.g. for normally distributed X, Y, Prob( X + δ X ≤ Y – δ Y | E X > E Y ) is up to 8% • Better: always look at the complete histogram, and at least check maximum and minimum
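The "up to 8%" effect on the conclusion slide can be explored by simulation. The sketch below is an assumption-laden setup of my own (two near-identical normal populations, small samples, empirical means and absolute deviations); the exact probability depends on the sample size and on how the deviations are estimated, so no particular percentage is claimed:

```python
import random

random.seed(1)

def mean(values):
    return sum(values) / len(values)

def abs_dev(values):
    m = mean(values)
    return mean([abs(v - m) for v in values])

n, trials, wrong_order = 3, 20000, 0
for _ in range(trials):
    xs = [random.gauss(0.01, 1.0) for _ in range(n)]   # E X slightly above E Y
    ys = [random.gauss(0.00, 1.0) for _ in range(n)]
    if mean(xs) + abs_dev(xs) <= mean(ys) - abs_dev(ys):
        wrong_order += 1

# fraction of runs with non-overlapping bars in the *wrong* order,
# even though the true expectation of X exceeds that of Y
print(wrong_order / trials)
```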

  17. Conclusion (same conclusions as the previous slide) • Ευχαριστώ! (Thank you!)
