
Empirical Research Methods in Computer Science

Lecture 4, November 2, 2005. Noah Smith.



Presentation Transcript

  1. Empirical Research Methods in Computer Science Lecture 4 November 2, 2005 Noah Smith

  2. Today • Review bootstrap estimate of se (from homework). • Review sign and permutation tests for paired samples. • Lots of examples of hypothesis tests.

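  3. Recall ... • There is a true value of the statistic, but we don’t know it. • We can compute the sample statistic. • We know sample means are approximately normally distributed (as n gets big): $\bar{x} \sim \mathcal{N}(\mu, \sigma^2 / n)$, where μ and σ² are the mean and variance of the underlying distribution.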

  4. But we don’t know anything about the distribution of other sample statistics (medians, correlations, etc.)!

  5. Bootstrap world • Real world: unknown distribution F → observed random sample X → statistic of interest • Bootstrap world: empirical distribution → bootstrap random sample X* → bootstrap replication • From the replications: statistics about the estimate (e.g., standard error)

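  6. Bootstrap estimate of se • Run B bootstrap replicates, and compute the statistic each time: θ*[1], θ*[2], θ*[3], ..., θ*[B] • Mean of θ* across replications: $\bar{\theta}^* = \frac{1}{B} \sum_{b=1}^{B} \theta^*[b]$ • Sample standard deviation of θ* across replications: $\widehat{se}_B = \sqrt{\frac{1}{B-1} \sum_{b=1}^{B} \left(\theta^*[b] - \bar{\theta}^*\right)^2}$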
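
The bootstrap recipe on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the homework solution: the function name `bootstrap_se`, the default B = 1000, the fixed seed, and the data values are all assumptions made for the example.

```python
import random
import statistics


def bootstrap_se(sample, statistic, B=1000, seed=0):
    """Bootstrap estimate of the standard error of statistic(sample).

    B and seed are illustrative defaults, not values from the lecture.
    """
    rng = random.Random(seed)
    n = len(sample)
    replicates = []
    for _ in range(B):
        # Bootstrap sample X*: n draws with replacement from the observed sample X.
        resample = [rng.choice(sample) for _ in range(n)]
        # Bootstrap replication: the statistic recomputed on X*.
        replicates.append(statistic(resample))
    # Sample standard deviation (divides by B - 1) across the replications.
    return statistics.stdev(replicates)


# Made-up measurements, used only to show the call.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 3.0]
print("se of the median:", bootstrap_se(data, statistics.median))
print("se of the mean:  ", bootstrap_se(data, statistics.mean))
```

Passing the statistic as a function is the point of the method: the same loop works for medians, correlations, or any other statistic whose sampling distribution is not known in closed form.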

  7. Paired-Sample Design • pairs (x_i, y_i) • x ~ distribution F • y ~ distribution G • How do F and G differ?

  8. Sign Test • H0: F and G have the same median: median(F) - median(G) = 0 • Under H0, Pr(x > y) = 0.5 • N+, the number of pairs with x > y, follows a binomial distribution with p = 0.5 • Compute the binomial tail probability of the observed N+

  9. Sign Test • nonparametric (no assumptions about the data) • closed form (no random sampling)
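
A minimal sketch of the sign test in Python, using only the standard library. The function name `sign_test` is hypothetical, and two choices here are assumptions the slides do not specify: tied pairs are dropped, and the p-value is two-sided.

```python
from math import comb


def sign_test(xs, ys):
    """Sign test for paired samples.

    Assumptions not stated on the slides: ties (x == y) are dropped,
    and the returned p-value is two-sided.
    Returns (n_plus, n, p_value).
    """
    n_plus = sum(1 for x, y in zip(xs, ys) if x > y)
    n_minus = sum(1 for x, y in zip(xs, ys) if x < y)
    n = n_plus + n_minus
    k = max(n_plus, n_minus)
    # Exact upper tail Pr(N+ >= k) under Binomial(n, 0.5): closed form, no sampling.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return n_plus, n, min(1.0, 2 * tail)


# Made-up paired measurements, used only to show the call.
xs = [3.1, 2.9, 4.0, 3.8, 5.2, 4.4, 3.3, 4.1]
ys = [2.8, 3.0, 3.5, 3.1, 4.9, 4.0, 3.4, 3.6]
print(sign_test(xs, ys))
```

Because the tail is summed with exact integer arithmetic, the function stays accurate for sample sizes like n = 1000, where the p-value can be far below what a normal approximation resolves well.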

  10. Example: gzip speed • Build gzip with -O2 or with -O0. • On about 650 files out of 1000, gzip-O2 was faster. • Binomial distribution, p = 0.5, n = 1000 • p < 3 × 10^-24

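  11. Permutation Test • H0: F = G • Suppose the difference in sample means is d. • How likely is this difference (or a greater one) under H0? • For i = 1 to P • Randomly swap x_i and y_i within each pair (each swap with probability 0.5) • Compute the difference in sample means • The p-value is the fraction of the P permuted differences that are at least as extreme as d.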

  12. Permutation Test • nonparametric (no assumptions about the data) • randomized test
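
The randomized procedure from the permutation-test slides might be sketched as follows. The names `mean_diff` and `paired_permutation_test` are hypothetical, and the sketch assumes a two-sided comparison (absolute differences) and a fixed seed for repeatability; neither is specified in the lecture.

```python
import random


def mean_diff(pairs):
    """Difference in sample means for paired data: mean(x) - mean(y)."""
    total = 0.0
    for x, y in pairs:
        total += x - y
    return total / len(pairs)


def paired_permutation_test(xs, ys, P=1000, seed=0):
    """Paired permutation test of H0: F = G.

    Assumptions: two-sided (compares absolute differences); P and seed
    are illustrative defaults. Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)
    pairs = list(zip(xs, ys))
    observed = mean_diff(pairs)
    extreme = 0
    for _ in range(P):
        # Under H0 the two labels in a pair are exchangeable,
        # so swap each pair independently with probability 0.5.
        permuted = [(y, x) if rng.random() < 0.5 else (x, y) for x, y in pairs]
        if abs(mean_diff(permuted)) >= abs(observed):
            extreme += 1
    # Fraction of permuted differences at least as extreme as the observed one.
    return observed, extreme / P


# Made-up paired measurements, used only to show the call.
xs = [3.1, 2.9, 4.0, 3.8, 5.2, 4.4, 3.3, 4.1]
ys = [2.8, 3.0, 3.5, 3.1, 4.9, 4.0, 3.4, 3.6]
print(paired_permutation_test(xs, ys))
```

With P permutations the smallest nonzero p-value the test can report is 1/P, so a result of 0 should be read as "less than 1/P" rather than exactly zero.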

  13. Example: gzip speed • 1000 permutations: the difference of sample means under H0 is centered on 0. • The observed difference, -1579, is very extreme; p ≈ 0.

  14. Comparing speed is tricky! • It is very difficult to control for everything that could affect runtime. • Solution 1: do the best you can. • Solution 2: many runs, and then do ANOVA tests (or their nonparametric equivalents). “Is there more variance between conditions than within conditions?”
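
The quoted question is what the one-way ANOVA F statistic measures. The sketch below computes only that ratio; turning it into a p-value requires the F distribution, which the Python standard library does not provide, so that step is left out. The function name `f_statistic` and the runtimes are made up for illustration.

```python
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F: between-condition variance over within-condition variance.

    groups is a list of lists, one list of measurements per condition.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    # Between conditions: how far each condition mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within conditions: how far each run sits from its own condition mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Made-up runtimes for three hypothetical conditions, several runs each.
runs = [[2.9, 3.1, 3.0, 3.2], [2.7, 2.8, 2.6, 2.9], [3.0, 3.3, 3.1, 3.2]]
print(f_statistic(runs))
```

A large F says the conditions differ by more than run-to-run noise would explain; a value near 1 says the between-condition spread looks like ordinary noise.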

  15. Sampling method 1 • for r = 1 to 10 • for each file f • for each program p • time p on f

  16. Result (gzip first) student 2’s program faster than gzip!

  17. Result (student first) student 2’s program is slower than gzip!

  18. Sampling method 1 • for r = 1 to 10 • for each file f • for each program p • time p on f

  19. Order effects • Well-known in psychology. • What the subject does at time t will affect what she does at time t+1.

  20. Sampling method 2 • for r = 1 to 10 • for each program p • for each file f • time p on f
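
The two sampling methods differ only in loop nesting, which is easy to see by generating the measurement schedule for each. This is a sketch under assumptions: the function names and the `time_fn` callback are hypothetical, and no real timing is performed.

```python
from collections import defaultdict


def schedule_method_1(programs, files, runs=10):
    """Sampling method 1: programs alternate on every file (programs innermost)."""
    return [(r, f, p) for r in range(1, runs + 1) for f in files for p in programs]


def schedule_method_2(programs, files, runs=10):
    """Sampling method 2: each program runs over all files before the next starts."""
    return [(r, f, p) for r in range(1, runs + 1) for p in programs for f in files]


def measure(schedule, time_fn):
    """Collect timings in schedule order; time_fn(program, file) is supplied by the caller."""
    times = defaultdict(list)
    for _run, f, p in schedule:
        times[(p, f)].append(time_fn(p, f))
    return times


# Hypothetical program and file names, one run, to show the orderings.
print(schedule_method_1(["gzip", "student2"], ["a.txt", "b.txt"], runs=1))
print(schedule_method_2(["gzip", "student2"], ["a.txt", "b.txt"], runs=1))
```

In method 1 the second program always touches a file immediately after the first program has touched it, which is the setting where an order effect can show up; method 2 separates the two programs' accesses to the same file by a full pass over all files.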

  21. Result gzip wins

  22. Sign and Permutation Tests • all distribution pairs (F, G) • F ≠ G • median(F) ≠ median(G)

  23. Sign and Permutation Tests • all distribution pairs (F, G) • F ≠ G • median(F) ≠ median(G) • sign test rejects H0

  24. Sign and Permutation Tests • all distribution pairs (F, G) • F ≠ G • permutation test rejects H0 • median(F) ≠ median(G)

  25. Sign and Permutation Tests • all distribution pairs (F, G) • F ≠ G • permutation test rejects H0 • median(F) ≠ median(G) • sign test rejects H0

  26. There are other tests! • We have chosen two that are • nonparametric • easy to implement • Others include: • Wilcoxon Signed Rank Test • Kruskal-Wallis (nonparametric “ANOVA”)

  27. Pre-increment? • Conventional wisdom: “Better to use ++x than to use x++.” • Really, with a modern compiler?

  28. Two (toy) programs • Program 1: for(i = 0; i < (1 << 30); ++i) j = ++k; • Program 2: for(i = 0; i < (1 << 30); i++) j = k++; • Ran each 200 times (interleaved). • Mean runtimes were 2.835 and 2.735. • The difference was significant, with p well below .05.

  29. What? • j = ++k: leal -8(%ebp), %eax; incl (%eax); movl -8(%ebp), %eax • j = k++: movl -8(%ebp), %eax; leal -8(%ebp), %edx; incl (%edx) • %edx is not used anywhere else

  30. Conclusion • Compile with -O and the assembly code is identical!

  31. Why was this a dumb experiment?

  32. Pre-increment, take 2 • Take gzip source code. • Replace all post-increments with pre-increments, in places where semantics won’t change. • Run on 1000 files, 10 times each. • Compare average runtime by file.
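
The "compare average runtime by file" step reduces ten runs per file to one paired difference per file, which is the input a sign test or permutation test expects. A minimal sketch, with a hypothetical function name and made-up timings:

```python
from statistics import mean


def per_file_differences(runs_a, runs_b):
    """Average each file's runs, then pair the two variants by file.

    runs_a and runs_b map file name -> list of runtimes for that file.
    Returns {file: mean runtime under A minus mean runtime under B}.
    """
    return {f: mean(runs_a[f]) - mean(runs_b[f]) for f in runs_a}


# Made-up timings: two files, three runs each, for two hypothetical variants.
post_increment = {"a.txt": [1.02, 1.00, 1.01], "b.txt": [2.10, 2.12, 2.08]}
pre_increment = {"a.txt": [1.00, 0.99, 1.01], "b.txt": [2.11, 2.09, 2.10]}
diffs = per_file_differences(post_increment, pre_increment)
n_plus = sum(1 for d in diffs.values() if d > 0)
print(diffs, n_plus)
```

Averaging within a file first keeps the unit of analysis at the file level, so the 1000 files give 1000 pairs instead of 10,000 correlated measurements.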

  33. Sign test • p = 8.5 × 10^-8

  34. Permutation test

  35. Conclusion • Preincrementing is faster! • ... but what about -O? • sign test: p = 0.197 • permutation test: p = 0.672 • Preincrement matters only without an optimizing compiler.

  36. Joke.

  37. Your programs ... • 8 students had a working program both weeks. • 6 people changed their code. • 1 person changed nothing. • 1 person changed to -O3. • 3 people's programs were lossy in week 1. • Everyone's program was lossy in week 2!

  38. Your programs! • Was there an improvement on compression between the two versions? • H0: No. • Find sampling distribution of difference in means, using permutations.

  39. Student 1 (lossless week 1)

  40. Compression < 1?

  41. Student 2: worse compression

  42. Compression < 1?

  43. Student 3

  44. Student 4 (lossless week 1)

  45. Student 5 (lossless week 1)

  46. Student 6

  47. Student 7

  48. Student 8

  49. Homework Assignment 2 6 experiments: • Does your program compress text or images better? • What about variance of compression? • What about gzip’s compression? • Variance of gzip’s compression? • Was there a change in the compression of your program from week 1 to week 2? • In the runtime?

  50. Remainder of the course • 11/9: EDA • 11/16: Regression and learning • 11/23: Happy Thanksgiving! • 11/30: Statistical debugging • 12/7: Review, Q&A • Saturday 12/17, 2-5pm: Exam
