Strong Inference (J. Platt)
• Science progresses by excluding alternative hypotheses.
• Experiments should be designed to disprove a hypothesis.
• A hypothesis that cannot be falsified doesn’t lead anywhere meaningful.
• Any conclusion that is not an exclusion is insecure.
Logical Tree
• Our conclusion X might be invalid if alternative hypothesis 1, alternative hypothesis 2, … alternative hypothesis n holds.
• We describe experiments to eliminate alternatives.
• We proceed along the branches not eliminated.
[Diagram: tree rooted at “Problem” with branches Alt 1 … Alt n; Alt 1 splits into Alt 1a and Alt 1b]
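The elimination procedure on this slide can be sketched as a small tree walk. This is only an illustration: the `Hypothesis` class, the `surviving_leaves` helper, and the choice of which branches get eliminated are assumptions made for the example, not part of the lecture.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hypothesis:
    """One node of the logical tree (hypothetical structure for illustration)."""
    name: str
    children: List["Hypothesis"] = field(default_factory=list)
    eliminated: bool = False  # set to True once an experiment excludes this branch


def surviving_leaves(node):
    """Return the names of leaf hypotheses on branches no experiment has eliminated."""
    if node.eliminated:
        return []  # an excluded branch takes its whole subtree with it
    if not node.children:
        return [node.name]
    leaves = []
    for child in node.children:
        leaves.extend(surviving_leaves(child))
    return leaves


# Tree shaped like the slide's diagram, truncated to two top-level alternatives.
tree = Hypothesis("Problem", [
    Hypothesis("Alt 1", [Hypothesis("Alt 1a"), Hypothesis("Alt 1b")]),
    Hypothesis("Alt 2"),
])

tree.children[1].eliminated = True              # assumed: an experiment excludes Alt 2
tree.children[0].children[0].eliminated = True  # assumed: a second experiment excludes Alt 1a

remaining = surviving_leaves(tree)
print(remaining)
```

Each experiment flips one `eliminated` flag, and the next experiment is designed only for the branches that `surviving_leaves` still reports.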
Multiple Hypotheses
• One can become emotionally “attached” to a single hypothesis.
• There is a temptation to demonstrate that it is right and to make the facts fit the theory.
• Multiple working hypotheses turn research into a competition among ideas rather than among personal agendas.
• This gets at the issue of bias.
“Support Activities” in Science
• Surveys and taxonomy
• Experimental infrastructure development
• Measurements and tables (e.g. file system usage studies)
• Theoretical/abstract models
Useful provided they contribute to the chain of discovery, but not as ends in themselves.
Applying Strong Inference to Computer Systems Research
This has not been our culture:
• “Mine is better than theirs,” backed by experiments that show this affirmatively (with no honest attempt to show otherwise)
• Non-hypotheses: statements that really can’t be shown to be false, such as “This system does what it was designed to do” (true by definition)
• Negative results are hard sells to publish
The issue is scientific effectiveness.
A Good Example
Wolman et al., “On the Scale and Performance of Cooperative Web Proxy Caching,” SOSP ’99
Question: Should multiple proxies cooperate in order to increase client populations, improve hit ratios, and reduce latency?
Logical tree
[Diagram: root “Coop web caching works” with branches “Decrease object latency, ideal case”, “Increase hit ratio, ideal case”, …, “Increase hit ratio, real case”]
Experiments
• Web traces at UW and Microsoft
• Simulation:
  • Infinite cache size (no capacity misses)
  • Single proxy (sees all information, no overhead)
  • 2 cases
    • Ideal caching – all documents, regardless of cacheability
    • Respecting cacheability
  • Upper bound on performance
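The simulation setup listed above (one proxy, infinite cache, ideal versus cacheability-respecting cases) can be sketched as a short trace replay. This is not the simulator from Wolman et al.; the `(url, cacheable)` record format, the function name `hit_ratio`, and the toy trace are all assumptions made for illustration.

```python
def hit_ratio(trace, respect_cacheability):
    """Replay a trace through one infinite cache and return the hit ratio.

    trace: iterable of (url, cacheable) pairs; this record format is assumed.
    """
    cache = set()
    hits = 0
    total = 0
    for url, cacheable in trace:
        total += 1
        if respect_cacheability and not cacheable:
            continue  # uncacheable objects always miss and are never stored
        if url in cache:
            hits += 1
        else:
            cache.add(url)  # infinite cache: nothing is ever evicted
    return hits / total if total else 0.0


# Made-up trace, not real UW or Microsoft data.
trace = [("a", True), ("b", False), ("a", True),
         ("b", False), ("c", True), ("a", True)]

ideal = hit_ratio(trace, respect_cacheability=False)  # every document treated as cacheable
real = hit_ratio(trace, respect_cacheability=True)    # uncacheable documents always miss
print(f"ideal={ideal:.3f} real={real:.3f}")
```

Because the cache never evicts and a single proxy sees every request, both numbers are upper bounds: any real cooperative scheme with finite caches and coordination overhead can only do worse on the same trace.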
[Figure annotations: “Beyond the knee, no significant improvement”; “Single proxy enough here”]
Discussion Next Time: Exercise in Strong Inference
• Pick the one paper that seems like the most important scientific advance and recast its experimental evaluation in terms of hypotheses and the experiments that exclude them (as a logical tree).
• Read Jain chapters 4 and 5 for next week.
Metrics
• Metrics are criteria to compare performance
• Quantifiable
• Measurable
• Relevant to goals
• Complete set would reflect all possible outcomes
Not Metrics
• Intuitive goals – “I know it when I see it” (like great art or the “right” behavior) – e.g. fairness
• Categories of metrics – e.g. performance. There are many precisely defined performance metrics.
• Analysis methods – cumulative distribution function (CDF) – ask: of what data?
• Presentation approach – pie chart – again ask: of what data?
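The point that a CDF is an analysis method rather than a metric can be made concrete: the function below only becomes meaningful once it is applied to samples of a specific metric. In this sketch the metric is assumed to be request latency in milliseconds, and the sample values are invented for illustration.

```python
def empirical_cdf(samples):
    """Return sorted (value, fraction of samples <= value) pairs."""
    ordered = sorted(samples)
    n = len(ordered)
    cdf = {}
    for i, x in enumerate(ordered, start=1):
        cdf[x] = i / n  # later duplicates overwrite, leaving the count of samples <= x
    return sorted(cdf.items())


# Hypothetical measurements of one metric: request latency in milliseconds.
latencies_ms = [12, 30, 12, 45, 30, 30, 80, 12]

cdf = empirical_cdf(latencies_ms)
for value, fraction in cdf:
    print(f"{fraction:.3f} of requests completed within {value} ms")
```

The same `empirical_cdf` would work on hit ratios, object sizes, or anything else, which is why the slide asks “of what data?”: the metric is the latency, not the curve.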
Finish Discussion: Sampling of Metrics from Literature*
*Send me links to your ppt so I can digest the material and put it on the lecture web page.