Evaluation
Eyal Ophir
CS 376, 4/28/09
Readings
• Methodology Matters (McGrath, 1994)
• Practical Guide to Controlled Experiments on the Web (Kohavi et al., 2007)
Methodology Matters
• Methods for research in the behavioral and social sciences
• Different methods have strengths and weaknesses
• Tradeoff between:
  • Generalizability
  • Precision
  • Realism
• Credibility requires consistency and convergence across methods
Study Design
• Find baserates, correlations, or differences
• Randomize both selection and assignment to conditions
• Statistical significance (see the sketch below)
• Validity: internal, statistical, construct, external
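To make "finding differences" and "statistical significance" concrete, here is a minimal sketch of comparing two conditions with a two-sample t-test. The task times and the 0.05 threshold are illustrative assumptions, not data from the readings.

```python
# Sketch: testing for a difference between two conditions.
# The task-completion times below are made-up illustrative data.
from scipy import stats

control   = [41.2, 38.5, 44.1, 39.8, 42.7, 40.3, 43.5, 37.9]  # seconds
treatment = [36.4, 34.8, 38.9, 35.1, 37.2, 33.9, 36.7, 35.5]

t_stat, p_value = stats.ttest_ind(control, treatment)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# Conventional rule: reject the null hypothesis of "no difference"
# when p < 0.05. Significance alone says nothing about effect size.
if p_value < 0.05:
    print("Difference is statistically significant at the 0.05 level.")
```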
Measures
• Self-report
• Trace measures
• Observation (by a visible or hidden observer)
• Archival records (public or private)
Manipulation
• Selection
• Direct intervention
• Induction (indirect intervention: confederates, deception)
Case Study: Multitasking UI
• Users play two simultaneous instantiations of a game
• Does making the two instantiations visually different make it easier to switch back and forth?
Case Study
• Tradeoffs: generalizability, precision, realism
• Design: baserates, correlations, differences
• Random selection and assignment
• Validity: internal, statistical, construct, external
• Measures: self-report, trace measures, observation, archival records
• Manipulation: selection, intervention, induction
General Question
• Has social psychology resisted formal theory, and if so, why?
Web Experiments
• OEC: Overall Evaluation Criterion, the single quantitative metric the experiment is designed to improve
Web Experiments
• Hypothesis testing and sample size (see the sketch below)
• Confidence and power
• Reducing the standard error:
  • Sufficiently large sample size
  • An OEC with inherently low variability
  • Reduce variability by excluding irrelevant cases
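A back-of-the-envelope sketch of the sample-size logic, using the standard two-sample normal approximation (roughly 16σ²/Δ² per group at 95% confidence and 80% power). The σ and Δ values are illustrative assumptions, not figures from the paper.

```python
# Sketch: users needed per group to detect a difference delta in an OEC
# with standard deviation sigma, via the two-sample normal approximation.
from scipy.stats import norm

def sample_size_per_group(sigma, delta, alpha=0.05, power=0.80):
    """n per group ~ 2 * (z_{1-alpha/2} + z_{power})^2 * (sigma/delta)^2."""
    z_alpha = norm.ppf(1 - alpha / 2)  # two-sided confidence level
    z_beta = norm.ppf(power)           # statistical power
    return 2 * (z_alpha + z_beta) ** 2 * (sigma / delta) ** 2

# Illustrative numbers: OEC std dev 2.0, minimum effect of interest 0.1.
n = sample_size_per_group(sigma=2.0, delta=0.1)
print(f"~{n:,.0f} users per group")
```

Because the required n scales with σ², picking a low-variance OEC or excluding users who were never exposed to the change directly shrinks the experiment.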
Web Experiments
• Extensions for online experiments:
  • Treatment ramp-up
  • Automation
  • Software migration
Web Experiments
• Limitations of web experiments:
  • No explanation of mechanism (they show what works, not why)
  • Focus on short-term effects
  • Primacy/newness effects
  • Every treatment must actually be implemented before it can be tested
Web Experiments
• Implementation
  • Randomization:
    • Pseudorandom with caching
    • Hash and partition (see the sketch below)
  • Assignment:
    • Traffic splitting
    • Server-side
    • Client-side
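A minimal sketch of hash-and-partition assignment in the spirit of the paper's description: hash a stable user ID into a bucket and map bucket ranges to variants. The ID format, bucket count, and 50/50 split are illustrative assumptions.

```python
# Sketch: deterministic hash-and-partition assignment. A stable user ID
# always maps to the same bucket, so users see a consistent variant
# without any stored assignment state.
import hashlib

NUM_BUCKETS = 1000  # illustrative granularity

def bucket(user_id: str, experiment: str) -> int:
    """Hash (experiment, user) so concurrent experiments stay uncorrelated."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % NUM_BUCKETS

def assign(user_id: str, experiment: str, treatment_pct: float = 50.0) -> str:
    """Map the first treatment_pct% of buckets to treatment, rest to control."""
    cutoff = NUM_BUCKETS * treatment_pct / 100
    return "treatment" if bucket(user_id, experiment) < cutoff else "control"

print(assign("user-42", "multi-game-ui"))  # stable across requests
```

Keying the hash on the experiment name as well as the user ID keeps assignments in one experiment independent of assignments in another; ramp-up is just raising treatment_pct.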
Lessons Learned (tips for the researcher)
• Analysis:
  • Mine the data
  • Time matters
  • Multi-factor experiments (see the sketch below)
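As an illustration of analyzing a multi-factor experiment, here is a sketch of a two-way ANOVA on a simulated 2x2 design, estimating both main effects and their interaction from a single experiment. The factors, effect size, and data are all made up for the example.

```python
# Sketch: two-way ANOVA on a simulated 2x2 multi-factor experiment.
# Factors ("layout", "color") and the OEC values are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "layout": rng.choice(["old", "new"], n),    # hypothetical factor 1
    "color":  rng.choice(["blue", "green"], n), # hypothetical factor 2
})
# Simulated OEC: a real layout effect plus noise, no true interaction.
df["oec"] = 1.0 + 0.3 * (df["layout"] == "new") + rng.normal(0, 1, n)

model = smf.ols("oec ~ C(layout) * C(color)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # main effects + interaction term
```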
Lessons Learned
• Trust and execution:
  • Run A/A tests to validate your experimentation system (see the sketch below)
  • Ramp up, and abort if metrics degrade
  • Use the correct sample size
  • Assign 50% of traffic to treatment
  • Beware day-of-week effects
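An A/A test runs the experiment machinery with an identical "treatment"; a healthy system should report a significant difference in only about 5% of runs at α = 0.05. A simulated illustration with synthetic data (not the paper's):

```python
# Sketch: simulated A/A tests. Both "variants" draw from the same
# distribution, so ~5% of runs should be falsely significant at alpha=0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, runs, n = 0.05, 1000, 500

false_positives = 0
for _ in range(runs):
    a = rng.normal(10, 2, n)  # control
    b = rng.normal(10, 2, n)  # identical "treatment"
    _, p = stats.ttest_ind(a, b)
    false_positives += p < alpha

print(f"significant in {false_positives / runs:.1%} of A/A runs (expect ~5%)")
# A rate far from alpha suggests broken randomization or flawed analysis.
```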
Lessons Learned
• Culture and business:
  • Agree on the OEC up front
  • Beware "harmless" features
  • Weigh performance against maintenance cost
  • Build a data-driven (rather than opinion-driven) culture
Extended Case Study
• Assume the game UI from the first case study belongs to an actual gaming site
• The site wants to promote multiple simultaneous games between users, but users complain that managing multiple games is difficult
• Design a web-based study, informed by the reading, to test the new design
Case Study
• OEC
• Sample size and reducing error
• Ramp-up and automation
• Explanation of mechanism; short- vs. long-term effects; primacy/newness
• Randomization and assignment
• Mining the data; multi-factor experiments
• A/A tests, sample size, day-of-week effects
Data-Oriented Culture
• Pros? Cons?
• How can we best use user tests to inform design and innovation?
• Trade-offs of experimentation vs. intuition
• Why the OEC? What are good measures for non-commerce sites?
• Do online tests maximize all of McGrath's parameters?