
Evaluation


Presentation Transcript


  1. Evaluation
  Eyal Ophir
  CS 376, 4/28/09

  2. Readings
  • Methodology Matters (McGrath, 1994)
  • Practical Guide to Controlled Experiments on the Web (Kohavi et al., 2007)

  3. Methodology Matters

  4. Methodology Matters
  • Methods for Research in the Behavioral and Social Sciences
  • Different methods have strengths and weaknesses
  • Tradeoffs among:
    • Generalizability
    • Precision
    • Realism
  • Credibility requires consistency and convergence across methods

  5. Study Design
  • Find baserates, correlations, or differences
  • Randomization of selection, assignment to conditions
  • Statistical significance
  • Validity (internal, statistical, construct, external)
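To make slide 5 concrete, here is a minimal sketch in Python of random assignment to two conditions followed by a test for a difference between them. The participant counts, switch times, and the choice of an independent-samples t-test are all invented for illustration, not taken from the readings.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(seed=42)

    # Randomly assign 40 hypothetical participants to two conditions.
    ids = rng.permutation(40)
    control_ids, treatment_ids = ids[:20], ids[20:]

    # Simulated game-switch times in seconds (a real study would measure these).
    control_times = rng.normal(loc=5.0, scale=1.0, size=20)
    treatment_times = rng.normal(loc=4.5, scale=1.0, size=20)

    # Test whether mean switch time differs between conditions.
    t_stat, p_value = stats.ttest_ind(control_times, treatment_times)
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}")  # "significant" if p < 0.05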

  6. Measures
  • Self-report
  • Trace measures
  • Observation (by a visible or hidden observer)
  • Archival records (public or private)

  7. Manipulation
  • Selection
  • Direct intervention
  • Induction (indirect intervention: confederates, deception)

  8. Case Study: Multitasking UI
  • Users play two simultaneous instantiations of a game
  • Does making the two instantiations visually different make it easier to switch back and forth?

  9. Case Study

  10. Case Study

  11. Case Study
  • Tradeoffs: Generalizability, Precision, Realism
  • Design: baserates, correlations, differences
  • Random selection, assignment
  • Validity: internal, statistical, construct, external
  • Measures: self-report, trace measures, observation, archival records
  • Manipulation: selection, intervention, induction

  12. General Question • Has social psychology resisted formal theory, and if so, why?

  13. Practical Guide to Controlled Experiments on the Web

  14. Web Experiments
  • OEC (Overall Evaluation Criterion): a single quantitative measure of the experiment's objective

  15. Web Experiments
  • Hypothesis testing and sample size (see the sketch below)
    • Confidence, power
  • Reducing the standard error:
    • Sufficiently large sample size
    • OEC with inherently low variability
    • Reduce variability by excluding irrelevant cases
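On sample size, a widely used rule of thumb (standard in the A/B-testing literature, though not spelled out on the slide) is that each variant needs roughly n = 16σ²/Δ² users for 95% confidence and about 80% power, where σ² is the variance of the OEC and Δ is the smallest change worth detecting. A sketch with invented numbers:

    # Rule-of-thumb sample size per variant (~95% confidence, ~80% power):
    #   n ≈ 16 * sigma^2 / delta^2
    def sample_size_per_variant(sigma_sq: float, delta: float) -> int:
        return int(16 * sigma_sq / delta ** 2)

    p = 0.05                 # hypothetical baseline conversion rate (the OEC)
    sigma_sq = p * (1 - p)   # variance of a 0/1 (Bernoulli) OEC
    delta = 0.005            # detect an absolute change of half a point

    print(sample_size_per_variant(sigma_sq, delta))  # 30400 users per variant

Note how the required n grows as the effect of interest Δ shrinks, which is why the slide stresses low-variability OECs and excluding irrelevant cases.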

  16. Web Experiments
  • Extensions for Online Experiments:
    • Treatment ramp-up
    • Automation
    • Software migration

  17. Web Experiments
  • Limitations of web experiments:
    • No explanation of mechanism
    • Focus on short-term effects
    • Primacy/newness effects
    • Must implement treatments

  18. Web Experiments
  • Implementation
    • Randomization:
      • Pseudorandom with caching
      • Hash and partition (see the sketch below)
    • Assignment:
      • Traffic splitting
      • Server-side
      • Client-side
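One common implementation of "hash and partition" (the code below is an illustrative sketch, not taken from the paper) hashes a stable user ID into a fixed number of buckets, so each user lands in the same variant on every visit and cached pages stay consistent. Folding in the earlier ramp-up idea, the treatment's share of traffic is simply the fraction of buckets assigned to it:

    import hashlib

    NUM_BUCKETS = 1000

    def bucket(user_id: str, experiment: str) -> int:
        # Deterministically map a user to one of NUM_BUCKETS buckets.
        # Salting with the experiment name keeps assignments independent
        # across experiments.
        digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % NUM_BUCKETS

    def variant(user_id: str, experiment: str, treatment_pct: float) -> str:
        # Ramp-up means raising treatment_pct over time (e.g., 0.01 -> 0.05
        # -> 0.50); users already in treatment stay there, because the
        # bucketing is deterministic.
        if bucket(user_id, experiment) < NUM_BUCKETS * treatment_pct:
            return "treatment"
        return "control"

    print(variant("user-123", "two-game-ui", treatment_pct=0.05))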

  19. Lessons Learned (i.e., tips for the researcher)
  • Analysis:
    • Mine the data
    • Time matters
    • Multi-factor experiments

  20. Lessons Learned
  • Trust and Execution:
    • Run A/A tests (test your system; see the sketch below)
    • Ramp-up and abort
    • Correct sample size
    • Assign 50% to treatment
    • Beware day-of-week effects
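An A/A test splits users exactly as a real experiment would but shows both groups the identical experience, so any "significant" difference is a false positive; at α = 0.05, about 5% of A/A tests should come out significant, and a rate far from that suggests a broken assignment or logging pipeline. A small simulation with invented parameters:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(seed=0)

    # Simulate 1000 A/A tests: both groups draw from the same distribution.
    false_positives = 0
    for _ in range(1000):
        a = rng.normal(loc=1.0, scale=0.5, size=500)
        b = rng.normal(loc=1.0, scale=0.5, size=500)
        _, p = stats.ttest_ind(a, b)
        if p < 0.05:
            false_positives += 1

    # A healthy experimentation system should land near 5%.
    print(f"false-positive rate: {false_positives / 1000:.1%}")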

  21. Lessons Learned
  • Culture and Business:
    • Agree on OEC upfront
    • Beware “harmless” features
    • Weigh performance vs. maintenance cost
    • Data-driven (vs. opinion-driven) culture

  22. Extended Case Study
  • Assume the game UI from the first case study belongs to an actual gaming site
  • The site wants to promote multiple simultaneous games between users, but users complain that managing multiple games is difficult
  • Design a web-based study, informed by the readings, to test the new design

  23. Case Study
  • OEC
  • Sample size, reducing error
  • Ramp-up, automation
  • Mechanism explanation, short- vs. long-term effects, primacy/newness
  • Randomization/assignment
  • Mine the data, multi-factor experiments
  • A/A tests, sample size, day-of-week effects

  24. Data-Oriented Culture
  • Pros? Cons?
  • How can we best use user tests to inform design and innovation?
  • Trade-offs of experimentation vs. intuition
  • Why the OEC? What are good measures for non-commerce sites?
  • Do online tests maximize all of McGrath’s parameters?
