An Examination of Different Delivery Modes for Interactive IR Studies
Diane Kelly
School of Information and Library Science, University of North Carolina
Schloss Dagstuhl, IIR Seminar, March 03, 2009
Different Types of IIR Studies
• Standard Evaluation
• Usability Study
• Experiment
  • Requires manipulation of independent variable
  • Random assignment to condition
  • Lab and Field Experiments
• Log-based Analysis
• Information-Seeking (Online and Otherwise)
Other Types of IIR Studies
• Infrastructure Development (NOT an IIR or “User” study)
  • “Users” make relevance assessments
  • “Users” label objects for training data
Different Types of Online Studies
• Web Experiments
• Remote Usability Studies
  • Synchronous
  • Asynchronous
• Surveys (Questionnaire Mode)
• Correlation Designs
  • Often used to test psychometric properties of an instrument
• Interviews and Focus Groups
• Mechanical Turk and ESP
Major Issues to Consider
• Validity
  • Internal
  • External
• Reliability
• Sampling
• Control
• Sources of Variance
“Some say that psychological science is based on research with rats, the mentally disturbed, and college students. We study rats because they can be controlled, the disturbed because they need help, and college students because they are available.”
- Birnbaum, M. H. (1999). Testing critical properties of decision making on the Internet. Psychological Science, 10, 399-407, p. 399.
Some Good Things
• Broader range of more diverse participants
  • Age
  • Education
  • Race
  • Culture
  • Geography
  • Sex
  • …
• Targeted Recruitment
• Large samples (increased statistical power; see the sketch after this list)
• Science becomes more accessible to more people
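To make the statistical-power point concrete, here is a minimal sketch (not from the talk) using the statsmodels package; the effect size (Cohen's d = 0.3) and the group sizes are hypothetical, chosen only to contrast a lab-sized sample with a web-sized one.

```python
# Hypothetical illustration of how power grows with sample size for a
# two-group comparison (independent-samples t-test).
# The effect size and group sizes below are assumptions, not study data.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
effect_size = 0.3          # assumed small-to-medium effect (Cohen's d)
alpha = 0.05               # conventional significance level

for n_per_group in (30, 100, 300):   # lab-sized vs. web-sized samples
    power = analysis.power(effect_size=effect_size,
                           nobs1=n_per_group,
                           alpha=alpha,
                           ratio=1.0)
    print(f"n = {n_per_group:4d} per group -> power ≈ {power:.2f}")

# The reverse question: how many subjects per group are needed to reach
# 80% power for this assumed effect size?
n_needed = analysis.solve_power(effect_size=effect_size, alpha=alpha, power=0.8)
print(f"n per group needed for 80% power ≈ {n_needed:.0f}")
```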
More Good Things
• Experimental situation is less artificial (although not completely)
  • Familiarity and comfort with physical situation
  • No travel time
  • No coordination
  • No navigation
And Even More Good Things
• Volunteer Bias (?)
• Freedom to Quit
  • In general (condition-independent drop-out)
  • As an indicator (condition-dependent drop-out)
  • Computation of refusal rates (see the sketch after this list)
• Demand Effects
• Experimenter Effects (includes biases introduced during execution of the experiment, data transformation, analysis and interpretation)
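Condition-dependent drop-out can be quantified directly. A minimal sketch, with made-up counts, of how refusal/drop-out rates per condition might be computed and checked for a difference between conditions (using a chi-square test from scipy):

```python
# Hypothetical drop-out bookkeeping for a two-condition web experiment.
# Condition names and counts are invented for illustration only.
from scipy.stats import chi2_contingency

started   = {"baseline": 210, "experimental": 198}   # subjects who began
completed = {"baseline": 187, "experimental": 141}   # subjects who finished

for cond in started:
    dropped = started[cond] - completed[cond]
    rate = dropped / started[cond]
    print(f"{cond:>12}: {dropped} of {started[cond]} dropped out ({rate:.1%})")

# Condition-dependent drop-out: test whether completion depends on condition.
table = [[completed[c], started[c] - completed[c]] for c in started]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```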
And a Few More Good Things
• Lower costs
• Openness
• Replication
Some Bad Things
• Control Issues (Cheating and Fraud)
  • Multiple submissions (see the sketch after this list)
  • Faking data
  • Collaborating with others
  • Imitation of treatments
• Control Issues (Experimental Control)
  • Do subjects understand what they are supposed to be doing?
  • Multi-tasking
  • Interruptions
  • Consulting other sources
  • EWI
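One common control for multiple submissions is to flag records that share an identifier. A minimal sketch, with invented field names and records, of that kind of screening (hashing the identifier so raw values need not be stored):

```python
# Hypothetical screening for repeat submissions in a web experiment.
# The field names ("email", "ip") and records are invented for illustration.
import hashlib
from collections import defaultdict

submissions = [
    {"id": 1, "email": "a@example.org", "ip": "10.0.0.1"},
    {"id": 2, "email": "b@example.org", "ip": "10.0.0.2"},
    {"id": 3, "email": "a@example.org", "ip": "10.0.0.3"},  # same e-mail as id 1
]

def key(record):
    """Hash the identifying field so raw identifiers are not kept around."""
    raw = record["email"].strip().lower()
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

seen = defaultdict(list)
for rec in submissions:
    seen[key(rec)].append(rec["id"])

duplicates = {k: ids for k, ids in seen.items() if len(ids) > 1}
print("Possible multiple submissions:", list(duplicates.values()))
```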
More Bad Things
• So time is not such a great measure anymore (but maybe it isn’t really a good measure anyway)
• Self-selection Bias (Topical interests)
• No control over recruitment
• Attrition (too high is not good)
• Technical variance
• Communication challenges with subjects
• Difficult to explain deception
And a Few More …
• Experiment “Marketplace”
  • Encourages researcher laziness and carelessness?
• Requires more knowledge of experimental design and measurement
  • Measurement checks
  • Decisions to eliminate data (see the sketch after this list)
• Bad designs waste people’s time
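Measurement checks and data-elimination decisions can at least be made explicit and reproducible. A minimal sketch, with invented variable names and thresholds, of rule-based screening applied before analysis:

```python
# Hypothetical screening rules for web-experiment data, written down so the
# exclusion decisions are reproducible. Column names and thresholds are
# invented for illustration.
responses = [
    {"id": 1, "attention_check_passed": True,  "task_seconds": 412},
    {"id": 2, "attention_check_passed": False, "task_seconds": 380},
    {"id": 3, "attention_check_passed": True,  "task_seconds": 45},   # suspiciously fast
]

MIN_TASK_SECONDS = 60   # assumed lower bound for a plausible completion time

def keep(row):
    """Apply the exclusion rules; return True if the record is retained."""
    return row["attention_check_passed"] and row["task_seconds"] >= MIN_TASK_SECONDS

retained = [r for r in responses if keep(r)]
excluded = [r["id"] for r in responses if not keep(r)]
print(f"retained {len(retained)} of {len(responses)}; excluded ids: {excluded}")
```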
More Thoughts and Questions
• Some of the “bad” things add random error to the model (which exists even in lab experiments)
• But you gain more participants, so if this error stays proportionally constant relative to your sample size, is it really an issue?
• Random assignment to condition is CRITICAL (see the sketch after this list)
• Ultimately, do you get what you pay for?
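Since random assignment to condition is the one non-negotiable requirement for a true (web) experiment, here is a minimal sketch of blocked random assignment, which keeps conditions balanced even when subjects arrive one at a time and some drop out. The condition labels, block size, and seed are assumptions for illustration.

```python
# Hypothetical blocked randomization for assigning arriving web subjects
# to conditions. Condition labels and block size are assumptions.
import random

CONDITIONS = ["baseline", "experimental"]
BLOCK_SIZE = 4   # each block contains every condition an equal number of times

def block_generator(conditions, block_size, seed=None):
    """Yield condition assignments one at a time, shuffled within blocks."""
    rng = random.Random(seed)
    repeats = block_size // len(conditions)
    while True:
        block = conditions * repeats
        rng.shuffle(block)
        yield from block

assigner = block_generator(CONDITIONS, BLOCK_SIZE, seed=42)

# Simulate 10 subjects arriving over time.
for subject_id in range(1, 11):
    condition = next(assigner)
    print(f"subject {subject_id:2d} -> {condition}")
```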
And Finally …
• Many studies have found that results obtained from lab and web experiments are similar
• A Web mode does not excuse poor experimental design or instrumentation
• What are the implications with respect to reporting practices?
For Your Reference
• Birnbaum, M. H. (2000). Psychological Experiments on the Internet. London, UK: Academic Press.
• Alonso, O., Rose, D. E., & Stewart, B. (2008). Crowdsourcing for relevance evaluation. SIGIR Forum, 42(2), 9-15.
• http://psych.hanover.edu/research/exponnet.html