Learn about basic theory, methods, and planning in empirical research, along with data collection techniques and measurement scales. Understand the importance of scientific knowledge and validity measures in experimentation.
Methods and Techniques of investigating user behavior
Introduction - why M & T?
Gerrit C. van der Veer, gerrit@cs.vu.nl
aims • theory • methods • planning • presentation
Methods and techniques for empirical research
Goals for this course:
• understand why
• understand basic theory
• know basic methods and techniques
• know how to plan your research
• know when to ask for expert advice
Goals of empirical research: an example
Cultural utterances of Martians - artifacts we found. How do we develop a science of these? The goals, in sequence:
• description (variables, quantification, measuring relations)
• prediction (based on knowledge of relations)
• explanation (causal models)
• manipulation (applying control based on known causality)
Characteristics of scientific knowledge
unambiguous
• operational definitions for observable phenomena
• measurement techniques
• scientific language: concepts and relations (especially for unobservable phenomena)
repeatable studies
• description of procedures, population, and samples of observations
• reliability (of measurements, observers, raters, tests)
controlled for disturbing phenomena
• design of the study / experiment (sequence, balancing, control groups)
• sample
• models for measuring “other” variables, and statistical control
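Reliability of raters can be put into a number. As a minimal sketch (the choice of technique is ours, not the slides'), Cohen's kappa compares how often two raters assign the same category to the same observation with the agreement expected by chance alone. The behavior codes "on" and "off" (on-task / off-task) are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same observations
    into nominal categories: 1 = perfect agreement, 0 = chance level."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty lists of codes")
    n = len(rater_a)
    # Proportion of observations on which both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: per category, product of both raters' marginal proportions.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        # Both raters used one and the same category throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical codes from two observers watching the same four episodes.
    observer_1 = ["on", "on", "off", "off"]
    observer_2 = ["on", "off", "off", "off"]
    print(cohens_kappa(observer_1, observer_2))
```

Here the raters agree on 3 of 4 episodes, but half of that agreement would be expected by chance, so kappa is 0.5 rather than 0.75.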
Research methods
observation in nature
• case studies (context of use, community of practice; pro? con?)
field study and survey
• systematic observation / interview / focus group
• focused on some phenomena
• influence of the participant observer
correlation study
• tests / questionnaires / behavior measurements
• focus on relations between variables
• does not establish causality (e.g. malaria, literally “bad air”: the disease correlates with swamp air but is caused by a mosquito-borne parasite)
Research methods (continued)
observation in nature
field study and survey
correlation study
experiment
• manipulation of candidate causes
• measurement of effects
• control of possible other causes
Data collection
The choice of technique is based on:
• sensitivity to the phenomena
• reliability and objectivity
• validity
  • internal - measures the intended concept
  • external - representative of the population of phenomena, context & situation
• practicality (effort, time, availability)
Data collection
Types of techniques:
• observation of behavior
  • registration of … behavior, physiological data
  • think aloud during processes / activities (pro? … con?)
  • video with retrospective protocols
• interview
  • free … structured
• objective test
• questionnaires
  • written interview … subjective rating scales
• unobtrusive measurements (e.g. logs)
Scoring
Translation of data into units that allow modeling and analysis: numbers or defined categories.
This requires interpretation rules that are part of the operational definition:
• relative (frequency per …) / absolute (reaction time)
• duration (sometimes relative to …)
• intensity / strength
• category of behavior / option chosen (e.g. marital status)
Complex phenomena:
• patterns, spectrum, “half-life”
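The scoring rules above can be sketched as small functions that turn raw observations into numbers. This is a hypothetical example: the function names, the use of timestamps in seconds, and the (start, end) interval format are assumptions, and a real operational definition would also fix what counts as an event.

```python
def reaction_time(stimulus_t, response_t):
    """Absolute score: seconds between stimulus onset and response."""
    return response_t - stimulus_t


def frequency_per_minute(event_times, session_seconds):
    """Relative score: number of observed events per minute of observation."""
    return len(event_times) / (session_seconds / 60)


def total_duration(intervals):
    """Duration score: summed length of (start, end) intervals, in seconds."""
    return sum(end - start for start, end in intervals)


def relative_duration(intervals, session_seconds):
    """Duration relative to the session: fraction of time spent in the behavior."""
    return total_duration(intervals) / session_seconds


if __name__ == "__main__":
    # Hypothetical 2-minute session: five help requests, two periods of idling.
    print(frequency_per_minute([3, 10, 45, 70, 100], session_seconds=120))
    print(relative_duration([(0, 5), (10, 12.5)], session_seconds=120))
```

Five events in a 120-second session score as 2.5 events per minute, which stays comparable across sessions of different length, unlike the raw count.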
Scales of measurement
(discussed in the Bachelor course “Toegepaste Statistiek”, Applied Statistics)
• ratio scale: 1-dimensional, absolute (comparison with a standard unit), true zero (0 means none of the quantity), cardinal scale; e.g. time on the 100 m
• interval scale: equal units but no absolute zero; e.g. intelligence quotient (IQ)
• ordinal scale: comparison between observed data (ties possible), so no standard unit; e.g. results of a sports competition
• nominal “scale”: verbal labels or number labels, e.g. 1 = single; 2 = married; 3 = divorced; 4 = widowed; 5 = living together
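The scale determines which summary statistics are meaningful. A minimal sketch following the common textbook convention (mode for nominal, median for ordinal, mean for interval and ratio data); the function names and the scale labels are assumptions for illustration.

```python
import statistics


def typical_value(values, scale):
    """Central tendency that is meaningful on the given measurement scale."""
    if scale == "nominal":
        # Only equality of labels is meaningful: report the most frequent label.
        return statistics.mode(values)
    if scale == "ordinal":
        # Order is meaningful, distances are not: report an observed middle rank.
        return statistics.median_low(values)
    if scale in ("interval", "ratio"):
        # Equal units make averaging meaningful.
        return statistics.mean(values)
    raise ValueError(f"unknown scale: {scale!r}")


def ratio_is_meaningful(scale):
    """Statements like 'twice as much' need a true zero, i.e. a ratio scale."""
    return scale == "ratio"


if __name__ == "__main__":
    marital_status = [1, 2, 2, 5]  # codes from the nominal example above
    print(typical_value(marital_status, "nominal"))
    print(ratio_is_meaningful("interval"))  # an IQ of 140 is not 'twice' 70
```

The mean of the marital-status codes (2.5) would be computable but meaningless, which is exactly why the scale has to be known before analysis.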
Validity of measures
Validity: the extent to which one observes and measures what is intended.
• predictive validity - predictive power for other behavior (e.g. school exam score used for job selection)
• content validity - representative of the intended domain (e.g. the items in an intelligence test)
• concurrent validity - consistency with other types of measures of the same concept (e.g. self-report vs. teacher rating)
• concept / construct validity - the extent to which the measure captures the intended theoretical concept (e.g. multiple-choice math questions to measure mathematical ability)
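Concurrent validity is often expressed as a correlation between two measures of the same concept. A minimal sketch of Pearson's r; the self-report and teacher-rating scores are made up for illustration and say nothing about any real study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical scores for six pupils on the same concept.
    self_report = [3, 4, 2, 5, 4, 1]
    teacher_rating = [2, 4, 2, 5, 3, 2]
    print(round(pearson_r(self_report, teacher_rating), 2))
```

A high r supports concurrent validity, but by itself it cannot show that either measure captures the intended construct.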
Experiment: definition
Objective observation of effects that are produced in a controlled situation, where one or more factors are manipulated and others are kept constant (Zimny 1961)
Terminology:
• subject
• experimenter
• independent variables (antecedent conditions, treatments)
• dependent variables (effects)
• disturbing / secondary / potential variables
e.g. effect of pre-knowledge (p) on learning speed (l), with motivation (m) as a secondary variable:
• p → m → l: intermediating variable
• p → l and m → l: confounding variable
• m → p and m → l: artifact of selection
Categories of secondary / confounding variables
1. person variables
• capabilities
• motivation
• age
• educational background
2. sequence variables
• fatigue / boredom / learning
• development of the subject during a (longitudinal) study, in relation to the experiment
3. situation variables
• environment: sound / temperature / time of day
• experimenter effect on the subject / experimenter observation bias
• task effect: difficulty / modality of stimulus or instruction
Experimental design - how to cope with secondary variables
The main decision depends on the type of the expected or known main confounding variables:
• person variables → repeated measures design: each person is measured in all conditions
  • needs balancing for possible sequence effects
• sequence variables → multiple groups design: each person is in a single group and participates in one condition only
  • needs matched groups (keeps person variables under control) or
  • randomized groups (easier, less controlled)
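Both designs need an assignment procedure. A minimal sketch, assuming two functions of our own naming: randomized groups for the multiple groups design, and AB/BA counterbalancing for the repeated measures design. Matched groups, which need pre-test scores to pair participants, are not covered here.

```python
import random


def randomized_groups(participants, conditions, seed=None):
    """Multiple groups design: each participant gets exactly one condition.
    Shuffling and then dealing round-robin keeps group sizes within one."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {c: [] for c in conditions}
    for i, person in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(person)
    return groups


def counterbalanced_orders(participants, conditions, seed=None):
    """Repeated measures design: everyone gets all conditions; a random half
    in the given order, the other half in reverse order (AB / BA)."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    forward = tuple(conditions)
    backward = forward[::-1]
    return {p: (forward if i % 2 == 0 else backward) for i, p in enumerate(shuffled)}


if __name__ == "__main__":
    people = [f"P{n}" for n in range(1, 9)]  # hypothetical participant ids
    print(randomized_groups(people, ["A", "B"], seed=42))
    print(counterbalanced_orders(people, ["A", "B"], seed=42))
```

Reversing the order balances position fully only for two conditions; with more conditions a Latin square is the usual refinement.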
Factorial design
In practice we often need a combination of the previous designs:
• factors between subjects, to control for unwanted sequence effects
• factors within subjects (repeated measurements), to control for person variables
and: … we still need to control for situation variables, by:
• keeping these constant (if possible in field experiments)
• measuring them and applying statistical control
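One simple form of statistical control for a measured situation variable is to remove its linear effect from the dependent variable. A minimal sketch, assuming room temperature as the measured variable; the slope here is pooled over all observations, whereas a full ANCOVA estimates it within conditions so that the treatment effect itself is not adjusted away.

```python
def adjust_for_covariate(dv, covariate):
    """Subtract the least-squares linear effect of a measured covariate
    from each dependent-variable score (pooled slope, simplified ANCOVA)."""
    n = len(dv)
    if n != len(covariate) or n < 2:
        raise ValueError("need paired scores for at least two observations")
    mean_dv = sum(dv) / n
    mean_c = sum(covariate) / n
    scc = sum((c - mean_c) ** 2 for c in covariate)
    if scc == 0:
        # The covariate was constant: it cannot explain any variation.
        return list(dv)
    slope = sum((c - mean_c) * (y - mean_dv) for c, y in zip(covariate, dv)) / scc
    return [y - slope * (c - mean_c) for y, c in zip(dv, covariate)]


if __name__ == "__main__":
    task_times = [10, 12, 14]        # hypothetical task times in seconds
    room_temperature = [20, 21, 22]  # hypothetical measurements in degrees C
    print(adjust_for_covariate(task_times, room_temperature))
```

In this made-up example all variation in task time is explained by temperature, so the adjusted scores are identical.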
Example theory
Based on previous observation of phenomena, variables, and relations:
• women have difficulty navigating with a 3D interface
• this phenomenon disappears if the screen is sufficiently large
Example hypothesis
Women have more difficulty navigating with a 3D interface than men, unless the screen is large.
Independent variables:
• gender (F/M)
• interface type (2D / 3D)
• screen size (Small / Large)
Dependent variable: navigation performance on a set of standard tasks
• operationally defined: time to click on the target button (task effect?)
Confounding variables:
• sequence of interface types (makes participants aware of navigation issues)
• learning (can be handled by balancing)
Factorial design
Between subjects
• gender (obvious): F/M
• interface type (awareness could destroy the effect): 2D/3D
this makes 2*2 = 4 groups
Within subjects
• screen size: S/L
• balanced for learning (at random, half of the subjects in each group get S-L, the other half L-S)
• for each size, 10 navigation trials (to increase the validity of the navigation problems)
• trials randomly allocated to a size from a set of 20 (because …?)
this makes 10+10 = 20 trials with effect measurement per person
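The design above can be generated as a trial plan. A minimal sketch under stated assumptions: the names are hypothetical, the 20 standard tasks are numbered 0-19, and each group's size is even so S-L and L-S can be split equally. Gender is a subject attribute, so in practice participants are recruited into the F/M rows and only the interface type is randomly assigned.

```python
import itertools
import random

GENDERS = ("F", "M")
INTERFACES = ("2D", "3D")
SIZES = ("S", "L")  # small / large screen


def plan_participant(gender, interface, order, task_pool, rng):
    """Within-subject part: both screen sizes in the given order, with the
    task pool split at random into two halves, one half per size."""
    tasks = list(task_pool)
    rng.shuffle(tasks)
    half = len(tasks) // 2
    per_size = {order[0]: tasks[:half], order[1]: tasks[half:]}
    return [(gender, interface, size, task) for size in order for task in per_size[size]]


def plan_experiment(participants_per_group, task_pool, seed=None):
    """Between-subject part: 2*2 groups, each balanced for size order."""
    if participants_per_group % 2:
        raise ValueError("need an even group size to balance S-L and L-S")
    rng = random.Random(seed)
    plan = {}
    pid = 0
    for gender, interface in itertools.product(GENDERS, INTERFACES):
        # Half of each group gets S-L, the other half L-S, in random sequence.
        orders = [SIZES, SIZES[::-1]] * (participants_per_group // 2)
        rng.shuffle(orders)
        for order in orders:
            plan[pid] = plan_participant(gender, interface, order, task_pool, rng)
            pid += 1
    return plan


if __name__ == "__main__":
    plan = plan_experiment(participants_per_group=4, task_pool=range(20), seed=7)
    print(len(plan), "participants;", len(plan[0]), "trials each")
    print(plan[0][:3])
```

Splitting the task pool anew for every participant keeps task difficulty from coinciding systematically with one screen size.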
Effects to be tested - ANOVA: each test is statistically independent of the others
• gender differences overall - not a hypothesis
• interface type (2D vs 3D) - not a hypothesis
• screen size - not a hypothesis
• sequence effects of trials, and their interaction with other factors - not a hypothesis
• gender differences in relation to screen size (interaction) - not a hypothesis
• interface type in relation to screen size (interaction) - not a hypothesis
• gender differences in relation to interface type (2D vs 3D) (interaction)
• gender differences in relation to screen size and interface type (interaction)
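The effects of a full factorial ANOVA are every non-empty combination of its factors, 2^k - 1 in total: for three factors, 3 main effects, 3 two-way and 1 three-way interaction. A minimal sketch that enumerates them for the example; sequence effects are left out because order is not one of the three factors here, and the hypothesis labels follow the list above.

```python
from itertools import combinations

FACTORS = ("gender", "interface type", "screen size")
# Effects that express the example hypothesis, as marked in the list above.
HYPOTHESIS_EFFECTS = {
    "gender x interface type",
    "gender x interface type x screen size",
}


def anova_effects(factors):
    """All main effects and interactions of a full factorial design."""
    return [
        " x ".join(combo)
        for k in range(1, len(factors) + 1)
        for combo in combinations(factors, k)
    ]


if __name__ == "__main__":
    for effect in anova_effects(FACTORS):
        label = "hypothesis" if effect in HYPOTHESIS_EFFECTS else "not a hypothesis"
        print(f"{effect} - {label}")
```

Adding the size order as a fourth factor would raise the count to 15 effects, which shows how quickly the number of tests grows.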
Stability and reliability of an experiment
Reliability = reproducibility of the phenomenon in the hypothetical case that the study could be repeated at the same point in time under the same circumstances.
Instability is the reverse, caused by:
1. characteristics of the measurement technique
2. observer bias
3. changes in the observer (fatigue - a sequence issue)
4. changes in the situation
5. changes in the object/person studied (aging, attitude change - a sequence issue)
4 and 5 are not always cases of unreliability: these changes may be covered by theory (and should then be a topic of empirical study themselves).