Methods and Techniques of investigating user behavior
Introduction - why M & T?
Gerrit C. van der Veer gerrit@cs.vu.nl
aims | theory | methods | planning | presentation
Methods and techniques for empirical research
Goals for this course:
• understand why
• understand basic theory
• know basic methods and techniques
• know how to plan your research
• know when to ask for expert consult
Goals of empirical research: an example
Cultural utterances of Martians - artifacts we found.
How to develop a science on this - goals in sequence:
• description (variables, quantification, measuring relations)
• prediction (based on knowledge of relations)
• explanation (causal models)
• manipulation (apply control based on known causality)
Characteristics of scientific knowledge
Unambiguous:
• operational definitions for observable phenomena
• measurement techniques
• scientific language: concepts and relations (esp. unobservable phenomena)
Repeatable studies:
• describe procedures, population and samples of observations
• reliability (of measurement, observers, raters, tests)
Controlled for disturbing phenomena:
• design of study / experiment (sequence, balancing, control groups)
• sample
• models for measurement of “other” variables and statistical control
Research methods
Observation in nature:
• case studies (context of use, community of practice, +? -?)
Field study and survey:
• systematic observation / interview / focus group
• focused on some phenomena
• influence of the participant observer
Correlation study:
• tests / questionnaires / behavior measurements
• focus on relations between variables
• measures relations, not causality (e.g. malaria)
Research methods (continued)
Observation in nature, field study and survey, correlation study, and:
Experiment:
• manipulation of candidate causes
• measuring effects
• controlling possible other causes
Data collection
Choice of technique based on:
• sensitivity for the phenomena
• reliability and objectivity
• validity
  • internal - the intended concept
  • external - representative for the population of phenomena, context & situation
• practicality (effort, time, availability)
Data collection: types of techniques
• observation of behavior
  • registration of ….. behavior, physiological data
  • think aloud during processes / activities
    • pro? …. con?
  • video with retrospective protocols
• interview
  • free ….. structured
• objective test
• questionnaires
  • written interview ….. subjective rating scales
• unobtrusive measurements (e.g. logs)
Scoring
Translation of data into units that allow modeling and analysis: numbers or defined categories.
Needs interpretation prescriptions that are part of the operational definition:
• relative (frequency per …) / absolute (reaction time)
• duration time (sometimes relative to ..)
• intensity / strength
• category of behavior / option chosen (e.g. marital status)
Complex phenomena:
• patterns, spectrum, “half-life”
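As an illustration, a minimal Python sketch of turning a raw observation log into scores: an absolute score (reaction time), relative scores (frequency per minute), and category codes. The log entries and category codes are invented for the example.

```python
from collections import Counter

# hypothetical raw log: (timestamp in seconds, observed behavior category)
log = [(2.1, "click"), (3.4, "scroll"), (5.0, "click"), (9.8, "hesitate"), (12.0, "click")]
session_length_min = 12.0 / 60                       # total observation time in minutes

# absolute score: reaction time of the first click
first_click_rt = next(t for t, b in log if b == "click")

# relative score: frequency of each behavior per minute of observation
freq_per_min = {b: n / session_length_min for b, n in Counter(b for _, b in log).items()}

# categorical score: map each behavior onto a code defined in the scoring prescription
codes = {"click": 1, "scroll": 2, "hesitate": 3}
coded = [codes[b] for _, b in log]

print(first_click_rt, freq_per_min, coded)
```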
Scales of measurement
Have been discussed in the Bachelor course “Toegepaste Statistiek” (Applied Statistics).
• ratio scale: 1-dimensional, absolute (comparison with a standard unit), true zero point, cardinal scale; e.g. time on the 100 m
• interval scale: no absolute zero; e.g. intelligence coefficient
• ordinal scale: comparison between observed data (ties possible), so no standard unit; e.g. results of a sports competition
• nominal “scale”: verbal labels or number labels; e.g. 1=single, 2=married, 3=divorced, 4=widowed, 5=living together
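A minimal sketch (values invented, pandas assumed as the data-handling library) of how the four scale types can be represented so that later analysis treats each variable appropriately:

```python
import pandas as pd

df = pd.DataFrame({
    "time_100m_s": [10.8, 11.2, 12.5],                 # ratio scale: true zero, standard unit
    "iq": [95, 110, 123],                              # interval scale: no absolute zero
    "rank": pd.Categorical(["1st", "2nd", "3rd"],
                           categories=["1st", "2nd", "3rd"],
                           ordered=True),              # ordinal scale: order, no standard unit
    "marital_status": pd.Categorical(
        ["single", "married", "divorced"]),            # nominal "scale": labels only
})
print(df.dtypes)
```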
Validity of measures
To what extent does one observe and measure what is aimed at?
• predictive validity - predictive power for other behavior (school exam score for job selection)
• content validity - representative for the intended domain (items in an intelligence test)
• concurrent validity - consistency with other types of measures for the same concept (self report vs. teacher rating)
• concept / construct validity - the measure reflects the intended theoretical construct (multiple choice math questions to measure mathematical ability)
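For instance, concurrent validity is often examined as the correlation between two measures of the same concept. A minimal sketch with invented self-report and teacher-rating scores (scipy assumed):

```python
from scipy.stats import pearsonr

self_report    = [3, 5, 4, 2, 5, 1, 4, 3]   # hypothetical self-rated skill (1-5)
teacher_rating = [2, 5, 4, 2, 4, 2, 5, 3]   # hypothetical teacher rating of the same skill

r, p = pearsonr(self_report, teacher_rating)
print(f"concurrent validity: r = {r:.2f} (p = {p:.3f})")
```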
Experiment: definition
Objective observation of effects that are produced in a controlled situation, where one or more factors are manipulated and others are kept constant (Zimney 1961).
Terminology:
• subject
• experimenter
• independent variables (antecedent conditions, treatments)
• dependent variables (effects)
• disturbing / secondary / potential variables
Example: the effect of pre-knowledge (p) on learning speed (l), with motivation (m) as a secondary variable. Possible roles of m:
• intermediating: p → m → l
• confounding: p → l & m → l
• artifact of selection: m → p & m → l
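A small simulation can show why such secondary variables matter. In this sketch (coefficients and sample size invented, numpy assumed) motivation drives both pre-knowledge and learning speed, so p and l correlate even though there is no causal link p → l (the artifact-of-selection pattern above):

```python
import numpy as np

rng = np.random.default_rng(0)
m = rng.normal(size=1000)                          # motivation
p = 0.8 * m + rng.normal(scale=0.5, size=1000)     # pre-knowledge, caused by motivation
l = 0.8 * m + rng.normal(scale=0.5, size=1000)     # learning speed, caused by motivation only

# a substantial correlation between p and l arises without any effect of p on l
print(np.corrcoef(p, l)[0, 1])
```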
Categories of secondary / confounding variables
1. Person variables
• capabilities
• motivation
• age
• educational background
2. Sequence variables
• fatigue / boredom / learning
• development of the subject during a (longitudinal) study in relation to the experiment
3. Situation variables
• environment: sound / temperature / time of day
• experimenter effect on the subject / experimenter observation bias
• task effect: difficulty / modality of stimulus or instruction
Experimental design - how to cope with secondary variables
The main decision is based on the type of the expected / known main confounding variables.
• person variables: use a repeated measures design, in which each person is measured in all conditions
  • needs balancing for possible sequence effects
• sequence variables: use a multiple groups design, in which each person is in a single group and participates in one condition only
  • needs matched groups (keeps person variables under control), or
  • randomized groups (easier, but less controlled)
A sketch of both strategies follows below.
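A minimal sketch, with a hypothetical participant list: random allocation to groups for a multiple-groups design, and a counterbalanced condition order for a repeated-measures design.

```python
import random

random.seed(42)                                    # fix the allocation for reproducibility
participants = [f"p{i:02d}" for i in range(1, 21)]

# multiple groups design: random allocation, each person in exactly one condition
shuffled = random.sample(participants, k=len(participants))
groups = {"condition_A": shuffled[:10], "condition_B": shuffled[10:]}

# repeated measures design: every participant gets both conditions,
# with the order balanced to control sequence effects (learning, fatigue)
orders = {p: (["A", "B"] if i % 2 == 0 else ["B", "A"])
          for i, p in enumerate(participants)}

print(groups["condition_A"])
print(orders["p01"], orders["p02"])
```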
Factorial design
In practice we often need a combination of the previous designs:
• factors between subjects, to control for unwanted sequence effects
• factors within subjects (repeated measurements), to control for person variables
And we still need to control for situation variables, i.e. to:
• keep these constant (if possible in field experiments)
• measure them and apply statistical control
Example theory
Based on previous observation of phenomena, variables, and relations:
• women have difficulty navigating with a 3D interface
• this phenomenon disappears if the screen is sufficiently large
Example hypothesis
Women have more difficulty navigating with a 3D interface than men, unless the screen is large.
Independent variables:
• gender (F/M)
• interface type (2D / 3D)
• screen size (Small / Large)
Dependent variable: navigation performance on a set of standard tasks
• operationally defined: time to click on a target button (task effect?)
Confounding variables:
• sequence of interface types (makes subjects aware of navigation issues)
• learning (can be handled by balancing)
Factorial design for the example
Between subjects:
• gender (obvious): F/M
• interface type (awareness could destroy the effect): 2D/3D
This makes 2*2 = 4 groups.
Within subjects:
• screen size: S/L
• balanced for learning (at random, half of the subjects in each group get the order S-L, the other half L-S)
• for each size 10 navigation trials (to increase the validity of the navigation measure)
• trials randomly allocated to size from a set of 20 (because ….?)
This makes 10+10 = 20 trials with effect measurement per person; a sketch of the resulting schedule follows below.
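A minimal sketch of this schedule; the factor labels and trial counts come from the slide, while the function name, random seed, and target numbering are assumptions.

```python
import random
random.seed(1)

genders    = ["F", "M"]
interfaces = ["2D", "3D"]
groups = [(g, i) for g in genders for i in interfaces]    # the 2*2 = 4 between-subjects groups

def schedule_for(participant_index):
    """Within-subjects trial list for one participant: 10 trials per screen size."""
    # balance screen-size order: half of each group starts Small, half Large
    sizes = ["S", "L"] if participant_index % 2 == 0 else ["L", "S"]
    trials = []
    for size in sizes:
        targets = random.sample(range(20), k=10)          # 10 targets drawn from a set of 20
        trials += [(size, target) for target in targets]
    return trials                                         # 20 trials with effect measurement

print(groups)
print(schedule_for(0)[:3])
```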
Effects to be tested - ANOVA
Each test is statistically independent from the others:
• gender differences overall - not a hypothesis
• interface type (2D vs 3D) - not a hypothesis
• screen size - not a hypothesis
• sequence effects of trials and their interactions with other factors - not a hypothesis
• gender differences in relation to screen size (interaction) - not a hypothesis
• interface type in relation to screen size (interaction) - not a hypothesis
• gender differences in relation to interface type (2D vs 3D) (interaction)
• gender differences in relation to screen size and interface type (interaction)
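As an illustration, a factorial ANOVA on simulated data (statsmodels assumed); for brevity all three factors are treated as between-subjects here, whereas the actual design would call for a mixed, repeated-measures analysis of screen size:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
rows = []
for gender in ["F", "M"]:
    for interface in ["2D", "3D"]:
        for size in ["S", "L"]:
            for _ in range(10):                          # 10 simulated observations per cell
                time = 3.0 + rng.normal(scale=0.5)
                if gender == "F" and interface == "3D" and size == "S":
                    time += 1.0                          # built-in effect matching the hypothesis
                rows.append((gender, interface, size, time))
df = pd.DataFrame(rows, columns=["gender", "interface", "size", "time"])

# one F test per main effect and interaction
model = smf.ols("time ~ C(gender) * C(interface) * C(size)", data=df).fit()
print(anova_lm(model, typ=2))
```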
Stability and reliability of the experiment
Reliability = reproducibility of the phenomenon in the hypothetical case that it could be repeated at the same point in time in the same circumstances.
Instability is the reverse, caused by:
1. characteristics of the measurement technique
2. observer bias
3. changes in the observer (fatigue - a sequence issue)
4. changes in the situation
5. changes in the object/person studied (aging, attitude change - a sequence issue)
4 and 5 are not always a case of unreliability; these changes may be covered by theory (and should then be a topic of empirical study themselves).
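For example, observer bias can be examined by letting two raters score the same observed behavior and correlating their scores; a minimal sketch with invented ratings (numpy assumed):

```python
import numpy as np

rater_1 = [4, 2, 5, 3, 4, 1, 5, 2]       # hypothetical scores from the first observer
rater_2 = [4, 3, 5, 3, 3, 1, 4, 2]       # hypothetical scores from the second observer

# inter-rater reliability as the correlation between the two sets of scores
print(np.corrcoef(rater_1, rater_2)[0, 1])
```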