Kinds of Research Design

Non-Experimental (Correlational) Designs • No manipulation of Independent Variable • High on realism, low on control • For example: Surveys, Observational Studies • For example: • Do women use the internet for managing social relationships more than men? • Do people like to shop online for books more than for clothes?

Quasi-Experimental Designs • Manipulation of Independent Variable, often in field settings • High on realism, low on control • For example: Field Studies • For example: • Do people’s shopping habits differ offline and online? • Can children learn to use the web without any formal instruction?

Pure Experimental Designs • Manipulation of Independent Variable, often in lab settings • Often the only way to establish causal relationships • Low on realism, high on control • For example: Lab Studies • For example: • Do people’s shopping habits differ offline and online? • Can children learn to use the web without any formal instruction?

Research Problems are inherently suited to one method or the other • Often you will not have the luxury to choose the method of study • Choice of method depends on many factors • Does data pertaining to the problem already exist? • Is it ethical to do a lab / field experiment? (example: research on the link between lead and children’s cognitive ability) • Do you need to make a causal link, or is finding a correlational link acceptable? • Resources (depends on many factors) • Your training (experiments are difficult to set up but easier to analyze; surveys are easy to set up but can be difficult to analyze)

Survey as a Non-Experimental Research Design • Focus on web-based surveys • Example Survey: Daily Internet Tracking Survey

Content Based Strategy • Identify everything related to the concept that you are testing. Content can be based on expert opinion, user observation, a theory, etc. >>For example: You can reason that a good interface should be easy to remember, pleasing to the eye, etc. Advantage: economical method. Disadvantage: the content that you derive the questions from might not be correct.

Statistics Based Strategy • Let the data speak for itself. Identify items related to the concept. Administer this scale to the relevant sample. Use statistical procedures (e.g., item analysis or factor analysis) to identify the items related to the concept you are interested in. >>For example: For a usability scale, identify a large number of usability items and administer them to a sample of users.
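One way to picture the item-analysis step is the sketch below. It is an illustration only: the ratings are made-up data, the 0.3 cutoff is an assumed rule of thumb, and the function names are hypothetical. It computes the corrected item-total correlation for each item (the item against the sum of the other items) and Cronbach's alpha for the scale. Factor analysis is not shown.

```python
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no information
    return cov / (sx * sy)

def corrected_item_total(responses):
    """Correlate each item with the sum of the remaining items."""
    n_items = len(responses[0])
    result = []
    for j in range(n_items):
        item = [r[j] for r in responses]
        rest = [sum(r) - r[j] for r in responses]
        result.append(pearson(item, rest))
    return result

def cronbach_alpha(responses):
    """Internal consistency of the scale (higher means more consistent)."""
    k = len(responses[0])
    item_vars = [pvariance([r[j] for r in responses]) for j in range(k)]
    total_var = pvariance([sum(r) for r in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical data: 6 users rate 4 candidate usability items on a 1-5 scale.
responses = [
    [5, 4, 5, 2],
    [4, 4, 4, 5],
    [2, 1, 2, 3],
    [1, 2, 1, 4],
    [3, 3, 3, 1],
    [5, 5, 4, 3],
]

CUTOFF = 0.3  # assumed threshold, not taken from the slides
correlations = corrected_item_total(responses)
retained = [j for j, r in enumerate(correlations) if r >= CUTOFF]
reduced = [[r[j] for j in retained] for r in responses]

print("retained item indices:", retained)
print("alpha, all items:", round(cronbach_alpha(responses), 2))
print("alpha, retained items:", round(cronbach_alpha(reduced), 2))
```

In this toy data the fourth item does not move with the others, so it falls below the cutoff and dropping it raises alpha.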

External criterion based strategy Items are selected on the basis of their ability to differentiate between two groups of people. Method: Develop scale. Validate it against a criterion population. >> For example: For a scale of usability, independently identify a few good and bad web-sites. Select the items which can distinguish the good and bad sites.
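The criterion-based selection can be sketched as follows. This is a minimal illustration under stated assumptions: the item names and ratings are invented, the good and bad sites are assumed to have been identified independently beforehand, and the standardized mean difference with a 0.8 cutoff is one possible discrimination measure, not one the slides prescribe.

```python
from statistics import mean, stdev

def discrimination(good, bad):
    """Standardized mean difference between ratings of good and bad sites."""
    n1, n2 = len(good), len(bad)
    pooled_var = ((n1 - 1) * stdev(good) ** 2 + (n2 - 1) * stdev(bad) ** 2) / (n1 + n2 - 2)
    if pooled_var == 0:
        return 0.0  # no variation at all: treat as non-discriminating
    return (mean(good) - mean(bad)) / pooled_var ** 0.5

# Hypothetical ratings (1-5) of each item for sites that experts judged
# good and sites that experts judged bad.
ratings_good = {
    "easy_navigation": [5, 4, 5, 4],
    "fast_loading": [4, 5, 4, 5],
    "uses_blue": [3, 2, 4, 3],
}
ratings_bad = {
    "easy_navigation": [2, 1, 2, 2],
    "fast_loading": [2, 3, 1, 2],
    "uses_blue": [3, 3, 2, 4],
}

THRESHOLD = 0.8  # assumed cutoff for "distinguishes the two groups"
selected = [
    item
    for item in ratings_good
    if abs(discrimination(ratings_good[item], ratings_bad[item])) >= THRESHOLD
]
print(selected)
```

Items that are rated about the same for good and bad sites (here the invented "uses_blue" item) are dropped because they cannot tell the two groups apart.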

Combination: • Generate items for a scale of usability from previous scales and articles in the field (content based). • Select and retain scale items by item analysis (statistical methods). • Evaluate the scale by testing its ability to discriminate web sites by comparing with a criterion, i.e., experts’ evaluation of good and bad sites (criterion based method). Which method works best?

Kinds of scales • Unipolar: a single phrase or term referring to some behavior is used. • e.g., degree of dominance: low dominance, high dominance. • Bipolar: Unipolar scale can be changed to a bipolar one by use of two adjectives. • e.g., Submissive and dominant: the middle category reflects equal amounts of dominance and submissiveness.

Numerical scales: • Ratings are made on a series of ordered categories, with different values being assigned to different categories. Numbers or adjectives can be used for the values. For example: This web-site’s navigation structure is:
Excellent (1)   Good (2)   Fair (3)   Poor (4)   Bad (5)

Semantic differential scales • Concepts are rated on several seven-point bipolar adjective scales. For example: Rate your mother
Bad __ __ __ __ __ __ __ Good
Weak __ __ __ __ __ __ __ Strong

Graphic Rating Scales The two end points and the in-between points are described by graphic descriptions which denote the magnitude of the variable being measured. • For example: This software works properly:
All the time | Almost all the time | Most of the time | Sometimes | Never

Standards Scale Items are compared with other similar items (standards) on some dimension. Standards can also be brief behavioral descriptions instead of actual items. • For example: In terms of its menu options, BBEdit is most similar to: Notepad WordPad WordPerfect MS Word

Behaviorally Anchored Rating Scales (BARS) BARS attempt to make the terminology of rating scales more descriptive of actual behavior and therefore more objective. For example: When I am using Microsoft Word and the Office Assistant pops up, I: • am glad to be helped • am annoyed • exit Word and start using Emacs

Forced Choice Scales Rater is provided with two descriptive statements/options that are matched in social desirability. For example: What is your previous experience with video-communication services? • I have had no previous experience • I use it very often

Rating Errors • Constant error (or range restriction) occurs when ratings tend to be clustered in one part of the scale: • leniency error: in the higher part of the scale • severity error: in the lower part of the scale • central tendency error: in the middle range • Such tendencies do not always constitute errors. There can be cultural differences. • To check for it: compare each rater’s rating with the mean (without the rater) for each item. • To take care of it (statistically): standardize scores
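The check and the statistical remedy for constant error can be sketched as below. The rater names and ratings are invented, and standardizing within each rater (z-scores) is one assumed reading of "standardize scores".

```python
from statistics import mean, pstdev

# Hypothetical ratings: each rater scores the same four items on a 1-5 scale.
ratings = {
    "rater_a": [5, 4, 5, 4],  # tends to rate high (leniency)
    "rater_b": [3, 2, 3, 2],  # tends to rate low (severity)
    "rater_c": [4, 3, 4, 3],
}

def leave_one_out_bias(ratings):
    """Average gap between a rater and the mean of all other raters, per item."""
    bias = {}
    for rater, own in ratings.items():
        others = [r for name, r in ratings.items() if name != rater]
        diffs = [own[i] - mean(o[i] for o in others) for i in range(len(own))]
        bias[rater] = mean(diffs)
    return bias

def standardize(scores):
    """Convert one rater's scores to z-scores (mean 0, standard deviation 1)."""
    m, s = mean(scores), pstdev(scores)
    if s == 0:
        return [0.0] * len(scores)  # rater gave the same score everywhere
    return [(x - m) / s for x in scores]

bias = leave_one_out_bias(ratings)
standardized = {rater: standardize(scores) for rater, scores in ratings.items()}

print(bias)
print(standardized)
```

A positive bias suggests leniency and a negative one severity. After standardization the three raters in this toy data agree on the relative ordering of the items, even though their raw scores sit in different parts of the scale.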

Halo Effect The tendency to respond to a general impression of the ratee and/or to overgeneralize favorable / unfavorable ratings based on an impression of a few dimensions. For example: You are asked to rate a piece of software you don’t know much about. Once you used it for a few minutes and it crashed. Now all your ratings will be based on that impression. Solution: Add a “don’t know” or “undecided” option

Contrast Error The tendency to change a rating because of the effect of some anchor point: (a) assigning a higher rating than justified if the item before received a very low rating, or vice versa. This is more problematic since it is a systematic error. Solution: Can deal with it by randomizing order. (b) tendency to use self as an anchor in assigning a rating. If this is a constant effect, then it might not matter.
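Randomizing the order can be as simple as shuffling the items separately for each respondent, so that no item is systematically preceded by the same neighbor. The sketch below is an assumption about how this might be done in a web survey; the item texts, the respondent identifiers, and the function name are hypothetical.

```python
import random

def randomized_order(items, respondent_id, seed=0):
    """Return a shuffled copy of the items, stable for a given respondent."""
    # Seeding with the respondent id means a page reload shows the same order.
    rng = random.Random(f"{seed}-{respondent_id}")
    order = list(items)  # copy, so the master list keeps its original order
    rng.shuffle(order)
    return order

# Hypothetical questionnaire items.
items = [
    "The navigation is clear",
    "Pages load quickly",
    "The search results are relevant",
    "The checkout is easy to complete",
]

for respondent in ["user-1", "user-2", "user-3"]:
    print(respondent, randomized_order(items, respondent))
```

Each respondent still answers every item; only the position changes, so order effects average out across the sample rather than biasing particular items.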

Proximity Error The actual location of an item on the page might affect its rating. For example: raters often assign similar ratings to a person on items that are closer together on a printed page. Solution: randomize order. Most recent performance: Ratee is judged not on overall impression but on their most recent performance.

Ambiguity Error Occurs when questions are ambiguous. This will affect all items. Solution: Pilot and ask respondents how they interpreted the questions.

Anchors in questionnaires Anchors are the verbal comments above the numbers ('strongly agree', etc.). Factual questions: having anchors above all the response options will give more accurate results. Opinion or attitude work: it is good to indicate the central (neutral) point but anchors might not be as crucial.

Should a “no” or “don’t know” option be included? Factual questions: not so important, unless issues of privacy are involved. Opinion questionnaire: if many respondents complain about items 'not being applicable' to the situation, you should consider carefully whether these items should be changed or re-worded.

Advantages of using questionnaires in usability research • Feedback from the point of view of the user. • Flexible comparisons: Measures gained from a questionnaire are largely independent of the specific system, users, or tasks. Therefore you could compare: • same system with other functionally equivalent systems, • same system at other times, • ease of use of System A with ease of use of other systems, • site redesign, • competitors’ sites. • Quick and therefore cost effective to administer and to score, so you can gather a lot of data. • Questionnaires can also be used to measure subjective responses in an experimental context.

Disadvantages of questionnaires in usability research • Is a subjective measure: answers questions about the perception of an event/object rather than the event itself. • Gives you broad data rather than specific advice. Cannot tell you what is going right or wrong, but can get you near the issues. • Does not work for getting feedback on new ideas, only works on concrete objects/events.

Disadvantages of questionnaires in usability research (continued) • Often does not correlate with behavior since it is an indirect method. • Questions are fixed: little possibility to include new questions on request from the respondent, hard to give clarification to the user if he/she needs any • Sampling issues are especially crucial

Practical aspects about surveys in usability • Survey Length: Keep it short. Keep to a single screen. • Use Adaptive Surveys to keep it short: >>Alternate questions from one user to the next. >>Pose different subsequent questions to different users based on previous response. • Pilot Test: Ask two to three people from your sample. • Questions: short, easy to understand, and not ambiguous, inappropriate, or intrusive
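The second adaptive technique, posing different follow-up questions based on the previous response, can be sketched as a small branching table. The question ids, texts, and answer options below are hypothetical, and a real survey tool would likely handle this through its own skip-logic settings.

```python
# Each question lists which question comes next for a given answer.
# A question with no matching entry ends the survey for that respondent.
SURVEY = {
    "q1": {
        "text": "Have you bought anything on this site?",
        "next": {"yes": "q2", "no": "q3"},
    },
    "q2": {"text": "How easy was the checkout (1-5)?", "next": {}},
    "q3": {"text": "What stopped you from buying?", "next": {}},
}

def questions_asked(survey, start, answers):
    """Follow the branches for one respondent and return the questions shown."""
    path = []
    current = start
    while current is not None:
        path.append(current)
        answer = answers.get(current)
        # .get returns None when there is no follow-up, which ends the loop.
        current = survey[current]["next"].get(answer)
    return path

print(questions_asked(SURVEY, "q1", {"q1": "yes", "q2": "4"}))
print(questions_asked(SURVEY, "q1", {"q1": "no", "q3": "Shipping cost"}))
```

Each respondent sees two questions instead of three, which is the point of the technique: the survey stays short because irrelevant questions are never shown. The sketch assumes the branching table has no cycles.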

Response Rate is very important • Getting a high response rate is crucial for the validity of the survey. If only a few, unrepresentative users answer, then you might as well toss a coin. • Incentives will improve the response rate, but you can't buy responses. If you offer too much, you will attract respondents who are not typical users but visit your site because of the rumored big prize.

Do attitudes correlate with behavior? Degree of Control: A person might have positive or negative attitude towards Object A, but not have any control over the action. Ask about control of behavior

When do attitudes predict behavior Directly formed attitudes predict behavior more than indirectly formed attitudes: For example: If a person who has direct experience of customer service at Etrade tells you the service “sucks”, it is more likely to correlate with behavior than an indirectly formed attitude. Attitudes and norms in the immediate social context: Attitudes are also affected by the social norms around the person. For example: I might have a neutral attitude towards Microsoft, but if I hang out with a bunch of people who hate Microsoft, I am likely to be affected.

When do attitudes predict behavior Attitudes and Values: Attitudes affect behavior more if they are in accordance with the person’s value system. For example: I might think Amazon.com is a pretty good site. But maybe I have strong values about the promotion of small independent bookstores and try to promote their use. In that case, my opinion of Amazon.com's usability will not affect my behavior towards it.
