SBD: Usability Evaluation

Chris North, CS 3724: HCI


Presentation Transcript


  1. SBD: Usability Evaluation • Chris North • CS 3724: HCI

  2. Scenario-Based Design • ANALYZE: analysis of stakeholders, field studies; claims about current practice; problem scenarios • DESIGN: activity scenarios, information scenarios, interaction scenarios; metaphors, information technology, HCI theory, guidelines; iterative analysis of usability claims and re-design • PROTOTYPE & EVALUATE: usability specifications; formative evaluation; summative evaluation

  3. Evaluation • Formative vs. Summative • Analytic vs. Empirical

  4. Usability Engineering • Reqs Analysis • Design • Evaluate • Develop • many iterations

  5. Usability Engineering • Formative evaluation • Summative evaluation

  6. Usability Evaluation • Analytic Methods: • Usability inspection, Expert review • Heuristic: Nielsen’s 10 • Cognitive walk-through • GOMS analysis • Empirical Methods: • Usability Testing • Field or lab • Observation, problem identification • Controlled Experiment • Formal controlled scientific experiment • Comparisons, statistical analysis

  7. User Interface Metrics • Ease of learning • Ease of use • User satisfaction

  8. User Interface Metrics • Ease of learning • learning time, … • Ease of use • performance time, error rates… • User satisfaction • surveys… • Not “user friendly”

  9. Usability Testing

  10. Usability Testing • Formative: helps guide design • Early in design process • when architecture is finalized, then it’s too late! • Small # of users • Usability problems, incidents • Qualitative feedback from users • Quantitative usability specification

  11. Usability Specification Table metrics • e.g. frequent tasks should be fast

  12. Usability Test Setup • Set of benchmark tasks • Derived from scenarios (Reqs analysis phase) • Derived from claims analysis (Design phase) • Easy to hard, specific to open-ended • Coverage of different UI features • E.g. “Find the 5 most expensive houses for sale” • Different types: learnability vs. performance • Consent forms • Not needed unless recording user’s face/voice (new rule) • Experimenters: • Facilitator: instructs user • Observers: take notes, collect data, video tape screen • Executor: run the prototype for faked parts • Users • Solicit from target user community (Reqs analysis) • 3-5 users, quality not quantity

  13. Usability Test Procedure • Goal: mimic real life • Do not cheat by helping them complete tasks • Initial instructions • “We are evaluating the system, not you.” • Repeat: • Give user next benchmark task • Ask user to “think aloud” • Observe, note mistakes and problems • Avoid interfering, hint only if completely stuck • Interview • Verbal feedback • Questionnaire • ~1 hour / user

  14. Usability Lab • E.g. McBryde 102

  15. Data • Note taking • E.g. “&%$#@ user keeps clicking on the wrong button…” • Verbal protocol: think aloud • E.g. user thinks that button does something else… • Rough quantitative measures • HCI metrics: e.g. task completion time, … • Interview feedback and surveys • Video-tape screen & mouse • Eye tracking, biometrics?

  16. Analyze • Initial reaction: • “stupid user!”, “that’s developer X’s fault!”, “this sucks” • Mature reaction: • “how can we redesign UI to solve that usability problem?” • the data is always right • Identify usability problems • Learning issues: e.g. can’t figure out or didn’t notice feature • Performance issues: e.g. arduous, tiring to solve tasks • Subjective issues: e.g. annoying, ugly • Problem severity: critical vs. minor

  17. Cost-Importance Analysis • Importance 1-5: (task effect, frequency) • 5 = critical, major impact on user, frequent occurrence • 3 = user can complete task, but with difficulty • 1 = minor problem, small speed bump, infrequent • Ratio = importance / cost • Sort by this, highest to lowest • 3 categories: Must fix, next version, ignored
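
The ranking on this slide can be sketched in a few lines of Python. The slide defines only the importance scale, the ratio, and the three categories; the problem names, cost estimates (assumed here to be person-hours), the fixed budget, and the rule that maps problems to categories are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    name: str
    importance: int  # 1 (minor) to 5 (critical), as on the slide
    cost: float      # estimated effort to fix; person-hours is an assumption

    @property
    def ratio(self) -> float:
        return self.importance / self.cost


def rank(problems):
    """Sort problems by importance/cost, highest ratio first."""
    return sorted(problems, key=lambda p: p.ratio, reverse=True)


def categorize(ranked, budget):
    """Hypothetical policy: fix in ranked order while the budget lasts.

    Problems that do not fit go to the next version if importance >= 3,
    otherwise they are ignored.
    """
    spent = 0.0
    plan = {}
    for p in ranked:
        if spent + p.cost <= budget:
            spent += p.cost
            plan[p.name] = "must fix"
        elif p.importance >= 3:
            plan[p.name] = "next version"
        else:
            plan[p.name] = "ignored"
    return plan


# Made-up problems from a made-up usability test.
problems = [
    Problem("zoom button not noticed", importance=5, cost=2.0),
    Problem("slow repeated zooming", importance=3, cost=6.0),
    Problem("ugly color scheme", importance=1, cost=4.0),
    Problem("confusing label", importance=3, cost=1.0),
]

ranked = rank(problems)
plan = categorize(ranked, budget=4.0)
for p in ranked:
    print(f"{p.ratio:5.2f}  {plan[p.name]:12s}  {p.name}")
```

Note that a cheap moderate problem can outrank an expensive critical one, which is the point of dividing by cost.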

  18. Refine UI • Solve problems in order of: importance/cost • Simple solutions vs. major redesigns • Iterate: • Test, refine, test, refine, test, refine, … • Until?

  19. Refine UI • Solve problems in order of: importance/cost • Simple solutions vs. major redesigns • Iterate: • Test, refine, test, refine, test, refine, … • Until? Meets usability specification
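
The stopping rule "meets usability specification" can be checked mechanically after each test round. This is a minimal sketch, assuming each spec is a (task, metric) pair with a worst acceptable value and that lower is better for every metric; the task name, targets, and observations are invented for illustration.

```python
from statistics import mean


def unmet_specs(specs, observations):
    """Return (key, observed mean, target) for every spec that is not met.

    Assumes lower is better for every metric (time, error count).
    """
    failures = []
    for key, target in specs.items():
        observed = mean(observations[key])
        if observed > target:
            failures.append((key, observed, target))
    return failures


# Hypothetical usability specification: (benchmark task, metric) -> target.
specs = {
    ("find 5 most expensive houses", "time_s"): 60.0,
    ("find 5 most expensive houses", "errors"): 1.0,
}

# Hypothetical measurements from three users in each test round.
round_1 = {
    ("find 5 most expensive houses", "time_s"): [75.0, 90.0, 66.0],
    ("find 5 most expensive houses", "errors"): [0, 1, 1],
}
round_2 = {
    ("find 5 most expensive houses", "time_s"): [50.0, 58.0, 45.0],
    ("find 5 most expensive houses", "errors"): [0, 0, 1],
}

for label, data in [("round 1", round_1), ("round 2", round_2)]:
    failures = unmet_specs(specs, data)
    status = "keep iterating" if failures else "spec met, stop"
    print(label, status, failures)
```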

  20. Examples • Learnability problem: • Problem: user didn’t know he could zoom in to see more… • Potential solutions: • Better labeling: Better zoom button icon, tooltip • Clearer affordance: Add a zoom bar slider (like google maps) • … • NOT: more “help” documentation! You can do better. • Performance problem: • Problem: user took too long to repeatedly zoom in… • Potential solutions: • Faster affordance: Add a real-time zoom bar • Shortcuts: Icons for each zoom level: state, city, street • …

  21. Project (step 6): Usability Test • Usability Evaluation: • >=3 users: Not (tainted) HCI students • ~10 benchmark tasks • Simple data collection (Biometrics optional!) • Exploit this opportunity to improve your design • Report: • Procedure (users, tasks, specs, data collection) • Usability problems identified, specs not met • Design modifications

  22. Controlled Experiments

  23. Usability test vs. Controlled Expm. • Usability test (engineering oriented): • Formative: helps guide design • Single UI, early in design process • Few users • Usability problems, incidents • Qualitative feedback from users • Controlled experiment (science oriented): • Summative: measure final result • Compare multiple UIs • Many users, strict protocol • Independent & dependent variables • Quantitative results, statistical significance

  24. What is Science?

  25. What is Science? • Phenomenon • Engineering • Measurement • Science • Modeling

  26. Scientific Method?

  27. Scientific Method • Form Hypothesis • Collect data • Analyze • Accept/reject hypothesis • How to “prove” a hypothesis in science?

  28. Scientific Method • Form Hypothesis • Collect data • Analyze • Accept/reject hypothesis • How to “prove” a hypothesis in science? • Easier to disprove things, by counterexample • Null hypothesis = opposite of hypothesis • Disprove null hypothesis • Hence, hypothesis is supported (strong evidence, not absolute proof)

  29. Example • Typical question: • Which visualization is better for which user tasks? Spotfire vs. TableLens

  30. Cause and Effect • Goal: determine “cause and effect” • Cause = visualization tool (Spotfire vs. TableLens) • Effect = user performance time on task T • Procedure: • Vary cause • Measure effect • Problem: random variation • Cause = vis tool OR random variation? • Real world + random variation → collected data → uncertain conclusions

  31. Stats to the Rescue • Goal: • Measured effect unlikely to result by random variation • Hypothesis: • Cause = visualization tool (e.g. Spotfire ≠ TableLens) • Null hypothesis: • Visualization tool has no effect (e.g. Spotfire = TableLens) • Hence: Cause = random variation • Stats: • If null hypothesis true, then measured effect occurs with probability < 5% (e.g. measured effect >> random variation) • Hence: • Null hypothesis unlikely to be true • Hence, hypothesis likely to be true
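
The logic of this slide can be made concrete with a permutation test, which simulates the null hypothesis directly: if the tool has no effect, the tool labels are arbitrary, so shuffling them should often produce a difference as large as the measured one. This is a sketch of that idea, not the t-test or ANOVA the later slides use; the timing data and the function name are made up.

```python
import random
from statistics import mean


def permutation_p_value(a, b, trials=5000, seed=0):
    """Estimate how often random relabeling gives a difference in means
    at least as large as the observed one (two-sided)."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)  # pretend the tool labels do not matter
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / trials


# Hypothetical task times in seconds, 8 users per tool.
spotfire = [30, 34, 29, 36, 33, 31, 35, 32]
tablelens = [24, 27, 22, 28, 25, 23, 26, 29]

p = permutation_p_value(spotfire, tablelens)
print(f"p = {p:.4f}")
if p < 0.05:
    print("Measured effect is unlikely under the null hypothesis.")
else:
    print("Random variation alone could explain the measured effect.")
```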

  32. Variables • Independent Variables (what you vary), and treatments (the variable values): • Visualization tool: • Spotfire, TableLens, Excel • Task type: • Find, count, pattern, compare • Data size (# of items): • 100, 1000, 1000000 • Dependent Variables (what you measure) • User performance time • Errors • Subjective satisfaction (survey) • HCI metrics

  33. Example: 2 x 3 design • n users per cell • Table: Ind Var 1 (Vis. Tool) x Ind Var 2 (Task Type) • Cells: measured user performance times (dep var)
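
The cells of a factorial design are the cross product of the treatments of each independent variable. A minimal sketch, assuming two tools and three of the task types from the Variables slide; the helper name and the value of n are hypothetical.

```python
from itertools import product


def design_cells(**factors):
    """Return one dict per cell: every combination of treatments."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


cells = design_cells(
    vis_tool=["Spotfire", "TableLens"],
    task_type=["find", "count", "compare"],
)

n_per_cell = 10  # hypothetical number of users per cell
# If both variables are between-subjects, every cell needs its own users.
users_needed = len(cells) * n_per_cell

for cell in cells:
    print(cell)
print("cells:", len(cells), "users needed:", users_needed)
```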

  34. Groups • “Between subjects” variable • 1 group of users for each variable treatment • Group 1: 20 users, Spotfire • Group 2: 20 users, TableLens • Total: 40 users, 20 per cell • “Within subjects” (repeated) variable • All users perform all treatments • Counter-balancing order effect • Group 1: 20 users, Spotfire then TableLens • Group 2: 20 users, TableLens then Spotfire • Total: 40 users, 40 per cell
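
Counter-balancing for a within-subjects variable with two treatments can be sketched as a random split of the users into two equal groups, one per order. The user IDs and the function name are placeholders.

```python
import random
from collections import Counter


def assign_counterbalanced(users, treatments=("Spotfire", "TableLens"), seed=0):
    """Randomly split users across the two treatment orders.

    Every user performs both treatments; only the order differs.
    """
    rng = random.Random(seed)
    shuffled = list(users)
    rng.shuffle(shuffled)  # randomize who lands in which group
    orders = [tuple(treatments), tuple(reversed(treatments))]
    return {user: orders[i % 2] for i, user in enumerate(shuffled)}


users = [f"user{i:02d}" for i in range(1, 41)]  # 40 hypothetical users
assignment = assign_counterbalanced(users)

group_sizes = Counter(assignment.values())
for order, size in group_sizes.items():
    print(" then ".join(order), "->", size, "users")
```

With 40 users this gives 20 per order, and each treatment still collects data from all 40 users.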

  35. Issues • Eliminate or measure extraneous factors • Randomized • Fairness • Identical procedures, … • Bias • User privacy, data security • IRB (Institutional Review Board)

  36. Procedure • For each user: • Sign legal forms • Pre-Survey: demographics • Instructions • Do not reveal true purpose of experiment • Training runs • Actual runs • Give task • measure performance • Post-Survey: subjective measures • * n users

  37. Data • Measured dependent variables • Spreadsheet:

  38. Step 1: Visualize it • Dig out interesting facts • Qualitative conclusions • Guide stats • Guide future experiments

  39. Step 2: Stats • Table: Ind Var 1 (Vis. Tool) x Ind Var 2 (Task Type) • Cells: average user performance times (dep var)

  40. TableLens better than Spotfire? • Problem with Averages? • Chart: avg perf time (secs), Spotfire vs. TableLens

  41. TableLens better than Spotfire? • Problem with Averages? lossy • Compares only 2 numbers • What about the 40 data values? (Show me the data!) • Chart: avg perf time (secs), Spotfire vs. TableLens

  42. The real picture • Need stats that compare all data • What if all users were 1 sec faster on TableLens? • What if only 1 user was 20 sec faster on TableLens? • Chart: avg perf time (secs), Spotfire vs. TableLens
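
The two "what if" cases can produce exactly the same average yet deserve very different conclusions. The sketch below uses an exact sign test on per-user differences to show this; the sign test is a stand-in chosen because it needs only the standard library, not the t-test or ANOVA the deck goes on to use. It assumes 20 users who each tried both tools, and the numbers are invented.

```python
from math import comb
from statistics import mean


def sign_test_p(diffs):
    """Two-sided exact sign test on paired differences.

    Under the null hypothesis each nonzero difference is equally likely
    to be positive or negative. Zero differences (ties) are dropped.
    """
    pos = sum(d > 0 for d in diffs)
    neg = sum(d < 0 for d in diffs)
    n = pos + neg
    if n == 0:
        return 1.0
    k = max(pos, neg)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Seconds saved on TableLens per user (Spotfire time minus TableLens time).
all_a_bit_faster = [1.0] * 20           # every user 1 sec faster
one_much_faster = [20.0] + [0.0] * 19   # a single user 20 sec faster

for label, diffs in [("all users", all_a_bit_faster),
                     ("one user", one_much_faster)]:
    print(f"{label}: mean saving = {mean(diffs):.1f} s, "
          f"sign test p = {sign_test_p(diffs):.6f}")
```

Both cases average a 1 second saving, but only the first gives consistent evidence across users.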

  43. Statistics • t-test • Compares 1 dep var on 2 treatments of 1 ind var • ANOVA: Analysis of Variance • Compares 1 dep var on n treatments of m ind vars • Result: • p = probability of measuring a difference at least this large if the null hypothesis is true (i.e. from random variation alone) • “statistical significance” level • typical cut-off: p < 0.05 • Informally: hypothesis confidence ≈ 1 - p
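
As a worked instance of the t-test, the sketch below computes Student's two-sample t statistic (equal variances assumed) using only the standard library. Because the standard library has no t-distribution, it compares |t| against the tabulated two-tailed 5% critical value for 14 degrees of freedom instead of computing p; a statistics package would normally report p directly. The timing data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Student's two-sample t statistic and degrees of freedom.

    Assumes independent groups with roughly equal variances.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled sample variance, weighted by each group's degrees of freedom.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
    return t, df


# Hypothetical task times in seconds, 8 users per tool (between subjects).
spotfire = [30, 34, 29, 36, 33, 31, 35, 32]
tablelens = [24, 27, 22, 28, 25, 23, 26, 29]

t, df = pooled_t(spotfire, tablelens)

# Two-tailed critical value of t for alpha = 0.05 and df = 14 (from a t table).
T_CRIT_DF14 = 2.145

significant = df == 14 and abs(t) > T_CRIT_DF14
print(f"t = {t:.3f}, df = {df}, significant at 0.05: {significant}")
```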

  44. In Excel

  45. p < 0.05 • Woohoo! • Found a “statistically significant” difference • Averages determine which is ‘better’ • Conclusion: • Cause = visualization tool (e.g. Spotfire ≠ TableLens) • Vis Tool has an effect on user performance for task T … • “95% confident that TableLens better than Spotfire …” • NOT “TableLens beats Spotfire 95% of time” • 5% chance of being wrong! • Be careful about generalizing

  46. p > 0.05 • Hence, no difference? • Vis Tool has no effect on user performance for task T…? • Spotfire = TableLens ?

  47. p > 0.05 • Hence, no difference? • Vis Tool has no effect on user performance for task T…? • Spotfire = TableLens ? • NOT! • Did not detect a difference, but could still be different • Potential real effect did not overcome random variation • Provides evidence for Spotfire = TableLens, but not proof • Boring, basically found nothing • How? • Not enough users • Need better tasks, data, …

  48. Data Mountain • Robertson, “Data Mountain” (Microsoft)

  49. Data Mountain: Experiment • Data Mountain vs. IE favorites • 32 subjects • Organize 100 pages, then retrieve based on cues • Indep. Vars: • UI: Data mountain (old, new), IE • Cue: Title, Summary, Thumbnail, all 3 • Dependent variables: • User performance time • Error rates: wrong pages, failed to find in 2 min • Subjective ratings

  50. Data Mountain: Results • Spatial Memory! • Limited scalability?
