Experiences with Remote Usability Testing? Jan Stage Professor, PhD Research Leader in Information Systems (IS)/Human-Computer Interaction (HCI) Aalborg University, Department of Computer Science, HCI-Lab jans@cs.aau.dk
Overview • Study 1 • Study 2
Overview • Study 1: synchronous or asynchronous • Method • Results • Conclusion • Study 2
Empirical Study 1 • Four methods: LAB (conventional laboratory test) – RS (remote synchronous test) – AE (remote asynchronous expert reporting) – AU (remote asynchronous user reporting) • Test subjects: 6 in each condition (18 users and 6 with usability expertise), all students at Aalborg University • System: email client (Mozilla Thunderbird 1.5) • 9 predefined tasks (typical email functions) • Setting, procedure and data collection in accordance with each method • Data analysis: the 24 outputs were analysed by three persons, each in a random and different order • Each analyst generated an individual list of usability problems with his/her own categorisation (also for the AE and AU conditions) • These were merged into an overall problem list through negotiation
Results: Task Completion • No significant difference in task completion rates • Significant difference in task completion time • Users in the two asynchronous conditions spent considerably more time • We do not know the reason for this difference
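To make the time comparison concrete, here is a minimal sketch of how such a test could be run. The slides do not name the statistical test used, so a one-way ANOVA is assumed, and all completion times below are placeholder values, not data from the study:

```python
# A minimal sketch of testing for a difference in task completion time
# across the four conditions. The slides do not state which test was
# used; a one-way ANOVA is assumed here, and the completion times below
# are placeholder values, not data from the study.
from scipy import stats

# Hypothetical per-participant completion times (minutes), one list per condition.
lab = [32, 35, 30, 38, 33, 36]
rs  = [34, 31, 37, 35, 32, 36]
ae  = [52, 58, 49, 61, 55, 57]
au  = [54, 50, 59, 56, 60, 53]

f_stat, p_value = stats.f_oneway(lab, rs, ae, au)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")  # small p -> significant difference in time
```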
Results: Usability Problems Identified • A total of 46 usability problems • No significant difference between LAB and RS • AE and AU identified significantly fewer problems, including critical problems • No significant difference between AE and AU in terms of problems identified
Conclusion • RS is the most widely described and used remote method; its performance is virtually equivalent to LAB (or slightly better) • AE and AU perform surprisingly well • Experts do not perform significantly better than users • Video analysis (LAB and RS) required considerably more evaluator effort than user-based reporting (AU and AE) • Users can actually contribute to usability evaluation – not with the same quality, but reasonably well, and there are plenty of them
Overview • Study 1 • Study 2: which asynchronous method • Method • Results • Conclusion
Empirical Study 2 • Purpose: examine and compare remote asynchronous methods • Focus on the usability problems identified • Designed to be comparable with the previous study • Selection of asynchronous methods based on a literature survey
The 3 Remote Asynchronous Methods • User-reported critical incident (UCI) • A well-defined method (Castillo et al., CHI 1998) • Forum-based online reporting and discussion (Forum) • Assumption: through collaboration, participants may give input that increases data quality and richness (Thompson, 1999) • Used as a source of qualitative data in a study of auto logging (Millen, 1999): the participants turned out to report detailed usability feedback • Diary-based longitudinal user reporting (Diary) • Used longitudinally by participants in a study of auto logging to provide qualitative information (Steves et al., CSCW 2001) • First day: same tasks as the other conditions (first part of the diary delivered) • Four more days: new tasks (of the same type) sent daily (complete diary delivered) • Conventional user-based laboratory test (Lab) • Included as a benchmark
Empirical Study 2 (1) • Participants: • 40 test subjects, 10 in each condition • Students, aged 20 to 30 • Distributed evenly by gender and technical/non-technical education • Setting: • LAB: in our usability lab • Remote asynchronous: in the participants' homes • Participants in the remote asynchronous conditions received the software and installed it on their own computers • Training material for the remote asynchronous conditions • Identification and categorisation of usability problems • A minimalist approach that was strictly remote and asynchronous (via email)
Empirical Study 2 (2) • Tasks: • Nine fixed tasks • The same across the four conditions to ensure that all participants used the same parts of the system • Typical email tasks (same as in the previous study) • Data collection in accordance with each method • LAB: video recordings • UCI: web-based system for generating problem descriptions while solving the tasks • Forum: after solving the tasks, one week for posting and discussing problems • Diary: a diary with no imposed structure; the first part delivered after the first day
Data Analysis • All data were collected before the data analysis started • Three evaluators carried out the entire data analysis • Each of the 40 data sets was analysed by all 3 evaluators • In random order (determined by a draw) • In a different order for each evaluator • The user input from the three remote conditions was transformed into usability problem descriptions • Each evaluator generated his/her own individual list of usability problems with his/her own severity ratings • A problem list for each condition • A complete (joined) problem list • These were merged into an overall problem list through negotiation
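The merging step can be illustrated with a small sketch. The actual merging was done through negotiation between the evaluators, which is a human process; the code below only automates the bookkeeping of combining per-evaluator problem lists and flagging severity disagreements for negotiation. The evaluator names, problem descriptions, and the matching rule (case-insensitive exact description) are all hypothetical stand-ins:

```python
# A minimal sketch of combining per-evaluator problem lists before the
# negotiation step. The matching rule used here (case-insensitive exact
# description match) is a hypothetical stand-in for the evaluators'
# judgment of when two descriptions denote the same problem.
from collections import defaultdict

def merge_problem_lists(lists_by_evaluator):
    """lists_by_evaluator: {evaluator: [(description, severity), ...]}"""
    merged = defaultdict(dict)  # normalised description -> {evaluator: severity}
    for evaluator, problems in lists_by_evaluator.items():
        for description, severity in problems:
            merged[description.strip().lower()][evaluator] = severity
    return merged

lists = {
    "E1": [("Cannot find the address book", "serious")],
    "E2": [("cannot find the address book", "critical"),
           ("Attachment icon is unclear", "cosmetic")],
    "E3": [("Attachment icon is unclear", "cosmetic")],
}

for description, ratings in merge_problem_lists(lists).items():
    flag = "negotiate severity" if len(set(ratings.values())) > 1 else "agreed"
    print(f"{description}: {ratings} -> {flag}")
```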
Results: Task Completion Time • Considerable variation in task completion times • Participants in the remote conditions worked in their homes at a time of their own choosing • For each task there was a hint that allowed them to check whether they had solved the task correctly • As we have no data on the task-solving process in the remote conditions, we cannot explain this variation
Results: Usability Problems Identified • LAB: significantly better than the 3 remote conditions • UCI vs. Forum: no significant difference • UCI vs. Diary: significant difference overall, in favour of Diary – also significant for cosmetic problems • Forum vs. Diary: significant difference overall, in favour of Diary – but not significant at any individual severity level
Results: Evaluator Effort • Effort is the sum for all evaluators involved in each activity • Time for finding test subjects is not included (8 hours, common to all conditions) • Task specifications were reused from an earlier study; preparation in the remote conditions consisted of working out written instructions • Considerable differences between the remote conditions in the analysis and merging of problem lists
Conclusion • The three remote methods performed significantly below the classical lab test in terms of the number of usability problems identified • The Diary was the best remote method – it identified half of the problems found in the Lab condition • UCI and Forum performed similarly for critical problems, but worse for serious problems • UCI and Forum required 13% of the evaluator effort of the lab test; the Diary required 30% • The productivity (problems identified per unit of evaluator effort) of the remote methods was considerably higher
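The productivity claim can be checked with back-of-the-envelope arithmetic using only the relative figures stated on this slide (Diary: about half of the Lab problems at about 30% of the effort); the Lab condition is normalised to 1.0 as the baseline:

```python
# Back-of-the-envelope check of the productivity claim, using only the
# relative figures from this slide: the Diary identified about half of
# the problems found in the Lab condition while requiring about 30% of
# the evaluator effort. Productivity = problems identified per unit effort.
lab_problems, lab_effort = 1.0, 1.0        # Lab condition as the baseline
diary_problems, diary_effort = 0.5, 0.30   # relative to Lab

ratio = (diary_problems / diary_effort) / (lab_problems / lab_effort)
print(f"Diary productivity relative to Lab: {ratio:.2f}x")  # ~1.67x
```

So the Diary finds fewer problems overall but yields more problems per hour of evaluator effort than the lab test, which is what the productivity bullet asserts.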
Interaction Design and Usability Evaluation • Master in IT, continuing education programme under IT-Vest • The subject package in Interaction Design and Usability Evaluation starts 1 February 2012 • Admits bachelor graduates, but also offers an entry route for "datamatikere" (AP graduates in computer science) • Information: http://www.master-it-vest.dk/