Lecturer – Prof Jim Warren. Lecture 7 - Usability Testing in Practice Based on Heim, Chapter 8. Usability testing in practice. We seldom have the time and resources for a classic formal usability test And, most of the time, we shouldn’t wait for such resources anyway before seeking feedback
Lecturer – Prof Jim Warren. Lecture 7 - Usability Testing in Practice. Based on Heim, Chapter 8
Usability testing in practice • We seldom have the time and resources for a classic formal usability test • And, most of the time, we shouldn’t wait for such resources anyway before seeking feedback • We can structure the procedure and choose the surrogate users cleverly to make quicker and easier tests that are still informative • aka “discount usability testing”
Some test types in more detail Heuristic evaluation Talk aloud protocol Cognitive walkthrough (Wizard of Oz)
Heuristic Evaluation • Proposed by Nielsen and Molich. • usability criteria (heuristics) are identified • design examined by experts to see if these are violated
Heuristic Evaluation • Rank by severity • 0=no usability problem • 1=cosmetic – fix if there is extra time • 2=minor – fixing is low priority • 3=major – important to fix • 4=usability catastrophe – imperative to fix (a little counter-intuitive – low is good, like in golf) • Heuristics, particularly the 10 from Nielsen • Visibility of system status, Match between system and real world, User control and freedom, etc. • Nielsen’s 10 embody a wealth of applied cognitive science and experience • But, procedurally, heuristic evaluation could be done with any set of principles or standards • Heuristic evaluation ‘debugs’ designs
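The severity ranking above can be sketched in a few lines of Python. This is only an illustration, not part of the lecture: the input format, the example findings, and the choice to average the scores of several evaluators are all assumptions made here.

```python
from statistics import mean

# Severity scale from the slide: 0 = no problem ... 4 = usability catastrophe.
SEVERITY_LABELS = {
    0: "no usability problem",
    1: "cosmetic",
    2: "minor",
    3: "major",
    4: "usability catastrophe",
}

def rank_findings(ratings):
    """Average each finding's severity across evaluators and sort worst-first.

    `ratings` maps a finding description to a list of 0-4 scores,
    one per evaluator (a hypothetical input format).
    """
    for finding, scores in ratings.items():
        if not scores or any(s not in SEVERITY_LABELS for s in scores):
            raise ValueError(f"invalid scores for {finding!r}: {scores}")
    averaged = [(finding, mean(scores)) for finding, scores in ratings.items()]
    return sorted(averaged, key=lambda item: item[1], reverse=True)

# Made-up findings and scores, for illustration only.
example = {
    "No feedback after pressing Save (visibility of system status)": [3, 4, 3],
    "Inconsistent button labels (consistency and standards)": [2, 2, 1],
    "Logo slightly misaligned (aesthetic and minimalist design)": [1, 0, 1],
}

for finding, avg in rank_findings(example):
    label = SEVERITY_LABELS[round(avg)]
    print(f"{avg:.1f} ({label}): {finding}")
```

Sorting worst-first puts the "imperative to fix" items at the top of the list handed to the design team.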
Nielsen’s 10 • Visibility of system status • Match between system and real world • User control and freedom • Consistency and standards • Error prevention • Recognition rather than recall • Flexibility and efficiency of use • Aesthetic and minimalist design • Help users recognize, diagnose and recover from errors • Help and documentation
Nielsen’s 10 in depth • Visibility of system status: The system should always keep users informed about what is going on, through appropriate feedback within reasonable time. • Match between system and the real world: The system should speak the users' language, with words, phrases and concepts familiar to the user, rather than system-oriented terms. Follow real-world conventions, making information appear in a natural and logical order. • User control and freedom: Users often choose system functions by mistake and will need a clearly marked "emergency exit" to leave the unwanted state without having to go through an extended dialogue. Support undo and redo.
Nielsen’s 10 in depth • Consistency and standards: Users should not have to wonder whether different words, situations, or actions mean the same thing. Follow platform conventions. • Error prevention: Even better than good error messages is a careful design which prevents a problem from occurring in the first place. Either eliminate error-prone conditions or check for them and present users with a confirmation option before they commit to the action. • Recognition rather than recall: Minimize the user's memory load by making objects, actions, and options visible. The user should not have to remember information from one part of the dialogue to another. Instructions for use of the system should be visible or easily retrievable whenever appropriate.
Nielsen’s 10 in depth • Flexibility and efficiency of use: Accelerators -- unseen by the novice user -- may often speed up the interaction for the expert user such that the system can cater to both inexperienced and experienced users. Allow users to tailor frequent actions. • Aesthetic and minimalist design: Dialogues should not contain information which is irrelevant or rarely needed. Every extra unit of information in a dialogue competes with the relevant units of information and diminishes their relative visibility.
Nielsen’s 10 in depth • Help users recognize, diagnose, and recover from errors: Error messages should be expressed in plain language (no codes*), precisely indicate the problem, and constructively suggest a solution. • Help and documentation: Even though it is better if the system can be used without documentation, it may be necessary to provide help and documentation. Any such information should be easy to search, focused on the user's task, list concrete steps to be carried out, and not be too large. * If you have a good relationship between end-user and help desk, and somewhat technical users, an error code in addition to your error message may have a useful role (“Hey Bill, I’m getting that error 319 again.”)
When to use Heuristic Evaluation • Particular advantage that it can be used very early • From first sketches and outline descriptions • May head off a mistake rather than having to fix it • Called a ‘discount usability’ method, because it’s relatively cheap (doesn’t require a lot of time and effort)
Talk aloud protocol (aka ‘protocol analysis’) • Have a user try the system and speak out loud about what they’re doing • Not entirely natural (except for really annoying people!), but tends to work OK • Note misconceptions • Is what the user says they’re doing different from what they’re actually doing? • Note vocabulary • What is the user calling objects and actions; should screen instructions be adjusted to align? • Note problems • Most important to note when they go astray (fail to achieve their goal and/or get an error message)
Talk aloud (contd.) • Note aspirations • The user might spontaneously say what they wish they could do • Is that feature there, but just not obvious to the user? • Is it a feature that could reasonably be added? • Variations for two • Have two users sit side-by-side at the computer and perform the task together • Then it becomes much more natural for them to ‘talk aloud’ • Potentially get extra rich insights if the users disagree about how to do something • More user input per unit time for the testers
Cognitive Walkthrough (aka Wizard of Oz) Proposed by Polson et al. 1992 • evaluates design on how well it supports user in learning task • usually performed by expert in cognitive psychology • expert ‘walks through’ design to identify potential problems using psychological principles • Based on the idea of a code walkthrough in conventional code testing • forms used to guide analysis • can be used to compare alternatives (A bit like a formalised version of the Talk Aloud Protocol, but generally an expert rather than a typical user is ‘driving’)
Cognitive Walkthrough (ctd) • For each task walkthrough considers • what impact will interaction have on user? • what cognitive processes are required? • what learning problems may occur? • Analysis focuses on goals and knowledge: does the design lead the user to generate the correct goals?
Pen-based interface for LIDS • UA1: Press look up button • SD1: Scroll viewpoint up • UA2: Press steering wheel to drive forwards • SD2: Move viewpoint forwards • UA3: Press look down button • SD3: Scroll viewpoint down UA = User Action SD = System Display
Pen interface walkthrough • UA 1: Press look up button • Is the effect of the action the same as the user’s goal at this point? Up button scrolls viewpoint upwards. So, it’s immediately rewarding with respect to that goal. • Will users see that the action is available? Yes. The up button is visible in the UI panel. • Once users have found the correct action, will they know it is the one they need? There is a lever with up/down looking symbols as well as the shape above and below the word look. The user will probably select the right action. • After the action is taken, will users understand the feedback they get? The scrolled viewpoint mimics the effect of looking up inside the game environment.
Cognitive walkthrough results • Fill out a form • Track time/date of walkthrough, who the evaluators were • For each Action, answer the four pro forma questions (as per previous slide) • Any negative answer to any question should be documented on a separate Problem Sheet, indicating how severe the evaluators think the problem is, and whether they think it’ll occur often
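One way to picture the walkthrough form is as a small data structure. The sketch below is an assumption about how such a form might be recorded, not a format prescribed by the lecture: the class names, the date, the evaluator names, and the negative answer for UA2 are all hypothetical. Only the four questions and the UA1 answers come from the pen-interface slide.

```python
from dataclasses import dataclass, field

# The four pro forma questions, as worded on the pen-interface walkthrough slide.
QUESTIONS = (
    "Is the effect of the action the same as the user's goal at this point?",
    "Will users see that the action is available?",
    "Once users have found the correct action, will they know it is the one they need?",
    "After the action is taken, will users understand the feedback they get?",
)

@dataclass
class ActionRecord:
    action: str
    answers: list  # four (ok, note) pairs, in QUESTIONS order

@dataclass
class WalkthroughForm:
    date: str
    evaluators: list
    records: list = field(default_factory=list)

    def add(self, action, answers):
        if len(answers) != len(QUESTIONS):
            raise ValueError("one answer per question is required")
        self.records.append(ActionRecord(action, answers))

    def problem_sheet(self):
        """Return (action, question, note) for every negative answer."""
        return [
            (rec.action, question, note)
            for rec in self.records
            for question, (ok, note) in zip(QUESTIONS, rec.answers)
            if not ok
        ]

# Hypothetical date and evaluators.
form = WalkthroughForm(date="2024-05-01", evaluators=["Evaluator A", "Evaluator B"])
form.add("UA1: Press look up button", [
    (True, "Up button scrolls the viewpoint upwards"),
    (True, "Up button is visible in the UI panel"),
    (True, "Up/down symbols around the word 'look'"),
    (True, "Scrolled viewpoint mimics looking up"),
])
# Hypothetical negative answer, included only to show a Problem Sheet entry.
form.add("UA2: Press steering wheel to drive forwards", [
    (True, "Viewpoint moves forwards"),
    (True, "Steering wheel is visible"),
    (False, "Not obvious that pressing the wheel means 'drive'"),
    (True, "Forward motion is clear"),
])

for action, question, note in form.problem_sheet():
    print(f"{action} | {question} | {note}")
```

Each Problem Sheet entry would then be annotated by the evaluators with a severity and an estimate of how often the problem is likely to occur.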
When to do a Cognitive Walkthrough • Can be done at any stage in the development process once you have a ‘prototype’ or actual system implementation to work on • Can be done with paper prototype • ‘Wizard of Oz’ name is reference to ‘Pay no attention to that man behind the curtain’ – a person plays the role of the yet-to-be-written software system • Can be done with a shrink-wrapped product • Focus on key tasks • Things that are done by most users • “Critical success factors” of the system • Consider something that matches the name of the product • If it’s an email client, do a cognitive walkthrough of the task of writing and sending an email (not of the user login dialogue)
‘Real users’ and statistics • What’s a ‘real user’ anyway? • In lecture 4 we said participants should be ‘real users’ and that they should be “actual users who are asked to perform realistic and representative tasks using a proposed design” • The idea is to get the reaction of people most like your future users • This provides the most valid feedback for most usability test designs • Discount usability methods explicitly contradict this representativeness • Heuristic evaluation and cognitive walkthrough are intended to be done by ‘experts’ (hard to say who an ‘expert’ is, but it’s different than aiming for typical users) • Talk-aloud could be representative users, but doesn’t have to be
‘Real users’ and statistics (contd.) • For statistical validity you want a random sample of your future user base • So, actually not all perfectly average users, but a sample proportionally representing the range of relevant user attributes • e.g. skills and experience • The accepted way to do this is to sample randomly from a population • all potential participants have an equal probability to be selected to participate; but it’s confounding if some people decline to participate [maybe the busiest and most talented ones!] • Actually need a really large sample to support statistical inferences (e.g. 95% confidence interval of estimated mean task time)
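To see why sample size matters for a 95% confidence interval of mean task time, the following sketch may help. It is an approximation under stated assumptions: it uses the normal distribution (a t-distribution would give a somewhat wider interval for small samples), and the task times are made-up numbers.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(times, confidence=0.95):
    """Approximate confidence interval for the mean of `times`.

    Uses the normal approximation to stay within the standard library;
    for small samples a t-distribution would be more appropriate.
    """
    n = len(times)
    if n < 2:
        raise ValueError("need at least two observations")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * stdev(times) / sqrt(n)
    m = mean(times)
    return m - half_width, m + half_width

# Made-up task times in seconds for five participants.
small_sample = [95, 140, 110, 180, 125]
low, high = mean_confidence_interval(small_sample)
print(f"n={len(small_sample)}: {low:.0f}s to {high:.0f}s")

# Same spread of times, four times as many participants.
large_sample = small_sample * 4
low, high = mean_confidence_interval(large_sample)
print(f"n={len(large_sample)}: {low:.0f}s to {high:.0f}s")
```

The interval's half-width shrinks roughly with the square root of the sample size, so halving the uncertainty takes about four times as many participants.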
Real users and statistics (contd.) • In discount usability testing we don’t pretend to have a random sample or adequate sample size for reliable quantitative estimates • But qualitative data is valid • If one person is observed to make an error a certain way, then another could • If one person thinks your word (or font, or colour) choice is bad or inconsistent, another could too
Relationship of Usability Testing and UI Design • General maxim • “Test early and often” • It’s cheaper to learn about a problem sooner in the development process than later • Fits many parts of development lifecycle • Discovery • Usability testing of the existing system(s) to understand opportunities / define requirements • Design • Evaluate design (esp. lo-fi / paper prototypes) and iteratively refine based on feedback • Pre and post deployment • Make sure you’ve got it right, set agenda for perfective maintenance
Relationship to design (contd.) • Increasing need for evaluation of matured, deployed products • There’s a lot of software out there • Almost always smarter to buy than to build if there’s something already written • Evaluate potential products (possibly at a friend site that’s already deployed) • The backers of matured products have big budgets to stay on top • Imagine the usability testing budgets of Amazon, Google, Apple, Nokia
Usability testing and research • Usability testing can also be applied to more basic research • Answer questions of relative performance of UI options in particular contexts • E.g. Determine whether circular menus outperform vertical rectangular menus for a given selection task • Test the usability of novel UI devices • How do people react to a force feedback control for a virtual world navigation task? • Teach us about human beings per se • E.g. How do people react to different colours for highlighting of search targets? • At this point usability testing merges with psychology and human physiology research
Example Your usability group has been approached to evaluate a new eBay/Trademe type service. It’s been big in Korea, and they have a prototype version that’s been converted for the New Zealand market. How do you propose to assess it?
Answer part 1 • Phase 1. Try some discount usability assessment • Devise some key tasks (e.g. 1. posting something for sale, and 2. bidding on something) • Try a talk-aloud protocol with whoever you can grab (will probably uncover any gross problems) • Try heuristic evaluation on Nielsen’s 10 with maybe 3-5 people • Probably try a cognitive walkthrough • Iterate on this with design/development team until it’s looking pretty good
Answer part 2 • Phase 2. Decide the most key success factors to be assessed in a usability test • E.g. 1. people don’t make mistakes that impact their intent re purchase and sale and 2. they like it • Recruit a group of users (offer them a movie voucher or such) • Design protocol (maybe as bidder and then as seller), probably no ‘training’ – just orient them to their role and sit them at the computer • Might be wise to randomise users to the new system or Trademe (the chief competitor – see if this system has an ‘edge’) • Carryover effects (boredom, practice) would probably be too strong to allow a repeated measures design, alas
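Randomising users to one of the two systems (a between-subjects design) could be sketched as below. The function name, the participant IDs, and the choice of balanced group sizes are assumptions made for illustration.

```python
import random

def assign_conditions(participants, conditions=("new system", "Trademe"), seed=None):
    """Shuffle participants, then deal them round-robin into the conditions,
    so that group sizes differ by at most one."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, person in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(person)
    return groups

# Hypothetical participant IDs.
recruits = [f"P{i:02d}" for i in range(1, 13)]
for condition, members in assign_conditions(recruits, seed=42).items():
    print(condition, members)
```

Because each participant sees only one system, carryover effects such as boredom and practice do not contaminate the comparison.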
Answer part 3 • Set your measures • Formalise what you mean by a ‘serious error’ • Set up a satisfaction questionnaire (to get qualitative feedback, but with a couple Likert scale questions that define the satisfaction score) • Measure task time, too • Might make a learnability/time composite objective like “Can 90% of users successfully place a bid within 10 minutes of effort after having spent 10 minutes browsing for-sale postings?”
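A composite objective such as "90% of users place a bid within 10 minutes" can be checked mechanically once the sessions are logged. The sketch below assumes a hypothetical result format of one (succeeded, minutes taken) pair per user, with made-up data.

```python
def meets_objective(results, time_limit_min=10.0, target_rate=0.90):
    """Check a success-rate objective.

    `results` is a list of (succeeded, minutes_taken) pairs, one per user.
    A user counts only if they succeeded within the time limit.
    Returns (observed_rate, objective_met).
    """
    if not results:
        raise ValueError("no results to evaluate")
    passed = sum(1 for ok, minutes in results if ok and minutes <= time_limit_min)
    rate = passed / len(results)
    return rate, rate >= target_rate

# Made-up session outcomes for ten users.
sessions = [
    (True, 4.5), (True, 6.0), (True, 9.5), (True, 3.0), (True, 7.2),
    (True, 8.8), (True, 5.1), (True, 11.5), (False, 10.0), (True, 2.4),
]
rate, ok = meets_objective(sessions)
print(f"{rate:.0%} placed a bid in time; objective met: {ok}")
```

With a sample this small the observed rate is only a rough indication, in line with the earlier point about sample size and statistical inference.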
Answer part 4 • After the study • Explore video/screen log of every serious error • Present the profile of errors, satisfaction and task time (along with summary of qualitative feedback) to your client • Do they have a usability edge on TradeMe? (or at least come up similar if they have some other advantage that will attract users) • If not, they better re-think their market entry, or go for a significant UI redesign