Evaluating interaction design IMD07101: Introduction to Human Computer Interaction Brian Davison 2010/11 With material from Susan Turner
About evaluation • What it is • Why and when to do it • How to do it • Heuristic (expert) evaluation • Setting and using usability metrics • Usability testing with users • What else you might do, given more resources
What it is • A review of the design or developing application to test whether it meets user needs • [Diagram: analysis of requirements → alternative designs → selected design → finished software, with evaluation questioning each stage]
When • We should test ideas right from the earliest stages: here are just some possibilities • Reviewing ideas with clients (the pitch) • Exploring concepts, sketches and early prototypes with users • Expert review of the developing design • User testing of a near-complete prototype • Data from all these feed back into the design itself • Software development lifecycle • Different models • Describe the process of creating a software system from beginning to end
The Star lifecycle • [Diagram: evaluation at the centre, linked to task/functional analysis, requirements specification, conceptual/formal design, prototyping and implementation] (adapted from Hix and Hartson, 1993)
Why evaluate • Feedback for design and development (usually) • Information for making design choices (frequently) • Evidence of usability for prospective customers (sometimes) • As a means of involving users (when we are designing for a small, specific group) • As part of quality procedures (in large companies, e.g. Nokia) • Research (as here at Napier, e.g. the Companions project currently)
How will you know if your design is successful? • Show it to your friends and ask them what they think? • Have a good look at it yourselves and try to spot problems? • Show it to your Mum and ask her what she thinks? • Have people try it out?
Types of requirement • Functional – directly related to the functions of the software, e.g. format a particular type of document, send a particular type of message, control a particular piece of equipment • Non-functional – other desirable characteristics, e.g. speed, reliability, accessibility, usability, cost, aesthetics
Appropriate tests • Can we take a simple measurement? • Functions: does the software perform the function successfully? • Speed: does the system reach the agreed target performance? • Reliability: does the system work without errors for the target period? • Select an appropriate criterion (plural = criteria) • May be a metric = something that can be measured • What about usability? • No simple criteria available • No direct measurements possible • May involve opinion (whose?) • Not just ‘easy to use’ but fit with people’s life, work, aspirations and wishes
Requirements • Requirements analysis • e.g. if most people interviewed want an online supermarket site to be secure, quick to use and easy to navigate, that's what we should test • But the criteria for a new phone might be very different, and perhaps take in aspects such as the desired image for its owner • Context of use • Requires different priorities and different criteria
Context example 1 • A call centre operator has the job of answering customer enquiries about insurance claims. • This involves talking to customers on the phone while accessing their personal data and claim details from a database. • You are responsible for the evaluation of some new database software to be used by the operators. • What aspects of usability do you think it is important to evaluate, and how might you test them?
Context example 2 • An interactive website is designed to be an online art gallery. • The designers want to allow people to experience concrete and conceptual artworks presented in different media. • What aspects of usability do you think it is important to evaluate, and how might you test them?
Categories of usability test • Experts • Have professional experience • Have theoretical knowledge • Are familiar with techniques that safeguard usability • Small numbers needed • Users • Ultimately, only the users’ opinion matters • May have characteristics that were unknown to the designers • Large numbers needed
Expert testing: Heuristic evaluation • A method where an analyst finds usability problems by checking the user interface against a set of supplied heuristics or guidelines (Lavery, Cockton and Atkinson, 1996) • Heuristic = problem solving that proceeds by trial and error • Related to Greek ‘eureka’ • Contrast with an algorithmic approach • More simply, checking the design against guidelines and good practice
Who should do heuristic evaluation? • More than one evaluator • Should not be the designer • You know how it's meant to work • You are biased in favour of the design • Ideally should be a usability specialist • … or someone who has done an HCI module, equipped with a decent set of guidelines • Each carries out an independent inspection, then findings are consolidated • Evaluators may need help • Unless it is a 'walk-up-and-use' application • Could provide a typical usage scenario
Jakob Nielsen • Widely cited and respected as a usability guru • Although some experts disagree, arguing his approach is too rigid • But his guidelines have evolved over time and make a good starting point for web and other applications • www.useit.com/papers/heuristic/heuristic_list.html • There are many other sets of guidelines
Nielsen’s 10 usability heuristics • Visibility of system status • Match between system and the real world • User control and freedom • Consistency and standards • Error prevention • Recognition rather than recall • Flexibility and efficiency of use • Aesthetic and minimalist design • Help users recognize, diagnose, and recover from errors • Help and documentation
In more detail… (1-3) • Visibility of system status • The system should always keep users informed about what is going on, through appropriate feedback within reasonable time. • Match between system and the real world • The system should speak the users' language, with words, phrases and concepts familiar to the user, rather than system-oriented terms. Follow real-world conventions, making information appear in a natural and logical order. • User control and freedom • Users often choose system functions by mistake and will need a clearly marked "emergency exit" to leave the unwanted state without having to go through an extended dialogue. Support undo and redo.
In more detail… (4-6) • Consistency and standards • Users should not have to wonder whether different words, situations, or actions mean the same thing. Follow well-known conventions. • Error prevention • Even better than good error messages is a careful design which prevents a problem from occurring in the first place. Either eliminate error-prone conditions or check for them and present users with a confirmation option before they commit to the action. • Recognition rather than recall • Minimize the user's memory load by making objects, actions, and options visible. The user should not have to remember information from one part of the dialogue to another. Instructions for use of the system should be visible or easily retrievable whenever appropriate.
In more detail… (7-9) • Flexibility and efficiency of use • Accelerators -- unseen by the novice user -- may often speed up the interaction for the expert user such that the system can cater to both inexperienced and experienced users. Allow users to tailor frequent actions. • Aesthetic and minimalist design • Dialogues should not contain information which is irrelevant or rarely needed. Every extra unit of information in a dialogue competes with the relevant units of information and diminishes their relative visibility. • Help users recognize, diagnose, and recover from errors • Error messages should be expressed in plain language (no codes), precisely indicate the problem, and constructively suggest a solution.
In more detail… (10) • Help and documentation • Even though it is better if the system can be used without documentation, it may be necessary to provide help and documentation. • Any such information should be easy to search, focused on the user's task, list concrete steps to be carried out, and not be too large
How many evaluators? 3-4 evaluators usually find the majority of problems (after Nielsen, 1993)
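The curve summarised on this slide is often modelled with the formula from Nielsen and Landauer (1993): the proportion of problems found by i independent evaluators is 1 − (1 − λ)^i, where λ is the chance that a single evaluator spots a given problem. A minimal sketch, assuming λ = 0.31 purely for illustration:

```python
# Expected proportion of usability problems found by i independent evaluators,
# after Nielsen and Landauer (1993): found(i) = 1 - (1 - lam)**i.
# lam is the probability that one evaluator spots a given problem (assumed 0.31 here).

def proportion_found(i: int, lam: float = 0.31) -> float:
    return 1 - (1 - lam) ** i

for i in range(1, 6):
    print(f"{i} evaluator(s): {proportion_found(i):.0%} of problems found")

# Three evaluators already find roughly two thirds of the problems,
# which is why 3-4 is usually considered enough.
```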
Step by step • Establish aims, intended users and context of use • Look back at the user requirements • Select heuristics • Brief evaluators about the technology and intended use • Scenarios are useful here • Evaluators independently make a first pass through the user interface • Look at both typical and critical ‘tasks’ (what a user needs to do) • Review against the heuristics • Produce a consolidated list of prioritised problems • Linked to heuristics and suggested fixes
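To illustrate the final step, here is a minimal sketch of how evaluators' independent findings might be recorded and consolidated into a prioritised problem list; the field names and the 1-4 severity scale are illustrative assumptions, not part of any standard.

```python
from dataclasses import dataclass

@dataclass
class Problem:
    description: str    # what the evaluator observed
    heuristic: str      # which heuristic it violates
    severity: int       # 1 = cosmetic ... 4 = catastrophic (illustrative scale)
    suggested_fix: str

# Findings from two evaluators working independently (invented examples)
evaluator_a = [
    Problem("No feedback after 'Save' is pressed", "Visibility of system status", 3,
            "Show a confirmation message or progress indicator"),
]
evaluator_b = [
    Problem("Jargon term 'commit' used for saving", "Match with the real world", 2,
            "Use the word 'Save'"),
    Problem("No feedback after 'Save' is pressed", "Visibility of system status", 3,
            "Show a confirmation message or progress indicator"),
]

# Consolidate: merge duplicate findings, then sort by severity so the worst come first
merged = {p.description: p for p in evaluator_a + evaluator_b}
for p in sorted(merged.values(), key=lambda p: p.severity, reverse=True):
    print(f"[{p.severity}] {p.heuristic}: {p.description} -> {p.suggested_fix}")
```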
Limitations of expert testing • Not all user problems can be predicted • Particularly those deep in a sequence of actions • Some ‘problems’ found by experts don’t really cause difficulties • People have common sense and experience • Use the results of expert evaluation to focus testing with real users
User testing for different purposes • Very often we just need to identify problem areas or overall reactions to technology • Qualitative rather than quantitative information • Observation of user reactions & behaviour, interviews and post-use questionnaires • But sometimes more precise data is needed, e.g. • Comparing versions or products • Demonstrating we have met client requirements • Relative ease of use of different input devices • Here usability metrics may help
Typical metrics • Learnability • time to reach specified level of proficiency • e.g. complete a given task • But learning is a continuum • Memorability • test users on functions after trial session • Errors • number of errors in completing specified task • Subjective satisfaction • rating scales • physiological measures • Efficiency • times for expert users to complete specified task(s) • frequency of ‘non-productive’ actions • ratio of used to unused commands
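As an illustration only, a minimal sketch of how some of these metrics might be computed from logged test sessions; the data, field names and target time are invented.

```python
from statistics import mean

# One record per test user for a single task: completion time in seconds,
# number of errors, and a 1-5 satisfaction rating. Invented data for illustration.
sessions = [
    {"time_s": 95,  "errors": 2, "satisfaction": 4},
    {"time_s": 140, "errors": 5, "satisfaction": 2},
    {"time_s": 80,  "errors": 1, "satisfaction": 5},
    {"time_s": 110, "errors": 3, "satisfaction": 3},
]

TARGET_TIME_S = 120  # illustrative target taken from the requirements

efficiency = mean(s["time_s"] for s in sessions)          # mean completion time
error_rate = mean(s["errors"] for s in sessions)          # mean errors per task
satisfaction = mean(s["satisfaction"] for s in sessions)  # mean subjective rating
within_target = sum(s["time_s"] <= TARGET_TIME_S for s in sessions) / len(sessions)

print(f"Mean time: {efficiency:.0f}s, mean errors: {error_rate:.1f}, "
      f"mean satisfaction: {satisfaction:.1f}/5, "
      f"{within_target:.0%} of users met the {TARGET_TIME_S}s target")
```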
But how to set the target level... • Skill and intuition... • Better than last version? • Better than the competition? • Client targets? • Just because you can measure it doesn’t mean you should... • More resources required for planning and testing • And does it really mean anything… • Testing to obtain quantitative data tends to be in highly artificial contexts
Who to involve? • Test users should represent target users as closely as possible • May need to give basic training • May have to be paid in real-world circumstances • If your test users are unavoidably dissimilar to your target group, acknowledge the limitations of what you can conclude • Subjects: the participants in the tests • Sample: the group of subjects – characteristics, etc. • Population: the group that the sample represents
How many users? • There are no hard and fast rules, but it depends on … • The purpose of the evaluation • What you want to do with the data • The nature of the data itself • Any safety-critical / performance issues • Their availability • But more is always good
Depends on the purpose • If it is for early feedback only (3-5 people) … • If it is the roll-out of a new enterprise-critical e-business site (tens of people)
Pay-off ratio for user testing • [Figure: benefit-to-cost ratio plotted against number of test users] (after Nielsen, 1993)
Take care…. • More recent experience (e.g. at Microsoft) suggests variation in users may swamp results • More users will mean more reliable results, especially if quantitative metrics are used
Activities - designing test tasks • Start from scenarios • Design tasks which are representative & provide reasonable coverage • Should be do-able but not trivial • Provide a written task description • Level of detail here depends on what you are trying to evaluate - no point in providing step-by-step instructions on how to book a flight (for example) if you want to find out whether people can do this without help • Present in increasing level of difficulty
Make test context realistic • Patterns of use, e.g. are people likely to be interrupted? • Availability of assistance – are other people around to help out if there are problems? • Is the technology single user or used with others? • What will be the physical set-up, e.g. at a desk, on a small device, as a free-standing information kiosk, as a display in a car or other vehicle…? • Is it inside or outside, particularly noisy (or quiet), hot, humid or cold, are there unusual lighting conditions…? • Social norms, e.g. are users likely to feel time-pressured?
Technology • Doesn’t matter for early evaluation of ideas • But final testing needs to be on realistic technology • Remember to switch off chat, screen savers, email alerts, etc.
Usability testing - procedure • Preparation - draw up a test plan • Introduction • Testing • Users carry out tasks • You observe them (and/or record the process, with permission) • Implement any metrics • Debriefing • Interviews, questionnaires if used • Consider open questions and rating scales • Also ask about the testing process • Write up quickly
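If a standard rating-scale questionnaire is used at debriefing, scoring it is straightforward. One widely used example (not specified on the slide) is the System Usability Scale (SUS): ten items answered on a 1-5 scale, alternating positively and negatively worded. A minimal scoring sketch:

```python
def sus_score(responses: list[int]) -> float:
    """Score a 10-item SUS questionnaire (each response 1-5).
    Odd-numbered items are positively worded, even-numbered ones negatively worded."""
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5  # scales the result to 0-100

# One participant's (invented) responses
print(sus_score([4, 2, 5, 1, 4, 2, 5, 1, 4, 2]))  # -> 85.0
```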
Ethical considerations in testing • Be informative • Be reassuring • Be considerate • Respect privacy
Relative effectiveness 1 • Expert evaluations useful when resources limited, or for early design • Teams better than individuals • Techniques are complementary • Cost effectiveness similar
Relative effectiveness 2 • Survey of 103 experienced user-centred design practitioners conducted in 2000 (Vredenburg et al., 2002) • No reason to think the situation has changed since then • Around 40% of those surveyed used testing with users • Around 30% used ‘informal expert review’ • Around 15% ‘formal heuristic evaluation’ • These figures do not indicate where people used more than one technique.
Real world evaluation • How are applications and devices used for real, in the long term? • Consider using diaries, phone interviews… • How do they fit with other technologies people have? • What other factors affect use? • Example: evaluating usability of mobile phones - researchers investigated not just usability of handset and software but customers' understanding of charging & how this affects usage
Going further • Eye tracking software and hardware • Monitors the target of the user's gaze on the screen • Becoming widely used in usability consultancy • Software recording screen activity and video of people's reactions • Again widely used, and being purchased here • Controlled experiments • Scientifically designed tests with statistical analysis, designed to gather reliable data on the effects of interaction designs (see the sketch below) • Physiological measures • Measuring heart rate, GSR… for monitoring emotional reactions, e.g. to VR simulations
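For the controlled experiments mentioned above, analysis usually means a statistical comparison between design conditions. A minimal sketch, assuming scipy is available and using invented completion-time data for two alternative designs:

```python
from scipy import stats

# Task completion times (seconds) for two alternative designs; invented data.
design_a = [52, 61, 48, 75, 58, 66, 50, 70]
design_b = [44, 55, 41, 60, 47, 52, 43, 58]

# Independent-samples t-test: is the difference in mean times likely to be real?
t, p = stats.ttest_ind(design_a, design_b)
print(f"t = {t:.2f}, p = {p:.3f}")
if p < 0.05:
    print("Difference is statistically significant at the 5% level")
```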
And what about fun and pleasure? • Patrick Jordan’s book “Designing Pleasurable Products” has a useful questionnaire on evaluating how pleasant a product is • There are also various visual questionnaires which aim to do this • Open ended interviewing is another means of exploring affective reactions
In summary • Evaluation is an essential part of designing interactive technologies • Required at different stages of the lifecycle for different purposes • Brings the design-development cycle back to requirements • Choose techniques according to circumstances • Involving users is highly desirable • Requires planning and attention to detail