SEG 3210 User Interface Design & Implementation Prof. Dr.-Ing. Abdulmotaleb El Saddik University of Ottawa (SITE 5-037) (613) 562-5800 x 6277 elsaddik@site.uottawa.ca abed@mcrlab.uottawa.ca http://www.site.uottawa.ca/~elsaddik/
Unit B: UI Evaluation • Objectives of User Interface Evaluation • Where Evaluation Fits in the Development Process • A Preliminary Case Study: Hotel Reservations • Overview of Interface Evaluation Methods • Details: Heuristic Evaluation • Malfunction Analysis • Details: Videotaped Evaluation • Details: Experiments • Details: Usability Engineering • Details: Cognitive Walkthroughs • Key Points to Review
1. Objectives of User Interface Evaluation • Key objective of both UI design and evaluation: • Minimize malfunctions • Key reason for focusing on evaluation: • Without it, the designer would be working “blindfolded” • Designers wouldn’t really know whether they are solving customers’ problems in the most productive way
1. Objectives of User Interface Evaluation • Questions answered by various evaluation techniques: • What is the user’s real task? • Prevent later malfunctions • by doing evaluation as part of requirements analysis • Present and work with a UI • to help formulate the requirements • Inappropriate tasks/requirements are a major source of malfunctions • What problems do or might users experience with the UI? • Directly find malfunctions • Which of several alternative UIs is better? • Pick the version that leads to fewer malfunctions • Has the UI met usability targets? • Ensure that malfunction counts are sufficiently low • Does the UI conform to standards? • Leverage collective wisdom to reduce malfunctions
1. Objectives of User Interface Evaluation • But, in order for evaluation to give feedback to designers... • ...we must understand why a malfunction occurs • Malfunction analysis: • Determine why a malfunction occurs • Determine how to eliminate malfunctions We will discuss this while working through this unit.
2. Where Evaluation Fits in the Development Process • Throughout the lifecycle! • During rough sketching or prototyping • During iterative design • The more evaluation the better • Especially when users are involved • Formative evaluation: • When designing and maintaining software that we are developing • Summative evaluation: • When judging a finished product developed by someone else
3. A Preliminary Case Study: Hotel Reservations • UI evaluation performed for Forte Travelodge • Performed in a special IBM usability lab • Aims: • Identify and eliminate malfunctions • Hence make system easier to use • Avoid business difficulties caused by these malfunctions • Develop improved training material and documentation • Avert potential malfunctions by teaching users how to avoid them • Setup of IBM usability lab: • Resembles TV studio • Microphones and video equipment • One-way mirror • Technicians, observers sit on one side • Users sit on other side in realistic environment • User environment resembles reception desk • Non-threatening
3. A Preliminary Case Study: Hotel Reservations • Aspects of system to be evaluated: • How quickly can a booking be made? • (while operator is on telephone) • Is each screen productive to use? • Are help and error messages effective? • Can non-computer-literate operators use the system? • Is complexity minimized? • Are training and documentation effective?
3. A Preliminary Case Study: Hotel Reservations • Procedure: • 15 common task scenarios developed: • Among others: basic registration, cancellation, request for specific room, extension of existing stay etc. • Four days of testing with multiple users performing various sets of tasks • Users were told evaluation is of system, not them • All actions were recorded • Debriefing sessions held • Videos then analyzed for malfunctions • 62 identified • Priorities: • Navigation speed needs improvement • Screen titles and formats need tuning • Hard to refer to documentation • Physical difficulties with telephone headsets and furniture
3. A Preliminary Case Study: Hotel Reservations • Results: • Higher productivity of booking staff • tasks completed more quickly • guest requirements better met • Training costs kept low • Morale kept high • More customers booked by phone • 14,500 → 27,000 per week
4. Overview of Interface Evaluation Methods • Three types of methods • Passive evaluation • Active evaluation • Predictive evaluation / usability inspections • All types of methods useful for optimal results • Used in parallel • All attempt to prevent malfunctions • Before trying methods, do pilot studies first
4. Passive evaluation • Usage of software is monitored • Performed while prototyping, in alpha test and later • Does not actively seek malfunctions • only finds them when they happen to occur, so infrequent (but possibly severe) malfunctions may not be found • Generally requires realistic use of a system • Users become frustrated with malfunctions • Gathering Information: • Problem report monitoring: • Users should have an easy way to register their frustration / suggestions • Best if integrated with software
4. Passive evaluation: Gathering Information • Automatic software logs • Can gather much data about usage • command frequency • error frequency and pre-error patterns • undone operations (a sign of malfunctions) • Privacy is a concern • System must be designed for testability (DFT) • Logs can be taken of: • just keystrokes, mouse clicks • full details of interaction • The latter make accurate playback easier
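The kind of automatic interaction log described above can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration (not part of the course material): the event names, record fields, and JSON-lines file format are assumptions, chosen so the log can later be scanned for command frequency, pre-error patterns, and undone operations.

```python
import json
import time

class InteractionLogger:
    """Appends timestamped UI events (commands, errors, undo) to a log file."""

    def __init__(self, path="session.log"):
        self.path = path

    def log(self, event_type, **details):
        record = {"timestamp": time.time(), "event": event_type, **details}
        with open(self.path, "a") as f:
            f.write(json.dumps(record) + "\n")

# Illustrative calls placed in the application's event handlers:
logger = InteractionLogger()
logger.log("command", name="save_record")
logger.log("error", message="part number missing")
logger.log("undo", undone_command="delete_record")  # undone operations hint at malfunctions
```

Logging full interaction details rather than just keystrokes and clicks would make accurate playback easier, at the cost of larger logs and greater privacy concerns.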
4. Passive evaluation: Gathering Information • Questionnaires / surveys • Useful to obtain statistical data from large numbers of users • Proper statistical means are needed to analyze results • Gathers subjective data about importance of malfunction • automated logs omit importance • less frequent malfunctions may be more important • users can prioritize needed improvements • Limit on number of questions • Very hard to phrase questions well • Questions can be closed- or open-ended
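As a small, hedged illustration of the statistical side of questionnaire analysis, the sketch below summarizes invented responses to a closed-ended Likert-style question; a real survey would additionally need proper sampling and significance testing.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical 1-5 responses to a closed-ended question such as
# "How serious was this problem for you?" (1 = not serious, 5 = very serious).
responses = [4, 5, 3, 4, 2, 5, 4, 4, 3, 5]

print("n =", len(responses))
print("mean =", round(mean(responses), 2))
print("median =", median(responses))
print("distribution =", Counter(responses))  # helps prioritize needed improvements
```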
4. Active evaluation • Actively study specific activities performed by users • Performed when prototyping and later • Gathering Information: • Experiments & usability engineering • Prove hypotheses about measurable attributes of one or more UIs • e.g. speed/learning/accuracy/frustration • In usability engineering, test against preset targets • Can be expensive • Knowledge of statistics needed • Hard to control for all variables • Observation sessions • Also called ‘interpretive evaluation’ • Simple observation or cooperative evaluation • Described in detail later
4. Predictive evaluation • Studies of system by experts rather than users • Performed when UI is specified and later • useful even before prototype developed • Can eliminate many malfunctions before users ever see software • Also called ‘usability inspections’ • Gathering Information: • Heuristic evaluation • Based on a UI design principle document • Analyze whether each guideline is adhered to in the context of the task and users • Can also look at adherence to standards • Cognitive walkthroughs • Step-by-step analysis of: • steps in task being performed • goals users form to perform these tasks • how system leads user through tasks
Comparison of key questions to evaluation techniques • ++ : very good technique • + : OK technique • 0 : possible technique
5. Details: Heuristic Evaluation • A type of predictive evaluation • Use HCI experts as reviewers instead of users • Benefits of predictive evaluation: • The experts know what problems to look for • Can be done before system is built • Experts give prescriptive feedback • Important points about predictive evaluation: • Reviewers should be independent of designers • Reviewers should have experience in both the application domain and HCI • Include several experts to avoid bias • Experts must know classes of users • Beware: Novices can do some very bizarre things that experts may not anticipate
5. Details: Heuristic Evaluation • Planning for heuristic evaluation • Based on UI guidelines (heuristics about what is best) • Multiple passes needed • one pass to look for each kind of problem • passes to follow different routes through screens and dialogues (i.e. different tasks) • 1-2 hour sessions are good • 1 expert evaluator finds only 33% of problems • 5 evaluators needed to find 75% of problems • 15 or more to find about 99% • Example heuristics for heuristic evaluation (Many more to be covered later in course) • Use simple and natural dialogue • Speak the user’s language • Minimize memory load • Be consistent • Provide feedback • Provide clearly marked exits • Provide shortcuts • Provide good error messages • Prevent errors
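The evaluator percentages above are often approximated with a simple model (an assumption here, not stated on the slide): if one evaluator finds a fraction p of the problems and evaluators work independently, n evaluators together find about 1 − (1 − p)^n of them. With p = 0.33 the model tracks the 1-evaluator and 15-evaluator figures; the empirical 75% figure for 5 evaluators is lower than the model predicts because real problems differ in how easy they are to spot.

```python
def proportion_found(n_evaluators, p_single=0.33):
    """Expected proportion of usability problems found by n independent
    evaluators, each of whom finds a fraction p_single of the problems."""
    return 1 - (1 - p_single) ** n_evaluators

for n in (1, 3, 5, 10, 15):
    print(f"{n:2d} evaluators -> ~{proportion_found(n):.0%} of problems found")
```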
6. Malfunction Analysis • A disciplined approach to analyzing malfunctions • Provides feedback into the redesign process • Replay the protocol (e.g. the recorded session), searching for malfunctions • Answer four distinct questions: • Q1. How is the malfunction manifested? • What do you notice and who noticed it? • Q2. At what stage in the interaction is it occurring? • Goal forming, action decision, action execution, interpretation of results • Q3. At what level of the user interface is it occurring? • Physical element level to task level • Q4. Why is it occurring? • What is its root cause? • List and prioritize possible cures
Q1. How is the malfunction manifested? • a) Malfunctions detected by the system (easiest to detect) • omission of an argument • incorrect date format • Cure: • Better prompts, consistency, visible examples, more forgiving of alternatives • b) Malfunctions detected by the user during operation • taking wrong path in menu hierarchy • not finding required help • not being able to perform a certain action • not being able to tell which state system is in • Cure: • Improve functionality, feedback, clarity, simplicity
Q1. How is the malfunction manifested? • c) Malfunctions undetected (until later) • output produced is wrong due to wrong inputs • unnecessary work performed • Cure: • Improve feedback indicating consequences of input; simplify • d) Inefficiencies • excessive response time • excessive think time • unnecessarily long command sequences • unnecessary repetitions • complex operations that require use of reference • Cure: • Simplify, speed system up
Q2. At What Stage in the Interaction Does the Malfunction Occur? • a) When the user decides on the next goal (Forms an intent to do an inappropriate thing) • decides to empty a field because user thinks it is unimportant (when it is important) • decides to charge the default exchange rate (when the current exchange rate should be obtained) • Cure: • Lead user through task better; better feedback; better training • b) When the user specifies the action (Action does not match the goal) • deletes the record instead of emptying a field • charges the reciprocal of the exchange rate • Cure: • Improve clarity, feedback, prompts, conceptual model
Q2. At What Stage in the Interaction Does the Malfunction Occur? • c) When the system executes the action • Defects in functionality • Cure: • Fix functionality in normal way • d) When the user interprets the resulting system state • thinks bank account has been debited when it has not • thinks system has ‘hung’ when it has not • thinks some data must be entered when it is the default • cannot understand resulting error message • Cure: • Better feedback, better conceptual model
Q3. At Which Level Does the Malfunction Occur? • a) Task level (Task and goals not supported) • What the user wants to do cannot be done by the system • Functionality is not provided • Cure: • Add functionality • b) Conceptual level (User has wrong mental model; does not understand intended conceptual model) • thinks that money is being deducted from bank account when it is being charged to a credit card • thinks that dragging files to the desktop means they are no longer on the disk • thinks that dragging a disk to the trash can icon deletes disk contents • Cure: • make conceptual model clearer; improve metaphors
Q3. At Which Level Does the Malfunction Occur? • c) Interaction style level (system wide problem) • does not know how to pull down a menu • scrolls a page instead of a line • goes to next screen instead of scrolling • retypes command after an error instead of editing it • Cure: • make operation of the interface more intuitive and consistent • d) Interaction element level (specific detail inappropriate) • selects wrong button because label is misinterpreted • specifies invalid command syntax • specifies wrong code for option • Cure: • More attention to details of the interface, simplification
Q3. At Which Level Does the Malfunction Occur? • e) Physical element level (Physical execution incorrect) • presses wrong key accidentally • clicks on wrong pixel in image • out-types machine (actions lost) • types ahead when system is computing; keystrokes later applied to wrong action • Cure: • Defenses to protect user from consequences; better hardware design; fix bugs in code
Q4. Why Does the Malfunction Occur? • Lack of (on the part of the user): • Motivation: • Poor job satisfaction • Attention: • User is pre-occupied with other things. • Input information processing: • No feedback provided to tell user what is going on • or cues provided by the system are not recognized • or cues are misinterpreted Cures: Clearer, more consistent feedback • Discrimination: • user is unable to tell certain things apart • e.g. red/green colour discrimination • e.g. two icons that are similar Cures: Improved expression of information
Q4. Why Does the Malfunction Occur? • Physical coordination: • e.g. wrong item selected because of difficulty positioning cursor with mouse. Cures: Alternate interaction mechanisms, better feedback • Recall: • User did not remember the command, syntax, etc. Cures: Better mnemonics, online help, quick lookup mechanisms, command completion • Knowledge / lack of learning: • User does not have the business or software knowledge to make the right choice.
Q4. Why Does the Malfunction Occur? • Learning difficulties that cause malfunctions: • Learning is difficult • users get frustrated • learning takes time; can be hard to apply • Learners make ad-hoc interpretations • they may not recognize their problem • they may falsely think they have a problem • Learners generalize from what they know • they assume computers work like manual methods • they assume consistency • Learners have trouble following directions • they often ignore them even if they see them • they do not easily understand them
Q4. Why Does the Malfunction Occur? • Problems and features interact • they do not see that one problem can cause another • Prerequisites and side-effects confuse learners • Help facilities do not always help • they do not know what to ask for • too much detail is often provided • Other causes of malfunctions: • Excessive resource demands • External events (e.g. noise) • Misleading or inadequate training • Unrealistic task definitions • Intrinsic human variability
7. Details: Videotaped Evaluation • A software engineer studies users who are actively using the user interface • To observe what problems they have • Rather than to measure numbers • The sessions are videotaped • Can be done in user’s environment • Activities of the user: • Performs pre-defined tasks • With or without detailed instructions on how to perform them • Preferably talks to herself as if alone in a room • Yields ‘think-aloud protocol’ • This process is called ‘co-operative’ evaluation when the software engineer and user talk to each other
7. Details: Videotaped Evaluation • The importance of video: • Without it, ‘you see what you want to see’ • You interpret what you see based on your mental model • In the ‘heat of the moment’ you miss many things • Minor details (e.g. body language) captured • You can repeatedly analyze, looking for different problems • Tips for using video: • Several cameras are useful • Software is available to help analyse video by dividing into segments and labeling the segments • Evaluation can be time consuming so plan it carefully
Steps for videotaped evaluation • Select 6 to 8 representative users per user class • E.g. client, salesperson, manager, accounts receivable • Invite them to individual sessions • Sessions should last 30-90 minutes • Schedule 4-6 per day • If system involves user's clients in the interaction: • Have users bring important clients • or have staff pretend to be clients • Select facilitators/observers and notetakers • Prepare tasks: • Select the most commonly used tasks plus a few less important tasks • Write task instructions for users • Estimate the time it will take to complete each task plus extra time for discussion • Prepare notebook or form for organizing notes
Steps for videotaped evaluation • Set up and test equipment • Hardware on which to run system • Audio or video recorder • Software logs • Do a dry run (pilot study)! • At the Start of an Observation Session • explain: • nature of project • anticipated user contributions • why user's views are important • focus is on evaluating the user interface, not evaluating the user • all notes, logs, etc., are confidential • user can withdraw at any time • usage of devices • relax! • Sign informed consent form: • very important
Steps for videotaped evaluation • Start user verbalizing as they perform each task (thinking aloud) • For co-operative evaluation, software engineer also verbalizes • Appropriate questions to be posed by the observing software engineer:
Steps for videotaped evaluation • Hold a wrap-up interview (de-briefing) • What were the most significant problems? • What was most difficult to learn? • Etc. • Analyze the videotape to find malfunctions • Lab exercise: • Videotaped evaluation of a software product
8. Details: Experiments (Details in Experimentation) • Pick a set of subjects (users) • A good mix to avoid biases • A sufficient number to get statistical significance (avoid random happenings affecting results) • Pick variables to test • Independent: Manipulated to produce different conditions • Should not have too many • They should not affect each other too much • Make sure there are no hidden variables • Dependent: Measured value affected by the independent variables • Develop a hypothesis • A prediction of the outcome • Aim of experiment is to show this is correct • E.g. Some change in an independent variable causes some change in a dependent variable
8. Details: Experiments (Details in Experimentation) • Design experiments to test hypotheses • Create a null (inverse) hypothesis • i.e. a change in the independent variable causes no change in the dependent variable • Disprove the null hypothesis! • Experiment design is difficult. • Conduct experiments • Statistically analyze results to draw conclusions • e.g. using ‘t-tests’ • conclusions will be correct within a margin of error 19 times out of 20 • Decide what action to take based on conclusions
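A minimal sketch of the analysis step, assuming invented completion-time data for two interface variants: a two-sample t-test against the null hypothesis that the interface (independent variable) has no effect on completion time (dependent variable), judged at the conventional 5% level ("correct 19 times out of 20").

```python
from scipy import stats

# Hypothetical task completion times in seconds for two interface variants.
times_ui_a = [41.2, 38.7, 45.1, 39.9, 43.3, 40.8, 44.0, 37.5]
times_ui_b = [35.0, 33.8, 36.9, 31.2, 34.4, 32.7, 35.5, 33.1]

result = stats.ttest_ind(times_ui_a, times_ui_b)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")

if result.pvalue < 0.05:
    print("Reject the null hypothesis: the interfaces differ in completion time.")
else:
    print("Cannot reject the null hypothesis.")
```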
Two major types of UI experimentation • Traditional • Micro level • (e.g. testing which colour is best for a certain icon) • Only done when usage will be very high or you are a university researcher! • Usability engineering • Tests significant part of system • Relaxes ‘scientific constraints’ • Because we cannot control all variables • Used to prove hypotheses that certain usability goals have been met • More later • You should understand experimentation • Much UI research is experimental • You have to interpret and apply others’ results
Example Experiment: Text Selection Schemes • Early GUI research at Xerox on the Star Workstation • Traditional experiments • Results were used to develop Macintosh • Goal of study: • Evaluate how to select text using the mouse • Steps: • Subjects • Six groups of four • In each group, only two are experienced in mouse usage
Example Experiment: Text Selection Schemes • Variables • Independent: • Selection schemes: • 6 strategically chosen patterns involving • --> Which mouse button (if any) could be double/triple/quad clicked to select character/word/sentence • --> Which mouse button could be dragged through text • --> Which mouse button could adjust the start/end of a selection • Dependent • Selection time • Selection errors • Hypothesis • Some scheme is better than all others
Example Experiment: Text Selection Schemes • Detailed experiment design • Null hypothesis: No difference in schemes • Assign a selection scheme to each group • Train the group in their scheme • Measure task time and errors • Each subject repeated 6 times • A total of 24 tests per scheme • Conduct Experiment • Analysis • F-test used - scheme F found to be significantly better • Point and draw through with left mouse • Adjust with middle mouse • Action • Try another combination similar to scheme F • Left mouse can be double-clicked
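The F-test mentioned in the analysis can be illustrated with a hedged sketch: a one-way ANOVA over selection times for samples from three of the schemes. Only the shape of the analysis follows the slide; the data values and scheme labels below are invented.

```python
from scipy import stats

# Invented selection times (seconds) for three of the six schemes.
scheme_d = [2.9, 3.1, 3.4, 3.0, 3.2, 2.8]
scheme_e = [3.3, 3.5, 3.2, 3.6, 3.4, 3.5]
scheme_f = [2.1, 2.3, 2.0, 2.2, 2.4, 2.1]

result = stats.f_oneway(scheme_d, scheme_e, scheme_f)
print(f"F = {result.statistic:.2f}, p = {result.pvalue:.4f}")
# A small p-value says at least one scheme differs; post-hoc pairwise
# comparisons would then single out scheme F as the faster one.
```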
Questions to ask when reviewing experiments • Not all published experiments are done well! • Were users adequately prepared? • Were tasks complex enough to allow adequate evaluation? • Did the task become boring to the users? • Although effects are found to be statistically significant, does that matter? • Maybe not if a particular task is rarely performed • Are there any other possible interpretations? • Maybe users have learned to do better at task B because they did task A first! • Are dependent variables consistent? • e.g. users may prefer slower method • Can results be generalized? • Maybe selection results also apply to graphics, maybe not
9. Details: Usability Engineering “A process whereby the usability of a product is specified quantitatively, and in advance. Then as the product is built it can be demonstrated that it does or does not reach the required levels of usability” • Partly engineering: • Design-evaluate-redesign • Partly science: • Experimentation methodology • Although not with full controls
Usability Engineering Steps • Pick ‘benchmark’ tasks • Simple tasks that can be repeated and performance measured • Pick usability metrics • Set planned levels of usability • Design initial interface using usability guidelines • Analyze impact of design(s) using experiments • i.e. a new batch of users is measured running the benchmark tasks • If goals are achieved, stop • Incorporate user derived feedback in design • Go back to step 5 • Major problem with usability engineering: • Benchmark tasks are rarely performed in a truly natural environment
Some suitable usability metrics • Time to complete task • Percentage of task completed per unit time • Ratio of successes to failures • Percentage of time spent dealing with errors • Percentage of competing products that have better speed measures than our product • Number of repetitions of failed commands • Percentage of available commands used • Number of times user had to undo an action • Number of unnoticed errors • Number of times user did not use the expected method to accomplish the task • Think time required for task • i.e. ignoring system response time • a good UI should lead user through system with minimal ‘cognitive load’
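A minimal sketch (structure and target values assumed, not from the course) of how a few of these metrics could be computed from one benchmark-task session log and checked against the planned levels of usability:

```python
# Hypothetical measurements taken from a single benchmark-task session.
session = {
    "task_time_s": 95.0,    # time to complete task
    "think_time_s": 22.0,   # think time, ignoring system response time
    "error_time_s": 18.0,   # time spent dealing with errors
    "undo_count": 2,        # number of times user had to undo an action
    "completed": True,
}

# Planned levels of usability (illustrative targets).
targets = {"max_task_time_s": 120.0, "max_error_time_pct": 15.0}

error_time_pct = 100.0 * session["error_time_s"] / session["task_time_s"]

print("Time to complete task:", session["task_time_s"], "s")
print("Percentage of time spent dealing with errors:", round(error_time_pct, 1), "%")
print("Undo actions:", session["undo_count"])

meets_targets = (session["task_time_s"] <= targets["max_task_time_s"]
                 and error_time_pct <= targets["max_error_time_pct"])
print("Planned usability levels met:", meets_targets)
```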
10. Details: Cognitive Walkthroughs • A form of predictive evaluation • Detailed reviews based on psychological theory, focusing on: • Goals a new user must form to execute a task • How well the system leads the user to form those goals • i.e. how well the system supports the user • The method is highly structured • Forms are provided to guide the evaluator • More time consuming than ordinary heuristic evaluation • Less time consuming than experiments
Cognitive Walkthrough Steps • Choose a task to evaluate • Describe the task exactly • First describe the task in one sentence • Use simple language • The wording should be from a first-time user’s point of view e.g. Record a newly-received item in inventory. • Describe the initial state of the system e.g. Main menu is displayed • List the atomic actions needed to correctly perform the task, e.g. • Click on ‘add to inventory’ in the menu. • If you don’t know the part number, hit ‘return’ to look up the part number, then go to action 4. • Type the part number into the ‘part number’ field • Press tab • Type the number of items in the ‘Number’ field • Hit <return> or click on ‘Add’. • If the system prints out a bar-code sticker, affix it to the new item. • Describe classes of users who may perform the task e.g. Receiver - knows about inventory, but not yet about the system
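One way an evaluator might capture this preparation in a structured, repeatable form is sketched below; the field names are illustrative (an assumption, not a prescribed format), while the content comes from the inventory example above.

```python
# Hypothetical structured record of the walkthrough preparation.
walkthrough = {
    "task": "Record a newly-received item in inventory.",
    "initial_state": "Main menu is displayed",
    "user_class": "Receiver - knows about inventory, but not yet about the system",
    "actions": [
        "Click on 'add to inventory' in the menu",
        "If the part number is unknown, hit 'return' to look it up, then go to action 4",
        "Type the part number into the 'part number' field",
        "Press tab",
        "Type the number of items in the 'Number' field",
        "Hit <return> or click on 'Add'",
        "If the system prints a bar-code sticker, affix it to the new item",
    ],
}

# For each atomic action the evaluator then asks whether a first-time user
# would form the right goal and notice that the correct action is available.
for i, action in enumerate(walkthrough["actions"], 1):
    print(f"Step {i}: {action}")
```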