
Basics of Evaluations and Experiments Spring 2019




  1. Usability Engineering: Basics of Evaluations and Experiments, Spring 2019

  2. Basics of Evaluations and Experiments • Types of Usability Studies • Measuring Aspects of Usability • General Guidelines for Usability Evaluations and Experiments

  3. The Two Main Types of Usability Studies • Usability Evaluation: a comprehensive system study • Assesses the extent to which an interactive system is easy and pleasant to use • May be user-centered, expert-based, or model-based • Usability Experiment: a focused study on specific system features • Compares at a detailed level which of two or more alternatives is best

  4. Usability Evaluation vs. Usability Experiment • Usability Evaluation • Formative: helps guide design • Early in design process • Few users • Identifies usability problems, incidents • Qualitative feedback from users • Usability Experiment • Summative: measures final results • Compares multiple UIs • Many users, strict protocol • Independent & dependent variables • Quantitative results, statistical significance

  5. Usability Evaluation Assesses the extent to which an interactive system is easy and pleasant to use. Consists of methodologies for measuring the usability aspects of a system's UI and identifying specific problems • Objective: to improve a user interface • Output: recommendations for improvement • User-centered evaluation • Usability Testing • Card Sorting • Expert-based evaluation • Usability Inspection Methods: Heuristic Evaluation / Cognitive Walkthrough / Pluralistic Walkthrough / Heuristic Walkthrough / Perspective-based Inspection / … • Model-based evaluation • GOMS Model (Goals, Operators, Methods, Selection rules)

  6. Usability Experiments • Focused study on specific system features • Test alternative hypotheses to determine if they result in an improvement of the design, by investigating the relationship between two or more variables • Independent variable: what you set • Always one / manipulated by the researcher • Dependent variables: what you measure to determine the effect of the independent variable • Typically one or two / produce measurable results • Experimental condition: each of the distinct values of the independent variable for which the dependent variable is measured in order to carry out statistical tests or calculations • Control condition: a standard (initial) value of the independent variable against which other conditions can be compared

  7. Usability Experiment Example Determine if relocating a certain button on the GUI of a word processor will enable people to write their documents faster. • Independent variable: button position • Dependent variable: time to produce the same document for each position • Control condition: using the word processor with the button at its original position • Experimental conditions: using the word processor with the button relocated at predetermined positions • Results are measured per user for each condition
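The structure above can be sketched in code. This is a minimal illustration with hypothetical completion times (seconds); the condition names and values are invented for the example, not taken from a real study.

```python
# Word-processor experiment sketch: completion times (hypothetical,
# in seconds) grouped by the independent variable "button position".
from statistics import mean

# Dependent variable: time to produce the same document (seconds).
times = {
    "original": [312, 298, 305, 320],  # control condition
    "toolbar":  [280, 275, 290, 284],  # experimental condition 1
    "context":  [300, 310, 295, 305],  # experimental condition 2
}

# Compare the mean completion time of each experimental condition
# against the control condition.
control_mean = mean(times["original"])
for condition, samples in times.items():
    diff = mean(samples) - control_mean
    print(f"{condition}: mean={mean(samples):.1f}s ({diff:+.1f}s vs control)")
```

A real experiment would follow this comparison with a statistical significance test before concluding that one position is faster.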

  8. Types of Usability Experiments (* defined on the next slides)

  9. Interval Estimate of a Parameter • A continuous interval (or a distinct range of values) used to estimate a parameter V. Note: an interval estimate may or may not contain V • Confidence Level of an interval estimate of a parameter: • The probability that the interval estimate will contain the parameter • Usually expressed as a percentage (e.g. 90%) • Indicates how sure you can be (how strict you want to be) about your results • Confidence Interval • A specific interval estimate of a parameter given a confidence level for this estimate • For instance, one may wish to be 95% confident that V is contained in an interval estimate (95% confidence interval) • The three most common confidence levels used are 90%, 95%, and 99%
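As a concrete illustration, a 95% confidence interval for a mean task time can be computed with the normal approximation (z = 1.96). The data below are hypothetical; for very small samples a t-distribution would give a slightly wider (stricter) interval.

```python
# 95% confidence interval for a mean task time, normal approximation.
from math import sqrt
from statistics import mean, stdev

times = [41.0, 38.5, 44.2, 39.9, 42.7, 40.1, 43.3, 37.8]  # seconds, hypothetical

n = len(times)
m = mean(times)
se = stdev(times) / sqrt(n)          # standard error of the mean
z = 1.96                             # z value for ~95% confidence
interval = (m - z * se, m + z * se)  # the interval estimate

print(f"95% CI for mean time: {interval[0]:.1f}s .. {interval[1]:.1f}s")
```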

  10. Simple Usability Testing Experiments • Easy to prepare and run • Used to clarify design options, improve effectiveness, findability, etc. • Usually involve on-line user participation through dedicated websites • On-line advantages: • Easy distribution to remote places • Large number of participants • Instant availability and update of results

  11. Some Simple On-line Usability Testing Experiments On-line: large number of participants / instant availability of results • Preference Test • Helps to confidently choose between design options • Five Second Test • Evaluates how well a UI or webpage design communicates its purpose and content • Click Test • Measures how effective a design is at letting users accomplish an intended task • Navigation Test • Identifies findability and discoverability issues in a web design • Question Test • Gets feedback from real people on any issue of interest

  12. Simple On-line Testing Experiments • Preference Test (A/B test): ask users to choose between two or more design alternatives based on a particular attribute • "Which design looks more trustworthy?" • (or simply ask them which they prefer overall) • Helps to confidently choose between design options

  13. Simple On-line Testing Experiments • Five Second Test: shows a UI or web site to a user for just five seconds. After the five seconds are up, the user is asked several questions about the design • Evaluates how well a page communicates its purpose and content • "What do you think that page was about?" • "What did you like most about the design?" • "What did you like least about the design?"

  14. Simple On-line Testing Experiments • Click Test: records where users click on a design. Users are asked to follow instructions such as: • "Where would you click to view your shopping cart?" • "Where would you click to choose a template for your blog?" • Measures how effective a design is at letting users accomplish an intended task

  15. Simple On-line Testing Experiments • Navigation Test: shows a "homepage" and instructs the user to complete a specific task that involves navigation through several pages • Helps identify findability and discoverability issues in a design

  16. Simple On-line Testing Experiments • Question Test: shows an image and asks the user to answer several questions. Users can continue to view the image with no time limit while answering • Gets feedback from real people on any issue of interest

  17. Measuring Aspects of Usability • Proficiency • Learnability • Efficiency • Memorability • Error Handling • User Satisfaction

  18. Measuring Aspects of Usability • Proficiency • Proficient: adept, skilled, skillful, expert; having great knowledge and experience in a trade or profession • (implies a thorough competence derived from training and practice) • The most fundamental thing to measure • Given a predefined set of tasks, proficiency is measured in two ways: • Total time to complete a predefined set of tasks • Percentage of work completed in a specific time unit • Depends on the set of tasks chosen • Different sets of tasks give different proficiency measurements; this is not an issue when measurements are used for comparing systems
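The two proficiency measures above can be computed directly. The task names and timings here are hypothetical, and "work completed" is simplified to whole tasks finished, in listed order, within a time budget.

```python
# Proficiency for one user on a predefined task set (minutes, hypothetical).
task_times = {"open file": 1.5, "edit text": 6.0, "format": 4.5, "print": 2.0}

# 1) Total time to complete the predefined set of tasks.
total_time = sum(task_times.values())        # 14.0 minutes

# 2) Percentage of work completed in a specific time unit:
#    here, whole tasks finished within the first 10 minutes.
budget, elapsed, done = 10.0, 0.0, 0
for t in task_times.values():
    if elapsed + t > budget:
        break                                # next task would overrun the budget
    elapsed += t
    done += 1
percent_done = 100 * done / len(task_times)  # 50.0 %

print(f"total time: {total_time} min, work in {budget} min: {percent_done}%")
```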

  19. Measuring Aspects of Usability (cont'd) • Learnability, Efficiency, and Memorability are all measured through proficiency: • Learnability • Proficiency of a novice user • Efficiency • Proficiency of an expert • Memorability • Proficiency after a period of non-use • Error Handling • Number of deviations from the ideal way to approach a task • Total amount of time spent away from the ideal way to perform a task • User Satisfaction • Subjective / objective • Scores on a point scale (e.g. 1 to 5)

  20. Learnability • How long it takes for a novice user (*) to learn the basic functionality of a system so as to be able to do "useful work" on it • Measured in two ways: • Time to reach a certain level of proficiency • Level of proficiency reached after a certain amount of time • Learnability is higher when: • Users transfer skills from existing knowledge • Standard windows manipulation / common accelerators (Ctrl-Z, F1, …) • Standard ways to fill in forms, use menus, etc. • Information arranged the way it is arranged in the real world • Well-designed GUIs, well-chosen metaphors, WYSIWYG, etc. • Interfaces designed like (or close enough to) existing systems • Similar products on the market • Previous versions of the same product • (*) novice to this particular system, not to software or computers in general

  21. Measuring Learnability - The Learning Curve • [Figure: learning curve plotting proficiency over time, marking the initial level and the slope of early usage] • Three major aspects of a learning curve to observe: • Initial level: can the novice user transfer skills from other systems? • Slope of early usage: is the system quick to learn? • Eventual level achieved: do experts use the system efficiently?

  22. Measuring Learnability • Expert proficiency may be further increased by accelerators • [Figure: learning curve where experts switch from GUI mode to keyboard accelerators; the introduction of accelerators causes a temporary decrease in proficiency while they are being learned]

  23. Three common categories of systems with respect to learning curves • 1. Walk-up and Use Systems • First-time (or one-time) users can use the system effectively without any prior introduction or training • High initial level • Self-explanatory • Relatively little learning to do • Normally implies little functionality • Examples: ATMs, ticket machines, public information systems, arcade games

  24. Three common categories of systems with respect to learning curves • 2. Highly Learnable Systems • Some degree of introduction/training necessary to start using the system • Very soon users are able to do useful work • Steep (rapid) initial learning curve • Initial level may be high or low • Examples: word processors / spreadsheets, high-tech assembly line jobs, most computer games (excluding complex simulation games)

  25. Three common categories of systems with respect to learning curves • 3. Complex Systems • Large amount of introduction/training necessary • Gentle (slow) initial learning curve • Acceptable only if there is no other way to do the job • Examples: AutoCAD-type software, plane cockpits, nuclear reactor control rooms

  26. Efficiency • How fast a user accomplishes tasks once he/she has learned to use a system • Measured as the steady-state level of proficiency when the learning curve levels out (i.e. for 'expert users') • The steady-state level may not be optimal: • If the user were to bother to learn a few additional features, he or she might save far more time than the time spent learning • A well-designed system should lead the user to learn these additional features (e.g. by customized daily tips)

  27. How do you choose a set of 'expert users'? • Let users tell you that they are experts • Cheapest, and usually good enough • Define a user as expert if his/her experience rises above a pre-defined minimum amount of use (hours, years, etc.) • More reliable, but not always easy to get an accurate measure of that minimum amount of use • How to measure efficiency when you can't get expert users: • Measure people several times as they gain proficiency; plot their learning curve and wait until it levels out • Can be expensive and not fully reliable, but better than nothing • Keep in mind: in systems such as games or entertainment, efficiency might be less important than user satisfaction
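"Wait until the learning curve levels out" can be operationalized as a simple plateau check. The function, window size, and tolerance below are illustrative assumptions, not a standard method; the session data are hypothetical tasks-per-hour scores.

```python
# Sketch: decide when a user's learning curve has leveled out, so the
# later measurements can be treated as expert (steady-state) efficiency.
def steady_state(scores, window=3, tolerance=0.05):
    """Return the mean of the last `window` scores if they vary by less
    than `tolerance` (relative spread); return None if still learning."""
    tail = scores[-window:]
    if len(tail) < window:
        return None
    lo, hi = min(tail), max(tail)
    if hi == 0 or (hi - lo) / hi > tolerance:
        return None                      # curve is still rising or noisy
    return sum(tail) / window

sessions = [4.0, 6.5, 8.0, 9.0, 9.4, 9.5, 9.5]   # tasks/hour, hypothetical
print(steady_state(sessions))                    # last three sessions agree
```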

  28. Most common causes of Efficiency problems • Slow response time • Any action lasting more than 4-5 seconds must be monitored with a progress bar (or at least a brief feedback message) • Too much text to read or enter • e.g. command line applications • Poorly designed user interfaces • Too many menu items, dialog boxes, etc. • Unnecessary distracting elements (music, colors, …) • Cognitive load (the user has to think too much) • Poorly designed system architecture • Lack of an easy step-by-step route to perform the task • "Dumb" errors that could have been avoided by a better design

  29. Memorability • How easy it is to reuse a system after a substantial time lapse between visits • Best measured as proficiency after a period of non-use: • Minutes for details like the meaning of icons • Hours for a small but complex function • Days or weeks for a full system • Two ways to measure: • Perform a standard user test with casual users who have been away from the system for a specified amount of time, and measure the time they need to perform some typical test tasks • Conduct a memory test with users after they finish a test session with the system and ask them to explain the effect of various commands or name the command (or draw the icon) that does a certain thing • Hard to measure through typical user-research methods • Can often be determined through the use of website analytics

  30. Measuring Error Handling • Error: any deviation from the optimal or intended path for performing a task (must exclude situations where the user goes off the path just for fun or curiosity) • Errors happen and unintended actions are inevitable • Two types of errors according to origin: • User accidents [Slips] (typographical errors, etc.) • The system cannot be blamed for most of these, but it should help the user recover • Errors caused by confusion [Mistakes] • The system should be designed to prevent these

  31. Slips / Mistakes • Slips: unintended actions a user makes while trying to do something on an interface, even though the goal is correct (e.g. typos) • Mistyping a password or an email address • Accidentally clicking an adjacent link • Picking the wrong month when making a reservation • Accidentally double-clicking on a button (same action performed twice) • Causes: • Working memory failure • Doing something familiar but missing a few steps • Unintentional • Users realize right away • More common than mistakes

  32. Slips / Mistakes • Mistakes: intended actions towards a wrong goal, because users have an incorrect mental model of the action • Deleting files by removing shortcuts from the desktop • Clicking on a heading that isn't clickable • Typing both first and last name in the first-name field • Causes: • Doing the wrong thing for the goal • Applying a rule in the wrong situation • Making a bad decision • Users may not be aware of the mistake • It can be a learnability or memory issue

  33. Measuring Error Proneness • Proneness: having a tendency; inclined; being disposed to do something • Measured in different ways: • Number of errors per unit time (in different categories) • Total amount of time spent dealing with errors vs. total time • Total time spent recovering from errors after detection vs. total error time (or total time) • Requires session recording so that subtle (minor) errors are not missed • (Remember to exclude user diversion errors made just for fun or curiosity)
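The three measures above follow directly from a session log. The log below is hypothetical: each error is recorded as (total time spent dealing with it, how much of that was recovery after detection), in minutes.

```python
# Error-proneness metrics from a hypothetical 60-minute session log.
session_minutes = 60.0
errors = [
    # (time dealing with the error, recovery time after detection), minutes
    (2.0, 1.0),
    (1.5, 0.5),
    (3.0, 2.0),
]

errors_per_hour = len(errors) / (session_minutes / 60)    # errors per unit time
total_error_time = sum(dealt for dealt, _ in errors)      # 6.5 min
error_time_ratio = total_error_time / session_minutes     # error time vs. total
recovery_time = sum(rec for _, rec in errors)             # 3.5 min
recovery_vs_error = recovery_time / total_error_time      # recovery vs. error time

print(f"{errors_per_hour} errors/hour, "
      f"{100 * error_time_ratio:.1f}% of session spent on errors")
```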

  34. Errors as an Opportunity to Educate Users • Nielsen's First Law of Computer Documentation: people don't read it • Second Law: users read system documentation only when they are in trouble • Use error messages as an educational resource to impart a small amount of knowledge to users • Teach users a bit about how the system works and give them the information they need to use it better • Connect a concise error message to a page with additional background material or an explanation of the problem (don't overdo this, though) • Users are particularly attentive when they want to recover from an error

  35. Error Message Guidelines • According to Jakob Nielsen, good error messages should: • Clearly indicate that something has gone wrong • Be in a human-readable language; avoid texts like "an error of type 2 has occurred" • Be polite and not blame the users • Describe the problem; avoid vague generalities such as "syntax error" • Give constructive advice on how to fix the problem • Be visible and highly noticeable, both in terms of the message and how it indicates where things went wrong • If possible, guess the correct action and let users pick it from a list of fixes • Educate users by giving them information about how the system works • Preserve as much of the user's work as possible so that they don't have to do everything over again

  36. Measuring User Satisfaction • Objective satisfaction is difficult to measure and assess • Some attempts have been made to measure satisfaction objectively: • Measuring stress and comfort levels using EEGs, pupil dilation, heart rate, skin conductivity, blood pressure, adrenaline level • These techniques are not normally suitable for everyday work • Hence: only subjective satisfaction is practical to measure, hoping that if enough users are asked, much subjectivity is removed

  37. Likert Scale • A psychometric scale, commonly used in research that employs questionnaires, measuring either positive or negative response to a statement (bipolar) • Even-point Likert scales omit the middle option of "Neutral" (neither agree nor disagree), forcing the user to cast an opinion (forced-choice variation) • The neutral option can be seen as an easy option to take when a respondent is unsure, so it is questionable whether it is a true neutral option • Std 5-point Likert scale: 1 = Strongly disagree, 2 = Disagree, 3 = Neutral, 4 = Agree, 5 = Strongly agree • Std 7-point Likert scale: 1 = Strongly disagree, 2 = Disagree, 3 = Rather disagree, 4 = Neutral, 5 = Rather agree, 6 = Agree, 7 = Strongly agree • The neutral option exists only in odd-point scales

  38. Measuring User Satisfaction • Measured through standardized or custom-made satisfaction questionnaires administered after each task and/or after the usability test session • Post-task questionnaires • Provide insight as seen from the participants' perspective • Given immediately after task completion • Regardless of goal achievement or not • Very few questions (1-3) • Likert scale (original or variations) • Most popular standard post-task questionnaires: • SEQ: Single Ease Question (1 question) • ASQ: After Scenario Questionnaire (3 questions) • SMEQ: Subjective Mental Effort Questionnaire (1 question) • UME: Usability Magnitude Estimation (1 question)

  39. ASQ: After Scenario Questionnaire • Overall, I am satisfied with the ease of completing the tasks in this scenario • Overall, I am satisfied with the amount of time it took to complete the tasks in this scenario • Overall, I am satisfied with the support information (on-line help, messages, documentation) when completing the tasks • Some versions may include comments in each question

  40. SMEQ: Subjective Mental Effort Questionnaire • Also referred to as the Rating Scale for Mental Effort • A 150-point online slider scale with anchors at various points • 0: "Not at all hard to do" • 113: "Tremendously hard to do" • The larger scale allows users to express themselves more accurately • Produces better results when given to large numbers of users

  41. UME: Usability Magnitude Estimation • Created to overcome some of the disadvantages of Likert scales: their closed-ended nature may restrict the range of ratings available to respondents (so-called ceiling and floor effects) • Users create their own scale (from 0, with no upper limit) • Judgment (rating) is supposed to be on the basis of ratios: • If Task 1 is given a rating of 10 and Task 2 is judged twice as difficult, then Task 2 should be given a rating of 20 • Resulting ratings are converted into a ratio scale of a subjective dimension • Conversion of the raw ratings is achieved with a mathematical formula, which makes UME more burdensome for some researchers than the Likert format
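The slide does not specify which conversion formula is meant. One common approach in magnitude estimation, shown here as an assumption rather than the definitive UME procedure, is to divide each user's raw ratings by that user's geometric mean, so ratings from users with different personal scales become comparable while the ratios between tasks are preserved.

```python
# Sketch: normalize one user's raw magnitude estimates by their
# geometric mean (one common conversion; published UME studies vary).
from math import exp, log

def normalize_ume(ratings):
    """Scale a user's raw ratings by the user's geometric mean."""
    geo_mean = exp(sum(log(r) for r in ratings) / len(ratings))
    return [r / geo_mean for r in ratings]

# A user judged Task 2 twice as difficult as Task 1, and Task 3 four times.
raw = [10, 20, 40]
print(normalize_ume(raw))   # the 1:2:4 ratios between tasks are preserved
```

A user who rated the same tasks [100, 200, 400] would normalize to the same values, which is the point of the conversion.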

  42. Session Level Satisfaction Questionnaires • Measure users’ impression of the overall ease of use of the system • Standard or custom-made • Given at the end of each test session • Substantial number of questions (10-50) • Likert scale (original or variations) • Most popular standard post-session questionnaires • SUS: System Usability Scale (10 questions) • SUPR-Q: Standardized UX Percentile Rank Questionnaire (13 questions) • CSUQ: Computer System Usability Questionnaire (19 questions) • QUIS: Questionnaire For User Interaction Satisfaction (24 questions) • SUMI: Software Usability Measurement Inventory (50 questions)

  43. SUS: System Usability Scale • Simple, 10-item attitude Likert scale giving a global view of subjective assessments of usability • I think that I would like to use this system frequently • I found the system unnecessarily complex • I thought the system was easy to use • I think that I would need the support of a technical person to be able to use this system • I found the various functions in this system were well integrated • I thought there was too much inconsistency in this system • I would imagine that most people would learn to use this system very quickly • I found the system very cumbersome to use • I felt very confident using the system • I needed to learn a lot of things before I could get going with this system
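Note how the ten items alternate between positively and negatively worded statements. The standard SUS scoring rule reflects this: odd-numbered (positive) items contribute (response - 1), even-numbered (negative) items contribute (5 - response), and the sum is multiplied by 2.5 to give a 0 to 100 score.

```python
# Standard SUS scoring: maps ten 1..5 Likert answers to a 0..100 score.
def sus_score(responses):
    """`responses` is a list of ten 1..5 answers, in item order."""
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

# The most favorable answers (5 on odd items, 1 on even items) give 100.
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))   # 100.0
```

Answering "Neutral" (3) to every item yields 50.0, which is why SUS scores should not be read as percentages of satisfaction.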

  44. CSUQ: Computer System Usability Questionnaire • Consists of 19 questions • Answers given on a 7-point Likert scale from "Strongly Disagree" to "Strongly Agree" • Overall, I am satisfied with how easy it is to use this system • It was simple to use this system • I can effectively complete my work using this system • I am able to complete my work quickly using this system • I am able to efficiently complete my work using this system • I feel comfortable using this system • It was easy to learn to use this system • I believe I became productive quickly using this system • The system gives error messages that clearly tell me how to fix problems • (continued on next slide)

  45. CSUQ: Computer System Usability Questionnaire • (continued) • Whenever I make a mistake using the system, I recover easily and quickly • The information (such as online help, on-screen messages, and other documentation) provided with this system is clear • It is easy to find the information I need • The information provided for the system is easy to understand • The information is effective in helping me complete the tasks and scenarios • The organization of information on the system screens is clear • The interface of this system is pleasant • I like using the interface of this system • This system has all the functions and capabilities I expect it to have • Overall, I am satisfied with this system

  46. SUMI: Software Usability Measurement Inventory • 50 questions / 3-point Likert scale: "Agree" / "Undecided" / "Disagree"

  47. Measuring Subjective Satisfaction • Use any standard questionnaire or prepare your own • Use a Likert scale for your answers • Use even-point scales to eliminate the neutral option • Don't put too many questions on a questionnaire! • Estimate about 10 minutes to complete a set of 10-15 questions • Consider using a standard questionnaire • Use it repeatedly for different studies to have a constant frame of reference • Ask users after they have done real and varied work • Experiments on small tasks can give inaccurate results • Users may blame the specific task for their lack of satisfaction • Remember: if enough users are asked, much subjectivity is removed

  48. If you choose to prepare your own questionnaire… • Each question must be a strong statement (either positive or negative) • You may use other semantic differential scales beyond the Likert scale • e.g. "Learning the system was": Very easy - Easy - Neutral - Difficult - Very difficult • For some questions set "good" to be lower, while for others set "good" to be higher; this prevents users from blindly checking the same column • If you have many participants, have different versions of the questionnaire: change the order of questions, invert good/bad • Be very careful to ensure that there is only one way to interpret each question • Always seek reviews from others and run pilot studies • Always include at least one open-ended question (how, what, when, why…) • Allows users to provide insight into things you hadn't considered in the first place

  49. Other thoughts about measuring subjective satisfaction • Subjective satisfaction scores are most useful when you compare several different systems or versions • Users may just remember their worst experiences • If they were happy for 2 hours but had 5 minutes of frustration when the network went down, they may give a low ranking • Users may also primarily remember the last step • Especially if it was notable in some way • If you make up your own questions, be very careful to ensure that there is only one way to interpret the question • Always seek reviews from others and run pilot studies • Always include at least one open-ended question (how, what, when, why…)

  50. Summarizing: Ways to Measure Usability • Learnability: • Pick novice users of the system, measure time to perform certain tasks. Distinguish between no / some general computer experience. • Efficiency: • Decide on a definition of expertise, get sample expert users (difficult), measure time to perform typical tasks. • Memorability: • Get sample casual users (away from the system for a certain time), measure time to perform typical tasks. • Errors: • Count slips and mistakes made by users while performing some specified task. • Satisfaction: • Ask users' subjective opinion (questionnaire, interview) after they try the system on a set of real tasks.
