Evaluation: How (un)usable is your software?
Agenda • Finish slides from last week • Multimodal UIs: Ted • Intro to evaluation • Experiments
Pen & Mobile dialog • Stylus or finger • Tradeoffs of each? • Pen as a standard mouse (double-click?) • Variety of platforms • Desktop touch screens or input pads (Wacom) • Tablet PCs • Handheld and mobile devices • Electronic whiteboards • Platforms often involve a variety of size and other constraints
Mobile devices • Becoming more common as more platforms become available • PDA • Cell phone • Ultra-mobile tablets • Smaller displays (e.g., 160x160, 320x240) • Few buttons, different interactions • Free-form ink • Soft keyboard • Numeric keypad => text • Stroke recognition • Hand printing / writing recognition
Example devices: Palm Z22 handheld (http://www.palm.com), Ultra-Mobile PC from Samsung (http://www.intel.com/design/mobile/platform/umpc.htm), OQO (http://www.oqo.com/), BlackBerry (http://www.blackberry.com/)
Soft Keyboards • Common on PDAs and mobile devices • Tap on buttons on screen
Soft Keyboard • Presents a small diagram of keyboard • You click on buttons/keys with pen • QWERTY vs. alphabetical • Tradeoffs? • Alternatives?
Numeric Keypad - T9 • Developed by Tegic Communications • Press one key per letter of your word; the system matches the most likely word, then offers alternative choices • Faster than multiple presses per key • Used in mobile phones • http://www.t9.com/
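A minimal sketch of the T9 idea in Python, assuming a hypothetical key mapping and a tiny made-up word list (not Tegic's actual implementation, which also ranks candidates by word frequency):

```python
# Sketch of T9-style predictive text. Illustrative only: the dictionary is a
# made-up sample, and a real system would rank matches by word frequency.

KEYPAD = {
    '2': 'abc', '3': 'def', '4': 'ghi', '5': 'jkl',
    '6': 'mno', '7': 'pqrs', '8': 'tuv', '9': 'wxyz',
}
LETTER_TO_DIGIT = {letter: digit for digit, letters in KEYPAD.items() for letter in letters}

DICTIONARY = ['good', 'gone', 'home', 'hood', 'hello', 'golf']

def word_to_digits(word):
    """Digit sequence a user would press to type this word (one press per letter)."""
    return ''.join(LETTER_TO_DIGIT[ch] for ch in word.lower())

def candidates(digit_sequence):
    """All dictionary words matching the digit sequence -- the optional choices T9 offers."""
    return [w for w in DICTIONARY if word_to_digits(w) == digit_sequence]

print(candidates('4663'))   # pressing 4-6-6-3 -> ['good', 'gone', 'home', 'hood']
```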
Cirrin - Stroke Recognition • Developed by Jen Mankoff (GT -> Berkeley CS Faculty -> CMU CS Faculty) • Word-level unistroke technique • UIST ‘98 paper • Use stylus to go from one letter to the next
Quikwriting - Stroke Recognition • Developed by Ken Perlin
Quikwriting Example • Example: entering the letters p, l, e • Said to be as fast as Graffiti, but there is more to learn • http://mrl.nyu.edu/~perlin/demos/Quikwrite2_0.html
Hand Printing / Writing Recognition • Recognizing letters, numbers, and special symbols • Lots of systems (commercial too) • English, kanji, etc. • Not perfect, but people aren’t either! • People - 96% on handprinted single characters • Computer - >97% is really good • OCR (Optical Character Recognition)
Recognition Issues • Boxed vs. free-form input • Sometimes encounter boxes on forms • Printed vs. cursive • Cursive is much more difficult • Letters vs. words • Cursive is easier to recognize as whole words than as individual letters, since words provide more context • Usually requires existence of a dictionary • Real-time vs. off-line
Special Alphabets • Graffiti - unistroke alphabet on Palm PDAs • What are your experiences with Graffiti? • Other alphabets or purposes • Gestures for commands
Pen Gesture Commands • Might mean delete • Insert • Paragraph Define a series of (hopefully) simple drawing gesturesthat mean different commands in a system
Pen Use Modes • Often, want a mix of free-form drawing and special commands • How does user switch modes? • Mode icon on screen • Button on pen • Button on device
Error Correction • Having to correct errors can slow input tremendously • Strategies • Erase and try again (repetition) • When uncertain, system shows list of best guesses (n-best list) • Others??
Free-form Ink • Ink is the data; take it as is • Human is responsible for understanding and interpretation • Often time-stamped • Applications • Signature verification • Notetaking • Electronic whiteboards • Sketching
Electronic whiteboards • Smartboard and Mimio • Can integrate with projection • Large surface to interact with • Issues? http://www.mimio.com/ http://www.smarttech.com/
Real paper • Anoto digital paper and pen technology (http://www.anoto.com/) • Issues? Logitech io Digital Writing System http://www.logitech.com/
General Issues – Pen input • Who is in control - user or computer • Initial training required • Learning time to become proficient • Speed of use • Generality/flexibility/power • Special skills - typing • Gulf of evaluation / gulf of execution • Screen space required • Computational resources required
Other interesting interactions • Gesture input • Specialized hardware, or tracking • 3D interaction • Stereoscopic displays • Virtual reality • Immersive displays such as glasses, caves • Augmented reality • Head trackers and vision based tracking
What’s coming up • Upcoming related topics • Multimodal UIs: Ted • 3D user interfaces: Amy • Conversational agents: Evan
When to do evaluation? • Summative - assess an existing system; judge if it meets some criteria • Formative - assess a system being designed; gather input to inform design • Summative or formative? Depends on maturity of system and how evaluation results will be used • Same technique can be used for either
Other distinctions • Form of results obtained • Quantitative • Qualitative • Who is experimenting with the design • End users • HCI experts • Approach • Experimental • Naturalistic • Predictive
Evaluation techniques • Predictive Evaluation • Fitts’ Law, Hick’s Law, etc. (see the sketch below) • Observation • Think-aloud • Cooperative evaluation • Watch users perform tasks with your interface • Next lecture
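A minimal sketch of what a predictive evaluation looks like, using the Shannon formulation of Fitts' Law, MT = a + b*log2(D/W + 1); the constants a and b here are placeholders that would normally be fit by regression on measured pointing data:

```python
import math

# Fitts' Law prediction sketch (Shannon formulation). The constants a and b are
# placeholders; in practice they are estimated from measured pointing data for a
# specific device and user population.

def fitts_time(distance, width, a=0.1, b=0.15):
    """Predicted movement time (seconds) to hit a target of given width at given distance."""
    index_of_difficulty = math.log2(distance / width + 1)   # bits
    return a + b * index_of_difficulty

# A far, small target is predicted to take longer than a near, large one:
print(fitts_time(distance=800, width=20))   # ~0.90 s
print(fitts_time(distance=100, width=80))   # ~0.28 s
```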
More techniques • Empirical user studies (experiments) • Test hypotheses about your interface • Examine dependent variables against independent variables • More later… • Interviews • Questionnaire • Focus Groups • Get user feedback • More next week…
Still more techniques • Discount usability techniques • Use HCI experts instead of users • Fast and cheap method to get broad feedback • Heuristic evaluation • Several experts examine interface using guiding heuristics (like the ones we used in design) • Cognitive Walkthrough • Several experts assess learnability of interface for novices • In class – two weeks from today
And still more techniques • Diary studies • Users relate experiences on a regular basis • Can write down, call in, etc. • Experience Sampling Technique • Interrupt users with very short questionnaire on a random-ish basis • Good to get idea of regular and long term use in the field (real world)
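One way to make the interruptions "random-ish" is to draw one prompt time at random from each block of the day rather than prompting on a fixed schedule. A minimal sketch, with the day length and block size as illustrative choices:

```python
import random
from datetime import datetime, timedelta

# Experience-sampling schedule sketch: one prompt at a random moment within each
# two-hour block of a (hypothetical) 9:00-21:00 study day, so prompts are
# unpredictable but still spread across the whole day.

day_start = datetime(2024, 1, 15, 9, 0)
block_minutes = 120
num_blocks = 6

prompts = [
    day_start + timedelta(minutes=i * block_minutes + random.randint(0, block_minutes - 1))
    for i in range(num_blocks)
]
for t in prompts:
    print("Prompt participant at", t.strftime("%H:%M"))
```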
General Recommendations • Identify evaluation goals • Include both objective & subjective data • e.g. “completion time” and “preference” • Use multiple measures, within a type • e.g. “reaction time” and “accuracy” • Use quantitative measures where possible • e.g. preference score (on a scale of 1-7) Note: Only gather the data required; do so with minimum interruption, hassle, time, etc.
Evaluation planning • Decide on techniques, tasks, materials • What are the usability criteria? • How much authenticity is required? • How many people, and for how long? • How to record data, how to analyze data • Prepare materials – interfaces, storyboards, questionnaires, etc. • Pilot the entire evaluation • Test all materials, tasks, questionnaires, etc. • Find and fix problems with wording, assumptions • Get a good feel for the length of the study
Performing the Study • Be well prepared so the participant’s time is not wasted • Explain procedures without compromising results • Session should not be too long; the subject can quit at any time • Never express displeasure or anger • Data should be stored anonymously, securely, and/or destroyed • Expect anything and everything to go wrong!! (a little story)
Consent • Why important? • People can be sensitive about this process and issues • Errors will likely be made, participant may feel inadequate • May be mentally or physically strenuous • What are the potential risks (there are always risks)?
Data Inspection • Start just looking at the data • Were there outliers, people who fell asleep, anyone who tried to mess up the study, etc.? • Identify issues: • Overall, how did people do? • “5 W’s” (Where, what, why, when, and for whom were the problems?) • Compile aggregate results and descriptive statistics
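A minimal sketch of that first pass over the data, using hypothetical completion times and a common 1.5×IQR rule of thumb to flag candidate outliers:

```python
import statistics

# First-pass data inspection sketch: descriptive statistics plus a simple outlier
# flag. The completion times are hypothetical; the 1.5*IQR rule is just one
# common convention for spotting suspicious scores, not a substitute for judgment.

times = [12, 17, 19, 15, 13, 21, 14, 16, 55]   # seconds; 55 looks suspicious

mean = statistics.mean(times)
median = statistics.median(times)
sd = statistics.stdev(times)

q1, _, q3 = statistics.quantiles(times, n=4)
iqr = q3 - q1
outliers = [t for t in times if t < q1 - 1.5 * iqr or t > q3 + 1.5 * iqr]

print(f"mean={mean:.1f}s  median={median}s  sd={sd:.1f}s  outliers={outliers}")
```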
Making Conclusions • Where did you meet your criteria? Where didn’t you? • What were the problems? How serious are these problems? • What design changes should be made? • But don’t make things worse… • Prioritize and plan changes to the design
Example: Heather’s evaluation • Evaluate use of an interface in a realistic task • Interface: video + annotated transcript • Video was of a requirements gathering session • Task was to create a requirements document based on the video H. Richter et al. "An Empirical Investigation of Capture and Access for Software Requirements Activities," in Graphics Interface 2005.
The Setup • Subjects: 12 CS grad students • Task: • Watch a 1-hour video, take personal notes as desired • Return 3-7 days later; asked to create as complete and detailed a requirements document as possible in 45 minutes • 2 conditions – just the video, or the interface to help • Recording: • Video recorded the subject over the shoulder • Software logs of both video and interface • Interview afterwards • Kept notes
Some conclusions • Very different use of the video • Those with the interface found it useful • Those without, didn’t • Used the video to clarify details, look for missing information • Annotations provided efficient ways to find information, supported a variety of personal strategies • Usability pretty good, will need more sophisticated searching for longer videos
Experiments • Design the experiment to collect the data to test the hypotheses to evaluate the interface to refine the design • A controlled way to determine the impact of design parameters on user experience • Want results that rule out the possibility that differences are due to chance
Experimental Design • Determine tasks • Need clearly stated, benchmark tasks • Determine performance measures • Speed (reaction time, time to complete) • Accuracy (errors, hits/misses) • Production (number of files processed) • Score (number of points earned) • Preference, satisfaction, etc. (i.e. questionnaire response) also valid • Determine variables and hypotheses
Types of Variables • Independent • What you’re studying, what you intentionally vary (e.g., interface feature, interaction device, selection technique) • Dependent • Performance measures you record or examine (e.g., time, number of errors) • Controlled • Factors you want to prevent from influencing results
“Controlling” Variables • Prevent a variable from affecting the results in any systematic way • Methods of controlling for a variable: • Don’t allow it to vary • e.g., all males • Allow it to vary randomly • e.g., randomly assign participants to different groups • Counterbalance - systematically vary it • e.g., equal number of males, females in each group • The appropriate option depends on circumstances
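A minimal sketch of the last two options, with made-up participant IDs and the color/b&w conditions from the running example: random assignment to groups, and counterbalancing the order in which conditions are seen:

```python
import random

# Sketch of two ways to control a variable: random assignment to groups, and
# counterbalancing condition order. Participant IDs and conditions are made up.

participants = [f"P{i}" for i in range(1, 13)]
conditions = ["color", "b/w"]

# (1) Random assignment: group membership varies randomly across participants.
random.shuffle(participants)
group_a, group_b = participants[:6], participants[6:]
print("Group A:", group_a)
print("Group B:", group_b)

# (2) Counterbalancing: systematically alternate the order in which a
#     within-subjects participant sees the conditions.
for i, p in enumerate(participants):
    order = conditions if i % 2 == 0 else conditions[::-1]
    print(p, "->", " then ".join(order))
```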
Hypotheses • What you predict will happen • More specifically, the way you predict the dependent variable (e.g., accuracy) will depend on the independent variable(s) • “Null” hypothesis (H0) • States that there will be no effect • e.g., “There will be no difference in performance between the two groups” • Data are used to try to disprove this null hypothesis
Example • Do people complete operations faster with a black-and-white display or a color one? • Independent - display type (color or b/w) • Dependent - time to complete task (minutes) • Controlled variables - same number of males and females in each group • Hypothesis: Time to complete the task will be shorter for users with color display • H0: Time_color = Time_b/w
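A minimal sketch of testing that null hypothesis on made-up completion times, assuming SciPy is available; an independent-samples t-test is one standard choice when the two display groups contain different people:

```python
from scipy import stats   # assumes SciPy is installed

# Testing H0: Time_color = Time_b/w on hypothetical data with an
# independent-samples t-test (appropriate when the groups are different people).

color_times = [4.1, 3.8, 4.5, 3.9, 4.2, 4.0]   # minutes, made-up
bw_times    = [4.9, 5.2, 4.7, 5.0, 4.6, 5.3]   # minutes, made-up

t_stat, p_value = stats.ttest_ind(color_times, bw_times)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# A small p-value (commonly p < .05) is evidence against H0; otherwise H0 stands.
if p_value < 0.05:
    print("Reject H0: display type appears to affect completion time.")
else:
    print("Cannot reject H0.")
```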
Subjects • How many? • Book advice: at least 10 • Other advice: 6 per experimental condition • Real advice: depends on statistics • Relating subjects and experimental conditions • Within/between subjects design
Experimental Designs • Within Subjects Design • Every participant provides a score for all levels or conditions

        Color      B/W
P1      12 secs.   17 secs.
P2      19 secs.   15 secs.
P3      13 secs.   21 secs.
...
Experimental Designs • Between Subjects • Each participant provides results for only one condition

    Color              B/W
P1  12 secs.       P2  17 secs.
P3  19 secs.       P5  15 secs.
P4  13 secs.       P6  21 secs.
...
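The analysis follows the design; a minimal sketch using the example scores above (and assuming SciPy): within-subjects data are paired per participant, so a paired test applies, while between-subjects groups are independent:

```python
from scipy import stats   # assumes SciPy is installed

# Within-subjects: each participant contributes a score in both conditions,
# so the scores are paired and a paired-samples t-test applies.
color_within = [12, 19, 13]   # P1, P2, P3 with color
bw_within    = [17, 15, 21]   # same P1, P2, P3 with b/w
print(stats.ttest_rel(color_within, bw_within))

# Between-subjects: each score comes from a different participant,
# so the groups are independent and an independent-samples t-test applies.
color_between = [12, 19, 13]   # P1, P3, P4
bw_between    = [17, 15, 21]   # P2, P5, P6
print(stats.ttest_ind(color_between, bw_between))
```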