The Effects of Interface Design on Telephone Dialing Performance. Master’s thesis in Computer Science, Andrew R. Freed, 4/30/2003.
The Effects of Interface Design on Telephone Dialing Performance • Towards automatic interface evaluation • Methods of evaluation • Experiment design • Three analyses • Comparison of analyses • Further work
Towards automatic interface evaluation • Why not test with actual users instead? • It takes too much time and money! • Automatic evaluation has paid off before: Project Ernestine (Gray et al. 1992) saved about $2.4M per year • Several proposed tools will make this type of evaluation easier
Towards automatic interface evaluation • Motivation: • Eye-tracking studies by Byrne (1999, 2001) and Hornof (1997) • Cognitive models as surrogate users (Ritter 2001)
Towards automatic interface evaluation • 100 phones to choose from • Selected 10 for analysis
Towards automatic interface evaluation • 10 tasks (Ritter 2000) • 1. Call home (*) • 2. Call work (*) • 3. Redial last number (*) • 4. Call directory inquiries • 5. Call mother (*) • 6. Conference call work and home (*) • 7. Conference call work (flash) then home • 8. Forward call to another number (*) • 9. Forward call (flash) to another number • 10. Hang up
Towards automatic interface evaluation • 10 telephone numbers • 814-866-5000 215-654-5785 • 123-654-7890 814-234-9657 • 814-863-5000 740-611-9273 • 412-268-3000 101-010-1010 • 606-193-3012 103-273-1029 • and 3 other tasks • Forward, redial, conference call
Methods of evaluation • Possible tools • Cognitive architectures • ACT-R/PM • Generic Simulated Eyes and Hands • Focused analysis methods
Possible tools • Ivory’s tools to evaluate websites (2001) • Apex (M. Freed 1998) and iGen (Emmerson 2000) model complex tasks • Glean (Kieras et al. 1995) evaluates Lisp interfaces • Shortcomings: no learning, no visual search, tied to a specific interface format, no cognitive theory
Cognitive architectures • Unified theory of cognition (Newell 1990) • Simulate human behavior • Perceptual and motor capability (simulated eyes and hands) • Can do visual search, click buttons, sometimes learn
Cognitive architectures (examples) • EPIC (Kieras and Meyer 1997) - has visual search and perceptual/motor skills… but only evaluates Common Lisp interfaces • Soar (Newell 1990) - also has visual search and perceptual/motor skills, plus learning… but only evaluates Tcl/Tk interfaces (or requires a socket connection) • ACT-R/PM (Anderson & Lebiere 1998, Byrne 2001) - nearly the same benefits and limitations as EPIC, plus learning
ACT-R/PM • Why did we choose ACT-R/PM? • Well-accepted cognitive architecture • Used in past to evaluate interfaces • Can overcome the “Lisp interface-only” problem with generic eyes and hands
Generic Simulated Eyes and Hands • Segman (St. Amant & Riedl 2001) can parse a Windows screen capture and determine the interface components • Can use interfaces written in Lisp, Tcl/Tk, HTML, Visual C++, ... • Segman can be connected to ACT-R/PM
Focus of analysis • A - Analytical model (Fitts’ Law) • B - Cognitive model (ACT-R/PM) • C - Human data
General experiment design • Analytical model, cognitive model, and human users interact with the same interfaces • Analytical model: dials each number once on each phone; does not do the other tasks • Cognitive model: dials each phone number 50 times on each phone; performs the other phone tasks 50 times on each phone • Human users (N=9): dial each phone number on each phone; perform the other phone tasks once on each phone
General experiment design • Experimental software
General experiment design • Cognitive model and users • Timing and mouse-click logging • Eye-tracking • Users can control pace of trials, model does not “care” • Analytical model • Does not need to “see” telephones • Mathematical formula with pixel-level input yields “reaction times”
A. Fitts’ Law analysis • What is Fitts’ Law? • Numerical analysis • Simple conclusions and problems
What is Fitts’ Law? • Fitts’ Law (two possible forms): • $MT = a + b \log_2(2D/W)$ (Fitts 1954) • $MT = \max(t_m,\ k \log_2(0.5 + D/W))$ (Card et al. 1983) • $MT$ is mouse movement time • $D$ is distance to target, $W$ is target width • $a$, $b$, $k$ are constants • $t_m$ is minimum movement time
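The two forms above can be written directly as functions. This is a minimal sketch, not the thesis code: the function names are made up, and the default values for a, b, k, and the minimum movement time are placeholder constants chosen only so the example runs. Real values would have to be fitted or taken from the literature.

```python
import math


def fitts_1954(distance, width, a=0.0, b=0.1):
    """MT = a + b * log2(2 * D / W). a and b are placeholder constants (seconds)."""
    return a + b * math.log2(2 * distance / width)


def fitts_card(distance, width, k=0.1, t_min=0.1):
    """MT = max(t_m, k * log2(0.5 + D / W)). k and t_min are placeholder constants."""
    return max(t_min, k * math.log2(0.5 + distance / width))


if __name__ == "__main__":
    # A 160-pixel move to a 40-pixel-wide button (made-up geometry).
    print(fitts_1954(160, 40))
    print(fitts_card(160, 40))
    # A very short move is clamped to the minimum movement time.
    print(fitts_card(1, 40))
```

Both forms grow with the ratio D/W, so only button size and spacing matter. The second form never drops below the minimum movement time, which is what keeps very short moves from producing zero or negative times.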
Numerical analysis • Collected pixel-level input about telephones (size and location of buttons) • Dialing a phone requires 10 movements • Total the times from the 10 movements and a base dialing time is established (with no visual search!)
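The totaling step can be sketched as follows. The keypad geometry, the starting cursor position, and the constants are all hypothetical stand-ins for the pixel-level measurements described above, and the Card et al. form of Fitts' Law is used here as an assumption.

```python
import math

# Hypothetical 3x4 keypad: button centers in pixels, every button 30 px wide.
BUTTON_WIDTH = 30
KEYPAD = {
    "1": (0, 0), "2": (40, 0), "3": (80, 0),
    "4": (0, 40), "5": (40, 40), "6": (80, 40),
    "7": (0, 80), "8": (40, 80), "9": (80, 80),
    "0": (40, 120),
}


def movement_time(distance, width, k=0.1, t_min=0.1):
    """Card et al. form of Fitts' Law with placeholder constants (seconds)."""
    return max(t_min, k * math.log2(0.5 + distance / width))


def dial_time(number, start=(40, 160)):
    """Sum one movement time per digit, beginning from an assumed cursor position."""
    total, position = 0.0, start
    for digit in (ch for ch in number if ch.isdigit()):
        target = KEYPAD[digit]
        total += movement_time(math.dist(position, target), BUTTON_WIDTH)
        position = target
    return total


if __name__ == "__main__":
    # A ten-digit number gives ten movements; no visual search time is included.
    print(dial_time("814-866-5000"))
```

Because the only inputs are button centers and widths, swapping in a different `KEYPAD` dictionary is all it takes to evaluate another telephone under this sketch.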
Numerical analysis • Validating our choice of sample telephone numbers (R² = 0.96)
Simple conclusions and problems • Fitts’ Law analysis is fast (it is just an equation!) • Does not consider many factors • Not affected by any aspect of interface design other than button sizing and spacing
B. ACT-R/PM model analysis • Description of model • Visual search predictions • ACT-R/PM makes different reaction time conclusions
Description of ACT-R/PM model • Model has three main components that can operate in parallel: • retrieve a phone digit from memory • visually search for the digit • move the mouse/click on a digit (governed by Fitts’ Law) • Composed of 71 production rules (mostly for visual search)
Description of ACT-R/PM model • Visual search strategy: random or systematic • One production for random search • Find-random-target IF the goal is to find a phone target THEN find a visual object of type text which has not been attended lately
Description of ACT-R/PM model • Sixty productions for systematic search • Systematic-search-from-target IF a digit x is in the visual buffer AND the goal is to find a target y AND y is in direction z from x THEN find a visual object of type text in direction z from target x which is within the bounds of the keypad
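The two search strategies can be illustrated outside ACT-R/PM with a heavily simplified Python sketch. It is not the model's production rules: the grid representation, the function names, and the one-key-per-fixation stepping are assumptions made for illustration. The direction to the target is taken from a conventional keypad layout, which is the knowledge the systematic productions appear to encode.

```python
import random

STANDARD = ["123", "456", "789", "*0#"]  # rows, top to bottom


def positions(rows):
    """Map each key label to its (row, column) position."""
    return {ch: (r, c) for r, row in enumerate(rows) for c, ch in enumerate(row)}


EXPECTED = positions(STANDARD)  # where the searcher believes each key is


def find_random_target(objects, attended):
    """Random search: pick any text object that has not been attended lately."""
    candidates = [obj for obj in objects if obj not in attended]
    return random.choice(candidates) if candidates else None


def systematic_search(rows, start, target, max_fixations=60):
    """Step key by key in the direction the conventional layout predicts.

    Returns the number of extra fixations, or None if the search leaves
    the keypad bounds or runs out of fixations.
    """
    actual = positions(rows)
    by_position = {pos: ch for ch, pos in actual.items()}
    current, fixations = start, 0
    while current != target and fixations < max_fixations:
        (r, c), (tr, tc) = EXPECTED[current], EXPECTED[target]
        step_r = (tr > r) - (tr < r)  # -1, 0, or +1
        step_c = (tc > c) - (tc < c)
        ar, ac = actual[current]
        current = by_position.get((ar + step_r, ac + step_c))
        if current is None:
            return None  # stepped outside the keypad
        fixations += 1
    return fixations if current == target else None


if __name__ == "__main__":
    print(systematic_search(STANDARD, "1", "9"))        # conventional layout
    print(systematic_search(STANDARD[::-1], "1", "9"))  # rows reversed
```

On the conventional layout the directional steps converge on the target in a few fixations. With the rows reversed, the same directions lead off the keypad, which shows how a rule of this kind depends on the layout it assumes.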
Visual search predictions • Count fixations and note fixation locations • Search for the keypad is random • Search within the keypad is systematic • The telephones generally do not differ significantly in the number of fixations needed to dial (about 16) • (The telephone numbers do differ significantly)
Visual search predictions • Model trace
Visual search predictions • Phone 4 and Phone 9: what’s wrong with this picture?
Visual search predictions • Two phones are predicted to have abnormally long visual searches • These phones require approximately sixty fixations (the average on the others was sixteen) • Phone 4 has an upside-down keypad, so the systematic search fails! • Phone 9 contains extra information on the buttons, which distracts the visual search • We will see that the model takes much longer than humans to dial these phones
ACT-R/PM makes different reaction time conclusions • This is no surprise - more factors are being considered • Phones 4 and 9 pay a large visual search penalty • Fitts’ Law still a factor - phones with “Fitts’ Law violations” still perform worse
ACT-R/PM makes different reaction time conclusions • The phones are often shown to have different dialing times (t-test, p < .05) • The significance level of the differences depends on the telephone number being dialed • On average, it takes approximately 8.7 seconds to dial a telephone • Never faster than six seconds • No errors!
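The slides report t-tests without naming the variant. As a sketch only, the function below assumes Welch's unequal-variance form for comparing dialing times from two phones. The function name is made up, and the p-value lookup is left out because it needs a t-distribution table or a statistics library.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    se_a, se_b = var_a / n_a, var_b / n_b  # squared standard errors
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


if __name__ == "__main__":
    # Placeholder dialing times in seconds, not data from the study.
    phone_x = [8.1, 8.9, 9.4, 8.6, 8.8]
    phone_y = [7.2, 7.9, 7.5, 8.0, 7.4]
    print(welch_t(phone_x, phone_y))
```

With 50 model runs per phone, the two-sided critical value at p = .05 is roughly 2, so an absolute t value well above that would indicate a difference between two phones.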
ACT-R/PM makes different reaction time conclusions • Model is able to perform additional tasks (redial, forward, conference) with a random search • Model does not always succeed but never gives up • Will attend the same visual target several times
C. User data analysis • Where and how users look (eye-tracking) • Humans make errors • Summary of user reaction times
Where and how users look • Fast random search for keypad • Systematic search within keypad
Where and how users look • User trace
Where and how users look • Users require approximately the same number of fixations per telephone as the model did (also true for telephone numbers) • Users are able to cope with phones 4 and 9 by changing search strategy • Phone 4: “Up is down, down is up” • Phone 9: Ignore ABCs on the keypad
Where and how users look • Fixation comparison across numbers (R² = 0.11)
Where and how users look • Fixation comparison across 8 phones (R² = 0.34)
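The slides do not say how R² was computed. A common reading, assumed here, is the squared Pearson correlation between paired model and user values (for example, mean fixations per phone). The numbers in the usage lines are placeholders, not data from the study.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


if __name__ == "__main__":
    # Placeholder per-phone fixation counts for eight phones.
    model_fixations = [15, 16, 17, 16, 18, 15, 16, 17]
    user_fixations = [14, 17, 15, 16, 19, 16, 15, 18]
    print(r_squared(model_fixations, user_fixations))
```

A value near 1 would mean the model ranks and spaces the phones much as the users do. Low values such as those reported above mean the model explains little of the variation in the user data.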
Humans make errors • Errors not predicted by the automatic analyses • Depend on several factors • Number being dialed • Dialing speed (weak correlation) • Interface being used
Errors dependent on interface • Most errors on “Fitts’ Law violators” • Fewest errors when buttons are large and adjacent • Users will move the mouse while clicking (ACT-R/PM will not), and this can cause errors • Possible to estimate number of errors with Fitts’ “index of difficulty”?
Summary of reaction times • Users are on average more than one second faster than the model • This is probably due to efficient pipelining of motor tasks (room for ACT-R/PM improvement) • Users can dial as fast as 3.5 seconds (average is seven seconds)
Summary of reaction times • Dial time across phones: model (R² = 0.41), Fitts’ Law (R² = 0.85), and users
Summary of reaction times • Users can do other phone tasks faster than ACT-R/PM • Users can find the target under varied conditions • Users try more strategies to find target • Users will give up if they can’t succeed!
Summary of reaction times • Model vs. user on extra tasks (R² = 0.60, 0.26, 0.11)
Summary of reaction times • User data also show that the interfaces are often significantly different (p < .05), though less often than the model says • User time differences also depend on the number being dialed • Theory: users are less affected by additional interface objects than ACT-R/PM
Comparison of analyses • Analytical model is not enough • Visual search differences between ACT-R/PM and users • ACT-R/PM and Segman need better representation of interfaces • Cognitive models can make more complicated predictions • ACT-R/PM model is generally slower than users
Further work • Cellular phones • This analysis does not work “out of the box” for cellular phones • These phones have different tasks! (Golightly 2003) • Hutchison 3G UK phone task (Golightly 2003) • Analysis of menu controls for cellular phone menus, included analytical model • Interface became easier to use when more directional controls were provided
Further work • Analyzing ten additional designs • Easy if you use existing automatic models! • Fifteen minutes for Fitts’ Law analysis • Forty-five minutes for 500 model runs • Hard if you test with actual users! • Can take weeks to get scheduled • Humans miss appointments