Tool for Accurately Predicting Website Navigation Problems, Non-Problems, Problem Severity, and Effectiveness of Repairs. Marilyn Hughes Blackmon, U. of Colorado Muneo Kitajima, AIST, Japan Peter Polson, U. of Colorado. Part One. Work supported by NSF Grant 01-37759 to M. H. Blackmon
http://autocww.colorado.edu/~brownr/ACWW.php
http://autocww.colorado.edu/~blackmon
http://autocww.colorado.edu
Problem that spurred research and development of tool • Focus on users building comprehensive knowledge of a topic • Browse complex websites (cf. search engine) • Pure forward search • Learn by exploration • Automatically predict what is worth repairing? • Need accurate measure of problem severity • Need to predict success rate for repairs • Web designers using tool must be able to do what unaided designers cannot: predict behavior of users different from themselves – objectively represent user diversity (background knowledge)
Solution: Incrementally extend Cognitive Walkthrough for the Web (CWW) • CHI2002 paper tailored Cognitive Walkthrough (CW) for web navigation • Proved CWW would identify usability problems that interfere with web navigation • Substituted objective measures of similarity, familiarity, and elaboration of heading/link texts using Latent Semantic Analysis (LSA) • CHI2003 paper proved significantly better performance on CWW-repaired webpages vs. original, unrepaired pages
Percent task failure correlated 0.93 with observed clicks (each task n≥38)
Research problem, reformulated: What determines mean clicks? • Identify & repair factors that increase mean clicks and raise risk of task failure • Hypothetical determinants, based on prior results and theory underlying CWW research: • Unfamiliar correct link, i.e., insufficient background knowledge to comprehend link • Competing headings & their high-scent links • Competing links under correct heading • Weak scent correct link under correct heading
First step: Collect enough data for multiple regression analysis • Reused 64 tasks from CHI2003 paper and ran additional experiments to get data on 100 new tasks, creating 164-task dataset • Developed automatable rules for CWW problem identification • Built multiple regression model for 164-task dataset and found 3 independent variables explaining 57% of the variance
Multiple regression translates into formula to predict problem severity
• Multiple regression analysis yielded formula for predicting mean clicks on links:
  + 2.199 (predicted clicks for non-problem)
  + 1.656 if correct link is unfamiliar
  + 0.754 times number of competing links nested under any competing heading
  + 1.464 if correct link has weak scent
  + zero clicks for competing links under correct heading
• Prediction for non-problem task = 2.199
• ≥2.5 predicted mean clicks distinguishes problem from non-problem
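The formula on this slide can be sketched as a small function. This is a minimal illustration, not the authors' tool: the function and constant names are assumptions made here, while the coefficients and the 2.5 and 5.0 cutoffs are the ones stated in this deck.

```python
# Coefficients as reported on the slide (units: mean clicks on links).
INTERCEPT = 2.199                 # predicted clicks for a non-problem task
UNFAMILIAR_CORRECT_LINK = 1.656   # added if the correct link is unfamiliar
PER_COMPETING_LINK = 0.754        # added per competing link under a competing heading
WEAK_SCENT_CORRECT_LINK = 1.464   # added if the correct link has weak scent

PROBLEM_THRESHOLD = 2.5           # slide: >=2.5 separates problem from non-problem
SERIOUS_THRESHOLD = 5.0           # deck: "serious problems" are predicted clicks >= 5.0


def predicted_mean_clicks(unfamiliar: bool, weak_scent: bool,
                          competing_links: int) -> float:
    """Predicted mean clicks for one task (hypothetical helper name).

    competing_links counts links nested under competing headings only;
    competing links under the correct heading contribute zero clicks.
    """
    if competing_links < 0:
        raise ValueError("competing_links must be non-negative")
    total = INTERCEPT
    if unfamiliar:
        total += UNFAMILIAR_CORRECT_LINK
    if weak_scent:
        total += WEAK_SCENT_CORRECT_LINK
    total += PER_COMPETING_LINK * competing_links
    # Round to the three decimals used on the slides.
    return round(total, 3)


def classify_severity(clicks: float) -> str:
    """Label a prediction using the two cutoffs quoted in the deck."""
    if clicks >= SERIOUS_THRESHOLD:
        return "serious problem"
    if clicks >= PROBLEM_THRESHOLD:
        return "problem"
    return "non-problem"
```

With the Find Hmong inputs reported in this deck (unfamiliar correct link, weak-scent correct link, 5 competing links under competing headings), `predicted_mean_clicks(True, True, 5)` gives 9.089, which `classify_severity` labels a serious problem.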
Example of task: Find article about Hmong • List of 9 categories > Social Science > Anthropology • Scroll A-Z list to find Hmong
CWW-identified problems in “Find Hmong” task: Competing headings (figure; values shown: 0.19, 0.30, 0.08)
Predicted mean clicks for Find Hmong task on original, unrepaired webpage
• + 2.199 -- predicted clicks for non-problem
• + 1.656 -- correct link is unfamiliar
• + 1.464 -- correct link has weak scent
• + 3.770 -- 0.754 × 5, the number of competing links nested under any competing heading
• = 9.089 -- predicted mean total clicks
CWW-guided repairs of navigation usability problems detected by CWW
• Create alternate high-scent paths to target webpage via all correct and competing headings: IF competing heading(s), IF unfamiliar correct link, IF weak-scent correct link
• Substitute or elaborate link text with familiar, higher-frequency words: IF unfamiliar correct link
Repair benefits for “Find Hmong,” a problem definitely worth repairing
Psychological validity measures for 164-task dataset • For 46 tasks predicted to have serious problems (i.e., predicted clicks ≥ 5.0) • 100% hit rate, 0% false alarms • 93% success rate for repairs (statistically significant difference, repaired vs. unrepaired) • For all 75 tasks predicted to be problems • 92% hit rate, 8% false alarms • 83% success rate for repairs, significant difference repaired vs. unrepaired, p<.0001
Cross-validation study: Replicate the model on new dataset? • Ran another large experiment to test whether multiple regression formula replicated with new set of tasks • 2 groups • Each group did 32 new tasks, 64 total tasks • Used prediction formula to identify problems vs. non-problems • All tasks have just one correct link
Multiple regression analysis produced full cross-validation • Multiple regression of 64-task dataset gave the same 3 determinants found for the original 164-task dataset, with similar coefficients • Hit rate for predicted problems = 90%, false alarms = 10% • Correct rejections for predicted non-problems = 69%, misses = 31%; 2/3 of the misses had observed clicks of 2.5-3.5, and the other 1/3 had observed clicks >3.5 but <5.0
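The four rates quoted above can be tallied as in the sketch below. It assumes, as the paired percentages suggest, that hit and false-alarm rates are computed over tasks predicted to be problems and that correct-rejection and miss rates are computed over tasks predicted to be non-problems. It also assumes a task counts as an observed problem when its observed mean clicks reach the same 2.5 cutoff. The function name is hypothetical.

```python
def prediction_rates(predicted, observed, threshold=2.5):
    """Compare predicted vs. observed mean clicks, task by task.

    Returns rates conditioned on the prediction:
      hit / false_alarm        -> share of predicted problems
      correct_rejection / miss -> share of predicted non-problems
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")

    hits = false_alarms = correct_rejections = misses = 0
    for p, o in zip(predicted, observed):
        predicted_problem = p >= threshold
        observed_problem = o >= threshold  # assumption: same cutoff for observed clicks
        if predicted_problem and observed_problem:
            hits += 1
        elif predicted_problem:
            false_alarms += 1
        elif observed_problem:
            misses += 1
        else:
            correct_rejections += 1

    n_problem = hits + false_alarms
    n_non_problem = correct_rejections + misses

    def rate(count, total):
        # None when there is nothing to condition on.
        return count / total if total else None

    return {
        "hit": rate(hits, n_problem),
        "false_alarm": rate(false_alarms, n_problem),
        "correct_rejection": rate(correct_rejections, n_non_problem),
        "miss": rate(misses, n_non_problem),
    }
```

Any input lists passed to this function are the caller's own data; the deck reports only the resulting percentages, not the per-task values.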
Predicted vs. observed clicks for 64 tasks in cross-validation experiment
Theory matters: CWW is theory-based usability evaluation method • CoLiDeS cognitive model (Kitajima, Blackmon, & Polson, 2000, 2005) • Construction-Integration cognitive architecture (Kintsch, 1998), a comprehensive model of human cognitive processes • Latent Semantic Analysis (LSA)
The Key Idea • Core process underlying Web navigation is skilled reading comprehension • Comprehension processes build mental representations of goals and webpage objects (subregions, hyperlinks, images, and other targets for action) • Action planning compares goal with potential targets for action and selects target with highest activation level
Consensus: Web navigation is equivalent to following scent trail • Scent or residue (Furnas, 1997) • SNIF-ACT based on Information Foraging (Pirolli & Card, 1999) • Bloodhound Project: Web User Flow by Information Scent (WUFIS) => InfoScent Simulator (Chi, et al., 2001, 2003) • CWW activation level
CoLiDeS activation level: Scent is MORE than just similarity • Adequate background knowledge to comprehend headings and links? Select semantic space that best matches user group • Warning bell for low word frequency • Warning bell for low term vector • Before computing similarity, simulate human elaboration of link texts during comprehension, using LSA Near neighbors, finding terms simultaneously familiar and similar in meaning • Compute goal-heading and goal-link similarity with LSA cosines, defining weak scent as a cosine <0.10, moderate scent as cosine ≥0.30
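The last bullet can be illustrated with a plain cosine computation. This is only a sketch: real CWW cosines come from LSA vectors in a semantic space chosen for the user group, whereas any vectors passed in here are the caller's stand-ins. The function names are assumptions, and the slide gives no label for cosines from 0.10 up to 0.30, so the sketch does not invent one.

```python
import math

WEAK_SCENT_BELOW = 0.10      # slide: weak scent is cosine < 0.10
MODERATE_SCENT_FROM = 0.30   # slide: moderate scent is cosine >= 0.30


def cosine(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def scent_label(cos_value):
    """Map a goal-link (or goal-heading) cosine to the slide's labels."""
    if cos_value < WEAK_SCENT_BELOW:
        return "weak"
    if cos_value >= MODERATE_SCENT_FROM:
        return "moderate"
    # The slide does not name this range.
    return "between thresholds"
```

Note that similarity is only the final step on this slide: the familiarity checks (word frequency, term vector) and the elaboration of link texts happen before the cosine is computed and are not modeled here.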
Conclusions: Extending CWW successful for research and development of tool • We CAN now predict severity of navigation usability problems and success rate for repairs of these problems, so we invest time to repair only what is worth repairing: tasks predicted ≥5.0 clicks • Web designers using tool CAN do what unaided designers cannot: predict behavior of users different from themselves – objectively represent user diversity in education level, culture, language, and field of expertise (background knowledge)
Conclusions, continued • Scales up to large websites • Reliable (LSA measures vs. human judgments) • Psychologically valid (228-task dataset, large n gives stable mean for each task), based on cognitive model • Theory matters • Drives experimental design • High accuracy and psychological validity of tool • Practitioners and researchers can now put the tool to use with trust
Non-problem task Find Fern approaches asymptote of pure forward search • One-click minimum path for both problems AND non-problems • 1.1 mean total clicks on links • 90% pure forward search (minimum path solution) • 97% of first clicks were on link under correct heading • 100% success rate -- everyone finished task in 1 or 2 clicks • 9 seconds = mean solution time