
Marilyn Hughes Blackmon, U. of Colorado Muneo Kitajima, AIST, Japan Peter Polson, U. of Colorado

Tool for Accurately Predicting Website Navigation Problems, Non-Problems, Problem Severity, and Effectiveness of Repairs. Marilyn Hughes Blackmon, U. of Colorado Muneo Kitajima, AIST, Japan Peter Polson, U. of Colorado. Part One. Work supported by NSF Grant 01-37759 to M. H. Blackmon


Presentation Transcript


  1. Tool for Accurately Predicting Website Navigation Problems, Non-Problems, Problem Severity, and Effectiveness of Repairs Marilyn Hughes Blackmon, U. of Colorado Muneo Kitajima, AIST, Japan Peter Polson, U. of Colorado

  2. Part One Work supported by NSF Grant 01-37759 to M. H. Blackmon http://autocww.colorado.edu/~brownr/ACWW.php http://autocww.colorado.edu/~blackmon http://autocww.colorado.edu

  3. Problem that spurred research and development of tool • Focus on users building comprehensive knowledge of a topic • Browse complex websites (cf. search engine) • Pure forward search • Learn by exploration • Automatically predict what is worth repairing? • Need accurate measure of problem severity • Need to predict success rate for repairs • Web designers using tool must be able to do what unaided designers cannot: predict behavior of users different from themselves – objectively represent user diversity (background knowledge)

  4. Solution: Incrementally extend Cognitive Walkthrough for the Web (CWW) • CHI2002 paper tailored Cognitive Walkthrough (CW) for web navigation • Proved CWW would identify usability problems that interfere with web navigation • Substituted objective measures of similarity, familiarity, and elaboration of heading/link texts using Latent Semantic Analysis (LSA) • CHI2003 paper proved significantly better performance on CWW-repaired webpages vs. original, unrepaired pages

  5. Percent task failure correlated 0.93 with observed clicks (each task n≥38)

  6. Research problem, reformulated: What determines mean clicks? • Identify & repair factors that increase mean clicks and raise risk of task failure • Hypothetical determinants, based on prior results and theory underlying CWW research: • Unfamiliar correct link, i.e., insufficient background knowledge to comprehend link • Competing headings & their high-scent links • Competing links under correct heading • Weak scent correct link under correct heading

  7. First step: Collect enough data for multiple regression analysis • Reused 64 tasks from CHI2003 paper and ran additional experiments to get data on 100 new tasks, creating 164-task dataset • Developed automatable rules for CWW problem identification • Built multiple regression model for 164-task dataset and found 3 independent variables explaining 57% of the variance

  8. Multiple regression translates into formula to predict problem severity • Multiple regression analysis yielded formula for predicting mean clicks on links: • + 2.199 (predicted clicks for non-problem) • + 1.656 if correct link is unfamiliar • + 0.754 times number of competing links nested under any competing heading • + 1.464 if correct link has weak-scent • + zero clicks for competing links under correct heading • Prediction for non-problem task = 2.199 • ≥2.5 mean clicks distinguishes problem from non-problem
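The regression formula on this slide can be written out as a small function. This is a minimal sketch, not the authors' tool: the function and parameter names are hypothetical, the coefficients are copied from the slide, and the 2.5 and 5.0 cut-offs are the problem and serious-problem thresholds quoted in the deck.

```python
# Coefficients as listed on the slide (164-task regression).
INTERCEPT = 2.199            # predicted clicks for a non-problem task
UNFAMILIAR_LINK = 1.656      # added if the correct link is unfamiliar
PER_COMPETING_LINK = 0.754   # per competing link under any competing heading
WEAK_SCENT_LINK = 1.464      # added if the correct link has weak scent


def predict_mean_clicks(unfamiliar_link, weak_scent_link, competing_links):
    """Predicted mean total clicks for one task (hypothetical helper).

    Competing links under the correct heading contribute zero clicks in the
    slide's formula, so they are not a parameter here.
    """
    clicks = INTERCEPT
    if unfamiliar_link:
        clicks += UNFAMILIAR_LINK
    if weak_scent_link:
        clicks += WEAK_SCENT_LINK
    clicks += PER_COMPETING_LINK * competing_links
    return clicks


def classify(predicted_clicks):
    """Label a prediction using the thresholds quoted in the deck."""
    if predicted_clicks >= 5.0:
        return "serious problem"
    if predicted_clicks >= 2.5:
        return "problem"
    return "non-problem"


# Inputs from the deck's Find Hmong example: unfamiliar, weak-scent correct
# link and 5 competing links nested under competing headings.
hmong = predict_mean_clicks(True, True, 5)
print(round(hmong, 3), classify(hmong))
```

With no problem features present, the function returns the intercept, 2.199, which falls below the 2.5 threshold and is therefore labeled a non-problem.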

  9. Example of task: Find article about Hmong: List of 9 categories > Social Science > Anthropology Scroll A-Z list to find Hmong

  10. CWW-identified problems in “Find Hmong” task: Competing headings 0.19 0.30 0.08

  11. Predicted mean clicks for Find Hmong task on original, unrepaired webpage • + 2.199 -- predicted clicks for non-problem • + 1.656 -- if correct link is unfamiliar • + 1.464 -- if correct link has weak-scent • + 3.770 -- 0.754 × 5, the number of competing links nested under any competing heading • = 9.089 -- predicted mean total clicks

  12. CWW-guided repairs of navigation usability problems detected by CWW • Create alternate high-scent paths to target webpage via all correct and competing headings (IF competing heading(s); IF unfamiliar correct link; IF weak-scent correct link) • Substitute or elaborate link text with familiar, higher frequency words (IF unfamiliar correct link)

  13. Repair benefits for “Find Hmong,” a problem definitely worth repairing

  14. All 164 tasks: Predicted vs. observed mean total clicks

  15. Psychological validity measures for 164-task dataset • For 46 tasks predicted to have serious problems (i.e., predicted clicks ≥ 5.0) • 100% hit rate, 0% false alarms • 93% success rate for repairs (statistically significant difference, repaired vs. unrepaired) • For all 75 tasks predicted to be problems • 92% hit rate, 8% false alarms • 83% success rate for repairs, significant difference repaired vs. unrepaired, p<.0001

  16. Cross-validation study: Replicate the model on new dataset? • Ran another large experiment to test whether multiple regression formula replicated with new set of tasks • 2 groups • Each group did 32 new tasks, 64 total tasks • Used prediction formula to identify problems vs. non-problems • All tasks have just one correct link

  17. Multiple regression analysis produced full cross-validation • Multiple regression of 64-task dataset gave same 3 determinants found for 164-task original dataset & similar coefficients • Hit rate for predicted problems = 90%, false alarms = 10% • Correct rejection rate for predicted non-problems = 69%, misses = 31%; 2/3 of the misses had observed clicks of 2.5-3.5, and the other 1/3 had >3.5 but <5.0
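The four rates on this slide can be reproduced from paired predicted and observed mean clicks. The sketch below assumes, based on how the percentages pair up (90% + 10%, 69% + 31%), that hits and false alarms are fractions of the predicted problems, that correct rejections and misses are fractions of the predicted non-problems, and that a task counts as an observed problem when its observed mean clicks are ≥2.5. The function name and the sample numbers are hypothetical and are not data from the study.

```python
def validity_rates(predicted, observed, threshold=2.5):
    """Hit, false-alarm, correct-rejection and miss rates (illustrative).

    predicted, observed: equal-length sequences of mean clicks per task.
    threshold: clicks at or above which a task counts as a problem.
    """
    hits = false_alarms = correct_rejections = misses = 0
    for p, o in zip(predicted, observed):
        if p >= threshold:              # predicted problem
            if o >= threshold:
                hits += 1
            else:
                false_alarms += 1
        else:                           # predicted non-problem
            if o < threshold:
                correct_rejections += 1
            else:
                misses += 1

    def rate(count, total):
        # None when a category has no tasks, to avoid dividing by zero.
        return count / total if total else None

    predicted_problems = hits + false_alarms
    predicted_non_problems = correct_rejections + misses
    return {
        "hit_rate": rate(hits, predicted_problems),
        "false_alarm_rate": rate(false_alarms, predicted_problems),
        "correct_rejection_rate": rate(correct_rejections, predicted_non_problems),
        "miss_rate": rate(misses, predicted_non_problems),
    }


# Made-up numbers for four tasks, for illustration only.
rates = validity_rates([3.0, 6.0, 2.2, 2.2], [2.8, 1.9, 2.0, 3.1])
print(rates)
```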

  18. Predicted vs. observed clicks for 64 tasks in cross-validation experiment

  19. Part Two

  20. Theory matters: CWW is theory-based usability evaluation method • CoLiDeS cognitive model (Kitajima, Blackmon, & Polson, 2000, 2005) • Construction-Integration cognitive architecture (Kintsch, 1998), a comprehensive model of human cognitive processes • Latent Semantic Analysis (LSA)

  21. The Key Idea • Core process underlying Web navigation is skilled reading comprehension • Comprehension processes build mental representations of goals and webpage objects (subregions, hyperlinks, images, and other targets for action) • Action planning compares goal with potential targets for action and selects target with highest activation level

  22. Consensus: Web navigation is equivalent to following scent trail • Scent or residue (Furnas, 1997) • SNIF-ACT based on Information Foraging (Pirolli & Card, 1999) • Bloodhound Project: Web User Flow by Information Scent (WUFIS) => InfoScent Simulator (Chi, et al., 2001, 2003) • CWW activation level

  23. CoLiDeS activation level: Scent is MORE than just similarity • Adequate background knowledge to comprehend headings and links? Select semantic space that best matches user group • Warning bell for low word frequency • Warning bell for low term vector • Before computing similarity, simulate human elaboration of link texts during comprehension, using LSA near neighbors to find terms that are simultaneously familiar and similar in meaning • Compute goal-heading and goal-link similarity with LSA cosines, defining weak scent as a cosine <0.10, moderate scent as cosine ≥0.30
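The last bullet's cosine thresholds can be illustrated with a short sketch. It is not the LSA pipeline itself: real goal and link vectors come from a trained LSA semantic space, whereas the toy vectors, function names, and the label for the 0.10 to 0.30 band (which the slide does not name) are assumptions made here for illustration.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_u * norm_v)


def scent_label(cos):
    """Map a goal-link cosine to the scent bands quoted on the slide."""
    if cos < 0.10:
        return "weak"
    if cos >= 0.30:
        return "moderate or stronger"
    return "between thresholds"  # band not named on the slide


# Toy 3-dimensional vectors standing in for LSA goal and link-text vectors.
goal = [0.9, 0.1, 0.0]
link = [0.8, 0.2, 0.1]
print(scent_label(cosine(goal, link)))
```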

  24. Conclusions: Extending CWW successful for research and development of tool • We CAN now predict severity of navigation usability problems and success rate for repairs of these problems, so we invest time to repair only what is worth repairing: tasks predicted to take ≥5.0 clicks • Web designers using tool CAN do what unaided designers cannot: predict behavior of users different from themselves – objectively represent user diversity in education level, culture, language, and field of expertise (background knowledge)

  25. Conclusions, continued • Scales up to large websites • Reliable (LSA measures vs. human judgments) • Psychologically valid (228-task dataset, large n gives stable mean for each task), based on cognitive model • Theory matters • Drives experimental design • High accuracy and psychological validity of tool • Practitioners and researchers can now put the tool to use with trust

  26. Non-problem task Find Fern approaches asymptote of pure forward search • One-click minimum path for both problems AND non-problems • 1.1 mean total clicks on links • 90% pure forward search (minimum path solution) • 97% of first clicks were on link under correct heading • 100% success rate -- everyone finished task in 1 or 2 clicks • 9 seconds = mean solution time
