Visual Rating System for HFES Graphics: Design and Analysis
Paul Aumer-Ryan
School of Information, The University of Texas at Austin
October 18, 2006
HFES 2006: San Francisco
1 Introduction
• Good graphs, bad graphs
• Graphs should be easily:
  • Comprehensible
  • Readable
  • Reproducible (via fax, photocopy, etc.)
  • Understandable
1 Graph citation available upon request
1 Introduction
• Motivations for good graph design:
  • Enhances an argument
  • Helps readers understand an argument
  • Makes a paper look professional and well thought out, with attention to detail
1 Introduction
• Four purposes of this study:
  • To see whether a paper published by Gillan, Wickens, Hollands, & Carswell (1998) had any effect on the graphs published after that date
  • To find out whether there is a universal “readability” measure for graph design (i.e., everyone finds the same things helpful when reading a graph)
  • To encourage good graph design
  • To develop a standardized and streamlined method for evaluating graphs
2 Method
• Began with about 90 guidelines for good graph design from Gillan et al. (1998)
• Progressively narrowed and culled the guidelines to 33 statements (each rated on a 5-point Likert scale)
• Built a “Visual Rating System”:
  • Essentially a Web-based survey tool for rating graphs against the 33 guidelines
  • (screenshot on next slide)
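The slides describe the Visual Rating System only as a Web-based survey that scores each graph against 33 guidelines on a 5-point Likert scale. As a rough illustration of the data such a tool has to capture, here is a minimal sketch of one rating record. The class name, fields, and validation rules are hypothetical and are not taken from the original tool.

```python
from dataclasses import dataclass, field

# 5-point Likert scale used for every guideline statement.
LIKERT = {1: "Strongly Disagree", 2: "Disagree", 3: "Neutral",
          4: "Agree", 5: "Strongly Agree"}
NUM_GUIDELINES = 33  # guideline statements retained after culling


@dataclass
class GraphRating:
    """One rater's scores for one graph (hypothetical schema, not the original tool's)."""
    graph_id: str
    rater_id: str
    year: int  # publication year of the graph (1997 or 2001 in this study)
    # Guideline index (0-32) -> Likert score; guidelines not yet rated are absent.
    scores: dict[int, int] = field(default_factory=dict)

    def rate(self, guideline: int, score: int) -> None:
        """Record a score, rejecting out-of-range guidelines or scores."""
        if not 0 <= guideline < NUM_GUIDELINES:
            raise ValueError(f"guideline index out of range: {guideline}")
        if score not in LIKERT:
            raise ValueError(f"score must be 1-5, got {score}")
        self.scores[guideline] = score
```

For example, `GraphRating("1997-017", "RA3", 1997).rate(0, 4)` would record "Agree" for the first guideline (the IDs are made up). Storing one record per graph-rater pair keeps the three independent ratings of each graph separate until analysis.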
2 Method
• Recruited and met with 6 research assistants (RAs):
  • Undergraduates in the psychology department interested in research
• Evaluated several sample graphs as a group (to maintain inter-coder reliability)
• Discussed the clarity of the guidelines and potential issues
2 Method
• Selected a sample of graphs to evaluate:
  • Since the guidelines were published in 1998, we chose graphs published in 1997 and 2001 (before and after)
  • Focused on bar graphs, line graphs, scatter plots, and pie charts
  • Chose 50 graphs from 1997 and 50 graphs from 2001 (100 out of a total of about 500)
• Each RA evaluated 50 graphs
• Each graph was rated by 3 RAs (6 RAs × 50 graphs = 100 graphs × 3 ratings = 300 ratings)
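The slides give the workload (6 RAs, 50 graphs each, 3 ratings per graph) but not how graphs were handed out. The following is a minimal sketch of one way to meet those constraints; the function name, the shuffling step, and the seed are assumptions for illustration, not the study's actual procedure.

```python
import random


def assign_raters(graph_ids, rater_ids, raters_per_graph=3, seed=0):
    """Split rating work so each graph gets `raters_per_graph` distinct raters
    and each rater gets the same number of graphs.

    Returns a dict mapping each rater id to a list of graph ids.
    """
    n_graphs, n_raters = len(graph_ids), len(rater_ids)
    total = n_graphs * raters_per_graph          # total ratings needed
    if total % n_raters:
        raise ValueError("ratings cannot be split evenly among raters")
    per_rater = total // n_raters
    if per_rater > n_graphs:
        raise ValueError("a rater would have to see some graph twice")

    order = list(graph_ids)
    random.Random(seed).shuffle(order)           # mix 1997 and 2001 graphs
    # Repeat the shuffled list and cut it into equal consecutive chunks.
    # Copies of the same graph sit n_graphs slots apart, so a chunk no
    # longer than n_graphs never contains the same graph twice.
    slots = order * raters_per_graph
    return {r: slots[i * per_rater:(i + 1) * per_rater]
            for i, r in enumerate(rater_ids)}


graphs = [f"1997-{i:02d}" for i in range(50)] + [f"2001-{i:02d}" for i in range(50)]
raters = [f"RA{i}" for i in range(1, 7)]
workload = assign_raters(graphs, raters)
```

With these numbers each chunk is exactly half of the shuffled list, so the RAs end up in two fixed groups of three that rate the same 50 graphs; a scheme that rotates rater pairings more evenly would need a more elaborate design.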
3 Results
• Ran t-tests on each of the 33 guidelines (the independent variable, publication year, has 2 levels: 1997 and 2001)
• Because so many analyses were performed (increasing the chance of Type I errors, i.e., false positives), I restricted the significance level using the Bonferroni adjustment: α = .05/N = .05/33 ≈ .0015 (rounded to .002)
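To make the two steps concrete, here is a minimal sketch of an independent-samples t statistic and the Bonferroni threshold. The slides say only "t-tests", so the pooled-variance (Student's) form used here is an assumption, and the ratings are made-up numbers rather than study data. In practice, a statistics package such as `scipy.stats.ttest_ind` would also supply the p-value.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Independent-samples Student's t statistic with pooled variance.

    Returns (t, df); t > 0 means group `a` has the higher mean.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled estimate of the common variance (variance() divides by n - 1).
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, df


ALPHA = 0.05
N_TESTS = 33                        # one t-test per guideline
BONFERRONI_ALPHA = ALPHA / N_TESTS  # 0.0015..., which the slides round to .002

ratings_1997 = [4, 5, 3, 4, 5]      # made-up example scores, not study data
ratings_2001 = [2, 3, 3, 2, 4]
t, df = pooled_t(ratings_1997, ratings_2001)
```

For these toy ratings, t is about 2.646 with df = 8. A result counts as significant after correction only if its p-value falls below `BONFERRONI_ALPHA` rather than .05.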
3 Results
• Found 5 guidelines with statistically significant differences between 1997 and 2001:
  • 2 major findings (p < .002)
  • 3 minor findings (p < .05)
3 Results
• Major findings (charts: mean rating by year graph was published; y-axis from Disagree (2) to Strongly Agree (5))
  • Numbers (quantitative labels) are large enough to read comfortably: t(180) = 3.656, p < .001; 1997: M = 3.72, SD = 1.386; 2001: M = 2.94, SD = 1.480
  • Words are large and readable: t(174) = 3.581, p < .001; 1997: M = 3.90, SD = 1.187; 2001: M = 3.18, SD = 1.451
3 Results
• Minor findings (charts: mean rating by year graph was published; y-axis from Disagree (2) to Strongly Agree (5))
  • The graph is visible at small sizes (e.g., 2 inches wide): t(139) = -2.639, p < .009; 1997: M = 2.28, SD = 1.312; 2001: M = 2.88, SD = 1.386
  • The axis labels are placed near their axes: t(180) = -2.776, p < .006; 1997: M = 3.95, SD = 1.246; 2001: M = 4.37, SD = 0.744
  • The styles of lines are varied to make them distinct from one another: t(180) = 2.347, p < .020; 1997: M = 3.42, SD = 1.432; 2001: M = 2.90, SD = 1.560
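As a cross-check, the sketch below re-tallies the five reported results into major and minor findings and into improvements versus declines. The numbers are copied from the Results slides. Treating each reported p-value as an upper bound, and entering the two major findings as p < .001, are simplifying assumptions.

```python
# Results as reported on the Results slides:
# (guideline, t, reported p bound, mean 1997, mean 2001).
FINDINGS = [
    ("Numbers are large enough to read comfortably", 3.656, 0.001, 3.72, 2.94),
    ("Words are large and readable", 3.581, 0.001, 3.90, 3.18),
    ("Graph is visible at small sizes", -2.639, 0.009, 2.28, 2.88),
    ("Axis labels are placed near their axes", -2.776, 0.006, 3.95, 4.37),
    ("Line styles are varied", 2.347, 0.020, 3.42, 2.90),
]

ALPHA = 0.05
BONFERRONI_ALPHA = ALPHA / 33


def tier(p):
    """Major if it survives the Bonferroni cut, minor if only p < .05."""
    if p < BONFERRONI_ALPHA:
        return "major"
    return "minor" if p < ALPHA else "not significant"


def summarize(findings):
    rows = []
    for name, t, p, m1997, m2001 in findings:
        # Higher scores mean stronger agreement that the graph follows the
        # guideline, so a lower 2001 mean is a change in the "wrong" direction.
        direction = "worse in 2001" if m2001 < m1997 else "better in 2001"
        rows.append((name, tier(p), direction))
    return rows


for row in summarize(FINDINGS):
    print(*row, sep=" | ")
```

The sign of each t statistic agrees with its means: t is positive exactly when the 1997 mean is higher. Three of the five findings come out as "worse in 2001".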
4 Discussion
• Note that in 3 of the findings (numbers large enough to read, words large and readable, varied line styles), the trend runs backwards: the “readability” of the graphs got worse in 2001
4 Discussion
• What is “readability”? Are these guidelines getting at some underlying concept, or is graph readability so personal that there is no common ground?
• Since graphing software could address most of these guidelines, should we be calling for better software tools?
5 Conclusion
• The Visual Rating System is a valuable tool for identifying trends in graph design, and it can act as an author’s aid for producing better graphs
• Closing thought: Though a picture may be worth a thousand words, an ill-conceived picture may be worth no words at all, or worse, may require two thousand words of explanation.
Graph from: Brener, N. E., Iyengar, S. S., & Pianykh, O. S. (2005). A conclusive methodology for rating OCR performance. Journal of the American Society for Information Science and Technology, 56(12), 1274-1287.