Benchmarking Web Accessibility Evaluation Tools: Measuring the Harm of Sole Reliance on Automated Tests
Markel Vigo, University of Manchester (UK)
Justin Brown, Edith Cowan University (Australia)
Vivienne Conway, Edith Cowan University (Australia)
10th International Cross-Disciplinary Conference on Web Accessibility (W4A2013)
http://dx.doi.org/10.6084/m9.figshare.701216
Problem & Fact: The WWW is not accessible. 13 May 2013, W4A2013
Evidence: Webmasters are familiar with accessibility guidelines. Lazar et al., 2004. Improving web accessibility: a study of webmaster perceptions. Computers in Human Behavior 20(2), 269–288.
Hypothesis I: Assuming guidelines do a good job... H1: Awareness of accessibility guidelines is not that widespread.
Evidence II: Webmasters put compliance logos on non-compliant websites. Gilbertson and Machin, 2012. Guidelines, icons and marketable skills: an accessibility evaluation of 100 web development company homepages. W4A 2012.
Hypothesis II: Assuming webmasters are not trying to cheat... H2: There is a lack of awareness of the negative effects of overreliance on automated tools.
Expanding on H2: Why we rely on automated tests
• It's easy
• In some scenarios it seems like the only option: web observatories, real-time...
• We don't know how harmful they can be
Expanding on H2: Knowing the limitations of tools
• If we are able to measure these limitations we can raise awareness
• Inform developers and researchers
• We ran a study with 6 tools
• Computed coverage, completeness and correctness with respect to WCAG 2.0
Method: Computed Metrics
• Coverage: whether a given Success Criterion (SC) is reported at least once
• Completeness:
• Correctness:
Method: Stimuli
• Vision Australia (www.visionaustralia.org.au): non-profit, non-government, accessibility resource
• Prime Minister (www.pm.gov.au): federal government, should abide by the Transition Strategy
• Transperth (www.transperth.wa.gov.au): government affiliated, used by people with disabilities
Method: Obtaining the "Ground Truth"
• Ad-hoc sampling
• Manual evaluation
• Agreement
• Ground truth
Method: Computing Metrics
For every page in the sample...
• Evaluate
• Get reports
• Compare with the GT
• Compute metrics
[Diagram: tools T1 to T6 produce reports R1 to R6, which are compared with the ground truth (GT) to yield metrics M1 to M6]
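The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the study's implementation. The slides define only coverage explicitly, so the sketch assumes that completeness is the share of ground-truth violations a tool finds (recall) and that correctness is the share of a tool's reported violations that are true (precision). The `Violation` representation, the function names and the sample data are all hypothetical.

```python
from typing import Set, Tuple

# Assumption: a violation is a (success criterion, location) pair.
Violation = Tuple[str, str]


def criteria(violations: Set[Violation]) -> Set[str]:
    """Success criteria that appear at least once in a set of violations."""
    return {sc for sc, _ in violations}


def coverage(report: Set[Violation], ground_truth: Set[Violation]) -> float:
    """Share of violated SC that the tool reports at least once."""
    gt_sc = criteria(ground_truth)
    if not gt_sc:
        return 0.0
    return len(criteria(report) & gt_sc) / len(gt_sc)


def completeness(report: Set[Violation], ground_truth: Set[Violation]) -> float:
    """Assumed definition: true violations found / all true violations."""
    if not ground_truth:
        return 0.0
    return len(report & ground_truth) / len(ground_truth)


def correctness(report: Set[Violation], ground_truth: Set[Violation]) -> float:
    """Assumed definition: true violations found / all reported violations."""
    if not report:
        return 0.0
    return len(report & ground_truth) / len(report)


# Made-up data for one page: the manual ground truth and one tool's report.
ground_truth = {
    ("1.1.1", "img#logo"),
    ("1.1.1", "img#banner"),
    ("1.4.3", "p.intro"),
    ("2.4.4", "a#more"),
}
report = {
    ("1.1.1", "img#logo"),
    ("1.4.3", "p.intro"),
    ("1.4.3", "p.footer"),  # false positive: not in the ground truth
}

print(coverage(report, ground_truth))      # 2 of 3 violated SC are reported
print(completeness(report, ground_truth))  # 2 of 4 true violations are found
print(correctness(report, ground_truth))   # 2 of 3 reported violations are true
```

Treating violations as sets keeps the comparison with the ground truth to a single intersection per metric; a real evaluation would also need a rule for matching a tool's reported location to the one recorded by the manual evaluators.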
Accessibility of Stimuli (figure): Vision Australia (www.visionaustralia.org.au), Prime Minister (www.pm.gov.au), Transperth (www.transperth.wa.gov.au)
Results: Coverage
• 650 WCAG Success Criteria violations (A and AA)
• 23-50% of SC are covered by automated tests
• Coverage varies across guidelines and tools
Results: Completeness per tool
• Completeness ranges from 14% to 38%
• Variable across tools and principles
Results: Completeness per type of SC
• How conformance levels influence completeness
• Wilcoxon Signed Rank: W=21, p<0.05
• Completeness levels are higher for 'A level' SC
Results: Completeness vs. accessibility
• How accessibility levels influence completeness
• ANOVA: F(2,10)=19.82, p<0.001
• The less accessible a page is, the higher the level of completeness
Results: Tool Similarity on Completeness
• Cronbach's α = 0.96
• Multidimensional Scaling (MDS)
• Tools behave similarly
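For readers unfamiliar with Cronbach's α, the standard formula can be sketched as follows, treating each tool as an "item" and each page as an observation. How the study arranged its data is not stated on the slide, so the layout of `scores` is an assumption and the numbers are made up purely to exercise the function.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for k items.

    scores[i][j] is assumed to hold the completeness of tool i on page j.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(page totals))
    """
    k = len(scores)
    if k < 2:
        raise ValueError("need at least two tools")
    n = len(scores[0])
    if n < 2 or any(len(row) != n for row in scores):
        raise ValueError("every tool needs the same number (>= 2) of pages")

    item_variance_sum = sum(variance(row) for row in scores)
    page_totals = [sum(row[j] for row in scores) for j in range(n)]
    total_variance = variance(page_totals)
    if total_variance == 0:
        raise ValueError("page totals have zero variance")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Made-up completeness scores: 3 hypothetical tools on 4 pages.
example_scores = [
    [0.10, 0.20, 0.30, 0.40],
    [0.15, 0.22, 0.33, 0.41],
    [0.12, 0.25, 0.28, 0.45],
]
print(cronbach_alpha(example_scores))
```

Values close to 1 indicate that the tools rank pages in much the same way, which is the sense in which the slide concludes that tools behave similarly.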
Results: Correctness
• Tools with lower completeness scores exhibit higher levels of correctness: 93-96%
• Tools that obtain higher completeness yield lower correctness: 66-71%
• Tools with higher completeness are also the most incorrect ones
Implications: Coverage
• We corroborate that 50% is the upper limit for automatising guidelines
• Natural Language Processing?
• Language: 3.1.2 Language of parts
• Domain: 3.3.4 Error prevention
Implications: Completeness I
• Automated tests do a better job...
• ...on non-accessible sites
• ...on 'A level' success criteria
• Automated tests aim at catching stereotypical errors
Implications: Completeness II
• Strengths of tools can be identified across WCAG principles and SC
• A method to inform decision making
• Maximising completeness in our sample of pages
• On all tools: 55% (+17 percentage points)
• On non-commercial tools: 52%
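One plausible reading of "maximising completeness" is that the reports of several tools are pooled and the combination that finds the most true violations is chosen. The sketch below follows that reading; the slides do not describe the actual selection procedure, so the exhaustive search, the assumed completeness definition (true violations found over all true violations), the tool names and the data are all hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

# Assumption: a violation is a (success criterion, location) pair.
Violation = Tuple[str, str]


def completeness(report: Set[Violation], ground_truth: Set[Violation]) -> float:
    """Assumed definition: true violations found / all true violations."""
    if not ground_truth:
        return 0.0
    return len(report & ground_truth) / len(ground_truth)


def best_combination(
    reports: Dict[str, Set[Violation]],
    ground_truth: Set[Violation],
    size: int,
) -> Tuple[Tuple[str, ...], float]:
    """Try every combination of `size` tools and pool their reports.

    Returns the tool names and pooled completeness of the best combination
    (the first one found in case of ties).
    """
    best_names: Tuple[str, ...] = ()
    best_score = -1.0
    for names in combinations(sorted(reports), size):
        pooled = set().union(*(reports[name] for name in names))
        score = completeness(pooled, ground_truth)
        if score > best_score:
            best_names, best_score = names, score
    return best_names, best_score


# Made-up data: four true violations and three hypothetical tools.
ground_truth = {
    ("1.1.1", "img#logo"),
    ("1.4.3", "p.intro"),
    ("2.4.4", "a#more"),
    ("3.1.1", "html"),
}
reports = {
    "T1": {("1.1.1", "img#logo"), ("1.4.3", "p.intro")},
    "T2": {("2.4.4", "a#more"), ("3.1.1", "html")},
    "T3": {("1.1.1", "img#logo"), ("1.4.3", "p.footer")},
}

best, score = best_combination(reports, ground_truth, size=2)
print(best, score)
```

Pooling reports also pools false positives, so a combination chosen this way would be expected to inherit the lower correctness the results slide reports for high-completeness tools.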
Conclusions
• Coverage: 23-50%
• Completeness: 14-38%
• Higher completeness leads to lower correctness
Follow up
Contact: @markelvigo | markel.vigo@manchester.ac.uk
Presentation DOI: http://dx.doi.org/10.6084/m9.figshare.701216
Datasets: http://www.markelvigo.info/ds/bench12/index.html