Testing & Usability: Making It Work

Joseph A. Busch & Ron Daniel, Jr.

Agenda
• Qualitative methods
• Quantitative methods

Qualitative taxonomy testing methods

Walk-through method: Show & explain
Public Utility XYZ
• Audiences: General Audience Business Customer-Owners Employees Education Finance Job Seekers Media National Power Industry Recreation Interests Regional Regulators Licensing & Compliance Stakeholders Retirees
• Organizations: Administration Finance & Technology Distribution Services Generation Customer & Environmental Services Corporate & Treasury
• Services: Careers Commissioners Customer Service Distribution Education Environmental Fish & Wildlife Forestry & Tree Trimming Hydro Parks Pwr Conservation Pwr Industry Info Power Mgmt Procurement Public Info Recreation Licensing & Compliance Retiree Info Safety SNAP Tours Water/Wastewater Wtr Conservation Wholesale Fiber Other Services
• Facilities: Hydro Projects Hatcheries Parks Water Wastewater Fiber Networks Distribution System Substations & Switchyards Transmission Support Facilities Communication Sites
• Utility Systems: Communication Equipment Conductors & Devices Conduit Electric Equipment Accessories Equipment - Misc. by Service Fiber Backbone Fiber Customer Connections Fiber Distribution Fire Mains Fisheries Equipment Franchises & Consents Fuel Tanks & Accessories Generators, Turbines & Waterwheels Hydrants Laboratory Equipment Land & Land Rights by Service etc.
• Content Types: General Information Agenda Annual report Audio Brochure Budget Contract Correspondence Directory Drawing Form FAQ Job Listing Map Memo Minutes Newsletter Photo Plan News Release Presentation Procedure Report Schedule Standard Video

Walk-through method: Editorial rules consistency check
• Abbreviations
• Ampersands
• Capitalization
• General…, More…, Other…
• Languages & character sets
• Length limits
• Multiple parents
• Plural vs. singular form
• Scope notes
• Serial comma
• Sources of terms
• Spaces
• Synonyms & acronyms
• Term order (Alphabetic or …)
• Term label order (Direct vs. inverted)
• …
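Several of these rules are mechanical enough to check with a script before the manual walk-through. The sketch below is only an illustration: the length limit, the abbreviation list, the preference for "&" over "and", and the `check_label` helper are all assumptions, not rules stated in this deck. The sample labels are adapted from the Public Utility XYZ example.

```python
import re

# Hypothetical house rules; the real ones are whatever the team agrees on.
MAX_LENGTH = 30
ABBREVIATIONS = {"Pwr", "Wtr", "Mgmt"}
CATCH_ALL = {"General", "More", "Other"}

def check_label(label):
    """Return the editorial-rule warnings for one term label."""
    warnings = []
    words = re.findall(r"[A-Za-z]+", label)
    if len(label) > MAX_LENGTH:
        warnings.append("length")
    if any(word in ABBREVIATIONS for word in words):
        warnings.append("abbreviation")
    # Assumes the house style prefers "&" to "and".
    if any(word.lower() == "and" for word in words):
        warnings.append("ampersand")
    if words and words[0] in CATCH_ALL:
        warnings.append("catch-all")
    # Leading, trailing, or doubled spaces.
    if label != " ".join(label.split()):
        warnings.append("spaces")
    return warnings

labels = [
    "Pwr Conservation",
    "Forestry and Tree Trimming",
    "Other Services",
    "Hydro  Projects",
    "Customer Service",
]
for label in labels:
    print(f"{label!r}: {', '.join(check_label(label)) or 'ok'}")
```

Rules such as scope notes, multiple parents, and sources of terms still need a human reviewer; a script like this only clears the routine findings out of the way.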

Usability testing method: Task-based card sorting (1)
• 15 representative questions were selected
  • Perspective of various organizational units
  • Most frequent website searches
  • Most frequently accessed website content
• Correct answers to the questions were agreed on in advance by the team
• 15 users were tested
  • Did not work for the organization
  • Represented target audiences
• Testers were asked “where would you look for …”
  • “under which facet… Topic, Commodity, or Geography?”
  • Then, “… under which category?”
  • Then, “…under which sub-category?”
• Tester choices were recorded
• Testers were asked to “think aloud”
• Notes were taken on what they said
• Pre- and post-test questions were asked
• Tester answers were recorded

Usability testing method: Task-based card sorting (2)
3. What is the average farm income level in your state?

1. Topics
2. Commodities
3. Geographic Coverage

1. Topics
  1.1 Agricultural Economy
  1.2 Agriculture-Related Policy
  1.3 Diet, Health & Safety
  1.4 Farm Financial Conditions
  1.5 Farm Practices & Management
  1.6 Food & Agricultural Industries
  1.7 Food & Nutrition Assistance
  1.8 Natural Resources & Environment
  1.9 Rural Economy
  1.10 Trade & International Markets

1.4 Farm Financial Conditions
  1.4.1 Costs of Production
  1.4.2 Commodity Outlook
  1.4.3 Farm Financial Management & Performance
  1.4.4 Farm Income
  1.4.5 Farm Household Financial Well-being
  1.4.6 Lenders & Financial Markets
  1.4.7 Taxes

Analysis of task-based card sorting (1)

Analysis of task-based card sorting (2)
• In 80% of the trials, users looked for information under the categories where we expected them to look.
• Breaking up topics into facets makes it easier to find information, especially information related to commodities.
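Two numbers drive this kind of analysis: how often testers chose the answer agreed in advance (success), and how often they chose the same answer as each other (agreement). A minimal sketch of both calculations follows. The `analyze` function, the question IDs, and the trial data are made up for illustration; only the category names come from the Topics facet shown earlier.

```python
from collections import Counter, defaultdict

def analyze(trials, expected):
    """trials: (question, chosen category) pairs.
    expected: question -> answer agreed in advance by the team."""
    by_question = defaultdict(list)
    for question, choice in trials:
        by_question[question].append(choice)

    report = {}
    for question, choices in by_question.items():
        top_choice, top_votes = Counter(choices).most_common(1)[0]
        report[question] = {
            # Share of testers who picked the expected category.
            "success": choices.count(expected[question]) / len(choices),
            # Most popular category and the share of testers who picked it.
            "top_choice": top_choice,
            "agreement": top_votes / len(choices),
        }
    hits = sum(choice == expected[question] for question, choice in trials)
    return hits / len(trials), report

# Made-up results for two questions and five testers each.
expected = {"Q3": "Farm Financial Conditions", "Q9": "Rural Economy"}
trials = (
    [("Q3", "Farm Financial Conditions")] * 4
    + [("Q3", "Rural Economy")]
    + [("Q9", "Rural Economy")] * 2
    + [("Q9", "Agricultural Economy")] * 3
)

overall, report = analyze(trials, expected)
print(f"overall success: {overall:.0%}")
for question, row in report.items():
    print(question, row)
```

Low success with high agreement on a different category suggests the expected answer may be miscategorized; low success with low agreement suggests the category labels themselves are unclear.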

Analysis of task-based card sorting (3)
• Possible change required.
• Change required.
• Policy of “Traceability” needs to be clarified. Use quasi-synonyms.
• On these trials, only 50% looked in the right category, and only 27-36% agreed on the category.
• Possible error in categorization of this question because 64% thought the answer should be “Commodity Trade.”

User satisfaction method: Card Sort Questionnaire (1)
• Was it easy, medium or difficult to choose the appropriate Topic?
  • Easy
  • Medium
  • Difficult
• Was it easy, medium or difficult to choose the appropriate Commodity?
  • Easy
  • Medium
  • Difficult
• Was it easy, medium or difficult to choose the appropriate Geographic Coverage?
  • Easy
  • Medium
  • Difficult

User satisfaction method: Card Sort Questionnaire (2)
[Chart: questionnaire results on a scale from “More Difficult” to “Easier”]

User interface survey: Which search UI is ‘better’?
• Criteria
  • User satisfaction
  • Success completing tasks
  • Confidence in results
  • Fewer dead ends
• Methodology
  • Design tasks from specific to general
  • Time performance
  • Calculate success rates
  • Survey subjective criteria
  • Pay attention to survey hygiene:
    • Participant selection
    • Counterbalancing
    • T-scores
Source: Yee, Swearingen, Li, & Hearst
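As one way to read the "T-scores" item: when every participant rates both interfaces, a paired t statistic on the per-participant differences indicates whether the gap is larger than chance variation would explain. The sketch below assumes that paired design, which the slide does not spell out. The `paired_t` helper and the ratings are made up for illustration and are not the survey's data.

```python
import math
import statistics

def paired_t(scores_a, scores_b):
    """t statistic for per-participant differences (b minus a)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("each participant needs a score for both interfaces")
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    # Standard error of the mean difference (sample standard deviation).
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error

# Made-up satisfaction ratings from five participants.
baseline = [3, 4, 2, 5, 3]
faceted = [5, 6, 4, 6, 5]

t = paired_t(baseline, faceted)
print(f"t = {t:.2f} with {len(baseline) - 1} degrees of freedom")
```

The statistic is then compared against a t table for the given degrees of freedom. Counterbalancing (alternating which interface each participant sees first) matters here because order effects would otherwise leak into the differences.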

User interface survey — Results (1) Source: Yee, Swearingen, Li, & Hearst

User interface survey: Results (2)
[Chart comparing the two interfaces: Google-like Baseline and Faceted Category]
Source: Yee, Swearingen, Li, & Hearst

Tagging samples: How many items?
WARNING: Quantitative methods require large amounts of tagged content. This leads to having specialists, or software, do the tagging. The results may be very different from how users would categorize the same content.

Tagging samples—Manually tagged metadata sample

Tagging samples: Spreadsheet for tagging tens to hundreds of items
1) Clickable URLs for sample content
2) Review small sample and describe
3) Drop-down for tagging (including ‘Other’ entry for the unexpected)
4) Flag questions

Rough Bulk Tagging: Facet Demo (1)
• Collections: 4 content sources
  • NTRS, SIRTF, Webb, Lessons Learned
• Taxonomy
  • Converted MultiTes format into RDF for Seamark
• Metadata
  • Converted from existing metadata on web pages, or
  • Created using simple automatic classifier (string matching with terms & synonyms)
• 250k items, ~12 metadata fields, 1.5 weeks effort
• OOTB Seamark user interface, plus logo
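A "simple automatic classifier (string matching with terms & synonyms)" can be as small as the sketch below. This is not the demo's actual code: the `build_matcher` and `tag` functions, the tiny taxonomy, and the sample sentence are all hypothetical, and the matching is a plain case-insensitive whole-word search.

```python
import re

def build_matcher(taxonomy):
    """taxonomy: preferred term -> list of synonyms.
    Returns a compiled pattern and a label -> preferred term lookup."""
    lookup = {}
    for term, synonyms in taxonomy.items():
        for label in [term, *synonyms]:
            lookup[label.lower()] = term
    # Longest labels first so a phrase wins over a shorter label inside it.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(label) for label in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return pattern, lookup

def tag(text, pattern, lookup):
    """Return the sorted preferred terms whose labels appear in the text."""
    return sorted({lookup[m.group(1).lower()] for m in pattern.finditer(text)})

# Hypothetical two-term taxonomy with one synonym each.
taxonomy = {
    "Telescopes": ["telescope"],
    "Infrared astronomy": ["infrared"],
}
pattern, lookup = build_matcher(taxonomy)
print(tag("The infrared telescope completed its survey.", pattern, lookup))
```

String matching of this kind is fast and easy to explain, but it misses inflected forms and tags passing mentions, which is one reason the results are "rough" and worth checking against a manually tagged sample.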

Rough Bulk Tagging— OOTB Facet Demo (2)

Agenda
• Qualitative methods
• Quantitative methods

Quantitative Method: How evenly does it divide the content?
• Background:
  • Documents will not distribute uniformly across categories
  • Zipf (1/x) distribution is expected behavior
  • 80/20 rule in action (actually 70/20 rule)
• Methodology:
  • Part of alpha test of ‘content type’ for corporate intranet
  • 115 URLs selected at random from search index were manually categorized. Inaccessible files and ‘junk’ were removed.
• Results:
  • Results were slightly more uniform than the Zipf distribution, which is better than expected
Chart callouts: Leading candidate for splitting. Leading candidates for merging. Above the curve is better than expected.
Method warns you if something is strange. Seeing expected behavior does not mean the taxonomy is good.
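The comparison against a Zipf (1/x) curve can be sketched as follows: rank the categories by item count, then compute what each rank would hold if the same total were distributed in proportion to 1/rank. The function names and the category counts below are made up for illustration; they are not the intranet test's data.

```python
def zipf_expected(total, n_categories):
    """Expected count at each rank if items followed a 1/rank distribution."""
    harmonic = sum(1 / rank for rank in range(1, n_categories + 1))
    return [total / (rank * harmonic) for rank in range(1, n_categories + 1)]

def compare_to_zipf(counts):
    """counts: category -> number of items.
    Returns (category, observed, expected) rows, largest category first."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    expected = zipf_expected(sum(counts.values()), len(ranked))
    return [(name, observed, exp) for (name, observed), exp in zip(ranked, expected)]

# Made-up counts for five content types.
counts = {"Report": 40, "Form": 25, "Memo": 15, "Map": 12, "Video": 8}
rows = compare_to_zipf(counts)
for name, observed, exp in rows:
    position = "above" if observed > exp else "below"
    print(f"{name:8} observed {observed:3}  expected {exp:5.1f}  ({position} the curve)")
```

A top-ranked category far above its expected count is a candidate for splitting, and a cluster of very small categories at the tail is a candidate for merging. As the slide says, a good fit only means nothing looks strange, not that the taxonomy is good.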

Quantitative Method: How intuitive (repeatable) are the categorizations?
• Methodology: Closed Card Sort
  • For alpha test of a grocery site
  • 15 testers put each of 71 best-selling product types into one of 10 pre-defined categories
  • Cases where fewer than 14 of 15 testers put a product into the same category were flagged
• Results: “Cocoa Drinks – Powder” is best categorized in both “Beverages” and “Grocery”.
• How to improve? Allow products in multiple categories. (Results are for minimum size = 4 votes)
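The flagging rule can be sketched in a few lines: count the votes each product received per category and flag any product whose most popular category has fewer than 14 of the 15 votes. The `flag_products` function, the vote split for “Cocoa Drinks – Powder”, and the second product are made up for illustration.

```python
from collections import Counter

def flag_products(placements, threshold=14):
    """placements: product -> list of chosen categories, one per tester.
    Returns flagged products with their vote counts, most popular first."""
    flagged = {}
    for product, choices in placements.items():
        votes = Counter(choices)
        if votes.most_common(1)[0][1] < threshold:
            flagged[product] = votes.most_common()
    return flagged

# Made-up votes from 15 testers per product.
placements = {
    "Cocoa Drinks – Powder": ["Beverages"] * 8 + ["Grocery"] * 7,
    "Bottled Water": ["Beverages"] * 15,
}
flagged = flag_products(placements)
for product, votes in flagged.items():
    print(product, votes)
```

Listing every category that drew a meaningful number of votes for a flagged product shows directly where a second category assignment would help.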

Quantitative Method: How does taxonomy “shape” match that of content?
• Background:
  • Hierarchical taxonomies allow comparison of “fit” between content and taxonomy areas
• Methodology:
  • 25,380 resources tagged with taxonomy of 179 terms (avg. of 2 terms per resource)
  • Counts of terms and documents summed within taxonomy hierarchy
• Results:
  • Roughly Zipf distributed (top 20 terms: 79%; top 30 terms: 87%)
  • Mismatches between term% and document% flagged
Source: Courtesy Keith Stubbs, U.S. Dept. of Ed.
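One way to sum counts within the hierarchy and flag mismatches is sketched below. Each top-level branch gets a term share (its terms as a fraction of all terms) and a document share (its tag assignments as a fraction of all assignments), and branches where the two differ by more than a factor of two are flagged. The factor of two, the function names, and the toy taxonomy are assumptions for illustration, not the method or data behind the slide.

```python
from collections import Counter

def branch_of(term, parent):
    """Walk up the parent links to the top-level term."""
    while parent[term] is not None:
        term = parent[term]
    return term

def shape_report(parent, tag_counts, tolerance=2.0):
    """parent: term -> parent term (None at the top level).
    tag_counts: term -> number of resources tagged with it.
    Returns branch -> (term share, document share, flagged)."""
    terms = Counter(branch_of(term, parent) for term in parent)
    docs = Counter()
    for term, count in tag_counts.items():
        # A resource tagged with two terms in one branch counts twice here.
        docs[branch_of(term, parent)] += count
    total_terms, total_docs = sum(terms.values()), sum(docs.values())
    report = {}
    for branch, n_terms in terms.items():
        term_share = n_terms / total_terms
        doc_share = docs[branch] / total_docs
        ratio = doc_share / term_share
        report[branch] = (term_share, doc_share, ratio > tolerance or ratio < 1 / tolerance)
    return report

# Toy taxonomy: three branches with 4, 2, and 2 terms.
parent = {
    "A": None, "A1": "A", "A2": "A", "A3": "A",
    "B": None, "B1": "B",
    "C": None, "C1": "C",
}
tag_counts = {"A1": 10, "A2": 5, "A": 5, "B1": 60, "C1": 20}

report = shape_report(parent, tag_counts)
for branch, (term_share, doc_share, flagged) in report.items():
    print(f"{branch}: terms {term_share:.0%}, documents {doc_share:.0%}, flagged={flagged}")
```

A branch with many terms and few documents may be over-built for the content at hand, while a branch with few terms and many documents may need finer subdivision.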

Pop Quiz
• What is the #1 underused source of quantitative information on how to improve your taxonomy?

Query Logs & Click Trails: Who are the users & what are they looking for?
• UltraSeek Reporting
  • Top queries
  • Queries with no results
  • Queries with no click-through
  • Most requested documents
  • Query trend analysis
  • Complete server usage summary
• Query Log & Click Trail Examination
  • Only 30-40% of organizations regularly examine their logs*.
  • Sophisticated software available, but don’t wait.
  • 80% of value comes from basic reports
• Governance Foreshadowing
  • Start a “Measure & Improve” mindset
  • Taxonomy changes do not stand alone
    • Search system improvements
    • Navigation improvements
    • Content improvements
    • Process improvements
    • …
• Click Trail Packages
  • iWebTrack
  • NetTracker
  • OptimalIQ
  • SiteCatalyst
  • Visitorville
  • WebTrends
*Source: Daniel, ESS’05
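The basic reports need very little code once the log is in a usable form. The sketch below assumes each log record has already been reduced to a (query, result count, clicked) triple. That record layout, the `basic_reports` function, and the sample entries are hypothetical and do not reflect the export format of UltraSeek or any of the packages listed.

```python
from collections import Counter

def basic_reports(log, top_n=3):
    """log: (query text, number of results, whether a result was clicked)."""
    # Fold case and trim so "Farm Income" and "farm income" count together.
    normalized = [(query.strip().lower(), hits, clicked) for query, hits, clicked in log]
    return {
        "top_queries": Counter(q for q, _, _ in normalized).most_common(top_n),
        "no_results": Counter(
            q for q, hits, _ in normalized if hits == 0
        ).most_common(top_n),
        "no_click_through": Counter(
            q for q, hits, clicked in normalized if hits > 0 and not clicked
        ).most_common(top_n),
    }

# Made-up log entries.
log = [
    ("farm income", 12, True),
    ("Farm Income", 12, False),
    ("traceability", 0, False),
    ("hydro tours", 5, False),
]
reports = basic_reports(log)
for name, rows in reports.items():
    print(name, rows)
```

Queries with no results point to missing synonyms or missing content, and queries with results but no click-through point to poor ranking or unhelpful titles. Both lists feed directly into taxonomy changes.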

Questions
Joseph A. Busch, jbusch@taxonomystrategies.com
Ron Daniel, Jr., rdaniel@taxonomystrategies.com
http://www.taxonomystrategies.com

Bibliography
• K. Yee, K. Swearingen, K. Li, M. Hearst. "Searching and organizing: Faceted metadata for image search and browsing." Proceedings of the Conference on Human Factors in Computing Systems (April 2003). http://bailando.sims.berkeley.edu/papers/flamenco-chi03.pdf
• R. Daniel and J. Busch. "Benchmarking Your Search Function: A Maturity Model." http://www.taxonomystrategies.com/presentations/maturity-2005-05-17%28as-presented%29.ppt
