Data Triangulation in a User Evaluation of the Sealife Semantic Web Browsers

Helen Oliver, Patty Kostkova, Ed de Quincey
City eHealth Research Centre (CeRC), City University London

User-Centred Evaluation of Semantic Web Browsers
• The Semantic Web for Life Sciences
• Browse for meaning
• Find answers to critical questions faster
• Computer scientists love Semantic Web Browsers (SWBs)!
• First-ever user-centred evaluation of SWBs recruiting REAL-WORLD users
• Do real users love SWBs too?
• Realistic user-centred evaluation has been neglected for SWBs!

User-Centred Evaluation of Semantic Web Browsers
• Use triangulation to consider all angles
• Essential to our innovative evaluation framework
• (Quantitative data + Qualitative data) = (Validation AND Completeness)
  • Quantitative data: web server logs, questionnaire results
  • Qualitative data: semi-structured interviews
• Triangulation has been neglected in user-centred evaluations of SWBs!

Group A1: Infectious Disease Professionals
• COHSE vs NeLI
• CORESE-based SWB vs NeLI

Group A2: Microbiologists
• GoPubMed/GoGene vs PubMed

Use of Triangulation for Semantic Web
• Quantitative Data Sources:
  • Web Form Questionnaires
    • Pre-questionnaire
    • Post-task questionnaires
    • Post-questionnaire
  • Web Server Logs
• Qualitative Data Sources:
  • Semi-Structured Interviews (subset of participants)
• Evaluation Settings:
  • Online
  • Workshops

Value of Data Triangulation in Interpreting the Results
• Questionnaires
  • Findability
  • Usability
  • System Speed
  • Relevance
  • Likeability
• Web Server Logs
  • Task Completion Time
  • Usage of Semantic Links
  • Number of External Pages Viewed
  • Views of Target Documents
• Semi-Structured Interviews
  • Answers to questions we didn’t think to ask…
  • Observe participants to assess system intuitiveness

Sealife Results
• COHSE: 67 respondents
  • 39 online
  • 28 in workshops
• CORESE: 14 respondents
  • 2 online (only 1 completed)
  • 12 in workshops
• GoPubMed:
  • 137 online
  • 4 in workshop
• GoGene + Extended GoPubMed:
  • 14 in workshop
• Qualitative results not statistically significant (few interviews conducted)

Web Server Logs
• PubMed was faster than GoGene
• Faster => Better…
• So, users liked PubMed better than GoGene – right?
• Web Server Logs Don’t Lie!

Questionnaires
• Best for:
  • Likeability, Information Findability, Relevance, System Speed: GoPubMed/GoGene
  • Usability: COHSE
• Highest Number of Positive Ratings:
  • GoPubMed/GoGene
• Largest Positive Mode Differences Between Control and Intervention:
  • GoPubMed/GoGene
• Fewest Negative Mode Ratings Compared to Control:
  • GoPubMed/GoGene NEVER had worse mode scores than PubMed!
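The mode comparison between control and intervention can be sketched as follows. This is only an illustrative sketch: the 1 to 5 rating scale, the tie-breaking rule, the function names, and the sample ratings are all assumptions, not the study's actual data or scoring procedure.

```python
from statistics import multimode


def mode_rating(ratings):
    """Mode of a list of ordinal ratings.

    Assumption: when several values tie for most frequent,
    the lowest tied value is used (a conservative choice).
    """
    return min(multimode(ratings))


def mode_difference(control, intervention):
    """Positive when the intervention's mode rating is higher than the control's."""
    return mode_rating(intervention) - mode_rating(control)


# Hypothetical ratings for one questionnaire item (not study data).
control_ratings = [3, 3, 4, 2, 3]
intervention_ratings = [4, 4, 5, 3, 4]

difference = mode_difference(control_ratings, intervention_ratings)
print(difference)
```

A positive difference for an item would count towards "positive mode differences"; a negative one would count as a worse mode score than the control.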

Semi-Structured Interviews
• So the winner is GoPubMed/GoGene
• COHSE was rated the most usable
• What more could we want?
• Well…
• Critiques in GoPubMed/GoGene interviews were about the details
• Critiques in COHSE/CORESE interviews were about being able to use the systems at all
• At first, it turned out that some could not tell control from intervention!
• When asked for critiques of COHSE or CORESE, users gave abundant detail… about NeLI!
• Yes, but what about COHSE? “Those awful little boxes? They were really distracting, I didn’t really understand what they were.”
• Presentations explaining the SWBs improved users’ understanding

Validation
• We were expecting discrepancy between logs, questionnaires, and interviews
• True for COHSE’s findability ratings
  • Workshop users rated it as adequate or good
  • Logs showed that none of these users had found the answer
• Triangulation revealed discrepancies in plausible results
• Otherwise users were generally consistent
• We suspected one user of giving fake answers because she was exceptionally positive in her questionnaires and interview
  • Task logs showed that she was one of the fastest (1-2 min per task)
    • …but 2 others were faster!
  • Logs showed that she activated 4 link boxes
    • …matching the median for all respondents
  • Logs showed that she viewed only 1 external page
    • …but some users didn’t view any and of those who did, 1 page was the mode
• Triangulation validated suspicious results
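A cross-check of this kind, comparing what respondents reported with what the logs recorded, might look like the sketch below. The record layout, field names, rating scale, and "adequate" threshold are hypothetical and chosen only for illustration; the sample respondents are invented, not study data.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Respondent:
    respondent_id: str
    findability_rating: int    # hypothetical scale: 1 (poor) to 5 (good)
    viewed_target: bool        # from the logs: was the target document opened?
    link_boxes_activated: int  # from the logs: semantic link boxes used


def rating_log_discrepancies(respondents, adequate=3):
    """IDs of respondents who rated findability adequate or better
    although the logs show they never reached the target document."""
    return [
        r.respondent_id
        for r in respondents
        if r.findability_rating >= adequate and not r.viewed_target
    ]


def deviation_from_median_links(respondent, respondents):
    """How far one respondent's link-box usage lies from the group median."""
    group_median = median(r.link_boxes_activated for r in respondents)
    return respondent.link_boxes_activated - group_median


# Invented example records (not study data).
respondents = [
    Respondent("r1", 4, False, 4),
    Respondent("r2", 2, False, 3),
    Respondent("r3", 5, True, 5),
]

flagged = rating_log_discrepancies(respondents)
deviation = deviation_from_median_links(respondents[0], respondents)
print(flagged, deviation)
```

The first function corresponds to spotting plausible-looking ratings that the logs contradict; the second to checking whether a suspiciously positive respondent's behaviour was in fact typical.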

Completeness
• Logs showed that interviewees who spoke negatively about COHSE often had spent a long time on it
  • Longer than 5 minutes
  • Longer than they spent on the control platform
• Several users spent more time on GoGene than on PubMed or the extended GoPubMed, but:
  • Said GoGene was their favourite
  • Rated it highly on the questionnaires
• Triangulation shows the whole picture
  • Faster !=> better
  • Slower !=> worse
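The point that speed alone does not indicate preference can be illustrated by pairing each user's log times with their ratings. The sketch below assumes a hypothetical per-user record with one time and one rating per system; the names and sample values are invented for illustration.

```python
from typing import NamedTuple


class UserRecord(NamedTuple):
    user_id: str
    control_seconds: float       # time on the control system, from the logs
    intervention_seconds: float  # time on the SWB, from the logs
    control_rating: int          # questionnaire rating of the control
    intervention_rating: int     # questionnaire rating of the SWB


def slower_but_preferred(records):
    """Users who spent longer on the intervention yet rated it higher,
    i.e. cases where log times alone would give the wrong impression."""
    return [
        r.user_id
        for r in records
        if r.intervention_seconds > r.control_seconds
        and r.intervention_rating > r.control_rating
    ]


# Invented example records (not study data).
records = [
    UserRecord("u1", 120, 300, 3, 5),  # slower on the SWB, but rated it higher
    UserRecord("u2", 200, 150, 3, 4),  # faster on the SWB and rated it higher
    UserRecord("u3", 100, 400, 4, 2),  # slower on the SWB and rated it lower
]

print(slower_but_preferred(records))
```

Only the first user is reported: for that user, the longer time on the SWB coincides with a higher rating, which logs or questionnaires alone would not reveal.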

Discussion
• GoPubMed/GoGene workshop confirmed positive impressions
• CORESE workshop confirmed negative questionnaire results
• GoPubMed/GoGene workshop also confirmed:
  • That problems with this SWB were the most trivial
  • That somewhat higher questionnaire results masked dramatically better user experiences
• Impressions that COHSE was more usable were quashed by contact with users at workshop
• Severity of problems would have gone undetected without interviews
• Low number of interviews means triangulation was not complete
  • Recruitment difficult given time pressures on user base
  • Workshops are resource-intensive
  • Future work: carefully sample a subset for interview
• Time constraints prevented gathering of observational data in situ
  • Future work: use video and/or eye tracking software

Conclusion
• We have developed a method of triangulating quantitative and qualitative data in user-centred evaluation of SWBs
• This addresses a need for greater attention to a technique which is essential for accurate interpretation of data
• Having applied our evaluation framework we triangulated:
  • Quantitative data from the web server logs and from questionnaires
  • Qualitative data from semi-structured interviews eliciting users’ opinions on matters they identified as important

Conclusion
• Triangulation was indispensable for an accurate view of the results
• Log data gave system speed
• Questionnaires and interviews gave the meaning of the log data
• Log data showed usage of semantic links
• Log data showed whether users found the answers
• Questionnaires and interviews revealed discrepancies between what users said and what they did
• Questionnaires showed system intuitiveness
• Only the interviews showed the full significance of the questionnaire results
• Only triangulation could answer the ultimate questions about user satisfaction
• If any one data source had been left out, the results could have been misinterpreted
