Data Triangulation in a User Evaluation of the Sealife Semantic Web Browsers • Helen Oliver • Patty Kostkova • Ed de Quincey • City eHealth Research Centre (CeRC) • City University London
User-Centred Evaluation of Semantic Web Browsers • The Semantic Web for Life Sciences • Browse for meaning • Find answers to critical questions faster • Computer scientists love SWBs! • First-ever user-centred evaluation of SWBs recruiting REAL-WORLD users • Do real users love SWBs too? • Realistic user-centred evaluation has been neglected for SWBs!
User-Centred Evaluation of Semantic Web Browsers • Use Triangulation to consider all angles • Essential to our innovative evaluation framework • (Quantitative data: web server logs, questionnaire results + Qualitative data: semi-structured interviews) = (Validation AND Completeness) • Triangulation has been neglected in user-centred evaluations of SWBs!
Group A1: Infectious Disease Professionals • COHSE vs NeLI • CORESE-based SWB vs NeLI
Use of Triangulation for Semantic Web • Quantitative Data Sources: • Web Form Questionnaires • Pre-questionnaire • Post-task questionnaires • Post-questionnaire • Web Server Logs • Qualitative Data Sources: • Semi-Structured Interviews (subset of participants) • Evaluation Settings: • Online • Workshops
Value of Data Triangulation in Interpreting the Results • Questionnaires: • Findability • Usability • System Speed • Relevance • Likeability • Web Server Logs: • Task Completion Time • Usage of Semantic Links • # of External Pages Viewed • Views of Target Documents • Semi-Structured Interviews: • Answers to questions we didn’t think to ask… • Observe participants to assess system intuitiveness
Sealife Results • COHSE: 67 respondents • 39 online • 28 in workshops • CORESE: 14 respondents • 2 online (only 1 completed) • 12 in workshops • GoPubMed: • 137 online • 4 in workshop • GoGene + Extended GoPubMed: • 14 in workshop • Qualitative results not statistically significant (few interviews conducted)
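As a rough illustration of how a measure such as task completion time might be derived from web server logs, here is a minimal Python sketch. The record layout (participant, task, timestamp), the sample values, and the function name `task_durations` are all assumptions made for this example; the slides do not describe the actual log format or processing used in the study.

```python
from datetime import datetime

# Hypothetical log records: (participant id, task id, ISO timestamp).
# These values are invented for illustration and are not study data.
LOG = [
    ("p1", "task1", "2009-01-01T10:00:00"),
    ("p1", "task1", "2009-01-01T10:01:30"),
    ("p1", "task1", "2009-01-01T10:03:00"),
    ("p2", "task1", "2009-01-01T11:00:00"),
    ("p2", "task1", "2009-01-01T11:06:00"),
]


def task_durations(records):
    """Return seconds between the first and last request per (participant, task)."""
    first, last = {}, {}
    for participant, task, stamp in records:
        when = datetime.fromisoformat(stamp)
        key = (participant, task)
        if key not in first or when < first[key]:
            first[key] = when
        if key not in last or when > last[key]:
            last[key] = when
    return {key: (last[key] - first[key]).total_seconds() for key in first}


durations = task_durations(LOG)
```

Treating the span between first and last request as the completion time is a simplification; it says nothing about whether the participant actually found the answer, which is why the other data sources are needed.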
Web Server Logs • PubMed was faster than GoGene • Faster => Better… • So, users liked PubMed better than GoGene – right? • Web Server Logs Don’t Lie!
Questionnaires • Best for Likeability, Information Findability, Relevance, and System Speed: • GoPubMed/GoGene • Best for Usability: • COHSE • Highest Number of Positive Ratings: • GoPubMed/GoGene • Largest Positive Mode Differences Between Control and Intervention: • GoPubMed/GoGene • Fewest Negative Mode Ratings Compared to Control: • GoPubMed/GoGene NEVER had worse mode scores than PubMed!
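The "mode difference between control and intervention" can be sketched as follows. This is only an illustration under stated assumptions: a 1 to 5 rating scale where 5 is best, invented ratings, and a hypothetical helper `mode_differences`. The slides do not specify the scale or the exact calculation used.

```python
from statistics import mode

# Hypothetical 1-5 ratings (5 = best) per questionnaire criterion.
# Invented for illustration; not results from the evaluation.
control = {"findability": [3, 3, 4, 2, 3], "speed": [4, 4, 3, 4, 5]}
intervention = {"findability": [4, 4, 5, 3, 4], "speed": [4, 3, 4, 4, 2]}


def mode_differences(control_ratings, intervention_ratings):
    """Return intervention mode minus control mode for each criterion."""
    return {
        criterion: mode(intervention_ratings[criterion]) - mode(control_ratings[criterion])
        for criterion in control_ratings
    }


diffs = mode_differences(control, intervention)
```

A positive value means the intervention's most common rating was higher than the control's; zero or above on every criterion would correspond to "never had worse mode scores".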
Semi-Structured Interviews • So the winner is GoPubMed/GoGene • COHSE was rated the most usable • What more could we want? • Well… • Critiques in GoPubMed/GoGene interviews were about the details • Critiques in COHSE/CORESE interviews were about being able to use the systems at all • It turned out that, at first, some users could not tell control from intervention! • When asked for critiques of COHSE or CORESE, users gave abundant detail… about NeLI! • Yes, but what about COHSE? “Those awful little boxes? They were really distracting, I didn’t really understand what they were.” • Presentations explaining the SWBs improved users’ understanding
Validation • We were expecting discrepancy between logs, questionnaires, and interviews • True for COHSE’s findability ratings • Workshop users rated it as adequate or good • Logs showed that none of these users had found the answer • Triangulation revealed discrepancies in plausible results • Otherwise users were generally consistent • We suspected one user of giving fake answers because she was exceptionally positive in her questionnaires and interview • Task logs showed that she was one of the fastest (1-2 min per task) • …but 2 others were faster! • Logs showed that she activated 4 link boxes • …matching the median for all respondents • Logs showed that she viewed only 1 external page • …but some users didn’t view any and of those who did, 1 page was the mode • Triangulation validated suspicious results
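The cross-check described on this slide, comparing one respondent's logged behaviour with the group's median and mode, might look something like the sketch below. The participant ids, counts, and the helper `is_typical` are hypothetical, and requiring an exact match with the median is a deliberately simple criterion chosen for the example rather than the study's own rule.

```python
from statistics import median, multimode

# Hypothetical per-participant log summaries; invented for illustration.
link_boxes = {"p1": 2, "p2": 4, "p3": 4, "p4": 6, "p5": 5}
external_pages = {"p1": 0, "p2": 1, "p3": 1, "p4": 2, "p5": 1}


def is_typical(participant, boxes, pages):
    """Check whether a participant's logged usage matches the group's central values."""
    # Link boxes activated: compare against the median over all respondents.
    boxes_ok = boxes[participant] == median(boxes.values())
    # External pages: compare against the mode among those who viewed any page.
    viewers = [count for count in pages.values() if count > 0]
    pages_ok = pages[participant] in multimode(viewers)
    return boxes_ok and pages_ok


typical_p2 = is_typical("p2", link_boxes, external_pages)
typical_p4 = is_typical("p4", link_boxes, external_pages)
```

A respondent whose questionnaire answers look suspiciously positive but whose log profile is typical, as with `p2` in this invented data, would be treated as plausible rather than discarded.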
Completeness • Logs showed that interviewees who spoke negatively about COHSE often had spent a long time on it • Longer than 5 minutes • Longer than they spent on the control platform • Several users spent more time on GoGene than on PubMed or the extended GoPubMed, but: • Said GoGene was their favourite • Rated it highly on the questionnaires • Triangulation shows the whole picture • Faster does not imply better • Slower does not imply worse
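One way to surface cases where speed and preference disagree is to compare, per participant, the system they spent least time on with the system they named as their favourite. The sketch below assumes a hypothetical data layout (seconds per system from the logs, a stated favourite from the questionnaire or interview) and an invented helper name; none of these values come from the study.

```python
# Hypothetical time on each system in seconds, taken from logs.
time_on_system = {
    "p1": {"control": 120, "intervention": 300},
    "p2": {"control": 200, "intervention": 150},
    "p3": {"control": 100, "intervention": 400},
}
# Hypothetical stated favourite, taken from questionnaire or interview.
favourite = {"p1": "intervention", "p2": "intervention", "p3": "control"}


def speed_preference_mismatches(times, favourites):
    """Return participants whose fastest system is not their stated favourite."""
    mismatches = []
    for participant, per_system in times.items():
        fastest = min(per_system, key=per_system.get)
        if fastest != favourites[participant]:
            mismatches.append(participant)
    return mismatches


mismatched = speed_preference_mismatches(time_on_system, favourite)
```

Participants in the returned list are the ones for whom log data alone would give the wrong impression, so they are natural candidates for follow-up interviews.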
Discussion • GoPubMed/GoGene workshop confirmed positive impressions • CORESE workshop confirmed negative questionnaire results • GoPubMed/GoGene workshop also confirmed: • That problems with this SWB were the most trivial • That somewhat higher questionnaire results masked dramatically better user experiences • Impressions that COHSE was more usable were quashed by contact with users at workshop • Severity of problems would have gone undetected without interviews • Low number of interviews means triangulation was not complete • Recruitment difficult given time pressures on user base • Workshops are resource-intensive • Future work: carefully sample a subset for interview • Time constraints prevented gathering of observational data in situ • Future work: use video and/or eye tracking software
Conclusion • We have developed a method of triangulating quantitative and qualitative data in user-centred evaluation of SWBs • This addresses a need for greater attention to a technique which is essential for accurate interpretation of data • Having applied our evaluation framework we triangulated: • Quantitative data from the web server logs and from questionnaires • Qualitative data from semi-structured interviews eliciting users’ opinions on matters they identified as important
Conclusion • Triangulation was indispensable for an accurate view of the results • Log data gave system speed • Questionnaires and interviews gave the meaning of the log data • Log data showed usage of semantic links • Log data showed whether users found the answers • Questionnaires and interviews revealed discrepancies between what users said and what they did • Questionnaires showed system intuitiveness • Only the interviews showed the full significance of the questionnaire results • Only triangulation could answer the ultimate questions about user satisfaction • If any one data source had been left out, the results could have been misinterpreted