A Method for a Comparative Evaluation of Semantic Browsing Sami Lini – Research Intern at the DERI, Galway – July 2008
Table of contents • Introduction • The dataset (Introduction, Ontology development, Data sources) • The interfaces (Introduction, The Blacklight Project) • The evaluation protocol (The tasks, The measures) • Conclusion
Introduction • What is faceted browsing? • Browsing a dataset by applying constraints classified into categories and sub-categories (facets) • Why for the Semantic Web? • Data described by several variables • A good way to narrow down large datasets
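To make the idea concrete, here is a minimal sketch, not taken from the slides, of faceted filtering over a small list of book records; the field names and data are illustrative only.

```python
# Minimal illustration of faceted browsing: each facet is a field, and each
# chosen constraint keeps only the records matching that facet value.
books = [
    {"title": "Dune", "language": "English", "decade": "1960s", "subject": "Science fiction"},
    {"title": "Le Petit Prince", "language": "French", "decade": "1940s", "subject": "Fiction"},
    {"title": "A Brief History of Time", "language": "English", "decade": "1980s", "subject": "Science"},
]

def facet_values(records, facet):
    """List the available values (and counts) for one facet."""
    counts = {}
    for r in records:
        counts[r[facet]] = counts.get(r[facet], 0) + 1
    return counts

def refine(records, **constraints):
    """Apply facet constraints, narrowing the result set step by step."""
    return [r for r in records if all(r.get(f) == v for f, v in constraints.items())]

print(facet_values(books, "language"))                 # {'English': 2, 'French': 1}
print(refine(books, language="English", decade="1980s"))
```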
Introduction • Why a Human-Computer Interaction (HCI) evaluation? • Semantic Web "dynamic" data treatment (domain-independent) • HCI challenges • Interface design: • Exploit the full potential of the SW • Cognitive load (memory, attention…) • Explain what the Semantic Web empowers users to do • No benchmark dataset, no standard evaluation protocol (unlike TREC)
Introduction • Requirements • Why a books dataset? • Exhaustive enough • General topic • Free to use • Allows faceted browsing
The dataset • Ontology building issues: • Use existing ontologies • Take into account as much information as possible • Find a way to match the different datasets • Solution: mapping on ISBN/LCCN values • ISBN: International Standard Book Number • LCCN: Library of Congress Control Number • One book = several ISBNs and LCCNs (different editions)
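A minimal sketch, not from the slides, of the kind of ISBN/LCCN-keyed matching described above; the identifier normalisation and the record structure are assumptions.

```python
# Group records from different sources under normalized ISBN/LCCN keys,
# so that several editions/descriptions of the same book can be matched.
def normalize(identifier):
    """Keep only digits and 'X' (ISBN-10 check digit); drop hyphens and spaces."""
    return "".join(ch for ch in identifier.upper() if ch.isdigit() or ch == "X")

index = {}  # normalized ISBN or LCCN -> list of source records

def add_record(record):
    for key in record.get("isbns", []) + record.get("lccns", []):
        index.setdefault(normalize(key), []).append(record)

add_record({"source": "Book-Crossing", "isbns": ["0-441-17271-7"], "lccns": []})
add_record({"source": "OpenLibrary",   "isbns": ["0441172717"],    "lccns": []})

print(index["0441172717"])  # both records end up under the same key
```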
The dataset • Book-Crossing dataset • CSV dataset: user IDs, book ratings, ISBN values • C++ script to convert it into RDF
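The project used a C++ script for this conversion; below is a rough Python/rdflib equivalent for illustration. The property URIs, the column order and the file encoding are assumptions, not the project's actual schema.

```python
# Sketch of the CSV -> RDF conversion for the Book-Crossing ratings file.
import csv
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/books#")   # placeholder vocabulary
g = Graph()

with open("BX-Book-Ratings.csv", newline="", encoding="latin-1") as fh:
    reader = csv.reader(fh, delimiter=";")
    next(reader, None)                        # skip the header row, if present
    for user_id, isbn, rating in reader:
        review = EX[f"rating/{user_id}/{isbn}"]
        g.add((review, EX.ratedBy, EX[f"user/{user_id}"]))
        g.add((review, EX.aboutBook, EX[f"isbn/{isbn}"]))
        g.add((review, EX.ratingValue, Literal(int(rating))))

g.serialize(destination="book-crossing.rdf", format="xml")
```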
[Diagram: the books ontology with the Book-Crossing data mapped in]
The dataset • OpenLibrary.org • Library of Congress MARC records dataset (books: title, author, publisher, date, categories…) • MARC records: MAchine-Readable Cataloguing records • Binary files • Standard way of classifying books in digital libraries • Issues: • MARC records contain either ISBN or LCCN values • Many languages = many different alphabets → character conversion issues
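As an illustration (not the project's actual tooling), the standard MARC fields can be read with pymarc: field 020$a holds the ISBN, 010$a the LCCN, 245$a the title.

```python
# Read Library of Congress MARC records and pull out ISBN, LCCN and title.
from pymarc import MARCReader

with open("records.mrc", "rb") as fh:
    for record in MARCReader(fh):
        if record is None:          # skip records pymarc could not decode
            continue
        isbns  = [v for f in record.get_fields("020") for v in f.get_subfields("a")]
        lccns  = [v for f in record.get_fields("010") for v in f.get_subfields("a")]
        titles = [v for f in record.get_fields("245") for v in f.get_subfields("a")]
        print(titles, isbns, lccns)
```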
[Diagram: the books ontology, now integrating Book-Crossing and OpenLibrary]
The dataset • LibraryThing.com • Web API: ISBN value → XML with the matching ISBN & LCCN values • XSL transformation to convert it into RDF
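The exact LibraryThing endpoint and the project's stylesheet are not given in the slides; the sketch below only shows the fetch-then-transform pattern, with a hypothetical URL and stylesheet name.

```python
# Query a web API for an ISBN, then turn the XML response into RDF/XML
# with an XSL stylesheet (URL and stylesheet below are placeholders).
from urllib.request import urlopen
from lxml import etree

isbn = "0441172717"
api_url = f"https://www.librarything.com/api/some_lookup/{isbn}"   # hypothetical endpoint

response_xml = etree.parse(urlopen(api_url))
transform = etree.XSLT(etree.parse("librarything_to_rdf.xsl"))      # hypothetical stylesheet
rdf_doc = transform(response_xml)

with open(f"{isbn}.rdf", "wb") as out:
    out.write(etree.tostring(rdf_doc, pretty_print=True))
```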
[Diagram: the books ontology, now integrating Book-Crossing, OpenLibrary and LibraryThing]
The dataset • The RDF Book Mashup • RDF dataset: contains additional book information from Amazon.com • We crawl all instances of foaf:Person to gather further information about authors
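As a small illustration (not the project's crawler), once some Book Mashup RDF has been fetched, rdflib can list every foaf:Person instance; the local dump file name is assumed.

```python
# List every foaf:Person in a crawled RDF snapshot, with its foaf:name values.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

FOAF = Namespace("http://xmlns.com/foaf/0.1/")

g = Graph()
g.parse("book-mashup-snapshot.rdf", format="xml")   # assumed local dump of crawled RDF

for person in g.subjects(RDF.type, FOAF.Person):
    names = list(g.objects(person, FOAF.name))
    print(person, names)
```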
[Diagram: the books ontology, now integrating Book-Crossing, OpenLibrary, LibraryThing and the Book Mashup]
The dataset • [Diagram: the final ontology, linking the four data sources: Book-Crossing, OpenLibrary, LibraryThing and the RDF Book Mashup]
The interfaces • Mandatory criteria: • Faceted browsing • Ability to handle the same dataset as SWSE (dataset bias) • Dataset bias: Book-Crossing dataset (≈ 130,000 book IDs) + additional information (ISBN/LCCN, categories) = SWSE Books dataset → need a way to index the SWSE Books dataset in the compared interface
The interfaces • The Blacklight Project • Properties: • Faceted and keyword search • Indexes MARC records → RDF to MARC conversion needed
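A sketch of the reverse direction, building minimal MARC records from RDF book descriptions so Blacklight can index them. The property URIs and graph layout are assumptions, and the Subfield objects follow the pymarc ≥ 5 API.

```python
# Turn each RDF book description into a minimal MARC record (245 = title, 020 = ISBN).
from pymarc import Record, Field, Subfield, MARCWriter
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/books#")   # placeholder vocabulary
g = Graph()
g.parse("swse-books.rdf", format="xml")       # assumed RDF dump of the books dataset

with open("swse-books.mrc", "wb") as out:
    writer = MARCWriter(out)
    for book in set(g.subjects(EX.isbn)):
        title = str(next(g.objects(book, EX.title), ""))
        isbn = str(next(g.objects(book, EX.isbn), ""))
        record = Record()
        record.add_field(Field(tag="245", subfields=[Subfield(code="a", value=title)]))
        record.add_field(Field(tag="020", subfields=[Subfield(code="a", value=isbn)]))
        writer.write(record)
    writer.close()
```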
The interfaces • [Screenshot: the Blacklight interface, showing the keyword search box and the facets with their facet values]
The evaluation protocol • Tasks • User-based: 4 tasks with subtasks, inspired by existing tasks from digital-library and faceted-browsing evaluations • First 3 tasks: scenario-based • Last task: based on the AOL query dataset (link with the automatic evaluation), to evaluate different types of queries: directed search / simple browse / complex browse
The evaluation protocol • Tasks • User-based • Task 3: Gather materials for an essay about French history. Complete 4 subtasks, ranging from very specific to more open-ended: • find books about French history written in English; • choose the decade for which the collection seems to have the most books about history; • find all books by an author of your choice who published books during the 1980s; • find another U.S. writer who wrote about Charles de Gaulle in a different way.
The evaluation protocol • User-centred evaluation procedure • Consent form • Questionnaire (demographic questions) • Performing tasks on Blacklight/SWSE (inter-interface bias) after reading the written tasks (inter-user bias) • Questionnaire (System Usability Scale, about Blacklight) • Performing tasks on SWSE/Blacklight after reading the written tasks • Questionnaire (System Usability Scale, about SWSE) • Questionnaire about overall preferences and suggestions
The evaluation protocol • Tasks • Automatic • Automatically crawl both interfaces according to a rating table and heuristics → performance evaluation • Use of ratings (faceted browsing) (e.g. books with a better rating than…) • Use of the AOL query dataset (keyword search / faceted browsing) (e.g. “books on managing family home work school children social life and time for me”)
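A minimal sketch of what such an automatic evaluation loop could look like; the client object and its search/refine methods are hypothetical, not the project's harness.

```python
# Replay queries against one interface, recording how many refinement steps
# ("clicks") and how much time each query takes.
import time

def evaluate(client, queries):
    """client.search(keywords) and client.refine(result, facet, value) are assumed methods."""
    results = []
    for q in queries:
        start = time.perf_counter()
        result, clicks = client.search(q["keywords"]), 0
        for facet, value in q.get("refinements", []):   # driven by the rating table / heuristics
            result = client.refine(result, facet, value)
            clicks += 1
        results.append({"query": q["keywords"], "clicks": clicks,
                        "seconds": time.perf_counter() - start,
                        "hits": len(result)})
    return results
```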
The evaluation protocol • Measures • User-based • Objective: • Time taken • Clicks • Search terms used • Success score • Subjective: • Time as estimated by the user • Automatic • “Clicks” • Response time
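One possible way to group the user-based measures listed above into a single per-task record; the field names are ours, not the project's logging format.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TaskMeasures:
    interface: str                        # "Blacklight" or "SWSE"
    task_id: str
    time_taken_s: float                   # objective: measured duration
    clicks: int                           # objective: number of clicks
    search_terms: List[str] = field(default_factory=list)
    success_score: float = 0.0            # objective: task success
    perceived_time_s: float = 0.0         # subjective: duration estimated by the user
```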
Conclusion • What has been done: • Large benchmark dataset • Comparative evaluation methodology • Blacklight up and running • Limits: • Tasks inspired by existing work (no standard evaluation protocol exists) • Blacklight: a specialised, domain-specific interface • Interface bias