Creating and Sharing Structured Semantic Web Contents through the Social Web

Creating and Sharing Structured Semantic Web Contents through the Social Web (Main Evaluation) • Aman Shakya • Advisor: Prof. Hideaki Takeda • Sub-advisors: Assoc. Prof. Nigel Collier, Assoc. Prof. Kenro Aihara

Outline • Introduction • Social Semantic Web • State-of-art and Problems • Proposed approach • The StYLiD system • Concept consolidation • Concept grouping • Evaluation • Practical applications • Conclusions

Introduction

Background • Information Sharing • Information publishing • Understandable semantics • Information dissemination • Shared information • Better utilization → Increased value • Shared information put together • Valuable knowledge

Social Web and Web 2.0 • Easy to publish, understand and use • Information sharing platform • User generated contents • Connecting people • Collaboration • Mass participation – Power of People • Wisdom of the crowds

Current Limitations and Needs • Data processing and automation • Unstructured data only for humans • Interoperability • Sharing data across different applications • Integration • Combining data from different applications

The Semantic Web • Web of Structured Data • Machine understandable semantics • Ontologies • Represent Conceptualizations of things • Consensus and common formats • Enables • Automated processing • Interoperation and Integration • Effective search and browsing

Challenges • Difficult to publish on the Semantic Web • Wide variety of data to share • Long Tail of information domains (Huynh et al., 2007) • Not enough ontologies • Ontology creation is a difficult process • Goal: to enable people to easily share a wide variety of semantically structured data

Social Semantic Web • Social software + Semantic Web • Web 3.0 • [Figure: Social Semantic Web positioned by social connectivity and information connectivity; adapted from (Decker, 2005)]

State-of-Art: Social Semantic Web • Structured content creation on the Social Semantic Web • Direct Structured Contents • Instance Data Creation: Semantic Blogging, Semantic Bookmarking, Semantic Desktop, Semantic Annotation • Ontology + Instance Data Creation: Semantic Wikis, Collaborative Ontology Creation • Derived Structured Contents • Semantification of Social Data: Data Exporters, Scrapers, Semantics of Tags, Semantics from Text, Emergent Semantics

Collaborative Knowledge Base Creation • Knowledge base = ontology + instance data • [Figure: users connected to a single collaborative knowledge base]

Collaborative Knowledge Base Creation Systems

Problems • Complexity and learning curve • Powerful collaborative systems difficult for ordinary people • Difficult to create perfect concept definitions and ontologies • Difficult to accommodate all requirements • Strict constraints can make the model rigid • Existence of multiple conceptualizations • Different perspectives or contexts • Difficulty of collaboration and consensus

Proposed Approach

Proposed Collaborative Knowledge Base Creation • [Figure: three users, each with a local KB, connected to the collaborative knowledge base]

Overview of Proposed Approach • [Figure. Labels: User Community; Social Platform for Structured Data Authoring; Structured Data Collection; Concepts; Instances; Structured Linked Data; Concept Consolidation; Schema Alignment; Concept Grouping; Grouped concepts; Emerging Lightweight Ontologies; Browsing, Searching, Services]

StYLiD: Structure Your own Linked Data (http://www.stylid.org) • Social software for sharing a wide variety of structured data • Users freely define their own concepts • Easy for ordinary people • Consolidate multiple concept schemas • Group and organize similar concepts • Popular, evolving concept definitions

Creating a new Concept • [Screenshot: “Hotel” concept form. Callouts: List of Attributes; Description; Suggested Value Range; Or Reuse / Modify existing Concept]

Instance Data • [Screenshot: “Shinjuku Prince Hotel” instance form. Callouts: Literal value; Pick value from suggested range; Resource URI; External URI; Multiple values]

Concept Consolidation • Hotel 1: Name, Amenities, Capacity, Contact, Price, Access, Rating • Hotel 2: Name, Facilities, No. of rooms, Phone-number, Single room price, Double room price, Nearest station, Category, Address • Hotel 3: Name, Price, Rating, City, Country, Near-by attractions • Hotel 4: Name, Phone-number, Zip-code, Latitude, Longitude, No. of stories • Relations between attributes: same; synonymous / different labels; different contexts / perspectives; many-to-one; complementary

Hotel (Consolidated Concept) • Name • Facilities • Capacity • Contact • Single room price • Double room price • Access • Rating • Address • Zip-code • Latitude • Longitude • Near-by attractions • No. of stories

Concept Consolidation • A concept consolidation C is defined as a triple <C̄, S, A> where • C̄ - consolidated concept • S - set of constituent concepts {C1, C2, …, Cn} • A is the attribute alignment between C̄ and S • Based on the Global-as-View (GAV) approach for data integration (Lenzerini, 2002) • Global schema defined as views on source schemas • Consolidated concept with consolidated attributes • aligned to source concept attributes as views

Concept Consolidation • [Figure: attributes of the consolidated concept shown as views over the attributes of the constituent concepts in S; the aligned( , ) relations between them make up the alignment A]

Concept Consolidation • Consolidated view of instances • Translation of instances • From one conceptualization to another • Query Unfolding (Advantage of GAV over LAV) • Queries over the consolidated concept (in terms of its attributes) are unfolded into queries over {C1, C2, …, Cn} • Using alignment A • Union of results • Translation of queries
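The consolidated view and instance translation described on this slide can be sketched as follows. This is a minimal illustration, not StYLiD's implementation: the `Consolidation` class, its method names, the attribute pairings between "Hotel 1" and "Hotel 2", and the sample instance are all assumptions made for the example, and only one-to-one alignments are handled (many-to-one relations such as a single price versus single/double room prices are left out).

```python
from dataclasses import dataclass, field


@dataclass
class Consolidation:
    """Hypothetical container for a consolidated concept, its sources, and the alignment."""
    name: str
    sources: list
    # alignment[consolidated_attribute][source_concept] = attribute name in that source
    alignment: dict = field(default_factory=dict)

    def to_consolidated(self, source, instance):
        """Express one source instance in the consolidated attribute vocabulary."""
        row = {}
        for cons_attr, per_source in self.alignment.items():
            src_attr = per_source.get(source)
            # Attributes the source does not define stay empty in the unified view.
            row[cons_attr] = instance.get(src_attr) if src_attr else None
        return row

    def translate(self, src, dst, instance):
        """Translate an instance from one conceptualization to another via the consolidated view."""
        consolidated = self.to_consolidated(src, instance)
        out = {}
        for cons_attr, per_source in self.alignment.items():
            dst_attr = per_source.get(dst)
            if dst_attr and consolidated[cons_attr] is not None:
                out[dst_attr] = consolidated[cons_attr]
        return out


# Assumed one-to-one pairings, loosely based on the Hotel 1 / Hotel 2 schemas.
hotel = Consolidation(
    name="Hotel",
    sources=["Hotel 1", "Hotel 2"],
    alignment={
        "Name": {"Hotel 1": "Name", "Hotel 2": "Name"},
        "Facilities": {"Hotel 1": "Amenities", "Hotel 2": "Facilities"},
        "Contact": {"Hotel 1": "Contact", "Hotel 2": "Phone-number"},
        "Address": {"Hotel 2": "Address"},
    },
)

# Hypothetical instance entered under the "Hotel 1" schema.
h1 = {"Name": "Shinjuku Prince Hotel", "Amenities": "restaurant", "Contact": "front desk"}

unified = hotel.to_consolidated("Hotel 1", h1)
as_hotel2 = hotel.translate("Hotel 1", "Hotel 2", h1)
```

Applying `to_consolidated` to instances from every source yields rows with the same columns, which is what allows a single homogeneous table over all constituent concepts.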

Concept Cloud • [Screenshot callouts: Consolidated concept; Sub-Cloud]

Experiment on Conceptualization • Hypothesis • Multiple conceptualizations by different people for the same thing can be consolidated • Methodology • Participants given short text passages (6 participants) • List facts structured as an (Attribute, Value) table • All concept schemas aligned manually • [Figure label: Concept schema]

Observations • [Chart: types of alignment relations found] • [Chart: attribute label similarity]

Remarks • People can express their conceptualizations in terms of schema • Different people have different conceptualizations • No one covers all possible attributes • Conceptualizations overlap significantly • Most parts can be aligned • Most have simple alignment relations • Multiple conceptualizations can be consolidated

Alignment of Concept Schemas • Attribute alignments suggested automatically • Alignment API implementation (with WordNet extension) (Euzenat, 2004) • Community-supported alignment • Human intelligence + Machine intelligence • Alignments are represented and saved • Alignment ontology (Hughes and Ashpole, 2004) • Alignment API alignment specification language (Euzenat et al., 2004) • Other formats: C-OWL, SWRL, OWL axioms, XSLT, SEKT-ML and SKOS • Incremental alignment (maintained collaboratively) • A Unified View • Consolidated concept with consolidated attributes • Homogeneous table of data

Semi-automatic Schema Alignment • [Screenshot callouts: Two Hotel concepts; Consolidated attributes]

Consolidated Structured Search • Find all hotels with location “Tokyo” and type “luxury” • Search on the consolidated concept • Aligned attributes (Hotel 1 ↔ Hotel 2): location ↔ address, type ↔ category
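The search on this slide can be read as query unfolding: the query over the consolidated attributes is rewritten per source concept using the alignment, and the results are unioned. The sketch below is illustrative only. The function names, the in-memory data, the hotel names, and the case-insensitive substring matching rule are assumptions; the slide gives only the two attribute pairs.

```python
# Alignment taken from the slide: consolidated attribute -> attribute name per source concept.
ALIGNMENT = {
    "location": {"Hotel 1": "location", "Hotel 2": "address"},
    "type": {"Hotel 1": "type", "Hotel 2": "category"},
}

# Hypothetical instances stored under each user's own schema.
DATA = {
    "Hotel 1": [
        {"name": "A", "location": "Tokyo", "type": "luxury"},
        {"name": "B", "location": "Osaka", "type": "luxury"},
    ],
    "Hotel 2": [
        {"name": "C", "address": "Shinjuku, Tokyo", "category": "Luxury"},
        {"name": "D", "address": "Tokyo", "category": "budget"},
    ],
}


def unfold(query, alignment, sources):
    """Rewrite a query over consolidated attributes into one query per source concept."""
    unfolded = {}
    for source in sources:
        rewritten = {}
        for attr, wanted in query.items():
            src_attr = alignment.get(attr, {}).get(source)
            if src_attr is None:
                rewritten = None  # this source has no aligned attribute, so it is skipped
                break
            rewritten[src_attr] = wanted
        if rewritten is not None:
            unfolded[source] = rewritten
    return unfolded


def search(query, alignment, data):
    """Run each unfolded query on its source and return the union of matching instances."""
    results = []
    for source, sub_query in unfold(query, alignment, data.keys()).items():
        for instance in data[source]:
            # Assumed matching rule: case-insensitive substring match on every condition.
            if all(str(want).lower() in str(instance.get(attr, "")).lower()
                   for attr, want in sub_query.items()):
                results.append((source, instance["name"]))
    return results


query = {"location": "Tokyo", "type": "luxury"}
per_source = unfold(query, ALIGNMENT, DATA.keys())
hits = search(query, ALIGNMENT, DATA)
```

Because the consolidated attributes are defined as views over the source attributes, the rewrite is a plain substitution of attribute names, which is the practical advantage of the GAV formulation noted earlier.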

Concept Grouping • Concept Similarity • ConceptSim(C1, C2) = w1*NameSim(N1, N2) + w2*SchemaSim(S1, S2) • NameSim • WordNet-based similarity - Lin’s algorithm (1998) • Levenshtein distance • SchemaSim • Average similarity of best matching pairs of attributes • Calculate ConceptSim between all pairs of concepts • Group similar concepts above threshold
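A small sketch of the weighted similarity and threshold grouping follows. Several parts are assumptions rather than the method used in the talk: NameSim uses only a normalized Levenshtein distance (the WordNet-based Lin measure is omitted), SchemaSim is replaced by a stand-in that counts exactly matching attribute names, and grouping is done by merging every pair at or above the threshold into connected groups. The default weights (0.7, 0.3) and threshold (0.8) are the values quoted later in the evaluation slide; the sample concepts are invented.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def name_sim(n1, n2):
    """Levenshtein distance normalized to a similarity in [0, 1]."""
    n1, n2 = n1.lower(), n2.lower()
    longest = max(len(n1), len(n2))
    return 1.0 if longest == 0 else 1 - levenshtein(n1, n2) / longest


def exact_schema_sim(attrs1, attrs2):
    """Stand-in for SchemaSim: only identical attribute names count as matches."""
    s1, s2 = set(attrs1), set(attrs2)
    if not s1 and not s2:
        return 1.0
    return 2 * len(s1 & s2) / (len(s1) + len(s2))


def concept_sim(c1, c2, w1=0.7, w2=0.3):
    """ConceptSim = w1 * NameSim + w2 * SchemaSim for concepts given as (name, attributes)."""
    (name1, attrs1), (name2, attrs2) = c1, c2
    return w1 * name_sim(name1, name2) + w2 * exact_schema_sim(attrs1, attrs2)


def group_concepts(concepts, threshold=0.8, w1=0.7, w2=0.3):
    """Merge concepts whose pairwise ConceptSim reaches the threshold (union-find)."""
    parent = list(range(len(concepts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(concepts)), 2):
        if concept_sim(concepts[i], concepts[j], w1, w2) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, (name, _) in enumerate(concepts):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


# Hypothetical user-defined concepts.
concepts = [
    ("Recipe", ["ingredients", "steps"]),
    ("Recipes", ["ingredients", "steps", "author"]),
    ("Hotel", ["name", "price"]),
]
groups = group_concepts(concepts)
```

With these inputs, "Recipe" and "Recipes" score 0.7 × (1 − 1/7) + 0.3 × 0.8 = 0.84 and fall into one group, while "Hotel" stays on its own.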

Schema Similarity • Calculate NameSim for all pairs of attributes to create an n1 × n2 matrix M = [NameSim(a1, a2)], a1 ∈ A1, a2 ∈ A2 • Find best matching pairs by applying the Hungarian algorithm to M (Kuhn, 1955; Munkres, 1957) • Calculate the matching average: SchemaSim(S1, S2) = 2 × (sum of similarities of best matching pairs) / (|A1| + |A2|) • Adapted from semantic similarity between sentences (Simpson and Dao, 2005) • [Figure: schemas S1 and S2 with attribute sets A1 and A2]
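The matching average can be sketched in a few lines. Two substitutions are made here and should be read as assumptions: the best one-to-one pairing is found by exhaustive search over permutations rather than by the Hungarian algorithm (both return the same optimal total, but exhaustive search is only practical for small schemas), and the default NameSim is the standard library's `difflib` ratio rather than the WordNet or Levenshtein measures named in the talk. SciPy's `linear_sum_assignment` solves the same assignment problem efficiently if a dependency is acceptable.

```python
from difflib import SequenceMatcher
from itertools import permutations


def name_sim(a, b):
    """Stand-in string similarity in [0, 1] (difflib ratio, an assumption for this sketch)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity_matrix(attrs1, attrs2, sim=name_sim):
    """Matrix M with one row per attribute of the first schema, one column per attribute of the second."""
    return [[sim(a1, a2) for a2 in attrs2] for a1 in attrs1]


def best_matching_total(matrix):
    """Largest total similarity over one-to-one pairings of rows and columns (exhaustive search)."""
    if not matrix or not matrix[0]:
        return 0.0
    n_rows, n_cols = len(matrix), len(matrix[0])
    if n_rows > n_cols:
        # Transpose so that every row (the smaller side) receives a distinct column.
        matrix = [list(col) for col in zip(*matrix)]
        n_rows, n_cols = n_cols, n_rows
    return max(
        sum(matrix[r][c] for r, c in enumerate(cols))
        for cols in permutations(range(n_cols), n_rows)
    )


def schema_sim(attrs1, attrs2, sim=name_sim):
    """Matching average: 2 * (total similarity of best pairs) / (|A1| + |A2|)."""
    if not attrs1 and not attrs2:
        return 1.0
    total = best_matching_total(similarity_matrix(attrs1, attrs2, sim))
    return 2 * total / (len(attrs1) + len(attrs2))


# A case where greedily taking the largest cell first (0.9) would give a worse total.
example = [[0.9, 0.8],
           [0.85, 0.1]]
best = best_matching_total(example)
```

Dividing by |A1| + |A2| rather than by the number of matched pairs penalizes attributes left unmatched, so a schema with many extra attributes scores lower even when its shared attributes match perfectly.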

Visualization of Concept Grouping • [Figure: concept groups visualized with Cytoscape]

Experiments on Freebase Data • Purpose • Evaluate automatic schema alignment • Evaluate proposed concept grouping method • Observations about user-defined concepts • Freebase • Community-driven database of world’s information • User-defined Types – concept schemas • Queried out (May 20, 2008) • Cleaning • Filter out test types, stop-words, types without instances

Observations • After cleaning • 1,412 concepts • 500 users who defined concepts • People want to share a wide variety of data • People define their own concept schemas • Most people only define few concepts (1-5) • Long tail of information types

Freebase Concept Consolidation • Concepts with same name, synonyms, morphological variants • 57 consolidated concepts formed • Multiple versions of a concept by different users • Up to 6 versions of the same concept • Same user also defines multiple versions • Alignments suggested automatically • 51 alignment relations (44 aligned attribute sets) • Human judgement • Precision 88.24% • Recall 67.16%

Concept Consolidation Example • Aligned attribute sets (adapted from Freebase) • Constituent concepts {Recipe(user1), Recipe(user2), Recipes(user3) ….}, labeled r1, r2, r3 • Consolidated concept - Recipe • Consolidated attributes • {r1#ingredient, r2#ingredients, r3#materials} • {r1#steps, r2#instructions} • r3#directions • r2#tools_required • r3#taste • r3#author ……

Evaluation of Concept Grouping • ConceptSim(C1, C2) = w1*NameSim(N1, N2) + w2*SchemaSim(S1, S2) • Concept grouping with different thresholds (w1 = 0.7, w2 = 0.3) • Concept grouping with different weights (threshold = 0.8)

Emergence of Lightweight Ontologies • Concepts contributed by community • Concept consolidation • Concept grouping • Popularity of concepts (as in Tag clouds) • Common vocabulary for structured information sharing • Conceptual schemas (class/property) • Informal organization by similarity

Informal Lightweight Ontology source: Schaffert et al. (2005) p. 7

Evaluation

Evaluation of Usability • Hypothesis • StYLiD is more usable than Freebase (for given tasks) • Methodology • Tasks performed with StYLiD and Freebase • Task 1 - Structured data authoring • Task 2 - Concept schema creation • Task 3, 4 - Modifying and reusing concepts • Task 5 - Structured concepts and instances authoring • Task 6 - Searching • Observations • Questionnaires, screen logs, comments, etc.

Example (Task 1) Input Band – The Beatles

Participants • Total 15 participants • Including 6 without IT background • Different backgrounds • Public policy, international relations, psychology, telecommunication, networks, hotel staff, etc. • From 10 countries • Age: 22 – 43 (avg. 28.3) • Most did not know the systems before

Results • System Usability Scale (SUS) (Digital Equipment Corp.) • Average scores: StYLiD – 69.7%, Freebase – 39.3% • Enhanced Semantic MediaWiki – 54.8% (Pfisterer et al., 2008) • Aggregated results from the Tasks (score: 0-4)
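For readers unfamiliar with the scale, the standard SUS scoring rule is sketched below: ten items answered on a 1 to 5 scale, odd-numbered items contribute (response − 1), even-numbered items contribute (5 − response), and the sum is multiplied by 2.5 to give a score from 0 to 100. The talk does not describe how the questionnaires were processed, so this is the general formula rather than the exact procedure used, and the responses shown are invented for illustration.

```python
def sus_score(responses):
    """Standard SUS score (0-100) from ten responses on a 1-5 scale."""
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs exactly ten responses, each between 1 and 5")
    total = 0
    for item, response in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (response - 1) if item % 2 == 1 else (5 - response)
    return total * 2.5


def mean_sus(all_responses):
    """Average SUS score over several participants."""
    scores = [sus_score(r) for r in all_responses]
    return sum(scores) / len(scores)


# Hypothetical questionnaires from two participants.
participants = [
    [4, 2, 4, 2, 5, 1, 4, 2, 4, 3],
    [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
]
average = mean_sus(participants)
```

The first questionnaire scores 77.5 and the second 50.0, giving an average of 63.75 for this invented pair.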

Results for non-IT participants • 6 participants • SUS scores • StYLiD (71.67%), Freebase (50.42%)

Observations • StYLiD is quite usable without any training, knowledge or help • Most users preferred StYLiD to Freebase • Specifying attribute value range is not easy • Strict data type constraints can cause problems • Many people modify and reuse concepts • People try to input all data in minimum steps • Data entry can be made easier and quicker • Auto-complete mechanisms would be helpful

Comparison with some systems

Practical Applications
