Creating and Sharing Structured Semantic Web Contents through the Social Web (Main Evaluation) Aman Shakya Advisor: Prof. Hideaki Takeda Sub-advisors: Assoc. Prof. Nigel Collier Assoc. Prof. Kenro Aihara
Outline • Introduction • Social Semantic Web • State of the art and problems • Proposed approach • The StYLiD system • Concept consolidation • Concept grouping • Evaluation • Practical applications • Conclusions main evaluation
Introduction
Background • Information sharing • Information publishing • Understandable semantics • Information dissemination • Shared information • Better utilization, increased value • Shared information put together • Valuable knowledge
Social Web and Web 2.0 • Easy to publish, understand and use • Information sharing platform • User-generated content • Connecting people • Collaboration • Mass participation: the power of people • Wisdom of the crowds
Current Limitations and Needs • Data processing and automation • Unstructured data is usable only by humans • Interoperability • Sharing data across different applications • Integration • Combining data from different applications
The Semantic Web • Web of structured data • Machine-understandable semantics • Ontologies • Represent conceptualizations of things • Consensus and common formats • Enables • Automated processing • Interoperation and integration • Effective search and browsing
Challenges? • Difficult to publish on the Semantic Web • Wide variety of data to share • Long tail of information domains (Huynh et al., 2007) • Not enough ontologies • Ontology creation is a difficult process • Goal: to enable people to easily share a wide variety of semantically structured data
Social Semantic Web • Social software + Semantic Web • Web 3.0 [Figure: the Social Semantic Web combines social connectivity and information connectivity; adapted from (Decker, 2005)]
State of the Art: Social Semantic Web • Structured content creation on the Social Semantic Web • Direct structured contents • Instance data creation: semantic blogging, semantic bookmarking, semantic desktop, semantic annotation • Ontology + instance data creation: semantic wikis, collaborative ontology creation • Derived structured contents • Semantification of social data: data exporters, scrapers, semantics of tags, semantics from text, emergent semantics
Collaborative Knowledge Base Creation • Knowledge base = ontology + instance data [Figure: users contributing to a single collaborative knowledge base]
Collaborative Knowledge Base Creation Systems
Problems • Complexity and learning curve • Powerful collaborative systems are difficult for ordinary people • Difficult to create perfect concept definitions and ontologies • Difficult to accommodate all requirements • Strict constraints can make the model rigid • Existence of multiple conceptualizations • Different perspectives or contexts • Difficulty of collaboration and consensus
Proposed Approach
Proposed Collaborative Knowledge Base Creation [Figure: groups of users build their own local KBs, which together form a collaborative knowledge base]
Overview of Proposed Approach [Figure, components: user community; social platform for structured data authoring (structured data collection: concepts, instances); concept consolidation (schema alignment); concept grouping (grouped concepts); structured linked data; browsing, searching, services; emerging lightweight ontologies]
StYLiD: Structure Your own Linked Data (http://www.stylid.org) • Social software for sharing a wide variety of structured data • Users freely define their own concepts • Easy for ordinary people • Consolidates multiple concept schemas • Groups and organizes similar concepts • Popular, evolving concept definitions
“Hotel” Concept [Screenshot: creating a new concept with a description and a list of attributes, each with a suggested value range; or reusing/modifying an existing concept]
Shinjuku Prince Hotel: Instance Data [Screenshot: attribute values entered as literal values, values picked from a suggested range, resource URIs or external URIs, with multiple values allowed]
Concept Consolidation • Hotel 1: Name, Amenities, Capacity, Contact, Price, Access, Rating • Hotel 2: Name, Facilities, No. of rooms, Phone-number, Single room price, Double room price, Nearest station, Category, Address • Hotel 3: Name, Price, Rating, City, Country, Near-by attractions • Hotel 4: Name, Phone-number, Zip-code, Latitude, Longitude, No. of stories • Relations between the schemas: same attribute; synonymous (different labels); different contexts/perspectives; many-to-one; complementary
Hotel (Consolidated Concept) • Name • Facilities • Capacity • Contact • Single room price • Double room price • Access • Rating • Address • Zip-code • Latitude • Longitude • Near-by attractions • No. of stories
Concept Consolidation • A concept consolidation C is defined as a triple ⟨Ĉ, S, A⟩ where • Ĉ is the consolidated concept • S = {C1, C2, …, Cn} is the set of constituent concepts • A is the attribute alignment between Ĉ and S • Based on the Global-as-View (GAV) approach to data integration (Lenzerini, 2002) • The global schema is defined as views over the source schemas • The consolidated concept's attributes are aligned to source concept attributes as views
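A minimal sketch of the triple ⟨Ĉ, S, A⟩ as a Python data structure may make the definition concrete. The class names, the (source index, attribute) encoding of A, and the particular alignments are assumptions for illustration, not StYLiD's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """A user-defined concept schema: a name, its author, and attribute labels."""
    name: str
    owner: str
    attributes: tuple


@dataclass
class Consolidation:
    """Sketch of a consolidation triple <C_hat, S, A> (hypothetical encoding).

    consolidated -- name of the consolidated concept C_hat
    sources      -- constituent concepts S = [C1, ..., Cn]
    alignment    -- A: each consolidated attribute maps to the source
                    attributes it is a view over, as (source index, label) pairs
    """
    consolidated: str
    sources: list
    alignment: dict = field(default_factory=dict)

    def align(self, consolidated_attr, source_index, source_attr):
        """Record that a source attribute contributes to a consolidated attribute."""
        if source_attr not in self.sources[source_index].attributes:
            raise ValueError(f"{source_attr!r} is not an attribute of source {source_index}")
        self.alignment.setdefault(consolidated_attr, []).append((source_index, source_attr))

    def consolidated_attributes(self):
        """Consolidated attributes in the order they were first aligned."""
        return list(self.alignment)


# Trimmed versions of Hotel 1 and Hotel 2; the alignments are illustrative.
h1 = Concept("Hotel", "user1", ("Name", "Amenities", "Contact", "Rating"))
h2 = Concept("Hotel", "user2", ("Name", "Facilities", "Phone-number", "Category"))
hotel = Consolidation("Hotel", [h1, h2])
hotel.align("Name", 0, "Name")
hotel.align("Name", 1, "Name")
hotel.align("Facilities", 0, "Amenities")
hotel.align("Facilities", 1, "Facilities")
hotel.align("Contact", 0, "Contact")
hotel.align("Contact", 1, "Phone-number")
print(hotel.consolidated_attributes())  # ['Name', 'Facilities', 'Contact']
```

Keeping A as a map from each consolidated attribute to its source attributes follows the GAV reading: every consolidated attribute is defined as a view over the sources.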
Concept Consolidation [Figure: the triple ⟨Ĉ, S, A⟩, with each consolidated attribute of Ĉ shown as a view over aligned attributes of the constituent concepts; A = {aligned(·, ·), …}]
Concept Consolidation • Consolidated view of instances • Translation of instances • From one conceptualization to another • Query unfolding (an advantage of GAV over LAV) • Queries over Ĉ (in terms of its attributes) are rewritten as queries over {C1, C2, …, Cn} • Using alignment A • Union of results • Translation of queries
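The consolidated view of instances and translation between conceptualizations could be sketched as below, assuming a simple one-to-one alignment stored as a dictionary. The concept identifiers `hotel1`/`hotel2`, the helper names and the sample values are hypothetical.

```python
# Hypothetical alignment: consolidated attribute -> {source concept: source attribute}.
ALIGNMENT = {
    "Name": {"hotel1": "Name", "hotel2": "Name"},
    "Facilities": {"hotel1": "Amenities", "hotel2": "Facilities"},
    "Access": {"hotel1": "Access", "hotel2": "Nearest station"},
}


def to_consolidated(instance, source):
    """Consolidated view: re-key a source instance by consolidated attributes."""
    view = {}
    for cattr, by_source in ALIGNMENT.items():
        sattr = by_source.get(source)
        if sattr is not None and sattr in instance:
            view[cattr] = instance[sattr]
    return view


def translate(instance, source, target):
    """Translate an instance between conceptualizations via the consolidated concept.

    Values whose attribute has no counterpart in the target concept are dropped.
    """
    result = {}
    for cattr, value in to_consolidated(instance, source).items():
        tattr = ALIGNMENT[cattr].get(target)
        if tattr is not None:
            result[tattr] = value
    return result


hotel1_instance = {"Name": "Hotel A", "Amenities": "Wi-Fi",
                   "Access": "Shinjuku Station", "Rating": "4"}
print(translate(hotel1_instance, "hotel1", "hotel2"))
# {'Name': 'Hotel A', 'Facilities': 'Wi-Fi', 'Nearest station': 'Shinjuku Station'}
```

Routing the translation through the consolidated concept means each new source only needs an alignment to Ĉ, not to every other source.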
Concept Cloud [Figure: concept cloud, with a consolidated concept and its sub-cloud]
Experiment on Conceptualization • Hypothesis • Multiple conceptualizations of the same thing by different people can be consolidated • Methodology • Participants (6) were given short text passages • They listed the facts as an (Attribute, Value) table, i.e. a concept schema • All concept schemas were aligned manually
Observations [Charts: types of alignment relations found; attribute label similarity]
Remarks • People can express their conceptualizations as schemas • Different people have different conceptualizations • No one covers all possible attributes • Conceptualizations overlap significantly • Most parts can be aligned • Most have simple alignment relations • Multiple conceptualizations can be consolidated
Alignment of Concept Schemas • Attribute alignments suggested automatically • Alignment API implementation (with WordNet extension) (Euzenat, 2004) • Community-supported alignment • Human intelligence + machine intelligence • Alignments are represented and saved • Alignment ontology (Hughes and Ashpole, 2004) • Alignment API alignment specification language (Euzenat et al., 2004) • Other formats: C-OWL, SWRL, OWL axioms, XSLT, SEKT-ML and SKOS • Incremental alignment (maintained collaboratively) • A unified view • Consolidated concept with consolidated attributes • Homogeneous table of data
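Since SKOS is one of the listed formats, here is a sketch of writing attribute alignments as SKOS mapping triples in N-Triples syntax. `skos:exactMatch` and `skos:closeMatch` are real SKOS mapping properties; the base URI, the `hotel1#Amenities` naming scheme and the mapping from relation symbols to SKOS properties are assumptions.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
# Hypothetical mapping from alignment relation symbols to SKOS mapping properties.
RELATION_TO_SKOS = {"=": "exactMatch", "~": "closeMatch"}


def alignment_to_ntriples(cells, base="http://example.org/concept/"):
    """Serialize (entity1, entity2, relation) alignment cells as N-Triples lines."""
    lines = []
    for left, right, relation in cells:
        predicate = SKOS + RELATION_TO_SKOS[relation]
        lines.append(f"<{base}{left}> <{predicate}> <{base}{right}> .")
    return "\n".join(lines)


cells = [
    ("hotel1#Amenities", "hotel2#Facilities", "="),
    ("hotel1#Contact", "hotel2#Phone-number", "~"),
]
print(alignment_to_ntriples(cells))
```

Plain-text triples keep each alignment cell independent, which suits incremental, collaboratively maintained alignments: a new cell is just one more line.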
Semi-automatic Schema Alignment [Screenshot: two Hotel concepts aligned, producing consolidated attributes]
Consolidated Structured Search • Find all hotels with location “Tokyo” and type “luxury” • Search on the consolidated concept • Hotel 1 location ↔ Hotel 2 address • Hotel 1 type ↔ Hotel 2 category
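Query unfolding for this search might look like the following sketch: the query on the consolidated concept is rewritten into one query per constituent concept using the alignment, and the results are unioned. The sample data, the helper names and the case-insensitive substring match are assumptions.

```python
# Hypothetical alignment between the consolidated Hotel and two source concepts.
ALIGNMENT = {
    "location": {"Hotel 1": "location", "Hotel 2": "address"},
    "type": {"Hotel 1": "type", "Hotel 2": "category"},
}

# Hypothetical instances of each source concept.
DATA = {
    "Hotel 1": [{"name": "A", "location": "Tokyo", "type": "luxury"},
                {"name": "B", "location": "Osaka", "type": "luxury"}],
    "Hotel 2": [{"name": "C", "address": "Shinjuku, Tokyo", "category": "Luxury"},
                {"name": "D", "address": "Shibuya, Tokyo", "category": "budget"}],
}


def unfold(query):
    """Rewrite {consolidated attribute: value} into one query per source concept.
    A source lacking any queried attribute is skipped."""
    per_source = {}
    for source in DATA:
        local = {}
        for cattr, value in query.items():
            sattr = ALIGNMENT.get(cattr, {}).get(source)
            if sattr is None:
                break
            local[sattr] = value
        else:
            per_source[source] = local
    return per_source


def matches(instance, local_query):
    # Assumption: case-insensitive substring match on string values.
    return all(v.lower() in str(instance.get(a, "")).lower()
               for a, v in local_query.items())


def search(query):
    """Union of the results of the unfolded queries."""
    results = []
    for source, local in unfold(query).items():
        results += [(source, inst["name"]) for inst in DATA[source] if matches(inst, local)]
    return results


print(search({"location": "Tokyo", "type": "luxury"}))
# [('Hotel 1', 'A'), ('Hotel 2', 'C')]
```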
Concept Grouping • Concept similarity: ConceptSim(C1, C2) = w1*NameSim(N1, N2) + w2*SchemaSim(S1, S2) • NameSim: WordNet-based similarity (Lin's algorithm, 1998) and Levenshtein distance • SchemaSim: average similarity of the best-matching pairs of attributes • Calculate ConceptSim between all pairs of concepts • Group similar concepts whose ConceptSim is above a threshold
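A worked instance of the ConceptSim formula, using the weights (w1 = 0.7, w2 = 0.3) and threshold (0.8) that appear in the Freebase evaluation; the NameSim and SchemaSim values are hypothetical.

```latex
\mathrm{ConceptSim}(C_1, C_2) = w_1 \cdot \mathrm{NameSim}(N_1, N_2) + w_2 \cdot \mathrm{SchemaSim}(S_1, S_2)

% Hypothetical pair 1: NameSim = 1.0, SchemaSim = 0.5
0.7 \times 1.0 + 0.3 \times 0.5 = 0.70 + 0.15 = 0.85 > 0.8 \quad \text{(grouped)}

% Hypothetical pair 2: NameSim = 0.6, SchemaSim = 0.9
0.7 \times 0.6 + 0.3 \times 0.9 = 0.42 + 0.27 = 0.69 < 0.8 \quad \text{(not grouped)}
```

With these weights, two concepts with identical names are grouped only when SchemaSim exceeds 1/3 (0.7 + 0.3x > 0.8 requires x > 1/3), and no pair with NameSim ≤ 5/7 ≈ 0.71 can be grouped even with identical schemas.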
Schema Similarity • Calculate NameSim for all pairs of attributes to create an n1 × n2 matrix M = [NameSim(a, b)], a ∈ A1, b ∈ A2 • Find the best-matching pairs by applying the Hungarian algorithm to M (Kuhn, 1955; Munkres, 1957) • Calculate the matching average SchemaSim(S1, S2) = 2 × (sum of similarities of the best-matching pairs) / (|A1| + |A2|) • Adapted from semantic similarity between sentences (Simpson and Dao, 2005) [Figure: attribute sets A1 and A2 of schemas S1 and S2]
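The SchemaSim computation could be sketched in Python as follows. It builds the matrix M, finds the best-matching attribute pairs with a Hungarian algorithm (a Kuhn-Munkres implementation with row/column potentials, minimizing 1 - similarity), and takes the matching average. As an assumption, NameSim here is only a normalized Levenshtein similarity; the WordNet-based Lin measure is left out because it needs WordNet data. The ConceptSim weights default to the 0.7/0.3 used in the evaluation.

```python
def name_sim(a, b):
    """Normalized Levenshtein similarity in [0, 1], case-insensitive.
    Stand-in for NameSim: the WordNet-based (Lin) component is omitted."""
    a, b = a.lower(), b.lower()
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return 1.0 - prev[-1] / max(len(a), len(b))


def hungarian_min(cost):
    """Hungarian algorithm (Kuhn-Munkres with potentials) for an n x m cost
    matrix with n <= m. Returns {row: column} with minimum total cost."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)   # row / column potentials
    p, way = [0] * (m + 1), [0] * (m + 1)     # p[j]: row matched to column j (1-based)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:                           # grow the tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    reduced = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if reduced < minv[j]:
                        minv[j], way[j] = reduced, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:                             # augment along the path back to column 0
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return {p[j] - 1: j - 1 for j in range(1, m + 1) if p[j]}


def schema_sim(attrs1, attrs2):
    """Matching average: 2 * (sum of NameSim over best-matching pairs) / (|A1| + |A2|)."""
    if not attrs1 or not attrs2:
        return 0.0
    rows, cols = (attrs1, attrs2) if len(attrs1) <= len(attrs2) else (attrs2, attrs1)
    sim = [[name_sim(a, b) for b in cols] for a in rows]   # matrix M
    # Every row is matched once, so minimizing sum(1 - sim) maximizes sum(sim).
    match = hungarian_min([[1.0 - s for s in row] for row in sim])
    total = sum(sim[r][c] for r, c in match.items())
    return 2 * total / (len(attrs1) + len(attrs2))


def concept_sim(c1, c2, w1=0.7, w2=0.3):
    """ConceptSim = w1 * NameSim(N1, N2) + w2 * SchemaSim(S1, S2); c = (name, attrs)."""
    (n1, s1), (n2, s2) = c1, c2
    return w1 * name_sim(n1, n2) + w2 * schema_sim(s1, s2)


print(schema_sim(["Name", "Price"], ["Price", "Name", "Rating"]))  # 0.8
```

For larger schemas, `scipy.optimize.linear_sum_assignment(M, maximize=True)` solves the same assignment problem directly on the similarity matrix.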
Visualization of Concept Grouping [Figure: concept groups visualized with Cytoscape]
Experiments on Freebase Data • Purpose • Evaluate automatic schema alignment • Evaluate the proposed concept grouping method • Observe user-defined concepts • Freebase: community-driven database of the world's information • User-defined types serve as concept schemas • Data queried on May 20, 2008 • Cleaning: filtered out test types, stop-words and types without instances
Observations • After cleaning: 1,412 concepts defined by 500 users • People want to share a wide variety of data • People define their own concept schemas • Most people define only a few concepts (1-5) • Long tail of information types
Freebase Concept Consolidation • Concepts with the same name, synonyms or morphological variants were consolidated • 57 consolidated concepts formed • Multiple versions of a concept by different users • Up to 6 versions of the same concept • The same user also defines multiple versions • Alignments suggested automatically • 51 alignment relations (44 aligned attribute sets) • Evaluated by human judgement: precision 88.24%, recall 67.16%
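The reported figures can be related back to counts. Precision is correct suggestions over all 51 suggested relations, and recall is correct suggestions over all correct relations. The counts 45 and 67 below are not stated on the slide; they are inferred as the only whole numbers consistent with 88.24% precision over 51 suggestions and 67.16% recall.

```python
def precision_recall(correct, suggested, gold):
    """Precision = correct / suggested; recall = correct / gold (all true relations)."""
    return correct / suggested, correct / gold


# Counts inferred from the reported percentages, not stated on the slide.
p, r = precision_recall(correct=45, suggested=51, gold=67)
print(f"precision {p:.2%}, recall {r:.2%}")  # precision 88.24%, recall 67.16%
```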
Concept Consolidation Example: Aligned Attribute Sets (adapted from Freebase) • {Recipe (user1) = r1, Recipe (user2) = r2, Recipes (user3) = r3, …} • Consolidated concept: Recipe • Consolidated attributes • {r1#ingredient, r2#ingredients, r3#materials} • {r1#steps, r2#instructions} • r3#directions • r2#tools_required • r3#taste • r3#author • …
Evaluation of Concept Grouping • ConceptSim(C1, C2) = w1*NameSim(N1, N2) + w2*SchemaSim(S1, S2) [Charts: concept grouping with different thresholds (w1 = 0.7, w2 = 0.3); concept grouping with different weights (threshold = 0.8)]
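One plausible reading of "group similar concepts above a threshold" is to link every pair whose ConceptSim exceeds the threshold and take connected components, which this union-find sketch does. The grouping rule and the similarity scores are assumptions for illustration; the slides do not specify either.

```python
def group_concepts(names, sims, threshold=0.8):
    """Group concepts into connected components of the graph whose edges are
    the pairs with ConceptSim above the threshold (union-find)."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), score in sims.items():
        if score > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical ConceptSim scores, for illustration only.
names = ["Hotel", "Hotels", "Inn", "Recipe", "Recipes"]
sims = {("Hotel", "Hotels"): 0.93, ("Hotels", "Inn"): 0.82,
        ("Hotel", "Recipe"): 0.12, ("Recipe", "Recipes"): 0.95}
print(group_concepts(names, sims))       # [['Hotel', 'Hotels', 'Inn'], ['Recipe', 'Recipes']]
print(group_concepts(names, sims, 0.9))  # [['Hotel', 'Hotels'], ['Inn'], ['Recipe', 'Recipes']]
```

Raising the threshold splits groups, the effect the threshold experiment varies. Connected components also chain: Hotel and Inn end up together through Hotels even though no score links them directly.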
Emergence of Lightweight Ontologies • Concepts contributed by the community • Concept consolidation • Concept grouping • Popularity of concepts (as in tag clouds) • Common vocabulary for structured information sharing • Conceptual schemas (class/property) • Informal organization by similarity
Informal Lightweight Ontology [Figure source: Schaffert et al. (2005), p. 7]
Evaluation
Evaluation of Usability • Hypothesis • StYLiD is more usable than Freebase (for the given tasks) • Methodology • Tasks performed with StYLiD and Freebase • Task 1: structured data authoring • Task 2: concept schema creation • Tasks 3 and 4: modifying and reusing concepts • Task 5: structured concept and instance authoring • Task 6: searching • Observations • Questionnaires, screen logs, comments, etc.
Example (Task 1) • Input: Band (The Beatles)
Participants • 15 participants in total • Including 6 without an IT background • Different backgrounds • Public policy, international relations, psychology, telecommunication, networks, hotel staff, etc. • From 10 countries • Age: 22-43 (avg. 28.3) • Most were unfamiliar with the systems beforehand
Results • System Usability Scale (SUS) (Digital Equipment Corp.) • Average scores: StYLiD 69.7%, Freebase 39.3% • For comparison, Enhanced Semantic MediaWiki: 54.8% (Pfisterer et al., 2008) • Aggregated results from the tasks (score: 0-4)
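For reference, the standard SUS scoring rule can be written as below: ten items on a 1-5 scale, odd items contribute the response minus 1, even items contribute 5 minus the response, and the sum is multiplied by 2.5 to give a 0-100 score. The slides report these scores as percentages; the sample responses here are hypothetical.

```python
def sus_score(responses):
    """Score one SUS questionnaire (ten items, each answered 1-5) on a 0-100 scale."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects ten responses in the range 1-5")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5


def mean_sus(questionnaires):
    """Average SUS score over participants."""
    scores = [sus_score(q) for q in questionnaires]
    return sum(scores) / len(scores)


# Two hypothetical participants.
print(mean_sus([[4, 2, 4, 2, 4, 2, 4, 2, 4, 2], [3] * 10]))  # 62.5
```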
Results for non-IT participants • 6 participants • SUS scores • StYLiD (71.67%), Freebase (50.42%)
Observations • StYLiD was quite usable without any training, prior knowledge or help • Most users preferred StYLiD to Freebase • Specifying attribute value ranges was not easy • Strict data type constraints can cause problems • Many people modified and reused concepts • People tried to input all data in a minimum of steps • Data entry can be made easier and quicker • Auto-complete mechanisms would be helpful
Comparison with some systems
Practical Applications