High-Level Change Detection in the Semantic Web
Giorgos Flouris (fgeo@ics.forth.gr)
Institute of Computer Science, Foundation for Research and Technology – Hellas, Heraklion, Greece
Joint work with: Vicky Papavassiliou, Irini Fundulaki, Dimitris Kotzinos, Vassilis Christophides
World Wide Web
• The WWW (and HTML) focuses on human readability
  • Page presentation (fonts, colors, images, …)
  • Human understanding
  • Presentation, not semantical content
• Content is not formally described (for a machine to understand)
• The WWW contains documents, not data
Problems with the Current Web
• Search and access become difficult
  • Software is ignorant of the semantical content of a web page
  • Keyword search: high recall, low precision
• Terminological issues
  • Synonyms (heart disease = cardiac disease)
  • Hyponyms/hypernyms (parliament members are politicians)
• Queries on the semantical content cannot be made, e.g.:
  • Fetch articles that support B. Obama's foreign policy
  • Fetch the home pages of all members of the Greek Parliament
Semantic Web
• "The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation" (Berners-Lee et al., 2001)
• "The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries" (http://www.w3.org/2001/sw/)
• "[Semantic Web] is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners" (http://www.w3.org/2001/sw/)
Semantic Web in Practice
• Web of data, rather than documents
  • HTML for presentation
  • Semantical languages for semantical content
  • Readable and understandable by humans and machines
• Semantic Web languages, protocols, etc.
  • Web page annotation (metadata descriptions, etc.)
  • Publication of data on the Internet
  • Efficient communication and manipulation of data over the Internet
• Different applications
  • Efficient searching
  • Sharing of data (e-science, e-government, remote learning, …)
Ontologies
• Backbone of the Semantic Web
• Ontologies allow the description of data
  • Annotation and metadata regarding web pages
  • Terminological relations (synonyms, hyponyms, …)
  • Communication and description of data, ideas, beliefs
• An ontology is an explicit specification of a shared conceptualization of a domain (Gruber, 1993)
  • Precise, logical account of the intended meaning of terms, data structures, etc.
  • Common (shared) interpretation of terms
  • Formal vocabulary for information exchange (for humans and machines)
Ontologies in Practice
• Basic structures:
  • Classes (or concepts): collections of objects (e.g., Actor, Politician)
  • Properties (or roles): binary relationships between objects (e.g., started_on, member_of)
  • Instances (or individuals): objects (e.g., Giorgos, B. Obama)
• Relations between them
  • Subsumption (Parliament_Member subclass of Politician), instantiation (B. Obama instance of Politician), …
  • The allowed relations and their semantics depend on the language
• Different representation languages for ontologies
  • RDF, RDFS, DAML+OIL, OWL, OWL-DL, OWL-Lite, OWL 2, DLs, …
  • Usually triple-based
Visualization, Triples, Serialization
[Figure: visualization of the sample ontology: classes Period, Event, Onset, Birth, Existing, Stuff and Actor; properties participants and started_on; individuals G_Birth and Giorgos; edges denote subsumption or instantiation]
Triple representation:
• Define classes: [Period type Class]
• Define properties: [participants type Property], [participants domain Onset], [participants range Actor]
• Instantiate/define individuals: [G_Birth type Birth], [Giorgos type Actor], [G_Birth participants Giorgos]
• Define hierarchies: [Event subclass Period]
Serialization (RDF/XML):
    <rdfs:Class rdf:ID="Period"/>
    <rdf:Property rdf:ID="participants">
      <rdfs:domain rdf:resource="#Onset"/>
      <rdfs:range rdf:resource="#Actor"/>
    </rdf:Property>
    <Birth rdf:ID="G_Birth">
      <participants>
        <Actor rdf:ID="Giorgos"/>
      </participants>
    </Birth>
    <rdfs:Class rdf:ID="Event">
      <rdfs:subClassOf rdf:resource="#Period"/>
    </rdfs:Class>
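The triple representation maps naturally onto a set of tuples. A minimal Python sketch, assuming the slides' shorthand predicate names (type, domain, range, subclass) instead of full RDF/RDFS URIs; Triple, classes, instances_of and direct_superclasses are illustrative names, not part of any RDF library:

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One [subject predicate object] statement, in the slides' shorthand."""
    s: str
    p: str
    o: str


# The sample triples listed above, grouped as on the slide.
KB = {
    Triple("Period", "type", "Class"),            # define classes
    Triple("participants", "type", "Property"),   # define properties
    Triple("participants", "domain", "Onset"),
    Triple("participants", "range", "Actor"),
    Triple("G_Birth", "type", "Birth"),           # instantiate individuals
    Triple("Giorgos", "type", "Actor"),
    Triple("G_Birth", "participants", "Giorgos"),
    Triple("Event", "subclass", "Period"),        # define hierarchies
}


def classes(kb):
    """Resources explicitly declared as classes via [X type Class]."""
    return {t.s for t in kb if t.p == "type" and t.o == "Class"}


def instances_of(kb, cls):
    """Resources explicitly typed with cls (no RDFS inference is performed)."""
    return {t.s for t in kb if t.p == "type" and t.o == cls}


def direct_superclasses(kb, cls):
    """Classes Y for which an explicit [cls subclass Y] triple exists."""
    return {t.o for t in kb if t.p == "subclass" and t.s == cls}


print(sorted(instances_of(KB, "Actor")))  # ['Giorgos']
```

Only explicit triples are consulted; an RDFS reasoner would also derive, for example, that G_Birth is an instance of every superclass of Birth.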
Ontology Dynamics
• Ontologies change constantly
  • The world changes (dynamic models)
  • Our view of the world changes (new knowledge, measurements, etc.)
  • Perspective and usage change
  • Example: the GO (Gene Ontology, information about gene products in biology) changes daily
• We must find a way to cope with changes
  • Ontology evolution (modify an ontology in response to a change)
  • Ontology versioning (keep track of versions and their relations)
  • …
• We deal with a peripheral problem: change detection
What is Change?
[Figure: changes in the real world are applied to the ontology by an ontology evolution algorithm, through operations such as Delete_Class(…), Pull_Up_Class(…), Rename_Class(…), …]
What is Change Detection?
[Figure: the real world and the ontology, with a change detection algorithm that recovers operations such as Delete_Class(…), Pull_Up_Class(…), Rename_Class(…), … from the evolving ontology]
Keeping Track of Changes
[Figure: a sequence of versions V1, V2, V3, V4, V5 linked by the deltas C1, C2, C3, C4]
• Purpose of this work: change detection
  • Detect, a posteriori, the differences (delta or diff) between versions in a concise, intuitive and correct way
• It is important to store the changes between versions, for:
  • Visualization of differences
  • Efficient storage and/or communication
  • Evolution history
• Recording changes as they happen (manually or automatically) is error-prone and difficult (often impossible)
Sample Evolution
[Figure: Version 1 (V1) and Version 2 (V2) of the sample ontology, built from the classes Period, Event, Onset, Birth, Existing/Persistent, Stuff and Actor, the properties participants and started_on, and the individuals G_Birth and Giorgos; edges denote subsumption or instantiation]
Analyzing the Evolution (Using Triples)
Triples in V1 (partial list): [Event type Class], [Period type Class], [Event subclass Period], [participants type Property], [participants domain Onset], [participants range Actor], [Giorgos type Actor], [Existing type Class], [Stuff subclass Existing], [started_on domain Existing], [Onset subclass Event], [Birth subclass Onset], …
Triples in V2 (partial list): [Event type Class], [participants type Property], [participants domain Event], [participants range Actor], [Giorgos type Actor], [Persistent type Class], [Stuff subclass Persistent], [started_on domain Persistent], [Onset subclass Event], [Birth subclass Event], …
Low-Level Delta
Triples in V2 but not in V1 (added triples): [participants domain Event], [Persistent type Class], [Stuff subclass Persistent], [started_on domain Persistent], [Birth subclass Event]
Triples in V1 but not in V2 (deleted triples): [Period type Class], [Event subclass Period], [participants domain Onset], [Existing type Class], [Stuff subclass Existing], [started_on domain Existing], [Birth subclass Onset]
Low-level delta: Add([participants domain Event]), Add([Persistent type Class]), …, Del([Period type Class]), …
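A low-level delta is just two set differences over the triples. A minimal Python sketch, assuming each triple is a (subject, predicate, object) tuple; the sets reproduce the partial V1/V2 lists of the running example, with the V2 domain triple written in subject-predicate-object order as [participants domain Event]:

```python
# Partial version contents from the running example, as (subject, predicate, object).
V1 = {
    ("Event", "type", "Class"), ("Period", "type", "Class"),
    ("Event", "subclass", "Period"), ("participants", "type", "Property"),
    ("participants", "domain", "Onset"), ("participants", "range", "Actor"),
    ("Giorgos", "type", "Actor"), ("Existing", "type", "Class"),
    ("Stuff", "subclass", "Existing"), ("started_on", "domain", "Existing"),
    ("Onset", "subclass", "Event"), ("Birth", "subclass", "Onset"),
}
V2 = {
    ("Event", "type", "Class"), ("participants", "type", "Property"),
    ("participants", "domain", "Event"), ("participants", "range", "Actor"),
    ("Giorgos", "type", "Actor"), ("Persistent", "type", "Class"),
    ("Stuff", "subclass", "Persistent"), ("started_on", "domain", "Persistent"),
    ("Onset", "subclass", "Event"), ("Birth", "subclass", "Event"),
}


def low_level_delta(v1, v2):
    """Add(t) for every t in V2 but not V1, Del(t) for every t in V1 but not V2."""
    adds = [("Add", t) for t in sorted(v2 - v1)]
    dels = [("Del", t) for t in sorted(v1 - v2)]
    return adds + dels


delta = low_level_delta(V1, V2)
for op, (s, p, o) in delta:
    print(f"{op}([{s} {p} {o}])")
```

The result has 5 additions and 7 deletions; triples shared by both versions, such as [Onset subclass Event], do not appear in it.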
Analyzing the Evolution (Visually)
[Figure: Version 1 (V1) and Version 2 (V2) of the sample ontology side by side; edges denote subsumption or instantiation]
High-level delta:
• Generalize_Domain(participants, Onset, Event)
• Pull_Up_Class(Birth, Onset, Event)
• Delete_Class(Period, Ø, {Event}, Ø, Ø, Ø, Ø)
• Rename_Class(Existing, Persistent)
Comparing the Deltas
[Figure: Version 1 (V1) and Version 2 (V2) of the sample ontology]
Low-level delta → high-level delta:
• Del([participants domain Onset]), Add([participants domain Event]) → Generalize_Domain(participants, Onset, Event)
• Del([Period type Class]), Del([Event subclass Period]) → Delete_Class(Period, Ø, {Event}, Ø, Ø, Ø, Ø)
• Del([Birth subclass Onset]), Add([Birth subclass Event]) → Pull_Up_Class(Birth, Onset, Event)
Associations (Partitioning)
Low-Level Versus High-Level Deltas
• Purpose: detect, a posteriori, the differences (delta or diff) between versions in a concise, intuitive and correct way
• Low-level deltas
  • Easier to get
• High-level deltas
  • More concise (e.g., Rename_Class)
  • More intuitive (e.g., Pull_Up_Class)
  • Carry additional information (e.g., Generalize_Domain)
• Objective: detection of high-level deltas
Language of Changes and Algorithm
• Deltas are based on some language of changes
  • A set of formal definitions that describe the changes that can be understood and detected
  • Can be high-level or low-level
  • Must be coupled with a corresponding detection algorithm
• Low-level languages are easy to define (Add(t), Del(t))
• High-level languages are more complicated
  • Several proposals; no standard
• Challenges for high-level languages
  • Must be deterministic (exactly one high-level delta per pair of versions)
  • Must be fine-grained enough to capture subtle changes
  • Must be coarse-grained enough to be concise
Proposed Language L
• The formal definition of a change consists of:
  • Changes required in the low-level delta (added/deleted triples)
  • Conditions that should hold in V1 and/or V2
• Example: Generalize_Domain(P, X, Y)
  • Required changes: Del([P domain X]), Add([P domain Y])
  • Conditions: P is an existing property in both V1 and V2; X and Y are existing classes in both V1 and V2; X is a subclass of Y in both V1 and V2
• Generalize_Domain(participants, Onset, Event) is therefore detectable in the sample evolution
• Similarly for the other changes in L (about 120 in total)
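The definition of Generalize_Domain reads as a predicate over the two versions. A minimal Python sketch, assuming versions are sets of (subject, predicate, object) tuples, that "existing class/property" means an explicit [X type Class] or [P type Property] triple, and that "X subclass of Y" follows explicit subclass triples transitively; the formal definitions of L may treat these details differently:

```python
def is_strict_subclass(kb, x, y):
    """True if y is reachable from x through one or more explicit subclass triples."""
    stack = [o for (s, p, o) in kb if s == x and p == "subclass"]
    seen = set()
    while stack:
        c = stack.pop()
        if c == y:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(o for (s, p, o) in kb if s == c and p == "subclass")
    return False


def conditions_hold(kb, prop, x, y):
    """Conditions of Generalize_Domain(P, X, Y), checked against a single version."""
    return (
        (prop, "type", "Property") in kb
        and (x, "type", "Class") in kb
        and (y, "type", "Class") in kb
        and is_strict_subclass(kb, x, y)
    )


def generalize_domain_detectable(v1, v2, prop, x, y):
    """Required low-level changes are present and the conditions hold in V1 and V2."""
    added, deleted = v2 - v1, v1 - v2
    return (
        (prop, "domain", x) in deleted
        and (prop, "domain", y) in added
        and conditions_hold(v1, prop, x, y)
        and conditions_hold(v2, prop, x, y)
    )


# A reduced fragment of the running example; [Onset type Class] is assumed here,
# since the partial triple lists on the slides do not show it.
COMMON = {
    ("participants", "type", "Property"),
    ("Onset", "type", "Class"),
    ("Event", "type", "Class"),
    ("Onset", "subclass", "Event"),
}
V1 = COMMON | {("participants", "domain", "Onset")}
V2 = COMMON | {("participants", "domain", "Event")}

print(generalize_domain_detectable(V1, V2, "participants", "Onset", "Event"))  # True
print(generalize_domain_detectable(V1, V2, "participants", "Event", "Onset"))  # False
```

Swapping X and Y fails because the required triples are not in the delta, which mirrors how a wrong parameter choice is rejected during detection.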
Results on L: Granularity
• Granularity problem: solved by defining levels of changes
  • Basic changes: fine-grained, roughly correspond to low-level changes
  • Composite changes: coarse-grained, group several basic changes together
  • Heuristic changes: based on heuristics, necessary for Rename, Merge, Split, etc.
• Problems with determinism
  • One evolution could correspond to different sets of basic/composite changes
  • Priorities in detection: heuristic > composite > basic
Results on L: Types of Changes
• Changes
  • Low-level: Add, Del
  • High-level
    • Basic (e.g., Delete_Subclass, Delete_Domain)
    • Composite (e.g., Pull_Up_Class, Change_Domain)
    • Heuristic (e.g., Rename_Class, Split_Class)
Results on L: Determinism
• Each low-level change is associated with exactly one detectable high-level change
  • Full partitioning of low-level changes into high-level ones
• Each pair of versions (V1, V2) is associated with:
  • Exactly one low-level delta
  • Exactly one high-level delta
• Determinism is necessary
  • More than one delta would lead to ambiguities
  • Fewer than one would leave some inputs (V1, V2) unresolved
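Determinism can be checked on any concrete result: the low-level changes consumed by the detected high-level changes must partition the low-level delta. A minimal Python sketch, assuming each high-level change is recorded together with the set of low-level changes it groups (a hypothetical representation chosen for illustration); the data is the Generalize_Domain, Delete_Class and Pull_Up_Class part of the sample evolution:

```python
from collections import Counter

# Part of the sample low-level delta, as ("Add" | "Del", triple) pairs.
LOW = {
    ("Del", ("participants", "domain", "Onset")),
    ("Add", ("participants", "domain", "Event")),
    ("Del", ("Period", "type", "Class")),
    ("Del", ("Event", "subclass", "Period")),
    ("Del", ("Birth", "subclass", "Onset")),
    ("Add", ("Birth", "subclass", "Event")),
}

# Each detected high-level change paired with the low-level changes it accounts for.
HIGH = [
    ("Generalize_Domain(participants, Onset, Event)",
     {("Del", ("participants", "domain", "Onset")),
      ("Add", ("participants", "domain", "Event"))}),
    ("Delete_Class(Period, Ø, {Event}, Ø, Ø, Ø, Ø)",
     {("Del", ("Period", "type", "Class")),
      ("Del", ("Event", "subclass", "Period"))}),
    ("Pull_Up_Class(Birth, Onset, Event)",
     {("Del", ("Birth", "subclass", "Onset")),
      ("Add", ("Birth", "subclass", "Event"))}),
]


def is_full_partition(low, high):
    """True iff every low-level change is accounted for by exactly one high-level change."""
    counts = Counter(op for _, consumed in high for op in consumed)
    return set(counts) == set(low) and all(n == 1 for n in counts.values())


print(is_full_partition(LOW, HIGH))  # True
```

Dropping a high-level change leaves some low-level changes unexplained, and listing one twice double-counts its changes; both make the check fail.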
Results on L: Application
[Figure: the high-level delta C is detected from Version 1 (V1) and Version 2 (V2); applying C to V1 yields V2, and applying its inverse C⁻¹ to V2 yields V1]
Results on L: Deltas Keep Version History
[Figure: versions V1, V2, V3, V4, V5 linked by the deltas C1, C2, C3, C4]
• All versions can be reproduced, as long as you keep any one version and the deltas
• Deltas are more concise than the versions themselves
  • Storage and communication efficiency
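At the level of triples, applying a delta and its inverse are simple set operations, which is why one version plus the deltas suffices. A minimal Python sketch, assuming each delta is stored in its low-level form (added and deleted triples); Delta, diff, apply_delta and inverse are illustrative names, and the version history is a toy one rather than data from the slides:

```python
from typing import FrozenSet, NamedTuple, Set, Tuple

Triple = Tuple[str, str, str]


class Delta(NamedTuple):
    """Low-level form of a delta: the triples it adds and the triples it deletes."""
    added: FrozenSet[Triple]
    deleted: FrozenSet[Triple]


def diff(v1: Set[Triple], v2: Set[Triple]) -> Delta:
    return Delta(frozenset(v2 - v1), frozenset(v1 - v2))


def apply_delta(delta: Delta, v: Set[Triple]) -> Set[Triple]:
    """Remove the deleted triples, then add the added ones."""
    return (v - delta.deleted) | delta.added


def inverse(delta: Delta) -> Delta:
    """The inverse delta (C^-1) swaps additions and deletions."""
    return Delta(delta.deleted, delta.added)


# A toy version history V1..V3 (illustrative triples, not taken from the slides).
versions = [
    {("A", "type", "Class")},
    {("A", "type", "Class"), ("B", "type", "Class")},
    {("B", "type", "Class"), ("B", "subclass", "C")},
]
deltas = [diff(a, b) for a, b in zip(versions, versions[1:])]

# Keep only V1 and the deltas, then replay forward to rebuild every version.
forward = [versions[0]]
for d in deltas:
    forward.append(apply_delta(d, forward[-1]))

# Or keep only the last version and replay the inverse deltas backward.
backward = versions[-1]
for d in reversed(deltas):
    backward = apply_delta(inverse(d), backward)

print(forward == versions, backward == versions[0])  # True True
```

Because diff(v1, v2) never deletes a triple absent from v1 or adds one already in it, applying it to v1 gives exactly v2, and the inverse undoes it exactly.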
Detection Algorithm for L (1/2)
• Calculate the low-level delta from the triples of V1 and V2
  • Triples in V1 (partial list): [Period type Class], [Event subclass Period], [participants type Property], [participants domain Onset], [participants range Actor], [Existing type Class], [Stuff subclass Existing], [started_on domain Existing], [Onset subclass Event], …
  • Triples in V2 (partial list): [Event type Class], [participants type Property], [participants domain Event], [participants range Actor], [Giorgos type Actor], [Persistent type Class], [Stuff subclass Persistent], [started_on domain Persistent], [Onset subclass Event], [Birth subclass Event], …
  • Triples in delta (step 1: low-level): Del([participants domain Onset]), Del([Birth subclass Onset]), Del([Event subclass Period]), Del([Existing type Class]), Del([Stuff subclass Existing]), Del([started_on domain Existing]), Del([Period type Class]), Add([Birth subclass Event]), Add([participants domain Event]), Add([Persistent type Class]), Add([Stuff subclass Persistent]), Add([started_on domain Persistent])
• Run a matcher (external)
  • List of mappings: <V1:Existing> is matched with <V2:Persistent>
• Compute heuristic changes
  • Heuristic changes: Rename_Class(Existing, Persistent)
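The heuristic step can be sketched as follows: for each mapping returned by the matcher, the deleted triples that mention the old class and whose renamed counterpart was added are replaced by one Rename_Class. This is a simplified Python sketch under stated assumptions: the mapping format, the requirement that [Old type Class] itself be renamed, and the function names are illustrative, and L's heuristic changes (Rename, Merge, Split) may be defined with further conditions:

```python
def rename(t, old, new):
    """Replace every occurrence of old in triple t by new."""
    return tuple(new if x == old else x for x in t)


def extract_renames(added, deleted, mappings):
    """Replace Del/Add pairs explained by matcher mappings with Rename_Class changes.

    mappings: (old_class, new_class) pairs, e.g. as produced by an external matcher.
    Returns (heuristic_changes, remaining_added, remaining_deleted).
    """
    added, deleted = set(added), set(deleted)
    changes = []
    for old, new in mappings:
        # Deleted triples mentioning old whose renamed counterpart was added.
        consumed = {t for t in deleted if old in t and rename(t, old, new) in added}
        if (old, "type", "Class") not in consumed:
            continue  # the old class was not replaced by the new one; no rename
        deleted -= consumed
        added -= {rename(t, old, new) for t in consumed}
        changes.append(f"Rename_Class({old}, {new})")
    return changes, added, deleted


# Step 1 output of the running example (the low-level delta).
ADDED = {
    ("Birth", "subclass", "Event"), ("participants", "domain", "Event"),
    ("Persistent", "type", "Class"), ("Stuff", "subclass", "Persistent"),
    ("started_on", "domain", "Persistent"),
}
DELETED = {
    ("participants", "domain", "Onset"), ("Birth", "subclass", "Onset"),
    ("Event", "subclass", "Period"), ("Existing", "type", "Class"),
    ("Stuff", "subclass", "Existing"), ("started_on", "domain", "Existing"),
    ("Period", "type", "Class"),
}

changes, rest_added, rest_deleted = extract_renames(
    ADDED, DELETED, [("Existing", "Persistent")])
print(changes)  # ['Rename_Class(Existing, Persistent)']
```

The six triples left over match the step 2 delta: two additions and four deletions that are handed to the basic/composite step.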
Detection Algorithm for L (2/2)
• Triples in delta (step 2: heuristic): Del([participants domain Onset]), Del([Birth subclass Onset]), Del([Event subclass Period]), Del([Period type Class]), Add([Birth subclass Event]), Add([participants domain Event]), Rename_Class(Existing, Persistent)
• For each remaining low-level change, find the associated change; e.g., for Del([participants domain Onset]) the associated change is Generalize_Domain(participants, Onset, Event), which is detectable
• Triples in delta (step 3: basic and composite): Del([Birth subclass Onset]), Del([Event subclass Period]), Del([Period type Class]), Add([Birth subclass Event]), Rename_Class(Existing, Persistent), Generalize_Domain(participants, Onset, Event)
• Delta (step 4: result): Delete_Class(Period, Ø, {Event}, Ø, Ø, Ø, Ø), Pull_Up_Class(Birth, Onset, Event), Rename_Class(Existing, Persistent), Generalize_Domain(participants, Onset, Event)
Find Associated Change: Candidate Operations for Del([participants domain Onset])
• Pull_Up_Class(*, *, *): not in the table
• Delete_Property(participants, *, *): necessary triples not found
• Specialize_Domain(participants, Onset, Event): conditions not true
• Generalize_Domain(participants, Onset, Birth): wrong parameter (triples not found)
• Generalize_Domain(participants, Onset, Event): DETECTABLE (associated)
• Delete_Domain(participants, Onset): not chosen, since composite changes have priority
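The candidate evaluation above amounts to a priority-ordered search. A minimal Python sketch, assuming a hypothetical Candidate record and a hand-built candidate list for Del([participants domain Onset]); Pull_Up_Class is left out because it is not in the table for this triple and Delete_Property for brevity, heuristic changes are assumed to have been extracted in the earlier step, and conditions are reduced to subclass facts:

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Set, Tuple

Op = Tuple[str, Tuple[str, str, str]]      # ("Add" or "Del", triple)
PRIORITY = {"composite": 0, "basic": 1}     # lower value = tried first


@dataclass(frozen=True)
class Candidate:
    name: str
    level: str                              # "composite" or "basic"
    required: FrozenSet[Op]                 # low-level changes the change needs
    condition: Callable[[Set[Tuple[str, str]]], bool]  # checked on subclass facts


def find_associated_change(op, delta, candidates, subclass_facts):
    """First candidate, in priority order, that uses op, finds all of its
    required low-level changes in the delta, and whose conditions hold."""
    for cand in sorted(candidates, key=lambda c: PRIORITY[c.level]):
        if op in cand.required and cand.required <= delta and cand.condition(subclass_facts):
            return cand
    return None


DEL_D = ("Del", ("participants", "domain", "Onset"))
ADD_D = ("Add", ("participants", "domain", "Event"))
# Part of the step 2 delta of the running example.
DELTA = {DEL_D, ADD_D,
         ("Del", ("Birth", "subclass", "Onset")),
         ("Add", ("Birth", "subclass", "Event"))}
FACTS = {("Onset", "Event")}                # [Onset subclass Event] in V1 and V2

CANDIDATES = [
    Candidate("Delete_Domain(participants, Onset)", "basic",
              frozenset({DEL_D}), lambda f: True),
    Candidate("Specialize_Domain(participants, Onset, Event)", "composite",
              frozenset({DEL_D, ADD_D}), lambda f: ("Event", "Onset") in f),
    Candidate("Generalize_Domain(participants, Onset, Birth)", "composite",
              frozenset({DEL_D, ("Add", ("participants", "domain", "Birth"))}),
              lambda f: ("Onset", "Birth") in f),
    Candidate("Generalize_Domain(participants, Onset, Event)", "composite",
              frozenset({DEL_D, ADD_D}), lambda f: ("Onset", "Event") in f),
]

print(find_associated_change(DEL_D, DELTA, CANDIDATES, FACTS).name)
```

Since sorted is stable, candidates of the same level are tried in list order; the basic Delete_Domain is only reached when no composite candidate qualifies, e.g. if Add([participants domain Event]) were missing from the delta.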
Implementation
• Algorithm implemented for experiments and evaluation
• Uses the APIs of SWKM
  • Platform for efficient and scalable management of dynamic RDF/S ontologies and data
  • Query, update, low-level delta, high-level delta, versioning, …
Performance
• Complexity: O(max{N1, N2, N²}), where N1 and N2 are the numbers of triples in V1 and V2, and N is the size of the low-level delta
• Linear in the average case
• Highly dependent on the detected changes (type, number)
Evaluation: Usefulness and Intuitiveness
• L is well-defined: its changes are used in practice
  • GO: add/delete class, comment changes
  • CIDOC: add/delete/rename properties
• Results confirmed by the literature and by editor notes
Evaluation: Conciseness
• Delta with basic changes only: size ≈ low-level delta
• Delta with basic + composite + heuristic changes: size << low-level delta
Manual Change Recording (CIDOC): editor notes versus detection result
• Delete class: 3 in editor notes; 6 detected
• Add property: 54 in editor notes; 58 detected
• Delete property: 16 in editor notes; 18 detected
• Rename property: 24 in editor notes; 30 detected
• Redirect properties (domain): 14 in editor notes; detected as Generalize_Domain (13) and Specialize_Domain (1)
• Redirect properties (range): 14 in editor notes; detected as Generalize_Range (14), Specialize_Range (1) and Change_Range (1)
Conclusion
• High-level change detection
  • A posteriori detection (input: V1, V2)
  • No further information needed (e.g., logs, change recording, etc.)
• Formal semantics
  • Formal results (reversibility, determinism, …)
  • Not heuristic-based (except for heuristic changes)
  • No need for precision and recall evaluation
• Efficient, sound and complete detection algorithm
• Good informal properties
  • Conciseness, intuitiveness
• Future work: more operations, evaluation on other datasets, evaluation with real users
References
• Vicky Papavassiliou, Giorgos Flouris, Irini Fundulaki, Dimitris Kotzinos, Vassilis Christophides. On Detecting High-Level Changes in RDF/S KBs. In Proceedings of the 8th International Semantic Web Conference (ISWC-09), to appear, 2009.
• Vicky Papavassiliou, Giorgos Flouris, Irini Fundulaki, Dimitris Kotzinos, Vassilis Christophides. Formalizing High-Level Change Detection for RDF/S KBs. Technical Report TR-398, FORTH-ICS, 2009.
Thank You