This article discusses the preservation of data through change detection and understanding of evolution. It explores the challenges and architecture of the DIACHRON system and presents a case study on change detection in a pilot dataset. The article also discusses the representation of changes and provides a summary of the DIACHRON approach.
Preservation Through Evolution Management: The DIACHRON Approach • DIACHRON Final Dissemination Workshop, 24.03.2016 • Giorgos Flouris (FORTH), fgeo@ics.forth.gr
Preservation and Evolution Management • Two sides of the same coin • Understanding evolution allows preservation • Preservation through change detection • Terminology changes (e.g., Yugoslavia) • Modelling changes (e.g., Pluto is a Planet) • Trace back our understanding at a given point in time, by “reverse engineering” changes • Equivalent to keeping the old versions, but: • Cheaper (in terms of space) • Helps understand (not just access) older versions
Change Detection Challenges • Change detection for evolution management • Identifying changes between versions • Challenges • Going beyond simple “delta” solutions • High-level deltas • More intuitive lists of changes • Without loss of formal rigor
Additional challenges in DIACHRON • Change detection challenges (in DIACHRON) • Diverse data models • Dynamic datasets • Recoverable versions • Changes as first-class citizens • Cross-snapshot queries
Change Detection in DIACHRON • [Diagram: pilot dataset Version 1 and pilot dataset Version 2, each represented in DIACHRON, with the changes detected between them]
Defining Changes: Layers • Low-level: universal • Simple: model-specific • Complex: user-specific
Change Hierarchy: Low-level (1/3) • Low-level changes • DIACHRON model, for internal use • Fixed: Add, Delete • Just additions and deletions of triples • Simple set difference
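The "simple set difference" on this slide can be illustrated with a minimal Python sketch. It assumes each version is a plain set of (subject, predicate, object) tuples; the function name and the `ex:` identifiers are hypothetical and are not part of DIACHRON.

```python
def low_level_delta(old_version, new_version):
    """Return (added, deleted) triples between two versions."""
    old_set, new_set = set(old_version), set(new_version)
    added = new_set - old_set      # low-level Add changes
    deleted = old_set - new_set    # low-level Delete changes
    return added, deleted


# Hypothetical example data: class ex:A moves from parent ex:B to ex:C.
v1 = {("ex:A", "rdfs:subClassOf", "ex:B"), ("ex:A", "rdfs:label", "a")}
v2 = {("ex:A", "rdfs:subClassOf", "ex:C"), ("ex:A", "rdfs:label", "a")}
added, deleted = low_level_delta(v1, v2)
```

Triples present in both versions appear in neither set, which is why a low-level delta is usually much smaller than a full copy of the older version.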
Change Hierarchy: Simple (2/3) • Pilot terminology: • Add_SuperClass, Add_Dimension • Fixed, pre-defined • Composed of low-level changes • Partitioning is perfect • Complete and unambiguous
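One way to read "partitioning is perfect" is that every low-level change is consumed by exactly one simple change. The sketch below checks that property under stated assumptions: the predicate-to-change mapping, the dictionary layout (`type`, `p1`, `p2`, `consumes`), and the function names are all hypothetical, and only added triples are handled.

```python
from collections import Counter

# Hypothetical mapping from the predicate of an added triple to a simple change.
SIMPLE_BY_PREDICATE = {
    "rdfs:subClassOf": "Add_SuperClass",
    "rdfs:label": "Add_Label",
}


def detect_simple(added):
    """Wrap each recognised added triple in one simple change that consumes it."""
    changes = []
    for s, p, o in sorted(added):
        name = SIMPLE_BY_PREDICATE.get(p)
        if name is not None:
            changes.append({"type": name, "p1": s, "p2": o, "consumes": [(s, p, o)]})
    return changes


def is_perfect_partition(low_level, simple_changes):
    """True if every low-level change is consumed by exactly one simple change."""
    counts = Counter(t for c in simple_changes for t in c["consumes"])
    return set(counts) == set(low_level) and all(n == 1 for n in counts.values())


added = {("ex:A", "rdfs:subClassOf", "ex:C"), ("ex:A", "rdfs:label", "obsolete_A")}
simple = detect_simple(added)
```

If a low-level change matched no simple change (not complete) or matched two (ambiguous), `is_perfect_partition` would return `False`.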
Change Hierarchy: Complex (3/3) • Pilot terminology: • Add_Synonym, Mark_As_Obsolete • Totally custom, pilot-specific (defined at run-time)
Detecting Changes • Based on SPARQL queries

Mark_as_Obsolete (complex):

  INSERT INTO <changesOntology> {
    ?mao a co:Mark_As_Obsolete;
         co:mao_p1 ?a;
         co:mao_p2 ?x;
         co:consumes ?asc;
         co:consumes ?al.
  }
  WHERE {
    GRAPH <changesOntology> {
      ?asc a co:Add_Superclass;
           co:asc_p1 ?asc1;
           co:asc_p2 ?asc2.
      FILTER NOT EXISTS { ?mao co:consumes ?asc. }.
      FILTER (?asc2 = <http://www.geneontology.org/formats/oboInOwl#ObsoleteClass>).
      BIND(?asc1 as ?a).
      OPTIONAL {
        ?al a co:Add_Label;
            co:al_p1 ?al1;
            co:al_p2 ?al2.
        FILTER NOT EXISTS { ?mao co:consumes ?al. }.
        FILTER(?al1 = ?asc1).
        FILTER(regex(str(?al2), 'obsolete_')).
      }
      BIND(concat(str(?a), str(?x)) as ?url) .
      filter ('v1'=?v1).
      filter ('v2'=?v2).
      BIND(IRI(CONCAT('http://mao/',SHA1(?url))) AS ?mao).
    }
  }

Add_SuperClass (simple):

  INSERT INTO <changesOntology> {
    ?asc a co:Add_Superclass;
         co:asc_p1 ?a;
         co:asc_p2 ?b.
  }
  WHERE {
    GRAPH <v2> {
      ?r diachron:subject ?a;
         diachron:hasRecordAttribute ?ratt.
      ?ratt diachron:predicate rdfs:subClassOf;
            diachron:object ?b.
    }
    FILTER NOT EXISTS {
      GRAPH <v1> {
        ?r diachron:hasRecordAttribute ?ratt.
        ?ratt diachron:predicate rdfs:subClassOf;
              diachron:object ?b.
      }
    }
    FILTER NOT EXISTS {
      GRAPH <assoc> {
        {?assoc1 co:new_value ?a.} UNION {?assoc2 co:new_value ?b.}
      }
    }
    BIND(IRI('v1') as ?v1).
    BIND(IRI('v2') as ?v2).
    BIND(concat(str(?a), str(?b), str(?v1), str(?v2)) as ?url) .
    BIND(IRI(CONCAT('http://asc/',SHA1(?url))) AS ?asc).
  }
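The logic of the Mark_As_Obsolete query can be mirrored outside a triple store as a rough Python sketch. It is not the DIACHRON implementation: it assumes simple changes are held as dictionaries with hypothetical keys `id`, `type`, `p1`, `p2`, it hashes only the obsoleted class when minting the identifier (the query hashes a longer string), and it gathers all matching label changes into one result rather than one row per match.

```python
import hashlib

OBSOLETE = "http://www.geneontology.org/formats/oboInOwl#ObsoleteClass"


def detect_mark_as_obsolete(simple_changes, already_consumed=frozenset()):
    """Build Mark_As_Obsolete changes from Add_Superclass / Add_Label changes."""
    labels = [c for c in simple_changes
              if c["type"] == "Add_Label" and c["id"] not in already_consumed]
    found = []
    for asc in simple_changes:
        if asc["type"] != "Add_Superclass" or asc["p2"] != OBSOLETE:
            continue
        if asc["id"] in already_consumed:  # plays the role of FILTER NOT EXISTS
            continue
        consumed = [asc["id"]]
        # Optional part: label changes on the same class mentioning 'obsolete_'.
        for al in labels:
            if al["p1"] == asc["p1"] and "obsolete_" in al["p2"]:
                consumed.append(al["id"])
        digest = hashlib.sha1(asc["p1"].encode("utf-8")).hexdigest()
        found.append({"id": "http://mao/" + digest, "type": "Mark_As_Obsolete",
                      "p1": asc["p1"], "consumes": consumed})
    return found


# Hypothetical simple changes detected between two versions.
simple = [
    {"id": "asc1", "type": "Add_Superclass", "p1": "ex:T", "p2": OBSOLETE},
    {"id": "asc2", "type": "Add_Superclass", "p1": "ex:U", "p2": "ex:V"},
    {"id": "al1", "type": "Add_Label", "p1": "ex:T", "p2": "obsolete_T"},
]
complex_changes = detect_mark_as_obsolete(simple)
```

Deriving the identifier from a hash of the parameters, as the queries do with SHA1, makes detection repeatable: running it twice on the same input yields the same change identifier instead of a duplicate.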
Representing Changes: Motivation • Interesting motivating query • Return all countries for which the unemployment rate of their capital city increased faster than the average increase of the country as a whole, in the last 5 versions • Requires • Access to both the changes and the data • Access to multiple versions • Changes are first-class citizens • Necessary for preservation
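Representing Changes: Ontology • [Diagram of the DIACHRON ontology of changes] • Schema level (D/changes/App1/schema): Change, with subclasses Complex_Change (e.g., Mark as Obsolete, …) and Simple_Change (e.g., Add SuperClass, …); sparql_info holds the detection query (INSERT …) • Data level (D/changes/v1-v2): change instance SC1 with asc_p1 = EFO_001927 and asc_p2 = ObsoleteClass • Data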
Putting it All Together • DIACHRON data model contains all versions as well as changes • In a compact form (ontology of changes) • Detection based on SPARQL queries • Provided at deployment time (for simple) • Generated at creation time (for complex) • Recoverability • Allows moving back and forth between versions (important for preservation, and also for archiving)
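Recoverability can be illustrated with low-level deltas alone: given one version and the delta to its neighbour, the neighbour can be rebuilt in either direction. This is a minimal sketch assuming versions are sets of triples and a delta is a pair of added and deleted sets; the function names and sample data are hypothetical.

```python
def apply_delta(version, added, deleted):
    """Move forward: from the older version to the newer one."""
    return (set(version) - set(deleted)) | set(added)


def revert_delta(version, added, deleted):
    """Move backward: from the newer version to the older one."""
    return (set(version) - set(added)) | set(deleted)


# Hypothetical versions and the delta between them.
v1 = {("ex:A", "rdfs:subClassOf", "ex:B"), ("ex:A", "rdfs:label", "a")}
v2 = {("ex:A", "rdfs:subClassOf", "ex:C"), ("ex:A", "rdfs:label", "a")}
added, deleted = v2 - v1, v1 - v2

forward = apply_delta(v1, added, deleted)
backward = revert_delta(v2, added, deleted)
```

Storing one full version plus a chain of deltas is therefore enough to reach any other version, which is the sense in which keeping changes is cheaper than keeping every snapshot.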
Summary of Changes • Problem • Lots of changes in a single version pair • Look at only a subset of the delta • Need for more intuitive deltas • Solution • Pinpoint locations in the ontology where “important” changes happened • Assessment strategies for “change summaries” • Number of changes, change of centrality/relevance, importance of position, hybrid strategies
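The simplest of the listed assessment strategies, number of changes, can be sketched as a ranking of ontology elements by how many detected changes touch them. The dictionary layout (key `p1` for the affected element) and the function name are assumptions made for illustration; the centrality, relevance, and hybrid strategies are not covered here.

```python
from collections import Counter


def summarize_by_count(changes, k=3):
    """Return the k elements affected by the most changes, with their counts."""
    counts = Counter(change["p1"] for change in changes)
    return counts.most_common(k)


# Hypothetical detected changes: three touch ex:A, two ex:B, one ex:C.
changes = [
    {"type": "Add_SuperClass", "p1": "ex:A"},
    {"type": "Add_Label", "p1": "ex:A"},
    {"type": "Add_Label", "p1": "ex:B"},
    {"type": "Add_SuperClass", "p1": "ex:A"},
    {"type": "Add_SuperClass", "p1": "ex:B"},
    {"type": "Add_Label", "p1": "ex:C"},
]
summary = summarize_by_count(changes, k=2)
```

A reader would then inspect only the changes around the top-ranked elements instead of the full delta.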
D2V Demo • D2V tool for: • Creating and managing complex changes • Visualizing the evolution history of a dataset • Demonstration video • https://www.youtube.com/watch?v=oY7qBBfcHYg • http://www.diachron-fp7.eu/videos.html • Online (live) demo • http://www.diachron-fp7.eu/demos.html
Conclusion • Main DIACHRON message • (Linked) data preservation is related to evolution management • DIACHRON challenges • Diverse data models • Dynamic datasets • Recoverable versions • Changes as first-class citizens • Cross-snapshot queries • Solutions • DIACHRON data model (#1) • Appropriate change definition and detection (#2, #3) • Changes and data represented at the same level (#4, #5) • Work with high potential (e.g., summaries)