1. The BioMap Data Warehouse Integration of Relational & XML Data Using AutoMed
3. Data Integration Approaches Both-As-View (BAV) approach
GAV & LAV approaches
BAV approach
Comparison of integration approaches
BAV advantages
4. GAV & LAV Approaches Global-As-View (GAV) approach: describe GS constructs with view definitions over LSi constructs
Local-As-View (LAV) approach: describe LSi constructs with view definitions over GS constructs
5. GAV Example student(id,name,left,degree) = [{x,y,z,w} | ⟨x,y,z,w,_⟩ ∈ ug ∧ ⟨x,_,_,_,_⟩ ∉ phd ∨
⟨x,y,z,w,_⟩ ∈ phd ∧
w = ‘phd’]
monitors(sno,id) =
[{x,y} | ⟨x,_,_,_,y⟩ ∈ ug ∧ ⟨x,_,_,_,_⟩ ∉ phd ∨
⟨x,y⟩ ∈ supervises]
staff(sno,sname,dept) =
[{x,y,z} | ⟨x,y,z,w,_⟩ ∈ tutor ∧ ⟨x,_,_⟩ ∉ supervisor ∨
⟨x,y,z⟩ ∈ supervisor]
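The GAV idea can be sketched in Python: a global relation is computed by a query over local relations. This is a minimal sketch, not AutoMed code. It reads the student view as "undergraduates who are not also PhD students, together with PhD students", and the sample tuples and the function name `student_gav` are invented for illustration.

```python
# Hypothetical local-schema extents; all tuples are invented for illustration.
# Both relations are assumed to have the shape (id, name, left, degree, sno).
ug = {(1, "Ann", 2004, "bsc", 10), (2, "Bob", 2005, "bsc", 11)}
phd = {(2, "Bob", 2008, "phd", 12), (3, "Cy", 2007, "phd", 12)}


def student_gav(ug, phd):
    """GAV view: the global relation is defined as a query over local relations."""
    phd_ids = {t[0] for t in phd}
    # Undergraduates who do not also appear in phd.
    from_ug = {(x, y, z, w) for (x, y, z, w, _) in ug if x not in phd_ids}
    # PhD students, restricted to degree 'phd'.
    from_phd = {(x, y, z, w) for (x, y, z, w, _) in phd if w == "phd"}
    return from_ug | from_phd


student = student_gav(ug, phd)
```

In this sketch Bob appears only once in `student`, taken from `phd`, because the first branch excludes ids that also occur in `phd`.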
6. LAV Example tutor(sno,sname) =
[{x,y} | ⟨x,y,_⟩ ∈ staff ∧ ⟨x,z⟩ ∈ monitors ∧
⟨z,_,_,w⟩ ∈ student ∧
w ≠ ‘phd’]
ug(id,name,left,degree,sno) =
[{x,y,z,w,v} | ⟨x,y,z,w⟩ ∈ student ∧ ⟨v,x⟩ ∈ monitors ∧
w ≠ ‘phd’]
7. Both-As-View (BAV) (1/3) Schema transformation approach
For each pair (LSi,GS): incrementally modify LSi/GS to match GS/LSi
8. Both-As-View (BAV) (2/3) Common Data Model: Hypergraph Data Model (HDM)
Constructs are nodes, edges & constraints
It avoids the semantic mismatches that may occur between constructs of higher-level modelling languages
9. Both-As-View (BAV) (3/3) Modify using primitive schema transformations
add/delete
rename
extend/contract
Supply transformations with queries
add(⟨⟨table,attrib3⟩⟩, q), where q: [{t,(a1+a2)} | {t,a1} ← ⟨⟨table,attrib1⟩⟩; {t,a2} ← ⟨⟨table,attrib2⟩⟩]
extend(⟨⟨table,attrib3⟩⟩, q1, q2)
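The add transformation with its query can be sketched as follows. This is an illustration only, not the AutoMed API: the dictionary-based schema, the helper names `add` and `q`, and the sample extents are all assumptions.

```python
# Hypothetical in-memory schema: scheme -> extent (a set of (key, value) pairs).
schema = {
    ("table", "attrib1"): {(1, 10), (2, 20)},
    ("table", "attrib2"): {(1, 5), (2, 7)},
}


def q(s):
    """Join the two attribute extents on the key t and sum the values."""
    a2_by_key = dict(s[("table", "attrib2")])
    return {(t, a1 + a2_by_key[t])
            for (t, a1) in s[("table", "attrib1")] if t in a2_by_key}


def add(s, scheme, query):
    """Primitive 'add': the new construct's extent is fully derived by the query."""
    if scheme in s:
        raise ValueError(f"{scheme} already exists")
    new = dict(s)
    new[scheme] = query(s)
    return new


schema2 = add(schema, ("table", "attrib3"), q)
```

The original `schema` is left untouched, so each transformation step yields a new intermediate schema, mirroring the idea of a pathway of schemas.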
10. Example (1/2) S1 → Sg:
add(⟨⟨monitors⟩⟩,q1)
add(⟨⟨monitors,sno⟩⟩,q2)
add(⟨⟨monitors,id⟩⟩,q3)
add(⟨⟨tutor,dept#⟩⟩,q4)
rename(⟨⟨ug⟩⟩,⟨⟨student⟩⟩)
rename(⟨⟨tutor⟩⟩,⟨⟨staff⟩⟩)
delete(⟨⟨student,sno⟩⟩,q5)
S2 → Sg: can be derived similarly
11. Example (2/2) Automatically derivable reverse transformations
add(C,q)/extend(C,q1,q2) : delete(C,q)/contract(C,q1,q2)
delete/contract : add/extend
rename(C1,C2) : rename(C2,C1)
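The reversal rules above are mechanical, which is why the reverse pathway can be derived automatically. A minimal sketch follows, assuming a pathway is a list of steps and that the reverse pathway applies the inverted steps in the opposite order. The `Step` type and the string placeholders for constructs and queries are hypothetical.

```python
from typing import NamedTuple


class Step(NamedTuple):
    op: str       # one of add, delete, extend, contract, rename
    args: tuple   # construct(s) and queries, kept opaque here


# Each primitive has a fixed inverse operation.
INVERSE = {"add": "delete", "delete": "add",
           "extend": "contract", "contract": "extend",
           "rename": "rename"}


def reverse(pathway):
    """Reverse a pathway: invert each step and apply them in opposite order."""
    out = []
    for step in reversed(pathway):
        # rename(C1, C2) becomes rename(C2, C1); other steps keep their arguments.
        args = step.args[::-1] if step.op == "rename" else step.args
        out.append(Step(INVERSE[step.op], args))
    return out


s1_to_sg = [
    Step("add", ("monitors", "q1")),
    Step("rename", ("ug", "student")),
    Step("delete", ("student,sno", "q5")),
]
sg_to_s1 = reverse(s1_to_sg)
```

Reversing twice returns the original pathway, which is a quick sanity check on the rules.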
12. BAV vs. LAV, GAV & GLAV BAV approach subsumes other integration approaches:
Can be used to derive GAV & LAV view definitions (ICDE’03)
Comparison with GAV, LAV & GLAV in DBIS'04
13. Schema Evolution Example Define the evolution of the global or local schema as a schema transformation pathway from the old to the new schema
14. Types Of Integration Virtual integration
Materialised integration
Hybrid integration
15. AutoMed Tools Data Lineage Tracing (DLT)
Incremental View Maintenance (IVM)
Schema matching tool
Transformation pathway optimisation
XML transformation/integration tool
16. Outline The AutoMed toolkit
The BioMap integration
Automatic XML data transformation/integration
17. Integration Outline Wrapping of sources
Translation of source and global schemas into the XML schema type used within AutoMed
Domain expert provides mappings between sources & global schema
Automatic schema transformation/integration algorithm
18. Relational-to-XMLDSS
19. Integration Outline Wrapping of sources
Translation of source and global schemas into the XML schema type used within AutoMed
Domain expert provides mappings between sources & global schema
Automatic schema transformation/integration algorithm
20. Outline The AutoMed toolkit
The BioMap integration
Automatic XML data transformation/integration
21. Outline Semantic Heterogeneity
Schema Matching
Ontologies
Structural Heterogeneity
XML schema type in AutoMed
Schema transformation
Schema integration
22. Semantic Heterogeneity Problem definition
Schema Matching
Data mining
Neural networks
Machine learning (LSD)
Ontologies (RDFS/OWL)
23. Schema Matching (1/2) Types:
1-1, 1-n, n-1, n-m
Subset, superset, equivalence
Use schema matching output to create the intermediate schemas used by the schema restructuring / schema integration algorithms
24. Schema Matching (2/2) Necessary transformations:
add attributes day, month, year in S
delete attribute dob from S
The reverse transformation pathway describes an n-1 match
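The dob example can be sketched with plain Python sets. The date format "DD/MM/YYYY" and the sample values are assumptions made for illustration. The point is that each added attribute is derived from dob, and dob can be rebuilt from the three new attributes.

```python
# Hypothetical extent of attribute dob in schema S: (key, "DD/MM/YYYY") pairs.
dob = {(1, "03/11/1980"), (2, "25/01/1975")}

# Queries for the three 'add' steps: each new attribute is derived from dob.
day = {(k, int(v.split("/")[0])) for (k, v) in dob}
month = {(k, int(v.split("/")[1])) for (k, v) in dob}
year = {(k, int(v.split("/")[2])) for (k, v) in dob}

# Query for the 'delete' step: dob is rebuilt from the three new attributes,
# which is what the reverse pathway (the n-1 match) relies on.
d, m = dict(day), dict(month)
dob_restored = {(k, f"{d[k]:02d}/{m[k]:02d}/{y:04d}") for (k, y) in year}
```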
25. Structural Heterogeneity Problem: Same information can be represented in many different ways
Ancestor–descendant ↔ different branches
Elements & attributes not clearly distinguished in XML model
Ordering policy
26. Aims XML-specific solution:
Insert-remove-rename operations on elements, attributes, edges
Efficient ‘move’ (node/subtree) operation
Element-to-attribute, attribute-to-element transformations
Avoid loss of data due to structural incompatibilities
Automation
27. XML DataSource Schema (1/2) Basic characteristics:
Structure-only representation
XML format → ease of traversal & manipulation
Automatically derived from an XML file
XMLDSS from other schema types (DTD, XML Schema)
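A structure-only summary of an XML document can be derived with the standard library, as sketched below. This is not the XMLDSS derivation itself: the function name `derive_structure` is hypothetical, and the sketch simplifies by treating same-named elements on different paths as one construct.

```python
import xml.etree.ElementTree as ET


def derive_structure(xml_text):
    """Collect element names, attribute names and parent-child edges, ignoring data."""
    root = ET.fromstring(xml_text)
    elements, attributes, edges = set(), set(), set()

    def visit(node):
        elements.add(node.tag)
        for name in node.attrib:
            attributes.add((node.tag, name))
        for child in node:
            edges.add((node.tag, child.tag))
            visit(child)

    visit(root)
    return elements, attributes, edges


doc = "<A id='1'><B>x</B><B>y</B><C><B>z</B></C></A>"
elements, attributes, edges = derive_structure(doc)
```

Repeated instances such as the two B children of A collapse into a single edge, which is what makes the result a schema rather than a copy of the data.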
28. XML DataSource Schema (2/2)
29. Schema Transformation Target schema T given
Source schema S is transformed to match the structure of T
30. Algorithm Growing phase: traverse the target schema and issue an add/extend transformation for every construct that does not exist in the source schema.
Shrinking phase: traverse the source schema and issue a delete/contract transformation for every construct that does not exist in the target schema.
Completeness of algorithm
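The two phases can be sketched over schemas represented as sets of construct names. This is a simplification under stated assumptions: edges are written as "parent/child" strings, traversal is replaced by set difference with nodes ordered before edges when growing and edges before nodes when shrinking, and `derivable` is a hypothetical oracle that decides between add/delete and extend/contract.

```python
def restructure(source, target, derivable):
    """Return a pathway turning `source` into `target` (both sets of constructs).

    `derivable(c)` is a hypothetical oracle saying whether construct c's extent
    can be fully computed from the rest of the schema.
    """
    pathway = []
    # Growing phase: add/extend everything the target has and the source lacks,
    # nodes first so that new edges never reference a missing node.
    for c in sorted(target - source, key=lambda c: ("/" in c, c)):
        pathway.append(("add" if derivable(c) else "extend", c))
    # Shrinking phase: delete/contract everything the target does not have,
    # edges first so that no edge is left dangling.
    for c in sorted(source - target, key=lambda c: ("/" not in c, c)):
        pathway.append(("delete" if derivable(c) else "contract", c))
    return pathway


S = {"A", "B", "A/B"}
T = {"A", "B", "C", "A/C", "C/B"}
steps = restructure(S, T, derivable=lambda c: c == "A/B")
```

With these inputs the sketch produces three extend steps for C and its edges, followed by a delete of the edge from A to B.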
31. Transformation Types AutoMed primitive transformations:
add/extend
delete/contract
rename
Schema level:
Insert, remove or rename schema constructs
Move element/subtree
Element ↔ attribute
32. Example 1 Insert element C
ext(<C>,Void,Any)
ext(<A,C>, Void,Any)
ext(<C,B>, Void,Any)
del(<A,B>,q)
Remove element C
add(<A,B>,q)
con(<C,B>, Void,Any)
con(<A,C>, Void,Any)
con(<C>, Void,Any)
Let us now see some examples of transformations at the schema level.
The figure shows the insertion of element C in schema S1 or, in the other direction, the removal of element C from schema S2.
One thing to remember when inserting or removing constructs in our graph environment is that, because all edges are defined by referencing the nodes they connect, one must be careful to leave the graph in a consistent state.
As an example, consider the removal of element C from schema S2. One cannot remove element C before removing its incoming and outgoing edges, as this would leave the graph in an inconsistent state.
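The consistency rule can be sketched with a small graph class that refuses to remove a node while edges still reference it. The class `SchemaGraph` and its method names are hypothetical, and the comments map each call to the corresponding primitive only loosely.

```python
class SchemaGraph:
    """Tiny graph of element nodes and parent-child edges (illustrative only)."""

    def __init__(self):
        self.nodes, self.edges = set(), set()

    def add_node(self, n):
        self.nodes.add(n)

    def add_edge(self, parent, child):
        # An edge is defined by the nodes it connects, so both must exist.
        if parent not in self.nodes or child not in self.nodes:
            raise ValueError("edge references a missing node")
        self.edges.add((parent, child))

    def remove_edge(self, parent, child):
        self.edges.remove((parent, child))

    def remove_node(self, n):
        # Refuse to leave dangling edges behind.
        if any(n in e for e in self.edges):
            raise ValueError(f"remove edges of {n} first")
        self.nodes.remove(n)


g = SchemaGraph()
for n in "ABC":
    g.add_node(n)
g.add_edge("A", "C")
g.add_edge("C", "B")

try:
    g.remove_node("C")      # rejected: C still has edges
    removed_early = True
except ValueError:
    removed_early = False

g.add_edge("A", "B")        # corresponds to add(<A,B>, q)
g.remove_edge("C", "B")     # corresponds to con(<C,B>, ...)
g.remove_edge("A", "C")     # corresponds to con(<A,C>, ...)
g.remove_node("C")          # corresponds to con(<C>, ...)
```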
33. Example 2 Insert/remove edge: move operation
This slide illustrates the basic move operation.
In order to move element C from being the child of element B to being the child of element A, we simply insert an edge from element A to element C, then we remove the edge from element B to element C.
Since it is possible to describe both the inserted and the removed edge using the rest of the schema constructs, the transformations are add and delete respectively.
34. Example 3 Move:
add(<root,B>,q3)
add(<B,A>,
[{b,a}|{a,b} ← <A,B>])
delete(<A,B>,
[{a,b}|{b,a} ← <B,A>])
Complete:
add(<B’>, <B>++q1)
add(<A,B’>, <A,B>++q2)
delete(<A,B>, <A,B’>)
delete(<B>, <B’>)
rename(<B’>, <B>)
This slide shows another move operation example. This time the transformations are not add and delete, but extend and contract. Furthermore, if we want to avoid loss of data, the algorithm has to create synthetic structure.
The reason is that in the data source of S1 there may be instances of A that do not have instances of B as children. Therefore, when migrating the data of data source S1 to the data source S2, if the algorithm does not create synthetic structure, these instances of A will be lost.
In this example, the pathway consists of two transformations. The first is a composite transformation named complete, which extends schema S1 with element B and then with the edge from element root to element B. The second is an extend transformation that inserts the edge from element B to element A.
35. Example 1 - revisited Actually, this can also be treated with an add/delete transformation
36. Example 4 Element-to-attribute transformation
insert(<A,A:B>,q)
remove(<A,B>,q)
remove(<B,PCDATA>,q)
remove(<B>,q)
Attribute-to-element transformation
insert(<B>,q)
insert(<A,B>,q)
insert(<B,PCDATA>,q)
remove(<A,A:B>,q)
This slide illustrates the element-to-attribute and attribute-to-element transformations.
Element B in schema S1 is transformed into attribute B of element A in schema S2. The algorithm first creates the attribute name from the element name. Then the algorithm adds the attribute’s extent, since it is possible to describe it using element B, the PCDATA node and the edge from element B to the PCDATA node.
In order to transform attribute B into an element, the algorithm first creates the element name using the attribute name. Then it inserts the edge from element A to element B and the edge from element B to the PCDATA node.
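At the instance level, the same two conversions can be sketched with the standard library. The slides describe schema-level transformations; this sketch only shows the corresponding effect on a single document, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def element_to_attribute(parent, child_tag):
    """Turn the text of a single child element into an attribute of its parent."""
    child = parent.find(child_tag)
    if child is None:
        return
    parent.set(child_tag, child.text or "")   # insert attribute A:B
    parent.remove(child)                      # remove B, its text and edge <A,B>


def attribute_to_element(parent, attr_name):
    """Turn an attribute of `parent` into a child element holding its value."""
    value = parent.attrib.pop(attr_name)      # remove attribute A:B
    child = ET.SubElement(parent, attr_name)  # insert B and edge <A,B>
    child.text = value                        # insert B's PCDATA


a = ET.fromstring("<A><B>hello</B></A>")
element_to_attribute(a, "B")
as_attribute = ET.tostring(a, encoding="unicode")
attribute_to_element(a, "B")
as_element = ET.tostring(a, encoding="unicode")
```

Applying the two functions in sequence returns the document to its original shape, which mirrors the fact that each transformation is the reverse of the other.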
37. Schema Integration Augment with missing constructs
Remove redundant constructs
38. Materialisation Strategy:
Materialise root and its attributes
Consider all edges (ep,ec) in a depth-first way
Materialise ec and its attributes
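The strategy above can be sketched as a depth-first walk that records the order in which constructs would be materialised. The list-based schema, the `name@attribute` labels and the function name `materialise` are assumptions made for illustration; no data is actually loaded here.

```python
def materialise(schema_edges, root, attributes):
    """Return the order in which constructs are materialised (depth-first)."""
    # Start with the root and its attributes.
    order = [root] + [f"{root}@{a}" for a in attributes.get(root, [])]
    seen = {root}

    def visit(parent):
        # Consider every edge (ep, ec) leaving the current parent.
        for (ep, ec) in schema_edges:
            if ep == parent and ec not in seen:
                seen.add(ec)
                order.append(ec)
                order.extend(f"{ec}@{a}" for a in attributes.get(ec, []))
                visit(ec)

    visit(root)
    return order


edges = [("root", "A"), ("A", "B"), ("root", "C")]
attrs = {"root": ["id"], "A": ["name"]}
order = materialise(edges, "root", attrs)
```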
39. Conclusions XML-specific transformation & integration algorithms:
element ↔ attribute transformations
move operation
No loss of data by synthetically creating missing structure
Automation – if sources have been previously semantically reconciled
40. Future Work Ontologies instead of schema matching
XMLDSS
Constraints
Support for XML databases
XQuery capability for XML wrapper