210 likes | 317 Views
A JDBC driver supporting Data Source Integration. Jian Jia. Motivation. Manually integrate information sources is painful, because of Heterogeneous data source inconsistent / incomplete data structure information Platform dependency. A Solution. Unity
E N D
Motivation • Manually integrate information sources is painful, because of • Heterogeneous data source • inconsistent / incomplete data structure information • Platform dependency
A Solution • Unity • Automate integration process • Uses ODBC to access multiple data sources • X-Spec to capture semantic meaning of the data --- Standard dictionary • In C++ • On Windows platform
Goal of this Project • Unity JDBC Driver • Embed integration function of Unity into a JDBC driver • X-Spec as the dictionary • in Java – platform independent • Access multiple data sources through JDBC
User Queries User Queries Semantic Queries Results SemanticQueries Results Unity JDBC Driver Unity Integration Module Integration Module … … ODBC ODBC ODBC JDBC JDBC JDBC SQL SQL SQL SQL SQL SQL DB1 DB2 … DBn DB1 DB2 … DBn Migrating Unity from ODBC to JDBC
Basic classes of JDBC API DriverManager registers creates retrieves Driver Statement ResultSet provides provides Connection ResultSetMetaData
JDBC Driver Types • JDBC-ODBC Bridges plus ODBC drivers • Native-API partly-Java drivers • JDBC-Net pure Java drivers • Native-protocol pure Java drivers
Semantic Query • An example SELECT [Employee].id, [Department;Employee].name WHERE [Employee].age > 30 • All Fields/Tables that have the same semantic meaning should have same semantic name. • Semantic query refers a field by its semantic name. • There is no explicit relation specifications(from table, join , union) in the query. • X-Spec Document stores information about all semantic names and corresponding system names for every field/table. • No nested query. • Semantic Query should be parsed to create sub-query in standard form(SQL’92) for each data source.
Semantic Query Grammar ALL DISTINCT SELECT [Column] , FROM Tables WHERE Search Condition GROUP BY Columns ORDER BY [Column] DESC /ASC ,
Semantic Query Grammar Search Condition NOT Expression [NOT] LIKE “[%] String” Column IS [NOT] NULL (Expression) = < > <> Expression Expression OR AND
Parsing X-Spec ……. X-Spec Field 1 X-Spec Field 2 X-Spec Field k … X-Spec Table 1 X-spec Key 1 X-spec Key 2 X-spec Key j X-Spec Table 2 … X-spec Joins X-Spec Table 3 . . . … … … X-Spec Table n
Semantic query S_list F_list C_list GroupBy_list OrderBy_list PASS ONE PASS TWO Mapping semantic Name to System Name; Build sub-query Selected X-Spec Fields Selected Fields (Sys_Name) …….. Sub Query 1 Sub Query 2 Sub Query n Used for integration Parsing Semantic Query Query Translator
Sub-Query Generation • S-List(Selection-List) Only those semantic fields that are in the data source can be substituted by their system names, and added to corresponding sub-query selection list • C-List(Condition-List) An expression can only be added to sub-query condition list only when all semantic arguments are in the data source.
Sub-Query Generation Sub-S-Clause S-List Sub-From- Clause C-List Sub-Where- Clause OrderBy-List Sub-Order By- Clause
Inside JDBC Driver Semantic query ResultSet Selected Fields (Sys_Name) Query Translator ResultSet MetaData ResultSet- MetaData Sub Query 1 Sub Query 2 … Sub Query n Join Union ResultSet 1 ResultSet- MetaData ResultSet 2 Integration JDBC 1 JDBC 2 JDBC n ResultSet n ResultSet- MetaData DB1 DB2 … DBn
Integration Method • JOIN Merge Join by Global Keys MultiValue Field – Data inconsistent • UNION Simply append one ResultSet to another Do not need match keys
A Simple Example • Semantic Query (for two data sources) SELECT F1, F2, F3, F4, F5 WHERE C1 AND C2 Sub-query for DB2 Sub-query for DB1 SELECT f1a, f3a, f5a FROM tableA WHERE C1 SELECT f1b,f2b, f4b FROM tableB WHERE C2
Example Result • Join Result Assume Key_A and Key_B are two system names of the global KEY in two data sources DB 1 DB 2
Example Result • Union DB 1 DB 2
Future Work • Operations cross data sources • Complete Algorithms for Result Integration • Automated Updates on heterogeneous data Sources • Implement Group By, From in semantic query