290 likes | 494 Views
Dynamic Database Integration in a JDBC Driver. Terrence Mason and Dr. Ramon Lawrence Iowa Database and Emerging Application Laboratory University of Iowa. 7th International Conference on Enterprise Information Systems ICEIS 2005 Miami, Florida. Presentation.
E N D
Dynamic Database Integration in a JDBC Driver Terrence Mason and Dr. Ramon Lawrence Iowa Database and Emerging Application Laboratory University of Iowa 7th International Conference on Enterprise Information Systems ICEIS 2005 Miami, Florida
Presentation • Discuss the contributions of JDBC Driver • Review the Architecture • Step through an example integration and query (Partitioned TPC-H* dataset) • Review the experimental results • Demonstrate efficient Database Integration *http://www.tpc.org/tpch/default.asp
Contributions to Database Integration • Standard API for Integration (JDBC) • Automatic generation of a globalview of integrated data sources • Annotation done locally • Common Vocabulary (National Cancer Institute-EVS) • Scalable to build a global schema • Simple Conceptual Query Language • Automatic JoinDetermination for queries • Allows evolution of data sources • Detects inconsistentdata across sources
Unity JDBC Driver Architecture JavaApplication Semantic Query Results Unity JDBC Driver Embedded Database Engine JDBC JDBC JDBC SQL DB1 DB2 DBn . . . . . . . . . . . .
Extending Standard JDBC API for Integration • Standard Java Interfaces for Single Database JDBC Connections extended to Multiple Databases • Connection • Driver Manager • Statement • Result Set
Java Code for JDBC Integration import java.sql.*; public class JDBCApplication { public static void main(String[] args) { { String url = “jdbc:unity://sources.xml"; (1) Connection con; (2) // Load UnityDriver class try { { Class.forName(``unity.jdbc.UnityDriver");} } (3) catch (java.lang.ClassNotFoundException e) { System.exit(1); } (4) try { //Initiate connection (5) con = DriverManager.getConnection(url); (6) Statement stmt = con.createStatement(); (7) ResultSet rst = stmt.executeQuery(`SELECT Part.Name, (8) LineItem.Quantity, Customer.Name (9) WHERE Customer.Name='Customer_25’ ”);} (10) System.out.println(``Part , Quantity, Customer"); (11) while (rst.next()) (12) { System.out.println(rst.getString(``Part.Name") (13) +”,”+rst.getString(``LineItem.Quantity") (14) +”,”+rst.getString(``Customer.Name") ); (15) } con.close(); (16) } (17) catch (SQLException ex) { System.exit(1); } (18) } }
XML File to Reference Data Sources <SOURCES> <DATABASE> <URL>jdbc:microsoft:sqlserver://IDEALAB5.cs.uiowa.edu:1433;DatabaseName=TPC; User=terry;Password=xxxxx</URL> <DRIVER>com.microsoft.jdbc.sqlserver.SQLServerDriver</DRIVER> <XSPEC>xspec/Order.xml</XSPEC> </DATABASE> <DATABASE> <URL>jdbc:microsoft:sqlserver://IDEALAB3.cs.uiowa.edu:1433;DatabaseName=TPC; User=terry;Password=yyyyyy</URL> <DRIVER>com.microsoft.jdbc.sqlserver.SQLServerDriver</DRIVER> <XSPEC>xspec/Part.xml</XSPEC> </DATABASE> </SOURCES>
Order.xml file (XSpec) • Schema Information • Table • Fields • Primary key • Foreign key • Join • Order Database • Annotation-Semantic Names • Scope of Keys (Global joins) -<XSPEC> <databaseName>Order</databaseName> <databaseProductVersion>Microsoft SQL Server 2000 </databaseProductVersion> -<TABLE> <semanticTableName>Customer</semanticTableName> <tableName>CUSTOMER</tableName> - <FIELD> <semanticFieldName>Customer.Id</semanticFieldName> <fieldName>C_CUSTKEY</fieldName> <dataTypeName>int</dataTypeName> </FIELD> - <FIELD> <semanticFieldName>Customer.Name</semanticFieldName> <fieldName>C_NAME</fieldName> <dataTypeName>varchar</dataTypeName> <fieldSize>25</fieldSize> </FIELD> - <FIELD> <semanticFieldName>Customer.Nation.Id</semanticFieldName> <fieldName>C_NATIONKEY</fieldName> <dataTypeName>int</dataTypeName> </FIELD> - <PRIMARYKEY> <keyScope>4</keyScope> <keyScopeName>Organization</keyScopeName> - <FIELDS> <fieldName>C_CUSTKEY</fieldName> </FIELDS> </PRIMARYKEY> - <FOREIGNKEY> <keyScope>4</keyScope> <keyScopeName>Organization</keyScopeName> - <FIELDS> <fieldName>C_NATIONKEY</fieldName> </FIELDS> <toTableName>NATION</toTableName> </FOREIGNKEY> - <JOIN> <joinName>CUSTOMER->NATION</joinName> <fromKeyName>FK__CUSTOMER__C_NATI__7A672E12</fromKeyName> <fromTableName>CUSTOMER</fromTableName> <toKeyName>PK__NATION__6E01572D</toKeyName> <toTableName>NATION</toTableName> <joinType>3</joinType> </JOIN> • XML Document created Semi-automatically • Schema Information - Extracted Automatically from Database • Annotation and Scopes – Semi-automatically
Build Global Schema On Local Database Annotations Global Schema Part.Id, Part.Name, Part.Manufacturer Supplier.Id, Supplier.Name, Supplier.Nation.Id Order.Id, Order.Customer.Id, Order.Date LineItem.Linenumber, LineItem.Order.Id LineItem.Quantity, LineItem.Part.Id, LineItem.Supplier.Id, Customer.Id, Customer.Name, Customer.Nation.Id Nation.Id, Nation.Name, Nation.Region.Id Region.Id, Region.Name Part Database part(p_partkey, p_name, p_mfgr) supplier(s_suppkey, s_name, s_nationkey) partsupp(ps_partkey,ps_suppkey) nation(n_nationkey, n_name, n_regionkey) region(r_regionkey, r_name) Order Database customer(c_custkey, c_name, c_nationkey) orders(o_orderkey,o_custkey,o_orderdate) lineitem(l_orderkey,l_partkey,l_suppkey,l_linenum,l_qty) nation(n_nationkey, n_name,n_regionkey) region(r_regionkey, r_name)
Query Language on concepts in Global Schema No FROM clause Tables not specified Selection conditions on concepts Order by Attribute Only SQLGlobal Schema Global Schema Part.Id, Part.Name, Part.Manufacturer Supplier.Id, Supplier.Name, Supplier.Nation.Id Order.Id, Order.Customer.Id, Order.Date LineItem.Linenumber, LineItem.Order.Id LineItem.Quantity, LineItem.Part.Id, LineItem.Supplier.Id, Customer.Id, Customer.Name, Customer.Nation.Id Nation.Id, Nation.Name, Nation.Region.Id Region.Id, Region.Name Query: SELECT Part.Name, LineItem.Quantity, Customer.Name WHERE Customer.Name = 'Customer#000000025'
QueryProcessing Steps • Parse Semantic Query • Validate concepts • Create parse tree • Map concepts to fields in local databases • Determine joins to relate attributes in each local database • Build Execution Tree (Relational Algebra) • Execute a sub-query to each local database • Find global join or union to relate sub-queries • Combine sub-queries into single result set
Conceptual Query and Parse Tree Conceptual Query: SELECT Part.Name, LineItem.Quantity, Customer.Name WHERE Customer.Name = 'Customer#000000025' Parse Tree: SELECT Identifier: Part.Name Identifier: LineItem.Quantity Identifier: Customer.Name WHERE Comparison_Op: = Identifier: Customer.Name String: 'Customer#000000025'
Join Graph Construction • Graph represents joins for each local database • Edges directed as N:1joins • Automatically extracted into XSpec or added to the XSpec. • Used to calculate joins for each sub-query
Database Join Graphs Order Database Part Database Line Item Part Supp Order Part Supplier Nation Customer Nation Region Region
Map the Concepts in query to Relations Part.Name, LineItem.Quantity, Customer.Name Order Database Part Database Line Item Part Supp Order Part Supplier Nation Customer Nation Region Region
Determine Local Joins Steiner Tree Approximation Algorithm Order Database Part Database Line Item Order Part Customer Global Join LineItem.Part.Id is foreign key to Part.Id Semantic Query: SELECT Part.Name, LineItem.Quantity, Customer.Name Where Customer.Name = 'Customer#000000025'
Build Execution TreeRelational Algebra • Projection • Concepts in SELECT portion of conceptual query • Sub-query projections of required fields (global joins) • Selection • WHERE conditions of conceptual query • Joins • Determined from Join Graphs • Global joins identified by key scopes
Sub-queries Sent to Each Local Database SQL through JDBC • Part Database: • SELECT P.P_NAME, P.P_PARTKEY FROM PART AS P • Order Database: • SELECT L.L_QUANTITY, C.C_NAME, L.L_PARTKEY FROM LINEITEM AS L, CUSTOMER AS C, ORDERS AS O WHERE C.C_NAME = 'Customer#000000025' AND O.O_CUSTKEY = C.C_CUSTKEY AND L.L_ORDERKEY = O.O_ORDERKEY • Local joins determined from join graphs • Selection Condition • Elements added to queries in order for the global join to be executed in Unity Driver.
Operator Execution Tree Idealab1 Client (Unity Driver) Unity Embedded Database Engine Idealab5 Database Server Idealab3 Database Server
Experimental Results • Dynamic integration is efficient and scalable • Minimal overhead • Multi-source query processing • Competitive with single source execution • Possible to execute queries on a global schema
Schema Integration Results Multiple Copies of TPC-H (Seconds) • Integration of schemas occurs in linear time based on number of schemas integrated. • Integration and Connection executes only once at start up. Not for each query.
Query Small Result Size(76 Tuples) * Only 76 tuples transported over network for single sub-query * Separate requires entire Part table imported to Unity for join
Query Large Result Size(6,000,215 tuples) * Distributed execution of the queries on multiple computers executed faster than a single database server due to parallelism for this particular query.
Conclusions • Integration possible in a JDBC Driver • Local Annotation permits scalable integrations • Minimal Overhead to Process Queries • Query Multiple Database on a Global View • No need to specify joins • No requirement to know underlying schemas
Future Works • AutoJoin – Scalable inference engine for join determination • Improve global query inference • Sophisticated Global Query Optimizer • Extend to support Federated Database Queries • No global schema • Fully Specified Queries
Queries to Test Unity Performance(Data Labels on Charts) • TPC-H - Conceptual query executed through Unity driver against a single source TPC-H database. • JDBC TPC-H - SQL query equivalent to conceptual query executed directly through SQL Server JDBC driver on a single source TPC-H database. • Partitioned on One Computer - Conceptual query executed on TPC-H data set virtually partitioned into the Part and Order databases