Query Processing in the AquaLogic Data Services Platform

Query Processing in the AquaLogic Data Services Platform Vinayak Borkar, Michael Carey, Dmitry Lychagin, Till Westmann, Daniel Engovatov, Nicola Onose BEA Systemswww.bea.com

Data Is Everywhere Today • Relational databases made things too easy • Departmental vs. inter-galactic centralized databases • Databases come in many flavors • Relational: Oracle, DB2(s), SQL Server, MySQL, … • Hangers-on: IMS, IDMS, VSAM, … • Not all data is SQL-accessible • Packaged applications: SAP, PeopleSoft, Siebel, Oracle, SalesForce, … • Custom “homegrown” applications • Files of various shapes and sizes (XML, non-XML) • And the list goes on…

Painful to Develop Applications • No one “single view of X” for any X • What data do I have about X? • How do I stitch together the info I have? • What else is X related to? • No uniformity in source model or language • Data about X is stored in many different formats • Accessing or updating X involves many different APIs • Manual coding of “distributed query plans” • No reuse of artifacts • Different access criteria or returned data  different access plans • No model to help organize or find the artifacts anyway

Agenda Why data services? Overview and Example The Query Processor Work in progress at BEA Summary and Q&A

WSDL JAVA API Overview REPORTING JAVA/J2EE WEB SERVICE JDBC/SQL Client API Data Processing Engine Developer Tooling Connectivity Web Services Relational Files Java Functions Business partners, Adapter, .Net Tables, views stored procedures, SQL J2EE Excel LDAP XML, Flat Files Custom Access JCA JMS

Customer Info Order Info Credit Card Info Rating Info CUSTOMER, ORDER CREDIT_CARD getRating(…) Example: Customer Profile Data Service

Data Service – Design View

Data Service – “Get All” Read Method (::pragma function ... kind="read" ...::) declare function tns:getProfile() as element(ns0:PROFILE)* { for $CUSTOMER in db1:CUSTOMER() return <tns:PROFILE> <CID>{ fn:data($CUSTOMER/CID) }</CID> <LAST_NAME>{ fn:data($CUSTOMER/LAST_NAME) }</LAST_NAME> <ORDERS>{ db1:getORDER($CUSTOMER) }</ORDERS> <CREDIT_CARDS>{ db2:CREDIT_CARD()[CID eq $CUSTOMER/CID] }</CREDIT_CARDS> <RATING>{ fn:data(ws1:getRating( <ns5:getRating> <ns5:lName>{ data($CUSTOMER/LAST_NAME) }</ns5:lName> <ns5:ssn>{ data($CUSTOMER/SSN) }</ns5:ssn> </ns5:getRating> ) }</RATING> </tns:PROFILE> };

Data Service – Read & Navigate Methods (::pragma function ... kind="read" ...::) declare function tns:getProfileByID($id as xs:string) as element(ns0:PROFILE)* { tns:getProfile()[CID eq $id] }; ... (::pragma function ... kind="navigate" ...::) declare function tns:getCOMPLAINTs($arg as element(ns0:PROFILE)) as element(ns8:COMPLAINT)* { db3:COMPLAINT()[CID eq $arg/CID] }; ...

Query Processor Overview

Efficient processing • Avoid unnecessary work • Function inlining (view unfolding) • Push work to the sources • Queryable sources can do some of our work

Optimization: Function Inlining Example: This fragment let $x := <CUSTOMER> <LAST_NAME>{$name}</LAST_NAME> <ORDERS>…</ORDERS> </CUSTOMER> return fn:data($x/LAST_NAME) can be replaced by $name But we need to: maintain structural type information for compilation extend “preserve” mode for runtime

Pushdown: Overview • SQL Translator tries to “swallow” as much as possible. • Make translation easy • Remove unnecessary functions (by inlining) • Translation to joins and grouping • Split sorting and grouping • Make sources do as much as possible • DBMS specific code • Maximize Pushdown • Inverse functions

Pushdown: Preparation • Translation to joins and grouping • Split sorting and grouping for $CUSTOMER in db1:CUSTOMER() return <tns:PROFILE> <CID>{ fn:data($CUSTOMER/CID) }</CID> … <ORDERS>{ db1:getORDER($CUSTOMER) }</ORDERS> <CREDIT_CARDS>{ db2:CREDIT_CARD()[CID eq $CUSTOMER/CID] }</CREDIT_CARDS> … </tns:PROFILE>

Pushdown: Inverse Functions I Example fragment of tns:getProfile: <tns:PROFILE> <CID>{fn:data($CUSTOMER/CID)}</CID> <LAST_NAME>{ fn:data($CUSTOMER/LAST_NAME) }</LAST_NAME> <SINCE>{int2date($CUSTOMER/SINCE)}</SINCE> ... <tns:PROFILE> used in this query for $c in tns:getProfile() where $c/SINCE gt $start return $c yields (after inlining) for $c1 in ns3:CUSTOMER() where int2date($c1/SINCE) gt $start return <tns:PROFILE> ... </tns:PROFILE>

Pushdown: Inverse Functions II Register • Inverse functiondate2intforint2date • Transformation rule (gt, int2date) gt-intfromdatewith declare function gt-intfromdate($x1 as xs:dateTime, $x2 as xs:dateTime) as xs:boolean?{ date2int($x1) gt date2int($x2) }; Now we can rewrite the query into for $c1 in ns3:CUSTOMER() where $c1/SINCEgt ns1:date2int($start) return <tns:PROFILE> ... </tns:PROFILE> can be pushed as SELECT * FROM "CUSTOMER" t1 WHERE (t1."SINCE" > ?)

Optimization: PP-k JoinParameter Passing in chunks of k • Prerequisites • Distributed join • Right side is a relational source • Idea: relational source can partition its content • Steps • Read k items from the left into L • Select all items from the right that match any of the items in L into R • Join L and R in the middleware • Repeat until the left is exhausted • Benefit: excellent trade-off between • Memory footprint in the middleware • Roundtrip overhead imposed by the data source • k = 20

Example: “Get All” Read Method Revisited (::pragma function ... kind="read" ...::) declare function tns:getProfile() as element(ns0:PROFILE)* { for $CUSTOMER in db1:CUSTOMER() return <tns:PROFILE> <CID>{ fn:data($CUSTOMER/CID) }</CID> <LAST_NAME>{ fn:data($CUSTOMER/LAST_NAME) }</LAST_NAME> <ORDERS>{ db1:getORDER($CUSTOMER) }</ORDERS> <CREDIT_CARDS>{ db2:CREDIT_CARD()[CID eq $CUSTOMER/CID] }</CREDIT_CARDS> <RATING>{ fn:data(ws1:getRating( <ns5:getRating> <ns5:lName>{ data($CUSTOMER/LAST_NAME) }</ns5:lName> <ns5:ssn>{ data($CUSTOMER/SSN) }</ns5:ssn> </ns5:getRating> ) }</RATING> </tns:PROFILE> };

Evaluation Plan, Example 1 (getProfile)

Evaluation Plan, Example 2 (query getProfile)

Some ALDSP Work in Progress • Native JDBC/SQL92 support (available on Sep 15) • Bilingual engine for efficient reporting/BI tool access • Support for compensating transactions • Extend update facility to support safe non-XA updates (sagas) • XQuery Update support (as well as XQueryP) • Goal: no Java coding for many Web service use cases

Summary • Covered here • Why Data Services? • How are Data Services used? • Some techniques for efficient evaluation • In the paper • XQuery extensions • Runtime architecture • Caches • Updates • Security

Query Processing in the AquaLogic Data Services Platform