440 likes | 549 Views
Integrating XQuery and Relational Database Systems. Xml and relational databases. Xml Advantages: simple unicode based platform independent syntax many parsers is able to represent structured data, semi-structured data and markup data. Structured data. <customer> <ID>C1</ID>
E N D
Xml and relational databases Xml Advantages: • simple • unicode based • platform independent syntax • many parsers • is able to represent structured data, semi-structured data and markup data.
Structured data <customer> <ID>C1</ID> <FirstName>Janine</FirstName> <LastName>Smith</LastName> <Address> <Street>1 Broadway Way</Street> <City>Seattle</City> <Zip>WA 98000</Zip> </Address> <Order> <ID>01</ID> <OrederDate>2003-01-21</OrederDate> <Amount>7</Amount> <ProductID>P3</ProductID> </Order> <Order> <ID>02</ID> <OrederDate>2003-06-24</OrederDate> <Amount>3</Amount> <ProductID>P3</ProductID> </Order> </customer>
Semi-structured data <PatientRecord pid="P1"> <FirstName>Janine</FirstName> <LastName>Smith</LastName> <Address> <Street>1 Broadway Way</Street> <City>Seattle</City> <Zip>WA 98000</Zip> </Address> <visit date="2002-01-05"> Janine came in with a <symptom>rash</symptom>. We idetified a <diagnosis>antibiotics allergy</diagnosis> and <remedy>Changed her cold prescription</remedy>. </visit> </PatientRecord> <PatientRecord> <pid>P2</pid> <Name>Nils Soerensen</Name> <Address>23 NE 40th Street, New York</Address> </PatientRecord>
Markup data <visit date="2002-01-05"> Janine came in with a <symptom>rash</symptom>. We idetified a <diagnosis>antibiotics allergy </diagnosis> and <remedy>Changed her cold prescription</remedy>. </visit>
Xml and realtional databases • Relational DBs manage structured data and thay are being used by 80% of the market. • Most relational DBs developped capabilities that fit XML structured data concept. • Recently there have been attempts to deal with data that doesn’t fit that concept.
Overview • Relational Storage of XML: The Xml type • Integrating Xquery and SQL: Querying XML datatypes • Top-level XQuery
Relational Storage of XML: The Xml type • All LOB-based storage mechanisms provide a built in XML datatype. The SQL name for that built in datatype is XML. • There are different logical models and physical representations for the XML datatype.
Logical Models for the XML Datatype • The standard doesn’t define the impelmentation of the XML datatype, only the requirement it should satisfy:Representing any element content, multiple top-level element nodes and top-level text nodes. • It is possible to cast the XML datatype to string (serializing) and vice versa (parse) using SQL.
Physical Models for the XML Datatype • All the storage mechanisms assume that the data has been verified to be a well-formed instance of the XML datatype. • XML storage fidelities: • String-level fidelity: code-point for code-point. • Infoset-level fidelity: the stored XML shares the same inforamtion set as the original XML document. • Relational fidelity level: preserves information from relational point of view and disgards XML-specific properties, such as document order.
Character LOB (CLOB) With CLOB, the XML is stored in a character representation. • The may preserve the original XML data exactly (string-level fidelity). • The data may have been transformed by changing the encoding of the content (infoset-level fidelity). This method is not efficient: • It requires extensive indexing or requires every query to parse the data before executing the query. • Hard to update on a node level.
Binary LOB (BLOB) • Binary format provides more efficient index processing and compression, such as pre processed xml in the form of a DOM tree. • BLOB could provide string-level fidelity, but it usually provides only infoset level. • A BLOB provides an additional level of abstracion over CLOBs. • Node-level updates are costly since the BLOB support in relational systems is not designed for logical updates.
User-Defined Type Relational database systems allow user to add user defined physical presentation. • The format of the XML is defined by the user (binary vs text etc..). • There are two major approaches for supporting user-defined types: • Deep integration approach: provides type integration into the actual query processor. • Virtual Machine approach: provides an external co-engine to help processing XML querries.
Relational table mapping XML data is being mapped to relational data and indices. It often only provides relational fidelity. • Node tables: represents every node in the XML instance as a row in the table. • Table shredding: Can be used if additioanal structured information is available in the form of a schema.
Node table example XML (untyped) <doc> <customer cid="ALFKI"> <Oreder oid="01"/> </customer> <customer cid="BONDO"> </customer> </doc>
Shredded table example XML with a schema that describes Customer and Order <Customer cid="ALFKI" customername="Alfred's Futterkiste"> <Oreder oid="01" orderdate="2002-0-10"/> </Customer> <Customer cid="BONDO" customername="Bon Odessa"/>
Typing an XML Datatype • Typing xml values according to xml schemas. • In order to provide type information from an xml schema, the type relevant information must be managed in the metadata componant of the database. • Relational databases enable mapping of xml namespace URIs to SQL idetifiers. • example:CREATE XML SCHEMA POSchema NAMESPACE N'<schema xmlns="http://www.w3.org/2001/XMLSchema" targetNamespase=http://www.example.com/po-schema> ... </schema>'
Typing an XML Datatype • XML datatype instances are validated according to the schemata in the database repository. • There are three modes of validation: • Skip validation: no validation. • Lax validation: validation of a subtree only if there is an applicable schema component. • Strict validation: requires that all data conform to the schema. • isvalid(XML instance, SchemaComponent,validation mode) boolean
Typing an XML Datatype • Static association vs. dynamic association • CREATE TABLE Customers (CustomerID int PRIMARY KEY,CustomerName nvarchar(100),PurchaseOrders XML TYPED AS POSchema,OrderForm XML,OrderFormType nvarchar(1000)CHECK isvalid(OrderForm,OrderFormType,’strict’))
Integrating XQuery and SQL: Querying XML datatypes • SQL can retrieve the XML-typed column, but it can’t query it. That is where XQuery comes in. XQuery allows us to query and transforn the XML data. • This section explains the integration of the two, so that we can invoke XQuery functionality from SQL, and provide information from the relational environment to the Xquery context.
XQuery Functionality in SQL • We must be able to transform an XML datatype instance to another XML datatype. • We must have a way to extract information from an XML instance that fits into the SQL type system (scalar SQL types/XML datatype). • Testing the XML structure for existence • value(XML, XQueryString, SQLType) SQLType • exists(XML, XQueryString) boolen • query(XML, XQueryString) XML
XQuery Functionality in SQL • Compiling the XQuery Expressions • The XQuery expressions could be given dynamically or as constant strings. If the XQuery expression is given as a constant string, then relational systems can compile the XQuery at the same time as the SQL statement.
Augmenting the XQuery Static Context • The XQuery default collation is implicitly set to the relational collation of the XML datatype. • The relational database system’s built in functions are added to the static function context, as are the SQL constructors for the built in types. • The SQL built in types are added to the static type context, as are the types of the schema components in case of a statically constrained XML datatype.
Providing Access to SQL Data inside XQuery • Sometimes, an XQuery expression needs to access data from the relational realm. • One way to access SQL variables in XQuery is by mapping the variables into the XQuery static variable context.Since there are more valid name characters in SQL than in XML, different encoding should take place. This may be done using the variable() built-in pseudo-function. • Accessing an SQL column may be done using the column() built-in pseudo-function.
Mapping SQL Types to XML Schema Types • In order to be able to import the relational values into the XQeury context, the SQL types must be mapped to XML Schema. Example: Relational Type: INTEGER maps to: <xs:simpleType name=“INTEGER”> <xs:restriction base=“xsd:int”/> <xs:simpleType> • Since many relational systems provide additional built-in SQL types, they provide mappings that describe the implementation types.
Mapping SQL Types to XML Schema Types • The user must only provide the mapping for implementation specific namespaces. • Problems: • Mapping of strings: one XML Schema type for each length. • There are certain SQL character type values that are invalid in XML (most of the low-range ASCII control characters).
Adding XQuery Function Libraries • Importing externally defined XQuery function libraries is similar to the XML Schema import model. • A relational system can store XQuery function libraries by extending the metadata to allow them to be stored according to their namespace URI or an SQL idetifier. Such libraries would then be loaded and stored and imported into the XQuery static context when reffered to.
CREATE XML FUNCTION NAMESPACE N'module "http://www.example.com.myfns" declare namespace myf="http://www.example.com.myfns" define function myf:in-King-Country-WA ($zip as xs:integer) as xs:boolean ($zip < 90210 and $zip >= 90110) define function myf:King-Country-WA-salestax($x as xs:decimal) as xs:decimal {$x * 0.88} SELECT CustomerNAme, query(PurchaseOrder, 'import module namespace myf="http://www.example.com.myfns" declare namespace po = "http://www.example.com.po-schema" for $p in /po:purchase-order, $d in $p/po:details let $net as xs:decimal := $d/@qty * $d/@price where myf:in-King-Country-WA($p/po:shipTo/op:zip) return <po-detail id="{$p/@id}" total="{$net + myf:King-Country-WA-salestax($net)}"/>') as po-detail-price FROM Customers
Physical mapping of XQuery • General discussion on the different mapping strategies of XQuery into physical execution plans. • We assume that the XQuery expressions are provided as constants; thus their compilation and SQL expressions’ can occur simultaniously.
Physical mapping of XQuery • Two major compiling and executing XQuery expressions methods: • Decoupled approach: works with a standalone XQuery engine that is added to the relational system.The XQuery expression is passed to the XQuery processor, the XQuery is processed and the result is returned to the SQL environment.
Physical mapping of XQuery • Intergrated approach: Maps the XQuery expressions into logical operator trees that are integrated with the logical operator tree of the SQL statement. Complications: • Xquery has a more complex, nested and sequence based data-model. • Types and their operations are not mapped one to one. Thus, relational systems must extend their expression services to deal with the above. • Preserving input order in execution.
Physical mapping of XQuery Intergrated approach • cost based optimizer will have to choose execution plans that preserve the XQuery semantics. • Optimizer will often choose a bottom-up evaluation strategy, while the naïve execution startegy of XQuery is described top down. That may cause dynamic errors that would have been avoided in the top-down strategy.
Example of the previous situation: for $i in //A where $i/@a castable as xs:integer return for $j in $i/Bwhere xs:integer($i/@a)>10 return $j Optimization: for $i in //A, $j in $i/Bwhere $i/@a castable as xs:integer and xs:integer($i/@a)>10 return $j
Issues of combining SQL, XML Datatype and XQuery • The need to know the two languages: SQL and XQuery. • XQuery users interested in querying across multiple documents need to use SQL to iterate over the collection of documents. • SQL users interested in querying into XML documents need to use XQuery.
Issues of combining SQL, XML Datatype and XQuery • Solutions, more problems and more solutions… • The second problem can be avoided by shredding, but this mapping approach often only provides relational fidelity and thus does not address order preservation and markup scenarios. Thus either the SQL model should be extended, or The SQL model will be subsummed under the XML and Xquery model.
Top-Level XQuery • Top-Level XQuery provides a way to query collections of XML documents and forests without the need to use SQL. • One advantage is the abillity to provide concurency and locking. • The mid-tier model can do the above, however, if there’s already an XML datatype, it seems natural to extend the programming model to provide top-level Xquery support: XQUERY {XQuery expression}.
XML Document (Fragmet) Collections • There are two ways to provide XML collections. • An XQuery function collection(). Used to refer to a column that is typed as XML, or the same function provided by the database. • Another approach is to extend the notion of a table to allow the creation of a table of the XML datatype instead of a rowtype.
CREATE TABLE PurchaseOrders OF TYPE XML TYPED AS POSchema INSERT INTO PurchaseOrders SELECT PurchaseOrders FROM Customers XQUERY-MODIFY { Insert <po:purchase-order xmlns:po=http://www.example.com/po-schema id=“2001”> <po:shipTo> <po:address>42 2nd Avenue</ po:address> <po:city>Bellevue</ po:city> <po:state>WA</ po:state> <po:zip>98006</ po:zip> <po:shipTo> </po:shipTo> <po:details qty=“4” price=“3.44”/> </po:purchase-order>at last into sql:collection(‘PurchaseOrders’) } XQUERY { declare namespace po = http://www.example.com/po-schemasql:collection(‘PurchaseOrders’)/po:purchase-order/po:details }
XML Views over Relational Data • There are two main mapping approaches from relational to XML: • Table-to-doc: Maps every table to an XML document, where the root’s name is the table’s, and every row is mapped to an element named “row”. • Table-to-forest: Maps a table into an XML element forest.
XML Views over Relational Data • The top-level XQuery environment can access these views, by doc(), collection() or other built in functions. • Any of these views can be represented as a single XML document by making the top-level element(s) become the children of a document node. • The result may not necessarily be well formed.
XML Views over Relational Data • Example of a possible built-in XQuery function provided by top-level XQuery: sql:table($table as xs:string[,$table_map_option as xs:string [,$null_map_option as xs:string] ]) as element* • Use: • Sql:table(‘Order’,’table-to-forest’,’xsinil’)
Conclusion and Issues • Overview of how to integrate relational database systems with Xquery • The XML datatype • Mixed approach of using XQuery and SQL • Top-level XQuery approach • Insight into the imapct of the actual physical data model of the XML datatype on the processing of Xquery in the context of relational systems.
Open Issues • Which of the physical mappings of the XML data support XML inside a relational content the best? • How should the query-processing model be extended? • Concurancy control