XML-QL is a query language for extracting, transforming, and integrating data from large XML documents, with examples and a data model for XML.
- Bo Du, April 15, 2002 Bo Du
XML-QL: A Query Language for XML
OUTLINE
• Introduction
• Examples in XML-QL
• A Data Model for XML
• Advanced Examples in XML-QL
• Extensions and Open Issues
• Summary
Why do we need a query language?
The XML standard doesn't address:
• Extraction: How will data be extracted from large XML documents?
• Transformation: How will XML data be exchanged between user communities using different but related DTDs?
• Integration: How will XML data from multiple XML sources be integrated?
• Conversion: How will data be converted between relational or object-oriented (OO) databases and XML?
Useful References
• http://www.w3.org/XML/Query
• http://www.w3.org/TR/NOTE-xml-ql/
• http://www.ibiblio.org/xql/
• http://groups.yahoo.com/group/xml-dev/
• …
Is XML-QL a standard?
No. It is a submission to the World Wide Web Consortium (W3C). It is intended for review and comment by W3C members and is subject to change.
What does XML-QL do exactly?
• Extraction: pulls pieces of data out of XML documents
• Transformation: maps XML data between different DTDs
• Integration/Combination: combines XML data from different sources
Data Transformation
Data Integration
Requirements for the XML Query Language
• Selection and extraction
• Preserve structure
• Reduction
• Restructuring
• Join
(more detail in next section)
Bib.xml
<bib>
  <book year="1995">
    <!-- A good introductory text -->
    <title>An Introduction to Database Systems</title>
    <author> <lastname>Date</lastname> </author>
    <publisher> <name>Addison-Wesley</name> </publisher>
  </book>
  <book year="1998">
    <title>Foundation for Object/Relational Databases: The Third Manifesto</title>
    <author> <lastname>Date</lastname> </author>
    <author> <lastname>Darwen</lastname> </author>
    <publisher> <name>Addison-Wesley</name> </publisher>
  </book>
</bib>
Bib.dtd
<!ELEMENT book (author+, title, publisher)>
<!ATTLIST book year CDATA>
<!ELEMENT article (author+, title, year?, (shortversion|longversion))>
<!ATTLIST article type CDATA>
<!ELEMENT publisher (name, address)>
<!ELEMENT author (firstname?, lastname)>
Basic Examples: Selection/Extraction
Find the names of all authors whose publisher is Addison-Wesley:
WHERE <book>
        <publisher> <name> Addison-Wesley </name> </publisher>
        <title> $t </title>
        <author> $a </author>
      </book> IN "www.a.b.c/bib.xml"
CONSTRUCT $a
Basic Examples (contd.)
The use of </> instead of </XXX>:
WHERE <book>
        <publisher> <name> Addison-Wesley </> </>
        <title> $t </>
        <author> $a </>
      </> IN "www.a.b.c/bib.xml"
CONSTRUCT $a
Results of our first query
The output is in XML form:
<lastname>Date</lastname>
<lastname>Darwen</lastname>
<lastname>Date</lastname>
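The selection query above can be approximated outside XML-QL. The following is a minimal Python sketch using the standard library's xml.etree.ElementTree, not an XML-QL engine; the function name select_authors and the inlined copy of Bib.xml are assumptions made for illustration. Results come back in document order, which differs from the slide's listing; this is not significant under XML-QL's unordered view of a document.

```python
import xml.etree.ElementTree as ET

# Inlined copy of the Bib.xml sample from the slides.
BIB_XML = """
<bib>
  <book year="1995">
    <title>An Introduction to Database Systems</title>
    <author><lastname>Date</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
  <book year="1998">
    <title>Foundation for Object/Relational Databases: The Third Manifesto</title>
    <author><lastname>Date</lastname></author>
    <author><lastname>Darwen</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
</bib>
"""


def select_authors(bib_root, publisher_name):
    """Hypothetical helper: collect the content of every <author> in
    books whose <publisher><name> equals publisher_name."""
    results = []
    for book in bib_root.iter("book"):
        names = [(n.text or "").strip() for n in book.findall("publisher/name")]
        if publisher_name not in names:
            continue
        # The WHERE pattern also requires a <title>; no title, no binding.
        if book.find("title") is None:
            continue
        for author in book.findall("author"):
            # $a binds to the content of <author>, i.e. its child elements.
            results.extend(list(author))
    return results


root = ET.fromstring(BIB_XML)
authors = select_authors(root, "Addison-Wesley")
lastnames = [a.text.strip() for a in authors]
print(lastnames)  # ['Date', 'Date', 'Darwen']
```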
Use the current XML to construct a new XML structure
<bib>
  <book year="1995">
    <title>An Introduction to DB Systems</title>
    <author> <lastname>Date</lastname> </author>
    <publisher> <name>Addison-Wesley</name> </publisher>
  </book>
  <book year="1998">
    <title>Foundation for OR Databases</title>
    <author> <lastname>Date</lastname> </author>
    <author> <lastname>Darwen</lastname> </author>
    <publisher> <name>Addison-Wesley</name> </publisher>
  </book>
</bib>
Construct new XML data (Query)
WHERE <book>
        <publisher> <name> Addison-Wesley </> </>
        <title> $t </>
        <author> $a </>
      </> IN "www.a.b.c/bib.xml"
CONSTRUCT <result>
            <author> $a </>
            <title> $t </>
          </>
Construct new XML data (Result)
<result>
  <author> <lastname>Date</lastname> </author>
  <title>An Introduction to DB Systems</title>
</result>
<result>
  <author> <lastname>Date</lastname> </author>
  <title>Foundation for OR Databases</title>
</result>
<result>
  <author> <lastname>Darwen</lastname> </author>
  <title>Foundation for OR Databases</title>
</result>
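The CONSTRUCT clause emits one <result> per binding of ($a, $t), which is why the second book contributes two results. A rough Python sketch of that behaviour follows; it assumes the standard library's ElementTree stands in for an XML-QL processor, and the name construct_results and the shortened inline data are illustrative only.

```python
import copy
import xml.etree.ElementTree as ET

# Shortened sample data, following the abbreviated titles on the slide.
BIB_XML = """
<bib>
  <book year="1995">
    <title>An Introduction to DB Systems</title>
    <author><lastname>Date</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
  <book year="1998">
    <title>Foundation for OR Databases</title>
    <author><lastname>Date</lastname></author>
    <author><lastname>Darwen</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
</bib>
"""


def construct_results(bib_root, publisher_name):
    """Hypothetical helper: build one <result> per (title, author) pair."""
    results = []
    for book in bib_root.iter("book"):
        names = [(n.text or "").strip() for n in book.findall("publisher/name")]
        if publisher_name not in names:
            continue
        for title in book.findall("title"):
            for author in book.findall("author"):
                result = ET.Element("result")
                new_author = ET.SubElement(result, "author")
                # Copy the author's content so the source tree is untouched.
                new_author.extend([copy.deepcopy(child) for child in author])
                ET.SubElement(result, "title").text = (title.text or "").strip()
                results.append(result)
    return results


root = ET.fromstring(BIB_XML)
results = construct_results(root, "Addison-Wesley")
pairs = [(r.findtext("author/lastname"), r.findtext("title")) for r in results]
for r in results:
    print(ET.tostring(r, encoding="unicode"))
```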
Grouping with Nested Queries: Preserve structure
WHERE <book> $p </book> IN "www.a.b.c/bib.xml",
      <publisher> <name> Addison-Wesley </> </> IN $p,
      <title> $t </> IN $p
CONSTRUCT <result>
            <title> $t </>
            WHERE <author> $a </> IN $p
            CONSTRUCT <author> $a </>
          </>
Reduction of the previous query
-- ELEMENT_AS binds $x and $y to the whole <title> and <author> elements
WHERE <book>
        <publisher> <name> Addison-Wesley </> </>
        <title> $t </> ELEMENT_AS $x
        <author> $a </> ELEMENT_AS $y
      </> IN "www.a.b.c/bib.xml"
CONSTRUCT <result> $x $y </>
Another way:
WHERE <book>
        <publisher> <name> Addison-Wesley </> </>
        <title> $t </>
      </> CONTENT_AS $p IN "www.a.b.c/bib.xml"
CONSTRUCT <result>
            <title> $t </>
            WHERE <author> $a </> IN $p
            CONSTRUCT <author> $a </>
          </>
Result
<result>
  <title>An Introduction to Database Systems</title>
  <author> <lastname>Date</lastname> </author>
</result>
<result>
  <title>Foundation for Object/Relational Databases: The Third Manifesto</title>
  <author> <lastname>Date</lastname> </author>
  <author> <lastname>Darwen</lastname> </author>
</result>
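The nested query groups all authors under a single <result> per title instead of emitting one result per author. A minimal Python sketch of the same grouping is shown here; ElementTree is used as a stand-in for XML-QL, and group_authors_by_title is a made-up name.

```python
import copy
import xml.etree.ElementTree as ET

BIB_XML = """
<bib>
  <book year="1995">
    <title>An Introduction to Database Systems</title>
    <author><lastname>Date</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
  <book year="1998">
    <title>Foundation for Object/Relational Databases: The Third Manifesto</title>
    <author><lastname>Date</lastname></author>
    <author><lastname>Darwen</lastname></author>
    <publisher><name>Addison-Wesley</name></publisher>
  </book>
</bib>
"""


def group_authors_by_title(bib_root, publisher_name):
    """Hypothetical helper: one <result> per title, holding all its authors."""
    results = []
    for book in bib_root.iter("book"):
        names = [(n.text or "").strip() for n in book.findall("publisher/name")]
        if publisher_name not in names:
            continue
        for title in book.findall("title"):       # outer query binds $t
            result = ET.Element("result")
            result.append(copy.deepcopy(title))
            for author in book.findall("author"):  # nested query binds $a
                result.append(copy.deepcopy(author))
            results.append(result)
    return results


root = ET.fromstring(BIB_XML)
grouped = group_authors_by_title(root, "Addison-Wesley")
summary = [
    (r.findtext("title"), [a.findtext("lastname") for a in r.findall("author")])
    for r in grouped
]
print(summary)
```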
XML Data Model
• XML graph
• Syntax for data
• Mapping of XML graphs into XML documents
• Element identities and ID references
• … (covered in previous presentations)
Advanced examples in XML-QL
• Tag variables
• Regular path expressions
• Transforming XML data
• Integrating data from multiple XML sources
• Function definitions and DTDs
• External functions
• Ordered model: sorting, indexing
Tag variables
All publications published in 1995 in which Date is either an author or an editor:
WHERE <$p>                         -- $p can be {article, book}
        <title> $t </>
        <year> 1995 </>
        <$e> <lastname> Date </> </>
      </> IN "bib.xml",
      $e IN {author, editor}
CONSTRUCT <$p>
            <title> $t </>
            <$e> Date </>
          </>
Query result
<book>
  <author>Date</author>
  <title>An Introduction to Database Systems</title>
</book>
<article>
  <author>Date</author>
  <title>The New Jersey Machine-Code Toolkit</title>
</article>
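Tag variables let the element name itself be matched and reused in the output. The sketch below imitates that in Python with ElementTree. The input document is invented for illustration (the slides do not show a bib.xml with <year> child elements), and publications_by is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Invented sample input: <year> is a child element here, as the query expects.
BIB_XML = """
<bib>
  <book>
    <title>An Introduction to Database Systems</title>
    <year>1995</year>
    <author><lastname>Date</lastname></author>
  </book>
  <article>
    <title>The New Jersey Machine-Code Toolkit</title>
    <year>1995</year>
    <author><lastname>Date</lastname></author>
  </article>
  <book>
    <title>Foundation for Object/Relational Databases: The Third Manifesto</title>
    <year>1998</year>
    <author><lastname>Date</lastname></author>
  </book>
  <article>
    <title>Some Other Article</title>
    <year>1995</year>
    <editor><lastname>Smith</lastname></editor>
  </article>
</bib>
"""


def publications_by(root, lastname, year, roles=("author", "editor")):
    """Hypothetical helper: $p is any publication tag, $e is any tag in roles."""
    results = []
    for pub in root:                       # $p ranges over every child tag
        if (pub.findtext("year") or "").strip() != year:
            continue
        title = pub.findtext("title")
        if title is None:
            continue
        for role in pub:                   # $e ranges over the child tags
            if role.tag not in roles:
                continue
            names = [(n.text or "").strip() for n in role.findall("lastname")]
            if lastname in names:
                out = ET.Element(pub.tag)  # reuse the matched tag names
                ET.SubElement(out, "title").text = title.strip()
                ET.SubElement(out, role.tag).text = lastname
                results.append(out)
    return results


root = ET.fromstring(BIB_XML)
found = publications_by(root, "Date", "1995")
shape = [(r.tag, r[0].text, r[1].tag, r[1].text) for r in found]
print(shape)
```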
payroll.xml
<Payroll>
  <person>
    <ssn>100-00-0001</ssn>
    <name>J. Doe</name>
    <salary>35000</salary>
  </person>
  <person>
    <ssn>100-00-0002</ssn>
    <name>M. Smith</name>
    <salary>73000</salary>
  </person>
  <person>
    <ssn>100-00-0003</ssn>
    <name>R. Johnson</name>
    <salary>1400000</salary>
  </person>
  <person>
    <ssn>100-00-0004</ssn>
    <name>P. Kent</name>
    <salary>33000</salary>
  </person>
</Payroll>
taxpayers.xml
<IRS>
  <taxpayer>
    <ssn>100-00-0001</ssn>
    <income>35000</income>
    <taxes>7000</taxes>
  </taxpayer>
  <taxpayer>
    <ssn>100-00-0002</ssn>
    <income>55000</income>
    <taxes>3000</taxes>
  </taxpayer>
  <taxpayer>
    <ssn>100-00-0003</ssn>
    <income>1430000</income>
    <taxes>25000</taxes>
  </taxpayer>
  <taxpayer>
    <ssn>100-00-0005</ssn>
    <income>120000</income>
    <taxes>30000</taxes>
  </taxpayer>
</IRS>
Integrating data from multiple XML sources
WHERE <person>
        <name></> ELEMENT_AS $n
        <ssn> $ssn </>
      </> IN "payroll.xml",        -- see payroll.xml
      <taxpayer>
        <ssn> $ssn </>
        <income></> ELEMENT_AS $i
      </> IN "taxpayers.xml"       -- see taxpayers.xml
CONSTRUCT <result> $n $i </>
Integration result
<result>
  <income>35000</income>
  <name>J. Doe</name>
</result>
<result>
  <income>55000</income>
  <name>M. Smith</name>
</result>
<result>
  <income>1430000</income>
  <name>R. Johnson</name>
</result>
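Using the same variable $ssn in both patterns makes the integration query a join on the social security number. The following Python sketch reproduces that join with a dictionary lookup. It is an approximation under assumptions: the two documents are inlined, join_on_ssn is a hypothetical name, and ElementTree replaces an XML-QL processor.

```python
import copy
import xml.etree.ElementTree as ET
from collections import defaultdict

PAYROLL_XML = """
<Payroll>
  <person><ssn>100-00-0001</ssn><name>J. Doe</name><salary>35000</salary></person>
  <person><ssn>100-00-0002</ssn><name>M. Smith</name><salary>73000</salary></person>
  <person><ssn>100-00-0003</ssn><name>R. Johnson</name><salary>1400000</salary></person>
  <person><ssn>100-00-0004</ssn><name>P. Kent</name><salary>33000</salary></person>
</Payroll>
"""

TAXPAYERS_XML = """
<IRS>
  <taxpayer><ssn>100-00-0001</ssn><income>35000</income><taxes>7000</taxes></taxpayer>
  <taxpayer><ssn>100-00-0002</ssn><income>55000</income><taxes>3000</taxes></taxpayer>
  <taxpayer><ssn>100-00-0003</ssn><income>1430000</income><taxes>25000</taxes></taxpayer>
  <taxpayer><ssn>100-00-0005</ssn><income>120000</income><taxes>30000</taxes></taxpayer>
</IRS>
"""


def join_on_ssn(payroll_root, irs_root):
    """Hypothetical helper: pair each <name> with the <income> sharing its ssn."""
    incomes_by_ssn = defaultdict(list)
    for taxpayer in irs_root.iter("taxpayer"):
        ssn = (taxpayer.findtext("ssn") or "").strip()
        if ssn:
            incomes_by_ssn[ssn].extend(taxpayer.findall("income"))

    results = []
    for person in payroll_root.iter("person"):
        ssn = (person.findtext("ssn") or "").strip()
        for name in person.findall("name"):              # $n, whole element
            for income in incomes_by_ssn.get(ssn, []):   # $i, whole element
                result = ET.Element("result")
                result.append(copy.deepcopy(name))
                result.append(copy.deepcopy(income))
                results.append(result)
    return results


joined = join_on_ssn(ET.fromstring(PAYROLL_XML), ET.fromstring(TAXPAYERS_XML))
rows = [(r.findtext("name"), r.findtext("income")) for r in joined]
print(rows)
```

P. Kent (no tax record) and ssn 100-00-0005 (no payroll record) drop out, as in an inner join.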
Function definitions and DTDs
function query() {
  CONSTRUCT <result>
              findDeclaredIncomes("taxpayers.xml", "payroll.xml")
            </result>
}
function findDeclaredIncomes($Taxpayers, $Employees) {
  WHERE <taxpayer> <ssn> $s </> <income> $x </> </> IN $Taxpayers,
        <employee> <ssn> $s </> <name> $n </> </> IN $Employees
  CONSTRUCT <result> <name> $n </> <income> $x </> </>
}
Function definitions and DTDs (cont.)
Restrictions by DTDs:
function findDeclaredIncomes($Taxpayers: "www.irs.gov/tp.dtd",
                             $Employees: "www.employees.org/employees.dtd")
                             : "www.my.site.com/myresult.dtd" {
  WHERE …
  CONSTRUCT …
}
Embedding queries in data
<result>
  <articles>
    WHERE <article> <title> $t </> <year> $y </> </> IN "www.a.b.c/bib.xml", $y > 1995
    CONSTRUCT <title> $t </>
  </>
  <books>
    WHERE <book> <title> $t </> <year> $y </> </> IN "www.a.b.c/bib.xml", $y > 1995
    CONSTRUCT <title> $t </>
  </>
</>
Indexes for element:
• XML-QL supports element-order variables.
• Example: <a[$i]> … </> <$x[$j]> … </>
• Here $i and $j are bound to integers 0, 1, 2, … that represent the index in the local order of the edges.
Indexes for element (graph)
root
  book[0] (year="1995")
    title[0]: An introduction …
    publisher[1]
      name[0]: Addison-Wesley
    author[2]
      lastname[0]: Date
  book[1] (year="1998")
    title[0]: Foundations for ...
    publisher[1]
      name[0]: Addison-Wesley
    author[2]
      lastname[0]: Date
    author[3]
      lastname[0]: Darwen
Indexes for element (cont.)
Example: retrieve all persons whose lastname precedes the firstname:
WHERE <person> $p </> IN "www.a.b.c/people.xml",
      <firstname[$k]> $x </> IN $p,
      <lastname[$j]> $y </> IN $p,
      $j < $k
CONSTRUCT <person> $p </>
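The index variables $j and $k correspond to the positions of child elements within their parent. A small Python sketch of the comparison follows. The people document is invented (people.xml is not shown in the slides), lastname_first is a hypothetical name, and each matching person is returned once even if several index pairs satisfy the condition.

```python
import xml.etree.ElementTree as ET

# Invented sample input standing in for people.xml.
PEOPLE_XML = """
<people>
  <person><lastname>Smith</lastname><firstname>John</firstname></person>
  <person><firstname>Mary</firstname><lastname>Jones</lastname></person>
</people>
"""


def lastname_first(root):
    """Hypothetical helper: persons where some <lastname> index is
    smaller than some <firstname> index."""
    matches = []
    for person in root.iter("person"):
        children = list(person)
        first_idx = [k for k, c in enumerate(children) if c.tag == "firstname"]
        last_idx = [j for j, c in enumerate(children) if c.tag == "lastname"]
        if any(j < k for j in last_idx for k in first_idx):   # $j < $k
            matches.append(person)
    return matches


root = ET.fromstring(PEOPLE_XML)
hits = [p.findtext("lastname") for p in lastname_first(root)]
print(hits)  # ['Smith']
```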
ORDER-BY
Reverse the order of all authors in a publication:
WHERE <pub> $p </> IN "www.a.b.c/people.xml"
CONSTRUCT <pub>
            WHERE <author[$k]> $a </> IN $p
            ORDER-BY $k DESCENDING
            CONSTRUCT <author> $a </>
            WHERE <$e> $v </> IN $p, $e != "author"
            CONSTRUCT <$e> $v </>
          </pub>
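The ORDER-BY query sorts the authors by their index in descending order and then copies every other element unchanged. A rough Python equivalent is sketched below; the <pub> sample is invented for illustration and reverse_authors is a hypothetical name.

```python
import copy
import xml.etree.ElementTree as ET

# Invented sample publication; the slides do not show the input document.
PUB_XML = (
    "<pub><title>T</title>"
    "<author>A</author><author>B</author><author>C</author>"
    "<year>2000</year></pub>"
)


def reverse_authors(pub):
    """Hypothetical helper: authors in descending index order, then the rest."""
    out = ET.Element("pub")
    indexed = list(enumerate(pub))                    # (index $k, child element)
    authors = [(k, c) for k, c in indexed if c.tag == "author"]
    # ORDER-BY $k DESCENDING
    for _, author in sorted(authors, key=lambda kc: kc[0], reverse=True):
        out.append(copy.deepcopy(author))
    # Second nested query: every element whose tag is not "author".
    for _, child in indexed:
        if child.tag != "author":
            out.append(copy.deepcopy(child))
    return out


reversed_pub = reverse_authors(ET.fromstring(PUB_XML))
layout = [(c.tag, c.text) for c in reversed_pub]
print(ET.tostring(reversed_pub, encoding="unicode"))
```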
Extensions and open issues:
• Entities
• User-defined predicates
• String regular expressions
• Namespaces
• Aggregates
• XML syntax
• Extensions to other XML-related standards
Summary/Conclusions
• XML-QL is a declarative language that supports querying, constructing, transforming, and integrating XML data
• XML-QL supports both ordered and unordered views of an XML document
• XML-QL is based on the semi-structured data model developed in database research
• XML-QL satisfies the absolute requirements for a query language listed in the W3C XML Query Requirements Working Draft
• XML-QL is a good candidate to become the standard XML query language
End. Questions?