
A Type System for a Semistructured and XML Data Base Management System

A Type System for a Semistructured and XML Data Base Management System. Ph.D. Thesis Proposal, Dario Colazzo. Thesis goals: formal development and study of a type system for XML querying; implementation of a concrete type system for an XML data base management system, the Xtasy system.





Presentation Transcript


  1. A Type System for a Semistructured and XML Data Base Management System • Ph.D. Thesis Proposal • Dario Colazzo

  2. Thesis Goals • Formal development and study of a type system for XML querying • Implementation of a concrete type system for an XML data base management system: the Xtasy system

  3. Presentation outline • Semistructured data and XML • Data models • Type languages: DTD, XML Schema • Querying XML data: Tequyla • Processing XML data: XDuce • Thesis goals

  4. Semistructured data • Irregular and unstable structure • Self-describing representation • No separate schema information, hence few guarantees of reliability and efficiency for applications

  5. OEM graph [Figure: OEM graph of the address book. Root addrbook with two person nodes; edge labels name, age, addr, email, first, second; leaf values “Dario Colazzo”, 30, “Pisa”, “sartia@xyz.com”, “Carlo”, “Sartiani”.]

  6. XML syntax <addrbook> <person> <name>Dario Colazzo</name> <addr>Pisa</addr> </person> <person> <name> <first>Carlo </first> <second>Sartiani</second> </name> <addr>Pisa</addr> <email>sartia@xyz.com</email> </person> </addrbook>

  7. Attributes and element reference <db> <state id="01"> <name>Italy</name> <code>IT</code> </state> ....... <city region="Toscana" state-of="01"> <name>Italy</name> <code>PI</code> </city> </db>

  8. XML Query Data Model • Based on a forest of node-labeled trees (a set of documents) • Several kinds of nodes: • element node • attribute node • value node • Identifier and reference attributes are modeled as general attributes

  9. XML Tree [Figure: tree for the address book, with a legend distinguishing element nodes, attribute nodes and value nodes. Root addrbook with two person nodes; node labels name, age, addr, email, first, second; values “Dario Colazzo”, “Pisa”, 30, “sartia@xyz.com”, “Carlo”, “Sartiani”.]

  10. XML schema languages • Document Type Definitions (DTDs): schemas as grammars for documents, built from regular type expressions • XML Schemas: closer to traditional type languages

  11. DTD • Regular type expressions: • T | U union • T,U sequence • T* zero or more • T? zero or one • X=T[X] recursive definitions • coupled-tag element declarations • global definitions • only one base type: string (PCDATA) • no type reuse

  12. DTD, example <!DOCTYPE addrbook [ <!ELEMENT addrbook (person*)> <!ELEMENT person (name, addr, tel?)> <!ELEMENT name (#PCDATA)> <!ELEMENT addr (#PCDATA)> <!ELEMENT tel (#PCDATA)> ]> • person* means zero or more person elements • tel? means zero or one tel element
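
Since DTD content models are regular expressions over child element names, a toy conformance check can be sketched by translating each model into a Python regex over the sequence of child tags. This is only an illustration: `CONTENT_MODELS` and `conforms` are hypothetical names, and the hand translation stands in for a real DTD validator.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical hand translation of the DTD content models into regexes
# over a string of child tags, each tag followed by one space.
# None marks a #PCDATA-only element (no child elements allowed).
CONTENT_MODELS = {
    "addrbook": r"(person )*",
    "person": r"name addr (tel )?",
    "name": None,
    "addr": None,
    "tel": None,
}

def conforms(elem):
    """Return True if elem and all its descendants match their content model."""
    if elem.tag not in CONTENT_MODELS:
        return False
    model = CONTENT_MODELS[elem.tag]
    children = list(elem)
    if model is None:
        return not children
    tags = "".join(child.tag + " " for child in children)
    if re.fullmatch(model, tags) is None:
        return False
    return all(conforms(child) for child in children)

good = ET.fromstring(
    "<addrbook><person><name>Dario Colazzo</name><addr>Pisa</addr></person></addrbook>"
)
# Same content, but addr comes before name, which the sequence model rejects.
bad = ET.fromstring(
    "<addrbook><person><addr>Pisa</addr><name>Dario Colazzo</name></person></addrbook>"
)
```

Here `conforms(good)` accepts the first document, while `conforms(bad)` rejects the second because the children of person appear in the wrong order.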

  13. XML Schema • decoupled-tag: elements and types may be defined separately • local definitions • base types: integers, string, decimal,... • type reuse: • type refinement • type extension with subtyping

  14. XML Schema, example <xsd:complexType name="person"> <xsd:sequence> <xsd:element name="name" type="xsd:string" /> <xsd:element name="age" type="xsd:ageType"/> </xsd:sequence> </xsd:complexType> <xsd:complexType name="newPerson" base="typeOfPerson" derivedBy="extension"> <xsd:element name="car" type="xsd:string" /> </xsd:complexType>

  15. Querying XML data • XML querying is based on the use of patterns to select portions of a document • Untyped query languages: • XQL • XML-QL • Quilt • Typed: • Tequyla • XDuce (functional language) • Forthcoming W3C query language: probably Quilt

  16. Tequyla • SQL-like query language • free nesting of queries • typed: • query correctness • query typing • Currently: only non-algorithmic definitions, and weak subtyping

  17. Tequyla queries • The body of a Tequyla query is a from clause composed of XPath patterns • x=addressbook.xml; • binds x to the root element of addressbook.xml • y in x//person/addr • starting from the root (x), searches for a person element at an arbitrary depth (//), then for an addr sub-element (/), and finally binds the node found to y
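
The two bindings described above can be imitated with Python's standard `xml.etree.ElementTree`, whose limited XPath support covers `//` and `/`. This is only a sketch of the binding behaviour, not Tequyla itself; the inline `ADDRESSBOOK` string is an assumption standing in for the file addressbook.xml.

```python
import xml.etree.ElementTree as ET

# Stand-in for addressbook.xml (content taken from the XML syntax slide).
ADDRESSBOOK = """
<addrbook>
  <person><name>Dario Colazzo</name><addr>Pisa</addr></person>
  <person>
    <name><first>Carlo</first><second>Sartiani</second></name>
    <addr>Pisa</addr>
    <email>sartia@xyz.com</email>
  </person>
</addrbook>
"""

# x = addressbook.xml : bind x to the root element
x = ET.fromstring(ADDRESSBOOK)

# y in x//person/addr : every addr child of a person found at any depth below x
ys = x.findall(".//person/addr")
addr_texts = [y.text for y in ys]
```

Each element of `ys` plays the role of one binding of y; with the sample document both bindings carry the text "Pisa".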

  18. A Tequyla query Q = from x=addressbook.xml; y in x//person/addr; z in x//person/name; where y="Pisa" select nome[z] (the patterns in the from clause are XPath expressions)

  19. XDuce • Typed functional language • Regular expression types • Type-based pattern language

  20. XDuce schema • A schema is a set of type definitions E= { Addressbook = addrbook [(Name, Addr, Tel?) *] Name = name [String] Addr = addr[String] Tel = tel[String] }

  21. An XDuce function: telephone list • Consider T = (Name, Addr, Tel?) in fun mkTelList : T* --> (Name,Tel)* = name[n], addr[a], tel[t], rest:T* --> name[n],tel[t], mkTelList(rest) | name[n], addr[a], rest: T* --> mkTelList(rest) | () --> ()
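
The three pattern-matching cases of mkTelList can be mirrored in Python as a rough analogue. The encoding of a sequence as a list of `(tag, text)` pairs is an assumption of this sketch, and the function expects well-formed input of type T*; it does not reproduce XDuce's static typing.

```python
def mk_tel_list(seq):
    """Keep name and tel of every entry that has a tel; drop the other entries."""
    if not seq:                          # () --> ()
        return []
    (_, n), rest = seq[0], seq[2:]       # name[n], addr[a], then the remainder
    if rest and rest[0][0] == "tel":     # name[n], addr[a], tel[t], rest
        return [("name", n), rest[0]] + mk_tel_list(rest[1:])
    return mk_tel_list(rest)             # name[n], addr[a], rest

entries = [
    ("name", "Dario Colazzo"), ("addr", "Pisa"), ("tel", "123"),
    ("name", "Carlo Sartiani"), ("addr", "Pisa"),
]
tel_list = mk_tel_list(entries)
```

With the sample `entries`, only the first person has a tel element, so `tel_list` holds that person's name and tel and nothing else.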

  22. XDuce subtyping: language inclusion • XDuce provides a simple but rather powerful notion of subtyping based on inclusion between sets of values • Examples • Name, Addr <: Name, Addr,Tel? • Name, Addr,Tel <: Name, Addr,Tel? • XML Schema extension subtyping is not captured
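
Subtyping as set inclusion can be illustrated with a tiny enumerator for star-free types. This is a sketch under strong assumptions: types are encoded as nested tuples (a hypothetical encoding), element content is ignored, and repetition (*) is left out so that every language is finite. Deciding inclusion for full regular expression types needs automata-based techniques instead.

```python
from itertools import product

# Hypothetical encoding of star-free types as nested tuples:
#   ("elem", tag) | ("seq", T, U, ...) | ("alt", T, U, ...) | ("opt", T)
def language(t):
    """Set of tag sequences (as tuples) denoted by a star-free type."""
    kind = t[0]
    if kind == "elem":
        return {(t[1],)}
    if kind == "opt":
        return {()} | language(t[1])
    if kind == "alt":
        return set().union(*(language(u) for u in t[1:]))
    if kind == "seq":
        parts = [language(u) for u in t[1:]]
        # concatenate one sequence from each component, in order
        return {sum(combo, ()) for combo in product(*parts)}
    raise ValueError(kind)

def is_subtype(t, u):
    """t <: u iff every value of t is also a value of u."""
    return language(t) <= language(u)

name, addr, tel = ("elem", "name"), ("elem", "addr"), ("elem", "tel")
with_optional_tel = ("seq", name, addr, ("opt", tel))
```

Both examples from the slide hold under this reading: `("seq", name, addr)` and `("seq", name, addr, tel)` are subtypes of `with_optional_tel`, while the converse inclusions fail.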

  23. Xtasy type system

  24. Type language • As expressive as DTD and XML Schema • Base types • Attributes and id/idref types • Type refining and extension • Local type definitions • Unordered sequence types

  25. Schema extraction and schema inference • For untyped data, a schema will be inferred according to the XML Schema style • For typed XML data, the schema will be converted into the internal schema representation • Type inference for query results

  26. Data conformity • An algorithm will be defined to check data conformity to a schema • The problem is EXPTIME-complete • Optimization techniques exist • Further ones have to be found to deal with unordered sequence types and id/idref types

  27. Query correctness • Only type correct queries will be executed • Type correctness is based on successful matching between the query structural requirements and the type of the data to be queried

  28. Correct queries, an example (1/2) Consider E= { Addressbook = addrbook [Person*] Person = (Name, Addr, Tel?) Name = name [String] Addr = addr[String] Tel = tel[String] }

  29. Correct queries, an example (2/2) • A correct query: Q = from x=addressbook.xml; y in x//person/addr; z in x//person/name; where y="Pisa" select nome[z]

  30. Correctness & union types • Consider: Q’ = from x=addressbook.xml; y in x//person/addr; z in x//person/tel; where y="Pisa" select results[z] • Should we consider this query correct?

  31. Correctness & union types: existential approach • The previous query is considered correct • The user will be warned about optional elements required by patterns

  32. Total approach • The previous query is considered incorrect • This discipline is too severe • Many queries with non-empty results would be rejected
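
The difference between the existential approach (slide 31) and the total approach (slide 32) can be made concrete with a toy check. This is one possible reading, not the thesis's definition: `PERSON_SHAPES` lists by hand the child-tag sequences allowed by Person = (Name, Addr, Tel?), and both function names are hypothetical.

```python
# Child-tag sequences a person may have under Person = (Name, Addr, Tel?)
PERSON_SHAPES = [("name", "addr"), ("name", "addr", "tel")]

def existentially_correct(step, shapes):
    """The path step can match in at least one value of the type."""
    return any(step in shape for shape in shapes)

def totally_correct(step, shapes):
    """The path step matches in every value of the type."""
    return all(step in shape for shape in shapes)

tel_existential = existentially_correct("tel", PERSON_SHAPES)
tel_total = totally_correct("tel", PERSON_SHAPES)
```

Under this reading the step person/tel of Q’ passes the existential check but fails the total one, because tel is optional; a step on a tag the type never allows, such as email, fails both.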

  33. Type equivalences • Several type equivalence laws will be considered • In particular: • (T | U) , S = (T , S) | (U , S) • Useful to simplify schema definitions
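
The distributivity law, read as (T | U), S = (T, S) | (U, S), can be justified by viewing each type as the set of value sequences it denotes, the same set-based view slide 22 uses for subtyping. The semantic brackets are an assumption of this sketch rather than notation taken from the slides.

```latex
\begin{align*}
[\![ (T \mid U),\, S ]\!]
  &= \{\, v\,w \mid v \in [\![T]\!] \cup [\![U]\!],\ w \in [\![S]\!] \,\} \\
  &= \{\, v\,w \mid v \in [\![T]\!],\ w \in [\![S]\!] \,\}
     \;\cup\; \{\, v\,w \mid v \in [\![U]\!],\ w \in [\![S]\!] \,\} \\
  &= [\![ (T,\, S) \mid (U,\, S) ]\!]
\end{align*}
```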

  34. Subtyping • A subtype relation E <: E’ will be defined such that: • If a query Q is correct wrt E’ then it is also correct wrt E • Type extension will be supported: if E is an extension of E’ then E <: E’

  35. Parametric polymorphism (1/3) • Used in some functional languages (e.g. ML and Haskell) to define generic functions, for example: function Sort (t: Type; L: List t; Ord: t × t → Bool): List t begin ..... end. • It will allow us to define generic queries
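
The generic Sort signature above can be approximated with Python type hints. In this sketch the type parameter t becomes a `TypeVar`, the body is an insertion sort chosen only for brevity, and `ord_(a, b)` is assumed to return True when a may come before b.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")  # plays the role of the type parameter t

def sort(items: List[T], ord_: Callable[[T, T], bool]) -> List[T]:
    """Insertion sort that works for any element type T."""
    result: List[T] = []
    for item in items:
        pos = len(result)
        # move left past every element that may not come before item
        while pos > 0 and not ord_(result[pos - 1], item):
            pos -= 1
        result.insert(pos, item)
    return result

numbers = sort([3, 1, 2], lambda a, b: a <= b)
words = sort(["pisa", "carlo", "dario"], lambda a, b: a <= b)
```

The same function body serves integers and strings alike, which is the property the slide wants to carry over to generic queries.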

  36. Parametric polymorphism (2/3) • Parametric types fit well in the description of irregular data structures • For example E(t)= {Addressbook = addrbook [(Name, Addr, Tel?) *] Name = name [String] Addr = addr[t] Tel = tel[String]} • The content of addr elements can then have, for example, a street and a city sub-element

  37. Parametric polymorphism (3/3) • A generic query: Q = Λ t: Type; λ a: E(t). from x=a; y in x//person/addr; z in x//person/name; where z="dario" select indirizzo[y] • More precise typing: the type Any* is different from t*

  38. Conclusions • The type system will provide: • union types • reference types • recursive types • subtyping • parametric polymorphism

  39. Progress

  40. Presentation outline • Proposal • What has been done • Ongoing and future work

  41. Thesis Goals • Formal development and study of a type system for XML querying • The query language is an abstract version of XQuery (W3C) • The type language is expressive enough to capture the essence of current standards

  42. XQuery type system • Only result analysis: the XQuery type system is defined to determine and check, at query-analysis time, the output type of a query on documents conforming to an expected input type. • Query correctness is not defined and checked (only some ideas).

  43. What has been done • We have: • formally defined the notion of query type correctness • defined a type system to statically check it and to perform result analysis; the rules define a terminating algorithm • introduced an approach to recursive types that is an alternative to the XQuery one

  44. Observations • Our type system also performs query analysis and, in this respect, differs from the XQuery approach • So far, we have considered a type system featuring product, union and recursive types • We have found that these type mechanisms are sufficient to make the study interesting and (as we will see) rather subtle.

  45. Observations • We have discovered that for particular queries (fortunately not frequent ones) the type system is not able to exactly capture the semantic characterization of correctness • We have introduced a further notion of correctness, path covering, and provided rules to check this property

  46. Papers • A first definition of the type system can be found in A Typed Text Retrieval Query Language for XML Documents, Journal of the American Society for Information Science and Technology (JASIS), Special Issue 2001 • In Types for Correctness of Queries over Semistructured Data, the system has been improved by a finer notion of query correctness and by the notion of path covering. The work will be submitted to the WebDB 2002 workshop

  47. Tequyla (or µXQuery) • SQL-like query language • free nesting of queries • typed: • type conformance of data • query correctness • query typing (result analysis)

  48. Tequyla queries • The body of a Tequyla query is a from clause composed of XPath patterns • x=addressbook.xml; • binds x to the root element of addressbook.xml • y in x//person/addr • starting from the root (x), searches for a person element at an arbitrary depth (//), then for an addr sub-element (/), and finally binds the node found to y

  49. Types • T, U ::= () (empty sequence) | B (atomic type: char, int, …) | T + U (union) | T; U (sequence) | l[T] (element type) | X (type name) • Type environments: type definitions plus type bindings for the free variables of a query • E ::= () | X=T, E | x:X, E
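
The type grammar and the two kinds of environment entries can be written down as plain data. The sketch below is one possible encoding: all class names, the `Entry` definition and the `resolve` helper are hypothetical and serve only to show the shape of the syntax.

```python
from dataclasses import dataclass
from typing import Dict, Union

@dataclass(frozen=True)
class Empty:
    """()  empty sequence"""

@dataclass(frozen=True)
class Base:
    """B  atomic type (char, int, ...)"""
    name: str

@dataclass(frozen=True)
class Alt:
    """T + U  union"""
    left: "Type"
    right: "Type"

@dataclass(frozen=True)
class Seq:
    """T; U  sequence"""
    left: "Type"
    right: "Type"

@dataclass(frozen=True)
class Elem:
    """l[T]  element type"""
    label: str
    content: "Type"

@dataclass(frozen=True)
class TypeName:
    """X  type name"""
    ident: str

Type = Union[Empty, Base, Alt, Seq, Elem, TypeName]

# Definitions part of an environment (X = T); "Entry" is a made-up example.
definitions: Dict[str, Type] = {
    "Name": Elem("name", Base("String")),
    "Addr": Elem("addr", Base("String")),
    "Entry": Seq(TypeName("Name"), TypeName("Addr")),
}
# Bindings part of an environment (x : X) for the free variables of a query.
bindings: Dict[str, str] = {"x": "Entry"}

def resolve(var: str) -> Type:
    """Look up the type definition bound to a query variable."""
    return definitions[bindings[var]]
```

Keeping definitions and variable bindings in separate dictionaries is a design choice of this sketch; the grammar on the slide interleaves them in a single list E.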

  50. A type environment • E= Addressbook = addrbook[Person*], Person = person[Name, Addr, (Tel + EMail)], Name = name[String], Addr = addr[String], Tel = tel[String], EMail = email[String], x: Addressbook
