XML-λ: an extendible framework for manipulating XML data
Jaroslav Pokorny
Charles University Praha
XML-KSI, 2004

Two approaches to XML: logical or physical
Idea: XML as a database
• DB of XML documents
• „mix“ of (relational) DB and XML data
• XML views (over non-XML and/or XML data)
Advantages:
• independence of the original platforms and models of the processed data
• more flexibility in design and manipulation (integration, updates, querying)

Two approaches to XML
• implications
  • implementations: XML DBs (native, via relational, OO, OR)
  • special demands on query languages
    • how to make them powerful
    • how to describe their semantics
    • how to implement them
  • new types of software: wrappers, mediators
• (personal) goal: to develop a powerful formal approach appropriate for manipulating both XML and non-XML data

Outline
• XML in brief
• XML – functional data model
  • functional typing of XML (and non-XML) data
  • LT language
  • XML-schema, XML-database
• XML-λ framework
• Conclusions

XML – an example
<!DOCTYPE biblio [
<!ELEMENT biblio (book | monograph)*>
<!ELEMENT book (title, author*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT monograph (title, author, editor)>
<!ATTLIST monograph year CDATA #REQUIRED>
<!ELEMENT editor (monograph*)>
<!ELEMENT author (name, address?)>
<!ELEMENT name (firstname?, surname)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT surname (#PCDATA)>
<!ELEMENT address (locality, ZIP)>
<!ELEMENT locality (#PCDATA)>
<!ELEMENT ZIP (#PCDATA)>
]>

XML – an example
<book>
  <title>Fundamentals of DBS</title>
  <author>
    <name>
      <firstname>Ramez</firstname>
      <surname>Elmasri</surname>
    </name>
    <address>
      <locality>Arlington</locality>
      <ZIP>76019</ZIP>
    </address>
  </author>
  <author>
    <name>
      <firstname>Shamkant</firstname>
      <surname>Navathe</surname>
    </name>
  </author>
</book>

XML model
• Usually: tree- or graph-oriented
• Here: inspired by the functional approach to conceptual modelling, for example the HIT data model from the 1980s.
(Figure: diagram with nodes DEPARTMENT, MEMBER*, PROJECT*)

Synopsis of the approach
• Typing XML data
  Background:
  • a functional type system (a base of primitive types + functions, tuples, and unions)
  Extensions to:
  • typing XML regular expressions,
  • typing XML elements.
• Querying XML elements
  • a general typed λ-calculus (functional variables and constants, tuples, applications of functions, λ-abstractions)
  • XML-database schema as a set of variables of types,
  • XML-database as any valuation of these variables
• XML-λ: a syntactic variant of the typed λ-calculus over XML data

Typing XML data - informally
E … a set of abstract elements.
The content of an abstract element will be either a string from PCDATA (in the simplest case), or a sequence of abstract subelements (or groups), or empty.
Ex.: <phone>781 7090</phone> is an instance of a phone element object.
The phone element object will be conceived as a (partial) function from E into PCDATA.
For an e ∈ E, phone(e) returns e.g. the phone number ‘781 7090’.

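The idea of an element object as a partial function can be sketched in Python. This is only an illustration under stated assumptions: abstract elements are modeled as opaque integer ids, a partial function is modeled as a dictionary, and the helper name `apply_partial` is hypothetical, not part of the framework.

```python
from typing import Dict, Optional

# Assumption: abstract elements from E are modeled as opaque integer ids.
AbstractElement = int

# The phone element object: a partial function from E into PCDATA,
# represented as a dict (ids missing from the dict are "undefined").
phone: Dict[AbstractElement, str] = {
    1: "781 7090",  # the element <phone>781 7090</phone>
}


def apply_partial(f: Dict[AbstractElement, str], e: AbstractElement) -> Optional[str]:
    """Apply a partial function; None stands for 'undefined at e'."""
    return f.get(e)


print(apply_partial(phone, 1))  # defined: the PCDATA content of element 1
print(apply_partial(phone, 2))  # undefined: element 2 is not a phone element
```

A dictionary is used because it makes partiality explicit: the function is defined exactly on the keys it contains.
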
Typing XML data - informally
Ex.: <!ELEMENT name (firstname?, surname)> is conceived as a set of functions from E into E × E.
The current name element object, i.e. the one stored in a given XML database, is a function assigning to each abstract element e ∈ E at most one couple of abstract elements.
Hierarchy of notions: element type, element object, element

Functional typing
B … a set of symbols (the base)
T ::= S              primitive type
    | (T1 → T2)      functional type
    | (T1,...,Tn)    tuple type
    | (T1 + T2)      union type
where S ∈ B
Remark: relations are ((T1,...,Tn) → BOOL)-objects!

Functional typing
Interpretation:
• members of B … mutually disjoint non-empty sets,
• (T1 → T2) … the set of all (total or partial) functions from T1 into T2,
• (T1,...,Tn) … T1 × ... × Tn,
• (T1+…+Tn) … ∪ Ti
Exs:
• arithmetic operations: +, -, *, / are ((NUMBER, NUMBER) → NUMBER)-objects.
• logic:
  • and/((BOOL, BOOL) → BOOL),
  • the universal R-quantifier ∀_R and the existential R-quantifier ∃_R are ((R → BOOL) → BOOL)-objects,
  • the R-identity =_R is an ((R,R) → BOOL)-object.
• aggregation functions: COUNT_R/((R → BOOL) → NUMBER)

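The typing of quantifiers and aggregation functions as higher-order objects can be illustrated with a short Python sketch. It assumes a small finite carrier set standing in for the type R (so that quantification is computable); the names `forall`, `exists`, `count` and `is_even` are illustrative choices.

```python
from typing import Callable, Iterable

# Assumption: a small finite carrier set stands in for the type R.
R = range(10)


def is_even(x: int) -> bool:
    """An (R -> BOOL)-object: the characteristic function of the even numbers."""
    return x % 2 == 0


def forall(pred: Callable[[int], bool], domain: Iterable[int]) -> bool:
    """Universal R-quantifier, an ((R -> BOOL) -> BOOL)-object."""
    return all(pred(x) for x in domain)


def exists(pred: Callable[[int], bool], domain: Iterable[int]) -> bool:
    """Existential R-quantifier, an ((R -> BOOL) -> BOOL)-object."""
    return any(pred(x) for x in domain)


def count(pred: Callable[[int], bool], domain: Iterable[int]) -> int:
    """COUNT_R, an ((R -> BOOL) -> NUMBER)-object."""
    return sum(1 for x in domain if pred(x))


print(forall(is_even, R), exists(is_even, R), count(is_even, R))
```

Passing the domain explicitly is a concession to executability; in the type system the domain is fixed by the subscript R.
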
Typing XML regular expressions
Let B = {PCDATA, BOOL, NAME}. The type system T_reg over B is recursively defined as follows.
T ::= tag:PCDATA | tag:    elementary regular expression, where tag ∈ NAME
    | T*                   zero or more
    | T+                   one or more
    | T?                   zero or one
    | (T1 | T2)            alternative
where, in T*, T+ and T?, T is an alternative or elementary regular expression.

Typing XML regular expressions
Interpretation:
(T1 | T2) … a set of objects of the union type (T1 + T2).
T* … (T → BOOL) /partially ordered model/
T* … ((T, NUMBER) → BOOL) /ordered model/
Ex.: Consider a function f of the latter type. For a couple (t, i), f(t, i) = TRUE iff t is the i-th object in an (ordered) set of T-objects.

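The ordered model of T* can be sketched as a characteristic function over (object, position) pairs. The sketch below assumes 1-based positions and uses strings as stand-ins for T-objects; `ordered_model` is a hypothetical helper name.

```python
from typing import Callable, List


def ordered_model(items: List[str]) -> Callable[[str, int], bool]:
    """Build an ((T, NUMBER) -> BOOL)-object from an ordered collection.

    The returned f satisfies f(t, i) == True iff t is the i-th object
    of items (positions are 1-based by assumption).
    """
    def f(t: str, i: int) -> bool:
        return 1 <= i <= len(items) and items[i - 1] == t

    return f


# Hypothetical data: the two authors of the example book, in document order.
authors = ordered_model(["Elmasri", "Navathe"])

print(authors("Elmasri", 1))  # first object
print(authors("Navathe", 1))  # not the first object
print(authors("Navathe", 2))  # second object
```
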
Typing XML elements and attributes
Let T_reg be a type system over B, and let E be the set of abstract elements. The type system T_E induced by T_reg (or simply T_E if T_reg is understood) contains the regular element expressions given by the following rules:
E ::= TAG:T | TAG:        elementary element types, where tag:T and tag: are elementary regular expressions over B
    | E* | E+ | E? | (E1 | E2)
    | TAG:(E1,...,En)     where tag ∈ NAME
Elementary element types and regular element expressions TAG:(E1,...,En) are called element types.

Typing XML elements and attributes
Semantics of element types:
TAG:PCDATA … the set of all (partial) functions from E to tag:PCDATA
… etc.
Attributes are also functions.
Ex.: year (of monograph) is a function assigning to each monograph its year (of issue).
Notation: E_MONOGRAPH → CDATA

Example: BIBLIO element types
TITLE:PCDATA
FIRSTNAME:PCDATA
SURNAME:PCDATA
LOCALITY:PCDATA
ZIP:PCDATA
ADDRESS:(LOCALITY, ZIP)
BOOK:(TITLE, AUTHOR*)
NAME:(FIRSTNAME, SURNAME)
MONOGRAPH:(TITLE, AUTHOR, EDITOR)
YEAR/(MONOGRAPH → CDATA)
EDITOR:MONOGRAPH*
AUTHOR:(NAME, ADDRESS?)
BIBLIO:(BOOK | MONOGRAPH)*

LT language (Language of Terms)
Func ... constants, each of a fixed type; variables for each type from T.
Let the types T, T1, ..., Tn (n ≥ 1) be members of T.
Typed constants and variables are terms.
M(M1,...,Mn)      application
λx1,...,xn(M)     λ-abstraction, where x1,...,xn are distinct variables
(M1,...,Mn)       tuple
Mi                projection, for a tuple term (M1,...,Mn)
K:M               tagged term, where K/NAME. If M/T, then K:M/(E → T).

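As a rough orientation, the first four term constructors have direct counterparts in Python. The sketch below is an informal correspondence, not the LT semantics: types are not checked, tagged terms are omitted, and the names `plus`, `double_sum` and `proj` (with 1-based projection) are assumptions made for the example.

```python
# A constant of type ((NUMBER, NUMBER) -> NUMBER).
plus = lambda x, y: x + y

# Application M(M1,...,Mn): an ordinary function call.
applied = plus(2, 3)

# λ-abstraction λx1,...,xn(M): a Python lambda over distinct variables.
double_sum = lambda x, y: plus(plus(x, y), plus(x, y))

# Tuple (M1,...,Mn): a Python tuple of evaluated terms.
pair = (applied, "Smith")


def proj(t: tuple, i: int):
    """Projection onto the i-th component of a tuple (1-based by assumption)."""
    return t[i - 1]


print(applied, double_sum(1, 2), proj(pair, 2))
```
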
Schema and DB
• An XML-database schema, S_XML, is a set of variables of types from T_E.
• Given a database schema S_XML, an XML-database is any valuation of these variables.
Ex.: SURNAME, AUTHOR

XML-λ framework
What is it? The XML-λ framework is a subset of LT + syntactic sugar.
Features:
• queries are expressed by terms
  Ex.: AUTHOR (1)
  (RESULT: AUTHOR … is more „XML-like“)
  Typically: λ.. ( λ.. …(expression)…), where expression/BOOL
  λx (AUTHOR(x)) does the same as (1)
• paths as compositions of functions
  Ex.: SURNAME(NAME(AUTHOR(m))), where m is a monograph abstract element object
  Notation: m.AUTHOR.NAME.SURNAME

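A path such as m.AUTHOR.NAME.SURNAME can be read as left-to-right composition of partial functions. The following Python sketch assumes a tiny hypothetical valuation in which abstract elements are integer ids and each schema variable simply maps a parent element to one child element or to its content; this is a simplification of the typing above, and `path` is an illustrative helper.

```python
from typing import Dict, Optional

# Hypothetical valuation of three schema variables (a tiny "XML-database").
AUTHOR: Dict[int, int] = {1: 2}             # monograph 1 -> its author element 2
NAME: Dict[int, int] = {2: 3}               # author 2 -> its name element 3
SURNAME: Dict[int, str] = {3: "Elmasri"}    # name 3 -> PCDATA content


def path(start: int, *steps: dict) -> Optional[object]:
    """Evaluate start.f1.f2...fn, i.e. fn(...f2(f1(start))...).

    Returns None as soon as one of the partial functions is undefined.
    """
    current = start
    for f in steps:
        if current is None:
            return None
        current = f.get(current)
    return current


m = 1  # a monograph abstract element
print(path(m, AUTHOR, NAME, SURNAME))   # m.AUTHOR.NAME.SURNAME
print(path(99, AUTHOR, NAME, SURNAME))  # undefined for an unknown element
```
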
XML-λ framework
• applications of logic, arithmetic, … functions
  ∃e (b.AUTHOR(e) and e.NAME.SURNAME = ‘Smith’), where b is a book abstract element object
  ∃b ∃e (b.AUTHOR(e) and e.NAME.SURNAME = ‘Smith’) is a YES/NO query.

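The two formulas can be evaluated over a small hypothetical database as follows. The sketch rests on several assumptions: the binders are read as existential quantifiers, b.AUTHOR(e) is modeled as membership in a set of (book, author) pairs, abstract elements are integer ids, and all data and function names are invented for the example.

```python
from typing import Dict, Set, Tuple

# Hypothetical database; abstract elements are integer ids.
BOOKS: Set[int] = {10, 20}
AUTHOR: Set[Tuple[int, int]] = {(10, 11), (10, 12), (20, 21)}  # b.AUTHOR(e)
NAME: Dict[int, int] = {11: 31, 12: 32, 21: 33}                # author -> name
SURNAME: Dict[int, str] = {31: "Elmasri", 32: "Navathe", 33: "Smith"}
ELEMENTS: Set[int] = {10, 11, 12, 20, 21, 31, 32, 33}


def has_author_named(b: int, surname: str) -> bool:
    """For a fixed book b: is there an e with b.AUTHOR(e) and e.NAME.SURNAME = surname?"""
    return any(
        (b, e) in AUTHOR and SURNAME.get(NAME.get(e)) == surname
        for e in ELEMENTS
    )


def some_book_has_author_named(surname: str) -> bool:
    """The closed YES/NO query: quantify over books as well."""
    return any(has_author_named(b, surname) for b in BOOKS)


print(has_author_named(10, "Smith"))        # book 10 has no author Smith
print(some_book_has_author_named("Smith"))  # book 20 does
```
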
XML-λ framework
• restructuring
  λ name:x.NAME (λ title:y (.BOOK.(AUTHOR(x) and TITLE = y)))
  λ title:y (λ name:x.NAME (.BOOK.(AUTHOR(x) and TITLE = y)))
  Notation: tagged variables; content of abstract elements denoted by y, x
• aggregations + nesting
  D. For each book, find the number of its authors.
  λ x, n (.BOOK..(TITLE = x and COUNT(AUTHOR) = n))
  Notation: dots .. for omitting parts of paths and prefixes
• possibility to embed any user-defined function

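Query D (for each book, the number of its authors) can be sketched in Python over a hypothetical valuation. As before, abstract elements are integer ids and AUTHOR is modeled as a set of (book, author) pairs; the second title and the function name `authors_per_book` are invented for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical database; abstract elements are integer ids.
TITLE: Dict[int, str] = {10: "Fundamentals of DBS", 20: "Hypothetical Title"}
AUTHOR: Set[Tuple[int, int]] = {(10, 11), (10, 12), (20, 21)}


def authors_per_book() -> Dict[str, int]:
    """Return {title x: count n} such that a book has TITLE = x and COUNT(AUTHOR) = n."""
    result: Dict[str, int] = {}
    for book, title in TITLE.items():
        # COUNT applied to the set of authors of this particular book.
        result[title] = sum(1 for (b, _author) in AUTHOR if b == book)
    return result


print(authors_per_book())
```

Keying the result by title mirrors the (x, n) pairs of the query; it would conflate books that share a title, which a fuller model would avoid by keeping the book element in the result.
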
XML-λ framework
D (XQuery):
for $x in doc("biblio1.xml")//book
let $n := count($x/author)
return
  <book>
    <name>{ $x/title/text() }</name>
    <numb_of_auth>{ $n }</numb_of_auth>
  </book>

Integration of heterogeneous information sources
(Figure: user, query, answer; typed objects over relational schemes, DTDs, ADTs, classes in OO)

Conclusions
Issues:
• finding appropriate restrictions of XML-λ for querying
• implementation is in progress
The forthcoming paper:
• cleaning the model (ordered and unordered),
• formal semantics of types,
• extensions to tagged variables
Future:
• XML-λ with tag variables
• semantics of XQuery in the XML-λ framework