
Introduction to XML Path Language (XPath20)

Introduction to XML Path Language (XPath20). Cheng-Chia Chen. What is XPath? Latest version: 2.0: http://www.w3.org/TR/xpath20 (with XQuery/XPath Data Model (XDM), XQuery/XPath Formal Semantics, and XQuery 1.0 and XPath 2.0 Functions and Operators); 1.0: http://www.w3.org/TR/xpath


Presentation Transcript


  1. Introduction to XML Path Language (XPath20) Cheng-Chia Chen

  2. What is XPath ? • Latest version: • 2.0 : • http://www.w3.org/TR/xpath20 • XQuery/XPath Data Model (XDM) • XQuery/XPath Formal Semantics • XQuery 1.0 and XPath 2.0 Functions and Operators • 1.0 : http://www.w3.org/TR/xpath • a language for addressing parts of an XML document, • designed to be used by XSLT , XQuery, XML Schema and XPointer. • References: xfront, W3Schools

  3. TOC • Introduction • Data Model • Location Paths • Expressions • Core Function Library

  4. 1. Introduction • What is XPath? • A language used to address parts of an XML [XML] document, • provides basic facilities for manipulating strings, numbers and booleans, • operates on the abstract, logical structure of an XML document, rather than its surface syntax.

  5. XPath(2.0) data model • provides • a tree representation of XML documents as well as • atomic values such as numbers, strings, and booleans, and • flat sequences that may contain both references to nodes in an XML document and atomic values. • The result of evaluating an XPath expression is a sequence of items, each of which is either • a node from the input document, or • an atomic value.

  6. Type systems of XPath • XPath Expression: • the primary syntactic construct in XPath. • would be evaluated to yield a value, which is a possibly empty sequence of items. • An item is either • a node or • an atomic value.

  7. Expression evaluation (xpath 1.0) • occurs with respect to a context. • XSLT, XQuery and XPointer specify how the context is determined. • A context consists of: • 1. a node (the context node) • 2. a pair of positive integers (the context position and the context size) • 3. a set of variable bindings • 4. a function library • 5. the set of namespace declarations in scope for the expression • Notes: • 3, 4 and 5 do not change when evaluating subexpressions. • 2 can only be changed by predicates • Some expressions may change 1.

  8. Expression evaluation (xpath 2.0) • Expression Context • consists of all information that can affect the result of evaluating an expression • Context information is organized into two categories : • static context : contains information available prior to execution • dynamic context : • contains information used during execution • = static context + additional information

  9. Static context A static context consists of: 1. XPath 1.0 compatibility mode : boolean 2. Statically known namespaces (i.e., (prefix, uri) pairs) 3. Default element/type namespace (or none) • <e1 .../>, <pre:e2 xsi:type="aType" /> 4. Default function namespace (or none) • max(...), fn:f1(...), ... 5. In-scope schema definitions: • schema type definitions (local + global) + • element declarations (global + local + substitution groups) + • attribute declarations (global + local) • Identified by expanded QName (global), or implementation-dependent identifiers (local or anonymous). 6. In-scope variables : a set of (EQName, type) pairs. • is the set of variables available for reference within an expression. • some constructs (for, some, every) may extend the in-scope variables of their subexpressions.

  10. Context item static type : the static type of the context item • Function signatures (i.e., callable functions and constructors) • is the set of functions that are callable from within an expression. • Each function is identified by its expanded QName and its arity. • A function signature also specifies the static types of the function parameters and result. • Statically known collations. • is a set of (uri, collation) pairs. A collation is a specification of the manner in which character strings are compared and ordered. Collations are identified by a uri string. • Default collation : is one of the statically known collations. • Base URI : is the uri used for resolving relative URIs (relative => absolute). • Statically known documents : • pairs of (s : absolute doc uri, t : type), where t is the type of fn:doc(s) and the default value of t is document-node()? . • Statically known collections : pairs of (s : uri, t : type), where t is the type of fn:collection(s). • Statically known default collection type : default type (node()* if not given) of fn:collection().

  11. Dynamic context = static context + additional items listed below : • Focus = context {item, position, size} • ., position(), last() • Variable values : pairs of (EQName, value), where value also contains dynamic type info. • Function implementations • contains implementation of function signatures given in static context. • Current dateTime : • current-dateTime(), current-date(), current-time() • Implicit timezone: implicit-timezone() • Available documents: Map<uri, document-node> • Available collections : Map<uri, node()*> • Default collection: value of collection()

  12. Location path • The most important kind of expression • used to select a set of nodes relative to a context node.

  13. 2. Data Model • details in XQuery/XPath Data Model • XPath operates on an XML document as a tree of nodes. • All XPath expressions are evaluated to produce a value. • In XPath 2.0, a value is always a sequence. • A sequence is an ordered collection of zero or more items. • An item is either • an atomic value or • a node. • An atomic value is a value (in the value space) of an atomic type, as defined in [XML Schema]. • 123 => xs:integer; 123.0 => xs:decimal; 1.23e2 => xs:double • xs:date("2011-12-10"), xs:QName('xs:date')

  14. Xpath 2.0 data model • A node is an instance of one of the seven node kinds defined in XQuery/XPath data Model . • Each node has a unique node identity, a typed value, and a string value. • Some nodes have a name, which is a value of type xs:QName. • The typed value of a node is a sequence of zero or more atomic values. • The string value of a node is a value of type xs:string. • In certain situations a value is said to be undefined (for example, the value of the context item, or the typed value of an element node). • This term indicates that the property in question has no value and that • any attempt to use its value results in an error.

  15. Kinds of Atoms • Kinds of atoms in XPath 1.0: • number (a double-precision floating-point number) • boolean (true or false) • string (a sequence of Unicode characters) • In XPath 2.0 these are generalized to include all atomic datatypes defined by XML Schema. • In XPath 2.0, number is classified further into • integer, decimal, float and double.

  16. Atomization • A sequence of items can be atomized to produce a sequence of atoms by replacing every node item with its typed value, as follows: • root (document) node, text node => string value as xs:untypedAtomic • comment node, processing-instruction node, namespace node => string value as xs:string • attribute => value according to its type annotation, or the string value as xs:untypedAtomic if the type is xs:untypedAtomic • ex: "12.3e2" in xs:double => 12.3e2; • "s1 s2 s3" in xs:IDREFS => sequence ('s1', 's2', 's3') of type xs:IDREF* • element of simple content • anySimpleType => string value as xs:untypedAtomic • o/w => value(s) + type // ex: list type • other element nodes • xs:untyped or complex type with mixed content => string value as xs:untypedAtomic • complex type + empty content (or nilled = 'true') => () • complex type + element-only content => undefined • The typed value of a sequence s can be queried by invoking fn:data(s).

  17. Types of nodes in an XML tree • All but namespace nodes are the same as in XPath 1.0 • The tree contains nodes. • Types of nodes and their possible children: • root nodes : element ( = 1), comment, PI • element nodes : element, text, PI, comment, [attribute, namespace] • text nodes : leaves • attribute nodes : leaves • namespace nodes : leaves // XPath 2.0 need not support them; XQuery 1.0 does not support them • processing instruction nodes : leaves • comment nodes : leaves

  18. Basic concepts • See Concepts from XDM • Node Identities • Document Order • Sequence • Types

  19. Node Identity • Every node has a unique identity. (like objects in Java) • identical to itself, • not identical to any other node. • I.e., node1 is node2 iff node1 and node2 correspond to the same node occurrence. • Notes: • node identity ≠ ID attribute. • An element has an identity even if it has no ID attributes. • Non-element nodes also have unique identities. • Atomic values do not have identity; • every occurrence of “5” as an integer is identical to every other occurrence of “5” as an integer.

  20. Example <courses> <course name="dismath"> <student idref="Wang" /> <student idref="chen" /> … </course> <course name="compiler"> <student idref="Wang" /> <student idref="Chang" /> … </course> </courses> Ex: • xpath: ( /courses/course[@name='dismath']/student[1] is (//student)[3] ) returns false. • xpath: ( (//student)[1]/@idref is (//student)[3]/@idref ) returns false. (why?)
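The difference between node identity and value equality in the example above can be sketched in Python with the standard-library xml.etree.ElementTree module. This is only an analogy under stated assumptions: ElementTree implements a small XPath 1.0 subset rather than XPath 2.0, Python's `is` stands in for the XPath `is` operator, and the sample document simply leaves out the students elided by "…" on the slide.

```python
import xml.etree.ElementTree as ET

# Sample document modelled on the slide; the elided students are left out.
DOC = """
<courses>
  <course name="dismath">
    <student idref="Wang"/>
    <student idref="chen"/>
  </course>
  <course name="compiler">
    <student idref="Wang"/>
    <student idref="Chang"/>
  </course>
</courses>
"""

root = ET.fromstring(DOC)  # root is the <courses> element

# Roughly /courses/course[@name='dismath']/student[1]
first_dismath = root.find("./course[@name='dismath']/student[1]")

# Roughly (//student)[3]: all students in document order, then the third one.
all_students = list(root.iter("student"))
third_overall = all_students[2]

# Both elements carry idref="Wang", yet they are two distinct nodes.
same_value = first_dismath.get("idref") == third_overall.get("idref")
same_node = first_dismath is third_overall

print(same_value, same_node)  # True False
```

ElementTree does not represent attributes as nodes, so the second comparison on the slide (between two @idref attribute nodes) has no direct counterpart here; in the XPath data model each attribute node belongs to exactly one element, which is why that comparison is also false.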

  21. Document order and reverse document order • Same as in XPath 1.0

  22. Example <?xml version="1.0" ?> <a xmlns:ns1="uri1" at1="…" at2="…" > <a1> data1 </a1> <a2> data2 </a2> <a3><b3/><!-- comment 1 --> </a3> <?pi pidata ?> </a> • Doc order: root < a < ns1 < {at1, at2} < a1 < ns1 (of a1) < data1 … < a3 < ns1 (of a3) < b3 < ns1 (of b3) < comment < pi

  23. Sequences • A sequence of items is the unique output type of all XPath expressions. • A sequence may contain nodes, atomic values, or any mixture of nodes and atomic values. • no distinction between an item and a singleton sequence containing that item. • ('123') = '123' ; node2 = ( node2 ). • A node does not lose its identity when it is added to a sequence. [i.e., only references to the node are added] • A node may occur in multiple places of one or more sequences. • Sequences are flat and never contain other sequences. • Appending (d e) to (a b c) will not produce (a b c (d e)) but is flattened to (a b c d e) automatically. • Notes: • Sequences replace node-sets from XPath 1.0. • In XPath 1.0, node-sets do not contain duplicates.
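The flattening rule can be illustrated with a small Python model. The helper name seq is hypothetical, and Python lists stand in for XPath sequences; the point is only that nested sequences are spliced into their parent instead of being stored as items.

```python
def seq(*items):
    """Build a flat sequence: nested lists/tuples are spliced in, never nested."""
    flat = []
    for item in items:
        if isinstance(item, (list, tuple)):
            flat.extend(seq(*item))  # recurse so deeper nesting also flattens
        else:
            flat.append(item)
    return flat


abc = seq("a", "b", "c")
de = seq("d", "e")

print(seq(abc, de))                   # ['a', 'b', 'c', 'd', 'e']
print(seq(1, (2, 3, 4), ((5,),)))     # [1, 2, 3, 4, 5]
print(seq("123") == seq(seq("123")))  # True: an item equals its singleton sequence
print(seq(abc, (), de))               # empty sequences vanish: ['a', 'b', 'c', 'd', 'e']
```

Unlike XPath 1.0 node-sets, nothing here removes duplicates: seq("a", "a") keeps both items.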

  24. Types in XDM • accepts all types defined by XML Schema • supports XSLT and XQuery, whose type systems are based on XML Schema. • includes 19 built-in primitive types, 5 additional types defined by XDM and user/implementor defined types. • type system defined in XQuery & XPath Formal Semantics • Every item in the data model has both a value and a type. Examples: • nodes => node type, • 5 => xsd:integer ; • '5' => xsd:string; • "Hello World." => xsd:string.

  25. 5:xsd:int

  26. XDM Type Hierarchy • from XDM Type Hierarchy.

  27. Representation of Types • Use expanded-QName (EQName) to represent a type. • Definition: An expanded-QName is a set of three values consisting of • {prefix} a possibly empty prefix, • {namespace name} a possibly empty namespace URI and • {local name} a local name. • Note: Only the URI and the local name are used for identity. • Lexical representation of an expanded QName: • [pre1:] localName • URI determined by context. • A type [with target namespace = n1 and local name = loc1] is represented by an EQName [whose URI = n1 and local name = loc1].

  28. General constraints on nodes All nodes must satisfy the following general constraints: • 1. Every node must have a unique identity, distinct from all other nodes. [unique identity] • 2. The children property of a node must not contain two consecutive Text Nodes. [no adjacent texts ] • 3. The children property of a node must not contain any empty Text Nodes. [no empty text ] • 4. The children and attributes properties of a node must not contain two nodes with the same identity. [no sharing of nodes ] • I.e., no sharing of contained nodes (hence a tree but not a dag ).

  29. Predefined Types (link) • xs:untyped • denotes the dynamic type of an element node that has not been validated, or has been validated in skip mode. • xs:untypedAtomic • denotes untyped atomic data, such as text that has not been assigned a more specific type or an attribute value that is validated in skip mode • xs:anyAtomicType • derived from xs:anySimpleType • the root of all atomic types (not including list or union types) • the base type of the 19 primitive types and of xs:untypedAtomic. • xs:dayTimeDuration, xs:yearMonthDuration • derived from xs:duration • form: PnDTnHnMn.nS (e.g., P1DT2H30M15.5S) • form: PnYnM (e.g., P1Y6M)

  30. atomic (Typed) value constructions • signature (format): see XPath constructor functions • prefix:TYPE($arg as xs:anyAtomicType?) as prefix:TYPE? • here prefix:TYPE is the QName of the target type, xs:anyAtomicType? is the input type and prefix:TYPE? is the output type. • Notes: • ? means the input and output are each a sequence of zero or one atomic value. • if $arg is the empty sequence () then the output is also the empty sequence (). • possible prefix:TYPE • xs:integer, xs:int, xs:dateTime, xs:boolean, … • can also be user-defined atomic types : bk:ISBN, np:IP

  31. List of constructors for built-in types • xs:string($arg as xs:anyAtomicType?) as xs:string? • xs:string("abc") => string "abc"; xs:string(123) => "123" • xs:boolean($arg as xs:anyAtomicType?) as xs:boolean? • xs:boolean("abc") => error; xs:boolean("") => error; xs:boolean(10) => true; • xs:boolean() => error; xs:boolean(()) => () • Note: xs:boolean != fn:boolean (effective boolean value) • xs:decimal($arg as xs:anyAtomicType?) as xs:decimal? • xs:decimal("123.456789") => 123.456789 • xs:float($arg as xs:anyAtomicType?) as xs:float? • xs:double($arg as xs:anyAtomicType?) as xs:double? • Note: • xs:int("1234567891234") => error (out of range for xs:int) • xs:integer("1234567891234") => 1234567891234
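The note that xs:boolean differs from fn:boolean can be made concrete with a rough Python model. All function names below are hypothetical, None stands in for the empty sequence (), and only strings and numbers are handled; it is a sketch of the casting rules listed above, not an implementation of the constructor functions.

```python
import math


def xs_boolean(arg):
    """Model of the xs:boolean() constructor for a few Python stand-in types."""
    if arg is None:
        return None  # xs:boolean(()) => ()
    if isinstance(arg, bool):
        return arg
    if isinstance(arg, str):
        lexical = arg.strip()  # approximates XML Schema whitespace collapsing
        if lexical in ("true", "1"):
            return True
        if lexical in ("false", "0"):
            return False
        raise ValueError(f"cannot cast {arg!r} to xs:boolean")
    if isinstance(arg, (int, float)):
        # Numeric zero and NaN cast to false, every other number to true.
        return not (arg == 0 or (isinstance(arg, float) and math.isnan(arg)))
    raise TypeError(f"unsupported stand-in type: {type(arg).__name__}")


def fn_boolean_of_string(value):
    """Effective boolean value of a single string: true iff it is non-empty."""
    return len(value) > 0


def cast_fails(arg):
    """Demo helper: True if the xs:boolean cast raises an error."""
    try:
        xs_boolean(arg)
    except (ValueError, TypeError):
        return True
    return False


print(cast_fails("abc"), fn_boolean_of_string("abc"))  # True True
print(cast_fails(""), fn_boolean_of_string(""))        # True False
print(xs_boolean(10), xs_boolean(None))                # True None
```

The cast looks at the lexical form of the string, while the effective boolean value only asks whether the string is empty, so "abc" is an error for one and true for the other.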

  32. All others are similar. • xs:duration, xs:dateTime, xs:time,xs:date,xs:gYearMonth, • xs:gYear,xs:gMonthDay,xs:gDay,xs:gMonth • xs:hexBinary,xs:base64Binary • xs:anyURI,xs:QName • xs:normalizedString, xs:token, xs:language, • xs:NMTOKEN, xs:Name, xs:NCName, • xs:ID, xs:IDREF, xs:ENTITY, • xs:integer, xs:long, xs:int, xs:short, xs:byte • xs:nonPositiveInteger,xs:negativeInteger • xs:nonNegativeInteger, • xs:unsignedLong,xs:unsignedInt,xs:unsignedShort, xs:unsignedByte, • xs:positiveInteger,xs:yearMonthDuration, • xs:dayTimeDuration, xs:untypedAtomic,

  33. More Examples • xs:string("abc"), xs:int("123") • xs:float("123.3e10") • xs:date("2006-11-12") • xs:gMonthDay("--11-12") • xs:gMonth("--11") • xs:gDay("---12") • xs:dateTime("2006-11-12T12:00:00"). • fn:dateTime(xs:date("1999-12-31"), xs:time("12:00:00")) => xs:dateTime("1999-12-31T12:00:00"). • fn:dateTime(xs:date("1999-12-31"), xs:time("24:00:00")) returns xs:dateTime("1999-12-31T00:00:00") because "24:00:00" is an alternate lexical form for "00:00:00". • note: 24:00:00 = 00:00:00

  34. String values • Every atomic value has a string representation. • The value can be obtained by the casting operation: • Ex: • ( xs:int(“123”) + 45 ) cast as xs:string • return “168”

  35. Properties of nodes • string value • Every node has a string-value, which is part of the node or computed from the string-value of descendant nodes. • can be obtained by string(.) • typed value • can be obtained by data(.) • expanded-name (XPath 1.0; in 2.0 it is replaced with EQName) • expanded-name = namespace URI + local part • The namespace URI is either null or a URI string [RFC2396]. • Two expanded-names are equal if they have the same local part and the same namespace URI.

  36. Node relationship • Same as in xpath 1.0

  37. properties/relationship of nodes m(e) is the URI bound to prefix e

  38. 3 Location Paths (renamed PathExpr in 2.0) • Same as in xpath 1.0 (except some minor changes) • LocationPath • a special kind of expression, • used to locate a sequence of nodes in the document. • sorted in document order • no duplicates

  39. Kinds of Expressions 3.1 Primary Expressions : string + numeric literals 3.2 Path Expressions 3.3 Sequence Expressions : comma (,), to, [ … ], |, intersect, except 3.4 Arithmetic Expressions : +, -, *, div, idiv, mod 3.5 Comparison Expressions : is, <, >, =, le, ge, eq, ne … 3.6 Logical Expressions : and, or, not 3.7 For Expressions : for 3.8 Conditional Expressions : if 3.9 Quantified Expressions : every, some 3.10 Expressions on SequenceTypes

  40. Primary Expressions • Literals • string: "abc", 'abc', "He said ""OK"" ", 'He said "ok" '. • numerical: 123 => xs:integer, 123.4 => xs:decimal • 124.4e5 => xs:double • non-literals: • xs:int("125") = xs:int(125) = 125 cast as xs:int • boolean : fn:true(), fn:false() • Variable References : $pre:name, $var-1 • Parenthesized Expressions : ( ), ( expr ) • Context Item Expression : . • (1 to 100)[. mod 5 eq 0], //book[ fn:count(./author) > 1 ] • Function Calls : pre:fName( arg1, …, argn ) • fn:concat("abc", "def")

  41. Literal Expressions • 42 • 3.1415 • 6.022E23 • 'XPath is a lot of fun' • "XPath is a lot of fun" • 'The cat said "Meow!"' • "The cat said ""Meow!""" • "XPath is just so much fun"
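How the lexical form of a numeric literal decides its type (digits only: xs:integer; with a decimal point: xs:decimal; with an exponent: xs:double) can be sketched as a small Python classifier. The function name numeric_literal_type is hypothetical, and the regular expressions are a transcription of the literal forms as I understand them from the XPath 2.0 grammar.

```python
import re

DIGITS = r"[0-9]+"
# ".5", "3.", "3.1415"
DECIMAL = rf"(?:\.{DIGITS}|{DIGITS}\.[0-9]*)"
# "6.022E23", "1e-3", ".5e2"
DOUBLE = rf"(?:\.{DIGITS}|{DIGITS}(?:\.[0-9]*)?)[eE][+-]?{DIGITS}"


def numeric_literal_type(literal):
    """Return the atomic type implied by the lexical form of a numeric literal."""
    if re.fullmatch(DIGITS, literal):
        return "xs:integer"
    if re.fullmatch(DECIMAL, literal):
        return "xs:decimal"
    if re.fullmatch(DOUBLE, literal):
        return "xs:double"
    raise ValueError(f"not a numeric literal: {literal!r}")


for text in ("42", "3.1415", "6.022E23", "123.0", "1.23e2"):
    print(text, "=>", numeric_literal_type(text))
```

Note that 123, 123.0 and 1.23e2 denote the same number but get three different types, which matters for later arithmetic and casting.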

  42. Variable References $foo $bar:foo • $foo-17 refers to the variable ”foo-17” • Possible fixes: ($foo)-17, $foo -17, $foo+-17

  43. XPath operators and their precedences

  44. Path Expressions • Location paths are expressions • They may be applied to arbitrary sequences • evaluation rule discussed before.

  45. Sequence Expressions • Constructing Sequences : comma (,) and to • (1,2,3), (), (3) => (1,2,3,3) • 2 to 4 => (2,3,4); (10, (1 to 3)) => (10,1,2,3) • (1,(2,3,4),((5))) => (1,2,3,4,5) -- flatten • Filter Expressions : PrimaryExpr [ … ]* • (1 to 30)[. mod 3 = 0][. mod 5 = 0] => (15, 30) • (10 to 20)[5] => (14) • Combining Node Sequences (for nodes only): • assume doc order : A < B < C < D < E • union: (A,B,A) | (B,C) | (A,C) = (A,B) union (B,C) => (A,B,C) • intersect, except : • (A,B,C,D) intersect (B,D,A,E) except (B) => (A, D).
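The node-combining operators can be modelled in Python to check the two results above. This is a sketch under simplifying assumptions: nodes are represented by the letters A to E, string equality stands in for node identity, and DOC_ORDER and the function names are hypothetical. Each operator removes duplicates and returns its result in document order.

```python
DOC_ORDER = ["A", "B", "C", "D", "E"]  # assumed document order A < B < C < D < E


def in_doc_order(nodes):
    """Remove duplicates and sort the remaining nodes by document order."""
    return sorted(set(nodes), key=DOC_ORDER.index)


def union(left, right):
    return in_doc_order(list(left) + list(right))


def intersect(left, right):
    return in_doc_order(n for n in left if n in right)


def except_(left, right):  # trailing underscore: "except" is a Python keyword
    return in_doc_order(n for n in left if n not in right)


# (A,B,A) | (B,C) | (A,C)
print(union(union(["A", "B", "A"], ["B", "C"]), ["A", "C"]))  # ['A', 'B', 'C']

# (A,B,C,D) intersect (B,D,A,E) except (B), evaluated left to right
both = intersect(["A", "B", "C", "D"], ["B", "D", "A", "E"])
print(except_(both, ["B"]))  # ['A', 'D']
```

The left-to-right grouping in the last example reflects that intersect and except share one precedence level.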

  46. Filter Expressions • Predicates generalized to arbitrary sequences • The expression '.' is the context item • The expression: (10 to 40)[. mod 5 = 0 and position() > 20] has the result: 30, 35, 40
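A filter expression evaluates its predicate once per item, with the focus (context item, position and size) set for that item. The Python sketch below models this with a hypothetical helper filter_seq whose predicate receives the three focus components explicitly; positions start at 1, as in XPath.

```python
def filter_seq(items, predicate):
    """Keep the items for which predicate(item, position, last) is true."""
    last = len(items)
    return [
        item
        for position, item in enumerate(items, start=1)
        if predicate(item, position, last)
    ]


# (10 to 40)[. mod 5 = 0 and position() > 20]
ten_to_forty = list(range(10, 41))
print(filter_seq(ten_to_forty, lambda item, pos, last: item % 5 == 0 and pos > 20))
# [30, 35, 40]

# (1 to 30)[. mod 3 = 0][. mod 5 = 0]: the second predicate sees the filtered sequence
step1 = filter_seq(list(range(1, 31)), lambda item, pos, last: item % 3 == 0)
print(filter_seq(step1, lambda item, pos, last: item % 5 == 0))  # [15, 30]

# (10 to 20)[5]: a numeric predicate n is shorthand for position() = n
print(filter_seq(list(range(10, 21)), lambda item, pos, last: pos == 5))  # [14]
```

In the first example the item 30 sits at position 21, which is why 10 through 25 are excluded even though they are multiples of 5.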

  47. Arithmetic Expressions • +, -, *, div, idiv, mod, +, - (unary) • -3 div 2 => -1.5 (decimal) • -3 idiv 2 => -1 (integer) • -3.4 mod 2 (or -2) => -1.4 • rule: x = y * (x idiv y) + (x mod y) • precedence : {+, -} < {*, mod, div, idiv} < {unary +, -} • Operators are generalized to sequences • if any argument is empty, the result is empty • () + 3 => () • if all arguments are singleton sequences of numbers, they are combined as numbers: • (3) + (4) + 5 => 12 • otherwise, a runtime error occurs • (1,3) + (2,4) => error
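The idiv and mod results above follow from truncation toward zero, which differs from Python's built-in // and % on integers (these round toward negative infinity). The sketch below uses the decimal module to keep -3.4 exact; the function names idiv and mod are just stand-ins for the XPath operators.

```python
from decimal import Decimal


def idiv(x, y):
    """Integer division that truncates toward zero."""
    return int(Decimal(x) / Decimal(y))  # int() drops the fractional part


def mod(x, y):
    """Remainder defined by x = y * (x idiv y) + (x mod y); sign follows x."""
    x, y = Decimal(x), Decimal(y)
    return x - y * idiv(x, y)


print(idiv(-3, 2))      # -1
print(mod("-3.4", 2))   # -1.4
print(mod("-3.4", -2))  # -1.4

# Python's own integer operators floor instead of truncating:
print(-3 // 2, -3 % 2)  # -2 1
```

Decimal values are passed as strings here so that -3.4 is not first rounded to a binary floating-point number.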

  48. Comparison Expressions => boolean • Value Comparisons • comparison operators : eq, ne, lt, le, gt, ge. • used for comparing single values. • General Comparisons (**) • operators: =, !=, <, <=, >, >=. • are existentially quantified comparisons that may be applied to operand sequences of any length. • The result is true or false if it does not raise an error. • Node Comparisons • operators: is, >>, << • A is B => true if A and B are the same node • A << B = B >> A => true if A precedes B in doc order.

  49. Value Comparison • Comparison operators: • eq (=), ne (≠), lt (<), le (<=), gt (>), ge (>=) • Used on atomic values • When applied to arbitrary values (sequences): • atomize • if either argument is empty => () • if either has length > 1 => type error • if incomparable, a runtime error; ex: 8 lt "abc" • otherwise, compare the two atomic values • Ex: 8 eq 4+4; (//rcp:ingredient)[1]/@name eq "beef cube steak"
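The contrast between value comparisons (eq, lt, ...) and the existentially quantified general comparisons (=, <, ...) can be sketched in Python. Assumptions: Python lists stand in for already-atomized sequences, the function names are hypothetical, and type checking is left to Python, so some XPath type errors (for example 8 eq "abc") are not reproduced.

```python
import operator


def value_compare(op, left, right):
    """Model of eq/ne/lt/le/gt/ge on atomized operand sequences."""
    if not left or not right:
        return []  # an empty operand gives the empty sequence
    if len(left) > 1 or len(right) > 1:
        raise TypeError("value comparison needs singleton operands")
    return [op(left[0], right[0])]


def general_compare(op, left, right):
    """Model of =, !=, <, ...: true if ANY pair of items satisfies op."""
    return any(op(a, b) for a in left for b in right)


print(value_compare(operator.eq, [8], [4 + 4]))       # [True]
print(value_compare(operator.eq, [], [1]))            # []
print(general_compare(operator.eq, [1, 2], [2, 3]))   # True (2 = 2)
print(general_compare(operator.ne, [1, 2], [1, 2]))   # True (1 != 2)
print(general_compare(operator.eq, [], [1]))          # False
```

The fourth line shows a well-known consequence of existential semantics: a sequence can be both = and != to itself, so not($a = $b) and $a != $b are not interchangeable.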

  50. Node Comparison • Operators: is, <<, >> • Used to compare nodes on identity and order • is is for node identity; >>, << for node ordering • When applied to arbitrary values: • if either argument is empty, the result is empty • if both are singleton nodes, the nodes are compared • otherwise, a runtime error. Ex: //book[1] is “abc” Ex: • (//student)[2] is //student[@id = ”s9527”] • /rcp:collection << (//rcp:recipe)[4] • (//rcp:recipe)[4] >> (//rcp:recipe)[3]
