1 / 70

Web Data Management

Web Data Management. XPath. In this lecture. Review of the XPath specification data model examples syntax Resources: A formal semantics of patterns in XSLT by Phil Wadler. XML Path Language (XPath) www.w3.org/TR/xpath. XPath. http://www.w3.org/TR/xpath (11/99)

aizza
Download Presentation

Web Data Management

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Web Data Management XPath

  2. In this lecture • Review of the XPath specification • data model • examples • syntax Resources: A formal semantics of patterns in XSLT by Phil Wadler. XML Path Language (XPath) www.w3.org/TR/xpath

  3. XPath • http://www.w3.org/TR/xpath (11/99) • Building block for other W3C standards: • XSL Transformations (XSLT) • XML Link (XLink) • XML Pointer (XPointer) • XML Query • Was originally part of XSL

  4. XPath • An expression language to be used in another host language (e.g., XSLT, XQuery). • Allows the description of paths in an XML tree, and the retrieval of nodes that match these paths. • Can also be used for performing some (limited) operations on XML data.

  5. Example for XPath Queries <bib><book> <publisher> Addison-Wesley </publisher> <author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <title> Foundations of Databases </title> <year> 1995 </year></book><bookprice=“55”> <publisher> Freeman </publisher> <author> Jeffrey D. Ullman </author> <title> Principles of Database and Knowledge Base Systems </title> <year> 1998 </year></book> </bib>

  6. Data Model for XPath • XPath expressions operate over XML trees, which consist of the following node types: • Document: the root node of the XML document; • Element: element nodes; • Attribute: attribute nodes, represented as children of an Element node; • Text: text nodes, i.e., leaves of the XML tree.

  7. bib Data Model for XPath The root Processing instruction Comment The root element book book Attr= “1” element attribute publisher author . . . . Much like the Xquery data model Addison-Wesley Serge Abiteboul text

  8. Data Model for XPath • The root node of an XML tree is the (unique) Document node; • The root element is the (unique) Element child of the root node; • A node has a name, or a value, or both • an Element node has a name, but no value; • a Text node has a value (a character string), but no name; • an Attribute node has both a name and a value. • Attributes are special! Attributes are not considered as first-class nodes in an XML tree. They must be addressed specifically, when needed.

  9. XPath: Simple Expressions /bib/book/year Result: <year> 1995 </year> <year> 1998 </year> /bib/paper/year Result: empty (there were no papers)

  10. XPath Tree Nodes • Seven nodes types: • root, element, attribute, text, comment, processinginstruction and namespace • Namespace and attribute nodes have parent nodes, but are not children of those parent nodes. • The relationship between a parent node and a child node is containment • Attribute nodes and namespace nodes describe their parent nodes

  11. Xpath Tree Nodes <?xml version = "1.0"?> <!-- Fig. 11.3 : simple2.xml --> <!-- Processing instructions and namespacess --> <html xmlns = "http://www.w3.org/TR/REC-html40"> <head><title>Processing Instruction and Namespace Nodes </title> </head> <?deitelprocessor example = "fig11_03.xml"?> <body> <deitel:book deitel:edition = "1"xmlns:deitel = "http://www.deitel.com/xmlhtp1"><deitel:title>XML How to Program</deitel:title></deitel:book> </body> </html>

  12. XPath Tree Nodes • String-value: Each XPath tree node has a string representation that XPath uses to compare nodes. • The string-value of a text node consists of the character data contained in the node. • Document order: Nodes in an XPath tree have an ordering that is determined by the order in which the nodes appear in the original XML document. • The reverse document order is the reverse ordering of the nodes in a document. • The string-value for the html element node is determined by concatenating the string-values for all of its descendant text nodes in document order. • The string-value for element node html is ProcessingInstructionandNamespaceNodesXMLHowtoProgram • Because all whitespace is removed when the text nodes are normalized, there is no space in the concatenation.

  13. XPath Tree Nodes • For processing instructions, the string-value consists of the remainder of the processing instruction after the target, including whitespace, but excluding the ending ?> • The string-value for the processing instruction is • example = "fig11_03.xml" • Namespace-node string-values consist of the URI for the namespace. • The string-value for the namespace declaration is http://www.deitel.com/xmlhtpl

  14. XPath Tree Nodes • For the root node of the document, the string-value is also determined by concatenating the string-values of its text-node descendents in document order. • The string-value of the root node is therefore identical to the string-value calculated for the html element node • The string-value for the edition attribute node consists of its value, which is 3. • The string-value for a comment node consists only of the comment's text, excluding <!-- and -->. • The string-value for the second comment node is therefore: Processing instructions and namespacess.

  15. XPath Tree Nodes • Expanded-name: Certain nodes (i.e., element, attribute, processing instruction and namespace) also have an expanded-name that can be used to locate specific nodes in the XPath tree. • Expanded-names consist of both a local part and a namespace URI. • The local part for the element node html is therefore html. • If there is a prefix for the element node, the namespace URI of the expanded-name is the URI to which the prefix is bound. • If there is no prefix for the element node, the namespace URI of the expanded name is the URI for the default namespace.

  16. XPath Tree Nodes • The local part of the expanded name for a processing instruction node corresponds to the target of the processing instruction in the XML document. • For processing instructions, the namespace URI of the expanded-name is null • The local part of the expanded-name for a namespace node corresponds to the prefix for the namespace, if one exists; or, if it is a default namespace, the local part is empty (i.e., the empty string). • The namespace URI of the expanded-name for a namespace node is always null.

  17. XPath Tree Nodes

  18. XPath: Axes • A location path is an expression that specifies how to navigate an XPath tree from one node to another. • A location path is composed of location steps, each of which is composed of an "axis," a "node test" and an optional "predicate." • Searching through an XML document begins at a contextnode in the XPath tree. • Searches through the XPath tree are made relative to this context node. • An axis indicates which nodes, relative to the context node, should be included in the search. • The axis also dictates the ordering of the nodes in the set. • Axes that select nodes that follow the context node in document order are called forward axes. • Axes that select nodes that precede the context node in document order are called reverse axes.

  19. XPath Context • A step is evaluated in a specific context [< N1,N2, · · · ,Nn>, Nc] which consists of: • a context list < N1,N2, · · · ,Nn> of nodes from the XML tree; • a context node Ncbelonging to the context list. • The context lengthn is a positive integer indicating the size of a contextual list of nodes; it can be known by using the function last(); • The context node positionc  [1,n] is a positive integer indicating the position of the context node in the context list of nodes; it can be known by using the function position().

  20. XPath Steps • The basic component of XPath expression are steps, of the form: axis::node-test[P1][P2]. . . [Pn] • axis is an axis name indicating what the direction of the step in the XML tree is (child is the default). • node-test is a node test, indicating the kind of nodes to select. • Piis a predicate, that is, any XPath expression, evaluated as a boolean, indicating an additional condition. There may be no predicates at all. • A step is evaluated with respect to a context, and returns a node list.

  21. Path Expressions • A path expression is of the form: [/]step1/step2/. . . /stepn • A path that begins with / is an absolute path expression; • A path that does not begin with / is a relative path expression. • Examples • /A/B is an absolute path expression denoting the Element nodes with name B, children of the root named A; • ./B/descendant::text() is a relative path expression which denotes all the Text nodes descendant of an Element B, itself child of the context node; • /A/B/@att1[.> 2] denotes all the Attribute nodes @att1 whose value is greater than 2.

  22. Evaluation of Path Expressions • Each stepiis interpreted with respect to a context; its result is a node list. • A step stepiis evaluated with respect to the context of stepi−1. More precisely: • For i = 1 (first step) if the path is absolute, the context is a singleton, the root of the XML tree; else (relative paths) the context is defined by the environment; • For i > 1 if N = < N1,N2, · · · ,Nn > is the result of step stepi−1, stepiis successively evaluated with respect to the context [N,Nj ], for each j  [1,n]. • The result of the path expression is the node set obtained after evaluating the last step.

  23. Evaluation of Path Expressions • Evaluation of /A/B/@att1 • The path expression is absolute: the context consists of the root node of the tree. • The first step, A, is evaluated with respect to this context.

  24. Evaluation of /A/B/@att1 • The result is A, the root element. • A is the context for the evaluation of the second step, B.

  25. Evaluation of /A/B/@att1 • The result is a node list with two nodes B[1], B[2]. • @att1 is first evaluated with the context node B[1].

  26. Evaluation of /A/B/@att1 • The result is the attribute node of B[1].

  27. Evaluation of /A/B/@att1 • @att1 is also evaluated with the context node B[2].

  28. Evaluation of /A/B/@att1 • The result is the attribute node of B[2].

  29. Evaluation of /A/B/@att1 • Final result: the node set union of all the results of the last step, @att1.

  30. XPath: Axes

  31. XPath: Axes • An axis has a principal node type that corresponds to the type of node the axis may select. • For attribute axes, the principal node type is attribute. • For namespace axes, the principal node type is namespace. • All other axes have a element principal node type.

  32. XPath: Axes • Child axis: denotes the Element or Text children of the context node. • Important: An Attribute node has a parent (the element on which it is located), but an attribute node is not one of the children of its parent. • Example: child::D

  33. XPath: Axes • Parent axis: denotes the parent of the context node. • The node test is either an element name, or * which matches all names, node() which matches all node types. • Always a Element or Document node, or an empty node-set (if the parent does not match the node test or does not satisfy a predicate). • .. is an abbreviation for parent::node(): the parent of the context • Example: parent::node()

  34. XPath: Axes • Attribute axis: denotes the attributes of the context node. • The node test is either the attribute name, or * which matches all the names. • Example: attribute::*

  35. XPath: Axes • Descendant axis: all the descendant nodes, except the Attribute nodes. • The node test is either the node name (for Element nodes), or * (any Element node) or text() (any Text node) or node() (all nodes). • The context node does not belong to the result: use descendant-or-self instead. • Example: descendant::node()

  36. XPath: Axes • Example: descendant::*

  37. XPath: Axes • Ancestor axis: all the ancestor nodes. • The node test is either the node name (for Element nodes), or node() (any Element node, and the Document root node). • The context node does not belong to the result: use ancestor-or-self instead. • Example: ancestor::node()

  38. XPath: Axes • Following axis: all the nodes that follows the context node in the document order. • Attribute nodes are not selected. • The node test is either the node name, * text() or node(). • The axis preceding denotes all the nodes that precede the context node. • Example: following::node()

  39. XPath: Axes • Following sibling axis: all the nodes that follows the context node, and share the same parent node. • Same node tests as descendant or following. • The axis preceding-sibling denotes all the nodes the precede the context node. • Example: following-sibling::node()

  40. Location Path Abbreviations

  41. XPath: Node Tests • The set of selected nodes is refined with node tests. • node tests rely upon the principal node type of an axis for selecting nodes in a location path

  42. XPath: Axes • Location Paths Using Axes and Node Tests • Location paths are composed of sequences of location steps. • A location step contains an axis and a node test separated by a double-colon (::) and, optionally, a "predicate" enclosed in square brackets ([]). child::* • The above location path selects all element-node children of the context node, because the principal node type for the child axis is element.

  43. XPath: Wildcard //author/child::* or //author/* Result: <first-name> Rick </first-name> <last-name> Hull </last-name> * Matches any element

  44. XPath: Axes and Node Tests • child::text() • selects all text-node children of the context node • Combining two location steps to form the location path • child::*/child::text() • selects all text-node grandchildren of the context node

  45. XPath: Node Tests /bib/book/author/text() Result: Serge Abiteboul Victor Vianu Jeffrey D. Ullman Rick Hull doesn’t appear because he has firstname, lastname /bib/book/author/*/text() Result: Rick Hull

  46. XPath: Restricted Kleene Closure • select all author element nodes in an entire document /descendent-or-self::node()/child::author • Instead use the abbreviation: //author Result:<author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <author> Jeffrey D. Ullman </author> /bib//first-name Result: <first-name> Rick </first-name>

  47. XPath: Attribute Nodes /bib/book/@price Result: “55” @price means that price has to be an attribute

  48. XPath: Predicates • Boolean expression, built with tests and the Boolean connectors and/or (negation is expressed with the not() function); • a test is • either an XPath expression, whose result is converted to a Boolean; • a comparison or a call to a Boolean function. • Important: predicate evaluation requires several rules for converting nodes and node sets to the appropriate type.

  49. Predicate Evaluation • A step is of the form axis::node-test[P] • First axis::node-test is evaluated: one obtains an intermediate result I • Second, for each node in I, P is evaluated: the step result consists of those nodes in I for which P is true. /A/B/descendant::text()[1]

  50. Predicate Evaluation • Beware: an XPath step is always evaluated with respect to the context of the previous step. • Here the result consists of those Text nodes, first descendant (in the document order) of a node B. • /A/B//text()[1]

More Related