Introduction to XPath

Bun Yue, Professor, CS/CIS, UHCL

Resources • XPath 1.0: http://www.w3.org/TR/xpath • XPath 2.0: http://www.w3.org/TR/xpath20/ • EditiX (free edition): http://free.editix.com/ • XPath 1.0 testbed by whitebeam: http://www.whitebeam.org/library/guide/TechNotes/xpathtestbed.rhtm

Introduction to XPath 1.0 • XPath is used to address parts of an XML document. • XPath is a W3C recommendation. • Version 2.0 followed 1.0 and is largely backward compatible with it. • XPath is used by XPointer, XSLT and XQuery. • XPath is designed to access elements, not to create new ones. • It is designed to be embedded in a host language, such as XSLT or XQuery.

Location Path • XPath uses path expressions, called location paths, to address parts of a document. • A location path is composed of a sequence of location steps, separated by '/'.

Location Path • A location path can be absolute or relative. • An absolute location path starts with '/', which denotes the document root. • A relative location path does not start with '/'. It is evaluated relative to a context node.

XPath 1.0 Results • The result of an XPath 1.0 expression is one of the following four types: • Number • String • Boolean • Node-set: a set of nodes • As a set, it contains no duplicate nodes. • Not the same as a document fragment. • Replaced by sequences in XPath 2.0.

Example /stocks/stock matches all element nodes stock that are children of the root element stocks.
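As a rough illustration, Python's standard xml.etree.ElementTree module implements a small subset of XPath 1.0. The sample document below is invented for this sketch; only the element names stocks and stock come from the slide. ElementTree evaluates paths relative to an element, so the absolute path /stocks/stock is written here as the relative path stock, with the root element stocks as the context node.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, made up for this sketch.
XML = """<stocks>
  <stock symbol="IBM"><lastprice>120.5</lastprice></stock>
  <stock symbol="MSFT"><lastprice>30.2</lastprice></stock>
  <other/>
</stocks>"""

root = ET.fromstring(XML)  # root is the <stocks> element

# Relative location path "stock" with <stocks> as the context node.
# It selects the same element nodes as the absolute path /stocks/stock.
stock_nodes = root.findall("stock")
symbols = [node.get("symbol") for node in stock_nodes]
print(symbols)
```

The other element is not selected because the name test stock does not match it.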

EditiX • In EditiX, use "View > Windows > XPath View" to execute XPath expressions. • Either XPath 1.0 or 2.0 may be selected.

Location Step • A location step is composed of three parts: • an axis (required): describes the direction of navigation. • a node test (required): specifies the node type or name, and • zero or more predicates (optional): specify additional inclusion tests.

Example //stocks/child::stock[@symbol="IBM"]/lastprice • Consider the location step child::stock[@symbol="IBM"]: • axis: child • node test: stock • predicate: [@symbol="IBM"]
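The three parts of the step can be tried out with a minimal sketch in Python's xml.etree.ElementTree, which supports only a subset of XPath 1.0. ElementTree has no explicit axis syntax such as child::, so the step is written in abbreviated form, relying on child being the default axis. The document content and the price values are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, made up for this sketch.
XML = """<stocks>
  <stock symbol="IBM"><lastprice>120.5</lastprice></stock>
  <stock symbol="MSFT"><lastprice>30.2</lastprice></stock>
</stocks>"""

root = ET.fromstring(XML)

# Step 1: stock[@symbol='IBM']
#   axis: child (implicit), node test: stock, predicate: [@symbol='IBM']
# Step 2: lastprice
#   axis: child (implicit), node test: lastprice, no predicate
price_nodes = root.findall("stock[@symbol='IBM']/lastprice")
values = [node.text for node in price_nodes]
print(values)
```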

Axis • An axis is the first part of the location step and is followed by :: before the node test and predicates. • There are 13 axes in XPath 1.0. • The default axis is the child axis. • The symbol @ can be used for the attribute axis.

Axes in XPath 1.0 • child: the children of the context node (not including attribute nodes). • descendant: the descendants of the context node. • parent: the parent of the context node, if there is one. • ancestor: the ancestors of the context node, including the root node if the context node is not the root node. • following-sibling: all the following siblings of the context node. • preceding-sibling: all the preceding siblings of the context node.

Axes in XPath 1.0 • following: all nodes in the same document as the context node that are after the context node in document order, excluding any descendants and excluding attribute nodes and namespace nodes. • preceding: all nodes in the same document as the context node that are before the context node in document order, excluding any ancestors and excluding attribute nodes and namespace nodes. • attribute: the attributes of the context node; the axis is empty unless the context node is an element.

Axes in XPath 1.0 • namespace: the namespace nodes of the context node; the axis is empty unless the context node is an element. • self: just the context node itself. • descendant-or-self: the context node and the descendants of the context node. • ancestor-or-self: the context node and the ancestors of the context node; thus, the ancestor-or-self axis always includes the root node.
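To make a few of the axes concrete, here is a minimal Python sketch that computes them by hand over an ElementTree document. It is not an XPath engine: the helper names are hypothetical, the tiny document is invented, and only element nodes are considered. ElementTree has no node above the document element, so the ancestor list stops at the top element rather than at the XPath root node.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this sketch.
root = ET.fromstring("<a><b><c/><d/><e/></b><f/></a>")

# ElementTree elements do not store their parent, so build a lookup table.
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(node):
    """ancestor axis: parent, grandparent, ... (elements only)."""
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result

def descendants(node):
    """descendant axis in document order (elements only)."""
    return list(node.iter())[1:]  # iter() yields the node itself first

def following_siblings(node):
    """following-sibling axis (elements only)."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    children = list(parent)
    return children[children.index(node) + 1:]

def preceding_siblings(node):
    """preceding-sibling axis, listed in document order (elements only)."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    children = list(parent)
    return children[:children.index(node)]

d = root.find("b/d")  # context node for the examples below
print([n.tag for n in ancestors(d)])
print([n.tag for n in following_siblings(d)])
print([n.tag for n in preceding_siblings(d)])
print([n.tag for n in descendants(root)])
```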

Shorthand • . is the shorthand for self::node() • .. is the shorthand for parent::node(). • // is the shorthand for /descendant-or-self::node()/
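The abbreviations can be tried with Python's xml.etree.ElementTree, which accepts ".", ".." and "//" in its XPath subset. This is a sketch over invented sample data; note that ElementTree requires a leading "." before "//" when searching from an element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, made up for this sketch.
XML = """<stocks>
  <stock symbol="IBM"><lastprice>120.5</lastprice></stock>
  <stock symbol="MSFT"><lastprice>30.2</lastprice></stock>
</stocks>"""

root = ET.fromstring(XML)

# "." selects the context node itself.
self_tags = [e.tag for e in root.findall(".")]

# ".//lastprice" selects lastprice elements at any depth below the context node.
price_texts = [e.text for e in root.findall(".//lastprice")]

# ".." steps up to the parent: here, the parents of all lastprice elements.
parent_symbols = [e.get("symbol") for e in root.findall(".//lastprice/..")]

print(self_tags, price_texts, parent_symbols)
```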

Node tests in XPath 1.0 • The second part of a location step. It is required. • There are three kinds of node tests: • NameTest: the name of the node. * is a wildcard name test that matches any node of the axis's principal node type (elements on most axes, attributes on the attribute axis). • NodeType test: • node(): matches any node of any type. Along the child axis this covers elements, text, comments and PIs, but not attributes or the root node, since neither is a child of any node. • text() • comment() • processing-instruction() • processing-instruction('pi-name'): matches a PI with the given target name.

Predicate tests • Predicate tests are the last part of a location step. • They are enclosed in [] and are optional. • There may be more than one predicate test. • XPath built-in functions can be used to construct the predicate (boolean) expression that serves as the added condition for inclusion. • Boolean operators: and, or.
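A name test and the * wildcard can be compared with a short sketch in Python's xml.etree.ElementTree. The sample data is invented. ElementTree's XPath subset does not provide the node-type tests text(), comment() or processing-instruction(), so only name tests are shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, made up for this sketch.
XML = """<stocks>
  <stock symbol="IBM"><lastprice>120.5</lastprice></stock>
  <stock symbol="MSFT"><lastprice>30.2</lastprice></stock>
  <other/>
</stocks>"""

root = ET.fromstring(XML)

# Name test: only child elements named "stock".
by_name = [e.tag for e in root.findall("stock")]

# Wildcard name test: every child element, whatever its name.
any_child = [e.tag for e in root.findall("*")]

# Wildcard combined with "//": every descendant element in document order.
any_descendant = [e.tag for e in root.findall(".//*")]

print(by_name, any_child, any_descendant)
```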

Example • //text() matches all text nodes. • //@p[.='1'] selects all attributes named p whose value is 1. • //person[first][last] selects all person elements that have both a first and a last child element.
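The stacked predicates in //person[first][last] can be checked with a small sketch in Python's xml.etree.ElementTree, whose XPath subset supports child-element predicates. The people document and all names in it are assumptions invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, made up for this sketch.
XML = """<people>
  <person><first>Boris</first><last>Ivanov</last></person>
  <person><first>Ann</first></person>
  <person><first>Carl</first><last>Lee</last></person>
</people>"""

root = ET.fromstring(XML)

# Two predicates in a row: a person must pass both tests to be kept.
both = root.findall(".//person[first][last]")
first_names = [p.findtext("first") for p in both]

# A predicate can also compare the text of a child element.
lees = [p.findtext("first") for p in root.findall(".//person[last='Lee']")]

print(first_names, lees)
```

Ann is filtered out because her person element has no last child.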

XPath Functions • There are many XPath 1.0 functions for testing and other purposes. • Many of them are obvious. The non-obvious ones are explained below.

XPath 1.0 Functions • boolean(arg): converts arg to the boolean data type. • false(): always returns false. • lang(arg): returns true iff the language given by the xml:lang attribute in scope for the context node is the same as, or a sublanguage of, the language specified by the argument string arg. • not(arg): negation of arg. • true(): always returns true. • count(arg): number of nodes in the node-set argument arg.

XPath Functions • id(arg): selects elements by their unique ID, given by the argument arg. • last(): returns the context size of the expression evaluation context. • local-name(arg): returns the local name of the first node in the node-set argument arg; returns the local name of the context node if arg is missing. • name() • namespace-uri() • position(): returns the context position, that is, the proximity position (starting from one) of the context node along the axis.
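As a hedged sketch, count(), position() and last() can be approximated with Python's xml.etree.ElementTree. Its XPath subset has no count() function, so len() on the result list stands in for it; position tests are written in the abbreviated forms [1] and [last()], which ElementTree does support. The sample data is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, made up for this sketch.
XML = """<stocks>
  <stock symbol="IBM"/>
  <stock symbol="MSFT"/>
  <stock symbol="ORCL"/>
</stocks>"""

root = ET.fromstring(XML)

# count(/stocks/stock): ElementTree has no count(), so measure the result list.
stock_count = len(root.findall("stock"))

# stock[1] abbreviates stock[position() = 1]; positions start at one.
first_symbol = root.find("stock[1]").get("symbol")

# stock[last()] keeps the node whose position equals the context size.
last_symbol = root.find("stock[last()]").get("symbol")

# stock[last()-1] is the node just before the last one.
second_last_symbol = root.find("stock[last()-1]").get("symbol")

print(stock_count, first_symbol, last_symbol, second_last_symbol)
```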

XPath 1.0 Functions • ceiling(arg): ceiling of the number argument arg. • floor(arg): floor of the number argument arg. • number(arg): converts arg to a number. • round(arg): rounds arg to the nearest integer; a tie is rounded toward positive infinity. • sum(arg): sum of the values of the nodes in the node-set argument arg, each converted to a number. • concat(arg1, arg2, ...): string concatenation of the arguments. • contains(arg1, arg2): true iff arg1 contains arg2.

XPath 1.0 Functions • normalize-space(arg): returns the string argument arg with leading and trailing white space stripped and each internal run of white space replaced by a single space. • starts-with(arg1, arg2): true iff arg1 starts with arg2. • string(arg): converts arg to a string. • string-length(arg): the number of characters in the string arg. • substring(arg1, arg2, arg3): returns the substring of arg1 that starts at position arg2 (positions start at one) and has length arg3.

XPath 1.0 Functions • substring-after(arg1, arg2): the substring of arg1 after the first occurrence of arg2. • substring-before(arg1, arg2): the substring of arg1 before the first occurrence of arg2. • translate(arg1, arg2, arg3): returns arg1 with each character that occurs in arg2 replaced by the character at the corresponding position in arg3; characters of arg2 with no counterpart in arg3 are removed.
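The numeric functions behave much like their Python counterparts, with one difference worth seeing: XPath's round() sends ties toward positive infinity, while Python's built-in round() sends ties to the nearest even integer. The sketch below is a Python analogue, not an XPath evaluation; xpath_round is a hypothetical helper that ignores NaN, infinities and floating-point edge cases, and the sample data is invented.

```python
import math
import xml.etree.ElementTree as ET

def xpath_round(x):
    """Analogue of XPath 1.0 round() for ordinary finite numbers."""
    return math.floor(x + 0.5)  # ties go toward positive infinity

# Hypothetical sample data, made up for this sketch.
XML = """<stocks>
  <stock><lastprice>120.5</lastprice></stock>
  <stock><lastprice>30.25</lastprice></stock>
</stocks>"""
root = ET.fromstring(XML)

# Analogue of sum(//lastprice): convert each node's text to a number, then add.
total = sum(float(node.text) for node in root.findall(".//lastprice"))

print(xpath_round(2.5), xpath_round(-2.5))  # XPath-style ties
print(round(2.5))                           # Python's built-in differs on ties
print(math.ceil(total), math.floor(total), total)
```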
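The string functions above can be mimicked in plain Python to check one's understanding of their results. This is a sketch of analogues, not an XPath implementation: the function names are hypothetical, and substring assumes integer position and length arguments, which sidesteps the rounding rules of the full definition.

```python
import re

def substring(s, start, length):
    """substring(s, start, length): positions are counted from one."""
    return "".join(ch for pos, ch in enumerate(s, start=1)
                   if start <= pos < start + length)

def substring_before(s, sep):
    """Part of s before the first occurrence of sep, or "" if sep is absent."""
    i = s.find(sep)
    return s[:i] if i >= 0 else ""

def substring_after(s, sep):
    """Part of s after the first occurrence of sep, or "" if sep is absent."""
    i = s.find(sep)
    return s[i + len(sep):] if i >= 0 else ""

def translate(s, src, dst):
    """Replace each character of src by its counterpart in dst.

    Characters of src that have no counterpart in dst are removed.
    """
    table = {}
    for i, ch in enumerate(src):
        if ch not in table:  # the first occurrence in src wins
            table[ch] = dst[i] if i < len(dst) else None
    mapped = (table.get(ch, ch) for ch in s)
    return "".join(ch for ch in mapped if ch is not None)

def normalize_space(s):
    """Strip the ends and collapse inner runs of space, tab, CR and LF."""
    return " ".join(part for part in re.split(r"[ \t\r\n]+", s) if part)

print(substring("12345", 2, 3))
print(substring_before("1999/04/01", "/"), substring_after("1999/04/01", "/"))
print(translate("bar", "abc", "ABC"), translate("--aaa--", "abc-", "ABC"))
print(normalize_space("  a   b \n c "))
```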

XPath 1.0 Classwork • To be handed in the class. • Use Familytree.xml

XPath 2.0 • W3C related specifications: • XQuery 1.0 and XPath 2.0 Data Model • XQuery 1.0 and XPath 2.0 Functions and Operators • XQuery 1.0 and XPath 2.0 Formal Semantics • XML Path Language (XPath) 2.0 • XSL Transformations (XSLT) Version 2.0 • XSLT 2.0 and XQuery 1.0 Serialization • XQuery 1.0: An XML Query Language

Major Changes in XPath 2.0 • Sequences to replace node-sets as the main data model. • XML Schema data types • Variable binding • A rich set of functions • Richer expressions • New comment styles • …

Sequences and items • A sequence is an ordered, possibly heterogeneous collection of items. • An item can be • A node • An atomic value

Sequences Example: • (1, 5 to 8, "Bun Yue", 2.1) • (1+2, 5) • (1 to 50)[. mod 3 = 1] • /* | //person • (1, 2, (3, (4, 5))) is the same as (1, 2, 3, 4, 5)

Sequences • Items within a sequence • Can be in any order. • Can be heterogeneous. • Can be repeated. • Sequences are not nested. • XPath 2.0 results are sequences. A single atomic value is considered to be a sequence with one item.
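The behaviour of sequences can be imitated with flat Python lists. The sketch below is a Python analogue of three of the sequence examples, not an XPath 2.0 evaluation; flatten is a hypothetical helper that mimics the rule that sequences never nest.

```python
def flatten(items):
    """Mimic XPath 2.0 sequence construction: nested sequences are flattened."""
    out = []
    for item in items:
        if isinstance(item, (list, tuple, range)):
            out.extend(flatten(item))
        else:
            out.append(item)
    return out

# (1, 5 to 8, "Bun Yue", 2.1): "5 to 8" is an inclusive integer range.
mixed = flatten([1, range(5, 9), "Bun Yue", 2.1])

# (1 to 50)[. mod 3 = 1]: keep the items whose remainder modulo 3 is 1.
filtered = [n for n in range(1, 51) if n % 3 == 1]

# (1, 2, (3, (4, 5))) is the same sequence as (1, 2, 3, 4, 5).
nested = flatten((1, 2, (3, (4, 5))))

print(mixed)
print(filtered[:5], len(filtered))
print(nested)
```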

For expression & variable binding • for $varname in (expression) return (expression) • Example: • for $person in //person return count($person/email) • for $person in //person return fn:count($person/email)
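A for expression binds a variable to each item of a sequence in turn and collects the results, much like a Python list comprehension. The sketch below is an analogue using xml.etree.ElementTree rather than an XPath 2.0 engine; the people document and the addresses in it are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, made up for this sketch.
XML = """<people>
  <person><first>Boris</first>
    <email>boris@example.org</email><email>b2@example.org</email></person>
  <person><first>Ann</first></person>
  <person><first>Carl</first><email>carl@example.org</email></person>
</people>"""

root = ET.fromstring(XML)

# for $person in //person return count($person/email)
# The loop variable plays the role of $person; len() stands in for count().
email_counts = [len(person.findall("email")) for person in root.iter("person")]
print(email_counts)
```

The result has one item per person, in document order, just as the for expression returns one count per binding of $person.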

If expression Example: if (//person[first/text()='Boris']) then 'found Boris' else 'no Boris'
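The condition in this example is a node sequence, which counts as true when it is non-empty. A minimal Python analogue, assuming an invented people document and using the predicate form supported by xml.etree.ElementTree, looks like this:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, made up for this sketch.
XML = """<people>
  <person><first>Boris</first></person>
  <person><first>Ann</first></person>
</people>"""

root = ET.fromstring(XML)

# //person[first/text()='Boris'] is approximated by person[first='Boris'].
matches = root.findall(".//person[first='Boris']")

# A non-empty result list is truthy, mirroring the XPath test on the node sequence.
message = "found Boris" if matches else "no Boris"
print(message)
```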

XPath 2.0 Functions • Many new functions: http://www.w3schools.com/XPath/xpath_functions.asp • Some categories: • Sequences • Aggregate functions • Nodes • Numeric • String, with regular expressions

Quantified Expressions • Applied to a sequence: • some • every • Format: • some $v in sequence satisfies condition • every $v in sequence satisfies condition

Example if (every $person in //person satisfies $person/email) then "everyone has email address" else "oh oh"
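The quantifiers every and some correspond closely to Python's all() and any(). The sketch below is an analogue of the example over an invented people document, using xml.etree.ElementTree instead of an XPath 2.0 engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, made up for this sketch.
XML = """<people>
  <person><first>Boris</first><email>boris@example.org</email></person>
  <person><first>Ann</first></person>
</people>"""

root = ET.fromstring(XML)
persons = list(root.iter("person"))

# every $person in //person satisfies $person/email
everyone_has_email = all(p.find("email") is not None for p in persons)

# some $person in //person satisfies $person/email
someone_has_email = any(p.find("email") is not None for p in persons)

message = "everyone has email address" if everyone_has_email else "oh oh"
print(everyone_has_email, someone_has_email, message)
```

As with every in XPath 2.0, all() over an empty collection is true, so a document with no person elements would report that everyone has an email address.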

Classwork • To be handed in the class. • Use Familytree.xml

Questions
