XPath XML Path Language
Outline • XML Path Language (XPath) • Data Model Description • Node values • XPath expressions • Relative expressions • Simple subset of XPath • Predicates • Node-Set Functions • Full location Steps • Axes • Node Tests • Abbreviated syntax • Links to more information
XML Path Language (XPath) • XPath 1.0 is a W3C Recommendation (16 November 1999) • used for addressing elements (in XPointer) • used for matching elements (in XSLT and XQuery) • declarative language for specifying paths in trees • syntax somewhat similar to that used for paths in file hierarchies
Data Model - example document • document is viewed as a tree of nodes • e.g. document <Book> <chapter> <heading>The First Chapter</heading> <section>... ...</section> </chapter> <chapter> <heading>The Second Chapter</heading> <section>... ...</section> <section>... ...</section> </chapter> </Book>
Data Model Description • 7 types of node: • root, element, attribute, text, namespace, comment, processing instruction • root of tree is different from (and parent of) root element of the document (Book in example) • in example • root node is red • element nodes are yellow • text nodes are green • element nodes have associated set of attribute nodes • attribute nodes are not children of element nodes • order of child element nodes is significant
Data Model - example document • More complex document <CD publisher="Deutsche Grammophon" length="PT1H13M37S" > <composer>Johannes Brahms</composer> <performance> <composition>Piano Concerto No. 2</composition> <soloist>Emil Gilels</soloist> <orchestra>Berlin Philharmonic</orchestra> <conductor>Eugen Jochum</conductor> </performance> <performance> <composition>Fantasias Op. 116</composition> <soloist>Emil Gilels</soloist> </performance> </CD>
Node values • each attribute and element node has a string value • e.g., value of length attribute node is PT1H13M37S • value of an element node is the concatenation, in document order, of all its text node descendants • e.g., value of composer element node is Johannes Brahms • e.g., value of second performance element node is Fantasias Op. 116 Emil Gilels • e.g., value of CD element node is Johannes Brahms Piano Concerto No. 2 Emil Gilels Berlin Philharmonic Eugen Jochum Fantasias Op. 116 Emil Gilels • note: attribute values are not included, and the examples omit the whitespace between elements
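The node values above can be reproduced with a short Python sketch. It assumes the standard-library `xml.etree.ElementTree` module; the helper name `string_value` is hypothetical, and whitespace is collapsed only so the output matches the slide.

```python
import xml.etree.ElementTree as ET

CD_XML = """<CD publisher="Deutsche Grammophon" length="PT1H13M37S">
  <composer>Johannes Brahms</composer>
  <performance>
    <composition>Piano Concerto No. 2</composition>
    <soloist>Emil Gilels</soloist>
    <orchestra>Berlin Philharmonic</orchestra>
    <conductor>Eugen Jochum</conductor>
  </performance>
  <performance>
    <composition>Fantasias Op. 116</composition>
    <soloist>Emil Gilels</soloist>
  </performance>
</CD>"""


def string_value(element):
    """Concatenate all descendant text nodes in document order.

    Whitespace is collapsed for display only (an assumption of this sketch);
    the raw value also contains the line breaks between elements.
    """
    return " ".join("".join(element.itertext()).split())


cd = ET.fromstring(CD_XML)
length_value = cd.get("length")                  # value of an attribute node
composer_value = string_value(cd.find("composer"))
second_performance_value = string_value(cd.findall("performance")[1])
cd_value = string_value(cd)                      # attribute values are not part of it
print(second_performance_value)
```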
XPath expressions • an XPath expression is either • an absolute expression or • a relative expression • an absolute expression • starts with '/' • is optionally followed by a relative expression ('/' alone selects the root node) • and is evaluated starting at the root node • a relative expression is • a sequence of location steps • each separated by '/' • example (absolute expression comprising 2 steps): /CD/composer
Relative expressions • relative expression is evaluated with respect to an initial context (set of nodes) • initial context is defined externally (by XPointer or XSLT) <xsl:template match="CD"> <xsl:value-of select="composer"/> </xsl:template> context for composer given by CD • each location step • is evaluated with respect to some context • produces a set of nodes which • provides the context for the next location step
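A minimal sketch of how each location step turns one context into the next, assuming Python's standard `xml.etree.ElementTree` and handling only plain element-name steps. The function name `evaluate` is hypothetical, and the initial context is supplied by the caller, much as XSLT would supply it.

```python
import xml.etree.ElementTree as ET

CD_XML = """<CD publisher="Deutsche Grammophon" length="PT1H13M37S">
  <composer>Johannes Brahms</composer>
  <performance>
    <composition>Piano Concerto No. 2</composition>
    <soloist>Emil Gilels</soloist>
  </performance>
  <performance>
    <composition>Fantasias Op. 116</composition>
    <soloist>Emil Gilels</soloist>
  </performance>
</CD>"""


def evaluate(context_nodes, relative_path):
    """Evaluate a path made of element-name steps separated by '/'."""
    for step in relative_path.split("/"):
        # Each step is evaluated against every node of the current context;
        # the nodes it produces become the context of the next step.
        context_nodes = [child
                         for node in context_nodes
                         for child in node
                         if child.tag == step]
    return context_nodes


cd = ET.fromstring(CD_XML)                         # initial context: the CD element
composers = evaluate([cd], "composer")             # like select="composer"
soloists = evaluate([cd], "performance/soloist")   # two steps
print([node.text for node in composers])
```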
Simple subset of XPath • subset uses abbreviated syntax • a location step has one of 3 forms: • it is empty, i.e., // • element-name predicates • @attribute-name predicates • an empty step means search all descendants of the context node • element-name means find all child elements of the context node which have the given name • @attribute-name means find the attribute node of the context node which has the given name • optional predicates (each enclosed in [ and ]) further filter the nodes found
Examples – cd.xml <?xml version="1.0" ?> <CDlist> <CD> <composer>Johannes Brahms</composer> <soloist>Emil Gilels</soloist> <orchestra>Berlin Philharmonic</orchestra> <conductor>Eugen Jochum</conductor> <date>1972</date> <performance> <composition>Piano Concerto No. 1</composition> </performance> <publisher>Deutsche Grammophon</publisher> <number>419159-2</number> </CD> <CD> <soloist>Martha Argerich</soloist> <orchestra>London Symphony Orchestra</orchestra> <conductor>Claudio Abbado</conductor> <date>1968</date> <performance> <composer>Frederic Chopin</composer> <composition>Piano Concerto No. 1</composition> </performance> <performance> <composer>Franz Liszt</composer> <composition>Piano Concerto No. 1</composition> <conductor>Antal Dorati</conductor> <date>1984</date> </performance> <publisher>Deutsche Grammophon</publisher> <number>449719-2</number> </CD> </CDlist>
Examples – cd.xml • /CDlist/CD • all child CD elements of the CDlist element that is the child of the root • //composer • all composer elements that are descendants of the root • //performance/composer • all composer child elements of performance elements which are descendants of the root • //performance[composer] • all performance elements that have a composer element as a child • //CD[performance/date] • all CD elements that have a performance element as a child that has a date element as a child • //performance[conductor][date] • all performance elements that have both conductor and date elements as children
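These paths can be tried out with Python's standard `xml.etree.ElementTree`, which supports only part of XPath. The sketch below assumes that module; because `cdlist` is the CDlist element rather than XPath's root node, the absolute paths are written relative to it, and the one predicate ElementTree cannot parse is emulated in plain Python.

```python
import xml.etree.ElementTree as ET

CD_LIST_XML = """<CDlist>
  <CD>
    <composer>Johannes Brahms</composer>
    <soloist>Emil Gilels</soloist>
    <orchestra>Berlin Philharmonic</orchestra>
    <conductor>Eugen Jochum</conductor>
    <date>1972</date>
    <performance>
      <composition>Piano Concerto No. 1</composition>
    </performance>
    <publisher>Deutsche Grammophon</publisher>
    <number>419159-2</number>
  </CD>
  <CD>
    <soloist>Martha Argerich</soloist>
    <orchestra>London Symphony Orchestra</orchestra>
    <conductor>Claudio Abbado</conductor>
    <date>1968</date>
    <performance>
      <composer>Frederic Chopin</composer>
      <composition>Piano Concerto No. 1</composition>
    </performance>
    <performance>
      <composer>Franz Liszt</composer>
      <composition>Piano Concerto No. 1</composition>
      <conductor>Antal Dorati</conductor>
      <date>1984</date>
    </performance>
    <publisher>Deutsche Grammophon</publisher>
    <number>449719-2</number>
  </CD>
</CDlist>"""

cdlist = ET.fromstring(CD_LIST_XML)   # the CDlist element, not XPath's root node

cds = cdlist.findall("./CD")                                      # /CDlist/CD
composers = cdlist.findall(".//composer")                         # //composer
performance_composers = cdlist.findall(".//performance/composer")
performances_with_composer = cdlist.findall(".//performance[composer]")
dated_conducted = cdlist.findall(".//performance[conductor][date]")

# //CD[performance/date]: a path inside a predicate is outside ElementTree's
# subset, so the filter is written out by hand.
cds_with_dated_performance = [cd for cd in cdlist.findall(".//CD")
                              if cd.find("performance/date") is not None]

print([c.text for c in composers])
```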
Predicates • predicates filter out more nodes from a node-set S • evaluate predicate on each node x in node-set S with • x as the context node • the size of S as the context size • the position of x in S as the context position • predicate comprises • Boolean expressions: using and, or, not, =, ... • numerical expressions: using +, -, ... • node-set expressions: location paths filtered by predicates • node-set functions
Node-Set Functions • last(): returns context size • position(): returns context position • count(S): returns number of nodes in S • name(S): returns name of the node in S that is first in document order • id(S): returns the elements whose ID-type attribute has a value listed in S • e.g. • position()=2: true if node is 2nd in the context • position()=last(): true if node is last in the context
Examples • count(//performance): the number of performance elements • //performance[not(date)]: performance elements that do not have a date element as a child • all CD elements that have "Deutsche Grammophon" as publisher and have more than 1 performance element as child: //CD [publisher="Deutsche Grammophon" and count(performance) > 1] • or //CD [publisher="Deutsche Grammophon"] [count(performance) > 1] • or //CD [count(performance) > 1] [publisher="Deutsche Grammophon"]
More examples • //CD/performance[position()=2] • returns the second performance of each CD • //CD/performance[position()=2][date] • returns the second performance of each CD if it has a date (otherwise, returns nothing) • //CD/performance[date and position()=2] • returns the same • //CD/performance[date][position()=2] • returns the second of those performance children of each CD that have a date (if any)
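Why the order of predicates matters can be sketched in plain Python. The example assumes the standard `xml.etree.ElementTree` and a trimmed copy of cd.xml; the predicates are emulated with list filtering (helper names `has_date` and `second` are hypothetical), because ElementTree's own positional predicates do not follow XPath semantics after a filter.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of cd.xml: only the elements the predicates look at.
CD_LIST_XML = """<CDlist>
  <CD>
    <performance><composition>Piano Concerto No. 1</composition></performance>
  </CD>
  <CD>
    <performance><composer>Frederic Chopin</composer></performance>
    <performance><composer>Franz Liszt</composer><date>1984</date></performance>
  </CD>
</CDlist>"""


def has_date(performance):
    """Emulates the predicate [date]."""
    return performance.find("date") is not None


def second(nodes):
    """Emulates [position()=2] on the current node list."""
    return nodes[1:2]


cdlist = ET.fromstring(CD_LIST_XML)
second_performance, second_then_date, date_then_second = [], [], []
for cd in cdlist.findall("CD"):
    performances = cd.findall("performance")
    # //CD/performance[position()=2]
    second_performance += second(performances)
    # //CD/performance[position()=2][date]
    second_then_date += [p for p in second(performances) if has_date(p)]
    # //CD/performance[date][position()=2]: positions are recounted after [date]
    date_then_second += second([p for p in performances if has_date(p)])

print(len(second_then_date), len(date_then_second))
```

Here the second CD has only one dated performance, so filtering by date first leaves nothing at position 2.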
Full location Steps • using full, not abbreviated, syntax • a location step has the form axis :: node-test predicates where • axis selects a set of candidate nodes • node-test filters candidates based on node type or name • optional predicates • in child::CD[attribute::publisher="Deutsche Grammophon"] • child and attribute are axes • CD and publisher are node-tests
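The three parts of a full location step can be modelled as three small functions. This is a sketch, not a real XPath engine: it assumes Python's standard `xml.etree.ElementTree`, a hypothetical wrapper document (the `collection` element and the second CD are invented for illustration), and forward axes only. Each predicate receives the node, its context position and the context size.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the wrapper element and the second CD are assumptions.
COLLECTION_XML = """<collection>
  <CD publisher="Deutsche Grammophon"><composer>Johannes Brahms</composer></CD>
  <CD><composer>Frederic Chopin</composer></CD>
</collection>"""


def child_axis(node):
    """child axis: the child elements of the context node, in document order."""
    return list(node)


def name_test(name):
    """Node test that keeps only elements with the given name."""
    return lambda node: node.tag == name


def location_step(context_nodes, axis, node_test, predicates=()):
    """Evaluate axis::node-test[predicate]... for every context node."""
    result = []
    for context in context_nodes:
        selected = [n for n in axis(context) if node_test(n)]
        for predicate in predicates:
            size = len(selected)      # context size for this predicate
            selected = [n for position, n in enumerate(selected, start=1)
                        if predicate(n, position, size)]
        result.extend(selected)
    return result


collection = ET.fromstring(COLLECTION_XML)

# child::CD[attribute::publisher="Deutsche Grammophon"]
dg_cds = location_step(
    [collection], child_axis, name_test("CD"),
    [lambda node, position, size: node.get("publisher") == "Deutsche Grammophon"])

# child::CD[position()=last()]
last_cd = location_step(
    [collection], child_axis, name_test("CD"),
    [lambda node, position, size: position == size])

print([cd.findtext("composer") for cd in dg_cds])
```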
Axes • axis specifies what nodes, relative to context node(s), to consider • there are 13 axes defined • self: the context node itself • parent: the parent of the context node (note: parent of root is empty) • attribute: all attributes of the context node • namespace: all namespace nodes of the context node • child, ancestor, descendant (see later) • ancestor-or-self: ancestors and the context node • descendant-or-self: descendants and the context node • preceding-sibling, following-sibling, preceding, following (see later)
Axes: parent, child, ... • context node (and self axis) in yellow • nodes in parent axis in black • nodes in child axis in white • nodes in preceding-sibling axis in green • nodes in following-sibling axis in red
Axes: ancestor, descendant, ... • context node C (and self axis) in yellow • ancestor (black): elements whose start tag precedes start tag of C and whose end tag follows end tag of C • descendant (white): elements whose start tag follows start tag of C and whose end tag precedes end tag of C • preceding (green): elements whose end tag precedes start tag of C • following (red): elements whose start tag follows end tag of C • preceding, following, ancestor, descendant and self together partition the document (ignoring attribute and namespace nodes) into 5 disjoint sets
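The partition can be checked on the Book document from the data model slide. The sketch assumes Python's standard `xml.etree.ElementTree` and considers element nodes only; the function name `axes` is hypothetical.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<Book>
  <chapter>
    <heading>The First Chapter</heading>
    <section>...</section>
  </chapter>
  <chapter>
    <heading>The Second Chapter</heading>
    <section>...</section>
    <section>...</section>
  </chapter>
</Book>"""


def axes(root, context):
    """Return the element nodes on four axes of `context`, as a dict of lists."""
    order = list(root.iter())                      # all elements, document order
    parent_of = {child: parent for parent in order for child in parent}

    ancestors = []
    node = context
    while node in parent_of:                       # walk up to the document element
        node = parent_of[node]
        ancestors.append(node)

    descendants = list(context.iter())[1:]         # iter() yields context first
    index = order.index(context)
    preceding = [n for n in order[:index] if n not in ancestors]
    following = [n for n in order[index + 1:] if n not in descendants]
    return {"ancestor": ancestors, "descendant": descendants,
            "preceding": preceding, "following": following}


book = ET.fromstring(BOOK_XML)
second_chapter = book.findall("chapter")[1]
result = axes(book, second_chapter)
total = 1 + sum(len(nodes) for nodes in result.values())   # 1 for the self axis
print({axis: [n.tag for n in nodes] for axis, nodes in result.items()})
```

With the second chapter as context, the four axes plus self account for every element of the document exactly once.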
Node Tests • axes other than attribute and namespace include elements, text nodes, comments and processing instructions • principal type of these axes is element • node test further restricts nodes considered • by node name • chapter: nodes with name "chapter" • *: nodes with any name (of the axis principal type) • by node type • node(): all nodes • text(): character data nodes • comment(): comment nodes • processing-instruction(): processing instruction nodes
Examples • child::*[position()=2] • second child element • descendant::node() • all descendant nodes (elements, text nodes, comments or processing instructions) • following-sibling::*[position()=last()] • rightmost sibling element • child::section[position()=2]/child::subsection[position()=1] • first subsection of the second section
Abbreviated syntax • if path starts with //, initial context is the root
Examples using abbreviations • *[2] • second child element • //* • all descendant elements of the root • //text() • all text node descendants of the root • section[2]/subsection[1] • first subsection of the second section • .//@href • all href attributes in descendants of context node(s) • //section[.//image]/title • the titles of sections which contain images
The family example <?xml version="1.0"?> <family> <parent pno="p1" role="mother" spouse="p2"> <name>Janet</name> </parent> <parent pno="p2" role="father" spouse="p1"> <name>John</name> </parent> <child cno="c1" siblings="c2 c3"> <name>Tom</name> </child> <child cno="c2" siblings="c1 c3"> <name>Dick</name> </child> <child cno="c3" siblings="c1 c2"> <name>Harry</name> </child> </family>
Location paths on family • in the context of the family element, • parent[@spouse='p2']/name produces <name>Janet</name> the name of the person whose spouse attribute has value 'p2' • parent[id(@spouse)[name='John']]/name produces <name>Janet</name> the name of the person whose spouse is named John
More location paths on family • in the context of the family element, • id(child[name='Dick']/@siblings)/name produces <name>Tom</name> <name>Harry</name> the names of Dick's siblings • child[id(@siblings)[name='Tom']]/name produces <name>Dick</name> <name>Harry</name> the names of the children who have Tom as a sibling
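The four family paths can be emulated in Python. The sketch assumes the standard `xml.etree.ElementTree`, which has no id() function, and it assumes that pno and cno are declared as ID-type attributes in a DTD that the slides do not show; `id_lookup` and `name_of` are hypothetical helpers standing in for id() and for the final name step.

```python
import xml.etree.ElementTree as ET

FAMILY_XML = """<family>
  <parent pno="p1" role="mother" spouse="p2"><name>Janet</name></parent>
  <parent pno="p2" role="father" spouse="p1"><name>John</name></parent>
  <child cno="c1" siblings="c2 c3"><name>Tom</name></child>
  <child cno="c2" siblings="c1 c3"><name>Dick</name></child>
  <child cno="c3" siblings="c1 c2"><name>Harry</name></child>
</family>"""

family = ET.fromstring(FAMILY_XML)


def id_lookup(value):
    """Emulate id(): elements whose ID (pno or cno, by assumption) is listed."""
    wanted = set(value.split())
    return [person for person in family
            if (person.get("pno") or person.get("cno")) in wanted]


def name_of(person):
    return person.findtext("name")


# parent[@spouse='p2']/name
spouse_p2 = [name_of(p) for p in family.findall("parent[@spouse='p2']")]

# parent[id(@spouse)[name='John']]/name
married_to_john = [name_of(p) for p in family.findall("parent")
                   if any(name_of(s) == "John" for s in id_lookup(p.get("spouse")))]

# id(child[name='Dick']/@siblings)/name
dick = family.find("child[name='Dick']")
dicks_siblings = [name_of(s) for s in id_lookup(dick.get("siblings"))]

# child[id(@siblings)[name='Tom']]/name
siblings_of_tom = [name_of(c) for c in family.findall("child")
                   if any(name_of(s) == "Tom" for s in id_lookup(c.get("siblings")))]

print(spouse_p2, married_to_john, dicks_siblings, siblings_of_tom)
```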
More Information • www.w3.org/TR/xpath W3C's Recommendation on XPath • www.vbxml.com/xpathvisualizer/ page for downloading the XPath visualiser • www.w3schools.com/xpath/ XPath tutorial • www.vbxml.com/xsl/tutorials/intro/default.asp XSLT and XPath tutorial