Learn how XPath is used to locate specific data elements within an XML document and provide contextual details for processing programs. Understand the tree structure of XML and how to navigate it using XPath.
Chapter 6 - XPath & XPointer
Learning XML by Erik T. Ray
Slides were developed by Jack Davis, College of Information Science and Technology, Radford University
XPath
• XML is often compared to a database because of the way it structures information for easy retrieval. With a little knowledge of the markup language, you can locate and extract any piece of information.
• XPath is used to locate a specific data element from a known location within a document, and it can be used to provide contextual details to a processing program. For example, one could specify that items in a list should use a particular kind of bullet specified in a metadata section at the beginning of the document.
• Every XML document can be represented graphically as a tree structure. Because there is only one possible tree configuration for any given document, there is a unique path from the root to any other point. XPath describes how to climb the tree in a series of steps.
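The "unique path from the root" idea can be sketched in a few lines of Python. This is only an illustration: the sample document and the helper name unique_paths are invented here, and the positional notation (name[n]) is one convenient way to make each path unambiguous.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (not from the slides).
SAMPLE = "<list><meta/><item>a</item><item>b</item></list>"

def unique_paths(root):
    """Return one positional path per element, such as /list[1]/item[2]."""
    paths = []

    def walk(elem, path):
        paths.append(path)
        seen = {}  # how many children with each tag name we have passed so far
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    walk(root, f"/{root.tag}[1]")
    return paths

paths = unique_paths(ET.fromstring(SAMPLE))
for p in paths:
    print(p)
```

Every element gets exactly one path, which is the property XPath relies on when it walks the tree step by step.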
XML Trees
• Each step in a path touches a branching or terminal point in the tree called a node. Elements are the most familiar nodes, but they are not the only kind. A terminal node, one with no descendants, is called a leaf.
• There are seven different kinds of nodes:
- root: The root of the document is a special kind of node. It is not an element; rather, it contains the document element. It also contains any comments or processing instructions that surround the document element.
- element: Elements and the root node share a special property among nodes: they alone can contain other nodes. An element node can contain other elements. In a tree, this is the point where branches meet. Empty elements are leaf nodes.
XML Trees (cont.)
- attribute: XPath treats attributes as separate nodes from their host elements. This allows you to select either the element as a whole or only an attribute of that element, using the same path syntax. An attribute is like an element that contains only text.
- text: A region of uninterrupted text is treated as a leaf node. It is always the child of an element. An element may have more than one text node child, however, if its text is broken up by elements or other node types. Keep this in mind when you process the text in an element: you may have to check more than one node.
- comment: An XML comment is considered a valid node.
- processing instruction: Like comments, a processing instruction can appear anywhere in the document under the root.
XML Trees (cont.)
- namespace: A namespace is actually a region of the document, not just the possession of a single element: all the descendants of the element that declares it are affected. XML processors must pay special attention to namespaces, so XPath makes the namespace a distinct node type.
• The DTD is not included in the list of nodes.
• XPath preserves the structure and content of a document, so the document could be reconstructed almost exactly. (The order of the attributes might change, but since the order of attributes is not significant in XML, a semantically equivalent document can be produced.)
• In XML, any node in the tree can be thought of as a tree in its own right (a subtree). Trees facilitate recursive programming. XSLT is elegant because a rule treats every element as a tree.
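To see most of these node kinds in a real tree, here is a rough sketch using Python's standard xml.dom.minidom. The sample document and the describe helper are made up for illustration. Note that the DOM is a different model from XPath's: it has no separate namespace node, so that kind does not appear, and attributes are listed by hand because the DOM does not treat them as children.

```python
from xml.dom import minidom, Node

# Hypothetical sample document (not from the slides).
SAMPLE = """<?xml version="1.0"?>
<!-- a comment before the document element -->
<?render mode="draft"?>
<memo priority="high">Hello <b>there</b>, world<br/></memo>"""

# Map DOM node types onto the node kinds named in the slides.
NODE_KINDS = {
    Node.DOCUMENT_NODE: "root",
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
}

def describe(node, depth=0, out=None):
    """Collect (depth, kind, name) for a node and everything below it."""
    out = [] if out is None else out
    kind = NODE_KINDS.get(node.nodeType, "other")
    name = node.nodeName if kind in ("element", "processing instruction") else ""
    out.append((depth, kind, name))
    if node.nodeType == Node.ELEMENT_NODE:
        # Report attributes as nodes hanging off their host element.
        for i in range(node.attributes.length):
            out.append((depth + 1, "attribute", node.attributes.item(i).name))
    for child in node.childNodes:
        describe(child, depth + 1, out)
    return out

nodes = describe(minidom.parseString(SAMPLE))
for depth, kind, name in nodes:
    print("  " * depth + kind + (" " + name if name else ""))
```

The memo element ends up with two separate text node children ("Hello " and ", world") because the b element interrupts the text, which is the situation the text node description warns about.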
Finding Nodes
• XPath uses a chain of steps, called a location path, to define the location of nodes in an XML tree. Each location step has three parts:
- an axis that describes the direction to travel,
- a node test that specifies what kinds of nodes apply,
- and a set of optional predicates that use Boolean tests to reduce the candidates.
• The axis is a keyword that specifies a direction you can travel from any node. You can go up through ancestors, down through descendants, or linearly through siblings.
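The three parts of a location step can be made concrete with a small parser. This is a simplified sketch, not a real XPath grammar: the Step class and parse_step function are hypothetical names, nested brackets inside predicates are not handled, and the "." and ".." abbreviations are ignored. It assumes the full step syntax axis::node-test[predicate], with child as the default axis and @ as the attribute shortcut.

```python
import re
from dataclasses import dataclass

@dataclass
class Step:
    axis: str          # direction to travel, e.g. "child" or "ancestor"
    node_test: str     # which nodes apply, e.g. "para" or "*"
    predicates: list   # zero or more Boolean filters, kept as raw strings

# Optional "axis::", then the node test, then any number of [predicate] groups.
STEP_RE = re.compile(
    r"^(?:(?P<axis>[a-z-]+)::)?(?P<test>[^\[\]]+)(?P<preds>(?:\[[^\[\]]*\])*)$"
)

def parse_step(text):
    match = STEP_RE.match(text)
    if match is None:
        raise ValueError(f"not a simple location step: {text!r}")
    axis = match.group("axis") or "child"   # child is the default axis
    test = match.group("test")
    if match.group("axis") is None and test.startswith("@"):
        axis, test = "attribute", test[1:]  # @name abbreviates attribute::name
    predicates = re.findall(r"\[([^\[\]]*)\]", match.group("preds"))
    return Step(axis, test, predicates)

step = parse_step("child::para[@role='note'][position()=1]")
print(step.axis, step.node_test, step.predicates)
```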
Node Axes
• After the axis comes a node test, joined to the axis by a double colon (::). A name can be used in place of an explicit node type, in which case the node type is inferred from the axis: on the attribute axis the node is assumed to be an attribute, on the namespace axis a namespace, and on all other axes an element. In the absence of an axis specifier, the axis is assumed to be child and the node is assumed to be an element.
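A few axes can be imitated by walking a DOM tree by hand, which may help make the "direction of travel" idea concrete. The sketch below uses Python's xml.dom.minidom. The function names mirror XPath axis names but are hypothetical helpers, not a library API, and the sample document is invented.

```python
from xml.dom import minidom, Node

# Hypothetical sample document (not from the slides).
DOC = minidom.parseString(
    "<book><chapter><section><para>one</para><para>two</para></section>"
    "<section><para>three</para></section></chapter></book>"
)

def ancestor(node):
    """Travel up: parent, grandparent, ... ending at the root node."""
    out = []
    while node.parentNode is not None:
        node = node.parentNode
        out.append(node)
    return out

def descendant(node):
    """Travel down: children, their children, and so on, in document order."""
    out = []
    for child in node.childNodes:
        out.append(child)
        out.extend(descendant(child))
    return out

def following_sibling(node):
    """Travel sideways: later nodes that share the same parent."""
    out = []
    sibling = node.nextSibling
    while sibling is not None:
        out.append(sibling)
        sibling = sibling.nextSibling
    return out

def name_test(nodes, name):
    """On these axes a bare name selects only elements with that name."""
    return [n for n in nodes
            if n.nodeType == Node.ELEMENT_NODE and n.tagName == name]

first_para = DOC.getElementsByTagName("para")[0]
up = [n.nodeName for n in ancestor(first_para)]
down = name_test(descendant(DOC.documentElement), "para")
sideways = name_test(following_sibling(first_para), "para")
print(up, len(down), len(sideways))
```

Here name_test(following_sibling(first_para), "para") plays the role of following-sibling::para evaluated with the first para as the context node.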
Location Paths
• Location path steps are chained together using the slash (/) character. Each step gets you a little closer to the node you want to locate. It's like giving directions. For example, to get from the root node to a para element inside a section inside a chapter inside a book, a path might look like this:
book/chapter/section/para
• XPath defines some handy shortcuts:
@role matches an attribute named role.
. matches the context node.
/* matches the document element. Any path that starts with a / is an absolute path; the next step is *, which matches any element.
parent::*/following-sibling::para matches all para elements that are following siblings of the context node's parent.
Location Paths (cont.)
.. matches the parent node. The double dot is shorthand for parent::node().
.//para matches any element of type para that is a descendant of the current node. The // is shorthand for /descendant-or-self::node()/.
//para matches any para descending from the root node; that is, it matches all paras anywhere in the document. A location path starting with // is assumed to begin at the root.
../* matches all sibling elements (and the context node, if it is an element).
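Several of these shortcuts can be tried with Python's standard xml.etree.ElementTree, which implements only a subset of XPath (no explicit axes and no absolute paths on an element, for instance). The book document below is a made-up example, and attributes are read with get() because ElementTree paths select elements only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (not from the slides); `book` is the document element.
book = ET.fromstring(
    "<book><chapter role='intro'><section><para>one</para><para>two</para>"
    "</section></chapter><chapter><section><para>three</para></section>"
    "</chapter></book>"
)

# Steps chained with "/", relative to the document element.
paras = book.findall("chapter/section/para")

# ".//para": every para below the context node.
all_paras = book.findall(".//para")

# "*": every element child of the context node.
chapters = book.findall("*")

# "..": the parent of each matched node (each parent is reported once).
sections = book.findall(".//para/..")

# ElementTree has no "@role" step; the attribute is read from the element instead.
role = book.find("chapter").get("role")

print([p.text for p in paras], len(all_paras), len(chapters), len(sections), role)
```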
Examples
• Look at the sample XML document in example 6-1.
• Here are some location path examples:
/quotelist/child::node() matches all the quotation elements plus the XML comment.
/quotelist/quotation matches all the quotation elements.
/*/* matches all the quotation elements.
//comment()/following-sibling::*/@style matches the style attribute of the last quotation element.
id('q2')/.. matches the document element.
Predicates
• If the axis and node test aren't sufficient to narrow down the selection, you can use one or more predicates (Boolean expressions). Every node that passes this test (in addition to the node test and axis specifier) is included in the final node set. Nodes that fail the test are not.
• Examples:
//quotation[@id="q3"]/text matches the text element in the third quotation element.
/*/*[position()=last()] matches the last quotation element. The position() function returns the position of the current node among the candidates selected by the most recent step. The last() function returns the total number of candidates (in this case, 5).
//quotation[@style='silly' or @style='wise'] matches the first, third, and fourth quotation elements. The or keyword is a Boolean operator.
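The predicate examples can be approximated with Python's xml.etree.ElementTree. Example 6-1 itself is not reproduced in these slides, so the quotelist below is a stand-in built only from what the slides say about it (five quotation elements with id and style attributes, each holding a text element); the attribute values and placeholder text are assumptions. ElementTree supports attribute and last() predicates but not the or keyword, so that case is emulated with an ordinary Python filter.

```python
import xml.etree.ElementTree as ET

# Stand-in for example 6-1: structure inferred from the slides, content invented.
QUOTELIST = ET.fromstring("""
<quotelist>
  <quotation style="wise" id="q1"><text>placeholder one</text></quotation>
  <quotation style="weird" id="q2"><text>placeholder two</text></quotation>
  <quotation style="silly" id="q3"><text>placeholder three</text></quotation>
  <quotation style="wise" id="q4"><text>placeholder four</text></quotation>
  <!-- a comment -->
  <quotation style="weird" id="q5"><text>placeholder five</text></quotation>
</quotelist>""")

# Like //quotation[@id="q3"]/text: an attribute test narrows the candidates.
third_text = QUOTELIST.find("quotation[@id='q3']/text")

# Like /*/*[position()=last()]: keep only the last candidate.
last_quote = QUOTELIST.find("quotation[last()]")

# Like //quotation[@style='silly' or @style='wise'], emulated in plain Python
# because ElementTree's XPath subset has no "or".
silly_or_wise = [q.get("id") for q in QUOTELIST.findall("quotation")
                 if q.get("style") in ("silly", "wise")]

print(third_text.text, last_quote.get("id"), silly_or_wise)
```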
Boolean Expressions
• XPath contains a full set of comparison operators to compare strings or numbers.
• There are also node set expressions, which evaluate to a set of nodes. This is a set in the strict mathematical sense, meaning that it contains no duplicates. The same node can be added many times, but the set will always contain only one copy of it.
• Node set functions:
count(node set) returns the number of nodes in node set.
generate-id(node set) returns a string containing a unique identifier for the first node in node set, or for the context node if the argument is left out. This string is generated by the processor and guaranteed to be unique for each node. (Strictly, generate-id() is defined by XSLT rather than by the XPath core library.)
last() returns the number of the last node in the context node set, which equals the size of that set.
Node Set Functions (cont.)
local-name(node set) returns the name of the first node in node set, without the namespace prefix.
name(node set) returns the name of the first node in node set, including the namespace prefix.
namespace-uri(node set) returns the URI of the namespace for the first node in node set; without an argument, it returns the namespace URI for the context node.
position() returns the number of the context node in the context node set.
• There are also functions that create node sets, pulling together nodes from all over the document.
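Python's xml.etree.ElementTree cannot call these functions inside a path, but their meaning is easy to imitate on a list of matched elements. The sketch below is an approximation under stated assumptions: the document and the namespace URI http://example.org/meta are invented, split_name is a hypothetical helper, and name() is not shown because ElementTree discards the original prefix when parsing.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (not from the slides).
DOC = ET.fromstring(
    '<doc xmlns:m="http://example.org/meta">'
    "<m:title>T</m:title><item/><item/><item/></doc>"
)

def split_name(elem):
    """Imitate namespace-uri() and local-name() using ElementTree's
    '{uri}local' tag convention."""
    if elem.tag.startswith("{"):
        uri, local = elem.tag[1:].split("}", 1)
        return uri, local
    return "", elem.tag

items = DOC.findall("item")

# count(item): the size of the node set.
count = len(items)

# position() and last(): each node's 1-based place in the set, and the set's size.
last = len(items)
positions = [(position, position == last)
             for position, _ in enumerate(items, start=1)]

uri, local = split_name(DOC[0])
print(count, positions, uri, local)
```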