Learn about the XPath query language for XML, its applications, and its relationship with XQuery and XSLT. Explore the different node types, location steps, axes, node tests, and predicates used in XPath.
/*/*/self::* XPath Dongwon Lee, Ph.D. IST 516 Fall 2011
XPath • Path-based XML query language • V1.0 – W3C Recommendation in 1999: http://www.w3.org/TR/xpath • V2.0 – W3C Recommendation in 2007 (still a working draft in 2003): http://www.w3.org/TR/xpath20/ • Functional, strongly-typed query language http://www.w3schools.com/xpath/xpath_intro.asp
Apps of XPath
• XQuery: a full-blown query language for XML
    for $x in doc("books.xml")/bookstore/book
    where $x/price > 30
    order by $x/title
    return $x/title
• XPointer/XLink: a standard way to create hyperlinks in XML
    <book title="Harry Potter">
      <description xlink:type="simple"
                   xlink:href="http://book.com/images/HPotter.gif"
                   xlink:show="new">
        As his fifth year at Hogwarts School of Witchcraft and Wizardry approaches, 15-year-old Harry Potter is.......
      </description>
    </book>
Apps of XPath
• XSLT: a style sheet language for XML that transforms XML from one format to another
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/">
        <xsl:for-each select="catalog/cd">
          <tr>
            <td><xsl:value-of select="title"/></td>
            <td><xsl:value-of select="artist"/></td>
          </tr>
        </xsl:for-each>
      </xsl:template>
    </xsl:stylesheet>
XPath vs. SQL • XPath: XML model; trees; hierarchy; order • SQL: relational model; tables; flat; orderless (except ORDER-BY)
XPath vs. XQuery • XPath: XML model; trees; hierarchy; order • XQuery: can do all XPath does but not vice versa; Turing-complete general purpose PL; can retrieve, update, and transform XML data; FLWOR expression
XPath Expression • An expression (the basic building block) evaluates to one of 4 object types: • node-set (an unordered collection of nodes without duplicates) • boolean (true or false) • number (a floating-point number) • string (a sequence of characters)
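The four result types can be observed directly with the JDK's javax.xml.xpath package, which implements XPath 1.0. The sketch below is only an illustration: the sample document and the class and helper names are assumptions, not part of the slides.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathResultTypes {
    // Hypothetical sample document (an assumption for this sketch).
    static final String XML = "<note><to>Tove</to><heading>Reminder</heading></note>";

    // Evaluates expr against XML and asks the engine for the requested result type.
    static Object eval(String expr, QName type) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)), type);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expr, e);
        }
    }

    static int nodeCount(String expr) {
        return ((NodeList) eval(expr, XPathConstants.NODESET)).getLength();
    }

    static boolean bool(String expr) {
        return (Boolean) eval(expr, XPathConstants.BOOLEAN);
    }

    static double number(String expr) {
        return (Double) eval(expr, XPathConstants.NUMBER);
    }

    static String string(String expr) {
        return (String) eval(expr, XPathConstants.STRING);
    }

    public static void main(String[] args) {
        System.out.println(nodeCount("/note/*"));        // size of a node-set
        System.out.println(bool("count(/note/*) = 2"));  // a boolean
        System.out.println(number("count(/note/*)"));    // a number (double)
        System.out.println(string("string(/note/to)"));  // a string
    }
}
```

The same expression can often be requested as more than one type; the engine then applies the XPath conversion rules (for example, a node-set converted to a string yields the string-value of its first node).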
XPath Nodes
    <?xml version="1.0" encoding="UTF-8"?>
    <note xmlns="http://pike.psu.edu"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="note.xsd">
      <to>Tove</to>
      <!-- <from>Jani</from> -->
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>
• Nodes: 7 types
  • element, attribute, text, namespace
  • processing-instruction, comment, document
• Figure callouts on the example: processing-instruction, namespace, document, comment, text
• Note: the callout labels the first line a processing-instruction; strictly, the XML declaration is not a processing-instruction node in the XPath data model
Location Step • Location steps are evaluated in order from left to right • Absolute: /step/step/… • Relative: step/step/… • Each step has the form axis::node-test[predicate] • Axis: specifies the node relationship • Node test: specifies the node type and name • Predicate: instructions to filter nodes • Slide callout: Preferred – faster to evaluate
1. Axis • / selects the root of the node hierarchy • <document/> as the default root of XML document • Forward Axis • child::, descendant::, attribute::, self::, descendant-or-self::, following-sibling::, following:: • Backward Axis • parent::, ancestor::, preceding-sibling::, preceding::, ancestor-or-self:: • Relative to the current context (Axis::context) • child::emp: “emp” is a child element of the current node • attribute::date: “date” is an attribute of the current node
Node Relationships [Figure: a Courses tree (Courses; Undergrad, Graduate; Room, Instructor; Name, Office, Phone) annotated with the relationships Ancestors, Parent, Self, Sibling, Child, Grandchild, and Descendants]
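1. Axis Abbreviation • /descendant-or-self::node()/ → // • child:: → omitted (child is the default axis; steps are separated by /) • attribute:: → @ • self::node() → . • parent::node() → .. • Eg • /child::doc/descendant::chapter → /doc//chapter (selects the same chapter elements) • //doc/attribute::type → //doc/@type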
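One way to check that an abbreviated path and its unabbreviated form select the same nodes is to evaluate both and compare the results. The following is a minimal sketch using javax.xml.xpath; the small doc/chapter document and all Java names are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AxisAbbreviations {
    // Hypothetical document: one chapter directly under doc, one nested in part.
    static final String XML =
            "<doc type='PDF'><chapter>1</chapter><part><chapter>2</chapter></part></doc>";

    // Returns the text content of every node selected by expr.
    static List<String> values(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"/child::doc/descendant::chapter", "/doc//chapter"},
            {"//doc/attribute::type", "//doc/@type"},
            {"/doc/part/chapter/parent::node()", "/doc/part/chapter/.."},
            {"/doc/part/self::node()", "/doc/part/."},
        };
        for (String[] pair : pairs) {
            // Each line prints the two forms and whether their results match.
            System.out.println(pair[0] + "  vs  " + pair[1] + "  same="
                    + values(pair[0]).equals(values(pair[1])));
        }
    }
}
```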
2. Node Test • node(): matches all nodes • text(): matches all text nodes • ElementName: matches all elements of type ‘ElementName’ • *: matches all elements • @*: matches all attributes
2. Node Test • * (wildcard) is often used to match unknown XML elements • /catalog/cd/*: all the child elements of all the cd elements of the catalog element • /*: the top-level element under the root <document/> • /*/*: all child elements of that top-level element (grandchildren of the root <document/>) • //*: all elements of the XML document
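The wildcard paths can be tried on a small catalog. This is a sketch under assumptions: the two-cd document below is invented to resemble the catalog/cd structure named on the slide, and the class and helper names are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WildcardNodeTests {
    // Hypothetical catalog with two cd elements (an assumption for this sketch).
    static final String XML =
            "<catalog><cd><title>A</title><artist>B</artist></cd><cd><title>C</title></cd></catalog>";

    // Returns the element name of every node selected by expr.
    static List<String> names(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getNodeName());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(names("/catalog/cd/*")); // children of every cd
        System.out.println(names("/*"));            // the top-level element
        System.out.println(names("/*/*"));          // its child elements
        System.out.println(names("//*").size());    // number of elements in the document
    }
}
```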
3. Predicate • Path-expression[filtering condition] selects the nodes of Path-expression that satisfy the filtering condition • Eg • //doc[@type='PDF'] finds all <doc> elements whose “type” attribute value is 'PDF' • This returns the <doc> elements, not their “type” attributes • The filtering condition does not change which kind of node is returned (ie, the projection) of XPath • It just adds more constraints to satisfy
IST Example What IST classes are in Room 110 IST? /Courses/*[child::Room='110 IST']
IST Example What IST courses have TAs? /Courses/*/TA/parent::*
IST Example What rooms are used by IST courses? /Courses/*/Room/text()
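The three IST queries can be run with javax.xml.xpath against a stand-in document. The slides do not show the actual Courses file, so the XML below (course element names, rooms, times, and TAs) is entirely hypothetical, as are the Java class and helper names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class IstExamples {
    // Hypothetical course data; the real lab file may look different.
    static final String XML =
            "<Courses>"
            + "<IST516><Room>110 IST</Room><Time>1220</Time><TA>Kim</TA></IST516>"
            + "<IST210><Room>205 IST</Room><Time>1430</Time></IST210>"
            + "<IST511><Room>110 IST</Room><Time>930</Time><TA>Lee</TA></IST511>"
            + "</Courses>";

    // Element nodes are reported by name, all other nodes (e.g. text) by value.
    static List<String> labels(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node n = nodes.item(i);
                out.add(n.getNodeType() == Node.ELEMENT_NODE ? n.getNodeName() : n.getNodeValue());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(labels("/Courses/*[child::Room='110 IST']")); // courses in Room 110 IST
        System.out.println(labels("/Courses/*/TA/parent::*"));           // courses that have a TA
        System.out.println(labels("/Courses/*/Room/text()"));            // room text nodes
    }
}
```

Note that the third query returns one text node per Room element, so a room used by several courses appears several times; XPath 1.0 has no built-in way to de-duplicate values.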
Comparison • Comparison can be performed using • =, !=, <=, <, >=, and > • Examples • [child::Room != '205 IST'] • [child::Time > 1220] NOTE: Within a predicate, an element is compared by its string-value, so for an element that contains only text, child::Room behaves like child::Room/text()
Math Operators • + : performs addition • - : performs subtraction • * : performs multiplication • div : performs division • mod : returns the remainder of division • Examples: • [child::Time mod 100 = 30]
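The comparison and math predicates can be tried out as follows. This is a sketch only: the Courses document is hypothetical (the slides do not show the real data), and the class and helper names are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ComparisonAndMath {
    // Hypothetical course data with a Room and a Time (hhmm) per course.
    static final String XML =
            "<Courses>"
            + "<IST516><Room>110 IST</Room><Time>1220</Time></IST516>"
            + "<IST210><Room>205 IST</Room><Time>1430</Time></IST210>"
            + "<IST511><Room>110 IST</Room><Time>930</Time></IST511>"
            + "</Courses>";

    // Returns the names of the course elements that satisfy the given predicate.
    static List<String> courses(String predicate) {
        String expr = "/Courses/*" + predicate;
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getNodeName());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(courses("[child::Room != '205 IST']"));        // string comparison
        System.out.println(courses("[child::Room/text() != '205 IST']")); // same result via text()
        System.out.println(courses("[child::Time > 1220]"));              // Time converted to a number
        System.out.println(courses("[child::Time mod 100 = 30]"));        // courses starting at half past
    }
}
```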
Node Functions • last() : returns the numeric position of the last node in a list • position() : returns the numeric position of the current node • count() : returns the number of nodes in a list • name(): returns the name of a node • id() : selects elements by their unique ID
Node Function Example Which courses have more than 2 child elements? /Courses/*[count(child::*)>2]
String Functions • concat(string, string) : concatenates the string arguments • starts-with(string, string) : returns true if the first string starts with the second string • contains(string, string) : returns true if the first string contains the second string • Eg, • concat('sh', 'oe') = 'shoe' • starts-with('cat', 'ca') = true • contains('puppy', 'upp') = true
String Functions • substring(string, number, [number]) : returns a substring of the provided string (positions start at 1) • string-length(string) : returns the number of characters in the string • Eg, • substring('chicken', 3, 4) = 'icke' • substring('chicken', 3) = 'icken' • string-length('cat') = 3
String Functions Examples
• //Book[starts-with(child::Title, "X")]/Price
• //Book[string-length(Author/FN)=3]/Title
    <Catalog>
      <Book>
        <Title>XML</Title>
        <Price>19.9</Price>
        <Author> <FN>Joe</FN> </Author>
      </Book>
      <Book>
        <Title>XSLT</Title>
        <Price>22.9</Price>
        <Author> <FN>HJ</FN><LN>Kyle</LN> </Author>
      </Book>
    </Catalog>
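The two catalog queries and a few of the standalone string functions can be checked with javax.xml.xpath. In this sketch the catalog is the slide's example written as well-formed XML (full closing tags, element name Price); the class and helper names are illustrative assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class StringFunctionExamples {
    static final String XML =
            "<Catalog>"
            + "<Book><Title>XML</Title><Price>19.9</Price>"
            + "<Author><FN>Joe</FN></Author></Book>"
            + "<Book><Title>XSLT</Title><Price>22.9</Price>"
            + "<Author><FN>HJ</FN><LN>Kyle</LN></Author></Book>"
            + "</Catalog>";

    // Returns the text content of every node selected by expr.
    static List<String> values(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expr, e);
        }
    }

    // Evaluates expr and converts the result to a string (booleans become "true"/"false").
    static String string(String expr) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(values("//Book[starts-with(child::Title, 'X')]/Price"));
        System.out.println(values("//Book[string-length(Author/FN)=3]/Title"));
        System.out.println(string("concat('sh', 'oe')"));
        System.out.println(string("substring('chicken', 3, 4)"));
        System.out.println(string("contains('puppy', 'upp')"));
    }
}
```

Element names are case-sensitive, so a final step written as price instead of Price would select nothing from this catalog.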
Number Functions • sum(node-set) : returns the sum of values for each node in a node set • Eg, sum(//@price) • floor(number) : returns the largest integer that is not greater than the argument • Eg, floor(2.6) = 2 • ceiling(number) : returns the smallest integer that is not less than the argument • Eg, ceiling(2.6) = 3 • round(number) : returns the closest integer to the argument • Eg, round (2.4) = 2
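The number functions can be evaluated by requesting a numeric result. A minimal sketch with javax.xml.xpath follows; the items document with price attributes is an assumption chosen so that sum(//@price) has something to add up, and the Java names are illustrative.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class NumberFunctionExamples {
    // Hypothetical document carrying two price attributes.
    static final String XML = "<items><item price='10.5'/><item price='20.25'/></items>";

    // Evaluates expr and returns the result as an XPath number (a double).
    static double number(String expr) {
        try {
            return (Double) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)), XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(number("sum(//@price)")); // adds the numeric value of each attribute
        System.out.println(number("floor(2.6)"));
        System.out.println(number("ceiling(2.6)"));
        System.out.println(number("round(2.4)"));
    }
}
```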
Boolean OPs in XPath • Conjunction: “and” • //Product[@price>10.8 and @year>2000] • Disjunction: “or” • /Customer[@cname='Lee' or @cid>100] • Union: “|” • Computes both node-sets and returns their union • //Book | //Tape • NOTE: some XPath engines currently support only one of “|” and “or”
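The difference between the boolean operators inside a predicate and the node-set union operator can be seen in a short sketch. The Shop document (products, prices, years) is invented for illustration, as are the class and helper names; the JDK's javax.xml.xpath engine supports all three operators.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BooleanOperators {
    // Hypothetical data: four products plus one Book and one Tape element.
    static final String XML =
            "<Shop>"
            + "<Product name='a' price='12.5' year='2005'/>"
            + "<Product name='b' price='9' year='2010'/>"
            + "<Product name='c' price='30' year='1999'/>"
            + "<Product name='d' price='5' year='1990'/>"
            + "<Book/><Tape/>"
            + "</Shop>";

    // Element nodes are reported by name, attribute nodes by value.
    static List<String> labels(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node n = nodes.item(i);
                out.add(n.getNodeType() == Node.ELEMENT_NODE ? n.getNodeName() : n.getNodeValue());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // "and": both conditions must hold for the same Product.
        System.out.println(labels("//Product[@price>10.8 and @year>2000]/@name"));
        // "or": at least one condition must hold.
        System.out.println(labels("//Product[@price>10.8 or @year>2000]/@name"));
        // "|": combines two node-sets; it is not a boolean operator.
        System.out.println(labels("//Book | //Tape"));
    }
}
```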
XPath Lab [www.zvon.org]
    <AAA>
      <BBB/>
      <CCC/>
      <BBB/>
      <BBB/>
      <DDD>
        <BBB/>
      </DDD>
      <CCC/>
    </AAA>
• /AAA/CCC: the two <CCC> children of <AAA>
• /AAA/DDD/BBB: the <BBB> inside <DDD>
• //BBB: all four <BBB> elements
• /AAA/*: all six child elements of <AAA>
• /AAA/BBB[1]: the first <BBB> child of <AAA>
• /AAA/BBB[last()]: the last (third) <BBB> child of <AAA>
• /AAA//BBB[1]: the first <BBB> child of <AAA> and the <BBB> inside <DDD>
• /AAA//BBB[last()]: the third <BBB> child of <AAA> (Position=3) and the <BBB> inside <DDD> (Position=1)
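Position Explanation • “/AAA//BBB” is short for /AAA/descendant-or-self::node()/child::BBB, so the final child::BBB step is evaluated once per parent and yields two lists: • Three <BBB> as the children of <AAA> • One <BBB> as the grandchild of <AAA> (a child of <DDD>) • Then, a positional predicate like [1] or [last()] is applied to the answers in each list SEPARATELY • /AAA//BBB[1] returns both: • First <BBB> from the first list -- a child of <AAA> • First <BBB> from the second list -- a grandchild of <AAA> • /AAA//BBB[last()] likewise returns two nodes: • last() returns the position of the last node in each list • Last <BBB> from the first list (position 3) and last <BBB> from the second list (position 1) • To index into the combined result instead, parenthesize the path: (/AAA//BBB)[last()] returns only the final <BBB> in document order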
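How positional predicates behave after // can be checked with javax.xml.xpath: in a conforming XPath 1.0 engine, [1] and [last()] are applied per parent, so /AAA//BBB[last()] selects the last <BBB> under each parent. The sketch below uses the lab document with one addition: a hypothetical n attribute on each <BBB> (not in the original XML) so the selected nodes can be told apart. Class and helper names are also illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PositionalPredicates {
    // Lab document; the n attributes are an assumption added for identification only.
    static final String XML =
            "<AAA><BBB n='1'/><CCC/><BBB n='2'/><BBB n='3'/>"
            + "<DDD><BBB n='4'/></DDD><CCC/></AAA>";

    // Returns the n attribute of every <BBB> selected by bbbPath.
    static List<String> selected(String bbbPath) {
        String expr = bbbPath + "/@n";
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getNodeValue());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(selected("/AAA/BBB[1]"));          // first BBB child of AAA
        System.out.println(selected("/AAA/BBB[last()]"));     // last BBB child of AAA
        System.out.println(selected("/AAA//BBB[1]"));         // first BBB under each parent
        System.out.println(selected("/AAA//BBB[last()]"));    // last BBB under each parent
        System.out.println(selected("(/AAA//BBB)[last()]"));  // last BBB of the combined result
    }
}
```

The parenthesized form is the usual way to ask for a position within the whole result rather than within each parent.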
XPath Lab [www.zvon.org]
    <AAA>
      <BBB id = "b1"/>
      <BBB id = "b2"/>
      <BBB name = "bbb"/>
      <BBB/>
    </AAA>
• //@id: the two id attributes (b1 and b2)
• //BBB[@id="b2"]: the second <BBB> element
XPath Lab [www.zvon.org]
    <AAA>
      <CCC> <BBB/> <BBB/> <BBB/> </CCC>
      <DDD> <BBB/> <BBB/> </DDD>
      <EEE> <CCC/> <DDD/> </EEE>
    </AAA>
• //*[count(BBB)=2]: elements with exactly two <BBB> children (the first <DDD>)
• //*[count(*)=3]: elements with exactly three child elements (<AAA> and the first <CCC>)
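The count() predicates on this lab document can be verified with javax.xml.xpath. The sketch below uses the XML from the slide without whitespace; the class and helper names are illustrative, and the third query is an extra example in the style of the earlier "more than 2 child elements" question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CountPredicates {
    static final String XML =
            "<AAA>"
            + "<CCC><BBB/><BBB/><BBB/></CCC>"
            + "<DDD><BBB/><BBB/></DDD>"
            + "<EEE><CCC/><DDD/></EEE>"
            + "</AAA>";

    // Returns the element name of every node selected by expr.
    static List<String> names(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getNodeName());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(names("//*[count(BBB)=2]"));          // exactly two BBB children
        System.out.println(names("//*[count(*)=3]"));            // exactly three child elements
        System.out.println(names("/AAA/*[count(child::*)>2]"));  // children of AAA with more than two children
    }
}
```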
XPath Evaluation S/W • Many software tools now have built-in support for XPath 1.0 and 2.0 • Eg, • XPath Visualizer: Windows only • http://xpathvisualizer.codeplex.com/ • XMLSpy: Windows only • <oXygen/>: Mac and Windows • XMLPad: Windows only
#1. XPath Visualizer [Screenshot: answers #1 and #2 for //letter/paragraph, with a callout reading "Minor bug here"]
#2. XMLSpy [Screenshots: choose Evaluate XPath; answers #1 and #2 for //letter/paragraph]
#3. <oXygen/> [Screenshots: press the Enter key; answers #1 and #2 for //letter/paragraph]
XPath Evaluation in Programming • XPath Engines / Libraries • Apache Xalan-Java: http://xml.apache.org/xalan-j/ • Saxon: http://saxon.sourceforge.net/ • Jaxen: http://jaxen.codehaus.org/ • PL specific APIs • Java: package javax.xml.xpath + DOM • PHP: domxml’s xpath_eval() (v4), SimpleXML (v5)
Eg. XPath in Java
    public Node findAddress(String name, Document source) throws Exception {
        // need to recreate a few helper objects
        XMLParserLiaison xpathSupport = new XMLParserLiaisonDefault();
        XPathProcessor xpathParser = new XPathProcessorImpl(xpathSupport);
        PrefixResolver prefixResolver = new PrefixResolverDefault(source.getDocumentElement());

        // create the XPath and initialize it
        XPath xp = new XPath();
        String xpString = "//address[child::addressee[text() = '" + name + "']]";
        xpathParser.initXPath(xp, xpString, prefixResolver);

        // now execute the XPath select statement
        XObject list = xp.execute(xpathSupport, source.getDocumentElement(), prefixResolver);
        return list.nodeset().item(0);
    }
http://www.javaworld.com/javaworld/jw-09-2000/jw-0908-xpath.html?page=3
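The example above relies on engine-specific helper classes from the cited article. The same lookup can be sketched with the standard javax.xml.xpath package plus DOM; this is an alternative written for illustration, not the article's code. The address-book document, the id attributes, and the class and method names are assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AddressFinder {
    // Hypothetical address book used only for this sketch.
    static final String XML =
            "<addressbook>"
            + "<address id='a1'><addressee>Alice</addressee></address>"
            + "<address id='a2'><addressee>Bob</addressee></address>"
            + "</addressbook>";

    // Returns the first <address> whose <addressee> text equals name, or null if none matches.
    static Node findAddress(String name, Document source) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind $name as an XPath variable instead of concatenating it into the expression.
            xpath.setXPathVariableResolver(
                    qname -> "name".equals(qname.getLocalPart()) ? name : null);
            return (Node) xpath.evaluate(
                    "//address[child::addressee[text() = $name]]", source, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    // Convenience wrapper: returns the id of the matching address, or "none".
    static String demo(String name) {
        Node address = findAddress(name, parse(XML));
        return address == null
                ? "none"
                : address.getAttributes().getNamedItem("id").getNodeValue();
    }

    public static void main(String[] args) {
        System.out.println(demo("Bob"));
        System.out.println(demo("Zed"));
    }
}
```

Passing the name through a variable resolver is a design choice: it keeps names that contain quote characters from breaking the expression, which plain string concatenation would not.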
Eg. SimpleXML in PHP
    <?php
    $xml = simplexml_load_file('employees.xml');

    echo "<strong>Using direct method...</strong><br />";
    $names = $xml->xpath('/employees/employee/name');
    foreach($names as $name) {
        echo "Found $name<br />";
    }
    echo "<br />";

    echo "<strong>Using indirect method...</strong><br />";
    $employees = $xml->xpath('/employees/employee');
    foreach($employees as $employee) {
        echo "Found {$employee->name}<br />";
    }
    echo "<br />";

    echo "<strong>Using wildcard method...</strong><br />";
    $names = $xml->xpath('//name');
    foreach($names as $name) {
        echo "Found $name<br />";
    }
    ?>
http://www.tuxradar.com/practicalphp/12/3/3
Lab #2 (DUE: Sep. 25 11:55PM) • https://online.ist.psu.edu/ist516/labs • Tasks: • Individual Lab • Using an XML file, practice XPath queries • Turn-In • XPath queries and English interpretation • Screenshot of results of XPath queries