
Introduction to XPath: Query Language for XML

Learn about the XPath query language for XML, its applications, and its relationship with XQuery and XSLT. Explore the different node types, location steps, axes, node tests, and predicates used in XPath.



Presentation Transcript


  1. /*/*/self::* XPath • Dongwon Lee, Ph.D. • IST 516, Fall 2011

  2. XPath • Path-based XML query language • V1.0 – 1999: http://www.w3.org/TR/xpath • V2.0 – 2007 (W3C Recommendation, after several years of working drafts): http://www.w3.org/TR/xpath20/ • Functional, strongly-typed query language • Reference: http://www.w3schools.com/xpath/xpath_intro.asp

  3. Apps of XPath
     • XQuery: a full-blown query language for XML
         for $x in doc("books.xml")/bookstore/book
         where $x/price > 30
         order by $x/title
         return $x/title
     • XPointer/XLink: a standard way to create hyperlinks in XML
         <book title="Harry Potter">
           <description
               xlink:type="simple"
               xlink:href="http://book.com/images/HPotter.gif"
               xlink:show="new">
             As his fifth year at Hogwarts School of Witchcraft and
             Wizardry approaches, 15-year-old Harry Potter is.......
           </description>
         </book>

  4. Apps of XPath
     • XSLT: a style sheet language for XML that can transform XML from one format to another
         <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
           <xsl:template match="/">
             <xsl:for-each select="catalog/cd">
               <tr>
                 <td><xsl:value-of select="title"/></td>
                 <td><xsl:value-of select="artist"/></td>
               </tr>
             </xsl:for-each>
           </xsl:template>
         </xsl:stylesheet>

  5. XPath vs. SQL • XPath: XML model; trees; hierarchical; ordered • SQL: relational model; tables; flat; orderless (except ORDER BY)

  6. XPath vs. XQuery • XPath: XML model; trees; hierarchical; ordered • XQuery: can do all that XPath does, but not vice versa; a Turing-complete, general-purpose programming language; can retrieve, update, and transform XML data; built around the FLWOR expression

  7. XPath Expression • An expression (the basic building block) returns one of 4 object types: • node-set (an unordered collection of nodes without duplicates) • boolean (true or false) • number (a floating-point number) • string (a sequence of characters)

  8. XPath Nodes
         <?xml version="1.0" encoding="UTF-8"?>
         <note xmlns="http://pike.psu.edu"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:noNamespaceSchemaLocation="note.xsd">
           <to>Tove</to>
           <!-- <from>Jani</from> -->
           <heading>Reminder</heading>
           <body>Don't forget me this weekend!</body>
         </note>
     • Nodes: 7 types • element, attribute, text, namespace • processing-instruction, comment, document (called the root node in XPath 1.0)
     • Slide callouts mark the namespace declarations, the document, the comment, and a text node. A callout also marks the first line as a processing-instruction; strictly, the XML declaration is not a processing-instruction node in the XPath data model.

  9. Location Step • Syntax: axis::node-test[predicate] • Location steps are evaluated in order from left to right • Absolute: /step/step/… • Relative: step/step/… • Axis: specifies the node relationship • Node test: specifies the node type and name • Predicate: a condition that filters nodes • Slide callout: "Preferred – Faster to evaluate"

  10. 1. Axis • / selects the root of the node hierarchy • <document/> as the default root of the XML document • Forward axes • child::, descendant::, attribute::, self::, descendant-or-self::, following-sibling::, following:: • Backward (reverse) axes • ancestor::, preceding-sibling::, preceding::, ancestor-or-self:: • Axes are relative to the current context node (axis::node-test) • child::emp: "emp" is a child element of the current node • attribute::date: "date" is an attribute of the current node

  11. Node Relationships • [Figure: a tree rooted at Courses with the nodes Undergrad, Graduate, Room, Instructor, Name, Office, and Phone; labels mark Self, Parent, Ancestors, Sibling, Child, Grandchild, and Descendants]

  12. 1. Axis Abbreviation • /descendant-or-self::node()/ → // • child:: → omitted (child is the default axis, so child::doc is written doc) • attribute:: → @ • self::node() → . • parent::node() → .. • Eg • /child::doc/descendant::chapter → /doc//chapter • //doc/attribute::type → //doc/@type

  13. 2. Node Test • node(): matches all nodes • text(): matches all text nodes • ElementName: matches all elements named 'ElementName' • *: matches all elements • @*: matches all attributes

  14. 2. Node Test • * (wildcard) is often used to match unknown XML elements • /catalog/cd/*: all the child elements of all the cd elements of the catalog element • /*: all element children of the root <document/> (in a well-formed document, just the document element) • /*/*: all element grandchildren of the root <document/> • //*: all elements of the XML document
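The wildcard tests on this slide can be tried with Python's standard xml.etree.ElementTree module, which implements only a subset of XPath. This is a minimal sketch under assumptions: the catalog document is made up for illustration, and because ElementTree paths are relative to the element they are called on, "./cd/*" stands in for /catalog/cd/* and "./*" for /*/*.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (not taken from the slides).
CATALOG_XML = """
<catalog>
  <cd><title>Title A</title><artist>Artist A</artist></cd>
  <cd><title>Title B</title><artist>Artist B</artist></cd>
</catalog>
"""

catalog = ET.fromstring(CATALOG_XML)  # the <catalog> document element

# /catalog/cd/* : every child element of every <cd>
cd_children = catalog.findall("./cd/*")
print([e.tag for e in cd_children])

# /* : the document element itself
print(catalog.tag)

# /*/* : the element children of the document element
grandchildren_of_root = catalog.findall("./*")
print([e.tag for e in grandchildren_of_root])

# //* : every element. ElementTree's ".//*" skips the starting element,
# so iter() is used here to include <catalog>, as XPath would.
all_elements = list(catalog.iter())
print(len(all_elements))
```

The one behavioural difference worth remembering is the last one: in XPath, //* includes the document element, whereas ElementTree's ".//*" only returns its descendants.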

  15. 3. Predicate • Path-expression[filtering condition] → the nodes selected by the path expression that also satisfy the filtering condition • Eg • //doc[@type='PDF'] finds all <doc> elements whose "type" attribute has the value 'PDF' • This returns the <doc> elements, not their "type" attributes • The filtering condition does not change which nodes are projected (ie, returned) by the path • It only adds more constraints to satisfy
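A small sketch of the point that a predicate filters but does not change what is returned, again using Python's standard xml.etree.ElementTree. The library document and its titles are hypothetical, and ".//doc" is ElementTree's relative spelling of //doc.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (not taken from the slides).
DOCS_XML = """
<library>
  <doc type="PDF"><title>XPath notes</title></doc>
  <doc type="HTML"><title>XSLT notes</title></doc>
  <doc type="PDF"><title>XQuery notes</title></doc>
</library>
"""

library = ET.fromstring(DOCS_XML)

# //doc[@type='PDF'] : the predicate filters on the attribute ...
pdf_docs = library.findall(".//doc[@type='PDF']")

# ... but the result is still a list of <doc> elements, not attributes.
print([d.tag for d in pdf_docs])
print([d.findtext("title") for d in pdf_docs])
```

To get the attribute values themselves, the XPath would be //doc/@type; ElementTree cannot select attributes with a path, so there one would call d.get("type") on each returned element.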

  16. Location Step Examples

  17. Examples of usage

  18. IST Example • What IST classes are in room 110 IST? • /Courses/*[child::Room='110 IST']

  19. IST Example • What IST courses have TAs? • /Courses/*/TA/parent::*

  20. IST Example • What rooms are used by IST courses? • /Courses/*/Room/text()
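The three IST queries can be approximated with Python's standard xml.etree.ElementTree. This is only a sketch: the Courses document is not shown in the slides, so the one below is invented (element names follow the queries, values are placeholders). ElementTree has no parent:: axis or text() test, so ".." and the .text attribute are used in their place.

```python
import xml.etree.ElementTree as ET

# Hypothetical Courses document; the real lab file is not part of the slides.
COURSES_XML = """
<Courses>
  <Undergrad>
    <Room>110 IST</Room>
    <Instructor><Name>Instructor A</Name></Instructor>
    <TA>Assistant A</TA>
  </Undergrad>
  <Graduate>
    <Room>205 IST</Room>
    <Instructor><Name>Instructor B</Name></Instructor>
  </Graduate>
</Courses>
"""

courses = ET.fromstring(COURSES_XML)  # the <Courses> document element

# /Courses/*[child::Room='110 IST']
in_110 = courses.findall("./*[Room='110 IST']")
print([c.tag for c in in_110])

# /Courses/*/TA/parent::*  (".." is the abbreviated parent step)
with_ta = courses.findall("./*/TA/..")
print([c.tag for c in with_ta])

# The same question asked with a predicate instead of stepping back up
with_ta_predicate = courses.findall("./*[TA]")

# /Courses/*/Room/text()  (read .text, since text() is not supported here)
rooms = [room.text for room in courses.findall("./*/Room")]
print(rooms)
```

The second and third queries show a common design choice: stepping down to TA and back up with parent::* gives the same courses as the predicate form /Courses/*[TA], which many readers find easier to follow.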

  21. Comparison • Comparison can be performed using • =, !=, <=, <, >=, and > • Examples • [child::Room != '205 IST'] • [child::Time > 1220] • NOTE: within a predicate, a comparison uses the string value of the node, so for an element with simple text content child::Room behaves like child::Room/text()

  22. Math Operators • + : performs addition • - : performs subtraction • * : performs multiplication • div : performs division • mod : returns the remainder of division • Examples: • [child::Time mod 100 = 30]

  23. Node Functions • last() : returns the numeric position of the last node in a list • position() : returns the numeric position of the current node • count() : returns the number of nodes in a list • name(): returns the name of a node • id() : selects elements by their unique ID

  24. Node Function Example • Which courses have more than 2 child elements? • /Courses/*[count(child::*)>2]

  25. String Functions • concat(string, string) : concatenates the string arguments • starts-with(string, string) : returns true if the first string starts with the second string • contains(string, string) : returns true if the first string contains the second string • Eg, • concat('sh', 'oe') = 'shoe' • starts-with('cat', 'ca') = true • contains('puppy', 'upp') = true

  26. String Functions • substring(string, start, [length]) : returns the substring that begins at position start (positions are counted from 1) and runs for length characters, or to the end of the string if length is omitted • string-length(string) : returns the number of characters in the string • Eg, • substring('chicken', 3, 4) = 'icke' • substring('chicken', 3) = 'icken' • string-length('cat') = 3
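Because XPath counts string positions from 1 while most programming languages count from 0, the substring() examples are easy to get wrong. The helper below is a hypothetical illustration (xpath_substring is not a library function) that mimics XPath 1.0 substring() for integer arguments only.

```python
def xpath_substring(text, start, length=None):
    """Mimic XPath 1.0 substring() for integer arguments (1-based positions)."""
    end = None if length is None else start + length
    return "".join(
        ch
        for pos, ch in enumerate(text, start=1)  # XPath positions start at 1
        if pos >= start and (end is None or pos < end)
    )


print(xpath_substring("chicken", 3, 4))  # icke
print(xpath_substring("chicken", 3))     # icken
print(len("cat"))                        # 3, the counterpart of string-length('cat')
```

In plain Python slicing the first call corresponds to "chicken"[2:6], which makes the off-by-one between the two conventions explicit.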

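  27. String Functions Examples
     • //Book[starts-with(child::Title, "X")]/Price (element names are case-sensitive, so the step must be Price, not price)
     • //Book[string-length(Author/FN)=3]/Title
         <Catalog>
           <Book>
             <Title>XML</Title>
             <Price>19.9</Price>
             <Author> <FN>Joe</FN> </Author>
           </Book>
           <Book>
             <Title>XSLT</Title>
             <Price>22.9</Price>
             <Author> <FN>HJ</FN><LN>Kyle</LN> </Author>
           </Book>
         </Catalog>
     • The first query returns both <Price> elements (both titles start with "X"); the second returns only the <Title> of the "XML" book, whose author's FN, 'Joe', has 3 characters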

  28. Number Functions • sum(node-set) : returns the sum of values for each node in a node set • Eg, sum(//@price) • floor(number) : returns the largest integer that is not greater than the argument • Eg, floor(2.6) = 2 • ceiling(number) : returns the smallest integer that is not less than the argument • Eg, ceiling(2.6) = 3 • round(number) : returns the closest integer to the argument • Eg, round (2.4) = 2
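The floor and ceiling examples carry over directly to most languages, but round() does not always: XPath 1.0 rounds a tie such as 2.5 toward positive infinity, while Python's built-in round() rounds ties to the nearest even integer. The sketch below uses a hypothetical helper, xpath_round, to show the difference for ordinary finite values.

```python
import math


def xpath_round(x):
    """Round like XPath 1.0 round(): ties go toward positive infinity."""
    return math.floor(x + 0.5)


print(math.floor(2.6))    # 2, like floor(2.6)
print(math.ceil(2.6))     # 3, like ceiling(2.6)
print(xpath_round(2.4))   # 2, like round(2.4)
print(xpath_round(2.5))   # 3
print(round(2.5))         # 2 (Python rounds ties to even)
print(xpath_round(-2.5))  # -2
```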

  29. Boolean OPs in XPath • Conjunction: "and" • //Product[@price>10.8 and @year>2000] • Disjunction: "or" • /Customer[@cname='Lee' or @cid>100] • Union: "|" • Computes both node-sets and returns their union • //Book | //Tape • NOTE: some XPath engines currently support only one of "|" and "or"

  30. XPath Lab [www.zvon.org] • /AAA/CCC selects the two <CCC> children of <AAA> • /AAA/DDD/BBB selects the one <BBB> inside <DDD> • Both are evaluated against:
         <AAA>
           <BBB/>
           <CCC/>
           <BBB/>
           <BBB/>
           <DDD>
             <BBB/>
           </DDD>
           <CCC/>
         </AAA>

  31. XPath Lab [www.zvon.org] • //BBB selects all four <BBB> elements • /AAA/* selects all six child elements of <AAA> • Both are evaluated against:
         <AAA>
           <BBB/>
           <CCC/>
           <BBB/>
           <BBB/>
           <DDD>
             <BBB/>
           </DDD>
           <CCC/>
         </AAA>

  32. XPath Lab [www.zvon.org] • /AAA/BBB[1] selects the first <BBB> child of <AAA> • /AAA/BBB[last()] selects the last (third) <BBB> child of <AAA> • Both are evaluated against:
         <AAA>
           <BBB/>
           <CCC/>
           <BBB/>
           <BBB/>
           <DDD>
             <BBB/>
           </DDD>
           <CCC/>
         </AAA>
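The XPath Lab queries from slides 30 to 32 can be checked with Python's standard xml.etree.ElementTree. This is a sketch, not the zvon.org tool: ElementTree paths are relative to the element they are called on, so each absolute XPath is paired with an assumed relative equivalent starting at <AAA>.

```python
import xml.etree.ElementTree as ET

# The sample document from the XPath Lab slides.
AAA_XML = "<AAA><BBB/><CCC/><BBB/><BBB/><DDD><BBB/></DDD><CCC/></AAA>"
aaa = ET.fromstring(AAA_XML)  # the <AAA> document element

# XPath from the slides -> ElementTree path relative to <AAA>
queries = {
    "/AAA/CCC": "./CCC",
    "/AAA/DDD/BBB": "./DDD/BBB",
    "//BBB": ".//BBB",
    "/AAA/*": "./*",
    "/AAA/BBB[1]": "./BBB[1]",
    "/AAA/BBB[last()]": "./BBB[last()]",
}

counts = {xpath: len(aaa.findall(path)) for xpath, path in queries.items()}
for xpath, n in counts.items():
    print(f"{xpath}: {n} node(s)")

# Positions count only the nodes matched by the step (the <BBB> children),
# so BBB[last()] is the third <BBB>, which is the fourth child of <AAA>.
first_bbb = aaa.findall("./BBB[1]")[0]
last_bbb = aaa.findall("./BBB[last()]")[0]
print(first_bbb is aaa[0], last_bbb is aaa[3])
```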

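  33. XPath Lab [www.zvon.org] • /AAA//BBB[1] selects the first <BBB> child of <AAA> and the <BBB> inside <DDD> • /AAA//BBB[last()] selects the third <BBB> child of <AAA> (slide callout: Position=3) and the <BBB> inside <DDD> (slide callout: Position=1) • Both are evaluated against:
         <AAA>
           <BBB/>
           <CCC/>
           <BBB/>
           <BBB/>
           <DDD>
             <BBB/>
           </DDD>
           <CCC/>
         </AAA>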

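  34. Position Explanation • "/AAA//BBB" is short for /AAA/descendant-or-self::node()/BBB, so the BBB step is evaluated once per parent and yields two lists: • Three <BBB> as the children of <AAA> • One <BBB> as the grandchild of <AAA> (inside <DDD>) • A positional predicate such as [1], [2], or [last()] is then applied to the answers in each list SEPARATELY • /AAA//BBB[1] returns both: • First <BBB> from the first list -- a child of <AAA> • First <BBB> from the second list -- a grandchild of <AAA> • /AAA//BBB[last()] likewise returns two nodes under the XPath 1.0 rules: • last() returns the position of the last node in each list • So it picks the third <BBB> child of <AAA> (position 3) and the <BBB> inside <DDD> (position 1) • A tool that returns nothing for this query, because it cannot decide between the two lists, deviates from the specification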
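The per-parent behaviour of positional predicates can be observed with Python's standard xml.etree.ElementTree, whose [1] and [last()] predicates are also applied relative to each parent. In this sketch ".//BBB" is the relative counterpart of /AAA//BBB, and describe() is a hypothetical helper that labels each result with its parent and its position among same-named siblings.

```python
import xml.etree.ElementTree as ET

# The sample document from the XPath Lab slides.
AAA_XML = "<AAA><BBB/><CCC/><BBB/><BBB/><DDD><BBB/></DDD><CCC/></AAA>"
aaa = ET.fromstring(AAA_XML)

# ElementTree elements do not store their parent, so build a lookup table.
parent_of = {child: parent for parent in aaa.iter() for child in parent}


def describe(elem):
    """Return e.g. 'AAA/BBB[3]': parent tag plus 1-based position among same-tag siblings."""
    parent = parent_of[elem]
    same_tag = [c for c in parent if c.tag == elem.tag]
    return f"{parent.tag}/{elem.tag}[{same_tag.index(elem) + 1}]"


first_in_each_list = [describe(e) for e in aaa.findall(".//BBB[1]")]
last_in_each_list = [describe(e) for e in aaa.findall(".//BBB[last()]")]

print(first_in_each_list)  # ['AAA/BBB[1]', 'DDD/BBB[1]']
print(last_in_each_list)   # ['AAA/BBB[3]', 'DDD/BBB[1]']
```

To pick a single node from the whole document instead, XPath lets you parenthesise the path first, as in (/AAA//BBB)[last()]; ElementTree does not support that form, so in Python one would index the result list of findall(".//BBB") directly.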

  35. XPath Lab [www.zvon.org] • //@id selects the two id attributes ("b1" and "b2") • //BBB[@id="b2"] selects the second <BBB> element • Both are evaluated against:
         <AAA>
           <BBB id = "b1"/>
           <BBB id = "b2"/>
           <BBB name = "bbb"/>
           <BBB/>
         </AAA>

  36. XPath Lab [www.zvon.org] • //*[count(BBB)=2] selects <DDD>, the only element with exactly two <BBB> children • //*[count(*)=3] selects <AAA> and the <CCC> child of <AAA>, each of which has exactly three child elements • Both are evaluated against:
         <AAA>
           <CCC>
             <BBB/>
             <BBB/>
             <BBB/>
           </CCC>
           <DDD>
             <BBB/>
             <BBB/>
           </DDD>
           <EEE>
             <CCC/>
             <DDD/>
           </EEE>
         </AAA>
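Python's standard xml.etree.ElementTree does not implement count(), so the sketch below does not run the XPath itself; it mimics the two count() predicates in plain Python by walking every element and counting its children. It is meant only as a way to see which elements satisfy each condition.

```python
import xml.etree.ElementTree as ET

# The sample document from the slide.
COUNT_XML = """
<AAA>
  <CCC><BBB/><BBB/><BBB/></CCC>
  <DDD><BBB/><BBB/></DDD>
  <EEE><CCC/><DDD/></EEE>
</AAA>
"""
root = ET.fromstring(COUNT_XML)

# //*[count(BBB)=2] : elements with exactly two <BBB> children
two_bbb = [e.tag for e in root.iter() if len(e.findall("BBB")) == 2]

# //*[count(*)=3] : elements with exactly three child elements
# (len() of an element is its number of child elements)
three_children = [e.tag for e in root.iter() if len(e) == 3]

print(two_bbb)         # ['DDD']
print(three_children)  # ['AAA', 'CCC']
```

A full XPath 1.0 engine, such as the javax.xml.xpath package mentioned later in the slides, can evaluate the count() expressions directly.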

  37. XPath Evaluation S/W • Many software tools now have built-in support for XPath 1.0 and 2.0 • Eg, • XPath Visualizer: Windows only • http://xpathvisualizer.codeplex.com/ • XMLSpy: Windows only • <oXygen/>: Mac and Windows • XMLPad: Windows only

  38. #1. XPath Visualizer Answer #2 for //letter/paragraph Answer #1 for //letter/paragraph Minor bug here

  39. #2. XMLSpy Choose Evaluate XPath

  40. #2. XMLSpy Answer #1 for //letter/paragraph

  41. #2. XMLSpy Answer #2 for //letter/paragraph

  42. #3. <oXygen/> Press Enter key Answer #1 for //letter/paragraph

  43. #3. <oXygen/> Answer #2 for //letter/paragraph

  44. #4 XMLPad

  45. XPath Evaluation in Programming • XPath Engines / Libraries • Apache Xalan-Java: http://xml.apache.org/xalan-j/ • Saxon: http://saxon.sourceforge.net/ • Jaxen: http://jaxen.codehaus.org/ • PL specific APIs • Java: package javax.xml.xpath + DOM • PHP: domxml’s xpath_eval() (v4), SimpleXML (v5)

  46. Eg. XPath in JAVA
         public Node findAddress(String name, Document source) throws Exception {
             // need to recreate a few helper objects
             XMLParserLiaison xpathSupport = new XMLParserLiaisonDefault();
             XPathProcessor xpathParser = new XPathProcessorImpl(xpathSupport);
             PrefixResolver prefixResolver =
                 new PrefixResolverDefault(source.getDocumentElement());

             // create the XPath and initialize it
             XPath xp = new XPath();
             // note: name is spliced into the expression as-is, so it must not contain a single quote
             String xpString = "//address[child::addressee[text() = '" + name + "']]";
             xpathParser.initXPath(xp, xpString, prefixResolver);

             // now execute the XPath select statement
             XObject list = xp.execute(xpathSupport, source.getDocumentElement(), prefixResolver);
             return list.nodeset().item(0);
         }
     http://www.javaworld.com/javaworld/jw-09-2000/jw-0908-xpath.html?page=3

  47. Eg. SimpleXML in PHP
         <?php
             $xml = simplexml_load_file('employees.xml');

             echo "<strong>Using direct method...</strong><br />";
             $names = $xml->xpath('/employees/employee/name');
             foreach($names as $name) {
                 echo "Found $name<br />";
             }
             echo "<br />";

             echo "<strong>Using indirect method...</strong><br />";
             $employees = $xml->xpath('/employees/employee');
             foreach($employees as $employee) {
                 echo "Found {$employee->name}<br />";
             }
             echo "<br />";

             echo "<strong>Using wildcard method...</strong><br />";
             $names = $xml->xpath('//name');
             foreach($names as $name) {
                 echo "Found $name<br />";
             }
         ?>
     http://www.tuxradar.com/practicalphp/12/3/3

  48. Lab #2 (DUE: Sep. 25 11:55PM) • https://online.ist.psu.edu/ist516/labs • Tasks: • Individual lab • Using an XML file, practice XPath queries • Turn-in • XPath queries and their English interpretation • Screenshots of the results of the XPath queries
