
XML Data Management 5. Extracting Data from XML: XPath



  1. XML Data Management 5. Extracting Data from XML: XPath Werner Nutt based on slides by Sara Cohen, Jerusalem

  2. Extracting Data from XML
     • Data stored in an XML document must be extracted before applications can use it
     • Data can be extracted by a program …
     • … or using a declarative language: XPath
     • XPath is used extensively in other languages, e.g.,
         • XSL
         • XML Schema
         • XQuery
         • XPointer
     • Versions: XPath 1.0 (allows for efficient execution), XPath 2.0 (not yet widely supported)

  3. Our XML document
     <?xml version="1.0" encoding="ISO-8859-1"?>
     <catalog>
       <cd country="UK">
         <title>Dark Side of the Moon</title>
         <artist>Pink Floyd</artist>
         <price>10.90</price>
       </cd>
       <cd country="UK">
         <title>Space Oddity</title>
         <artist>David Bowie</artist>
         <price>9.90</price>
       </cd>
       <cd country="USA">
         <title>Aretha: Lady Soul</title>
         <artist>Aretha Franklin</artist>
         <price>9.90</price>
       </cd>
     </catalog>
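The two extraction styles named earlier (by a program, or with a path expression) can be compared on this document. The following is a minimal sketch, assuming Python's standard-library `xml.etree.ElementTree`, which implements only a subset of XPath and evaluates paths relative to an element rather than from the document root. The variable names are illustrative, and the XML declaration is omitted from the string.

```python
import xml.etree.ElementTree as ET

# The catalog document from the slide (XML declaration omitted).
CATALOG_XML = """
<catalog>
  <cd country="UK"><title>Dark Side of the Moon</title><artist>Pink Floyd</artist><price>10.90</price></cd>
  <cd country="UK"><title>Space Oddity</title><artist>David Bowie</artist><price>9.90</price></cd>
  <cd country="USA"><title>Aretha: Lady Soul</title><artist>Aretha Franklin</artist><price>9.90</price></cd>
</catalog>
"""

root = ET.fromstring(CATALOG_XML)  # root is the <catalog> element

# Extraction "by a program": walk the tree by hand.
titles_by_loop = []
for cd in root:               # children of <catalog>
    for child in cd:          # children of each <cd>
        if child.tag == "title":
            titles_by_loop.append(child.text)

# Extraction with a path expression; "./cd/title" is relative to <catalog>,
# so it plays the role of /catalog/cd/title.
titles_by_path = [t.text for t in root.findall("./cd/title")]

print(titles_by_path)
```

Both lists hold the same three titles; the path version states what to select, while the loop spells out how to find it.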

  4. The XML document as a DOM tree
     [Figure: catalog.xml as a tree. The root catalog has three cd children; each cd has a country attribute (UK, UK, USA) and title, artist, and price children holding the values of the document.]

  5. XPath: Ideas
     A language of path expressions:
     • a document D is a tree
     • an expression E specifies possible paths in D
     • E returns the nodes in D that can be reached from the root by walking along an E-path
     Path expressions specify
     • navigation in docs
     • tests on nodes

  6. XPath Syntax: Path Expressions
     • /    at the beginning of an XPath expression represents the root of the document
     • /    between element names represents a parent-child relationship
     • //   represents an ancestor-descendant relationship
     • foo  element name: the path has to go through an element foo, e.g., /cd
     • *    wildcard, represents any element
     • @    marks an attribute
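The path operators listed above can be tried out in a small sketch, assuming Python's standard-library `xml.etree.ElementTree`. It supports only a subset of XPath, and its paths are relative to the element they are called on, so `./cd` evaluated on the `catalog` element stands in for `/catalog/cd`. Variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The catalog document used throughout the slides (XML declaration omitted).
CATALOG_XML = """
<catalog>
  <cd country="UK"><title>Dark Side of the Moon</title><artist>Pink Floyd</artist><price>10.90</price></cd>
  <cd country="UK"><title>Space Oddity</title><artist>David Bowie</artist><price>9.90</price></cd>
  <cd country="USA"><title>Aretha: Lady Soul</title><artist>Aretha Franklin</artist><price>9.90</price></cd>
</catalog>
"""

root = ET.fromstring(CATALOG_XML)  # the <catalog> element

cds = root.findall("./cd")                         # like /catalog/cd      (parent-child)
prices = root.findall(".//price")                  # like /catalog//price  (ancestor-descendant)
cd_children = root.findall("./cd/*")               # like /catalog/cd/*    (wildcard)
cds_with_country = root.findall("./cd[@country]")  # like /catalog/cd[@country] (attribute test)

print(len(cds), len(prices), len(cd_children), len(cds_with_country))
```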

  7. XPath Syntax: Conditions and Built-Ins
     • [condition]      specifies a condition, e.g., /cd[price < 10]
     • [N]              position of a child, e.g., /cd[2]
     • contains(s1,s2)  tests whether string s1 contains string s2, e.g., /cd[contains(title, "Moon")]
     • name()           name of an element, e.g., /*[name()="cd"] is equivalent to /cd
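A sketch of how these conditions behave on the catalog document, assuming Python's standard-library `xml.etree.ElementTree`. That module understands positional predicates such as `[2]`, but not numeric comparisons, `contains()`, or `name()`, so those three are emulated here with ordinary Python filters; a full XPath 1.0 engine would evaluate them directly.

```python
import xml.etree.ElementTree as ET

# The catalog document used throughout the slides (XML declaration omitted).
CATALOG_XML = """
<catalog>
  <cd country="UK"><title>Dark Side of the Moon</title><artist>Pink Floyd</artist><price>10.90</price></cd>
  <cd country="UK"><title>Space Oddity</title><artist>David Bowie</artist><price>9.90</price></cd>
  <cd country="USA"><title>Aretha: Lady Soul</title><artist>Aretha Franklin</artist><price>9.90</price></cd>
</catalog>
"""

root = ET.fromstring(CATALOG_XML)  # the <catalog> element

# [N]: positional predicate, supported directly (positions start at 1).
second_cd = root.findall("./cd[2]")

# [price < 10]: emulated, because ElementTree has no numeric comparison.
cheap = [cd for cd in root.findall("./cd") if float(cd.findtext("price")) < 10]

# [contains(title, "Moon")]: emulated with Python's substring test.
moon = [cd for cd in root.findall("./cd") if "Moon" in cd.findtext("title")]

# /*[name()="cd"]: emulated by comparing the tag of every child element.
named_cd = [e for e in root.findall("./*") if e.tag == "cd"]

print([cd.findtext("title") for cd in cheap])
```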

  8. Getting the top element of the document
     /catalog
     [Figure: catalog.xml tree]

  9. Finding child nodes
     /catalog/cd
     [Figure: catalog.xml tree]

  10. Finding descendant nodes
      /catalog/cd/price
      [Figure: catalog.xml tree]

  11. Condition on elements
      /catalog/cd[price<10]
      [Figure: catalog.xml tree]

  12. // represents any top down path in the document
      /catalog//title
      //title
      [Figure: catalog.xml tree]

  13. * represents any element name in the document
      /catalog/cd/*
      [Figure: catalog.xml tree]

  14. What do the following expressions return?
      • /*/*
      • //*
      • //*[price=9.90]
      • //*[price=9.90]/*
      * represents any element name in the document
      [Figure: catalog.xml tree]
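One way to check answers to these four questions is to emulate the expressions, assuming Python's standard-library `xml.etree.ElementTree`. The module has no numeric predicates and its `.//*` excludes the start element, so `//*` is emulated with `iter()` and `[price=9.90]` with a hypothetical helper `has_price`, introduced here only for illustration.

```python
import xml.etree.ElementTree as ET

# The catalog document used throughout the slides (XML declaration omitted).
CATALOG_XML = """
<catalog>
  <cd country="UK"><title>Dark Side of the Moon</title><artist>Pink Floyd</artist><price>10.90</price></cd>
  <cd country="UK"><title>Space Oddity</title><artist>David Bowie</artist><price>9.90</price></cd>
  <cd country="USA"><title>Aretha: Lady Soul</title><artist>Aretha Franklin</artist><price>9.90</price></cd>
</catalog>
"""

root = ET.fromstring(CATALOG_XML)  # the <catalog> element


def has_price(elem, value):
    """Emulates the predicate [price=value]: true if some price child equals value."""
    return any(float(p.text) == value for p in elem.findall("price"))


level_two = root.findall("./*")                           # /*/*  : children of the top element
all_elements = list(root.iter())                          # //*   : every element, catalog included
priced = [e for e in root.iter() if has_price(e, 9.90)]   # //*[price=9.90]
priced_children = [c for e in priced for c in e]          # //*[price=9.90]/*

print(len(level_two), len(all_elements), len(priced), len(priced_children))
```

On this document the sketch finds the three `cd` elements, all 13 elements, the two `cd` elements priced at 9.90, and the six children of those two.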

  15. Position based condition
      /catalog/cd[1]
      /catalog/cd[last()]
      [Figure: catalog.xml tree]
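Positional conditions are among the predicates that Python's standard-library `xml.etree.ElementTree` supports directly, so a short sketch can show them at work. Paths are relative to the `catalog` element, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The catalog document used throughout the slides (XML declaration omitted).
CATALOG_XML = """
<catalog>
  <cd country="UK"><title>Dark Side of the Moon</title><artist>Pink Floyd</artist><price>10.90</price></cd>
  <cd country="UK"><title>Space Oddity</title><artist>David Bowie</artist><price>9.90</price></cd>
  <cd country="USA"><title>Aretha: Lady Soul</title><artist>Aretha Franklin</artist><price>9.90</price></cd>
</catalog>
"""

root = ET.fromstring(CATALOG_XML)  # the <catalog> element

first_cd = root.find("./cd[1]")       # like /catalog/cd[1]; positions start at 1, not 0
last_cd = root.find("./cd[last()]")   # like /catalog/cd[last()]

print(first_cd.findtext("title"), "|", last_cd.findtext("title"))
```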

  16. | stands for union
      (//title | //price)
      [Figure: catalog.xml tree]
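The union operator is not part of the XPath subset in Python's standard-library `xml.etree.ElementTree`, so the sketch below emulates `(//title | //price)` with a single traversal. Iterating once over the tree keeps the selected nodes in document order and free of duplicates, which is how an XPath 1.0 node-set is usually presented.

```python
import xml.etree.ElementTree as ET

# The catalog document used throughout the slides (XML declaration omitted).
CATALOG_XML = """
<catalog>
  <cd country="UK"><title>Dark Side of the Moon</title><artist>Pink Floyd</artist><price>10.90</price></cd>
  <cd country="UK"><title>Space Oddity</title><artist>David Bowie</artist><price>9.90</price></cd>
  <cd country="USA"><title>Aretha: Lady Soul</title><artist>Aretha Franklin</artist><price>9.90</price></cd>
</catalog>
"""

root = ET.fromstring(CATALOG_XML)  # the <catalog> element

# Emulation of (//title | //price): one pass over all elements in document order.
wanted = {"title", "price"}
union = [e for e in root.iter() if e.tag in wanted]

print([e.text for e in union])
```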

  17. @ marks attributes
      /catalog/cd[@country="UK"]
      [Figure: catalog.xml tree]

  18. @ marks attributes
      /catalog/cd/data(@country)
      (data() is an XPath 2.0 function that returns the attribute values; in XPath 1.0, /catalog/cd/@country selects the attribute nodes themselves)
      [Figure: catalog.xml tree]
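Both attribute examples can be approximated in a sketch, assuming Python's standard-library `xml.etree.ElementTree`. It supports attribute predicates such as `[@country='UK']`, but it cannot return attribute nodes or call `data()`, so the attribute values are read with `Element.get` instead.

```python
import xml.etree.ElementTree as ET

# The catalog document used throughout the slides (XML declaration omitted).
CATALOG_XML = """
<catalog>
  <cd country="UK"><title>Dark Side of the Moon</title><artist>Pink Floyd</artist><price>10.90</price></cd>
  <cd country="UK"><title>Space Oddity</title><artist>David Bowie</artist><price>9.90</price></cd>
  <cd country="USA"><title>Aretha: Lady Soul</title><artist>Aretha Franklin</artist><price>9.90</price></cd>
</catalog>
"""

root = ET.fromstring(CATALOG_XML)  # the <catalog> element

# Like /catalog/cd[@country="UK"]: select elements by attribute value.
uk_cds = root.findall("./cd[@country='UK']")

# Stand-in for /catalog/cd/data(@country): read the attribute value of each cd.
countries = [cd.get("country") for cd in root.findall("./cd")]

print([cd.findtext("title") for cd in uk_cds], countries)
```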

  19. How would you write:
      The price of the cds whose artist is David Bowie?
      [Figure: catalog.xml tree]
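One possible answer, offered as a suggestion rather than as the lecture's official solution, is `/catalog/cd[artist="David Bowie"]/price`. The sketch below checks it with Python's standard-library `xml.etree.ElementTree`, whose XPath subset includes predicates of the form `[tag='text']`; the path is written relative to the `catalog` element.

```python
import xml.etree.ElementTree as ET

# The catalog document used throughout the slides (XML declaration omitted).
CATALOG_XML = """
<catalog>
  <cd country="UK"><title>Dark Side of the Moon</title><artist>Pink Floyd</artist><price>10.90</price></cd>
  <cd country="UK"><title>Space Oddity</title><artist>David Bowie</artist><price>9.90</price></cd>
  <cd country="USA"><title>Aretha: Lady Soul</title><artist>Aretha Franklin</artist><price>9.90</price></cd>
</catalog>
"""

root = ET.fromstring(CATALOG_XML)  # the <catalog> element

# Relative form of /catalog/cd[artist="David Bowie"]/price.
bowie_prices = [p.text for p in root.findall("./cd[artist='David Bowie']/price")]

print(bowie_prices)
```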

  20. Navigational Axes (plural of “axis”)
      • We have discussed the following axes:
          • child (/)
          • descendant (//)
          • attribute (@)
      • These symbols are actually shorthands, e.g., /cd//price selects the same nodes as /child::cd/descendant::price (strictly, // abbreviates /descendant-or-self::node()/)
      • There are additional shorthands, e.g.,
          • self (.)
          • parent (..)
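The `.` and `..` shorthands can be tried in a sketch, assuming Python's standard-library `xml.etree.ElementTree`. It accepts only the abbreviated syntax (no `child::` or `parent::` axis names), and `..` cannot climb above the element the search starts from.

```python
import xml.etree.ElementTree as ET

# The catalog document used throughout the slides (XML declaration omitted).
CATALOG_XML = """
<catalog>
  <cd country="UK"><title>Dark Side of the Moon</title><artist>Pink Floyd</artist><price>10.90</price></cd>
  <cd country="UK"><title>Space Oddity</title><artist>David Bowie</artist><price>9.90</price></cd>
  <cd country="USA"><title>Aretha: Lady Soul</title><artist>Aretha Franklin</artist><price>9.90</price></cd>
</catalog>
"""

root = ET.fromstring(CATALOG_XML)  # the <catalog> element

# "." is the self shorthand: it selects the current node.
self_nodes = root.findall(".")

# ".." is the parent shorthand: the parents of all title elements are the cd elements.
title_parents = root.findall(".//title/..")

print([e.tag for e in self_nodes], [e.tag for e in title_parents])
```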

  21. Additional Axes
      • ancestor: contains all ancestors (parent, grandparent, etc.) of the current node
      • ancestor-or-self: contains the current node plus all its ancestors (parent, grandparent, etc.)
      • descendant-or-self: contains the current node plus all its descendants (children, grandchildren, etc.)
      • following: contains everything in the document after the closing tag of the current node
      • following-sibling: contains all siblings after the current node
      • preceding: contains everything in the document that is before the starting tag of the current node
      • preceding-sibling: contains all siblings before the current node
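To make three of these axes concrete, here is a sketch that emulates `ancestor`, `following-sibling`, and `preceding-sibling` by hand, assuming Python's standard-library `xml.etree.ElementTree`, which does not implement them. The helper functions and the `parent_of` map are hypothetical names introduced for this illustration, and they handle element nodes only (the real axes also cover text nodes, and `ancestor` also reaches the document root node).

```python
import xml.etree.ElementTree as ET

# The catalog document used throughout the slides (XML declaration omitted).
CATALOG_XML = """
<catalog>
  <cd country="UK"><title>Dark Side of the Moon</title><artist>Pink Floyd</artist><price>10.90</price></cd>
  <cd country="UK"><title>Space Oddity</title><artist>David Bowie</artist><price>9.90</price></cd>
  <cd country="USA"><title>Aretha: Lady Soul</title><artist>Aretha Franklin</artist><price>9.90</price></cd>
</catalog>
"""

root = ET.fromstring(CATALOG_XML)  # the <catalog> element

# ElementTree elements do not know their parent, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(node):
    """Element ancestors of node, nearest first (parent, grandparent, ...)."""
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result


def following_siblings(node):
    """Element siblings that come after node under the same parent."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[siblings.index(node) + 1:]


def preceding_siblings(node):
    """Element siblings that come before node under the same parent."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[:siblings.index(node)]


# Current node: the artist element of the second cd (David Bowie).
artist = root.find("./cd[2]/artist")

print([e.tag for e in ancestors(artist)])
print([e.tag for e in following_siblings(artist)])
print([e.tag for e in preceding_siblings(artist)])
```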

  22. Info and Tools
      You will find more info in the next lecture and:
      • XPath 1.0 specification at W3C (there is also XPath 2.0, which is not yet widely supported)
      • XPath tutorial at W3Schools
      • Mulberry XPath Quick Reference
      Tools for our course
      • XPath plugin for Eclipse
      • Saxon XSLT and XQuery Processor
      • Kernow front end for Saxon (I’ll let you know the code for unlocking it)
      • XMLQuire XML and XPath Editor and Visualizer
