Learn the basics of XML, XSLT, and XPath with this training program. Understand the concepts and syntax of these technologies to build structured XML documents.
Product Training Program XML / XPath / XSLT Primer
What are XML, XSLT, and XPath? XML is the Extensible Markup Language. It is a specification for defining a document structure for organizing information. A specific XML document is said to have a Schema, or a defined structure that it is expected to follow. Specific schemas depend entirely on the intended use case of an individual XML document. XSLT stands for Extensible Stylesheet Language Transformations. It is used to build an instruction stylesheet that tells a transformation engine how to convert one XML document into another XML document. This is useful because it allows data from the first XML document to be placed within another XML document of a completely different structure. XPath, the XML Path Language, is a mechanism for defining a path to a node within an XML document. XSLT uses XPath to specify the location of specific data in the source XML document when assigning values to nodes in the target XML document.
Coding Not Required, Understanding Is PilotFish’s Data Mapper abstracts away the heavy lifting of memorizing a programming language and writing complex syntax. It does this through a provided graphical tool that allows the XSLT documents to be built through simple drag-and-drop operations. While coding is not required, an understanding of the XSLT concepts described here is, in order to know how to build the structures in the graphical tool. This guide will go in depth into how XSLT is coded. Every coding concept covered here has a graphical tool available in the Data Mapper to perform the same functionality.
Sample Document All upcoming slides will reference the previous sample XML document.
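The sample document itself is not reproduced in this text. As a stand-in, the sketch below is consistent with the names the following slides use (Root, Fish, the index attribute, FirstName, Type, and the values "Bob", "Gold…" and "Halibut"). Everything else, including the number of Fish elements and the name "Alice", is an assumption made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical stand-in for the sample document; values are assumed -->
<Root>
  <Fish index="1">
    <FirstName>Bob</FirstName>
    <Type>Goldfish</Type>
  </Fish>
  <Fish index="2">
    <FirstName>Alice</FirstName>
    <Type>Halibut</Type>
  </Fish>
  <Fish index="3">
    <FirstName>Bob</FirstName>
    <Type>Halibut</Type>
  </Fish>
</Root>
```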
XML Document Structure • The data in XML is organized into a series of Nodes arranged in a clear hierarchy, with each node below the root having a single parent, and an unlimited number of possible siblings and children. • There are three main types of Nodes in XML: • Element nodes are the “primary” nodes, represented by angle brackets (<>). Example: <FirstName> • Attribute nodes are within element nodes, represented by a name="value" syntax. Example: <Fish index="1"> • Text nodes are the raw text written between an element’s opening and closing tags. Always be careful that text is wrapped by opening and closing element nodes. Example: <FirstName>Bob</FirstName> • Element nodes must ALWAYS be closed. Closing can be done through a separate closing tag, or by having the element close itself. Examples: <FirstName></FirstName> OR <FirstName /> • The XML declaration (for example, <?xml version="1.0" encoding="UTF-8"?>) is generally unchanging. It is optional in XML 1.0 but recommended, and when present it must be the very first thing in the document.
XML Namespaces • Because of the potential for element naming clashes (i.e., multiple elements named FirstName), XML documents frequently use namespaces. • A namespace is a unique string of text that the creator can guarantee no one else will ever use. To back this guarantee, URLs for a site that the creator owns are generally used. • Namespaces are generally assigned at the root element, using the xmlns (XML Namespace) declaration. Example: xmlns="http://www.namespace.com/mynamespace" • If a single namespace is being used, it can be declared the way shown above. Multiple namespaces will use prefixes. A prefix is a short bit of text that is placed before each element name. It is treated as a stand-in for the full namespace, so the parser qualifies each element carrying that prefix with the full namespace. • Namespace prefixes don’t need the same uniqueness guarantee as the namespace itself; they only need to be unique within the document they are used in. • Declaring a namespace with a prefix in the root element: xmlns:fish="http://www.namespace.com/mynamespace" • Using a namespace with a prefix on elements within the document: <fish:FirstName></fish:FirstName>
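To make the difference between a default namespace and a prefixed one concrete, here is a minimal sketch of a document that uses both. The second namespace URI and the Note element are hypothetical and exist only for this illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Default namespace: applies to Root and to every unprefixed descendant element -->
<Root xmlns="http://www.namespace.com/mynamespace"
      xmlns:fish="http://www.namespace.com/fish">
  <!-- Prefixed elements belong to the namespace bound to "fish" (hypothetical URI) -->
  <fish:Fish index="1">
    <fish:FirstName>Bob</fish:FirstName>
  </fish:Fish>
  <!-- Unprefixed element: falls back to the default namespace -->
  <Note>Hypothetical element</Note>
</Root>
```

Note that the unprefixed index attribute is in no namespace at all; default namespace declarations apply to elements only.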
XPath Axes • XPath location paths are composed of steps, each of which moves along an axis (plural: “axes”). They read much like file system paths. Expressions are always evaluated against a context, or “current node”, so relative paths are always resolved from that context. • The most common axes have abbreviated, symbolic forms. They include: • / - The step separator. It divides the name of a parent element from that of a child element. Example: Fish/FirstName • This symbol, if used as the first character of an XPath expression with no preceding name, refers to the root of the entire document. Example: /Root/Fish/FirstName • // - Shorthand for the descendant-or-self axis. The node that follows may sit any number of levels below the starting point. At the start of an expression the starting point is the document root, so //FirstName finds every FirstName in the document; .//FirstName searches below the current node only. • . - A single dot is shorthand for the current node. • .. - Two dots is shorthand for the parent node. It can be chained together to move in reverse up the document. Example: ../../Root/Fish/FirstName • @ - The “at” symbol represents an attribute node, and is always followed by the attribute’s name. Example: /Root/Fish/@index
XPath Axes (Continued) XPath includes additional axes with readable names that perform both similar and more advanced functions than the symbolic axes. These axes include: ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, and self.
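Named axes are written as axis::name. As a rough illustration, the stylesheet below pairs a few named axes with their symbolic equivalents. It assumes a source document shaped like <Root><Fish index="1"><FirstName>…</FirstName></Fish>…</Root>; the output element names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <Axes>
      <!-- child::Root/child::Fish is the long form of Root/Fish -->
      <FishCount><xsl:value-of select="count(child::Root/child::Fish)"/></FishCount>
      <!-- attribute::index is the long form of @index -->
      <FirstIndex><xsl:value-of select="Root/Fish[1]/attribute::index"/></FirstIndex>
      <!-- following-sibling has no symbolic shorthand -->
      <NextName><xsl:value-of select="Root/Fish[1]/following-sibling::Fish[1]/FirstName"/></NextName>
      <!-- ancestor walks upward any number of levels; parent::node() is the long form of .. -->
      <TopName><xsl:value-of select="name(Root/Fish[1]/FirstName/ancestor::Root)"/></TopName>
    </Axes>
  </xsl:template>
</xsl:stylesheet>
```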
XPath Namespaces XPath needs to be aware of any namespaces being used in the XML document in order to find nodes properly. Even if the XML document declares its namespace without a prefix, the XPath expression should use one: in XPath 1.0 an unprefixed name only matches elements that are in no namespace. Generally, XPath evaluation engines (including XSLT) provide the means to declare namespaces independent of the document being evaluated. Example of XPath with prefix: /fish:Root/fish:Fish/fish:FirstName If the namespace isn’t known, or can otherwise be ignored, XPath has a function, local-name(), that allows searching purely by the local part of the name. Example: //*[local-name() = 'FirstName']
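In XSLT, the namespace is made known to XPath by declaring it on the stylesheet element. The sketch below shows both approaches side by side, assuming a source document whose elements are all in the namespace http://www.namespace.com/mynamespace; the output element names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fish="http://www.namespace.com/mynamespace"
    exclude-result-prefixes="fish">
  <xsl:template match="/">
    <Names>
      <!-- Namespace-aware path: every step carries the prefix declared above -->
      <Qualified>
        <xsl:value-of select="/fish:Root/fish:Fish/fish:FirstName"/>
      </Qualified>
      <!-- Namespace-agnostic path: compares only the local part of each name -->
      <Local>
        <xsl:value-of select="/*[local-name() = 'Root']/*[local-name() = 'Fish']/*[local-name() = 'FirstName']"/>
      </Local>
    </Names>
  </xsl:template>
</xsl:stylesheet>
```

The prefix in the stylesheet does not have to match the prefix used in the source document; only the namespace URI has to be the same.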
XPath Functions • XPath implementations contain a set of useful functions, which can perform more advanced operations. • Function calls in XPath are similar to most programming languages: functionName(parameter, parameter, etc…) • Functions can also return a result of a specific type, depending on the function. • One common function is the substring() function, which has this signature: String substring(expression, index, [length]). • Breakdown • This function returns a value of type String • The function is declared with the name “substring” • It requires two arguments: the XPath expression for the value to substring, and the index to start the substring at. The index is 1-based, so 1 is the first character. • There is a third, optional argument, length, which determines how long the substring should be. If omitted, the substring runs to the end of the original String. • Using our sample document, if the context node is the first Fish element, and we want to grab the “Gold” part of the Type element, we would write the function like this: substring(Type,1,4)
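As a small sketch of function calls in context, the stylesheet below applies substring() and a few other standard XPath 1.0 functions to the first Fish element. It assumes that element has FirstName and Type children and that Type holds a value beginning with "Gold", such as "Goldfish"; the output element names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <Functions>
      <!-- Characters 1 through 4 of Type; string positions start at 1 -->
      <Prefix><xsl:value-of select="substring(Root/Fish[1]/Type, 1, 4)"/></Prefix>
      <!-- Without the third argument the substring runs to the end of the string -->
      <Rest><xsl:value-of select="substring(Root/Fish[1]/Type, 5)"/></Rest>
      <!-- Other XPath 1.0 functions follow the same call syntax -->
      <Length><xsl:value-of select="string-length(Root/Fish[1]/Type)"/></Length>
      <Joined><xsl:value-of select="concat(Root/Fish[1]/FirstName, ' the ', Root/Fish[1]/Type)"/></Joined>
    </Functions>
  </xsl:template>
</xsl:stylesheet>
```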
XPath Functions (Continued) There are many XPath functions, and they all have drag-and-drop components in the PilotFish Data Mapper to access them. Many online references cover XPath functions in more detail.
XPath Predicates XPath expressions can contain the equivalent of a “where” clause, called a predicate. Predicates can be used anywhere in an XPath expression to express a condition at points where the expression could return more than one possible result. The syntax for a predicate is: axis[expression], where “expression” is implicitly evaluated and should return a boolean (true/false) value. Expressions which are numeric represent indices (starting at 1), and are used when an expression returns more than one node. This example selects the second Fish element: /Root/Fish[2] Otherwise, expressions include a boolean test, usually involving attributes or child elements of the element being tested. This example selects the Fish element with the Type “Halibut”: /Root/Fish[Type = 'Halibut'] Expressions can be joined using “and” or “or”.
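The predicate forms described above can be sketched together in one stylesheet. This assumes a source document with several /Root/Fish elements, each carrying an index attribute plus FirstName and Type children; the value "Bob" and the output element names are illustrative assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <Predicates>
      <!-- Numeric predicate: the second Fish element -->
      <Second><xsl:value-of select="/Root/Fish[2]/FirstName"/></Second>
      <!-- Boolean predicate on a child element -->
      <Halibut><xsl:value-of select="/Root/Fish[Type = 'Halibut']/FirstName"/></Halibut>
      <!-- Boolean predicate on an attribute -->
      <ByIndex><xsl:value-of select="/Root/Fish[@index = '1']/FirstName"/></ByIndex>
      <!-- Two conditions joined with "and"; count() reports how many nodes matched -->
      <Both><xsl:value-of select="count(/Root/Fish[Type = 'Halibut' and FirstName = 'Bob'])"/></Both>
    </Predicates>
  </xsl:template>
</xsl:stylesheet>
```

When a predicate matches more than one node, xsl:value-of in XSLT 1.0 outputs the string value of the first match in document order.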
XSLT Basics XSLT builds a stylesheet that contains a series of instructions that, once executed, build a standard XML document. XSLT stylesheets are XML documents, with special instruction elements that use the XSLT namespace: “http://www.w3.org/1999/XSL/Transform”. XSLT uses XPath for retrieving data from the source XML document, while it directly declares all elements in the target XML document. XPath is never used by XSLT for operations involving the target document. All XSLT operations revolve around Templates. XSLT instruction elements are processed and executed by an XSLT transformation engine. These engines take in the stylesheet and source XML document, and process them to produce the target XML document. Common XSLT transformation engines include Xalan and Saxon.
XSLT Declaration & Namespaces XSLT stylesheets are declared with a common root element, a required version attribute, and the XSLT namespace bound to a prefix (conventionally “xsl”): <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> … </xsl:stylesheet> Any additional namespaces that are referenced in the document (such as the namespaces of the source and/or target XML documents) need to also be declared in the root “stylesheet” element, along with a prefix. Sometimes, namespaces will be used in XSLT stylesheets that shouldn’t appear in the output document. These commonly occur when using XSLT extensions. To exclude namespace prefixes, the following attribute needs to be added to the “stylesheet” element: exclude-result-prefixes="prefix"
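Put together, a stylesheet header might look like the following sketch. The "src" prefix, the "ext" prefix and the extension namespace URI are hypothetical; exclude-result-prefixes takes a whitespace-separated list of the prefixes to keep out of the output.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:src="http://www.namespace.com/mynamespace"
    xmlns:ext="http://www.example.com/hypothetical-extension"
    exclude-result-prefixes="src ext">
  <!-- Serialize the result as indented XML -->
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <!-- Without exclude-result-prefixes, Target would carry xmlns:src and xmlns:ext -->
    <Target>
      <xsl:value-of select="/src:Root/src:Fish/src:FirstName"/>
    </Target>
  </xsl:template>
</xsl:stylesheet>
```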
XSLT Versions XSLT, in its lifespan, has seen 3 major versions released. XSLT 3.0 is itself an open W3C standard, but the XSLT 3.0 support described here comes from Saxonica and requires a paid license. PilotFish does not currently license XSLT 3.0 with its product; however, XSLT 3.0 is supported and can be enabled if the client has a license. XSLT 1.0 and 2.0 are both fully supported in the PilotFish product. XSLT 2.0 tends to have significantly higher performance, as well as newer and more advanced features. However, PilotFish has a small number of legacy XSLT extensions built into the Data Mapper that depend on XSLT 1.0. When those specific PilotFish features are required, using XSLT 1.0 is recommended. Otherwise, using XSLT 2.0 is recommended. The features that depend on XSLT 1.0 are all shortcuts to make complex XSLT operations easier, so they can still be done in XSLT 2.0 with a little more work.
XSLT Templates Templates are the engine of XSLT. They are executed by matching on an XPath expression. Example: <xsl:template match="/Root"></xsl:template> The XPath expressions being matched are always evaluated against the source document only. Templates can match on any valid XPath pattern, including relative ones such as “Fish”, which match that element wherever it appears in the source. Other templates are only checked for matches when the executing template requests it with the “apply-templates” instruction. While a stylesheet has no limit to the number of templates, there should be one “root” template. This template matches either on the root element, or another XPath expression that points to a single element high up in the source XML document. This is the template that starts the whole transformation. Templates can also be named, in which case they will be called not based on an automatic XPath match, but by calling them specifically by their name within another template.
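A minimal sketch of the three kinds of template described above: a "root" template, a template matched through apply-templates, and a named template. It assumes a source document with Fish children under Root, each with a FirstName; the template name "footer" and the output element names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- "Root" template: starts the transformation -->
  <xsl:template match="/Root">
    <Names>
      <!-- Ask the engine to find a matching template for each Fish child -->
      <xsl:apply-templates select="Fish"/>
      <!-- Named templates run only when called explicitly -->
      <xsl:call-template name="footer"/>
    </Names>
  </xsl:template>

  <!-- Matched automatically for each Fish selected above -->
  <xsl:template match="Fish">
    <Name><xsl:value-of select="FirstName"/></Name>
  </xsl:template>

  <!-- Named template: has no match pattern, only a name -->
  <xsl:template name="footer">
    <Total><xsl:value-of select="count(/Root/Fish)"/></Total>
  </xsl:template>
</xsl:stylesheet>
```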
XSLT Outputs • XSLT always outputs the target XML document, specified by the logic within the stylesheet. • Target XML elements and attributes can be hardcoded as if writing a normal XML document: <Element attribute="value">Value</Element> • Target XML elements and attributes can also be built dynamically using XSLT instruction elements. • Elements: <xsl:element name="Element">Value</xsl:element> produces: <Element>Value</Element> • If the above also wrapped: <xsl:attribute name="attribute">Value</xsl:attribute>, it would produce: <Element attribute="Value">Value</Element> • Target XML elements can be populated with values from the source XML. This is the most common type of XSLT operation. It is done with the “value-of” element. • To map FirstName from the source to the target element Value, the expression would be: <Value><xsl:value-of select="Fish/FirstName" /></Value> • Notice how the XPath expression appears in the “select” attribute. This expression is how the source data is retrieved. Select is a common XSLT attribute that appears in many places. Generally, wherever it appears, an XPath expression pointing to a source node is expected.
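The three ways of producing output can be combined in one template, as in this sketch. It assumes a source document with /Root/Fish/FirstName; the Target wrapper element is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/Root">
    <Target>
      <!-- Literal result element with a hardcoded attribute -->
      <Element attribute="value">Value</Element>
      <!-- The same shape built dynamically; xsl:attribute must come before any child content -->
      <xsl:element name="Element">
        <xsl:attribute name="attribute">Value</xsl:attribute>
        <xsl:text>Value</xsl:text>
      </xsl:element>
      <!-- Value copied from the source document -->
      <Value><xsl:value-of select="Fish/FirstName"/></Value>
    </Target>
  </xsl:template>
</xsl:stylesheet>
```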
XSLT Conditions • Conditional logic in XSLT is done one of two ways. • A simple <xsl:if test="expression" /> element exists to test a single condition. If the expression in the “test” attribute resolves to “true”, then the children of the IF element will be executed. Otherwise, they will be ignored. • For more complex operations, an <xsl:choose/> operation is needed. XSLT “choose” elements wrap around mutually exclusive child elements that test various conditions. Whichever condition evaluates as true first is the one that is executed. • <xsl:when test="expression"/> is an element that operates the same as the IF element. If the expression resolves to true, the contents will run. • <xsl:otherwise /> is a catch-all, “else” type of condition. If all WHEN elements have resolved to false, this will be run at the end.
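A conditional block along these lines might look like the following sketch, which is not the original slide's example. It assumes Fish elements with an index attribute and a Type child; the output element names and the labels "Flatfish", "Pond fish" and "Unknown" are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/Root">
    <Report>
      <xsl:for-each select="Fish">
        <Entry>
          <!-- Single condition: runs only for the Fish whose index is 1 -->
          <xsl:if test="@index = 1">
            <xsl:attribute name="first">true</xsl:attribute>
          </xsl:if>
          <!-- Mutually exclusive branches: the first test that is true wins -->
          <xsl:choose>
            <xsl:when test="Type = 'Halibut'">Flatfish</xsl:when>
            <xsl:when test="starts-with(Type, 'Gold')">Pond fish</xsl:when>
            <xsl:otherwise>Unknown</xsl:otherwise>
          </xsl:choose>
        </Entry>
      </xsl:for-each>
    </Report>
  </xsl:template>
</xsl:stylesheet>
```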
XSLT Iteration Iteration is the most common form of flow-control operation done in XSLT. It is always done by iterating over collections of elements in the source XML document. The <xsl:for-each select="expression" /> node is how iteration is done. The expression is an XPath that returns one or more nodes from the source XML document. Whatever nodes are added as children of a “for-each” will be executed repeatedly for each iteration. Target elements described there will be written out each time, and any XSLT instruction elements will be evaluated and re-evaluated each time. The “current” context within a “for-each” is always the node that is selected by the current step of the iteration. All relative XPaths need to be based on that context.
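The context change inside a loop is easiest to see in a short sketch. This one assumes Fish elements under Root, each with an index attribute and a FirstName child; the output element names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/Root">
    <FishList>
      <!-- Context here is Root, so "Fish" selects its Fish children -->
      <xsl:for-each select="Fish">
        <!-- Inside the loop the context is the current Fish -->
        <Item>
          <Index><xsl:value-of select="@index"/></Index>
          <Name><xsl:value-of select="FirstName"/></Name>
          <!-- .. steps back up to the parent of the current Fish -->
          <Siblings><xsl:value-of select="count(../Fish)"/></Siblings>
        </Item>
      </xsl:for-each>
    </FishList>
  </xsl:template>
</xsl:stylesheet>
```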
XSLT Variables • XSLT allows values to be stored in variables. A variable can either be given a hardcoded text value or assigned a value from the source XML document using an XPath expression. • Once a variable has a value assigned, it can NOT be re-assigned. • There is an exception to this rule. If the original declaration of the variable exists in a scope that repeats, such as a “for-each”, it will be re-declared and re-assigned each time the loop iterates. • XSLT variables have implicit types, generally either a String, Node, or NodeSet, depending on the expression used to assign them their value. • XSLT 2.0 and later allow types to be assigned to variables explicitly with the “as” attribute; XSLT 1.0 does not. • XSLT variable syntax: <xsl:variable name="VariableName" select="XPath/Expression" /> • XSLT variables can be referenced anywhere an XPath expression can be used, with a $ preceding its name to indicate it is a variable. Syntax: <xsl:value-of select="$VariableName" />
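The sketch below shows a hardcoded variable, a node-set variable, and a variable that is re-declared on each loop iteration. It assumes Fish elements with FirstName children under Root; the variable names and output element names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/Root">
    <!-- Hardcoded text: the inner single quotes make this an XPath string literal -->
    <xsl:variable name="Label" select="'Fish name: '"/>
    <!-- Node-set taken from the source document -->
    <xsl:variable name="AllFish" select="Fish"/>
    <Summary>
      <Count><xsl:value-of select="count($AllFish)"/></Count>
      <xsl:for-each select="$AllFish">
        <!-- Declared inside the loop, so it receives a fresh value on every iteration -->
        <xsl:variable name="Name" select="FirstName"/>
        <Line><xsl:value-of select="concat($Label, $Name)"/></Line>
      </xsl:for-each>
    </Summary>
  </xsl:template>
</xsl:stylesheet>
```

Omitting the inner quotes in the first variable would turn the value into a path expression rather than text, which is a common source of empty variables.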
XSLT Parameters XSLT parameters are similar to variables; however, they are values provided externally from the stylesheet. Parameters must be declared at the top of the stylesheet, and are generally not assigned values within the XSLT, as those values are provided by the XSLT transformation engine. Like variables, XSLT parameters cannot have their values re-assigned. However, unlike variables, there are no exceptions to this rule. XSLT parameter declaration syntax: <xsl:param name="ParamName" /> XSLT parameters follow the same referencing syntax as variables. Anywhere an XPath expression can be used, parameters can also be referenced, using the same $ prefix as variables. Syntax: <xsl:value-of select="$ParamName" />
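A minimal sketch of a stylesheet-level parameter follows. The default value and the Target element are assumptions added for illustration; the default only applies when the engine supplies no value.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Declared as a direct child of the stylesheet element; the select is an optional default -->
  <xsl:param name="ParamName" select="'default value'"/>
  <xsl:template match="/">
    <!-- Referenced exactly like a variable -->
    <Target><xsl:value-of select="$ParamName"/></Target>
  </xsl:template>
</xsl:stylesheet>
```

How the value is supplied depends on the host application. With the standard Java transformation API, for instance, it would be set through Transformer.setParameter before the transformation runs; how PilotFish passes parameters is not covered here.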
Plain Text • When assigning values to target elements, XSLT also has the ability to assign plain text. • There are two ways of doing this. • Simply write the text into the element as if composing a normal XML document: <Element>This is the value</Element> • Use the “text” instruction element: <Element><xsl:text>This is the value</xsl:text></Element>
Escaping XML special characters such as &, <, and > (to name a few) must always be escaped when being used literally in any form of text content. For example, > becomes &gt;, < becomes &lt;, and & becomes &amp;. Please consult online references for more information about character escaping in XML.
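As a short illustration, the fragment below uses the predefined XML entities in text content and in an attribute value. The element and attribute names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Examples>
  <!-- Text content: ampersand, less-than and greater-than written as entities -->
  <Element>Fish &amp; Chips: 1 &lt; 2 and 3 &gt; 2</Element>
  <!-- Attribute delimited by double quotes: a literal double quote becomes an entity -->
  <Element note="the &quot;big&quot; one"/>
</Examples>
```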
Java Callouts • XSLT supports calling out to external applications that are available to the transformation engine. In the case of PilotFish, this means the broader Java code in the application. • Any Java class on the PilotFish classpath, including custom code provided by customers through PilotFish’s extension capabilities, can be used via Java callouts. • The XSLT syntax for Java Callouts has two requirements: a namespace declaration to tell the XSLT engine that this functionality is being used, and a declaration for the Java class being accessed. • The XSLT namespace for the engine varies based on the engine. The prefix set by the engine’s namespace must be used when declaring the Java class. • Xalan: xmlns:xalan="http://xml.apache.org/xalan" • Saxon: xmlns:java="http://saxon.sf.net/java-type" • The referenced Java class must also be declared as if it were a namespace with a prefix: xmlns:td="java:com.pilotfish.eip.TransactionData"
Java Callouts (Continued) • Java callouts can have instances declared as variables: <xsl:variable name="JavaObject" select="td:new()" /> • If the constructor has arguments, they must be passed into the new() function. • Calling instance methods requires a variable instance to have been declared. That variable instance must always be the first argument of an instance method, regardless of whether or not that method actually has arguments. Example: <xsl:variable name="ObjectToString" select="td:toString($JavaObject)" /> • Calling static methods does NOT require a variable instance, nor does it require that special first argument. Example: <xsl:variable name="ObjectStaticMethod" select="td:someStaticMethod()" />
Identity Transforms An Identity Transform is a template that produces an output XML document that is identical to the source XML document. However, it is possible to specify changes to be made as this process executes. This is very useful when a transformation needs to produce an output that is nearly identical to the original, with only a few modifications necessary. The basic declaration of the Identity Transform is in the form of a recursive template that matches on everything, and then invokes a “copy” operation on everything. The important part is that, within the “copy” element, there is the “apply-templates” element. This tells the XSLT engine that every single item being copied should be checked against any other templates that exist to see if there is a match. To change specific items, more templates are added that match on only the specific things that should be changed. Example: <xsl:template match="/Root/Fish[@index = 1]/FirstName"></xsl:template>. Only the items specified by other templates will be changed; everything else will be copied over.
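The identity template is commonly written as shown in the sketch below, followed by one override. The override assumes the source has a Fish element whose index attribute equals 1; the replacement value "Robert" is made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Identity template: copies the current node, then recurses into its attributes and children -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Override: this more specific pattern takes priority over the identity template -->
  <xsl:template match="/Root/Fish[@index = 1]/FirstName">
    <FirstName>Robert</FirstName>
  </xsl:template>
</xsl:stylesheet>
```

An override with an empty body, like the one in the slide text, removes the matched element from the output entirely.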
Keys and the Muenchian Method • In some cases, an XSLT transformation will require accessing a collection of source XML elements based on some form of grouping. XSLT provides a highly efficient tool to do this, using the “key” function. • The first part is to declare an XSLT Key. This is done with the “key” element: <xsl:key name="fish-by-firstname" match="Fish" use="FirstName" /> • The name attribute is how the key will be referenced later on. • The match attribute is an XPath expression the key should match on, similar to a template. • The use attribute specifies another relative XPath expression, treating the node matched by the match attribute as the current context. • Match and use together, in this case, mean this key will match on all Fish elements, and group them by the value of their FirstName child elements. • Using a Key to perform this grouping then requires the use of the “key” XSLT function: key('fish-by-firstname', /Root/Fish/FirstName) • The first argument to the function is the name of the previously declared Key to use. • The second argument is an XPath expression whose result should match the grouping defined in the Key. In this case, the Key returns a collection of Fish elements grouped by their FirstName value. • In this case, this function will return a collection of all the Fish elements that have the same FirstName value as what is being passed into the function.
Keys and the Muenchian Method (Continued) • The best use for Keys is something called the Muenchian Method. This is an XSLT pattern that uses Keys and one additional function to allow for iterating over elements grouped by Keys. • The additional function is the “generate-id” function. This function generates a unique ID for any node passed into it. This ID is both unique and consistent, meaning the same node will always produce the same ID within a single transformation. • The Muenchian expression for this kind of iteration is: <xsl:for-each select="/Root/Fish[generate-id(.) = generate-id(key('fish-by-firstname', FirstName)[1])]" /> • The start of the XPath expression is a simple XPath pointing at the Fish element. It then uses a predicate to specify which Fish element to show. • The predicate starts by generating an ID for the current node using “generate-id”. It then compares that to a second ID generated by the same function with a different argument. • The second ID is generated using a Key. The “key” function is used to return the grouping that matches the current expression. The [1] specifies that it must be the first element of that grouping. • This whole expression ensures that each iteration will represent a unique grouping based on the key. • While this only iterates over groups, and not individual elements, once within a group the “key” function can be called again to retrieve the individual members of that group.
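A complete grouping stylesheet built on this pattern might look like the following sketch. It assumes Fish elements under Root, each with an index attribute and a FirstName child, where some FirstName values repeat; the output element names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Keys are declared at the top level of the stylesheet -->
  <xsl:key name="fish-by-firstname" match="Fish" use="FirstName"/>

  <xsl:template match="/Root">
    <Groups>
      <!-- Outer loop: keeps only the first Fish of each FirstName group -->
      <xsl:for-each select="Fish[generate-id(.) = generate-id(key('fish-by-firstname', FirstName)[1])]">
        <Group>
          <Name><xsl:value-of select="FirstName"/></Name>
          <!-- Inner loop: every Fish that shares this FirstName -->
          <xsl:for-each select="key('fish-by-firstname', FirstName)">
            <Member><xsl:value-of select="@index"/></Member>
          </xsl:for-each>
        </Group>
      </xsl:for-each>
    </Groups>
  </xsl:template>
</xsl:stylesheet>
```

XSLT 2.0 offers xsl:for-each-group for the same task, so this pattern matters mainly when the stylesheet has to stay on XSLT 1.0.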