XJ: Facilitating XML Processing in Java
Matthew Harren, Mukund Raghavachari, Oded Shmueli, Michael Burke, Rajesh Bordawekar, Igor Pechtchanski, Vivek Sarkar
14th World Wide Web Conference (WWW2005), Chiba, Japan
Presented by: Tamar Aizikowitz, Winter 2006/2007

XML
• Markup language
• Tags define elements
• Elements contain other elements
• Elements contain data
• Syntax:
    <person>
      <first>John</first>
      <last>Lennon</last>
    </person>
• Semantics: a tree in which person has two children, first ("John") and last ("Lennon").
• Applications: The future web? XHTML? RSS?
• Problem: Supposedly human readable and writable, but not really…

XML Schema
• XML-based alternative to DTDs.
• Describes the structure of an XML document.
• The programmer defines the valid structure of the data by defining element types.
• Support for standard and user-defined types.
    <xs:element name="person" type="personInfo"/>
    <xs:complexType name="personInfo">
      <xs:sequence>
        <xs:element name="first" type="xs:string"/>
        <xs:element name="last" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>

XPath
• Query language for selecting a sequence of nodes from an XML document.
• Filtering of result nodes using predicates.
• Example: //person[last="Lennon"]/first
[Figure: an XPath query and an XML tree are fed to an XPath query processor, which returns an XML node sequence.]

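For comparison, here is a minimal sketch of how the example query could be run in plain Java with the standard javax.xml.xpath API. The class name XPathQueryDemo, the people wrapper element, and the second person are assumptions made for illustration; they are not part of the slides. Note that the query is an ordinary string, so nothing checks it at compile time.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathQueryDemo {
    // Hypothetical sample data: the person element from the slides plus one more.
    static final String PEOPLE =
        "<people>"
        + "<person><first>John</first><last>Lennon</last></person>"
        + "<person><first>Paul</first><last>McCartney</last></person>"
        + "</people>";

    // Evaluates an XPath 1.0 query and returns the text of each selected node.
    static List<String> select(String xml, String query) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                query, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad query or document: " + query, e);
        }
    }

    public static void main(String[] args) {
        // The query from the slide, evaluated against the sample document.
        System.out.println(select(PEOPLE, "//person[last=\"Lennon\"]/first"));
    }
}
```
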
XJ Introduction
• Developed at the IBM Watson Research Center.
• More information: http://www.research.ibm.com/xj/.
[Figure: Java versions (1.0, 1.1, 1.4, 1.5) and XJ, with the xjc compiler and the xj runtime environment.]

XJ Holy Grail: Smooth Java/XML integration
• XML Trees
    • Just like 3, "Hello" and other values.
• XML Schema
    • Just like Java classes.
• XPath Queries
    • Just like [], ?: and other Java operators.
• Smart Compiler
    • Optimization… Improved efficiency.

Example: Music Library
[Figure: tree of the music library. musicLibrary contains album elements; each album contains a title (string), stars ([1-5]) and one or more artist elements (string).]

Music Library Schema
    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="musicLibrary">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="album" maxOccurs="unbounded">
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

Music Library Schema - Album
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="stars">
          <xs:simpleType>
            <xs:restriction base="xs:integer">
              <xs:pattern value="[1-5]"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
        <xs:element name="artist" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>

Music Library Data
    <?xml version="1.0" encoding="UTF-8"?>
    <musicLibrary>
      <album>
        <title>Abbey Road</title>
        <stars>4</stars>
        <artist>The Beatles</artist>
      </album>
      <album>
        <title>Sounds of Silence</title>
        <stars>4</stars>
        <artist>Paul Simon</artist>
      </album>
    </musicLibrary>

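To make the link between the schema and the data concrete, the following sketch validates a one-album library against the music library schema using the standard javax.xml.validation API. It assumes the album type is nested inside the album element declaration, as the two schema slides suggest; the class name MusicLibraryValidation and the album helper are hypothetical. In plain Java this check happens at run time, whereas XJ aims to do it at compile time.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class MusicLibraryValidation {
    // The music library schema, with the album type nested in place (assumed layout).
    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='musicLibrary'><xs:complexType><xs:sequence>"
        + "<xs:element name='album' maxOccurs='unbounded'><xs:complexType><xs:sequence>"
        + "<xs:element name='title' type='xs:string'/>"
        + "<xs:element name='stars'><xs:simpleType>"
        + "<xs:restriction base='xs:integer'><xs:pattern value='[1-5]'/></xs:restriction>"
        + "</xs:simpleType></xs:element>"
        + "<xs:element name='artist' type='xs:string' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Hypothetical helper: a one-album library with the given stars value.
    static String album(String stars) {
        return "<musicLibrary><album><title>Abbey Road</title><stars>" + stars
            + "</stars><artist>The Beatles</artist></album></musicLibrary>";
    }

    // Returns true if the document is well formed and valid against SCHEMA.
    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(SCHEMA)));
        } catch (SAXException e) {
            throw new IllegalStateException("The schema itself is broken", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed XML or a schema violation
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(album("4")));
        System.out.println(isValid(album("7")));
    }
}
```
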
The XJ Type Hierarchy
    java.lang.Object
      com.ibm.xj.XMLObject
        com.ibm.xj.XMLElement
          All Element Classes
        com.ibm.xj.XMLAtomic
          All Atomic Classes
      com.ibm.xj.Sequence
      com.ibm.xj.XMLCursor
      com.ibm.xj.io.XMLOutputStream
        com.ibm.xj.io.XMLDocumentOutputStream

The XMLObject Class and Subclasses
• XMLObject corresponds to an XML node.
• Schema import creates subclasses of XMLElement and XMLAtomic for every element declaration.
• XPath expressions are evaluated on instances of these classes.

Sequence and XMLCursor
• An instance of Sequence is an ordered list of XMLObject.
• The result of an XPath expression is an instance of Sequence.
• XMLCursor implements java.util.Iterator and is used to iterate over instances of Sequence.
• Both support limited genericity (as defined in Java 5.0) for type checking.

Importing Schema Definitions
• The integration of XML Schema in XJ is built on the following correspondence:
    • XML Schema ~ Java Package
    • XML Element ~ Logical Class
    • Nested (local) Element ~ Nested Class
    • Atomic types ~ Class + Auto Unboxing

Schema ~ Package
• Element declarations are integrated into the Java type system as "logical classes".
• XML documents are well-typed XML values that are instances of these classes.
• Syntax:
    import musicLibrary.*;

XML Element ~ Class
• Elements are represented as subclasses of XMLObject.
• May be used wherever a class type is expected.
• Constructed with the new operator.
• Nested elements are represented as nested classes.
• Syntax:
    musicLibrary ml = new musicLibrary(...);
    musicLibrary.album a = new musicLibrary.album(...);

Atomic Types
• Support for XML Schema built-in atomic types such as xsd:integer and xsd:string.
• Represented as subclasses of XMLAtomic.
• Syntax:
    xsd.integer
• Subtyping:
    xsd.short s = ...;
    xsd.integer i = s;
• Automatic unboxing:
    xsd.string xstr = ...;
    String s = xstr;

Creating XML Objects
• Mechanisms for constructing XML:
    • External source
    • Literal XML embedded in an XJ program
• XMLElement constructors:
    • XMLElement(java.io.InputStream)
    • XMLElement(java.io.File)
    • XMLElement(java.net.URL)
    • XMLElement(literal XML)

Inline Construction of XML
• XML data is constructed using literal XML.
• Any well-formed XML block can be used.
• Example:
    title a = new title(<title>Greatest Hits</title>);
• { and } are used to insert runtime values:
    title buildTitle(String t) {
      title newT = new title(<title>{t}</title>);
      return newT;
    }

XML Type Validation
• Example:
    album a = new album(<album>
                          <title>Let It Be</title>
                          <stars>4</stars>
                          <band>The Beatles</band>
                        </album>);
• To construct untyped XML, use the literal XML constructor for XMLElement.
[Figure: literal XML goes to the XML parser. Well-formed XML? No: compilation error. Yes: schema validator. Valid XML? No: compilation error. Yes: typed XML object.]

Executing XPath Queries
• Syntax: context[|query|]
    • query = a valid XPath 1.0 expression.
    • context = an XML element. Specifies the context for query evaluation.
• XPath expressions evaluate to Sequence<T>.
• $ refers to Java variables.
• Example:
    String band = "The Beatles";
    musicLibrary m = new musicLibrary(...);
    Sequence<album> b = m[|/album[artist[1]=$band]|];

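As a point of contrast with the XJ syntax, this sketch runs a similar query in plain Java, binding $band through the standard XPathVariableResolver interface. The class name XPathVariableDemo and the method titlesBy are hypothetical, and the query uses the absolute path /musicLibrary/album because the standard API evaluates from the document root. The result is an untyped NodeList rather than a Sequence<album>.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathVariableDemo {
    // The music library data from the slides.
    static final String LIBRARY = "<musicLibrary>"
        + "<album><title>Abbey Road</title><stars>4</stars><artist>The Beatles</artist></album>"
        + "<album><title>Sounds of Silence</title><stars>4</stars><artist>Paul Simon</artist></album>"
        + "</musicLibrary>";

    // Returns the titles of all albums whose first artist equals the given band.
    static List<String> titlesBy(String xml, String band) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Supply the value of $band; any other variable name is left unresolved.
            xpath.setXPathVariableResolver(
                name -> "band".equals(name.getLocalPart()) ? band : null);
            NodeList titles = (NodeList) xpath.evaluate(
                "/musicLibrary/album[artist[1]=$band]/title",
                new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < titles.getLength(); i++) {
                result.add(titles.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Query failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titlesBy(LIBRARY, "The Beatles"));
    }
}
```
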
XPath Static Semantics
• XPath expressions evaluate to Sequence<T>.
    • T is the most specific subtype of XMLObject that the compiler can determine.
    • Worst case: Sequence<XMLObject> is returned.
• If the query result is always empty, a static error is generated.
    • Identified using the Schema definition.
• Example:
    title t = ...;
    Sequence<album> a = t[|/album|];   // static error: title has no album children

XPath Runtime Semantics
• Evaluated with respect to the value of the context specifier.
• If the context specifier is a Sequence, each member is used as a context node in turn.
    • The value is the union of the results.
    musicLibrary m = new musicLibrary(...);
    Sequence<album> albums = m[|/album|];
    Sequence<artist> artists = albums[|/artist|];
• If the result is not a node set, a sequence of the appropriate type is returned.
    • For example: Sequence<xsd.boolean>.

Updating XML Data
• Reference semantics
    • Although more difficult to implement…
    • Result: in-place updates, as opposed to copy-based ones.
• Two types of updates are supported:
    • Value assignments, including complex types
    • Tree structure updates

Value Assignments
• XPath expressions are used as lvalues for assignment:
    album a = new album(...);
    a[|/title|] = "New Title";
• Bulk assignments:
    musicLibrary m = new musicLibrary(...);
    m[|/album[artist[1]="The Beatles"]/stars|] = 5;
• Bulk assignment advantages:
    • Possible optimizations → efficient updates.
    • Clear, concise code.

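To show what the bulk assignment replaces, here is a rough plain-Java equivalent using the standard DOM and XPath APIs: select the target nodes, then loop over them and set each value by hand. The class BulkAssignmentDemo and its methods are hypothetical names used only for this sketch.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class BulkAssignmentDemo {
    // The music library data from the slides.
    static final String LIBRARY = "<musicLibrary>"
        + "<album><title>Abbey Road</title><stars>4</stars><artist>The Beatles</artist></album>"
        + "<album><title>Sounds of Silence</title><stars>4</stars><artist>Paul Simon</artist></album>"
        + "</musicLibrary>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    // Sets stars on every album whose first artist is The Beatles; returns the update count.
    static int rateBeatlesAlbums(Document doc, int stars) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList targets = (NodeList) xpath.evaluate(
                "/musicLibrary/album[artist[1]='The Beatles']/stars",
                doc, XPathConstants.NODESET);
            for (int i = 0; i < targets.getLength(); i++) {
                targets.item(i).setTextContent(String.valueOf(stars)); // in-place update
            }
            return targets.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Query failed", e);
        }
    }

    // Returns the string value of the first node selected by the query.
    static String text(Document doc, String query) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(query, doc);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Query failed", e);
        }
    }
}
```

Nothing in this version stops the loop from writing a value outside [1-5]; the document would have to be re-validated afterwards.
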
Tree Structure Update
• Methods for structural changes:
    • insertAfter()
    • insertBefore()
    • insertAsFirst()
    • insertAsLast()
• Example:
    artist currArtist = m[|/album[title="Sounds of Silence"]/artist[1]|];
    artist newArtist = new artist(<artist>Art Garfunkel</artist>);
    currArtist.insertAfter(newArtist);

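The same structural update can be sketched with the standard DOM API. DOM offers only insertBefore, so an insertAfter helper is written here in terms of it; the helper and the class name InsertAfterDemo are hypothetical and are not XJ's implementation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class InsertAfterDemo {
    static final String LIBRARY = "<musicLibrary>"
        + "<album><title>Sounds of Silence</title><stars>4</stars><artist>Paul Simon</artist></album>"
        + "</musicLibrary>";

    // Inserting before the next sibling is the same as inserting after the reference.
    // When there is no next sibling, insertBefore(newNode, null) appends at the end.
    static void insertAfter(Node reference, Node newNode) {
        reference.getParentNode().insertBefore(newNode, reference.getNextSibling());
    }

    // Adds Art Garfunkel after the first artist and returns the resulting artist list.
    static List<String> addSecondArtist(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            String album = "/musicLibrary/album[title='Sounds of Silence']";
            Node first = (Node) xpath.evaluate(album + "/artist[1]", doc, XPathConstants.NODE);

            Element added = doc.createElement("artist");
            added.setTextContent("Art Garfunkel");
            insertAfter(first, added);

            NodeList artists = (NodeList) xpath.evaluate(
                album + "/artist", doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < artists.getLength(); i++) {
                names.add(artists.item(i).getTextContent());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Update failed", e);
        }
    }
}
```
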
Update Issues – Tree Structure
• Duplicate parents and acyclicity
    • After performing tree structure updates, the resulting graph must remain a tree.
    • Example: attaching an element that already has a parent.
• A problematic XJ update will result in a runtime exception.
• Can be avoided by always detaching nodes before attaching them.

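The two invariants above (single parent, no cycles) can be illustrated with a small stand-alone tree class. This is only a sketch of the idea: XmlTreeNode and its attach and detach methods are hypothetical and make no claim about how the XJ runtime is implemented.

```java
import java.util.ArrayList;
import java.util.List;

final class XmlTreeNode {
    final String name;
    private XmlTreeNode parent;
    private final List<XmlTreeNode> children = new ArrayList<>();

    XmlTreeNode(String name) {
        this.name = name;
    }

    // Attaches child as the last child of this node, refusing updates that
    // would give the child two parents or create a cycle.
    void attach(XmlTreeNode child) {
        if (child.parent != null) {
            throw new IllegalStateException(child.name + " already has a parent");
        }
        // Walk from this node up to the root; meeting the child means a cycle.
        for (XmlTreeNode n = this; n != null; n = n.parent) {
            if (n == child) {
                throw new IllegalStateException("attaching " + child.name + " would create a cycle");
            }
        }
        child.parent = this;
        children.add(child);
    }

    // Removes this node from its parent, if it has one.
    void detach() {
        if (parent != null) {
            parent.children.remove(this);
            parent = null;
        }
    }

    int childCount() {
        return children.size();
    }
}
```

Calling detach() before attach() always satisfies the single-parent check, which mirrors the advice on the slide.
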
Update Issues – Complex Types
• Need to validate that the new value is still well typed after an update.
• Problem: this cannot always be done statically.
• Example:
    • The Schema states that element a can contain between 2 and 5 instances of element b.
    • What happens after attach() or detach()?
• Solution:
    • A runtime check is inserted at compile time.

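A runtime check of this kind might look like the following sketch, which guards attach and detach with the occurrence bounds from the example (between 2 and 5 instances of b). The class BoundedElement is hypothetical and models children as plain strings; it only illustrates the shape of the check a compiler could insert.

```java
import java.util.ArrayList;
import java.util.List;

final class BoundedElement {
    private final int minOccurs;
    private final int maxOccurs;
    private final List<String> children = new ArrayList<>();

    BoundedElement(int minOccurs, int maxOccurs, List<String> initial) {
        if (initial.size() < minOccurs || initial.size() > maxOccurs) {
            throw new IllegalArgumentException("initial content violates the occurrence bounds");
        }
        this.minOccurs = minOccurs;
        this.maxOccurs = maxOccurs;
        this.children.addAll(initial);
    }

    // Runtime check: adding a child must not exceed maxOccurs.
    void attach(String child) {
        if (children.size() + 1 > maxOccurs) {
            throw new IllegalStateException("more than " + maxOccurs + " children");
        }
        children.add(child);
    }

    // Runtime check: removing a child must not drop below minOccurs.
    void detach(String child) {
        if (!children.contains(child)) {
            return; // nothing to remove
        }
        if (children.size() - 1 < minOccurs) {
            throw new IllegalStateException("fewer than " + minOccurs + " children");
        }
        children.remove(child);
    }

    int size() {
        return children.size();
    }
}
```
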
Update Issues – Covariant Subtyping
• XML Schema allows the declaration of subtypes by restriction.
• This causes problems when updating subtype values through the base class interface.
• Example:
    xsd.integer i;
    stars s = m[|//stars[1]|];
    i = s;
    i = 10;   // illegal value for stars element
• Covariant subtyping already exists in Java arrays.
• The problem would arise in any language attempting to support updates on XML Schema types.

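The Java array case mentioned above can be shown in a few lines. Because Integer[] is a subtype of Object[], a store through the base-type reference compiles but is rejected at run time with ArrayStoreException, which is the same kind of deferred check the stars example needs. The class name CovarianceDemo is hypothetical.

```java
final class CovarianceDemo {
    // Returns true if storing a String through an Object[] view of an Integer[] fails.
    static boolean storeFails() {
        Integer[] ints = new Integer[1];
        Object[] objects = ints;      // allowed: Java arrays are covariant
        try {
            objects[0] = "ten";       // compiles, but the runtime type check rejects it
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(storeFails());
    }
}
```
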
Summary – XJ Benefits
• XML objects as typed values
• XML Schema integration
• Static type checking
• Typed XPath
• Compiler optimizations

XJ - The Future?
• Full support for Schema types
• XPath expressions as independent values
    • Not tied to a context specifier
• Operators on XPath values
    • Composition, conjunction, disjunction…
• Typed methods and fields
    musicLibrary m = new musicLibrary(…);
    m.album[2].title = "New Title";