XJ: Facilitating XML Processing in Java. Written by: Matthew Harren, Mukund Raghavachari, Oded Shmueli, Michael Burke, Rajesh Bordawekar, Igor Pechtchanski, Vivek Sarkar
Conference: The 14th International World Wide Web Conference (WWW2005), Chiba, Japan, May 10-14, 2005 • Karawan Shahla, Seminar Lecture 236803
Agenda • Some files. • Main Idea. • Introduction to XJ. • XJ Type System. • XJ Expressions . • XJ Updates. • XJ Problems. • Conclusion
Schema file (file: technioncatalog.xsd)
    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="catalog">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="course" maxOccurs="unbounded">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="points" type="xs:int"/>
                  <xs:element name="number" type="xs:int"/>
                  <xs:element name="name" type="xs:string"/>
                  <xs:element name="teacher" type="xs:string"/>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
XML document (file: short.xml)
    <?xml version="1.0" encoding="UTF-8"?>
    <catalog>
      <course>
        <points>3</points>
        <number>234319</number>
        <name>Programming Languages</name>
        <teacher>Ron Pinter</teacher>
      </course>
      <course>
        <points>3</points>
        <number>234141</number>
        <name>Combinatorics for CS</name>
        <teacher>Ran El-Yaniv</teacher>
      </course>
    </catalog>
XJ Program file
    import java.io.*;
    import technioncatalog.*;

    public class Demo1 {
        public static void main(String[] args) throws Throwable {
            catalog cat = new catalog(new File("short.xml"));
            // XPath positions are 1-based: this selects the second <course>
            catalog.course c = cat [| /course[2] |];
            printCourse(c);
        }

        private static void printCourse(catalog.course c) {
            String name = c [| /name |];
            String teacher = c [| /teacher |];
            int points = c [| /points |];
            int id = c [| /number |];
            System.out.println(name + " (" + id + ") by " + teacher + ", " + points + " credit points");
        }
    }
Output: “Combinatorics for CS (234141) by Ran El-Yaniv, 3 credit points”
Main Idea • XML is becoming increasingly popular. • High-level languages should provide adequate support for manipulating XML. • Let’s go through the existing APIs.
Traditional XML processing (DOM, XPath APIs)
    import javax.xml.parsers.*;
    import javax.xml.xpath.*;
    import org.w3c.dom.*;

    public static void main(String[] args) throws Throwable {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(new java.io.File("short.xml"));
        XPath xp = XPathFactory.newInstance().newXPath();
        // Cast to the standard DOM NodeList rather than an implementation-specific class
        NodeList nodes = (NodeList) xp.evaluate("//course", doc, XPathConstants.NODESET);
        printCourse(nodes.item(1)); // item() is 0-based: this is the second <course>
    }
• The types of the XML objects (Node, Document) do not reflect the schema • XPath is a plain string. It may be: • Syntactically incorrect • Incompatible with the document
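The two XPath pitfalls named on this slide can be reproduced with the standard JAXP API. The following is a minimal sketch, not part of the original talk: the class name XPathStringPitfalls, the helper countMatches, and the inline one-course document are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative sketch (hypothetical class): XPath queries are plain strings,
// so mistakes in them are discovered only when the query is evaluated.
class XPathStringPitfalls {
    // Inline one-course catalog so the sketch needs no external file.
    static final String XML =
        "<catalog><course><points>3</points><number>234319</number>"
        + "<name>Programming Languages</name><teacher>Ron Pinter</teacher>"
        + "</course></catalog>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the number of matching nodes, or -1 if the query string is malformed. */
    static int countMatches(String query) {
        XPath xp = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xp.evaluate(query, parse(XML), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            return -1; // the syntax error surfaces only at evaluation time
        }
    }

    public static void main(String[] args) {
        System.out.println(countMatches("//course"));  // well-formed and compatible
        System.out.println(countMatches("//cuorse"));  // misspelled name: silently matches nothing
        System.out.println(countMatches("//course[")); // malformed: rejected only at run time
    }
}
```

Neither mistake is visible to the Java compiler, which is the gap that typed XPath in XJ is meant to close.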
Traditional XML processing (DOM APIs)
    private static void printCourse(Node n) {
        NodeList nodes = n.getChildNodes();
        // Odd indices: this assumes a whitespace text node precedes each child element
        System.out.println(nodes.item(5).getTextContent() + " ("
            + nodes.item(3).getTextContent() + ") by "
            + nodes.item(7).getTextContent() + ", "
            + nodes.item(1).getTextContent() + " credit points");
    }
• Assumption: the node at index 3 is the course number • Assumption: each accessed child has no child elements of its own • Assumption: four child elements must exist • These assumptions will not hold if the schema is changed • => run-time errors • Problems remain even if we identify nodes by name • Possible Schema changes: • Allowing a new optional <students> sub-element • Changing the order of the sub-elements • What about reading the numeric value of an element?
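To make the fragility concrete, the sketch below compares position-based and name-based access on two versions of a course element whose sub-elements are reordered. It is an illustration under stated assumptions: the class DomAccessSketch, its helpers, and the two inline documents (written without whitespace, so index 0 is the first element) are hypothetical and not taken from the talk.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Illustrative sketch (hypothetical class): untyped DOM access to <points>.
class DomAccessSketch {
    static final String ORIGINAL =
        "<course><points>3</points><number>234319</number></course>";
    // Same data after a schema change that swaps the order of the sub-elements.
    static final String REORDERED =
        "<course><number>234319</number><points>3</points></course>";

    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Position-based access: assumes <points> is the first child. */
    static String pointsByIndex(Element course) {
        return course.getChildNodes().item(0).getTextContent();
    }

    /** Name-based access: survives reordering, but the value is still untyped text. */
    static int pointsByName(Element course) {
        String text = course.getElementsByTagName("points").item(0).getTextContent();
        return Integer.parseInt(text.trim()); // manual conversion, may fail at run time
    }

    public static void main(String[] args) {
        System.out.println(pointsByIndex(parseRoot(ORIGINAL)));
        System.out.println(pointsByIndex(parseRoot(REORDERED))); // now reads the course number
        System.out.println(pointsByName(parseRoot(REORDERED)));
    }
}
```

Name-based access fixes the ordering problem, yet the numeric value still has to be parsed by hand and a missing element would only be noticed at run time.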
Shaping the future • What XML-related facilities do we want? • Typed XML objects • Seamless translation of a Schema/DTD into a Java type • Two composition techniques • XML notation • Java’s object creation syntax • Two decomposition techniques • Typed XPath • Typed, named methods/fields • XPath expressions as first-class values
XJ: offered solution • XJ is an extension of Java. • We will give an overview of the constructs offered by XJ. • Available at: http://www.research.ibm.com/xj
Integration with Schema • The rationale: • An OO program is a collection of class definitions • A Schema file is a collection of type definitions • => Let’s integrate these definitions • Every Schema type is also an XJ type • The XJ compiler generates a “logical class” for each such type • Schema file == package name • Using a schema == import schema_file_name;
XML literal in XJ code • Invalid XML content triggers a compile-time error • Resulting elements are typed!
    import technioncatalog.*;

    public class Demo2 {
        public static void main(String[] args) throws Throwable {
            String x = "Algorithms 1";
            int y = 234247;
            // {y} and {x} splice Java values into the XML literal
            catalog cat = buildCatalog(new catalog.course(
                <course><points>3</points>
                <number>{y}</number><name>{x}</name>
                <teacher>Shlomo Moran</teacher></course>));
        }

        private static catalog buildCatalog(catalog.course c) {
            return new catalog(<catalog>{c}</catalog>);
        }
    }
An ill-typed program
    ...
    course c = new course(<course>
        <teacher>Shlomo Moran</teacher></course>);  // Error: wrong <course> element (required children are missing)
    buildCatalog(c);
    XMLObject x = new course.teacher(<teacher>Shlomo Moran</teacher>);
    buildCatalog(x);                                // Error: an XMLObject cannot be passed as a course element
    ...
    private static catalog buildCatalog(catalog.course c) {
        return new catalog(<catalog>{c}</catalog>);
    }
Embedding XPath Queries in XJ • Syntax: XmlExpr [| XPathQuery |] • Requires a context provider: • An XML element over which the XPath query is invoked • (see the cat variable in the sample)
    course doSomething(catalog cat, int courseNum) {
        // $courseNum refers to the Java variable courseNum
        return cat [| /course[./number = $courseNum] |];
    }
XPath Semantics • Problem: the result type is not always obvious • Two options • Sequence<T> • If the compiler determines that all result elements are of type T • Sequence<XMLObject> • (Otherwise) • Automatic conversion from a singleton sequence • Static check of XPath queries • If the result is always empty => compile-time error
XJ Updates (Introduction) • XJ provides three kinds of updates: 1) Simple assignment. 2) Bulk assignment. 3) Structural updates. • XJ updates are designed to be consistent with Java’s reference semantics.
XJ Updates (syntax and semantics) • Simple assignment: the XPath expression returns a reference to the existing element to be updated.
    public static void changePoint(catalog.course c, int p) {
        c [| /points |] = p;
    }
• Bulk assignment: the XPath expression denotes a sequence, and the assignment is applied to every element in it. Here we double the credit points of each course.
    public static void doublePoints(catalog cat) {
        cat [| //points |] *:= 2;
    }
XJ Updates (syntax and semantics) • Structural updates • Class XMLObject also defines methods such as: • insertAfter() • insertBefore() • insertAsFirst() • insertAsLast() • detach()
    public static void addCourse(catalog cat) {
        course c = new course(<course><points>4</points>
            <number>234111</number><name>Introduction to CS</name>
            <teacher>Roy Friedman</teacher></course>);
        cat.insertAsLast(c);
    }
XJ Updates Problems: Cycles • Updates may cause cycles, e.g. an element that has more than one parent. • This raises a run-time exception. • XJ must ensure that a root is never inserted into one of its descendants. • Why are cycles bad? • Can you think of a solution?
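One possible answer to the question on this slide is to walk the parent chain before every insertion. The sketch below shows that check on a plain Java tree; it is not XJ's implementation, and the names TreeNode, CycleCheckDemo, and rejects are hypothetical. Rejecting a child that already has a parent is likewise an assumption about the desired policy.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (hypothetical class): a tree node that refuses updates
// which would break the tree shape.
class TreeNode {
    final String name;
    TreeNode parent;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name) { this.name = name; }

    /** True if this node is `other` itself or lies somewhere below it. */
    boolean isSelfOrDescendantOf(TreeNode other) {
        for (TreeNode n = this; n != null; n = n.parent) {
            if (n == other) return true;
        }
        return false;
    }

    void insertAsLast(TreeNode child) {
        // Inserting an ancestor (or the node itself) below this node would create a cycle.
        if (isSelfOrDescendantOf(child)) {
            throw new IllegalStateException(
                "inserting " + child.name + " under " + name + " would create a cycle");
        }
        // Assumed policy: a node may have only one parent, so it must be detached first.
        if (child.parent != null) {
            throw new IllegalStateException(child.name + " already has a parent");
        }
        child.parent = this;
        children.add(child);
    }

    void detach() {
        if (parent != null) {
            parent.children.remove(this);
            parent = null;
        }
    }
}

class CycleCheckDemo {
    /** Returns true if inserting `child` under `target` is rejected. */
    static boolean rejects(TreeNode target, TreeNode child) {
        try {
            target.insertAsLast(child);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        TreeNode catalog = new TreeNode("catalog");
        TreeNode course = new TreeNode("course");
        System.out.println(rejects(catalog, course)); // ordinary insertion
        System.out.println(rejects(course, catalog)); // root under its own descendant
    }
}
```

The check costs time proportional to the depth of the target node, which is why it is done at run time rather than proved statically.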
XJ Updates Problems : Type Consistency • Definitions • An XML update operation, u, is a mapping over XML values • u: T1 -> T2 • An update is consistent if T1 = T2 • Ideally, a compile-time error should be triggered for each inconsistent update in the program • Unfortunately, this cannot be promised • The solution: Additional run-time check Can you think of an example ?
XJ Updates Problems: Covariant subtyping (the problem) • Covariance: the change of type in the signature is in the same direction as that of the inheritance
    class X { }
    class A { public void m(X x) { } }
    class X1 extends X { }
    class A1 extends A { public void m(X1 x) { } }  // A1.m() is “spoiled”: it accepts only X1 objects
    ...
    A a = new A1();
    a.m(new X());
• Which method should be invoked: A.m() or A1.m()? • Java favors type safety: a method with covariant arguments is considered an overload rather than an override • The same approach is taken by C++ and C# • But covariance is allowed for arrays • Array assignments may fail at run-time
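Both behaviours described on this slide can be observed in plain Java. The sketch below reuses the class names from the slide; the return strings, the CovarianceDemo class, and its helper methods are additions made for illustration.

```java
class X { }
class X1 extends X { }

class A {
    String m(X x) { return "A.m(X)"; }
}

class A1 extends A {
    // Different parameter type: this is an overload, not an override of A.m(X).
    String m(X1 x) { return "A1.m(X1)"; }
}

class CovarianceDemo {
    /** The call from the slide: resolved against the static type A. */
    static String callThroughBase() {
        A a = new A1();
        return a.m(new X());
    }

    /** Even with an X1 argument, the static type A only offers m(X). */
    static String callWithSubtypeArgument() {
        A a = new A1();
        return a.m(new X1());
    }

    /** Arrays are covariant, so the store is checked at run time. */
    static boolean arrayStoreFails() {
        X[] xs = new X1[1];   // accepted by the compiler
        try {
            xs[0] = new X();  // an X is not an X1
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(callThroughBase());
        System.out.println(callWithSubtypeArgument());
        System.out.println(arrayStoreFails());
    }
}
```

Method arguments stay safe because A1.m(X1) never replaces A.m(X); arrays trade that safety for flexibility and pay with a run-time check.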
XJ Updates Problems: Covariant subtyping (example) (Now let us get back to our technioncatalog schema…) • A <course> value is also spoiled • It requires unique children: <points>, <name>, etc. • But it also has an unspoiled super-class: XMLObject • All updates to XMLObject are legal at compile-time • The following code compiles successfully:
    public static void trick(course c) {
        XMLObject x = c;
        points p = new points(<points>4</points>);
        x.insertAsLast(p);  // Run-time error is here!
    }
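The same situation can be imitated in plain Java to show where the run-time check has to live. This is a simplified sketch and not XJ's mechanism: UntypedElement stands in for XMLObject, CourseElement for the course type, children are represented by their names only, and the content model is reduced to "each declared child at most once" (element order is ignored).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical untyped supertype: accepts any child, like XMLObject on the slide.
class UntypedElement {
    final String name;
    final List<String> childNames = new ArrayList<>();

    UntypedElement(String name) { this.name = name; }

    void insertAsLast(String childName) {
        childNames.add(childName);
    }
}

// Hypothetical typed subtype: every declared child may appear at most once.
class CourseElement extends UntypedElement {
    private static final List<String> DECLARED =
            Arrays.asList("points", "number", "name", "teacher");

    CourseElement() { super("course"); }

    @Override
    void insertAsLast(String childName) {
        // Run-time consistency check: the caller's static type cannot prevent this.
        if (!DECLARED.contains(childName) || childNames.contains(childName)) {
            throw new IllegalStateException(
                "<" + childName + "> is not allowed here in <course>");
        }
        super.insertAsLast(childName);
    }
}

class RuntimeCheckDemo {
    /** Mirrors trick(): compiles because the static type is the untyped supertype. */
    static boolean trickFails() {
        CourseElement c = new CourseElement();
        c.insertAsLast("points");
        UntypedElement x = c;          // upcast, as in "XMLObject x = c"
        try {
            x.insertAsLast("points");  // second <points>: rejected only at run time
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(trickFails());
    }
}
```

The upcast discards the static information that the element is a course, so only the dynamic check inside the subtype can keep the value consistent with its schema type.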
Shaping the future (revisited) • Language constructs seen so far • Typed XML objects • Seamless translation of a Schema/DTD into a Java type • Two composition techniques • XML notation • Java’s object creation syntax • Two decomposition techniques • Typed XPath • Typed, named methods/fields • XPath expressions as first-class values
XPath expressions as first-class values • What is a first-class value? • A value that can be used “naturally” in the program • Passed as an argument • Stored in a variable/field • Returned from a method • Created • In XJ, XPath expressions do not meet these conditions • The main obstacle: the XPath part of the expression cannot be separated from its context provider
XPath expressions as first-class values • Operators on XPath values • Composition • Conjunction • Disjunction • These operators will allow the developer to easily create a rich array of safe XPath values • The compiler must keep track of the type of each such value • Basically, an XPath value is a function T -> R, where both T and R are subclasses of XMLObject • When two XPath values are composed, the result type is deduced from the types of the operands
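The idea of a path value as a typed function T -> R can be approximated in plain Java with generics. The sketch below is only an analogy, not an XJ feature: Course, Catalog, TypedPathSketch, compose, and where are hypothetical names, and ordinary Java objects stand in for XML elements.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-ins for schema-derived types; not XJ's generated classes.
final class Course {
    final String name;
    final int points;
    Course(String name, int points) { this.name = name; this.points = points; }
}

final class Catalog {
    final List<Course> courses;
    Catalog(List<Course> courses) { this.courses = courses; }
}

class TypedPathSketch {
    // Path values are ordinary objects: they can be stored, passed, and returned.
    static final Function<Catalog, List<Course>> COURSES = cat -> cat.courses;
    static final Function<Course, String> NAME = c -> c.name;

    /** Composition: the result type Function<T, List<R>> is deduced from both operands. */
    static <T, M, R> Function<T, List<R>> compose(Function<T, List<M>> path,
                                                  Function<M, R> step) {
        return ctx -> path.apply(ctx).stream().map(step).collect(Collectors.toList());
    }

    /** Filtering step; predicates can be combined with and() / or(). */
    static <T, R> Function<T, List<R>> where(Function<T, List<R>> path, Predicate<R> pred) {
        return ctx -> path.apply(ctx).stream().filter(pred).collect(Collectors.toList());
    }

    static Catalog sample() {
        return new Catalog(Arrays.asList(
                new Course("Programming Languages", 3),
                new Course("Introduction to CS", 4)));
    }

    static List<String> namesMatching(Catalog cat, int points, String prefix) {
        Predicate<Course> hasPoints = c -> c.points == points;
        Predicate<Course> hasPrefix = c -> c.name.startsWith(prefix);
        // Conjunction of two conditions; hasPoints.or(hasPrefix) would be the disjunction.
        return compose(where(COURSES, hasPoints.and(hasPrefix)), NAME).apply(cat);
    }

    public static void main(String[] args) {
        System.out.println(namesMatching(sample(), 4, "Intro"));
    }
}
```

Here the Java compiler plays the role the slide assigns to the XJ compiler: composing a Catalog-to-Course path with a Course-to-String step is accepted, while composing mismatched steps would be rejected at compile time.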
Typed, named methods/fields • Usually, values aggregated by a Java object are accessed by fields/methods • Can we access XML sub-elements this way? • (The following code IS NOT a legal XJ program)
    import technioncatalog.*;

    void printTeachers(catalog cat) {
        for (int i = 0; i < cat.courses.length; ++i) {
            catalog.course c = cat.courses[i];
            System.out.println(c.teacher);
        }
    }
Typed, named methods/fields • Some of the difficulties: • Sub-elements are not always named • Schema supports optional types: <xsd:choice> • How can Java express an “optional” field? • Observation: Java’s typing mechanisms cannot capture the wealth of Schema/DTD types • Missing features: virtual fields, inheritance without polymorphism • Other features can be found in Functional languages • E.g.: Variant types, immutability, structural conformance • But, their popularity lags behind
Conclusion • XJ is a Java extension with built-in support for XML • Type safety: many things are checked at compile time • Ease of use • OO languages are not powerful enough (in terms of typing) • Some type information is lost in the transition Schema -> Java