This seminar lecture discusses the growing popularity of XML and asks whether modern object-oriented languages adequately support XML processing. It introduces XJ, an extension of Java whose constructs translate XML Schema types seamlessly into Java types and make XML processing easier and safer.
No More Pain for XML’s Gain
XJ: Facilitating XML Processing in Java
Matthew Harren, Mukund Raghavachari, Oded Shmueli, Michael Burke, Rajesh Bordawekar, Igor Pechtchanski, Vivek Sarkar
Itay Maman
236826 Seminar lecture, 15 June 2005
The basic premise • XML is getting increasingly popular • XML manipulation is now a common programming task • The lead question: • Do modern OO languages sufficiently support XML?
Introduction: Schema file (file: technioncatalog.xsd)
    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="catalog">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="course" maxOccurs="unbounded">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="points" type="xs:int"/>
                  <xs:element name="number" type="xs:int"/>
                  <xs:element name="name" type="xs:string"/>
                  <xs:element name="teacher" type="xs:string"/>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
Introduction: XML document (file: short.xml)
    <?xml version="1.0" encoding="UTF-8"?>
    <catalog>
      <course>
        <points>3</points>
        <number>234319</number>
        <name>Programming Languages</name>
        <teacher>Ron Pinter</teacher>
      </course>
      <course>
        <points>3</points>
        <number>234141</number>
        <name>Combinatorics for CS</name>
        <teacher>Ran El-Yaniv</teacher>
      </course>
    </catalog>
Desired output: “Combinatorics for CS (234141) by Ran El-Yaniv, 3 credit points”
Introduction: The XJ program
    import java.io.*;
    import technioncatalog.*;

    public class Demo1 {
        public static void main(String[] args) throws Throwable {
            catalog cat = new catalog(new File("short.xml"));
            // Typed XPath query over the catalog: select the second <course>
            catalog.course c = cat [| /course[2] |];
            printCourse(c);
        }

        private static void printCourse(catalog.course c) {
            String name = c [| /name |];
            String teacher = c [| /teacher |];
            int points = c [| /points |];
            int id = c [| /number |];
            System.out.println(name + " (" + id + ") by " + teacher + ", "
                    + points + " credit points");
        }
    }
Traditional XML processing (DOM, XPath APIs) • The types of the XML objects (Node, Document) do not reflect the schema
    import javax.xml.parsers.*;
    import javax.xml.xpath.*;
    import org.w3c.dom.*;

    public static void main(String[] args) throws Throwable {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(new java.io.File("short.xml"));
        XPath xp = XPathFactory.newInstance().newXPath();
        // The query is an untyped string and the result must be cast
        NodeList nodes = (NodeList) xp.evaluate("//course", doc, XPathConstants.NODESET);
        printCourse(nodes.item(1));  // second course: NodeList indices are 0-based
    }
• XPath is a plain string. It may be: • Syntactically incorrect • Incompatible with the document
Traditional XML processing (DOM APIs)
    private static void printCourse(Node n) {
        NodeList nodes = n.getChildNodes();
        // Odd indices skip the whitespace-only text nodes between the elements of short.xml
        System.out.println(nodes.item(5).getTextContent() + " ("
                + nodes.item(3).getTextContent() + ") by "
                + nodes.item(7).getTextContent() + ", "
                + nodes.item(1).getTextContent() + " credit points");
    }
• Assumption: four child elements must exist • Assumption: child node 3 is the course number • Assumption: the 2nd child has no child elements • These assumptions will not hold if the schema is changed • => run-time errors • Problems remain even if we identify nodes by name • Possible schema changes: • Allowing a new optional <students> sub-element • Changing the order of the sub-elements • What about reading the numeric value of an element?
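For comparison, here is a minimal sketch in plain Java using only the JDK's `javax.xml` APIs. It needs Java 15+ for the text block. The sketch looks sub-elements up by name instead of by position. It is not taken from the lecture, and `ByNameDemo`, `child` and `describeCourse` are illustrative names. The code survives reordering or an extra `<students>` element. However, the XPath string, the cast to `Element` and the `Integer.parseInt` calls are still checked only at run time.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class ByNameDemo {
    // The contents of short.xml, inlined so the sketch needs no file.
    static final String SHORT_XML = """
        <catalog>
          <course>
            <points>3</points><number>234319</number>
            <name>Programming Languages</name><teacher>Ron Pinter</teacher>
          </course>
          <course>
            <points>3</points><number>234141</number>
            <name>Combinatorics for CS</name><teacher>Ran El-Yaniv</teacher>
          </course>
        </catalog>
        """;

    // Looks a sub-element up by name instead of by child position.
    static String child(Element course, String name) {
        return course.getElementsByTagName(name).item(0).getTextContent();
    }

    // Formats the index-th course (1-based, as in XPath) in the desired style.
    static String describeCourse(String xml, int index) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xp = XPathFactory.newInstance().newXPath();
            // Still a plain string: a typo here surfaces only at run time.
            Element course = (Element) xp.evaluate(
                    "/catalog/course[" + index + "]", doc, XPathConstants.NODE);
            // Numeric values still have to be parsed by hand.
            int points = Integer.parseInt(child(course, "points"));
            int number = Integer.parseInt(child(course, "number"));
            return child(course, "name") + " (" + number + ") by "
                    + child(course, "teacher") + ", " + points + " credit points";
        } catch (Exception e) {
            throw new IllegalStateException("could not process the catalog", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeCourse(SHORT_XML, 2));
    }
}
```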
No easy solution • Similar problems occur when: • XML elements are created by the program • Other libraries are used for reading/writing XML documents • Such as: Xalan, SAX • The developer wraps several complex operations within a single function/method/class • These are inherent problems of the language
Shaping the future • What XML-related facilities do we want? • Typed XML objects • Seamless translation of a Schema/DTD into a Java type • Two composition techniques • XML notation • Java’s object creation syntax • Two decomposition techniques • Typed XPath • Typed, named methods/fields • XPath expressions as first-class-values
Has the future arrived yet? • Significant effort has gone into integrating XML into modern programming languages • XJ • Scala • Cω • XTatic • … • We will review the constructs offered by XJ • A superset of Java • Available at: http://www.research.ibm.com/xj
XJ’s Type system • Hierarchy of classes • A common root class: XMLObject • Automatic import: package com.ibm.xj.* • Genericity: Sequence<T>, XMLCursor<T> • XMLCursor<T> is a Sequence<T> iterator
Integration with Schema • The rationale: • An OO program is a collection of class definitions • A Schema file is a collection of type definitions • => let’s integrate these definitions • Every Schema type is also an XJ type • The XJ compiler generates a “logical class” for each such type • Schema file == package name • Using a schema == import schema_file_name;
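Plain Java has no step that turns a schema into types. A rough hand-written analogue is one record per complex type in technioncatalog.xsd (records need Java 16+). The record names are assumptions for illustration, not the classes XJ generates. The sketch also shows one limit of the mapping. The schema requires at least one course, because `minOccurs` defaults to 1, but no Java type expresses "non-empty list". That constraint therefore becomes a run-time check in the constructor.

```java
import java.util.List;

class SchemaAsTypes {
    // Typed access: the compiler knows name() is a String and points() an int.
    static String describe(Course c) {
        return c.name() + " (" + c.number() + ") by " + c.teacher()
                + ", " + c.points() + " credit points";
    }

    public static void main(String[] args) {
        Catalog cat = new Catalog(List.of(
                new Course(3, 234319, "Programming Languages", "Ron Pinter"),
                new Course(3, 234141, "Combinatorics for CS", "Ran El-Yaniv")));
        System.out.println(describe(cat.courses().get(1)));
    }
}

// One record per complex type in technioncatalog.xsd (hand-written, not XJ output).
record Course(int points, int number, String name, String teacher) {}

record Catalog(List<Course> courses) {
    Catalog {
        // maxOccurs="unbounded" with the default minOccurs="1": at least one course.
        // List cannot express "non-empty", so this is checked at run time.
        if (courses.isEmpty()) {
            throw new IllegalArgumentException("a catalog needs at least one course");
        }
        courses = List.copyOf(courses);
    }
}
```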
XML literal in XJ code • Invalid XML content triggers a compile-time error • Resulting elements are typed! • Curly braces allow “escaping” back into XJ
    import technioncatalog.*;

    public class Demo2 {
        public static void main(String[] args) throws Throwable {
            String x = "Algorithms 1";
            int y = 234247;
            catalog cat = buildCatalog(new catalog.course(
                <course><points>3</points>
                    <number>{y}</number><name>{x}</name>
                    <teacher>Shlomo Moran</teacher></course>));
        }

        private static catalog buildCatalog(catalog.course c) {
            return new catalog(<catalog>{c}</catalog>);
        }
    }
An ill-typed program
    ...
    // Wrong <course> element: <points>, <number> and <name> are missing
    course c = new course(<course>
        <teacher>Shlomo Moran</teacher></course>);
    buildCatalog(c);
    XMLObject x = new course.teacher(<teacher>Shlomo Moran</teacher>);
    // An XMLObject cannot be passed as a course element
    buildCatalog(x);
    ...
    private static catalog buildCatalog(catalog.course c) {
        return new catalog(<catalog>{c}</catalog>);
    }
Embedding XPath Queries in XJ • Syntax: XmlValue[| XPathQuery |] • Requires a context provider: • An XML element over which the XPath query is invoked • (see the cat variable in the sample) • Escaping: use a ‘$’ prefix to refer to a Java variable
    course doSomething(catalog cat, int courseNum) {
        return cat [| /course[./number = $courseNum] |];
    }
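Standard JAXP also lets an XPath string refer to `$variables`. Their values are supplied through an `XPathVariableResolver`, so no string concatenation is needed. The sketch below assumes a JDK with the built-in `javax.xml.xpath` implementation. `VariableDemo` and `courseName` are hypothetical names. The value is passed as a `Double` because XPath numbers are doubles. Unlike XJ's `$courseNum`, nothing here is checked at compile time, and the result is an untyped `Node` that must be cast.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class VariableDemo {
    static final String XML = "<catalog>"
            + "<course><points>3</points><number>234319</number>"
            + "<name>Programming Languages</name><teacher>Ron Pinter</teacher></course>"
            + "<course><points>3</points><number>234141</number>"
            + "<name>Combinatorics for CS</name><teacher>Ran El-Yaniv</teacher></course>"
            + "</catalog>";

    // Name of the course whose <number> equals courseNum, or null if there is none.
    static String courseName(int courseNum) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xp = XPathFactory.newInstance().newXPath();
            // $courseNum is resolved from Java rather than spliced into the string.
            xp.setXPathVariableResolver(name ->
                    name.getLocalPart().equals("courseNum") ? Double.valueOf(courseNum) : null);
            Element course = (Element) xp.evaluate(
                    "/catalog/course[number = $courseNum]", doc, XPathConstants.NODE);
            return course == null ? null
                    : course.getElementsByTagName("name").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(courseName(234141));
    }
}
```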
XPath Semantics • Problem: the result type is not always obvious • Two options • Sequence<T> • If the compiler determines that all result elements are of type T • Sequence<XMLObject> • (Otherwise) • Automatic conversion from a singleton sequence • Static checking of XPath queries • If the result is always empty => compile-time error • (The compiler cannot catch all such cases)
Implicit coercions • An atomic XML value can be seamlessly converted into a corresponding Java value • xsd:double => double • xsd:boolean => boolean • xsd:string => java.lang.String • … • This reduces the verbosity of XML-related code:
    import technioncatalog.*;
    import technioncatalog.catalog.*;

    public static String getTeacher(course c) {
        return c [| /teacher |];  // Sequence<teacher> ► teacher ► String
    }
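Without implicit coercions, the standard JAXP API makes the caller choose a result-type constant (`XPathConstants.STRING`, `NUMBER`, `BOOLEAN`) and cast the returned `Object`. The minimal sketch below is illustrative and not from the lecture. `CoercionDemo` and its helpers are hypothetical names. It shows that each conversion is spelled out by hand. An XPath number always comes back as a `Double`, whatever the schema type is.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CoercionDemo {
    static final String COURSE = "<course><points>3</points><number>234141</number>"
            + "<name>Combinatorics for CS</name><teacher>Ran El-Yaniv</teacher></course>";

    // Evaluates expr against COURSE; the caller picks the XPath result type.
    static Object eval(String expr, QName resultType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(COURSE)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc, resultType);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Each conversion needs a result-type constant plus a cast.
    static String teacher() {
        return (String) eval("/course/teacher", XPathConstants.STRING);
    }

    static int points() {
        // XPath numbers are doubles; narrowing to int is the caller's job.
        return ((Double) eval("/course/points", XPathConstants.NUMBER)).intValue();
    }

    static boolean hasStudents() {
        // An empty node-set converts to false.
        return (Boolean) eval("/course/students", XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) {
        System.out.println(teacher() + ", " + points() + " points, students listed: " + hasStudents());
    }
}
```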
Updates: Assignment to Query Result
    public static void changePoint(catalog.course c, int p) {
        c [| /points |] = p;
    }
• An XPath expression returns a reference to an existing element • (No copying is involved) • Consistent with Java’s semantics for objects • Thus, it can be assigned to: an XPath expression is a legal lvalue • Bulk assignment • Occurs when the XPath expression denotes a sequence • Bulk assignment operators (:=, and compound forms such as *:=) assign to every element of the sequence • Double the credit points of each course:
    cat [| //points |] *:= 2;
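A rough DOM analogue of the bulk update `cat [| //points |] *:= 2;` is sketched below. It is illustrative only, and `BulkUpdateDemo` is a hypothetical name. The JDK's XPath evaluator returns references to the live DOM nodes, so setting their text content changes the document in place, much as XJ's lvalue semantics do. The difference is that the loop, the parsing and the formatting are all manual. The sketch copies the matches into a list before updating them so that it does not depend on how the `NodeList` is backed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BulkUpdateDemo {
    static final String XML = "<catalog>"
            + "<course><points>3</points><number>234319</number>"
            + "<name>Programming Languages</name><teacher>Ron Pinter</teacher></course>"
            + "<course><points>3</points><number>234141</number>"
            + "<name>Combinatorics for CS</name><teacher>Ran El-Yaniv</teacher></course>"
            + "</catalog>";

    // Doubles every <points> value and returns the values read back from the document.
    static List<Integer> doublePoints() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList found = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//points", doc, XPathConstants.NODESET);
            List<Node> matches = new ArrayList<>();
            for (int i = 0; i < found.getLength(); i++) {
                matches.add(found.item(i));
            }
            for (Node p : matches) {
                int value = Integer.parseInt(p.getTextContent().trim());
                // p is a reference into the document: no copy is involved.
                p.setTextContent(Integer.toString(value * 2));
            }
            List<Integer> result = new ArrayList<>();
            NodeList after = doc.getElementsByTagName("points");
            for (int i = 0; i < after.getLength(); i++) {
                result.add(Integer.parseInt(after.item(i).getTextContent()));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(doublePoints());
    }
}
```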
Tree structure update • Class XMLObject also defines methods, such as: • insertAfter() • insertBefore() • insertAsFirst() • detach()
    public static void addCourse(catalog cat) {
        course c = new course(<course><points>4</points>
            <number>234111</number><name>Introduction to CS</name>
            <teacher>Roy Friedman</teacher></course>);
        cat.insertAsLast(c);
    }
• Which object is being modified?
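For contrast, the same structural update in plain DOM is sketched below. It appends a new `<course>` to the existing `<catalog>` element, which is modified in place. This is an illustration only: `AppendDemo` and `leaf` are hypothetical names, and `appendChild` is the DOM method, not XJ's `insertAsLast`. Nothing here checks that the new element's children have the names or order the schema requires.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AppendDemo {
    static final String XML = "<catalog>"
            + "<course><points>3</points><number>234319</number>"
            + "<name>Programming Languages</name><teacher>Ron Pinter</teacher></course>"
            + "<course><points>3</points><number>234141</number>"
            + "<name>Combinatorics for CS</name><teacher>Ran El-Yaniv</teacher></course>"
            + "</catalog>";

    // Creates an element whose only content is the given text.
    static Element leaf(Document doc, String name, String text) {
        Element e = doc.createElement(name);
        e.setTextContent(text);
        return e;
    }

    // Appends a new course to the catalog; returns the number of courses afterwards.
    static int addCourse() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            Element course = doc.createElement("course");
            // The DOM accepts any names in any order here.
            course.appendChild(leaf(doc, "points", "4"));
            course.appendChild(leaf(doc, "number", "234111"));
            course.appendChild(leaf(doc, "name", "Introduction to CS"));
            course.appendChild(leaf(doc, "teacher", "Roy Friedman"));
            // The existing <catalog> element is the object being modified.
            doc.getDocumentElement().appendChild(course);
            return doc.getElementsByTagName("course").getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(addCourse());
    }
}
```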
Problems: Type Consistency • Definitions • An XML update operation, u, is a mapping over XML values • u: T1 -> T2 • An update is consistent if T1 = T2 • Ideally, a compile-time error should be triggered for each inconsistent update in the program • Unfortunately, this cannot be guaranteed • The solution: an additional run-time check • Why do we want the two types to be equal? Can you think of an example?
Problems: Covariant subtyping (1/2) • Covariance: the change of type in a signature is in the same direction as the inheritance
    class X { }
    class A { public void m(X x) { } }
    class X1 extends X { }
    class A1 extends A { public void m(X1 x) { } }  // A1.m() is “spoiled”: it requires only X1 objects
    ...
    A a = new A1();
    a.m(new X());  // Which method should be invoked: A.m() or A1.m()?
• Java favors type safety: a method with covariant argument types is considered an overload rather than an override • The same approach is taken by C++ and C# • But covariance is allowed for arrays • Array assignments may fail at run time (ArrayStoreException)
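The runnable sketch below makes both points observable. Unlike the slide's `void` methods, `m` returns a string naming the method actually chosen. Through a reference of static type `A`, only `A.m(X)` is a candidate, even when the argument is an `X1`. Through an `A1` reference, the more specific overload `A1.m(X1)` wins. Covariant arrays, by contrast, compile freely and defer the check to run time. `CovarianceDemo` and its helpers are illustrative names.

```java
class CovarianceDemo {
    // Static type A: only A.m(X) is visible, so it is chosen even for an X1 argument.
    static String viaBase(X arg) {
        A a = new A1();
        return a.m(arg);
    }

    // Static type A1: both m(X) and m(X1) are visible; the most specific one wins.
    static String viaSub(X1 arg) {
        A1 a = new A1();
        return a.m(arg);
    }

    // Arrays are covariant, so the element type is checked at run time instead.
    static boolean arrayStoreFails() {
        Object[] objs = new String[1];
        try {
            objs[0] = Integer.valueOf(42);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(viaBase(new X()));
        System.out.println(viaBase(new X1()));
        System.out.println(viaSub(new X1()));
        System.out.println(arrayStoreFails());
    }
}

class X { }
class X1 extends X { }

class A {
    String m(X x) { return "A.m(X)"; }
}

class A1 extends A {
    // Not an override: a narrower parameter type makes this an overload of m.
    String m(X1 x) { return "A1.m(X1)"; }
}
```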
Problems: Covariant subtyping (2/2) • (Now let us get back to our technioncatalog schema…) • A <course> value is also spoiled • It requires unique children: <points>, <name>, etc. • But it also has an unspoiled superclass: XMLObject • All updates to XMLObject are legal at compile time • The following code compiles successfully:
    public static void trick(course c) {
        XMLObject x = c;
        points p = new points(<points>4</points>);
        x.appendAsLast(p);  // Run-time error is here!!
    }
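The same inconsistent update can be reproduced with plain DOM, where nothing rejects it at all. The closest standard analogue of XJ's run-time check is to re-validate the document against the schema with `javax.xml.validation`. The sketch below (Java 15+ for the text block) is illustrative, not XJ's mechanism, and `RuntimeCheckDemo`, `conforms` and `trick` are hypothetical names. It appends a second `<points>` to a `<course>`. The DOM accepts this without complaint, and only validation reveals that the result no longer matches technioncatalog.xsd.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class RuntimeCheckDemo {
    // technioncatalog.xsd, inlined.
    static final String XSD = """
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
          <xs:element name="catalog"><xs:complexType><xs:sequence>
            <xs:element name="course" maxOccurs="unbounded"><xs:complexType><xs:sequence>
              <xs:element name="points" type="xs:int"/>
              <xs:element name="number" type="xs:int"/>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="teacher" type="xs:string"/>
            </xs:sequence></xs:complexType></xs:element>
          </xs:sequence></xs:complexType></xs:element>
        </xs:schema>
        """;

    static final String XML = "<catalog><course><points>3</points><number>234141</number>"
            + "<name>Combinatorics for CS</name><teacher>Ran El-Yaniv</teacher></course></catalog>";

    static Schema loadSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("bad schema", e);
        }
    }

    // The run-time check: does the whole document still conform to the schema?
    static boolean conforms(Document doc) {
        try {
            loadSchema().newValidator().validate(new DOMSource(doc));
            return true;
        } catch (SAXException e) {
            return false;  // the document no longer matches the schema
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Mirrors trick(): append a second <points> to a <course>; returns {before, after}.
    static boolean[] trick() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
            boolean before = conforms(doc);
            Element course = (Element) doc.getElementsByTagName("course").item(0);
            Element extra = doc.createElementNS(null, "points");
            extra.setTextContent("4");
            course.appendChild(extra);  // the DOM accepts this without complaint
            return new boolean[] { before, conforms(doc) };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = trick();
        System.out.println("valid before: " + r[0] + ", valid after: " + r[1]);
    }
}
```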
Shaping the future (revisited) • Language constructs seen so far • Typed XML objects • Seamless translation of a Schema/DTD into a Java type • Two composition techniques • XML notation • Java’s object creation syntax • Two decomposition techniques • Typed XPath • Typed, named methods/fields • XPath expressions as first-class-values
XPath expressions as first-class values • What is a first-class value? • A value that can be used “naturally” in the program: • Passed as an argument • Stored in a variable/field • Returned from a method • Created • In XJ, XPath expressions do not meet these conditions • The main obstacle: the XPath part of the expression cannot be separated from its context provider
XPath expressions as first-class values (cont’d) • Let’s speculate on XPath as an FCV… • (The following code IS NOT a legal XJ program)
    private static Sequence<teacher> teachers;

    static Sequence<teacher> find(XPath<catalog,teacher> q) {
        catalog c = new catalog(new File("file1.xml"));
        return q.evaluate(c);
    }

    public static void main(String[] args) {
        Sequence<teacher> all = find(<catalog>[| //teacher |]);
        Sequence<teacher> few = find(<catalog>[| //number[. = 234319]/../teacher |]);
    }
XPath expressions as first-class values (cont’d) • Operators on XPath values • Composition • Conjunction • Disjunction • These operators would allow the developer to easily create a rich array of safe XPath values • The compiler must keep track of the type of each such value • Basically, an XPath value is a function T -> R, where both T and R are subclasses of XMLObject • When two XPath values are composed, the result type is deduced from the types of the operands
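The idea of an XPath value as a typed function T -> R can be approximated in plain Java (16+) with generics. The sketch below is hypothetical throughout: `Path`, `then`, `or`, `where` and the simplified `Course`/`Catalog` records are illustrative names, not an XJ or JDK API. Composition deduces its result type from its operands, as the slide describes. Disjunction requires both operands to have the same types. A filter stands in for an XPath predicate. The `or` here simply concatenates results, so unlike XPath's union it does not sort into document order or drop duplicates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

class TypedPathDemo {
    // Hypothetical "XPath values" over the records below.
    static final Path<Catalog, Course> COURSES = cat -> cat.courses();
    static final Path<Course, String> TEACHER = c -> List.of(c.teacher());

    // Roughly //course[number = n]/teacher, built by composition.
    static List<String> teacherOf(Catalog cat, int n) {
        return COURSES.where(c -> c.number() == n).then(TEACHER).apply(cat);
    }

    // Disjunction of two filtered paths, followed by composition.
    static List<String> teachersOf(Catalog cat, int n1, int n2) {
        Path<Catalog, Course> either = COURSES.where(c -> c.number() == n1)
                .or(COURSES.where(c -> c.number() == n2));
        return either.then(TEACHER).apply(cat);
    }

    static Catalog sample() {
        return new Catalog(List.of(new Course(234319, "Ron Pinter"),
                new Course(234141, "Ran El-Yaniv")));
    }

    public static void main(String[] args) {
        System.out.println(teacherOf(sample(), 234319));
        System.out.println(teachersOf(sample(), 234141, 234319));
    }
}

// Hypothetical element types standing in for schema-derived classes.
record Course(int number, String teacher) {}
record Catalog(List<Course> courses) {}

// Hypothetical typed XPath value: maps a context of type T to a sequence of R.
interface Path<T, R> extends Function<T, List<R>> {
    // Composition ("p/q"): the result type S is deduced from the operands.
    default <S> Path<T, S> then(Path<R, S> next) {
        return t -> {
            List<S> out = new ArrayList<>();
            for (R r : apply(t)) {
                out.addAll(next.apply(r));
            }
            return out;
        };
    }

    // Disjunction ("p | q"): both operands must share T and R.
    default Path<T, R> or(Path<T, R> other) {
        return t -> {
            List<R> out = new ArrayList<>(apply(t));
            out.addAll(other.apply(t));
            return out;
        };
    }

    // Filtering, in the spirit of an XPath predicate ("p[cond]").
    default Path<T, R> where(Predicate<? super R> cond) {
        return t -> apply(t).stream().filter(cond).toList();
    }
}
```

Because the types are tracked by the compiler, an ill-typed composition such as `COURSES.then(COURSES)` is rejected at compile time rather than failing at run time.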
Scala: Composition of XML elements • In Scala, types can be defined in a DTD file • A DTD can be translated into Scala classes via the dtd2scala utility • Scala offers two options for composition of XML elements: • Using XML notation (similar to XJ) • Using case-class construction notation:
    import Data._;       // import generated definitions
    import scala.xml._;  // for creating PCDATA nodes

    object Main with Application {
        val x = course(teacher(Text("Ran El-Yaniv")),
                       points(Text("3")),
                       name(Text("Combinatorics for CS")),
                       number(Text("234141")));
        Console.println(x);
    }
Typed, named methods/fields • Usually, values aggregated by a Java object are accessed by fields/methods • Can we access XML sub-elements this way? • (Following code IS NOT a legal XJ program)
    import technioncatalog.*;

    void printTeachers(catalog cat) {
        for (int i = 0; i < cat.courses.length; ++i) {
            catalog.course c = cat.courses[i];
            System.out.println(c.teacher);
        }
    }
Typed, named methods/fields (cont’d) • Some of the difficulties: • Sub-elements are not always named • Schema supports choice types (<xsd:choice>), where only one of several alternative sub-elements is present • How can Java express an “optional” field? • Observation: Java’s typing mechanisms cannot capture the wealth of Schema/DTD types • Missing features: virtual fields, inheritance without polymorphism • Other features can be found in functional languages • E.g.: variant types, immutability, structural conformance • But their popularity lags behind
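Java versions released long after this lecture (records in 16, sealed types in 17) added a limited form of variant types. These can approximate an `<xsd:choice>`, and `Optional` (Java 8) can stand in for an optional element. The sketch below assumes a hypothetical schema variant that is not technioncatalog.xsd. In it, a course has either a `<teacher>` or a `<committee>` and an optional `<students>` element. All type names are illustrative. The sketch only partly answers the slide's question: element order and cardinality bounds are still not captured by these types.

```java
import java.util.List;
import java.util.Optional;

class ChoiceDemo {
    static String describe(Course c) {
        // Pattern matching on the sealed Staff type: one branch per choice alternative.
        String staff;
        if (c.staff() instanceof Teacher t) {
            staff = "taught by " + t.name();
        } else if (c.staff() instanceof Committee k) {
            staff = "run by a committee of " + k.members().size();
        } else {
            throw new IllegalStateException("unexpected Staff variant");
        }
        // The optional <students> element is an Optional<Integer>, not a nullable field.
        String students = c.students().map(n -> n + " students").orElse("enrollment unknown");
        return c.name() + ", " + staff + ", " + students;
    }

    public static void main(String[] args) {
        System.out.println(describe(new Course("Combinatorics for CS",
                new Teacher("Ran El-Yaniv"), Optional.empty())));
    }
}

// Hypothetical <xsd:choice> between <teacher> and <committee>, as a variant type.
sealed interface Staff permits Teacher, Committee {}

record Teacher(String name) implements Staff {}

record Committee(List<String> members) implements Staff {}

// Hypothetical optional <students> element (minOccurs="0") as an Optional.
record Course(String name, Staff staff, Optional<Integer> students) {}
```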
Summary • XJ is a Java extension with built-in support for XML • Type safety: many things are checked at compile time • Ease of use • OO languages are not powerful enough (in terms of typing) • Some type information is lost in the transition Schema -> Java