
.opennet Technologies XML Document Object Model and XML-Java Interfaces

Fall Semester 2001, MW 5:00 pm - 6:20 pm CENTRAL (not Indiana) Time. Geoffrey Fox and Bryan Carpenter, PTLIU Laboratory for Community Grids, Computer Science, Informatics, Physics, Indiana University, Bloomington IN 47404



Presentation Transcript


  1. .opennet Technologies: XML Document Object Model and XML-Java Interfaces • Fall Semester 2001, MW 5:00 pm - 6:20 pm CENTRAL (not Indiana) Time • Geoffrey Fox and Bryan Carpenter, PTLIU Laboratory for Community Grids, Computer Science, Informatics, Physics, Indiana University, Bloomington IN 47404 • gcf@indiana.edu xmldomfall01

  2. The two XML World Views • [Diagram components: User; Web page form input/output; Web server with control servlet; Java virtual machine with Enterprise JavaBeans; object layer; (virtual) XML layer; database / persistent managed store; processing system] • There is the Data Object view • and the Document Object view; both are defined in XML Schema

  3. Java XML Interfaces -- SAX • [Diagram: XML data (schema/instances) → SAX parser → user code] • The appropriate way to interface Java to XML is still being debated, and there are several different approaches • There is SAX (Simple API for XML), where a SAX parser reads an XML data stream and hands nuggets of information to a user program. These nuggets are called events; typical events are start tag, end tag, content within a tag, etc. • http://www-105.ibm.com/developerworks/education.nsf/dw/xml-onlinecourse-bynewest?OpenDocument&Count=500 has a recent SAX tutorial • http://www6.software.ibm.com/developerworks/education/x-usax/index.html • The SAX resource page is http://www.megginson.com/SAX/index.html
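The event-driven style above can be seen with the SAX parser that ships with the JDK's JAXP API. This is a minimal sketch, not the course's own code: the class name `SaxEventLog` and the string labels for each event are illustrative, and whitespace-only content is simply dropped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Records the SAX events (start tag, end tag, content) delivered for an XML string. */
public class SaxEventLog {

    public static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        // The handler plays the role of "user code": the parser calls it back for each event.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                log.add("start " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Content may arrive in several chunks; whitespace-only chunks are ignored here.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    log.add("text " + text);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
        return log;
    }

    public static void main(String[] args) {
        // Print each event on its own line; no tree is ever built.
        events("<greeting><to>World</to></greeting>").forEach(System.out::println);
    }
}
```

Note that the program never holds the whole document: each event is handled and forgotten, which is what distinguishes SAX from the DOM parsers discussed later.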

  4. Java XML Interfaces – Document Object Model DOM • The name DOM is unfortunate, as the model is useful whether XML defines a document or any other object • i.e. it can be used whether one is supporting an XML Web page or XML data • Further, the "OM" part of the name is confusing, as XML defines an object and the DOM describes a different object • DOM is really TOSXO (Tree Object Structure of an XML Object), or perhaps TOM (Tree Object Model) • In fact, almost any structured information turns out to have a tree structure, and of course we saw that any Schema- or DTD-defined XML produces a tree

  5. Example of HTML DOM • Here is an example of a fragment of HTML and how it can be thought of as a tree • This is called a “document fragment” in DOM (a lightweight tree)

  6. IMS and DOM • As an example, consider the recent definition of an object structure for course material from the so-called ADL (Advanced Distributed Learning) effort of the DoD; see http://www.adlnet.org • Here we have a hierarchy with an element called block to define nodes in a tree • This block tree node has various other elements which are specific to this application • In this specification the leaves of the tree are <au> (assignable unit) tags, each of which is typically a Web page • So ADL has superimposed a tree for document organization on top of the tree for the document given by the DOM • Of course the DOM applies to either tree and describes how to navigate through it

  7. Example Tree-Based Course Structure

  8. XML DTD Structure for Block Element

  9. Tree or Structured Data • Yahoo and Google offer structured (tree) or unstructured data access • [Screenshot with the tree nodes marked]

  10. Unstructured Data • The Gallimaufry of Web Search Engines

  11. Java XML Interfaces – DOM Tree Representation of XML Instance • [Diagram: XML data (schema/instances) → DOM parser → user code] • Apache has two so-called DOM parsers, which read the full tree into memory and allow you to browse it • Xerces and Crimson • Note these are built on top of SAX parsers and provide an additional layer of capability • In all these architectures, one can choose whether or not to validate the XML
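A minimal sketch of reading a whole instance into a DOM tree through JAXP, assuming the parser bundled with the JDK (typically a repackaged Xerces) rather than a separately installed Xerces or Crimson. The `course`/`block` document and the class name `DomLoad` are illustrative; `setValidating(false)` is where the validate-or-not choice is made.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomLoad {

    /** Reads the complete document into memory as a tree of Node objects. */
    public static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // the validate-or-not choice
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("DOM parse failed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<course><block title='Intro'/><block title='DOM'/></course>");
        Element root = doc.getDocumentElement();
        // Once the tree is in memory it can be browsed in any order, as often as needed.
        NodeList blocks = root.getElementsByTagName("block");
        for (int i = 0; i < blocks.getLength(); i++) {
            System.out.println(((Element) blocks.item(i)).getAttribute("title"));
        }
    }
}
```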

  12. Java XML Interfaces -- XPP • A “Pull” parser written by Aleksander Slominski, a graduate student of Dennis Gannon at Indiana University • http://www.extreme.indiana.edu/soap/xpp/ • This has an interface similar to SAX, but you can “backtrack” • For instance, you could decide that you did not want to read all the events in a particular element • <xmlnode> Other Nodes </xmlnode> • and later go back if it turns out you need them • In the DOM view of the Java interface, XPP supports choosing whether or not to expand nodes of the XML tree • XPP was the fastest parser in a recent survey (which excluded SAX, as it doesn’t preserve tree structure) • http://www-106.ibm.com/developerworks/xml/library/x-injava/index.html
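XPP's own API is not reproduced here. As a stand-in, the sketch below uses StAX (`javax.xml.stream`), a later standard pull API in the JDK, to show the "skip an element you don't need" half of the idea. StAX cannot go back, so XPP's backtracking is not shown; the class and method names are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Pull-style parsing with StAX, used as a stand-in for XPP. */
public class PullSkip {

    /** Element names in document order, not reading inside any element named skipName. */
    public static List<String> namesSkipping(String xml, String skipName) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            // The program asks for the next event, instead of the parser pushing events at it.
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(r.getLocalName());
                    if (r.getLocalName().equals(skipName)) {
                        skipElement(r);
                    }
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("pull parse failed", e);
        }
        return names;
    }

    /** Called on a START_ELEMENT; advances to its matching END_ELEMENT. */
    private static void skipElement(XMLStreamReader r) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }

    public static void main(String[] args) {
        // Prints [doc, xmlnode, c]: the children of <xmlnode> are never reported.
        System.out.println(namesSkipping("<doc><xmlnode><a/><b/></xmlnode><c/></doc>", "xmlnode"));
    }
}
```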

  13. Performance of XML DOM-like Parsers • The survey took a variety of documents and summed the parsing times • The current XPP does not support one of the documents, which uses entities and other not-so-useful XML constructs • [Chart: smaller numbers are better] • The article has links to all the systems

  14. Java XML Interfaces – JDOM I • DOM has perhaps two difficulties: • Many DOM features are aimed at Web page rather than XML data applications (the tree structure is common to both) • It is not especially well designed to exploit Java • JDOM is designed to produce a natural Java-XML interface • It exploits Java Collections to organize nodes and other features of an XML instance • For more information on JDOM, visit http://www.jdom.org. • For information on the Java Community Process (JCP) standards effort for JDOM, see http://java.sun.com/aboutJava/communityprocess/jsr/jsr_102_jdom.html. • JDOM appears immature, and its description in the performance review is not so positive! • Surprisingly, it is no faster than Java DOM

  15. Party Line on JDOM, DOM4J • The standard DOM is a very simple data structure that intermixes text nodes, element nodes, processing instruction nodes, CDATA nodes, entity references, and several other kinds of nodes. • That makes it difficult to work with in practice, because you are always sifting through collections of nodes, discarding the ones you don't need in order to process the ones you are interested in. • JDOM, on the other hand, creates a tree of objects from an XML structure. • The resulting tree is much easier to use, and it can be created from an XML structure without a compilation step. • Although it is not on the JCP standards track, DOM4J is an open-source, object-oriented alternative to DOM that is in many ways ahead of JDOM in terms of implemented features. • As such, it represents an excellent alternative for Java developers who need to manipulate XML-based data. For more information on DOM4J, see http://www.dom4j.org.
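The "sifting" described above looks like this with the standard DOM in Java. The helper `childElements` is hypothetical: it walks the children and keeps only `Element` nodes, discarding the whitespace text and comment nodes that sit between them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ElementChildren {

    /** Keeps only the Element children of a node, discarding text, comment and other node kinds. */
    public static List<Element> childElements(Node parent) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) n);
            }
        }
        return result;
    }

    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        Element root = parse("<a>\n  <b/>\n  <!-- note -->\n  <c/>\n</a>").getDocumentElement();
        System.out.println("all children:     " + root.getChildNodes().getLength());
        System.out.println("element children: " + childElements(root).size());
    }
}
```

For the sample document, `main` prints 7 for all children but 2 for element children, because the indentation whitespace and the comment are nodes too.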

  16. Java XML Interfaces – Castor I • http://castor.exolab.org/ is an open source project that supports a different model, in which XML Schema objects are mapped one-to-one to Java classes • Map Class <--> Schema • Map Java Instance <--> XML Instance • This uses Java object references to traverse the tree rather than an explicit tree structure • It looks best if the Schema reflects an integrated object and the names of properties mean something • If the Schema (as in ADL) is just a “tree”, then maybe it is not so natural • The next page is a Castor advertisement! • There is a partial standards effort for this type of binding, called JAXB (Java Architecture for XML Binding, http://jcp.org/jsr/detail/031.jsp) • See http://java.sun.com/xml/jaxp-1.1/docs/tutorial/overview/3_apis.html for Sun’s attempt to deconfuse these approaches

  17. Java XML Interfaces – Castor II • Castor XML: Java object model to and from XML • Generate source code from an XML Schema • Castor JDO: Java object persistence to RDBMS • Castor DAX: Java object persistence to LDAP • Castor DSML: LDAP directory exchange through XML • An XML-based mapping file specifies the mapping between one model and another • Support for schema-less Java to XML binding • In-memory caching and write-at-commit reduce JDBC operations • Two-phase commit transactions, object rollback and deadlock detection • OQL query mapping to SQL queries • EJB container-managed persistence provider for OpenEJB

  18. Java XML Interfaces – Castor III • Note the comparison of DOM versus Castor/JAXB • Suppose we have a tree corresponding to a parent class docroot with child properties called, say, fred • Let fred have children of the same name • In Castor one would access the information through the reference • Docroot.fred.fred.fred.finalproperty • In practice one uses (setter/getter) methods, as the properties are private • The DOM model would instead reference the node named finalproperty, 4 levels down the tree • Castor has a document handler which will return the XML associated with any Java object generated from XML, in text format as well as SAX DocumentHandlers and DOM trees • Is it best to combine the Castor and DOM models?
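To make the comparison concrete, here is a sketch in plain Java. The `Docroot` and `Fred` classes are hypothetical, hand-written stand-ins for classes a Castor- or JAXB-style generator might produce (no Castor or JAXB API is used, and the chain is built by hand instead of unmarshalled), while the DOM side uses the standard JDK parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class BindingVersusDom {

    // Hypothetical bound classes: private properties reached through getters and setters.
    public static class Fred {
        private Fred fred;              // nested child with the same name
        private String finalproperty;
        public Fred getFred() { return fred; }
        public void setFred(Fred fred) { this.fred = fred; }
        public String getFinalproperty() { return finalproperty; }
        public void setFinalproperty(String value) { this.finalproperty = value; }
    }

    public static class Docroot {
        private Fred fred;
        public Fred getFred() { return fred; }
        public void setFred(Fred fred) { this.fred = fred; }
    }

    /** Builds docroot -> fred -> fred -> fred(finalproperty = value) by hand. */
    public static Docroot buildBound(String value) {
        Fred inner = new Fred();
        inner.setFinalproperty(value);
        Fred middle = new Fred();
        middle.setFred(inner);
        Fred outer = new Fred();
        outer.setFred(middle);
        Docroot root = new Docroot();
        root.setFred(outer);
        return root;
    }

    /** Binding style: Docroot.fred.fred.fred.finalproperty via typed getters. */
    public static String viaBinding(Docroot root) {
        return root.getFred().getFred().getFred().getFinalproperty();
    }

    /** DOM style: find the node named finalproperty, four levels below the root. */
    public static String viaDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("finalproperty").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<docroot><fred><fred><fred><finalproperty>42</finalproperty>"
                + "</fred></fred></fred></docroot>";
        System.out.println(viaBinding(buildBound("42")) + " " + viaDom(xml));
    }
}
```

The binding version is checked by the Java compiler but tied to one schema; the DOM version works for any document but knows nothing about types.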

  19. Java XML Interfaces – Castor IV • [Diagram: a docroot parent instance and its chain of fred child instances ending in the property finalproperty, drawn against a DOM tree of Node objects] • This diagram illustrates the Castor versus DOM model • See the online book chapter (Professional XML, 2nd Ed., Wrox Press) http://www.wrox.com/books/samplechapters/5059/content.pdf

  20. Java XML Interfaces – JAXP • Java™ APIs for XML Processing (JAXP) is a collection of technologies allowing you to interface with many different types of XML Java interfaces • http://java.sun.com/xml/jaxp.html • This link has several good online tutorials • http://java.sun.com/xml/jaxp-1.1/docs/tutorial/overview/3_apis.html • This tutorial discusses JAXP and its relation to SAX, DOM, XSLT and JDOM • JAXP is an approved Java standard, meant to allow you to keep the same interface while changing the implementation • It is not clear that this is efficient or that it will catch on
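A small illustration of JAXP's "same interface, swappable implementation" idea, assuming only the JDK. User code asks `DocumentBuilderFactory` and `TransformerFactory` for instances and never names an implementation class, and a `Transformer` created without a stylesheet serializes a DOM tree back to text. The class name `JaxpRoundTrip` is illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class JaxpRoundTrip {

    /** Parses XML into a DOM tree and writes it back out with an identity transform. */
    public static String roundTrip(String xml) {
        try {
            // JAXP locates a parser implementation at run time; only interfaces appear here.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // A Transformer with no stylesheet copies its input unchanged (DOM -> text).
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    /** Reports which implementation classes the factories actually picked. */
    public static String implementations() {
        return DocumentBuilderFactory.newInstance().getClass().getName() + " / "
                + TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(implementations());
        System.out.println(roundTrip("<a><b>x</b></a>"));
    }
}
```

Which implementation is returned can be changed without touching this code, for example through the `javax.xml.parsers.DocumentBuilderFactory` system property.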

  21. The Origins of the W3C DOM • The idea of DOM came from the need to build interactive Web pages and to identify parts of a document uniquely, so that one can, for example: • Associate a mouse event with a particular page element • Associate input of text into a form with a particular text area • Dynamic HTML was introduced in Netscape 4 and IE4, and allows one both to associate events with HTML elements and to change the HTML structure • e.g. move a “layer” around within the browser • Change text and color in a “document fragment” • Netscape’s implementation of Dynamic HTML had many bugs and was inferior to Microsoft’s, although it had the essential functionality

  22. The 4 levels of DOM • Level 0: Functionality equivalent to that evident in Netscape Navigator 3.0 and Microsoft Internet Explorer 3.0. • Levels 1 and 2 include what is called Dynamic HTML but make it much more complete • Level 1: This concentrates on the general API to an XML document. • It contains functionality for document (tree) navigation and manipulation. • It defines the special case of DOM applied to HTML, with specific APIs for the different HTML elements • Level 2: includes a style sheet object model, and defines functionality for manipulating the style information attached to a document. It also enables traversals on the document (i.e. for manipulating collections of nodes), defines an event model (very important!) and provides support for XML namespaces. • Level 3: Still being developed; see next page

  23. Level 3 DOM • Level 3, which is at Working Draft stage, includes the following items: • Extending the DOM Level 2 Object Model: Allowing users and applications to access keyboard events. Adding the ability to define groups of events. • Content Models (DTD, Schema) and Validation: an object model for accessing and modifying a Content Model for a document. • Load and Save interfaces: for loading XML source documents into a DOM representation and for saving a DOM representation as an XML document. • Embedded Document Object Model: Currently, the Web is moving towards documents with mixed markup vocabularies, e.g. SVG fragments can be embedded in an XHTML document. This creates new challenges for the DOM, since it also means that DOM APIs and implementations of the different vocabularies need to work together. • Adaptation to changes to core XML functionality: the DOM is an API to an XML document. As auxiliary functionality to XML 1.0 is developed (namespaces, XML Base), the DOM API should model this. • XPath DOM: A simple solution to query a DOM tree using XPath will also be included.
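The Load and Save interfaces later became part of the DOM Level 3 Recommendation and are available in the JDK as `org.w3c.dom.ls`. This sketch assumes the JDK's built-in DOM implementation also implements `DOMImplementationLS`, as the bundled implementation does in current JDKs; the class name `LoadAndSave` is illustrative.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSSerializer;

public class LoadAndSave {

    /** Obtains the Load and Save factory from the JDK's DOM implementation (an assumption). */
    static DOMImplementationLS ls() {
        try {
            return (DOMImplementationLS) DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getDOMImplementation();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** "Load": XML source text into a DOM representation. */
    public static Document load(String xml) {
        DOMImplementationLS impl = ls();
        LSParser parser = impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        LSInput input = impl.createLSInput();
        input.setStringData(xml);
        return parser.parse(input);
    }

    /** "Save": a DOM representation back to XML text, without the <?xml ...?> declaration. */
    public static String save(Document doc) {
        LSSerializer serializer = ls().createLSSerializer();
        serializer.getDomConfig().setParameter("xml-declaration", false);
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) {
        Document doc = load("<note><to>Ann</to></note>");
        doc.getDocumentElement().setAttribute("read", "yes"); // modify the tree in between
        System.out.println(save(doc));
    }
}
```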

  24. What the DOM is not ….. I • Although the Document Object Model was strongly influenced by "Dynamic HTML", in Level 1, it does not implement all of "Dynamic HTML". In particular, events have not yet been defined. Level 1 is designed to lay a firm foundation for this kind of functionality by providing a robust, flexible model of the document itself. • The Document Object Model is not a binary specification. DOM programs written in the same language will be source code compatible across platforms, but the DOM does not define any form of binary interoperability. • The Document Object Model is not a way of persisting objects to XML or HTML. Instead of specifying how objects may be represented in XML, the DOM specifies how XML and HTML documents are represented as objects, so that they may be used in object oriented programs. • The Document Object Model is not a set of data structures, it is an object model that specifies interfaces. Although this document contains diagrams showing parent/child relationships, these are logical relationships defined by the programming interfaces, not representations of any particular internal data structures.

  25. What the DOM is not ….. II • The Document Object Model does not define "the true inner semantics" of XML or HTML. The semantics of those languages are defined by W3C Recommendations for these languages. The DOM is a programming model designed to respect these semantics. The DOM does not have any ramifications for the way you write XML and HTML documents; any document that can be written in these languages can be represented in the DOM. • The Document Object Model, despite its name, is not a competitor to the Component Object Model (COM). COM, like CORBA, is a language independent way to specify interfaces and objects; the DOM is a set of interfaces and objects designed for managing HTML and XML documents. The DOM may be implemented using language-independent systems like COM or CORBA; it may also be implemented using language-specific bindings like the Java or ECMAScript bindings specified in this document.

  26. Language Bindings • The DOM specifies a set of methods and properties which form the interface through which users access the static structure or the dynamic behavior (events) of an XML structure. It also allows one to create or modify such structures • The specification gives this interface for IDL (CORBA), Java and ECMAScript • For Web pages, Java (in Java Server Pages) or ECMAScript is most important • ECMAScript is a general object-based scripting language • ECMAScript plus the DOM bindings is essentially JavaScript • Of course Netscape 4 and IE5 do not follow the W3C DOM (exactly) • Mozilla (Netscape 6, http://www.mozilla.org/js/) does support the W3C DOM interface, fully at Level 1 and partially at Level 2

  27. Netscape 6 and Level 1 DOM • Note that Netscape 6 supports XML • This comes from http://home.netscape.com/browsers/future/standards.html • In Netscape 6 and Mozilla, “everything” (Web page and browser adornments) is controlled by the DOM interface

  28. DOM Level 1 Core • [Diagram: a tree of Node objects] • In the DOM, one builds a tree out of a set of Node objects • Each Node object has a set of generic capabilities (properties and methods) and also implements specific interfaces. In the Core, one defines a set of Node types to reflect the structure of XML. Each Node type has its own interface to reflect its special features, etc.

  29. Node Types in Level 1 Core I • For each Node Type, we give the allowed children • Document -- Element (maximum of one), ProcessingInstruction, Comment, DocumentType • DocumentFragment -- Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference • DocumentType -- no children • EntityReference -- Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference • Element -- Element, Text, Comment, ProcessingInstruction, CDATASection, EntityReference

  30. Node Types in Level 1 Core II • Attr -- Text, EntityReference • ProcessingInstruction -- no children • Comment -- no children • Text -- no children • CDATASection -- no children • Entity -- Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference • Notation -- no children
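The allowed-children rules are enforced at run time. A minimal sketch against the JDK's DOM: a second Element child of a Document should be refused with `DOMException.HIERARCHY_REQUEST_ERR`, while a Comment next to the root element is allowed. Class and method names are illustrative.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

public class HierarchyRules {

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Document -- Element (maximum of one): a second root element must be rejected. */
    public static boolean rejectsSecondRoot() {
        Document doc = newDocument();
        doc.appendChild(doc.createElement("first"));
        try {
            doc.appendChild(doc.createElement("second"));
            return false; // the rule was not enforced
        } catch (DOMException e) {
            return e.code == DOMException.HIERARCHY_REQUEST_ERR;
        }
    }

    /** Comment is among the allowed children of Document, so this append succeeds. */
    public static boolean acceptsTopLevelComment() {
        Document doc = newDocument();
        doc.appendChild(doc.createElement("root"));
        doc.appendChild(doc.createComment("allowed next to the root element"));
        return doc.getChildNodes().getLength() == 2;
    }

    public static void main(String[] args) {
        System.out.println("second root rejected:  " + rejectsSecondRoot());
        System.out.println("top-level comment ok:  " + acceptsTopLevelComment());
    }
}
```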

  31. The Node Interface in CORBA IDL • [Figure: the IDL definition of Node, grouped into constants, properties and methods]

  32. nodeName, nodeValue and attributes • [Table: the values of nodeName, nodeValue and attributes for each Node Type] • Each Node type has particular rules for the values of some of the properties, most importantly nodeName and nodeValue • attributes is a property used only by the Element node type; for all other types it is null
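The per-type rules can be checked by printing `nodeName` and `nodeValue` for a few node types. A sketch using the JDK's DOM; the sample document and the `NameValueTable` class are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NameValueTable {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    /** One table row: nodeName | nodeValue (null prints as "null"). */
    static String describe(Node n) {
        return n.getNodeName() + " | " + n.getNodeValue();
    }

    /** Rows for the document, its root element, the root's attributes and its children. */
    public static List<String> describeAll(String xml) {
        Document doc = parse(xml);
        Element root = doc.getDocumentElement();
        List<String> rows = new ArrayList<>();
        rows.add(describe(doc));
        rows.add(describe(root));
        NamedNodeMap attrs = root.getAttributes(); // non-null only because root is an Element
        for (int i = 0; i < attrs.getLength(); i++) {
            rows.add(describe(attrs.item(i)));
        }
        for (Node c = root.getFirstChild(); c != null; c = c.getNextSibling()) {
            rows.add(describe(c));
        }
        return rows;
    }

    public static void main(String[] args) {
        describeAll("<a href='x'>hi<!--c--></a>").forEach(System.out::println);
    }
}
```

For `<a href='x'>hi<!--c--></a>` the rows are `#document | null`, `a | null`, `href | x`, `#text | hi` and `#comment | c`.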

  33. Document Fragment • [Diagram: a Document Fragment node grouping a subtree of Nodes] • This is a lightweight “document” used to denote a part of a tree. As it does not carry all the overhead of an XML object instance, it is a convenient way of denoting a subtree including all leaf nodes below a certain internal node. • This is an important building block for documents
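A sketch of using a DocumentFragment as a lightweight container, assuming the JDK's DOM: items are built in the fragment, and appending the fragment moves its children into the target, leaving the fragment empty. The `ul`/`li` names are illustrative.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;

public class FragmentDemo {

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns {children of ul after insertion, children left in the fragment}. */
    public static int[] insertFragment() {
        Document doc = newDocument();
        Element ul = doc.createElement("ul");
        doc.appendChild(ul);

        // Build the subtree off to the side, in a fragment owned by the same document.
        DocumentFragment frag = doc.createDocumentFragment();
        for (String label : new String[] {"one", "two"}) {
            Element li = doc.createElement("li");
            li.appendChild(doc.createTextNode(label));
            frag.appendChild(li);
        }

        // Inserting the fragment inserts its children, in order; the fragment itself is not added.
        ul.appendChild(frag);
        return new int[] {ul.getChildNodes().getLength(), frag.getChildNodes().getLength()};
    }

    public static void main(String[] args) {
        int[] counts = insertFragment();
        System.out.println("ul children: " + counts[0] + ", fragment children: " + counts[1]);
    }
}
```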

  34. This page is full of document fragments (the examples were shown as images on the original slide)

  35. Properties of a Node I • nodeName • The name of this node, depending on its type; see the table above. • nodeValue • The value of this node, depending on its type; see the table above. • Exceptions on setting: DOMException • NO_MODIFICATION_ALLOWED_ERR: Raised when the node is readonly. • Exceptions on retrieval: DOMException • DOMSTRING_SIZE_ERR: Raised when it would return more characters than fit in a DOMString variable on the implementation platform. • nodeType • A code representing the type of the underlying object, as defined above.

  36. Properties of a Node II • parentNode • The parent of this node. All nodes, except Document, DocumentFragment, and Attr may have a parent. However, if a node has just been created and not yet added to the tree, or if it has been removed from the tree, this is null. • childNodes • A NodeList that contains all children of this node. If there are no children, this is a NodeList containing no nodes. The content of the returned NodeList is "live" in the sense that, for instance, changes to the children of the node object that it was created from are immediately reflected in the nodes returned by the NodeList accessors; it is not a static snapshot of the content of the node. This is true for every NodeList, including the ones returned by the getElementsByTagName method. • firstChild • The first child of this node. If there is no such node, this returns null. • lastChild • The last child of this node. If there is no such node, this returns null.
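The "live" behaviour of NodeList can be observed directly: a list obtained once reflects later changes, both for `childNodes` and for `getElementsByTagName`. A minimal sketch with the JDK's DOM; class and element names are illustrative. It should return 0, 2, 0, 2.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class LiveNodeList {

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns {childNodes before, childNodes after, byTag before, byTag after}. */
    public static int[] lengthsBeforeAndAfter() {
        Document doc = newDocument();
        Element list = doc.createElement("list");
        doc.appendChild(list);

        // Both lists are obtained once, before any item exists.
        NodeList children = list.getChildNodes();
        NodeList byTag = doc.getElementsByTagName("item");
        int childrenBefore = children.getLength();
        int byTagBefore = byTag.getLength();

        list.appendChild(doc.createElement("item"));
        list.appendChild(doc.createElement("item"));

        // Same NodeList objects, queried again: they see the new children.
        return new int[] {childrenBefore, children.getLength(), byTagBefore, byTag.getLength()};
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(lengthsBeforeAndAfter()));
    }
}
```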

  37. Properties of a Node III • previousSibling • The node immediately preceding this node. If there is no such node, this returns null. • nextSibling • The node immediately following this node. If there is no such node, this returns null. • attributes • A NamedNodeMap containing the attributes of this node (if it is an Element) or null otherwise. • ownerDocument • The Document object associated with this node. This is also the Document object used to create new nodes. When this node is a Document this is null.
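A sketch of navigating with the child and sibling properties alone, plus a check of `ownerDocument`, assuming the JDK's DOM; the course-like sample document and the method names are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SiblingWalk {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    /** Depth-first, document-order walk using only firstChild and nextSibling. */
    static void collect(Node n, List<String> out) {
        if (n.getNodeType() == Node.ELEMENT_NODE) {
            out.add(n.getNodeName());
        }
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, out);
        }
    }

    public static List<String> preorder(String xml) {
        List<String> out = new ArrayList<>();
        collect(parse(xml), out);
        return out;
    }

    /** The root's children visited backwards with lastChild and previousSibling. */
    public static List<String> childrenBackwards(String xml) {
        Element root = parse(xml).getDocumentElement();
        List<String> out = new ArrayList<>();
        for (Node c = root.getLastChild(); c != null; c = c.getPreviousSibling()) {
            out.add(c.getNodeName());
        }
        return out;
    }

    /** ownerDocument is null for the Document itself and points back to it for other nodes. */
    public static boolean ownerDocumentIsConsistent(String xml) {
        Document doc = parse(xml);
        return doc.getOwnerDocument() == null && doc.getDocumentElement().getOwnerDocument() == doc;
    }

    public static void main(String[] args) {
        String xml = "<course><block><au/></block><block/><quiz/></course>";
        System.out.println(preorder(xml));          // [course, block, au, block, quiz]
        System.out.println(childrenBackwards(xml)); // [quiz, block, block]
    }
}
```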

  38. Methods of a Node I • insertBefore (newChild, refChild) • Inserts the node newChild before the existing child node refChild. If refChild is null, insert newChild at the end of the list of children. • If newChild is a DocumentFragment object, all of its children are inserted, in the same order, before refChild. If the newChild is already in the tree, it is first removed. • replaceChild (newChild, oldChild) • Replaces the child node oldChild with newChild in the list of children, and returns the oldChild node. If the newChild is already in the tree, it is first removed.
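A short sketch of insertBefore and replaceChild using the JDK's DOM (element names are illustrative), including the null-refChild case and the fact that the replaced node comes back detached.

```java
import java.util.StringJoiner;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class InsertReplace {

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Space-separated names of the children of a node. */
    static String names(Node parent) {
        StringJoiner j = new StringJoiner(" ");
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            j.add(c.getNodeName());
        }
        return j.toString();
    }

    public static String demo() {
        Document doc = newDocument();
        Element root = doc.createElement("r");
        doc.appendChild(root);
        Element b = doc.createElement("b");
        root.appendChild(b);

        root.insertBefore(doc.createElement("a"), b);    // a goes before b
        root.insertBefore(doc.createElement("c"), null); // null refChild: append at the end
        Node old = root.replaceChild(doc.createElement("b2"), b); // returns the replaced node

        return names(root) + " old=" + old.getNodeName() + " oldParent=" + old.getParentNode();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // a b2 c old=b oldParent=null
    }
}
```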

  39. Methods of a Node II • removeChild (oldChild) • Removes the child node indicated by oldChild from the list of children, and returns it. • appendChild (newChild) • Adds the node newChild to the end of the list of children of this node. If the newChild is already in the tree, it is first removed. • hasChildNodes • This is a convenience method to allow easy determination of whether a node has any children. • It returns true if the node has any child nodes, and false otherwise
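Because a node can have only one parent, appendChild on a node that is already in the tree moves it. A sketch with the JDK's DOM; the `left`/`right`/`item` names are illustrative.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class MoveNodes {

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns "leftHasKids,rightHasKids,rightHasKidsAfterRemove,removedIsSameNode". */
    public static String demo() {
        Document doc = newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);
        Element left = doc.createElement("left");
        Element right = doc.createElement("right");
        root.appendChild(left);
        root.appendChild(right);

        Element item = doc.createElement("item");
        left.appendChild(item);
        right.appendChild(item); // already in the tree: first removed from <left>
        String afterMove = left.hasChildNodes() + "," + right.hasChildNodes();

        Node removed = right.removeChild(item); // detached and handed back to the caller
        return afterMove + "," + right.hasChildNodes() + "," + (removed == item);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // false,true,false,true
    }
}
```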

  40. Methods of a Node III • cloneNode (deep) • Returns a duplicate of this node, i.e., serves as a generic copy constructor for nodes. The duplicate node has no parent (parentNode returns null.). • Cloning an Element copies all attributes and their values, including those generated by the XML processor to represent defaulted attributes, but this method does not copy any text it contains unless it is a deep clone, since the text is contained in a child Text node. Cloning any other type of node simply returns a copy of this node. • Parameter deep: If true, recursively clone the subtree under the specified node; if false, clone only the node itself (and its attributes, if it is an Element).
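Shallow versus deep cloning, sketched with the JDK's DOM: both copies keep the attribute and have no parent, but only the deep clone keeps the text, since it lives in a child Text node. The sample `<p>` element and the class name are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class CloneDemo {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    /** "classAttribute|textContent|hasNoParent" for a cloned element. */
    static String summary(Element e) {
        return e.getAttribute("class") + "|" + e.getTextContent() + "|" + (e.getParentNode() == null);
    }

    public static String[] shallowAndDeep() {
        Element p = parse("<p class='note'>Hello</p>").getDocumentElement();
        Element shallow = (Element) p.cloneNode(false); // element and attributes only
        Element deep = (Element) p.cloneNode(true);     // element plus its Text child
        return new String[] {summary(shallow), summary(deep)};
    }

    public static void main(String[] args) {
        String[] s = shallowAndDeep();
        System.out.println("shallow: " + s[0]); // note||true
        System.out.println("deep:    " + s[1]); // note|Hello|true
    }
}
```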

  41. Two Specific Interfaces • DocumentFragment • Document

  42. HTML Level 1 DOM • This has several extensions, which inherit the Core XML interfaces and specialize them for each HTML tag • An HTMLDocument interface, derived from the core Document interface. HTMLDocument specifies the operations and queries that can be made on a HTML document. • An HTMLElement interface, derived from the core Element interface. HTMLElement specifies the operations and queries that can be made on any HTML element. Methods on HTMLElement include those that allow for the retrieval and modification of attributes that apply to all HTML elements. • Specializations for all HTML elements that have attributes that extend beyond those specified in the HTMLElement interface. For all such attributes, the derived interface for the element contains explicit methods for setting and getting the values.

  43. HTMLDocument Interface • This uses another special interface data structure HTMLCollection to hold lists of sub-components

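  44. HTMLElement and Specializations • Any HTML element adds to Element (and hence Node) the properties id, title, lang, dir and className • The <body> tag (HTMLBodyElement) adds aLink, background, bgColor, link, text and vLink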

  45. Two HTML DOM APIs • The <a> </a> link tag (HTMLAnchorElement) adds its own properties and methods, while the select element in a form (HTMLSelectElement) has a number of new properties and methods

  46. Highlights of Event Model in Level 2 DOM • Every Node can have event listeners added for particular types of Event • For example, for mouse events the types are click, mousedown, mouseup, mouseover, mousemove and mouseout

  47. Sample Event in DOM Level 2 • Here is a MouseEvent • Note that in the DOM you can both receive events and create them programmatically. This capability was not implemented properly in Netscape 4; sometimes you could and sometimes you couldn’t
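The listener model can be tried outside a browser through the Java binding of the Level 2 Events module (`org.w3c.dom.events`). This sketch assumes the JDK's bundled DOM implements `EventTarget` and `DocumentEvent` (as the Xerces-derived implementation in current JDKs does). Because support for creating MouseEvents outside a browser varies, it creates a generic event with `createEvent("Events")` and gives it the type click; the `page`/`button` names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.events.DocumentEvent;
import org.w3c.dom.events.Event;
import org.w3c.dom.events.EventListener;
import org.w3c.dom.events.EventTarget;

public class DomEvents {

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Dispatches a bubbling "click" on a button inside a page and logs which listeners ran. */
    public static List<String> dispatchClick() {
        Document doc = newDocument();
        Element page = doc.createElement("page");
        Element button = doc.createElement("button");
        doc.appendChild(page);
        page.appendChild(button);

        List<String> log = new ArrayList<>();
        // One listener registered on two nodes; it records the node it is currently running on.
        EventListener logger = evt ->
                log.add(((Node) evt.getCurrentTarget()).getNodeName() + ":" + evt.getType());
        ((EventTarget) button).addEventListener("click", logger, false);
        ((EventTarget) page).addEventListener("click", logger, false); // reached by bubbling

        // Create the event programmatically instead of receiving it from a user.
        Event click = ((DocumentEvent) doc).createEvent("Events");
        click.initEvent("click", true, true); // type, canBubble, cancelable
        ((EventTarget) button).dispatchEvent(click);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(dispatchClick()); // target first, then the ancestor
    }
}
```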
