Learn how to navigate and manipulate XML files using the Document Object Model (DOM) and Cascading Style Sheets (CSS). Explore the tree structure of XML objects and understand how it applies to parsing and styling XML documents.
XML DOM and CSS
Instructors: Geoffrey Fox and Bryan Carpenter
Dept. of Computer Science
School of Computational Science and Information Technology
400 Dirac Science Library, Florida State University, Tallahassee, Florida 32306-4120
http://www.csit.fsu.edu
Nancy McCracken, Ozgur Balsoy
http://aspen.csit.fsu.edu/webtech/xml/
it2xmldomcss01 http://aspen.csit.fsu.edu/it2spring01
Document Object Model (DOM)
• We met the DOM when parsing XML files, where we considered SAX parsers (read the file sequentially) and DOM parsers (read in the entire structure and navigate through it)
• “DOM” is an unfortunate name, as it is useful whether XML defines a document or any other object
• Further, the “OM” part of the name is confusing, as XML defines an object and the DOM describes a different object
• DOM is really TOSXO: Tree Object Structure of an XML Object
• In fact almost any structured information turns out to have a tree structure, and we saw that any XML defined by a Schema or DTD produces a tree
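The "read in the entire structure and navigate through it" style can be sketched with Python's standard xml.dom.minidom module, used here only as a convenient DOM implementation (the slides themselves name the Java and ECMAScript bindings). The course outline XML and the outline helper are invented for illustration.

```python
from xml.dom.minidom import parseString

# Hypothetical course outline, invented for this sketch.
XML = "<course><block><au>Intro</au><au>Syntax</au></block></course>"


def outline(node, depth=0):
    """Return one indented line per element, in document order."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.nodeName)
        depth += 1
    for child in node.childNodes:
        lines.extend(outline(child, depth))
    return lines


# DOM style: the whole tree is built in memory before we look at it.
doc = parseString(XML)
tree_lines = outline(doc)
print("\n".join(tree_lines))
```

A SAX parser would instead deliver the same elements as a stream of start and end callbacks, without keeping the tree.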
Example of HTML DOM
• Here is an example of a fragment of HTML and how it can be thought of as a tree
• This is called a “document fragment” in the DOM (a lightweight tree)
IMS and DOM
• As an example, consider the recent definition of an object structure for course material from the ADL (Advanced Distributed Learning) effort of the DoD; see http://www.adlnet.org
• Here we have a hierarchy with an element called block that defines the nodes in a tree
• This block tree node has various other elements that are specific to this application
• In this specification the leaves of the tree are <au> (assignable unit) tags, each of which is typically a Web page
• So ADL has superimposed a tree for document organization on top of the tree for each document given by the DOM
• Of course the DOM applies to either tree and describes the way one navigates through it
LMS Model used by ADL
[Figure: Learning Management System (LMS) model. Labels recoverable from the diagram: Learning Server; Content Server(s); External systems (HR, E-Commerce, ERP...); Course Interchange: Course Structure Format (CSF), Metadata; Migration Adapter; Critical Interchange Capability; Services or Adapter; Server Side; Client Side Runtime Environment: Launch, API, Data Model; Browser; API Adapter; Application; HTML+]
Example Tree-Based Course Structure
XML DTD Structure for Block Element
The Origins of the W3C DOM
• The idea of the DOM came from the need to build interactive web pages and to identify parts of a document uniquely, so that one can, for example:
  • Associate a mouse event with a particular page element
  • Associate input of text into a form with a particular text area
• Dynamic HTML was introduced in Netscape 4 and IE5 and allows one both to associate events with HTML elements and to change the HTML structure, e.g.
  • Move a “layer” around within the browser
  • Change text and color in a “document fragment”
• Netscape’s implementation of Dynamic HTML had many bugs and was inferior to Microsoft’s, although it had the essential functionality
The 4 levels of DOM
• Level 0: Functionality equivalent to that in Netscape Navigator 3.0 and Microsoft Internet Explorer 3.0
• Levels 1 and 2 include what is called Dynamic HTML but make it much more complete
• Level 1: Concentrates on the general API to an XML document
  • It contains functionality for document (tree) navigation and manipulation
  • It defines the special case of the DOM applied to HTML, with specific APIs for the different HTML elements
• Level 2: Includes a style sheet object model and defines functionality for manipulating the style information attached to a document. It also enables traversals of the document (i.e. for manipulating collections of nodes), defines an event model (very important!) and provides support for XML namespaces
• Level 3: Still being developed; see next page
Level 3 DOM
• Level 3, which is at Working Draft stage, includes the following items:
  • Extending the DOM Level 2 Object Model: allowing users and applications to access keyboard events, and adding the ability to define groups of events
  • Content Models (DTD, Schema) and Validation: an object model for accessing and modifying a Content Model for a document
  • Load and Save interfaces: for loading XML source documents into a DOM representation and for saving a DOM representation as an XML document
  • Embedded Document Object Model: the Web is moving towards documents with mixed markup vocabularies, e.g. SVG fragments can be embedded in an XHTML document. This creates new challenges for the DOM, since DOM APIs and implementations of the different vocabularies need to work together
  • Adaptation to changes to core XML functionality: the DOM is an API to an XML document. As auxiliary functionality to XML 1.0 is developed (namespaces, XML Base), the DOM API should model this
  • XPath DOM: a simple solution for querying a DOM tree using XPath will also be included
What the DOM is not ... I
• Although the Document Object Model was strongly influenced by "Dynamic HTML", in Level 1 it does not implement all of "Dynamic HTML". In particular, events have not yet been defined. Level 1 is designed to lay a firm foundation for this kind of functionality by providing a robust, flexible model of the document itself.
• The Document Object Model is not a binary specification. DOM programs written in the same language will be source code compatible across platforms, but the DOM does not define any form of binary interoperability.
• The Document Object Model is not a way of persisting objects to XML or HTML. Instead of specifying how objects may be represented in XML, the DOM specifies how XML and HTML documents are represented as objects, so that they may be used in object oriented programs.
• The Document Object Model is not a set of data structures; it is an object model that specifies interfaces. Although this document contains diagrams showing parent/child relationships, these are logical relationships defined by the programming interfaces, not representations of any particular internal data structures.
What the DOM is not ... II
• The Document Object Model does not define "the true inner semantics" of XML or HTML. The semantics of those languages are defined by W3C Recommendations for these languages. The DOM is a programming model designed to respect these semantics. The DOM does not have any ramifications for the way you write XML and HTML documents; any document that can be written in these languages can be represented in the DOM.
• The Document Object Model, despite its name, is not a competitor to the Component Object Model (COM). COM, like CORBA, is a language independent way to specify interfaces and objects; the DOM is a set of interfaces and objects designed for managing HTML and XML documents. The DOM may be implemented using language-independent systems like COM or CORBA; it may also be implemented using language-specific bindings like the Java or ECMAScript bindings specified in this document.
Language Bindings
• The DOM specifies a set of methods and properties that form the interface through which a user accesses the static structure or the dynamic behavior (events) of an XML structure. It also allows one to create or modify such structures
• The specification gives this interface for IDL (CORBA), Java and ECMAScript
• For Web pages, Java (in Java Server Pages) and ECMAScript are the most important
• ECMAScript is a general object-based scripting language
• ECMAScript plus the DOM bindings is essentially JavaScript
• Of course Netscape 4 and IE5 do not follow the W3C DOM exactly
• Mozilla (Netscape 6), http://www.mozilla.org/js/, does support the W3C DOM interface: fully at Level 1 and partially at Level 2
Netscape 6 and Level 1 DOM
• Note that Netscape 6 supports XML
• This comes from http://home.netscape.com/browsers/future/standards.html
• In Netscape 6 and Mozilla “everything” (Web page and browser adornments) is controlled by the DOM interface
DOM Level 1 Core
[Figure: a tree built from Node objects]
• In the DOM, one builds a tree out of a set of Node objects
• Each Node object has a set of generic capabilities (properties and methods) and also implements specific interfaces. The Core defines a set of Node types to reflect the structure of XML. Each Node type has its own interface to reflect its special features
Node Types in Level 1 Core I
• For each Node type, we give the allowed children
• Document -- Element (maximum of one), ProcessingInstruction, Comment, DocumentType
• DocumentFragment -- Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference
• DocumentType -- no children
• EntityReference -- Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference
• Element -- Element, Text, Comment, ProcessingInstruction, CDATASection, EntityReference
Node Types in Level 1 Core II
• Attr -- Text, EntityReference
• ProcessingInstruction -- no children
• Comment -- no children
• Text -- no children
• CDATASection -- no children
• Entity -- Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference
• Notation -- no children
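The allowed-children rules above can be observed in practice. The following is a minimal sketch using Python's xml.dom.minidom, which is assumed here to enforce these rules by raising HierarchyRequestErr; the sample XML and the rejected helper are invented for illustration.

```python
import xml.dom
from xml.dom.minidom import parseString

doc = parseString("<a>text<!-- note --><?pi data?><b/></a>")
root = doc.documentElement

# An Element may hold Text, Comment, ProcessingInstruction and Element children.
child_types = [child.nodeType for child in root.childNodes]


def rejected(parent, child):
    """Return True if the DOM refuses to make `child` a child of `parent`."""
    try:
        parent.appendChild(child)
    except xml.dom.HierarchyRequestErr:
        return True
    return False


# Text is not an allowed child of Document.
text_under_document = rejected(doc, doc.createTextNode("stray"))
# Document allows at most one Element child.
second_root = rejected(doc, doc.createElement("other"))
# Text nodes have no children at all.
child_of_text = rejected(root.firstChild, doc.createElement("c"))

print(child_types, text_under_document, second_root, child_of_text)
```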
The Node Interface in CORBA IDL
[Figure: IDL definition of the Node interface, divided into Constants, Properties and Methods]
nodeName, nodeValue and attributes
[Table: values of nodeName, nodeValue and attributes for each Node type]
• Each Node type has particular rules for the values of some of the properties, most importantly nodeName and nodeValue
• attributes is a property that is only meaningful for an Element node
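As a rough illustration of how nodeName, nodeValue and attributes depend on the Node type, here is a sketch using Python's xml.dom.minidom. The sample XML is invented, and only a handful of node types are shown.

```python
from xml.dom.minidom import parseString

doc = parseString('<a href="x">text<!-- note --><?pi data?></a>')
root = doc.documentElement
text, comment, pi = root.childNodes
attr = root.getAttributeNode("href")

# (nodeName, nodeValue) for several node types.
names_and_values = {
    "Document": (doc.nodeName, doc.nodeValue),
    "Element": (root.nodeName, root.nodeValue),
    "Attr": (attr.nodeName, attr.nodeValue),
    "Text": (text.nodeName, text.nodeValue),
    "Comment": (comment.nodeName, comment.nodeValue),
    "ProcessingInstruction": (pi.nodeName, pi.nodeValue),
}

# attributes is a NamedNodeMap on an Element and None on the other types.
element_attribute_count = root.attributes.length
text_attributes = text.attributes

for kind, pair in names_and_values.items():
    print(kind, pair)
```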
Document Fragment
[Figure: tree of Node objects with one subtree marked as a Document Fragment]
• This is a lightweight “document” used to denote part of a tree. As it does not carry all the overhead of a full XML object instance, it is a convenient way of denoting a subtree, including all leaf nodes below a certain internal node
• This is an important building block for documents
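One common use of a DocumentFragment is to assemble several nodes off to the side and then attach them in one step. The sketch below uses Python's xml.dom.minidom; the list markup and labels are invented for illustration.

```python
from xml.dom.minidom import parseString

doc = parseString("<ul><li>one</li></ul>")
ul = doc.documentElement

# Build two new items inside a fragment that is not yet part of the tree.
fragment = doc.createDocumentFragment()
for label in ("two", "three"):
    li = doc.createElement("li")
    li.appendChild(doc.createTextNode(label))
    fragment.appendChild(li)

# Appending the fragment moves its children into the tree, in order;
# the fragment node itself is not inserted.
ul.appendChild(fragment)

labels = [li.firstChild.data for li in ul.childNodes]
fragment_is_empty = not fragment.hasChildNodes()
print(labels, fragment_is_empty)
```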
This page is full of Document Fragments
[Figure: two example fragments from the slide]
Properties of a Node I
• nodeName: The name of this node, depending on its type; see the table above
• nodeValue: The value of this node, depending on its type; see the table above
  • Exceptions on setting: DOMException NO_MODIFICATION_ALLOWED_ERR, raised when the node is readonly
  • Exceptions on retrieval: DOMException DOMSTRING_SIZE_ERR, raised when it would return more characters than fit in a DOMString variable on the implementation platform
• nodeType: A code representing the type of the underlying object, as defined above
Properties of a Node II
• parentNode: The parent of this node. All nodes except Document, DocumentFragment and Attr may have a parent. However, if a node has just been created and not yet added to the tree, or if it has been removed from the tree, this is null
• childNodes: A NodeList that contains all children of this node. If there are no children, this is a NodeList containing no nodes. The content of the returned NodeList is "live" in the sense that, for instance, changes to the children of the node object that it was created from are immediately reflected in the nodes returned by the NodeList accessors; it is not a static snapshot of the content of the node. This is true for every NodeList, including the ones returned by the getElementsByTagName method
• firstChild: The first child of this node. If there is no such node, this returns null
• lastChild: The last child of this node. If there is no such node, this returns null
Properties of a Node III
• previousSibling: The node immediately preceding this node. If there is no such node, this returns null
• nextSibling: The node immediately following this node. If there is no such node, this returns null
• attributes: A NamedNodeMap containing the attributes of this node (if it is an Element) or null otherwise
• ownerDocument: The Document object associated with this node. This is also the Document object used to create new nodes. When this node is a Document this is null
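The navigation properties can be tried out on a small tree. This sketch uses Python's xml.dom.minidom, where a DOM null appears as None; the XML is invented. It checks liveness only for childNodes, since an implementation may not make every NodeList live.

```python
from xml.dom.minidom import parseString

doc = parseString("<course><au>a</au><au>b</au><au>c</au></course>")
course = doc.documentElement
first, middle, last = course.childNodes

# Parent, child and sibling links.
first_is_first_child = course.firstChild is first
last_is_last_child = course.lastChild is last
middle_links = (middle.previousSibling is first, middle.nextSibling is last)
no_sibling_before_first = first.previousSibling is None
parent_of_middle = middle.parentNode

# ownerDocument is the Document for ordinary nodes and None for the Document.
owner_of_middle = middle.ownerDocument
owner_of_document = doc.ownerDocument

# childNodes is live: the same list object reflects later changes.
live = course.childNodes
count_before = live.length
course.appendChild(doc.createElement("au"))
count_after = live.length
print(count_before, count_after)
```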
Methods of a Node I
• insertBefore (newChild, refChild): Inserts the node newChild before the existing child node refChild. If refChild is null, inserts newChild at the end of the list of children
  • If newChild is a DocumentFragment object, all of its children are inserted, in the same order, before refChild. If newChild is already in the tree, it is first removed
• replaceChild (newChild, oldChild): Replaces the child node oldChild with newChild in the list of children, and returns the oldChild node. If newChild is already in the tree, it is first removed
Methods of a Node II
• removeChild (oldChild): Removes the child node indicated by oldChild from the list of children, and returns it
• appendChild (newChild): Adds the node newChild to the end of the list of children of this node. If newChild is already in the tree, it is first removed
• hasChildNodes: A convenience method to allow easy determination of whether a node has any children. It returns true if the node has any child nodes
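The tree-editing methods described on the last two slides can be exercised together. The following is a minimal sketch with Python's xml.dom.minidom; the element names and the names helper are invented for illustration.

```python
from xml.dom.minidom import parseString

doc = parseString("<list><b/><d/></list>")
lst = doc.documentElement
b, d = lst.childNodes


def names(parent):
    """Return the node names of the children, in order."""
    return [child.nodeName for child in parent.childNodes]


a = doc.createElement("a")
c = doc.createElement("c")
e = doc.createElement("e")

lst.insertBefore(a, b)       # a b d
lst.insertBefore(c, d)       # a b c d
lst.insertBefore(e, None)    # refChild is null, so e goes at the end
after_inserts = names(lst)

replaced = lst.replaceChild(doc.createElement("B"), b)  # returns the old child
removed = lst.removeChild(e)                            # returns the removed child
lst.appendChild(a)           # a is already in the tree, so it is moved to the end
final = names(lst)

print(after_inserts, final, lst.hasChildNodes(), removed.hasChildNodes())
```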
Methods of a Node III
• cloneNode (deep): Returns a duplicate of this node, i.e., serves as a generic copy constructor for nodes. The duplicate node has no parent (parentNode returns null)
  • Cloning an Element copies all attributes and their values, including those generated by the XML processor to represent defaulted attributes, but this method does not copy any text it contains unless it is a deep clone, since the text is contained in a child Text node. Cloning any other type of node simply returns a copy of this node
  • Parameter deep: If true, recursively clone the subtree under the specified node; if false, clone only the node itself (and its attributes, if it is an Element)
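The difference between a shallow and a deep clone is easiest to see side by side. This is a sketch using Python's xml.dom.minidom with an invented paragraph element.

```python
from xml.dom.minidom import parseString

doc = parseString('<p class="note">hello <b>world</b></p>')
p = doc.documentElement

shallow = p.cloneNode(False)   # the element and its attributes only
deep = p.cloneNode(True)       # the element plus its whole subtree

shallow_summary = (shallow.getAttribute("class"), shallow.hasChildNodes())
deep_matches_original = deep.toxml() == p.toxml()
clones_are_detached = shallow.parentNode is None and deep.parentNode is None

print(shallow_summary, deep_matches_original, clones_are_detached)
```

The text "hello" is missing from the shallow clone because it lives in a child Text node, not in the element itself.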
Two Specific Interfaces
• DocumentFragment [Figure: interface definition]
• Document [Figure: interface definition]
HTML Level 1 DOM
• This has several extensions that inherit the XML interfaces of the Core and specialize them to each HTML tag
• An HTMLDocument interface, derived from the core Document interface. HTMLDocument specifies the operations and queries that can be made on an HTML document
• An HTMLElement interface, derived from the core Element interface. HTMLElement specifies the operations and queries that can be made on any HTML element. Methods on HTMLElement include those that allow for the retrieval and modification of attributes that apply to all HTML elements
• Specializations for all HTML elements that have attributes that extend beyond those specified in the HTMLElement interface. For all such attributes, the derived interface for the element contains explicit methods for setting and getting the values
HTMLDocument Interface
• This uses another special interface data structure, HTMLCollection, to hold lists of sub-components
HTMLElement and Specializations
• Any HTML element adds to Node [Figure: HTMLElement interface definition]
• The <body> tag adds [Figure: interface definition for the body element]
Two HTML DOM APIs
• The <a> </a> link tag adds [Figure: interface definition for the anchor element]
• The select element in a form has a number of new properties and methods [Figure: interface definition for the select element]
Highlights of Event Model in Level 2 DOM
• Every Node can have event listeners added for particular types of event
• For example, the mouse event types are click, mousedown, mouseup, mouseover, mousemove, mouseout
Sample Event in DOM Level 2
• Here is a MouseEvent [Figure: MouseEvent interface definition]
• Note that in the DOM you can both receive events and create them programmatically. This capability was not implemented properly in Netscape 4: sometimes you could and sometimes you couldn’t