
3.2 Document Object Model (DOM)


erol

Presentation Transcript


  1. 3.2 Document Object Model (DOM) • How to access structured documents uniformly in parsers, browsers, editors, databases, ...? • Overview of the W3C DOM Spec • Level 1, W3C Rec, Oct. 1998 • Level 2, W3C Rec, Nov. 2000 • Level 3 Validation, Core, and Load and Save, W3C Recs (Spring 2004) • The W3C DOM Activity has been closed 3.2: Document Object Model

  2. DOM: What is it? • An object-based, language-neutral API for XML and HTML documents • Allows programs and scripts to build, access, and modify documents • Supports building querying, filtering, transformation, formatting, etc. applications on top of DOM implementations • Where SAX could be read as “Serial Access XML”, DOM could be read as “Directly Obtainable in Memory”

  3. DOM structure model • Based on O-O concepts: • objects (encapsulation of data and methods) • methods (to access or change an object’s state) • interfaces (declaration of a set of methods) • Somewhat similar to the XPath data model (to be discussed with XSLT and XQuery) -> syntax tree • Tree structure implied by abstract relationships defined by the API; the data structures of an implementation may differ

  4. DOM structure model
        <invoice form="00" type="estimated">
          <addressdata>
            <name>John Doe</name>
            <address>
              <streetaddress>Pyynpolku 1</streetaddress>
              <postoffice>70460 KUOPIO</postoffice>
            </address>
          </addressdata>
          ...
     [Figure: the DOM tree for this fragment; legend: Document, Element, Text, NamedNodeMap]

  5. Structure of DOM Level 1 • I: DOM Core Interfaces • Fundamental interfaces • basic interfaces: Document, Element, Attr, Text, ... • "Extended" (XML-specific) interfaces • CDATASection, DocumentType, Notation, Entity, EntityReference, ProcessingInstruction • II: DOM HTML Interfaces • more convenient access to HTML documents • we'll ignore these

  6. DOM Level 2 • Level 1: basic representation and manipulation of document structure and content (no access to the contents of a DTD) • DOM Level 2 adds • support for namespaces • Document.getElementById("id_val"), to access elements by ID attribute values • optional features (we’ll skip these) • interfaces to document views and style sheets • an event model (for user actions on elements) • methods for traversing the document tree and manipulating regions of a document (e.g., selected in an editor)

  7. DOM Language Bindings • Language-independence: • DOM interfaces are defined using the OMG Interface Definition Language (IDL, defined in the CORBA specification) • Language bindings (implementations of interfaces) defined in the Recommendation for • Java (see the Java API doc) and • ECMAScript (standardised JavaScript)

  8. Core Interfaces: Node & its variants [Figure: interface hierarchy. Node is the base interface of Document, DocumentFragment, Element, Attr and CharacterData (with sub-interfaces Comment and Text); the “Extended interfaces” are CDATASection, DocumentType, Notation, Entity, EntityReference and ProcessingInstruction]

  9. DOM interfaces: Node • getNodeType, getNodeName, getNodeValue • getOwnerDocument • getParentNode • hasChildNodes, getChildNodes • getFirstChild, getLastChild • getPreviousSibling, getNextSibling • hasAttributes, getAttributes • appendChild(newChild) • insertBefore(newChild, refChild) • replaceChild(newChild, oldChild) • removeChild(oldChild) [Figure: the invoice example tree; legend: Document, Element, Text, NamedNodeMap]

  10. Type and Name of a Node • node.getNodeType(): short int constants 1, 2, …, 12 for Node.ELEMENT_NODE, Node.ATTRIBUTE_NODE, Node.TEXT_NODE, … • node.getNodeName() • for an Element = element.getTagName() • for an Attr: the name of the attribute • for anonymous nodes: "#text", "#document", "#comment" etc.
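The type codes and names listed above can be checked with a few lines of Java. This is a minimal sketch, assuming the JAXP parser bundled with the JDK; the class name NodeKinds and the sample content are illustrative and not part of the original slides.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class NodeKinds {
    /** Returns "type:name" for a node, e.g. "1:invoice" for an element. */
    static String describe(Node n) {
        return n.getNodeType() + ":" + n.getNodeName();
    }

    /** Builds a tiny document in memory; the content is illustrative only. */
    static Document sample() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element invoice = doc.createElement("invoice");
        invoice.setAttribute("form", "00");
        invoice.appendChild(doc.createTextNode("John Doe"));
        invoice.appendChild(doc.createComment("a comment"));
        doc.appendChild(invoice);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = sample();
        Element invoice = doc.getDocumentElement();
        System.out.println(describe(doc));                               // 9:#document
        System.out.println(describe(invoice));                           // 1:invoice
        System.out.println(describe(invoice.getAttributeNode("form")));  // 2:form
        System.out.println(describe(invoice.getFirstChild()));           // 3:#text
        System.out.println(describe(invoice.getLastChild()));            // 8:#comment
    }
}
```

In real code, comparing against the named constants (for example n.getNodeType() == Node.ELEMENT_NODE) is usually clearer than using the numeric values.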

  11. The Value of a Node • node.getNodeValue() • content of a text node, value of an attribute, …; null for an Element (Notice!) • (Cf. XPath, where a node’s value is its full textual content) • DOM 3 provides the full text content with the method node.getTextContent()
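The difference between getNodeValue() and getTextContent() is easy to see on a small element. The following is a sketch under the assumption of the JDK's default JAXP parser; the class name NodeValues and the helper parseRoot are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class NodeValues {
    /** Parses an XML string and returns its root element (illustrative helper). */
    static Element parseRoot(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element name = parseRoot(
                "<name><given>John</given> <family>Doe</family></name>");
        Element given = (Element) name.getFirstChild();
        // An Element has no value of its own:
        System.out.println(name.getNodeValue());                   // null
        // The text lives in a child Text node:
        System.out.println(given.getFirstChild().getNodeValue());  // John
        // DOM Level 3: concatenated text of all descendants:
        System.out.println(name.getTextContent());                 // John Doe
    }
}
```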

  12. Object Creation in DOM • Each DOM Node n belongs to a Document: n.getOwnerDocument() • Objects that implement interface X are created by factory methods Document.createX(…) • E.g., when doc is a Document object: doc.createElement("A"), doc.createAttribute("href"), doc.createTextNode("Hello!") • Loading & saving specified in DOM3 (or implementation-specific, or via JAXP)
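A short sketch of the factory methods in use, assuming the JDK's JAXP implementation. The class name FactoryDemo and the attribute value "#top" are made up for illustration. It also shows that a freshly created node already has an owner document but no parent until it is attached.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class FactoryDemo {
    static Document newDocument() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }

    /** Creates <A href="#top">Hello!</A> without attaching it to the tree. */
    static Element buildLink(Document doc) {
        Element a = doc.createElement("A");
        Attr href = doc.createAttribute("href");
        href.setValue("#top");               // illustrative value
        a.setAttributeNode(href);
        a.appendChild(doc.createTextNode("Hello!"));
        return a;
    }

    public static void main(String[] args) throws Exception {
        Document doc = newDocument();
        Element a = buildLink(doc);
        System.out.println(a.getOwnerDocument() == doc);  // true
        System.out.println(a.getParentNode() == null);    // true: not yet in the tree
        doc.appendChild(a);
        System.out.println(a.getParentNode() == doc);     // true
    }
}
```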

  13. DOM interfaces: Document • getDocumentElement • getElementById(IdVal) • getElementsByTagName(tagName) • createElement(tagName) • createTextNode(data) [Figure: the invoice example tree; legend: Document, Element, Text, NamedNodeMap, Node]

  14. DOM interfaces: Element • getTagName() • hasAttribute(name) • getAttribute(name) • setAttribute(attrName, value) • removeAttribute(name) • getElementsByTagName(name) [Figure: the invoice example tree; legend: Document, Element, Text, NamedNodeMap, Node]
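The attribute methods of Element can be tried out as follows. This is a sketch only; the class name ElementAttributes and the value "final" are hypothetical. One detail worth knowing is that getAttribute returns an empty string, not null, for a missing attribute.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class ElementAttributes {
    /** Builds <invoice form="00" type="estimated"/> in memory. */
    static Element sampleInvoice() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element invoice = doc.createElement("invoice");
        invoice.setAttribute("form", "00");
        invoice.setAttribute("type", "estimated");
        doc.appendChild(invoice);
        return invoice;
    }

    public static void main(String[] args) throws Exception {
        Element invoice = sampleInvoice();
        System.out.println(invoice.getTagName());                    // invoice
        System.out.println(invoice.hasAttribute("type"));            // true
        System.out.println(invoice.getAttribute("type"));            // estimated
        invoice.setAttribute("type", "final");    // overwrites the old value
        System.out.println(invoice.getAttribute("type"));            // final
        invoice.removeAttribute("form");
        System.out.println(invoice.hasAttribute("form"));            // false
        System.out.println(invoice.getAttribute("form").isEmpty());  // true
    }
}
```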

  15. Text Content Manipulation in DOM • for objects c that implement the CharacterData interface (Text, Comments, CDATASections): • c.substringData(offset, count) • c.appendData(string) • c.insertData(offset, string) • c.deleteData(offset, count) • c.replaceData(offset, count, string) (= c.deleteData(offset, count); c.insertData(offset, string))

  16. DOM CharacterData • DOM strings are 0-based sequences of 16-bit characters:
        C: Hello world, nice to see you!
           0         1         2
           01234567890123456789012345678     (last index = C.getLength()-1)
      • C.substringData(6, 5) = ? • C.substringData(0, C.getLength()) = ?
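The CharacterData operations can be exercised on the example string above, which also answers the two questions on the slide. A minimal sketch, assuming the JDK's DOM implementation; the class name CharDataDemo and the replacement strings are illustrative.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Text;

final class CharDataDemo {
    /** Creates a Text node holding the example string C from the slide. */
    static Text sample() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        return doc.createTextNode("Hello world, nice to see you!");
    }

    public static void main(String[] args) throws Exception {
        Text c = sample();
        System.out.println(c.getLength());                        // 29
        System.out.println(c.substringData(6, 5));                // world
        System.out.println(c.substringData(0, c.getLength()));    // Hello world, nice to see you!

        c.replaceData(6, 5, "there");   // same effect as deleteData(6, 5) + insertData(6, "there")
        System.out.println(c.getData());                          // Hello there, nice to see you!
        c.appendData(" Bye");
        System.out.println(c.getData());                          // Hello there, nice to see you! Bye
    }
}
```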

  17. Interfaces to node collections (1) • NodeList for ordered lists of nodes <- Node.getChildNodes() and Element/Document.getElementsByTagName("name") • (proper) descendant elements of type "name" in document order ("*" ~ any element type) [Figure: a numbered tree of E and A elements illustrating E.getElementsByTagName("E")]
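To see "proper descendants in document order" in practice, the sketch below parses a small tree of E and A elements. The tree shape and the numbering attribute n are assumptions made for this example and are not taken from the slide's figure; the class name TagNameDemo is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class TagNameDemo {
    // Elements are numbered in document order through the attribute n.
    static final String XML =
            "<E n='1'><E n='2'><A n='3'/><E n='4'/></E><A n='5'><E n='6'/></A></E>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Joins the n attributes of all elements in the list, in list order. */
    static String numbers(NodeList list) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < list.getLength(); i++) {
            if (i > 0) sb.append(',');
            sb.append(((Element) list.item(i)).getAttribute("n"));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        Element root = doc.getDocumentElement();
        // Called on an element: proper descendants only, so the root is excluded.
        System.out.println(numbers(root.getElementsByTagName("E")));  // 2,4,6
        System.out.println(numbers(root.getElementsByTagName("*")));  // 2,3,4,5,6
        // Called on the document: the root element is included.
        System.out.println(numbers(doc.getElementsByTagName("E")));   // 1,2,4,6
    }
}
```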

  18. Typical child-node access pattern • Accessing specific nodes, or iterating over a NodeList: • to process all children of node:
        for (int i = 0; i < node.getChildNodes().getLength(); i++)
            process(node.getChildNodes().item(i));

  19. Interfaces to node collections (2) • NamedNodeMap for unordered sets of nodes accessed by their name: <- Node.getAttributes(), DocumentType.getEntities() • DocumentFragment • Temporary container of child nodes • Disappears when inserted in the tree • NodeLists and NamedNodeMaps are "live": • reflect updates of the doc tree immediately • See the next slide
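Both collection types can be illustrated briefly. This is a sketch with a hypothetical class name (CollectionsDemo) and an invented attribute ("date"): the NamedNodeMap reflects a later setAttribute call, and a DocumentFragment hands its children over to the parent when inserted.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;

final class CollectionsDemo {
    static Document newDocument() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }

    public static void main(String[] args) throws Exception {
        Document doc = newDocument();
        Element invoice = doc.createElement("invoice");
        invoice.setAttribute("form", "00");
        invoice.setAttribute("type", "estimated");
        doc.appendChild(invoice);

        // NamedNodeMap: attributes accessed by name, with no guaranteed order.
        NamedNodeMap attrs = invoice.getAttributes();
        System.out.println(attrs.getLength());                              // 2
        System.out.println(attrs.getNamedItem("type").getNodeValue());      // estimated
        invoice.setAttribute("date", "2004-04-01");  // illustrative attribute
        System.out.println(attrs.getLength());                              // 3 (the map is live)

        // DocumentFragment: a temporary container for a group of nodes.
        DocumentFragment frag = doc.createDocumentFragment();
        frag.appendChild(doc.createElement("name"));
        frag.appendChild(doc.createElement("address"));
        invoice.appendChild(frag);       // inserts the children, not the fragment itself
        System.out.println(invoice.getChildNodes().getLength());            // 2
        System.out.println(frag.hasChildNodes());                           // false
    }
}
```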

  20. NodeLists are “live” • E.g., this would delete every other child of n:
        NodeList cList = n.getChildNodes();
        for (int i = 0; i < cList.getLength(); i++)
            n.removeChild(cList.item(i));
      • What happens? [Figure: node n with children A, B, C, D and the live list cList, traced for i=0, i=1, i=2]
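The effect of the live list can be reproduced directly. The sketch below runs the loop from the slide on a node with children A, B, C, D, and then shows one possible way to remove all children reliably; the class and method names are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class LiveListDemo {
    /** Builds an element n with the children A, B, C and D. */
    static Element abcd() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element n = doc.createElement("n");
        for (String tag : new String[] {"A", "B", "C", "D"}) {
            n.appendChild(doc.createElement(tag));
        }
        doc.appendChild(n);
        return n;
    }

    /** Concatenates the names of the children of n. */
    static String childNames(Node n) {
        StringBuilder sb = new StringBuilder();
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            sb.append(c.getNodeName());
        }
        return sb.toString();
    }

    /** The loop from the slide: the list shrinks while the index grows. */
    static void removeNaively(Node n) {
        NodeList cList = n.getChildNodes();
        for (int i = 0; i < cList.getLength(); i++) {
            n.removeChild(cList.item(i));
        }
    }

    /** One way to remove every child: always take the current first child. */
    static void removeAll(Node n) {
        while (n.hasChildNodes()) {
            n.removeChild(n.getFirstChild());
        }
    }

    public static void main(String[] args) throws Exception {
        Element n = abcd();
        removeNaively(n);
        System.out.println(childNames(n));        // BD
        removeAll(n);
        System.out.println(n.hasChildNodes());    // false
    }
}
```

With i=0 the loop removes A, so the list becomes B, C, D; with i=1 it removes C, leaving B, D; with i=2 the length is already 2 and the loop stops.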

  21. DOM: XML Implementations • Java-based parsers, e.g. Apache Xerces, Apache Crimson, … • In the MS IE browser: COM programming interfaces for C/C++ and Visual Basic; ActiveX object programming interfaces for script languages • Perl: XML::DOM (implements DOM Level 1) • Others, say, database APIs? • Vendors of different kinds of systems participated in the W3C DOM WG

  22. A Java-DOM Example • Command-line tool RegListMgr for maintaining a course registration list • with single-letter commands for listing, adding, updating and deleting student records • Example:
        $ java RegListMgr reglist.xml
        Document loaded succesfully
        > l                                   (list the contents)
        …
        40: Tero Ulvinen, TKM1, tero@fake.addr.fi, 2
        41: heli viinikainen, tkt5, heli@fake.addr.fi, 1

  23. Registration list: the XML file
        <?xml version="1.0" ?>
        <!DOCTYPE reglist SYSTEM "reglist.dtd">
        <reglist lastID="41">
          <student id="RDK1">
            <name><given>Juho</given> <family>Ahopelto</family></name>
            <branchAndYear>TKT4</branchAndYear>
            <email>juho@fake.addr.fi</email>
            <group>2</group>
          </student>
          <!-- … and the other students … -->
        </reglist>

  24. Registration List: the DTD
        <!ELEMENT reglist (student*)>
        <!ATTLIST reglist lastID CDATA #REQUIRED>
        <!ELEMENT student (name, branchAndYear, email, group)>
        <!ATTLIST student id ID #REQUIRED>
        <!ELEMENT name (given, family)>
        <!ELEMENT given (#PCDATA)>
        <!-- … and the same for family, branchAndYear, email, and group -->

  25. Loading and Saving the RegList • Loading of the registration list into a DOM Document doc implemented with a JAXP DocumentBuilder • (to be discussed later) • doc is a handle to the Document • Saving implemented with a JAXP Transformer • (to be discussed later)
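As a preview, loading and saving through JAXP might look roughly like this. This is a sketch and not the RegListMgr code: the class name LoadSaveSketch is hypothetical, and it works on strings rather than on the file reglist.xml so that it runs on its own (for the same reason the DOCTYPE declaration is left out).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class LoadSaveSketch {
    /** Loading: a DocumentBuilder parses the input into a DOM Document. */
    static Document load(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /** Saving: an identity Transformer serializes the DOM tree. */
    static String save(Document doc) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = load("<reglist lastID=\"41\"></reglist>");
        doc.getDocumentElement().setAttribute("lastID", "42");
        System.out.println(save(doc));
    }
}
```

For files, builder.parse(new java.io.File(...)) and new StreamResult(new java.io.File(...)) could be used in the same way.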

  26. Listing student records (1)
        NodeList students = doc.getElementsByTagName("student");
        for (int i = 0; i < students.getLength(); i++)
            showStudent((Element) students.item(i));
        private void showStudent(Element student) {
            // Collect relevant sub-elements:
            Node given = student.getElementsByTagName("given").item(0);
            Node family = given.getNextSibling();
            Node bAndY = student.getElementsByTagName("branchAndYear").item(0);
            Node email = bAndY.getNextSibling();
            Node group = email.getNextSibling();

  27. Listing student records (2)
            // Method showStudent continues:
            System.out.print(student.getAttribute("id").substring(3));
            System.out.print(": " + given.getFirstChild().getNodeValue());
            // or given.getTextContent() with DOM3
            // .. similarly access and display the
            // value of family, bAndY, email, and group
            // …
        } // showStudent

  28. Lessons of accessing DOM • Access methods for relevant nodes • getElementsByTagName("tagName") • robust with respect to structure modifications • Also others, if the structure is known (validated) • getFirstChild(), getLastChild(), getPreviousSibling(), getNextSibling() • Element nodes have no value! • Get the value from child Text nodes, or use getTextContent()
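Why sibling navigation depends on a known (validated) structure can be shown with a small experiment. The sketch assumes a default, non-validating JAXP parser, which keeps the whitespace between elements as Text nodes; the helper nextElement and the class name SiblingPitfall are hypothetical and not part of RegListMgr.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class SiblingPitfall {
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    /** Returns the next sibling that is an Element, or null if there is none. */
    static Element nextElement(Node n) {
        for (Node s = n.getNextSibling(); s != null; s = s.getNextSibling()) {
            if (s.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) s;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        Element name = parseRoot(
                "<name><given>Juho</given> <family>Ahopelto</family></name>");
        Node given = name.getElementsByTagName("given").item(0);
        // Without validation the space between the elements is a Text node:
        System.out.println(given.getNextSibling().getNodeName());  // #text
        // Skipping non-element siblings finds the intended element:
        System.out.println(nextElement(given).getNodeName());      // family
    }
}
```

A validating parser configured with DocumentBuilderFactory.setIgnoringElementContentWhitespace(true) can drop such whitespace when the DTD declares element-only content, which is one way the plain getNextSibling() calls can work as intended.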

  29. Adding New Records • Example:
        > a                                          (add students)
        First name (or <return> to finish): Antti
        Last name: Ahkera
        Branch&year: tkt3
        email: antti@fake.addr.fi
        group: 2
        First name (or <return> to finish):
        Finished adding records
        > l
        …
        41: heli viinikainen, tkt5, heli@fake.addr.fi, 1
        42: Antti Ahkera, tkt3, antti@fake.addr.fi, 2

  30. Implementing addition of records (1)
        Element rootElem = doc.getDocumentElement();
        String lastID = rootElem.getAttribute("lastID");
        int lastIDnum = Integer.parseInt(lastID);
        System.out.print("First name (or <return> to finish): ");
        String firstName = terminalReader.readLine().trim();
        while (firstName.length() > 0) {
            // Get the next unused ID:
            String ID = "RDK" + Integer.toString(++lastIDnum);
            // … Read values lastName, bAndY, email,
            // and group from the terminal, and then ...

  31. Implementing addition of records (2)
            Element newStudent = newStudent(doc, ID, firstName, lastName,
                                            bAndY, email, group);
            rootElem.appendChild(newStudent);
            System.out.print("First name (or <return> to finish): ");
            firstName = terminalReader.readLine().trim();
        } // while firstName.length() > 0
        // Update the last ID used:
        String newLastID = Integer.toString(lastIDnum);
        rootElem.setAttribute("lastID", newLastID);
        System.out.println("Finished adding records");

  32. Creating new student records (1)
        private Element newStudent(Document doc, String ID, String fName,
                String lName, String bAndY, String email, String grp) {
            Element stu = doc.createElement("student");
            stu.setAttribute("id", ID);
            Element newName = doc.createElement("name");
            Element newGiven = doc.createElement("given");
            newGiven.appendChild(doc.createTextNode(fName));
            Element newFamily = doc.createElement("family");
            newFamily.appendChild(doc.createTextNode(lName));
            newName.appendChild(newGiven);
            newName.appendChild(newFamily);
            stu.appendChild(newName);

  33. Creating new student records (2)
            // method newStudent(…) continues:
            Element newBr = doc.createElement("branchAndYear");
            newBr.appendChild(doc.createTextNode(bAndY));
            stu.appendChild(newBr);
            Element newEmail = doc.createElement("email");
            newEmail.appendChild(doc.createTextNode(email));
            stu.appendChild(newEmail);
            Element newGrp = doc.createElement("group");
            newGrp.appendChild(doc.createTextNode(grp)); // the parameter is named grp
            stu.appendChild(newGrp);
            return stu;
        } // newStudent

  34. Lessons of modifying DOM • Each node must be created with • Document.create...("nameOrValue") • (attributes of an element more easily with setAttribute("name", "value")) • ... and connected to the structure • normally with parent.appendChild(newChild) • Updates and deletions in the RegListMgr similarly, by manipulating the DOM structures • -> exercises
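The create-then-connect pattern repeats for every text-only element, so it can be factored into a small helper. The following is one possible sketch, not the original newStudent code; the names StudentBuilder and appendTextElement are hypothetical, and the sample values come from the session shown earlier.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class StudentBuilder {
    /** Creates <tag>text</tag>, appends it to parent, and returns it. */
    static Element appendTextElement(Element parent, String tag, String text) {
        Document doc = parent.getOwnerDocument();
        Element child = doc.createElement(tag);           // 1. create ...
        child.appendChild(doc.createTextNode(text));
        parent.appendChild(child);                        // 2. ... and connect
        return child;
    }

    /** Builds a student record with the structure required by the DTD. */
    static Element newStudent(Document doc, String id, String fName, String lName,
                              String bAndY, String email, String grp) {
        Element stu = doc.createElement("student");
        stu.setAttribute("id", id);
        Element name = doc.createElement("name");
        stu.appendChild(name);
        appendTextElement(name, "given", fName);
        appendTextElement(name, "family", lName);
        appendTextElement(stu, "branchAndYear", bAndY);
        appendTextElement(stu, "email", email);
        appendTextElement(stu, "group", grp);
        return stu;
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("reglist");
        doc.appendChild(root);
        root.appendChild(newStudent(doc, "RDK42", "Antti", "Ahkera",
                "tkt3", "antti@fake.addr.fi", "2"));
        System.out.println(root.getElementsByTagName("student").getLength());  // 1
        System.out.println(root.getElementsByTagName("family")
                .item(0).getTextContent());                                    // Ahkera
    }
}
```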

  35. Efficiency of SAX vs DOM? • DOM has a reputation of requiring more resources than streaming interfaces like SAX • A small experiment to test this hypothesis: • Test task: retrieve the title of the last section that mentions "XML Schema definition language" • Target docs: repeats of fragments from the W3C XML Schema Recommendation (Part 1) • Environment: JDK 1.6, Red Hat Linux 6, 3 GHz Pentium with 1 GB RAM

  36. The speed of DOM vs SAX • On small documents, up to ~ 2 MB, the SAX and DOM based solutions are roughly equal, with throughputs of ~ 3.0 MB/s and ~ 3.9 MB/s

  37. Resource needs of DOM vs SAX • On larger documents, up to ~ 60 MB, the DOM application becomes faster than SAX(!) • DOM throughput ~ 8 MB/s • SAX ~ 4 MB/s • But DOM takes a relatively large amount of RAM • here ~ 6 x the size of the input XML document • The SAX application runs in a fixed space of ~ 6 MB

  38. Summary of XML APIs so far • Give applications access to the structure and contents of XML documents • Event-based APIs (e.g. SAX) • notify the application through parsing events • efficient • Object-model (or tree) based APIs (e.g. DOM) • provide a full parse tree • more convenient, but require a lot of resources with large documents • Major parsers support both SAX and DOM • used through proprietary methods • used through JAXP (-> next)
