J2EE: Chapter 6, Document Object Model (DOM)


Documents and Data
• DOM, JDOM, dom4j
• Differences
  • The kinds of nodes that exist in the hierarchy
  • The ability to handle "mixed content"

The Mixed-Content Model
<sentence>This is an <bold>important</bold> idea.</sentence>
ELEMENT: sentence
  + TEXT: This is an
  + ELEMENT: bold
      + TEXT: important
  + TEXT: idea.
First node: getNodeName() = "sentence", getNodeValue() = null
Second node: getNodeName() = "#text", getNodeValue() = "This is an "
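The node names and values above can be checked with the JDK's built-in DOM parser. A minimal sketch, assuming the sentence is parsed from a string; the class `MixedContentDemo` and its helpers are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class MixedContentDemo {
    // Parses an XML string and returns its root element.
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Formats a node as name=value, the two properties the slide compares.
    static String describe(Node n) {
        return n.getNodeName() + "=" + n.getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        Element sentence = parseRoot(
                "<sentence>This is an <bold>important</bold> idea.</sentence>");
        System.out.println(describe(sentence));                 // sentence=null
        System.out.println(describe(sentence.getFirstChild())); // #text=This is an
        // Mixed content: TEXT, ELEMENT (bold), TEXT
        System.out.println(sentence.getChildNodes().getLength()); // 3
    }
}
```

An element's own value is always `null`; its text lives in child TEXT nodes.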

A Simpler Model
• DOM (org.w3c.dom.Element): the text is stored in a child TEXT node
  element.getChildNodes().item(0).getNodeValue();
• JDOM, dom4j (org.jdom.Element, org.dom4j.Element): the element returns its text directly
  element.getText();
<addressbook>
  <entry>Fred
    <email>fred@home</email>
  </entry>
  ...
</addressbook>
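A minimal sketch contrasting the DOM access path above with DOM Level 3's `getTextContent()`. The address book is written without line breaks (an assumption for the example), because the indentation in the listing would otherwise become part of the TEXT node after "Fred". `AddressBookText`, `firstEntry`, and `nameOf` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class AddressBookText {
    // Returns the first <entry> element of the address book.
    static Element firstEntry(String xml) throws Exception {
        return (Element) DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getElementsByTagName("entry").item(0);
    }

    // DOM style: the name is the value of the entry's first child, a TEXT node.
    static String nameOf(Element entry) {
        return entry.getChildNodes().item(0).getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        Element entry = firstEntry(
                "<addressbook><entry>Fred<email>fred@home</email></entry></addressbook>");
        System.out.println(nameOf(entry));          // Fred
        // getTextContent() concatenates ALL descendant text, so unlike
        // JDOM/dom4j getText() it also picks up the e-mail address.
        System.out.println(entry.getTextContent()); // Fredfred@home
    }
}
```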

Increasing Complexity
<sentence> The &projectName; <![CDATA[<i>project</i>]]> is <?editor: red?><bold>important</bold><?editor: normal?>. </sentence>
(The entity projectName expands to a comment followed by the text "Eagle".)
+ ELEMENT: sentence
  + TEXT: The
  + ENTITY REF: projectName
    + COMMENT: The latest name we're using
    + TEXT: Eagle
  + CDATA: <i>project</i>
  + TEXT: is
  + PI: editor: red
  + ELEMENT: bold
    + TEXT: important
  + PI: editor: normal
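The tree above can be reproduced by printing every node recursively. This sketch assumes a DTD internal subset that defines `projectName` as a comment followed by "Eagle" (the slide does not show the declaration), and turns off entity expansion so the ENTITY REF node stays in the tree. `TreeDump` and its labels are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TreeDump {
    // Labels indexed by Node.getNodeType() (1 = ELEMENT ... 12 = NOTATION).
    static final String[] TYPE = { "?", "ELEMENT", "ATTRIBUTE", "TEXT", "CDATA",
            "ENTITY REF", "ENTITY", "PI", "COMMENT", "DOCUMENT", "DOCTYPE",
            "FRAGMENT", "NOTATION" };

    // Assumed entity declaration; the slide shows only the resulting tree.
    static final String XML = "<?xml version='1.0'?>"
            + "<!DOCTYPE sentence [<!ENTITY projectName "
            + "\"<!-- The latest name we're using -->Eagle\">]>"
            + "<sentence>The &projectName; <![CDATA[<i>project</i>]]> is "
            + "<?editor: red?><bold>important</bold><?editor: normal?>.</sentence>";

    // Parses with entity-reference nodes preserved and returns the indented tree.
    static String dump(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setExpandEntityReferences(false); // keep ENTITY REF nodes in the tree
        Node root = f.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        StringBuilder out = new StringBuilder();
        dump(root, "", out);
        return out.toString();
    }

    static void dump(Node n, String indent, StringBuilder out) {
        String detail;
        switch (n.getNodeType()) {
            case Node.ELEMENT_NODE:
            case Node.ENTITY_REFERENCE_NODE:
                detail = n.getNodeName();
                break;
            case Node.PROCESSING_INSTRUCTION_NODE: // name = target, value = data
                detail = n.getNodeName() + " " + n.getNodeValue();
                break;
            default: // TEXT, CDATA, COMMENT
                detail = n.getNodeValue().trim();
        }
        out.append(indent).append("+ ").append(TYPE[n.getNodeType()])
           .append(": ").append(detail).append('\n');
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            dump(c, indent + "  ", out);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.print(dump(XML));
    }
}
```

Run as is, the dump also lists whitespace-only TEXT nodes (such as the single space between the entity reference and the CDATA section) and the final "." that the slide leaves out, which is why a robust program has to be prepared to skip them.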

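Essential Tasks of a Robust DOM Program
• When searching for an element:
  • Ignore comments, attributes, and processing instructions
  • Allow for child elements that do not appear in the expected order
  • Skip TEXT nodes that contain ignorable whitespace
• When extracting text from a node:
  • Extract text from CDATA nodes just as you would from TEXT nodes
  • Ignore comments, attributes, and processing instructions while gathering text
  • When you encounter an entity reference node or another element node, recurse (that is, apply the text-extraction procedure to all of its child nodes)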
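The rules above can be sketched as two small helpers: one that finds a child element by name wherever it appears, and one that gathers text by the extraction rules. A minimal sketch; `RobustText`, `parseRoot`, `findChild`, and `collectText` are hypothetical names, not a library API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class RobustText {
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }

    // Search rule: only element children are candidates (comments, PIs and
    // whitespace-only TEXT nodes are skipped), and position does not matter.
    static Element findChild(Element parent, String name) {
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE && c.getNodeName().equals(name)) {
                return (Element) c;
            }
        }
        return null;
    }

    // Extraction rule: TEXT and CDATA contribute their value; entity references
    // and nested elements are recursed into; comments and PIs are ignored.
    // (Attributes are never children in the DOM, so they are skipped automatically.)
    static String collectText(Node node) {
        StringBuilder sb = new StringBuilder();
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            switch (c.getNodeType()) {
                case Node.TEXT_NODE:
                case Node.CDATA_SECTION_NODE:
                    sb.append(c.getNodeValue());
                    break;
                case Node.ELEMENT_NODE:
                case Node.ENTITY_REFERENCE_NODE:
                    sb.append(collectText(c));
                    break;
                default:
                    break; // COMMENT_NODE, PROCESSING_INSTRUCTION_NODE, ...
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Element entry = parseRoot(
                "<entry><!-- note --> <email>fred<![CDATA[@]]>home</email><?pi x?></entry>");
        System.out.println(collectText(findChild(entry, "email"))); // fred@home
    }
}
```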

Choosing a Model
• DOM
  • Full-fledged documents
  • Complex applications
  • Using a Schema
  • Handling documents and data at the same time
• JDOM, dom4j
  • Simple data types

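Reading XML into a DOM
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import java.io.File;
import java.io.IOException;
import org.w3c.dom.Document;
import org.w3c.dom.DOMException;
public class DomEcho {
    static Document document;
    public static void main(String[] argv)
            throws ParserConfigurationException, SAXException, IOException {
        // The factory hides which parser implementation is used
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Parse the file named on the command line into an in-memory DOM tree
        document = builder.parse(new File(argv[0]));
    }
}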

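Additional Settings
// Configure the factory before calling newDocumentBuilder()
factory.setValidating(true);
factory.setNamespaceAware(true);
builder.setErrorHandler(new org.xml.sax.ErrorHandler() {
    // ignore fatal errors (an exception is guaranteed)
    public void fatalError(SAXParseException exception) throws SAXException { }
    // treat validation errors as fatal
    public void error(SAXParseException e) throws SAXParseException { throw e; }
    // ErrorHandler also declares warning(), so the anonymous class must implement it
    public void warning(SAXParseException err) throws SAXParseException {
        System.out.println("** Warning, line " + err.getLineNumber() + ", uri " + err.getSystemId());
        System.out.println("   " + err.getMessage());
    }
});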
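A runnable sketch of a validating parse with an in-line DTD, assuming a hypothetical `note`/`to` vocabulary (`ValidatingParse` and `DTD` are illustrative names). Unlike the slide's handler, this one rethrows fatal errors as well, and it implements all three `ErrorHandler` callbacks, which the interface requires.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class ValidatingParse {
    // Hypothetical DTD: <note> must contain exactly one <to>.
    static final String DTD =
            "<!DOCTYPE note [<!ELEMENT note (to)><!ELEMENT to (#PCDATA)>]>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);     // validate against the document's DTD
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) {
                System.err.println("Warning: " + e.getMessage());
            }
            public void error(SAXParseException e) throws SAXException {
                throw e; // validation errors become fatal
            }
            public void fatalError(SAXParseException e) throws SAXException {
                throw e; // well-formedness errors
            }
        });
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse(DTD + "<note><to>Ann</to></note>")
                .getDocumentElement().getNodeName()); // note
        try {
            parse(DTD + "<note><from>Bob</from></note>");
        } catch (SAXParseException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Without a handler that throws, the JDK parser only reports validation errors and still returns a document, which is why the handler matters.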

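Creating an Adapter to Display the DOM in a JTree
public class AdapterNode {
    // Labels indexed by Node.getNodeType() (ELEMENT_NODE = 1 ... NOTATION_NODE = 12)
    static final String[] typeName = {
        "none", "Element", "Attr", "Text", "CDATA", "EntityRef", "Entity",
        "ProcInstr", "Comment", "Document", "DocType", "DocFragment", "Notation",
    };
    org.w3c.dom.Node domNode;
    public AdapterNode(org.w3c.dom.Node node) { domNode = node; }
    // JTree calls toString() to label each row
    public String toString() {
        String s = typeName[domNode.getNodeType()];
        String nodeName = domNode.getNodeName();
        if (!nodeName.startsWith("#")) {   // #text, #comment, #document carry no useful name
            s += ": " + nodeName;
        }
        if (domNode.getNodeValue() != null) {
            if (s.startsWith("ProcInstr")) s += ", ";
            else s += ": ";
            // Show only the first line of the trimmed value
            String t = domNode.getNodeValue().trim();
            int x = t.indexOf("\n");
            if (x >= 0) t = t.substring(0, x);
            s += t;
        }
        return s;
    }
}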
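`JTree` also needs a `TreeModel` that hands it the wrapped nodes. The sketch below pairs a simplified wrapper (the hypothetical `Wrapped`, whose label is shorter than `AdapterNode`'s) with a read-only model; `DomTreeModel` and `fromXml` are illustrative names. Equality is based on the wrapped DOM node, so wrappers created on different calls still compare equal.

```java
import java.io.StringReader;
import javax.swing.event.TreeModelListener;
import javax.swing.tree.TreeModel;
import javax.swing.tree.TreePath;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomTreeModel implements TreeModel {
    // Wrapper whose toString() labels the JTree row; equality follows the DOM node.
    static final class Wrapped {
        final Node domNode;
        Wrapped(Node n) { domNode = n; }
        @Override public String toString() {
            String v = domNode.getNodeValue();
            return v == null ? domNode.getNodeName() : domNode.getNodeName() + ": " + v.trim();
        }
        @Override public boolean equals(Object o) {
            return o instanceof Wrapped && ((Wrapped) o).domNode == domNode;
        }
        @Override public int hashCode() { return System.identityHashCode(domNode); }
    }

    private final Wrapped root;

    public DomTreeModel(Node rootNode) { root = new Wrapped(rootNode); }

    // Builds a model for the document parsed from an XML string.
    static DomTreeModel fromXml(String xml) throws Exception {
        return new DomTreeModel(DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))));
    }

    private static NodeList kids(Object parent) {
        return ((Wrapped) parent).domNode.getChildNodes();
    }

    @Override public Object getRoot() { return root; }
    @Override public int getChildCount(Object parent) { return kids(parent).getLength(); }
    @Override public Object getChild(Object parent, int index) {
        return new Wrapped(kids(parent).item(index));
    }
    @Override public boolean isLeaf(Object node) { return getChildCount(node) == 0; }
    @Override public int getIndexOfChild(Object parent, Object child) {
        if (parent == null || child == null) return -1;
        NodeList list = kids(parent);
        for (int i = 0; i < list.getLength(); i++) {
            if (list.item(i) == ((Wrapped) child).domNode) return i;
        }
        return -1;
    }
    // Read-only sketch: no editing and no change events.
    @Override public void valueForPathChanged(TreePath path, Object newValue) { }
    @Override public void addTreeModelListener(TreeModelListener l) { }
    @Override public void removeTreeModelListener(TreeModelListener l) { }
}
```

In a Swing application the model would be passed to `new javax.swing.JTree(DomTreeModel.fromXml(xml))`; the model itself can be exercised without a display.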

Examining the Structure of the DOM

The DOCTYPE Node
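The DOCTYPE node is reached through `Document.getDoctype()`, which returns `null` when the document has no DTD. A minimal sketch, assuming an internal subset that declares one entity; `DoctypeDemo` and `doctypeOf` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DocumentType;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DoctypeDemo {
    // Returns the DOCTYPE node of the parsed document (null if there is none).
    static DocumentType doctypeOf(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDoctype();
    }

    public static void main(String[] args) throws Exception {
        DocumentType dt = doctypeOf("<!DOCTYPE sentence [<!ENTITY projectName \"Eagle\">]>"
                + "<sentence>The &projectName; project</sentence>");
        System.out.println(dt.getName()); // sentence
        // General entities declared in the DTD are exposed as ENTITY nodes
        Node entity = dt.getEntities().getNamedItem("projectName");
        System.out.println(entity.getNodeType() == Node.ENTITY_NODE); // true
        System.out.println(doctypeOf("<plain/>")); // null
    }
}
```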

Processing Instruction Nodes
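In the DOM a processing instruction's node name is its target and its node value is its data; `ProcessingInstruction.getTarget()` and `getData()` return the same strings. This sketch (hypothetical class `PiDemo`) collects every PI in document order. The XML declaration at the top is not reported, because it is not a processing instruction node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

public class PiDemo {
    // Collects "target|data" for every processing instruction below node.
    static void collect(Node node, List<String> out) {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
                ProcessingInstruction pi = (ProcessingInstruction) c;
                out.add(pi.getTarget() + "|" + pi.getData());
            }
            collect(c, out);
        }
    }

    static List<String> pis(String xml) throws Exception {
        List<String> out = new ArrayList<>();
        collect(DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))), out);
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(pis("<?xml version=\"1.0\"?><sentence>is <?editor: red?>"
                + "<bold>important</bold><?editor: normal?>.</sentence>"));
        // [editor:|red, editor:|normal]
    }
}
```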

Entity References

factory.setExpandEntityReferences(false)
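A minimal sketch comparing the two settings on a hypothetical one-entity DTD: with the default (`true`) the replacement text is inlined and no ENTITY REF node remains; with `false` the reference is kept as a node. `EntityRefDemo` and its methods are illustrative names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class EntityRefDemo {
    static final String XML = "<!DOCTYPE p [<!ENTITY projectName \"Eagle\">]>"
            + "<p>The &projectName; project</p>";

    static Element parse(String xml, boolean expand) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setExpandEntityReferences(expand); // the default is true
        return f.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }

    // Counts ENTITY_REFERENCE_NODE nodes anywhere below node.
    static int countEntityRefs(Node node) {
        int n = 0;
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ENTITY_REFERENCE_NODE) n++;
            n += countEntityRefs(c);
        }
        return n;
    }

    public static void main(String[] args) throws Exception {
        Element expanded = parse(XML, true);
        System.out.println(countEntityRefs(expanded));          // 0
        System.out.println(expanded.getTextContent());          // The Eagle project
        System.out.println(countEntityRefs(parse(XML, false))); // 1
    }
}
```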

CDATA Nodes
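A CDATA section becomes a `CDATASection` node (node name `#cdata-section`) whose value is the raw text, markup characters included. `CDATASection` extends `Text`, so text-handling code can treat both alike. A minimal sketch with a hypothetical `CdataDemo` class.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

public class CdataDemo {
    // Returns the first child of the root element (here, the CDATA section).
    static Node firstChildOfRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement().getFirstChild();
    }

    public static void main(String[] args) throws Exception {
        Node first = firstChildOfRoot("<code><![CDATA[if (a < b && c > d)]]></code>");
        System.out.println(first.getNodeType() == Node.CDATA_SECTION_NODE); // true
        System.out.println(first.getNodeName());    // #cdata-section
        System.out.println(first.getNodeValue());   // if (a < b && c > d)
        System.out.println(first instanceof Text);  // true
    }
}
```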

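Lexical Controls
• setCoalescing(): converts CDATA sections to Text nodes and appends them to an adjacent Text node
• setExpandEntityReferences(): expands entity reference nodes in place (true, the default) or keeps them as entity reference nodes (false)
• setIgnoringComments(): ignores comments
• setIgnoringElementContentWhitespace(): ignores whitespace that is not a significant part of element content (requires a DTD and validating mode)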
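A sketch of these controls (hypothetical class `LexicalControls`). It relies on the `DocumentBuilderFactory` documentation's note that ignorable whitespace is determined from the DTD content model and so needs validating mode; that flag therefore also turns validation on here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class LexicalControls {
    static Element parse(String xml, boolean coalesce, boolean ignoreComments,
                         boolean ignoreWhitespace) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setCoalescing(coalesce);             // CDATA -> Text, merged with neighbours
        f.setIgnoringComments(ignoreComments); // drop COMMENT nodes
        f.setIgnoringElementContentWhitespace(ignoreWhitespace);
        f.setValidating(ignoreWhitespace);     // ignorable whitespace needs the DTD
        DocumentBuilder b = f.newDocumentBuilder();
        b.setErrorHandler(new DefaultHandler() {
            @Override public void error(SAXParseException e) throws SAXException { throw e; }
        });
        return b.parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }

    // Counts the direct children of parent that have the given node type.
    static int count(Node parent, short type) {
        int n = 0;
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == type) n++;
        }
        return n;
    }

    public static void main(String[] args) throws Exception {
        String note = "<note>Hello <!-- hidden --><![CDATA[<world>]]></note>";
        Element plain = parse(note, false, false, false);
        System.out.println(count(plain, Node.CDATA_SECTION_NODE) + " "
                + count(plain, Node.COMMENT_NODE));       // 1 1
        System.out.println(parse(note, true, true, false).getTextContent()); // Hello <world>

        String list = "<!DOCTYPE list [<!ELEMENT list (item*)><!ELEMENT item (#PCDATA)>]>"
                + "<list>\n  <item>a</item>\n</list>";
        System.out.println(parse(list, false, false, false).getChildNodes().getLength()); // 3
        System.out.println(parse(list, false, false, true).getChildNodes().getLength());  // 1
    }
}
```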

Creating a DOM
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.newDocument();
// createElement() already returns an Element, so no cast is needed
Element root = document.createElement("rootElement");
document.appendChild(root);
root.appendChild(document.createTextNode("Some"));
root.appendChild(document.createTextNode(" "));
root.appendChild(document.createTextNode("text"));
// normalize() merges the three adjacent Text nodes into a single "Some text" node
document.getDocumentElement().normalize();
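The snippet above can be wrapped in a runnable class that also shows what `normalize()` does and serializes the result with the JAXP identity `Transformer`. A minimal sketch; `NormalizeDemo`, `build`, and `serialize` are hypothetical names.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NormalizeDemo {
    // Builds the slide's tree: <rootElement> with three adjacent Text nodes.
    static Document build() throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = document.createElement("rootElement");
        document.appendChild(root);
        root.appendChild(document.createTextNode("Some"));
        root.appendChild(document.createTextNode(" "));
        root.appendChild(document.createTextNode("text"));
        return document;
    }

    // Serializes a DOM with the identity transformer, without an XML declaration.
    static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = build();
        Element root = doc.getDocumentElement();
        System.out.println(root.getChildNodes().getLength());   // 3
        root.normalize();
        System.out.println(root.getChildNodes().getLength());   // 1
        System.out.println(root.getFirstChild().getNodeValue()); // Some text
        System.out.println(serialize(doc)); // <rootElement>Some text</rootElement>
    }
}
```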

Traversing Nodes
• getFirstChild()
• getLastChild()
• getNextSibling()
• getPreviousSibling()
• getParentNode()
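The five navigation methods are enough to walk any DOM tree without index-based `NodeList` access. A minimal sketch (hypothetical class `TraversalDemo`): a document-order walk with `getFirstChild`/`getNextSibling`, a reverse walk with `getLastChild`/`getPreviousSibling`, and a depth count with `getParentNode`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TraversalDemo {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Depth-first, document-order walk using getFirstChild()/getNextSibling().
    static void elementNames(Node node, List<String> out) {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) out.add(c.getNodeName());
            elementNames(c, out);
        }
    }

    // Direct children in reverse order using getLastChild()/getPreviousSibling().
    static List<String> reversedChildren(Node node) {
        List<String> out = new ArrayList<>();
        for (Node c = node.getLastChild(); c != null; c = c.getPreviousSibling()) {
            out.add(c.getNodeName());
        }
        return out;
    }

    // Number of element ancestors, found by climbing getParentNode().
    static int depth(Node node) {
        int d = 0;
        for (Node p = node.getParentNode();
             p != null && p.getNodeType() == Node.ELEMENT_NODE;
             p = p.getParentNode()) {
            d++;
        }
        return d;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<a><b><c/></b><d/></a>");
        List<String> names = new ArrayList<>();
        elementNames(doc, names);
        System.out.println(names);                                         // [a, b, c, d]
        System.out.println(reversedChildren(doc.getDocumentElement()));    // [d, b]
        System.out.println(depth(doc.getElementsByTagName("c").item(0)));  // 2
    }
}
```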

Searching for a Node
public Node findSubNode(String name, Node node) {
    if (node.getNodeType() != Node.ELEMENT_NODE) {
        // Report bad input to the caller instead of terminating the JVM
        throw new IllegalArgumentException("Error: Search node not of element type");
    }
    if (!node.hasChildNodes()) return null;
    NodeList list = node.getChildNodes();
    for (int i = 0; i < list.getLength(); i++) {
        Node subnode = list.item(i);
        // Only element children are candidates; text, comments and PIs are skipped
        if (subnode.getNodeType() == Node.ELEMENT_NODE
                && subnode.getNodeName().equals(name)) {
            return subnode;
        }
    }
    return null;
}

Getting Node Content
public String getText(Node node) {
    StringBuilder result = new StringBuilder();
    if (!node.hasChildNodes()) return "";
    NodeList list = node.getChildNodes();
    for (int i = 0; i < list.getLength(); i++) {
        Node subnode = list.item(i);
        short type = subnode.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            // CDATA text is collected exactly like ordinary text
            result.append(subnode.getNodeValue());
        } else if (type == Node.ENTITY_REFERENCE_NODE) {
            // The entity's replacement text is held in its children, so recurse
            result.append(getText(subnode));
        }
        // Comments and PIs are ignored; this version does not descend into child elements
    }
    return result.toString();
}

Other Operations
• Creating attributes: setAttribute
• Deleting nodes: removeChild
• Modifying nodes: replaceChild, setNodeValue
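A minimal sketch applying each operation to a hypothetical `<list>` document (`ModifyDemo` and `edit` are illustrative names). References to the three `item` elements are taken before any change, because `getElementsByTagName` returns a live list that shrinks as nodes are removed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ModifyDemo {
    static Document edit(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element list = doc.getDocumentElement();
        NodeList items = list.getElementsByTagName("item"); // live list
        Element first = (Element) items.item(0);
        Element second = (Element) items.item(1);
        Element third = (Element) items.item(2);

        first.setAttribute("id", "a1");            // create (or overwrite) an attribute
        first.getFirstChild().setNodeValue("new"); // change a Text node's value
        list.removeChild(second);                  // delete a node
        Element note = doc.createElement("note");
        note.appendChild(doc.createTextNode("replaced"));
        list.replaceChild(note, third);            // replaceChild(newChild, oldChild)
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Element list = edit("<list><item>old</item><item>drop</item><item>swap</item></list>")
                .getDocumentElement();
        for (Node c = list.getFirstChild(); c != null; c = c.getNextSibling()) {
            System.out.println(c.getNodeName() + " " + c.getTextContent());
        }
        // item new
        // note replaced
    }
}
```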

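Validating with XML Schema
static final String JAXP_SCHEMA_LANGUAGE =
    "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";
factory.setNamespaceAware(true);
factory.setValidating(true);
try {
    factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
} catch (IllegalArgumentException x) {
    // Thrown when the parser does not support JAXP 1.2 schema validation;
    // report it instead of silently continuing without schema validation
    System.err.println("Error: parser does not recognize " + JAXP_SCHEMA_LANGUAGE);
}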
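Since JAXP 1.3 (Java 5) the `javax.xml.validation` API offers another route: compile the schema once and attach it with `DocumentBuilderFactory.setSchema`. This is a different technique from the `schemaLanguage` attribute above, shown as a sketch with a hypothetical one-element schema; `SchemaApiDemo` is an illustrative name.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class SchemaApiDemo {
    // Hypothetical schema: a single <age> element holding a non-negative integer.
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='age' type='xs:nonNegativeInteger'/></xs:schema>";

    static Document parse(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setSchema(schema); // setValidating stays false: that flag means DTD validation
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new DefaultHandler() {
            @Override public void error(SAXParseException e) throws SAXException { throw e; }
        });
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("<age>42</age>").getDocumentElement().getTextContent()); // 42
        try {
            parse("<age>-1</age>");
        } catch (SAXParseException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```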

Associating a Document with a Schema
• In the XML document:
<documentRoot
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="YourSchemaDefinition.xsd">
• Or in the application:
static final String schemaSource = "YourSchemaDefinition.xsd";
static final String JAXP_SCHEMA_SOURCE =
    "http://java.sun.com/xml/jaxp/properties/schemaSource";
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// Set this after the JAXP_SCHEMA_LANGUAGE attribute
factory.setAttribute(JAXP_SCHEMA_SOURCE, new File(schemaSource));
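A runnable sketch of the `schemaSource` approach, assuming a hypothetical one-element schema written to a temporary file in place of `YourSchemaDefinition.xsd`. The `schemaLanguage` attribute is set before `schemaSource`, since the two JAXP 1.2 properties are meant to be used together; `SchemaSourceDemo`, `writeSchema`, and `parse` are illustrative names.

```java
import java.io.File;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class SchemaSourceDemo {
    static final String JAXP_SCHEMA_LANGUAGE =
            "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";
    static final String JAXP_SCHEMA_SOURCE =
            "http://java.sun.com/xml/jaxp/properties/schemaSource";

    // Writes a tiny hypothetical schema to a temporary .xsd file.
    static File writeSchema() throws Exception {
        File xsd = File.createTempFile("age", ".xsd");
        xsd.deleteOnExit();
        String text = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                + "<xs:element name='age' type='xs:nonNegativeInteger'/></xs:schema>";
        Files.write(xsd.toPath(), text.getBytes(StandardCharsets.UTF_8));
        return xsd;
    }

    static Document parse(String xml, File schema) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);
        factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA); // first
        factory.setAttribute(JAXP_SCHEMA_SOURCE, schema);           // then the source
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new DefaultHandler() {
            @Override public void error(SAXParseException e) throws SAXException { throw e; }
        });
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        File xsd = writeSchema();
        System.out.println(parse("<age>42</age>", xsd).getDocumentElement().getNodeName()); // age
        try {
            parse("<age>-1</age>", xsd);
        } catch (SAXParseException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```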

Validating with Multiple Namespaces
<documentRoot
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="employeeDatabase.xsd"
    xsi:schemaLocation=
        "http://www.irs.gov/ fullpath/w2TaxForm.xsd
         http://www.ourcompany.com/ relpath/hiringForm.xsd"
    xmlns:tax="http://www.irs.gov/"
    xmlns:hiring="http://www.ourcompany.com/">
(xsi:schemaLocation lists pairs: a namespace URI followed by the location of its schema.)
static final String employeeSchema = "employeeDatabase.xsd";
static final String taxSchema = "w2TaxForm.xsd";
static final String hiringSchema = "hiringForm.xsd";
static final String[] schemas = { employeeSchema, taxSchema, hiringSchema };
static final String JAXP_SCHEMA_SOURCE =
    "http://java.sun.com/xml/jaxp/properties/schemaSource";
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setAttribute(JAXP_SCHEMA_SOURCE, schemas);
