XBRL Programming 4. XML DOM. 20120119 魏長風 cfwei.tw@gmail.com. XML Data = Tree. Welcome to the TREE world. XML Schema = Tree Data format. W3C XML API. W3C XML API version. XPath 1.0 is not 100% compatible with XPath 2.0. XPath 1.0 and XPath 2.0 differ substantially, and so do XSLT 1.0 and XSLT 2.0. 1999. 2007.
XBRL Programming 4. XML DOM 20120119 魏長風 cfwei.tw@gmail.com
XML Data = Tree • Welcome to the TREE world
W3C XML API version • XPath 1.0 is not 100% compatible with XPath 2.0 • XPath 1.0 and XPath 2.0 differ substantially • XSLT 1.0 and XSLT 2.0 also differ substantially • Timeline: 1999: XSLT 1.0, XPath 1.0; 2007: XSLT 2.0, XPath 2.0, XQuery 1.0
W3C DOM standard • One API for all languages • DOM Level 1 (1998): for HTML and XML • DOM Level 2 (2000): getElementById, XML namespaces, CSS • DOM Level 3 (2004): XPath
W3C DOM – node-tree
    <?xml version="1.0" encoding="utf-8"?>
    <html>
      <body id="001" color="002">
        Hello <b>world</b> byebye
      </body>
    </html>
Text is stored in Text nodes. Node tree (the leading number is the DOM nodeType):
    9 Document
      7 PI ?xml
      1 Element html
        1 Element body
          2 Attribute id = 001
          2 Attribute color = 002
          3 Text Hello
          1 Element b
            3 Text world
          3 Text byebye
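W3C DOM - traverse tree • Starting point: Element body, with attributes id = 001 and color = 002 • firstChild: body → Text "Hello" • lastChild: body → Text "byebye" • nextSibling: Text "Hello" → Element b → Text "byebye" • previousSibling: the same chain in reverse • parent: from any child back to Element body • Element b has one child, Text "world"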
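The navigation links on this slide can be tried out with a minimal sketch like the one below. It assumes a whitespace-free copy of the sample document (so no extra text nodes appear); the class name `TraverseDemo` and the helper `body` are hypothetical names used only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TraverseDemo {
    // Assumption: the slide's sample with all indentation removed
    static final String XML =
        "<html><body id=\"001\" color=\"002\">Hello<b>world</b>byebye</body></html>";

    // Parse the string and return the <body> element (first child of <html>)
    static Element body(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (Element) doc.getDocumentElement().getFirstChild();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element body = body(XML);
        Node hello = body.getFirstChild();   // Text "Hello"
        Node b = hello.getNextSibling();     // Element b
        Node byebye = body.getLastChild();   // Text "byebye"

        System.out.println(hello.getNodeValue());                      // Hello
        System.out.println(b.getNodeName());                           // b
        System.out.println(b.getFirstChild().getNodeValue());          // world
        System.out.println(byebye.getPreviousSibling().isSameNode(b)); // true
        System.out.println(b.getParentNode().isSameNode(body));        // true
        System.out.println(body.getAttribute("id"));                   // 001
    }
}
```

Attributes are not children of the element, which is why `id` is read with `getAttribute` rather than reached through `getFirstChild`.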
Space/Enter text node problem
    <?xml version="1.0" encoding="utf-8"?>
    <html>
      <body id="001" color="002">
        Hello <b>world</b> byebye
      </body>
    </html>
Spaces and line breaks (Space/Enter) between elements become Text nodes too:
    9 Document
      7 PI ?xml
      1 Element html
        3 Text "\n "
        1 Element body
          2 Attribute id = 001
          2 Attribute color = 002
          3 Text "\n Hello"
          1 Element b
            3 Text world
          3 Text "byebye\n "
        3 Text "\n "
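A small sketch can make the whitespace problem visible by listing the children of an element as `nodeType:nodeName`. The class name `WhitespaceNodes`, the helper `describeChildren`, and the sample strings are assumptions made for this illustration; they are not part of the original examples.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class WhitespaceNodes {
    // List the children of the first <tagName> element as "nodeType:nodeName"
    static List<String> describeChildren(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node parent = doc.getElementsByTagName(tagName).item(0);
            List<String> out = new ArrayList<>();
            for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
                out.add(n.getNodeType() + ":" + n.getNodeName());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String indented = "<html>\n  <body>Hello</body>\n</html>";
        String compact = "<html><body>Hello</body></html>";
        // Indentation adds a Text node (type 3) before and after <body>
        System.out.println(describeChildren(indented, "html")); // [3:#text, 1:body, 3:#text]
        System.out.println(describeChildren(compact, "html"));  // [1:body]
    }
}
```

With the indented input, `getFirstChild()` on `html` returns a Text node rather than `body`, which is the situation the following examples run into.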
Java DOM - Load XML
    import javax.xml.parsers.*;
    import org.w3c.dom.*;

    // Parse xbrl.xml into an in-memory DOM tree
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document doc = builder.parse("xbrl.xml");
C# .Net DOM - Load XML
    using System;
    using System.Xml;

    // Load xbrl.xml into an in-memory DOM tree
    XmlDocument doc = new XmlDocument();
    doc.Load("xbrl.xml");
example 01 xbrl.xml • Only one context; the data we want to extract appears right at the start
Java DOM – example 01a
    Node xbrl = doc.getDocumentElement();
    Node context = xbrl.getFirstChild();
    Node entity = context.getFirstChild();
    Element identifier = (Element) entity.getFirstChild();
    System.out.println(identifier.getAttribute("scheme"));
    // This produces no result!! getFirstChild() returns the Space/Enter text node, not the element
Java DOM – example 01b
    Node xbrl = doc.getDocumentElement();
    Node context = xbrl.getFirstChild();
    Node entity = context.getFirstChild();
    Element identifier = (Element) entity.getFirstChild();
    System.out.println(identifier.getAttribute("scheme"));
Java DOM – example 01c
    Element identifier = (Element) doc.getDocumentElement()
            .getFirstChild().getFirstChild().getFirstChild();
    System.out.println(identifier.getAttribute("scheme"));
Java DOM – example 01d
    // Like firstChild, but automatically skips Space/Enter text nodes
    static Element getFirstChildElement(Node n) {
        Node child = n.getFirstChild();
        while (child != null) {
            if (child.getNodeType() == Node.ELEMENT_NODE)  // ELEMENT_NODE == 1
                return (Element) child;
            child = child.getNextSibling();
        }
        return null;  // no element child found
    }
Java DOM – example 01d
    Element xbrl = getFirstChildElement(doc);
    Element context = getFirstChildElement(xbrl);
    Element entity = getFirstChildElement(context);
    Element identifier = getFirstChildElement(entity);
    System.out.println(identifier.getAttribute("scheme"));
C# .Net DOM – example 01c
    XmlElement identifier = (XmlElement) doc.DocumentElement.FirstChild.FirstChild.FirstChild;
    Console.WriteLine(identifier.GetAttribute("scheme"));
    // .NET does not need special handling for the newline text-node problem
XPath – What is path? • /xbrl/context/entity/identifier/@scheme is the path to the data we want to extract
Java XPath – example 03
    import javax.xml.xpath.*;

    XPath xpath = XPathFactory.newInstance().newXPath();
    String expr = "/xbrl/context/entity/identifier/@scheme";
    // Evaluate the path against the document and return the attribute value as a string
    String scheme = (String) xpath.evaluate(expr, doc, XPathConstants.STRING);
    System.out.println("scheme= " + scheme);
C# .Net XPath – example 03
    // Bind the xbrli prefix used in the XPath expression to the XBRL instance namespace
    XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
    nsmgr.AddNamespace("xbrli", "http://www.xbrl.org/2003/instance");
    XmlNodeList schemeList = doc.SelectNodes(
        "/xbrli:xbrl/xbrli:context/xbrli:entity/xbrli:identifier/@scheme", nsmgr);
    string scheme = schemeList[0].Value;
    Console.WriteLine("scheme= " + scheme);
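The C# example binds the `xbrli` prefix through an XmlNamespaceManager. A rough Java counterpart, sketched below, parses with a namespace-aware factory and supplies a `NamespaceContext` to the XPath object. The class name `NamespaceXPath`, the helper `scheme`, and the inline sample document (including its `http://example.com/scheme` value) are assumptions made for illustration, not the deck's xbrl.xml.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NamespaceXPath {
    static final String XBRLI = "http://www.xbrl.org/2003/instance";

    // Maps the single prefix "xbrli" to the XBRL instance namespace
    static final NamespaceContext CTX = new NamespaceContext() {
        public String getNamespaceURI(String prefix) {
            return "xbrli".equals(prefix) ? XBRLI : XMLConstants.NULL_NS_URI;
        }
        public String getPrefix(String uri) {
            return XBRLI.equals(uri) ? "xbrli" : null;
        }
        public Iterator<String> getPrefixes(String uri) {
            return XBRLI.equals(uri)
                    ? Collections.singletonList("xbrli").iterator()
                    : Collections.<String>emptyIterator();
        }
    };

    // Return the scheme attribute of the first identifier in the document
    static String scheme(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for prefixed XPath steps
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(CTX);
            return xpath.evaluate(
                "/xbrli:xbrl/xbrli:context/xbrli:entity/xbrli:identifier/@scheme", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample: elements sit in the default namespace
        String xml = "<xbrl xmlns=\"" + XBRLI + "\"><context id=\"c1\"><entity>"
                + "<identifier scheme=\"http://example.com/scheme\">1234</identifier>"
                + "</entity></context></xbrl>";
        System.out.println("scheme= " + scheme(xml)); // scheme= http://example.com/scheme
    }
}
```

The prefix in the XPath expression does not have to match the prefix used in the file; only the namespace URI has to agree.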
example 04 xbrl3.xml • Multiple contexts with different period types; the data we want to extract is each context's date
Java XPath – example 04
    XPath xpath = XPathFactory.newInstance().newXPath();
    // Select every context, then query each one with relative paths
    NodeList contextSet = (NodeList) xpath.evaluate("/xbrl/context", doc, XPathConstants.NODESET);
    for (int i = 0; i < contextSet.getLength(); i++) {
        Node context = contextSet.item(i);
        String contextid = (String) xpath.evaluate("@id", context, XPathConstants.STRING);
        String datevalue1 = (String) xpath.evaluate("period/endDate", context, XPathConstants.STRING);
        String datevalue2 = (String) xpath.evaluate("period/instant", context, XPathConstants.STRING);
        // A path that matches nothing evaluates to the empty string
        if (!datevalue1.isEmpty())
            System.out.println(contextid + "= " + datevalue1);
        if (!datevalue2.isEmpty())
            System.out.println(contextid + "= " + datevalue2);
    }
C# .Net XPath – example 04
    XmlNodeList contextSet = doc.SelectNodes("/xbrli:xbrl/xbrli:context", nsmgr);
    foreach (XmlNode context in contextSet)
    {
        string contextid = context.SelectNodes("@id")[0].Value;
        XmlNodeList datevalue1list = context.SelectNodes("xbrli:period/xbrli:endDate", nsmgr);
        XmlNodeList datevalue2list = context.SelectNodes("xbrli:period/xbrli:instant", nsmgr);
        if (datevalue1list.Count > 0)
        {
            string datevalue1 = datevalue1list[0].FirstChild.Value;
            if (datevalue1 != "")
                MessageBox.Show(contextid + "= " + datevalue1);
        }
        if (datevalue2list.Count > 0)
        {
            string datevalue2 = datevalue2list[0].FirstChild.Value;
            if (datevalue2 != "")
                MessageBox.Show(contextid + "= " + datevalue2);
        }
    }
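As a possible variation on the Java version of example 04, the two date lookups can be folded into one XPath union. In XPath 1.0 the string value of a node-set is the string value of its first node in document order, so `period/endDate | period/instant` yields whichever of the two a context contains. The class name `ContextDates`, the helper `contextDates`, and the inline sample (a simplified, namespace-free stand-in for xbrl3.xml) are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ContextDates {
    // Return one "id= date" entry per context that has an endDate or an instant
    static List<String> contextDates(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList contexts = (NodeList) xpath.evaluate(
                    "/xbrl/context", doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < contexts.getLength(); i++) {
                Node context = contexts.item(i);
                String id = xpath.evaluate("@id", context);
                // Union of both period types; empty string if neither is present
                String date = xpath.evaluate("period/endDate | period/instant", context);
                if (!date.isEmpty()) {
                    out.add(id + "= " + date);
                }
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample with one duration context and one instant context
        String xml = "<xbrl>"
                + "<context id=\"c1\"><period><startDate>2011-01-01</startDate>"
                + "<endDate>2011-12-31</endDate></period></context>"
                + "<context id=\"c2\"><period><instant>2011-12-31</instant></period></context>"
                + "</xbrl>";
        System.out.println(contextDates(xml)); // [c1= 2011-12-31, c2= 2011-12-31]
    }
}
```

This keeps one lookup per context, at the cost of no longer distinguishing which period type supplied the date.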