
XML Processing Performance Comparison with XPB4J



Presentation Transcript


  1. XML Processing Performance Comparison with XPB4J July 25, 2002 Pankaj Kumar, Web Services Architect, HP http://www.pankaj-k.net/xpb4j

  2. Agenda • XPB4J: Whys and Whats? • XStat Processing • How to run XPB4J? -- Show it with a Demo • Measurements • Parsing/Processing APIs and implementations • What are we looking for? • Input Data • Measurement Method • Results • What Next? • How can you benefit (and contribute)?

  3. Why? • Input for Design and Development • Performance Modeling • Comparing parser/processor performance • Learning XML • Having Fun!!

  4. A different kind of benchmark • A benchmark for developers • Traditional benchmarks are for vendors of systems to use as a sales tool • XPB4J is for developers to study and understand • Performance tradeoffs • Performance modeling • Performance Tuning • Focus on relative numbers • No single metric

  5. Components of XPB4J • Infrastructure (Java code and Jakarta-Ant scripts) to run the processing code on input data and report the performance numbers and results. • A framework to plug in any XML processing code • A couple of light-weight Java interfaces • A specific processing code -- XStat Processing code

  6. XStat Processing • Collect structural statistics on an XML file • No. of times an element occurred • No. of times it had a particular element as parent • No. of times it had a particular element as child • No. of times it had a particular attribute • Amount of character data it had • Whether the element was empty • Other assumptions • Namespaces ignored. Take qualified names as the element identifiers. • No validation.
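
To make the statistics on slide 6 concrete, here is a minimal SAX-based sketch of this kind of counting. It is not the XStat code shipped with XPB4J: the class name `XStatSketch`, its map fields, and the `collect` helper are hypothetical, and the "whether the element was empty" statistic is left out for brevity. Following the slide, namespaces are ignored (qualified names are used as keys) and validation is off.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of XStat-style counting; not the code shipped with XPB4J.
final class XStatSketch extends DefaultHandler {
    final Map<String, Integer> occurrences = new TreeMap<>();     // element -> times seen
    final Map<String, Integer> parentPairs = new TreeMap<>();     // "child<-parent" -> times seen
    final Map<String, Integer> attributeCounts = new TreeMap<>(); // "element@attr" -> times seen
    final Map<String, Integer> charData = new TreeMap<>();        // element -> characters of text
    private final Deque<String> open = new ArrayDeque<>();        // stack of currently open elements

    private static void bump(Map<String, Integer> map, String key, int by) {
        map.merge(key, by, Integer::sum);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        bump(occurrences, qName, 1);
        if (!open.isEmpty()) {
            // One pair count covers both "had X as parent" and "had Y as child".
            bump(parentPairs, qName + "<-" + open.peek(), 1);
        }
        for (int i = 0; i < atts.getLength(); i++) {
            bump(attributeCounts, qName + "@" + atts.getQName(i), 1);
        }
        open.push(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.pop();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per text node; the counts simply add up.
        if (!open.isEmpty()) {
            bump(charData, open.peek(), length);
        }
    }

    // Parses the given XML text with whatever SAX parser JAXP resolves to.
    static XStatSketch collect(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(false); // namespaces ignored, as on the slide
            factory.setValidating(false);     // no validation, as on the slide
            XStatSketch stats = new XStatSketch();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), stats);
            return stats;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        XStatSketch s = collect("<a x='1'><b>hi</b><b/></a>");
        System.out.println("occurrences: " + s.occurrences);
        System.out.println("parent pairs: " + s.parentPairs);
        System.out.println("attributes:  " + s.attributeCounts);
        System.out.println("char data:   " + s.charData);
    }
}
```

Because the handler only keeps counters and a stack of open element names, its memory use does not grow with document size, which is one reason a task like this suits streaming APIs.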

  7. How to run XPB4J? • Download it from http://www.pankaj-k.net/xpb4j as a .zip file • Extract it. It creates subdirectory xpb4j-0.90 • Make sure that you have • JDK 1.4.x and JAVA_HOME is set to its base directory • Jakarta-Ant 1.4.x or higher and its bin directory is in PATH. • Issue: ant run • Changing Input Data and other parameters • Changing Parser implementations

  8. XPB4J Demo

  9. What determines processing time? • Processing Activity • Input Data – Type and size of data • Machine ( CPU, RAM, OS, Disk, … ) • JVM implementation • JVM state – Steady, First few executions • Processing API – [SAX, XmlPull], [DOM, JDOM, DOM4J], XSLT • Parser/Processor implementation

  10. Parsing/Processing APIs and implementations • SAX • JDK 1.4.0, Xerces-2.0.1, GNU JAXP 1.0 beta1, Piccolo 1.02 • XmlPull • XPP3, kXML • DOM • JDK 1.4.0, Xerces, GNU JAXP • JDOM (beta8) • DOM4J 1.3 • XSLT • JDK 1.4.0, xalan-2.3.1
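
The slides do not say how XPB4J switches between SAX or DOM implementations. One standard mechanism, assuming the parser is obtained through JAXP, is the factory lookup: `SAXParserFactory.newInstance()` and `DocumentBuilderFactory.newInstance()` consult the system properties `javax.xml.parsers.SAXParserFactory` and `javax.xml.parsers.DocumentBuilderFactory` before falling back to the JDK default. The sketch below only reports which factory classes were resolved; the class name `ParserSelectionSketch` is hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper: shows which JAXP implementations the current JVM resolves to.
final class ParserSelectionSketch {
    // Class name of the SAX parser factory chosen by the JAXP lookup.
    static String saxFactoryClass() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    // Class name of the DOM builder factory chosen by the JAXP lookup.
    static String domFactoryClass() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("SAX factory: " + saxFactoryClass());
        System.out.println("DOM factory: " + domFactoryClass());
    }
}
```

With a Xerces jar on the classpath, starting the JVM with `-Djavax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl` would make the first line report the Xerces factory instead of the JDK's built-in one.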

  11. Input Data • Search results from Google’s Web Services API on “Bill Gates”:
      Data Set   Total Size   Files
      DS1        11.9KB       res0.xml
      DS2        98.3KB       res.xml
      DS3        111.7KB      res0.xml, …, res9.xml

  12. Measurement Machine • Self-assembled Server • AMD Athlon 900MHz CPU • 512 MB RAM • Dual boot -- Windows 2000/Mandrake Linux 8.1

  13. Measurement Loop
      // Pseudo code. Won’t compile.
      for (int r = 0; r < runcount; r++) // runcount runs
      {
          Runtime.gc(); // Hope that this will force garbage collection.
          long startMem = Runtime.totalMemory() - Runtime.freeMemory();
          long startTime = System.currentTimeMillis();
          for (int l = 0; l < loopcount; l++) // loopcount loops
          {
              for (file f in input files ) // Do the processing.
                  process f;
          }
          long endTime = System.currentTimeMillis();
          long endMem = Runtime.totalMemory() - Runtime.freeMemory();
          int avgPT = (endTime - startTime)/loopcount;
          int memU = (endMem - startMem)/1024;
          System.out.println("Processing Time: " + avgPT + " milli secs.");
          System.out.println("Memory Use: " + memU + " KB.");
      }
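
The measurement loop on slide 13 is explicitly pseudocode. A compilable sketch of the same idea follows, under a few assumptions: the class `MeasurementLoopSketch`, the `measure` method, and the use of a `Runnable` to stand in for "process every input file" are all hypothetical and not taken from XPB4J. Note that `Runtime` methods are instance methods reached through `Runtime.getRuntime()`, and that `System.gc()` is only a request to the JVM, so the memory figure is a rough indication rather than an exact measurement.

```java
// Hypothetical, compilable variant of the slide's measurement loop.
final class MeasurementLoopSketch {
    // Heap currently in use, in bytes.
    static long usedMemory() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Runs `processing` loopCount times per run, for runCount runs.
    // Returns the average processing time in milliseconds for each run.
    static long[] measure(int runCount, int loopCount, Runnable processing) {
        long[] avgMillis = new long[runCount];
        for (int r = 0; r < runCount; r++) {
            System.gc(); // A hint only; the JVM is free to ignore it.
            long startMem = usedMemory();
            long startTime = System.currentTimeMillis();
            for (int l = 0; l < loopCount; l++) {
                processing.run(); // Stands in for processing all input files.
            }
            long endTime = System.currentTimeMillis();
            long endMem = usedMemory();
            avgMillis[r] = (endTime - startTime) / loopCount;
            long memKB = (endMem - startMem) / 1024;
            System.out.println("Processing Time: " + avgMillis[r] + " milli secs.");
            System.out.println("Memory Use: " + memKB + " KB.");
        }
        return avgMillis;
    }

    public static void main(String[] args) {
        // Placeholder workload; a real run would parse the input files here.
        measure(3, 100, () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++) {
                sb.append(i);
            }
        });
    }
}
```

Repeating the outer run several times is what lets the first run (cold JVM) be compared with later runs (steady state), the distinction the later warmup question relies on.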

  14. Questions: #1 • How does performance vary with SAX parsers? • Fixed: • Measurement Machine • JVM – Sun’s JDK1.4.0 • Processing Activity – XStat • Processing API – SAX • JVM State – Steady • Variable: • SAX Parser – JDK1.4, Piccolo 1.02, Xerces 2.0.1, GNUJAXP-Beta1, Xerces 1.4.4 • Input Data – DS1, DS2, DS3

  15. Results: #1

  16. Questions: #2 • How does performance vary with DOM parsers? • Fixed: • Measurement Machine • JVM – Sun’s JDK1.4.0 • Processing Activity – XStat • Processing API – DOM • JVM State – Steady • Variable: • DOM Parser – JDK1.4, Xerces 2.0.1, GNUJAXP-Beta1, Xerces 1.4.4 • Input Data – DS1, DS2, DS3

  17. Results: #2

  18. Questions: #3 • How does performance vary with XmlPull parsers? • Fixed: • Measurement Machine • JVM – Sun’s JDK1.4.0 • Processing Activity – XStat • Processing API – XmlPull • JVM State – Steady • Variable: • XmlPull Parser – XPP3, kXML • Input Data – DS1, DS2, DS3

  19. Results: #3

  20. Questions: #4 • How does performance vary with Memory Tree oriented parsers/processors? • Fixed: • Measurement Machine • JVM – Sun’s JDK1.4.0 • Processing Activity – XStat • Processing API – Memory Tree oriented • JVM State – Steady • Variable: • Parser/Processor – JDK1.4 DOM Parser, JDOM beta8, DOM4J, JDK1.4 XSLT Processor • Input Data – DS1, DS2, DS3

  21. Results: #4

  22. Questions: #5 • How does performance compare across best of XmlPull, SAX and DOM parsers? • Fixed: • Measurement Machine • JVM – Sun’s JDK1.4.0 • Processing Activity – XStat • JVM State – Steady • Variable: • Parser/Processor – XPP3, JDK1.4 DOM, JDK1.4 SAX • Input Data – DS1, DS2, DS3

  23. Results: #5

  24. Questions: #6 • How does performance vary with JVM? • Fixed: • Measurement Machine • Processing Activity – XStat • JVM State – Steady • Input Data – DS2 • Variable: • Parser/Processor – XPP3, Xerces 1.4.4 • JVM – IBM-JDK1.3, JRockit1.3.1, Sun’s JDK1.3.1, Sun’s JDK1.4

  25. Results: #6

  26. Questions: #7 • How does performance vary with JVM warmup? • Fixed: • Measurement Machine • Processing Activity – XStat • JVM – JDK 1.4.0 • Input Data – DS2 • Variable: • Parser/Processor – XPP3, JDK1.4, JDOM beta8, DOM4J • JVM State – First time, Steady

  27. Results: #7

  28. Questions: #8 • How does memory use vary with parser/processor? • Fixed: • Measurement Machine • Processing Activity – XStat • JVM – JDK 1.4.0 • JVM State – Steady • Input Data – DS2 • Variable: • Parser/Processor – XPP3, JDK1.4, JDOM beta8, DOM4J

  29. Results: #8

  30. Questions: #9 • How does performance vary with input XML file size? • Fixed: • Measurement Machine • Processing Activity – XStat • JVM – JDK 1.4.0 • JVM State – Steady • Variable: • Parser/Processor – XPP3, JDK1.4, JDOM beta8, DOM4J • Input Data – 100KB, 1MB, 10MB

  31. Results: #9

  32. Questions: #10 • How does memory use vary with input XML file size? • Fixed: • Measurement Machine • Processing Activity – XStat • JVM – JDK 1.4.0 • JVM State – Steady • Variable: • Parser/Processor – XPP3, JDK1.4, JDOM beta8, DOM4J • Input Data – 100KB, 1MB, 10MB

  33. Results: #10

  34. Questions: #11 • Any Interesting Observation? • Fixed: • Measurement Machine • Processing Activity – XStat • JVM – JDK 1.4.0 • JVM State – Steady • Parser/Processor – JDOM beta8 • Input Data – 100KB, 1MB, 10MB • Variable: • Node traversal loop – Loop1, Loop2

  35. Questions: #11 (Contd.)
      Loop1:
          …
          List children = elem.getChildren();
          for (int i = 0; i < children.size(); i++)
              collectStat((Element)children.get(i), sc);
      Loop2:
          …
          ListIterator li = children.listIterator();
          while (li.hasNext())
              collectStat((Element)li.next(), sc);
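
The two traversal styles on slide 35 visit the same children but can cost very different amounts depending on how the list implements `get(i)`. The sketch below illustrates this with a plain `java.util.LinkedList`, where each indexed access walks the list; the slides' text does not state how JDOM beta8's child list behaves, so this is an analogy, not a reproduction of the measured result. The class `TraversalSketch` and its methods are hypothetical.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

// Hypothetical illustration of index-based versus iterator-based list traversal.
final class TraversalSketch {
    // Loop1 style: indexed access; cost depends on how get(i) is implemented.
    static long sumIndexed(List<Integer> list) {
        long sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        return sum;
    }

    // Loop2 style: a ListIterator advances one step per element.
    static long sumIterator(List<Integer> list) {
        long sum = 0;
        ListIterator<Integer> li = list.listIterator();
        while (li.hasNext()) {
            sum += li.next();
        }
        return sum;
    }

    // Builds a LinkedList holding 0 .. n-1; its get(i) walks the list from one end.
    static List<Integer> linkedRange(int n) {
        List<Integer> list = new LinkedList<>();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    public static void main(String[] args) {
        List<Integer> list = linkedRange(20_000);

        long t0 = System.nanoTime();
        long a = sumIndexed(list);
        long t1 = System.nanoTime();
        long b = sumIterator(list);
        long t2 = System.nanoTime();

        System.out.println("indexed:  sum=" + a + ", " + (t1 - t0) / 1_000 + " us");
        System.out.println("iterator: sum=" + b + ", " + (t2 - t1) / 1_000 + " us");
    }
}
```

A single timed pass like this is only indicative; a steadier comparison would repeat both loops several times, as the measurement loop on slide 13 does.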

  36. Results: #11

  37. Caveats • Different APIs are not perfect substitutes • XSLT processors are significantly different from parsers • Performance should be only one criterion among many others • XStat is an artificial processing activity and favors the SAX/XmlPull APIs

  38. What Next? • Comparison with C/C++ Parsers/Processors • Dynamic generation of input data • Framework improvements • Better Reporting and Presentation • More processing activities • Better tuning ?!

  39. How can you benefit (and contribute)? • Benefit from XPB4J • Gain insight from the report • Learn XML by playing with code • Validate your assumptions • Tune your parser/processor (if you are an implementer) • Contribute to XPB4J • Run it under your environment and share your results • Write processing code • Extend the framework • Discussion mailing list is: • xpb4j-users@lists.sourceforge.net

  40. Q & A
