XML Processing Performance Comparison with XPB4J
July 25, 2002
Pankaj Kumar, Web Services Architect, HP
http://www.pankaj-k.net/xpb4j
Agenda
• XPB4J: Whys and Whats?
• XStat Processing
• How to run XPB4J? -- Show it with a Demo
• Measurements
  • Parsing/Processing APIs and implementations
  • What are we looking for?
  • Input Data
  • Measurement Method
  • Results
• What Next?
• How can you benefit (and contribute)?
Why?
• Input for Design and Development
• Performance Modeling
• Comparing parser/processor performance
• Learning XML
• Having Fun!!
A different kind of benchmark
• A benchmark for developers
  • Traditional benchmarks serve system vendors as a sales tool
  • XPB4J is for developers to study and understand
    • Performance tradeoffs
    • Performance modeling
    • Performance tuning
• Focus on relative numbers
• No single metric
Components of XPB4J
• Infrastructure (Java code and Jakarta-Ant scripts) to run the processing code on input data and report the performance numbers and results
• A framework to plug in any XML processing code
  • A couple of light-weight Java interfaces
• A specific processing code -- XStat Processing code
XStat Processing
• Collect structural statistics on an XML file
  • No. of times an element occurred
  • No. of times it had a particular element as parent
  • No. of times it had a particular element as child
  • No. of times it had a particular attribute
  • Amount of character data it had
  • Whether the element was empty
• Other assumptions
  • Namespaces ignored. Take qualified names as the element identifiers.
  • No validation.
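A minimal sketch of two of the statistics listed above (element occurrences and amount of character data), written against the JDK's built-in SAX API. The class and method names (`XStatSketch`, `collect`) are hypothetical and are not taken from the XPB4J sources; the sketch uses modern Java syntax rather than JDK 1.4, and it leaves out the parent, child, attribute and empty-element counts.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name; an illustration, not the XPB4J implementation.
public class XStatSketch extends DefaultHandler {
    // Number of occurrences per qualified element name.
    public final Map<String, Integer> elementCount = new TreeMap<>();
    // Number of characters of character data per qualified element name.
    public final Map<String, Integer> charCount = new TreeMap<>();
    // Currently open elements, innermost first.
    private final ArrayDeque<String> open = new ArrayDeque<>();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        elementCount.merge(qName, 1, Integer::sum);
        open.push(qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Character data may arrive in several chunks; the lengths are summed.
        if (!open.isEmpty()) {
            charCount.merge(open.peek(), length, Integer::sum);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        open.pop();
    }

    // Parses the given XML text and returns the collected statistics.
    public static XStatSketch collect(String xml) {
        try {
            XStatSketch handler = new XStatSketch();
            // The factory is neither namespace-aware nor validating by default,
            // which matches the "namespaces ignored, no validation" assumptions.
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        XStatSketch s = collect("<r><a x='1'>hi</a><a/><b>yo!</b></r>");
        System.out.println("elements: " + s.elementCount);
        System.out.println("chars:    " + s.charCount);
    }
}
```

Keeping the statistics in plain maps keyed by qualified name follows the slide's assumption that qualified names serve as element identifiers.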
How to run XPB4J?
• Download it from http://www.pankaj-k.net/xpb4j as a .zip file
• Extract it. It creates subdirectory xpb4j-0.90
• Make sure that you have
  • JDK 1.4.x, with JAVA_HOME set to its base directory
  • Jakarta-Ant 1.4.x or higher, with its bin directory in PATH
• Issue: ant run
• Changing Input Data and other parameters
• Changing Parser implementations
XPB4J Demo
What determines processing time?
• Processing Activity
• Input Data – Type and size of data
• Machine (CPU, RAM, OS, Disk, …)
• JVM implementation
• JVM state – Steady, First few executions
• Processing API – [SAX, XmlPull], [DOM, JDOM, DOM4J], XSLT
• Parser/Processor implementation
Parsing/Processing APIs and implementations
• SAX
  • JDK 1.4.0, Xerces-2.0.1, GNU JAXP 1.0 beta1, Piccolo 1.02
• XmlPull
  • XPP3, kXML
• DOM
  • JDK 1.4.0, Xerces, GNU JAXP
• JDOM (beta8)
• DOM4J 1.3
• XSLT
  • JDK 1.4.0, xalan-2.3.1
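To make the difference between the streaming APIs and the tree-oriented APIs in the list above concrete, here is a minimal sketch that counts element occurrences with the JDK's built-in DOM parser: the whole document is first built in memory and then walked. The class and method names (`DomCountSketch`, `count`, `walk`) are hypothetical and are not part of XPB4J.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical class name; an illustration, not the XPB4J implementation.
public class DomCountSketch {
    // Recursively visits the subtree rooted at n and counts element nodes by name.
    static void walk(Node n, Map<String, Integer> counts) {
        if (n.getNodeType() == Node.ELEMENT_NODE) {
            counts.merge(n.getNodeName(), 1, Integer::sum);
        }
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, counts);
        }
    }

    // Builds the full in-memory tree first, then traverses it.
    public static Map<String, Integer> count(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Map<String, Integer> counts = new TreeMap<>();
            walk(doc.getDocumentElement(), counts);
            return counts;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("<r><a/><a/><b>text</b></r>"));
    }
}
```

With SAX or XmlPull the same counts can be gathered while the input is being read, without keeping the tree around.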
Input Data
• Search results from Google’s Web Services API on “Bill Gates”
• DS1: 11.9KB total, res0.xml
• DS2: 98.3KB total, res.xml
• DS3: 111.7KB total, res0.xml,…, res9.xml
Measurement Machine
• Self-assembled Server
  • AMD Athlon 900MHz CPU
  • 512 MB RAM
  • Dual boot -- Windows 2000/Mandrake Linux 8.1
Measurement Loop
    // Fragment, not a complete program: runcount, loopcount, inputFiles
    // and process(File) are assumed to be defined elsewhere.
    Runtime rt = Runtime.getRuntime();
    for (int r = 0; r < runcount; r++) { // runcount runs
        rt.gc(); // Only a request; the JVM is not obliged to collect garbage here.
        long startMem = rt.totalMemory() - rt.freeMemory();
        long startTime = System.currentTimeMillis();
        for (int l = 0; l < loopcount; l++) { // loopcount loops
            for (File f : inputFiles) { // Do the processing.
                process(f);
            }
        }
        long endTime = System.currentTimeMillis();
        long endMem = rt.totalMemory() - rt.freeMemory();
        long avgPT = (endTime - startTime) / loopcount; // average time per loop
        long memU = (endMem - startMem) / 1024; // bytes to KB
        System.out.println("Processing Time: " + avgPT + " milli secs.");
        System.out.println("Memory Use: " + memU + " KB.");
    }
Questions: #1
• How does performance vary with SAX parsers?
• Fixed:
  • Measurement Machine
  • JVM – Sun’s JDK1.4.0
  • Processing Activity – XStat
  • Processing API – SAX
  • JVM State – Steady
• Variable:
  • SAX Parser – JDK1.4, Piccolo 1.02, Xerces 2.0.1, GNUJAXP-Beta1, Xerces 1.4.4
  • Input Data – DS1, DS2, DS3
Results: #1
Questions: #2
• How does performance vary with DOM parsers?
• Fixed:
  • Measurement Machine
  • JVM – Sun’s JDK1.4.0
  • Processing Activity – XStat
  • Processing API – DOM
  • JVM State – Steady
• Variable:
  • DOM Parser – JDK1.4, Xerces 2.0.1, GNUJAXP-Beta1, Xerces 1.4.4
  • Input Data – DS1, DS2, DS3
Results: #2
Questions: #3
• How does performance vary with XmlPull parsers?
• Fixed:
  • Measurement Machine
  • JVM – Sun’s JDK1.4.0
  • Processing Activity – XStat
  • Processing API – XmlPull
  • JVM State – Steady
• Variable:
  • XmlPull Parser – XPP3, kXML
  • Input Data – DS1, DS2, DS3
Results: #3
Questions: #4
• How does performance vary with in-memory tree oriented parsers/processors?
• Fixed:
  • Measurement Machine
  • JVM – Sun’s JDK1.4.0
  • Processing Activity – XStat
  • Processing API – In-memory tree oriented
  • JVM State – Steady
• Variable:
  • Parser/Processor – JDK1.4 DOM Parser, JDOM beta8, DOM4J, JDK1.4 XSLT Processor
  • Input Data – DS1, DS2, DS3
Results: #4
Questions: #5
• How does performance compare across the best of the XmlPull, SAX and DOM parsers?
• Fixed:
  • Measurement Machine
  • JVM – Sun’s JDK1.4.0
  • Processing Activity – XStat
  • JVM State – Steady
• Variable:
  • Parser/Processor – XPP3, JDK1.4 DOM, JDK1.4 SAX
  • Input Data – DS1, DS2, DS3
Results: #5
Questions: #6
• How does performance vary with JVM?
• Fixed:
  • Measurement Machine
  • Processing Activity – XStat
  • JVM State – Steady
  • Input Data – DS2
• Variable:
  • Parser/Processor – XPP3, Xerces 1.4.4
  • JVM – IBM-JDK1.3, JRockit1.3.1, Sun’s JDK1.3.1, Sun’s JDK1.4
Results: #6
Questions: #7
• How does performance vary with JVM warmup?
• Fixed:
  • Measurement Machine
  • Processing Activity – XStat
  • JVM – JDK 1.4.0
  • Input Data – DS2
• Variable:
  • Parser/Processor – XPP3, JDK1.4, JDOM beta8, DOM4J
  • JVM State – First time, Steady
Results: #7
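One way to observe the first-time versus steady-state distinction is to time each run separately instead of averaging over all runs. The sketch below does that for an arbitrary task; `WarmupSketch`, `timeRuns` and the stand-in workload are assumptions made for illustration, and in XPB4J the task would be the XStat processing itself. No particular timings are implied.

```java
// Hypothetical class name; an illustration, not the XPB4J implementation.
public class WarmupSketch {
    // Runs the task the given number of times and returns the elapsed
    // nanoseconds of each individual run, in order.
    public static long[] timeRuns(Runnable task, int runs) {
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long t0 = System.nanoTime();
            task.run();
            nanos[i] = System.nanoTime() - t0;
        }
        return nanos;
    }

    public static void main(String[] args) {
        // Stand-in workload (an assumption): build and scan a string.
        Runnable work = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i % 10);
            }
            if (sb.indexOf("x") >= 0) {
                throw new IllegalStateException("unexpected character");
            }
        };
        long[] t = timeRuns(work, 20);
        System.out.println("first run: " + t[0] + " ns");
        System.out.println("last run:  " + t[t.length - 1] + " ns");
    }
}
```

Reporting the first run separately from the later runs keeps class loading and JIT compilation costs from being folded into the steady-state figure.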
Questions: #8
• How does memory use vary with parser/processor?
• Fixed:
  • Measurement Machine
  • Processing Activity – XStat
  • JVM – JDK 1.4.0
  • JVM State – Steady
  • Input Data – DS2
• Variable:
  • Parser/Processor – XPP3, JDK1.4, JDOM beta8, DOM4J
Results: #8
Questions: #9
• How does performance vary with input XML file size?
• Fixed:
  • Measurement Machine
  • Processing Activity – XStat
  • JVM – JDK 1.4.0
  • JVM State – Steady
• Variable:
  • Parser/Processor – XPP3, JDK1.4, JDOM beta8, DOM4J
  • Input Data – 100KB, 1MB, 10MB
Results: #9
Questions: #10
• How does memory use vary with input XML file size?
• Fixed:
  • Measurement Machine
  • Processing Activity – XStat
  • JVM – JDK 1.4.0
  • JVM State – Steady
• Variable:
  • Parser/Processor – XPP3, JDK1.4, JDOM beta8, DOM4J
  • Input Data – 100KB, 1MB, 10MB
Results: #10
Questions: #11
• Any interesting observations?
• Fixed:
  • Measurement Machine
  • Processing Activity – XStat
  • JVM – JDK 1.4.0
  • JVM State – Steady
  • Parser/Processor – JDOM beta8
  • Input Data – 100KB, 1MB, 10MB
• Variable:
  • Node traversal loop – Loop1, Loop2
Questions: #11 (Contd.)
Loop1:
    …
    List children = elem.getChildren();
    for (int i = 0; i < children.size(); i++)
        collectStat((Element)children.get(i), sc);
Loop2:
    …
    ListIterator li = children.listIterator();
    while (li.hasNext())
        collectStat((Element)li.next(), sc);
Results: #11
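The two traversal styles can be tried side by side on ordinary `java.util` lists. The sketch below is only a generic illustration: `LoopSketch` and its methods are hypothetical names, the lists are plain `ArrayList` and `LinkedList` rather than the list JDOM returns from `getChildren()`, and it says nothing about the JDOM numbers measured in the presentation.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

// Hypothetical class name; a generic illustration, unrelated to JDOM internals.
public class LoopSketch {
    // Loop1 style: indexed access through get(i).
    public static long sumIndexed(List<Integer> xs) {
        long sum = 0;
        for (int i = 0; i < xs.size(); i++) {
            sum += xs.get(i);
        }
        return sum;
    }

    // Loop2 style: sequential access through a ListIterator.
    public static long sumIterated(List<Integer> xs) {
        long sum = 0;
        ListIterator<Integer> li = xs.listIterator();
        while (li.hasNext()) {
            sum += li.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> array = new ArrayList<>();
        List<Integer> linked = new LinkedList<>();
        for (int i = 0; i < 20_000; i++) {
            array.add(i);
            linked.add(i);
        }
        for (List<Integer> xs : List.of(array, linked)) {
            long t0 = System.nanoTime();
            long a = sumIndexed(xs);
            long t1 = System.nanoTime();
            long b = sumIterated(xs);
            long t2 = System.nanoTime();
            System.out.println(xs.getClass().getSimpleName()
                + ": indexed " + (t1 - t0) + " ns, iterated " + (t2 - t1)
                + " ns, same result: " + (a == b));
        }
    }
}
```

Both loops visit the same elements in the same order; how their costs compare depends on how the underlying list implements `get(int)` and its iterator.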
Caveats
• Different APIs are not perfect substitutes
• XSLT processors are significantly different from parsers
• Performance should be only one criterion among many others
• XStat is an artificial processing activity and favors the SAX/XmlPull APIs
What Next?
• Comparison with C/C++ Parsers/Processors
• Dynamic generation of input data
• Framework improvements
• Better Reporting and Presentation
• More processing activities
• Better tuning ?!
How can you benefit (and contribute)?
• Benefit from XPB4J
  • Gain insight from the report
  • Learn XML by playing with code
  • Validate your assumptions
  • Tune your parser/processor (if you are an implementer)
• Contribute to XPB4J
  • Run it in your environment and share your results
  • Write processing code
  • Extend the framework
• Discussion mailing list: xpb4j-users@lists.sourceforge.net
Q & A