A Comparison of Compression Techniques for XML-based Security Policies in Mobile Computing Environments Xuebing Qing Carlisle Adams
Agenda • Why Compress? • Criteria for Compression Algorithms • Gzip and Bzip • wbXML with/without Transcode • ASN.1 • Combinations • wbXML + Zip • ASN.1 + Zip • Recent XML Compression Proposals • Conclusions and Future Directions
Why Compress? • For high interoperability between domains, XML (XACML) is a good choice for policy representation. • On-device authorization decision rendering and simple policy deployment/updating are also required. • XML is too verbose and heavy for many mobile devices: • Limited bandwidth • Limited CPU power, RAM • Limited battery, flash memory, etc.
Evaluating Compression Algorithms • Criterion 1: High Compression Ratio • Criterion 2: Low Processing Overhead • Criterion 3: No Semantic Ambiguity • “Nice to have”: 3rd Party API Support • We consider the most popular compression algorithms, as well as their combinations: • Gzip and Bzip • wbXML • ASN.1 • wbXML with transcode + Gzip or Bzip • ASN.1 + Gzip or Bzip • None of them introduces semantic ambiguity and all have good 3rd party API support. • The ideal algorithm should achieve the highest compression rate while keeping decompression overhead to a minimum.
Experimental Setting • Written in Java, tested under JSDK 1.4.2 / Windows 2000 / 866 MHz CPU and 512 MB RAM • Runtime Memory Profiling: Eclipse Hyades Plug-in • Java APIs Used • wbXML: kXML 1.1 (Open Source) • ASN.1: Pure Java API by OSS Nokalva (Alpha Version – Trial version) • Gzip: The gzip implementation in JDK 1.4.2 • Bzip: Apache BZip2 Implementation • Test Cases: 9 XACML files (2KB ~ 1 MB) created from the XACML (version 1.1) Conformance Test Suite
Gzip and Bzip2: Compression Rate • Very good compression rate (especially when size > 70K) • Compression_rate_gzip is better than Compression_rate_bzip2 when size <= 70K, while Compression_rate_bzip2 is better than Compression_rate_gzip when size > 70K. • Bzip2 performs extremely well when size >= 250K. • The zip algorithms work better with large files, yet they still compress small files (2K) to 1/3 of the original size.
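A compression rate of the kind discussed on this slide can be measured with the JDK's own gzip classes. The following is a minimal sketch, not the code used in the experiments: the class name `GzipRatio`, the helper `samplePolicy`, and the synthetic XACML-like content are all assumptions made for illustration, and it uses modern Java syntax rather than JDK 1.4.2. Bzip2 is left out because it needs a third-party library.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class; not part of the original experiments.
class GzipRatio {
    /** Gzip-compresses the input and returns the compressed bytes. */
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Compressed size divided by original size (smaller means better compression). */
    static double ratio(byte[] original) {
        return (double) gzip(original).length / original.length;
    }

    /** Builds a synthetic, repetitive XACML-like document (made-up content). */
    static byte[] samplePolicy(int rules) {
        StringBuilder sb = new StringBuilder("<Policy>");
        for (int i = 0; i < rules; i++) {
            sb.append("<Rule RuleId=\"rule").append(i)
              .append("\" Effect=\"Permit\"><Target><Subjects><AnySubject/>")
              .append("</Subjects></Target></Rule>");
        }
        sb.append("</Policy>");
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Print the ratio for a few document sizes to see how it changes with size.
        for (int rules : new int[] {20, 200, 2000}) {
            byte[] doc = samplePolicy(rules);
            System.out.printf("%d bytes -> ratio %.3f%n", doc.length, ratio(doc));
        }
    }
}
```

Synthetic input like this is far more repetitive than a real policy, so the printed ratios only show the measurement method and say nothing about the results reported on the slide.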
Gzip and Bzip2: Processing Overhead - Time • Only decompression time is considered, because the compression of XACML only happens on the server side when deploying policies. • Absolute decompression time alone is not enough for an evaluation. • The wbXML-to-XML conversion mainly involves XML tag replacement and is not CPU intensive, so it can be performed on a device (thus the time of the conversion can be used as a reference to make a fairly realistic evaluation). • Gzip performs the best; Bzip2 is similar to wbXML conversion. • Considering that the kXML 1.1 API has significant room for optimization, it appears that wbXML conversion may ultimately have a similar time overhead to Gzip and hence may be acceptable on a mobile device.
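Decompression time of the kind compared on this slide can be sampled as sketched below. This is an illustration under assumptions, not the original test harness: the class `GunzipTimer` and its methods are hypothetical names, and a simple `System.nanoTime` loop is only a rough measurement (no JIT warm-up control, no profiling plug-in).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class; not part of the original experiments.
class GunzipTimer {
    /** Gzip-compresses the input (server-side step, not timed here). */
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Decompresses gzip data back to the original bytes (device-side step). */
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Average wall-clock time of one decompression, in nanoseconds, over the given runs. */
    static long averageGunzipNanos(byte[] compressed, int runs) {
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            gunzip(compressed);
        }
        return (System.nanoTime() - start) / runs;
    }
}
```

Timing the wbXML-to-XML conversion with the same loop would give the reference value the slide compares against.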
Gzip and Bzip2: Memory Overhead – Raw Data • Numbers in brackets are memory increments; numbers in red mean that memory in use decreases when file size increases, which is caused by garbage collection. • Memory overhead of wbXML-to-XML is used as a reference for the estimate. • Size_memory = Size_memory_in_use + Size_memory_gced. So the memory used by File 8 is not 1,857,623 bytes (memory in use), but 3,087,933 bytes, which includes memory that was garbage collected during the process. • To analyze, we divide memory into two parts: base runtime memory for the decompression API and the program itself, and decompression memory for representation and computation of data at runtime. • Base memory is estimated by comparing the absolute memory size with that of wbXML-to-XML conversion. • The memory size increment factor is used to estimate decompression memory.
Gzip and Bzip2: Memory Overhead – Result • The memory size increment factor measures the memory increment caused by the data size increment, that is, memory increment / file size increment. • The bigger the memory size increment factor is, the more memory is used for data decompression and the more frequent garbage collection will be. • It is a range of possible values instead of one fixed value. • Result: Gzip has a very small footprint when decompressing XACML data; its processing memory overhead is reasonable and acceptable. • However, a zipped XACML file has to be unpacked into XML and then processed. • The processing overhead of Gzip is OH_gzip = OH_gzip_decompression + OH_xml_processing
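The memory accounting on the two memory slides can be written out as a small calculation. This is a sketch with hypothetical names (`MemoryOverhead`, `totalMemory`, `incrementFactor`). The File 8 figures are the ones quoted on the raw-data slide, with the garbage-collected amount derived as their difference; the figures passed to `incrementFactor` are invented purely to show the arithmetic.

```java
// Hypothetical helper class illustrating the slide's memory formulas.
class MemoryOverhead {
    /** Size_memory = Size_memory_in_use + Size_memory_gced (all in bytes). */
    static long totalMemory(long inUse, long garbageCollected) {
        return inUse + garbageCollected;
    }

    /** Memory size increment factor = memory increment / file size increment. */
    static double incrementFactor(long memSmall, long memLarge, long fileSmall, long fileLarge) {
        return (double) (memLarge - memSmall) / (fileLarge - fileSmall);
    }

    public static void main(String[] args) {
        // File 8 from the slide: 1,857,623 bytes in use, 3,087,933 bytes in total,
        // so 1,230,310 bytes were garbage collected (derived by subtraction).
        System.out.println(totalMemory(1_857_623L, 1_230_310L));

        // Made-up measurements for two files, only to demonstrate the factor.
        System.out.println(incrementFactor(1_000_000L, 1_500_000L, 100_000L, 200_000L));
    }
}
```

Computing the factor between each pair of neighbouring test files yields the range of values that the result slide refers to.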
wbXML: Overview • Part of the presentation logic in WAP • Uses a token dictionary, where each token (transcode) maps to a predefined string (mainly element tags and attribute tags). • wbXML without transcode: no explicit token dictionary specified (otherwise, wbXML with transcode). • Code segments used to generate transcode in kXML 1.1
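The token-dictionary idea can be illustrated with a toy substitution over element names. This is not the WBXML wire format and not the kXML 1.1 API: the class `TokenDictionary`, the marker byte, and the token values in `sampleTable` are all made up for this sketch. A populated table corresponds loosely to "with transcode"; an empty table forces every name to be written out literally, as in "without transcode".

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy illustration of tag-to-token substitution; all values are hypothetical.
class TokenDictionary {
    /** Marker meaning "a literal UTF-8 name follows, terminated by a zero byte". */
    static final int LITERAL = 0x04;

    /** A small made-up table mapping XACML element names to one-byte tokens. */
    static Map<String, Integer> sampleTable() {
        Map<String, Integer> table = new LinkedHashMap<>();
        table.put("Policy", 0x05);
        table.put("Rule", 0x06);
        table.put("Target", 0x07);
        return table;
    }

    /** Encodes each name as its token if known, otherwise as a marked literal string. */
    static byte[] encodeNames(String[] names, Map<String, Integer> table) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String name : names) {
            Integer token = table.get(name);
            if (token != null) {
                out.write(token);                 // one byte replaces the whole name
            } else {
                byte[] raw = name.getBytes(StandardCharsets.UTF_8);
                out.write(LITERAL);               // marker byte
                out.write(raw, 0, raw.length);    // the name itself
                out.write(0);                     // terminator
            }
        }
        return out.toByteArray();
    }
}
```

With the sample table, the names `Policy`, `Rule`, `Target`, `Rule` encode to 4 bytes instead of 20 characters, which is why long XACML tag names benefit from a well-populated table.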
wbXML: Compression Rate • wbXML with transcode reduces size to under 50% of the original, which is much better than wbXML without transcode. • Not comparable with Gzip, particularly when the file size is over 5 KB. • However, an XACML policy in wbXML can be processed directly by a wbXML parser without any decompression overhead. • We only discuss the processing overhead for wbXML with transcode.
wbXML: Analysis of Processing Overhead • There is no time or memory overhead for decompression. • However, it is impractical to measure and compare the CPU time and memory used when evaluating an XACML policy in wbXML form and in XML form. • We do the following analysis rather than experiments: • Footprint_wbxml_obj < Footprint_xml_obj: since a wbXML file is 50% of its original XACML size, it is reasonable to assume that a wbXML object is approximately half the size of its XML counterpart. • A smaller runtime representation enables faster processing, but we need to consider the overhead of transcode-table lookup at runtime. • We can assume Processing_Time_wbxml <= Processing_Time_xml. • Evaluating an XACML policy in wbXML is less battery intensive because its in-memory representation is much smaller than its XML counterpart. • Result: OH_wbxml = α × OH_xml_processing, where the factor α < 1; it is smaller than OH_gzip = OH_gzip_decompression + OH_xml_processing
ASN.1: Schema Based XML Encoding • A schema-based binary encoding spec, X.694 “Mapping W3C XML Schema Definitions into ASN.1” [5], is under development. • The spec introduces ASN.1, a binary-and-schema-based language, into the XML world, which is XML-schema based. • With the specification, an XML document can be converted into ASN.1, which is then encoded with one of ASN.1’s binary encoding rules, such as PER, DER, CER, or BER. • Theoretically, ASN.1 with PER, the most compact encoding rule, can achieve the same level of compression as Gzip [4]. • However, the Pure Java API by OSS Nokalva only offers a compression rate that is slightly better than wbXML, partially because the API is still in its Alpha stage: several hot fixes were sent during the experiments in this research.
ASN.1 Encoding: Compression Rate • Slightly better than wbXML with transcode, but not comparable to Gzip. • The result differs from the one reported for Fast Web Services (FWS) [7]; this might be caused by the difference in the APIs used and/or by the different characteristics of XACML files and the Web services XML files used in FWS.
ASN.1 Encoding: Analysis of Processing Overhead • There is no need to convert an ASN.1 encoded policy to XACML when processing, because ASN.1 is a schema language and supports operations similar to those of XML. • As with wbXML, we do analysis rather than experiments. • The analysis is similar to the one for wbXML. • Result: OH_asn1 = β × OH_xml_processing, where the factor β < 1; it is smaller than OH_gzip = OH_gzip_decompression + OH_xml_processing • According to Sun’s experimental results on FWS, β could be as small as 0.1 in a Web services environment (although no such result has been achieved in our experiments).
Combine wbXML or ASN.1 with Gzip • Gzip, wbXML and ASN.1 do not perform well enough to satisfy the criteria on their own. • Pure Gzip has more processing overhead than wbXML and ASN.1, while wbXML and ASN.1 do not compress as well as Gzip. • It makes sense to combine them: • wbXML with transcode + Gzip • ASN.1 + Gzip • Other combinations are not as good as the above (wbXML with transcode is better than wbXML without transcode, and Bzip2 consumes much more memory and CPU time than Gzip for decompression).
The Combinations: Compression Rate • Much better than pure ASN.1 and wbXML • Even better than pure Gzip • It is interesting that the overall compression rate of wbXML + Gzip for XACML over 100KB is better than ASN.1 + Gzip.
The Combinations: Analysis of Processing Overhead • For wbXML with transcode + Gzip: OH_wbxml_gzip = OH_gzip_decompression + α × OH_xml_processing • For ASN.1 + Gzip: OH_asn1_gzip = OH_gzip_decompression + β × OH_xml_processing • Here α < 1 and β < 1 are the factors by which the binary form reduces XML processing overhead. • Just for reference: • Gzip: OH_gzip = OH_gzip_decompression + OH_xml_processing • wbXML: OH_wbxml = α × OH_xml_processing • OH_wbxml_gzip is definitely better than OH_gzip because an XACML file is only decompressed once but processed many times. • Although OH_wbxml_gzip is greater than OH_wbxml, the difference can be ignored, because OH_gzip_decompression is small and the decompression only happens the first time the policy is downloaded and when the policy is updated. • Conclusion: wbXML + Gzip is better than ASN.1 + Gzip: • Tag names in XACML are long; simple replacement (wbXML) achieves a good compression rate. • Replacement (wbXML) creates less overhead than complex encoding (ASN.1). • ASN.1 does not achieve the excellent compression rate expected (when publicly available APIs are used). • Good open source wbXML APIs are available.
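The "decompressed once, processed many times" argument can be made explicit by amortizing each overhead over N policy evaluations between two policy updates. The derivation below is a sketch under that assumption; N is a symbol introduced here, and α is an assumed name for the wbXML processing factor (less than 1) whose symbol did not survive on the slide.

```latex
\begin{align*}
OH_{\mathrm{gzip}}(N) &= OH_{\mathrm{gzip\_decompression}} + N \cdot OH_{\mathrm{xml\_processing}} \\
OH_{\mathrm{wbxml\_gzip}}(N) &= OH_{\mathrm{gzip\_decompression}} + N \cdot \alpha \cdot OH_{\mathrm{xml\_processing}}, \qquad \alpha < 1 \\
OH_{\mathrm{wbxml}}(N) &= N \cdot \alpha \cdot OH_{\mathrm{xml\_processing}} \\[4pt]
OH_{\mathrm{gzip}}(N) - OH_{\mathrm{wbxml\_gzip}}(N) &= N\,(1 - \alpha)\, OH_{\mathrm{xml\_processing}} > 0 \\
OH_{\mathrm{wbxml\_gzip}}(N) - OH_{\mathrm{wbxml}}(N) &= OH_{\mathrm{gzip\_decompression}}
\end{align*}
```

The advantage of wbXML + Gzip over pure Gzip grows linearly with N, while its penalty relative to pure wbXML stays a constant one-time decompression cost, which is why that penalty can be ignored for policies that are evaluated often.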
Recent XML Compression Proposals (1): XOP/MTOM • XOP: XML-binary Optimized Packaging • An XML serialization protocol that converts certain XML data content (usually base-64 encoded) into binary streams and puts them into a structure that looks like MIME multipart, with an XML document as the root part. • MTOM: Message Transmission Optimization Mechanism • A description of how XOP is layered into SOAP HTTP transport (SOAP 1.2) for Web services. • More HTTP friendly (it uses MIME multipart); not originally conceived for the wireless world. • More like a communication protocol than a compression algorithm. • There appears to be no public implementation available; therefore, it is not known how well it performs with respect to our criteria (compression rate, processing overhead, semantic ambiguity).
Recent XML Compression Proposals (2): XMill • A compression algorithm from AT&T, designed specifically for XML • Step 1 – Regrouping: separate structure, layout, and data, then distribute data elements into data streams (int, char, string, base64, etc.) • Step 2 – Use gzip, bzip2, etc., to compress these streams • XMill typically achieves a much better compression rate than conventional compressors such as gzip and bzip2 on XML data. • More processing overhead than gzip or bzip2 because of the extra “step 1”. • Unlike wbXML + Gzip, XMill output must be converted back to XML before the XACML policy can be processed.
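The two steps described for XMill can be sketched in a heavily simplified form. This is not XMill's actual algorithm or container layout: the class `RegroupSketch` is hypothetical, values are grouped only by element name, and the separate structure stream that XMill keeps is omitted.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.GZIPOutputStream;

// Simplified illustration of "regroup, then compress"; not the real XMill.
class RegroupSketch {
    /** Step 1: route each {name, value} pair to the container for its element name. */
    static Map<String, StringBuilder> regroup(String[][] elements) {
        Map<String, StringBuilder> containers = new LinkedHashMap<>();
        for (String[] element : elements) {
            containers.computeIfAbsent(element[0], name -> new StringBuilder())
                      .append(element[1]).append('\n');
        }
        return containers;
    }

    /** Step 2: gzip every container separately and return the total compressed size. */
    static int compressedSize(Map<String, StringBuilder> containers) {
        int total = 0;
        for (StringBuilder container : containers.values()) {
            total += gzip(container.toString().getBytes(StandardCharsets.UTF_8)).length;
        }
        return total;
    }

    /** Gzip-compresses the input and returns the compressed bytes. */
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

Grouping similar values together gives the back-end compressor more uniform input, but the regrouping has to be undone on the device, which is the extra overhead the slide mentions.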
Conclusions and Future Directions • Suggested criteria for the use of XML-based policies in mobile devices • Reviewed and compared a variety of compression algorithms for XML • Concluded that {wbXML + transcode + Gzip} offers the best combination of compression rate and processing overhead of all algorithms tested • This combination is recommended for use with XML-based security policies in mobile computing environments • Directions for further work • Keep an eye on ASN.1 (will public implementations match theoretical results?) • The compression rate of wbXML with transcode can be improved by adding more transcodes into the table (e.g., built-in function names, data type names, etc.). How much improvement can be gained? • Experiments on XMill (perform more detailed comparison with wbXML to determine the best algorithm for this environment)
References • [1] Uche Ogbuji. “Tip: Compress XML files for efficient transmission”, IBM DeveloperWorks, 9 April, 2004 • [2] M. Cokus, D. Winkowski. “XML Sizing and Compression Study For Military Wireless Data”, XML 2002 Proceedings by deepX • [3] “WAP Binary XML Content Format Specifications – Version 1.2”, http://www.wapforum.org/what/technical/PROP-WBXML-19990815.pdf • [4] ASN.1 Site - XML. “What ASN.1 Can Offer for XML?”, http://asn1.elibel.tm.fr/xml/ June, 2004 • [5] ITU-T X.694. “Information Technology – ASN.1 encoding rules – Mapping W3C XML Schema Definitions Into ASN.1”, Jan, 2004 • [6] Nokia. “Nokia Position Paper: W3C Workshop on Binary Interchange of XML Information Item Sets”, Aug, 2003, http://www.w3.org/2003/08/binary-interchange-workshop/02-Nokia-Position-Paper_02.htm • [7] P. Sandoz, et al., Sun Microsystems. “Fast Web Services”, July, 2003, W3C Workshop on Binary Interchange of XML Information Item Sets • [8] “Compressing XML”, http://www.devx.com/xml/article/16754/0/page/1 • [9] M. Girardot, N. Sundaresan. “Millau, an encoding format for efficient representation and exchange of XML over the Web”, http://www9.org/w9cdrom/154/154.html • [10] “gzip - GNU Project - Free Software Foundation (FSF)”, http://www.gnu.org/software/gzip/gzip.html • [11] “Bzip2 for Windows”, http://gnuwin32.sourceforge.net/packages/bzip2.htm • [12] “kXML with wbXML support”, http://www.kxml.org • [13] “OSS Nokalva ASN.1/Pure Java Tools - Beta”, http://www.oss.com • [14] “Hyades – Automated Software Quality Evaluation Framework”, http://www.eclipse.org/hyades/ • [15] “XMill - A User Configurable XML Processor”, http://sourceforge.net/projects/xmill