170 likes | 190 Views
Efficient XML Interchange. What is it? Why is it? How does it fit in?. What is Efficient XML Interchange?. Alternative Representation of XML Infoset support full XML (Infoset) data model not a subset no really, not a subset! Interchange Format optimized for data exchange
E N D
Efficient XML Interchange What is it?Why is it?How does it fit in?
What is Efficient XML Interchange? • Alternative Representation of XML Infoset • support full XML (Infoset) data model • not a subset • no really, not a subset! • Interchange Format • optimized for data exchange • transmission, storage, processing • can use Schema, conventional compression
Why? • Expand the Web • limited uptake of XML & friends in certain domains • performance is problem • noteworthy domains • mobile, embedded, scientific, … • Lesson From Binary XML Formats • real need, and real solutions • widely applicable, win-win • multiple formats cause segregation, limit adoption
Integration into XML Stack • Same Data Model • merely an alternative encoding • Open Issues • format, or encoding? • content negotiation? • schema knowledge vs content negotiation • modes, configurability (e.g. simple types)
WebAPI / EXI? • Impact on… • APIs • initalisation: encoding modes, schema info? • XMLHttpRequest • again: modes, schema info? • diversity of formats? • Are data models in sync? • HTML as XML? • REX • fragment support?
Efficient XML Interchange Format Basics
Efficient XML Interchange • Goal(s) • maintain XML (Infoset) data model • seamless integration into XML software stack • improve compaction AND processing • Observation: • ‘smallness’ has multiple benefits • e.g. energy consumption during transmission • allows XML deployment in new scenarios • Underlying Philosophy: • exploit a-priori knowledge of (likely) content
How does it work? • Exploit Knowledge, at Several Different Levels • XML knowledge • copious syntactic redundancy • Schema knowledge • schema describes content in detail • heuristics • e.g. (declared) elements >> processing instructions • e.g. repeated string elements • e.g. small numbers >> large numbers • Cooperation with Conventional Compression • heavily biased data stream as compressor input
EXI Base Format • Coding Grammars • ‚generic‘ grammar: describe full XML Infoset • arbitrary elements, PIs, comments, entity references, etc. • schema-derived grammar • describes a specific format • content-derived grammar • add rules depending on encountered elements • splice these together, at very fine granularity • allow anything, but know what is (currently) likely • likely content: more efficient encoding
SE(*), CH, ER, CM, PI SE(*)CHERCMPI Element StartTag AT(*)NS EE EE EXI Base FormatBuilt-in, Generic Element Grammar
SE(quantity) SE(price) SE(quantity) AT(color) SE(desc) SE(desc) EE SE(quantity) EXI Base FormatA Schema-Based Grammar • Element Content Model: • (optional) attribute “color” • (optional) element “desc” • (mandatory) elements quantity, price
quantity desc SE(*) CH ER EE CM PI EXI Base FormatMerged Generic & Schema Derived Grammar SE(quantity) SE(price) SE(quantity) SE(desc) SE(*), CH, ER, CM, PI SE(*), CH, ER, CM, PI SE(*), CH, ER, CM, PI EE EE
Other, Major EXI Features • Simple Type Values • optimized codecs • type assigment through grammar • generic text coding always available • string / value tables • Bit-Packed vs byte-aligned codec • biased input into “deflate” compression
Impact on the XML Stack • Questions • content negotiation, header • http integration? • what do you need? what would be a problem? • pre-shared schemas • which formats? samples? • (X)HTML? AJAX? • need ‘hooks’ in the specification? • options / variables • different schemas, different options?