Efficient XML Interchange: Alternative Data Model Integration

Efficient XML Interchange: What is it? Why is it? How does it fit in?

What is Efficient XML Interchange? • Alternative Representation of XML Infoset • support full XML (Infoset) data model • not a subset • no really, not a subset! • Interchange Format • optimized for data exchange • transmission, storage, processing • can use Schema, conventional compression

Why? • Expand the Web • limited uptake of XML & friends in certain domains • performance is a problem • noteworthy domains • mobile, embedded, scientific, … • Lesson From Binary XML Formats • real need, and real solutions • widely applicable, win-win • multiple formats cause segregation, limit adoption

Integration into XML Stack • Same Data Model • merely an alternative encoding • Open Issues • format, or encoding? • content negotiation? • schema knowledge vs content negotiation • modes, configurability (e.g. simple types)

WebAPI / EXI? • Impact on… • APIs • initialisation: encoding modes, schema info? • XMLHttpRequest • again: modes, schema info? • diversity of formats? • Are data models in sync? • HTML as XML? • REX • fragment support?

Efficient XML Interchange Format Basics

Efficient XML Interchange • Goal(s) • maintain XML (Infoset) data model • seamless integration into XML software stack • improve compaction AND processing • Observation: • ‘smallness’ has multiple benefits • e.g. energy consumption during transmission • allows XML deployment in new scenarios • Underlying Philosophy: • exploit a-priori knowledge of (likely) content

How does it work? • Exploit Knowledge, at Several Different Levels • XML knowledge • copious syntactic redundancy • Schema knowledge • schema describes content in detail • heuristics • e.g. (declared) elements >> processing instructions • e.g. repeated string elements • e.g. small numbers >> large numbers • Cooperation with Conventional Compression • heavily biased data stream as compressor input
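The "small numbers >> large numbers" heuristic can be illustrated with a variable-length integer coding, where small values occupy fewer bytes than large ones. The following is a minimal sketch under that assumption; the function names `encode_uint` and `decode_uint` are hypothetical, and the block is meant to show the idea, not to be a conformant EXI codec.

```python
def encode_uint(n: int) -> bytes:
    """Encode a non-negative integer in 7-bit groups, least significant first.

    The high bit of each byte signals that more bytes follow.
    """
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        low = n & 0x7F          # lowest 7 bits of the remaining value
        n >>= 7
        if n:
            out.append(low | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(low)         # last byte: continuation bit clear
            return bytes(out)


def decode_uint(data: bytes) -> int:
    """Inverse of encode_uint; stops at the first byte without the high bit."""
    value = 0
    shift = 0
    for b in data:
        value |= (b & 0x7F) << shift
        if not b & 0x80:
            return value
        shift += 7
    raise ValueError("truncated input")


if __name__ == "__main__":
    for n in (5, 300, 1_000_000):
        print(n, len(encode_uint(n)), "byte(s)")
```

Values below 128 take a single byte, so a stream dominated by small numbers stays compact while large numbers remain representable.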

EXI Base Format • Coding Grammars • ‘generic’ grammar: describes full XML Infoset • arbitrary elements, PIs, comments, entity references, etc. • schema-derived grammar • describes a specific format • content-derived grammar • add rules depending on encountered elements • splice these together, at very fine granularity • allow anything, but know what is (currently) likely • likely content: more efficient encoding
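The content-derived grammar idea (add rules depending on encountered elements) can be sketched as a toy encoder that learns element names as it goes. This is an illustration only: the class `LearningGrammar` and its tuple output are hypothetical, and the way real EXI assigns and renumbers event codes is not modelled here.

```python
class LearningGrammar:
    """Toy content-derived grammar: element names seen once get a short code."""

    def __init__(self):
        self.known = []  # element names in the order they were first seen

    def encode_start_element(self, name):
        if name in self.known:
            # Learned rule: refer to the element by its small index.
            return ("SE", self.known.index(name))
        # Generic rule: spell out the name, then remember it for next time.
        self.known.append(name)
        return ("SE(*)", name)


if __name__ == "__main__":
    grammar = LearningGrammar()
    for name in ["item", "item", "note", "item"]:
        print(grammar.encode_start_element(name))
```

The first occurrence of a name pays the full cost of the generic `SE(*)` rule; later occurrences are "likely content" and cost only a small index.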

EXI Base Format: Built-in, Generic Element Grammar • [Figure: grammar diagram with states StartTag and Element; transition labels AT(*), NS, SE(*), CH, ER, CM, PI, EE]

EXI Base Format: A Schema-Based Grammar • Element Content Model: • (optional) attribute “color” • (optional) element “desc” • (mandatory) elements quantity, price • [Figure: grammar diagram with transitions AT(color), SE(desc), SE(quantity), SE(price), EE]
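The content model on this slide can be sketched as a small state machine in which each event is encoded as its index among the productions allowed in the current state. The sketch assumes the order color, desc, quantity, price; the state names, the `GRAMMAR` table and `encode_events` are hypothetical. Attribute and element values, and the nested content of each child element, are left out.

```python
# Each state lists the events allowed there and the state each one leads to.
GRAMMAR = {
    "start": [("AT(color)", "after_color"),
              ("SE(desc)", "after_desc"),
              ("SE(quantity)", "after_quantity")],
    "after_color": [("SE(desc)", "after_desc"),
                    ("SE(quantity)", "after_quantity")],
    "after_desc": [("SE(quantity)", "after_quantity")],
    "after_quantity": [("SE(price)", "after_price")],
    "after_price": [("EE", "done")],
}


def encode_events(events):
    """Return (code, bit_width) pairs for a sequence of events."""
    state = "start"
    codes = []
    for event in events:
        productions = GRAMMAR[state]
        labels = [label for label, _ in productions]
        if event not in labels:
            raise ValueError(f"{event} not allowed in state {state}")
        index = labels.index(event)
        # Bits needed to tell the alternatives apart; 0 when there is only one.
        width = (len(productions) - 1).bit_length()
        codes.append((index, width))
        state = productions[index][1]
    if state != "done":
        raise ValueError("content model not complete")
    return codes


if __name__ == "__main__":
    codes = encode_events(["AT(color)", "SE(quantity)", "SE(price)", "EE"])
    print(codes, "total bits:", sum(width for _, width in codes))
```

Where the schema leaves only one possibility (price after quantity, then the end of the element), the event needs no bits at all, which is where the schema knowledge pays off.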

EXI Base Format: Merged Generic & Schema-Derived Grammar • [Figure: the schema-based grammar with transitions SE(desc), SE(quantity), SE(price), EE, with the generic transitions SE(*), CH, ER, CM, PI, EE merged in]
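One way to picture the merged grammar is a two-part event code: events the schema expects get a short one-part code, and anything else is reached through an escape code followed by a generic code. This is a simplified sketch for a single grammar state; `SCHEMA_EVENTS`, `GENERIC_EVENTS` and `event_code` are hypothetical names, and the actual code assignment in EXI is more elaborate.

```python
# Events the schema expects in this (hypothetical) state, most likely first.
SCHEMA_EVENTS = ["SE(desc)", "SE(quantity)"]
# Generic events that remain allowed so the full Infoset stays representable.
GENERIC_EVENTS = ["SE(*)", "CH", "ER", "CM", "PI", "EE"]


def event_code(event):
    """Return a one-part code for schema events, a two-part code otherwise."""
    if event in SCHEMA_EVENTS:
        return (SCHEMA_EVENTS.index(event),)
    escape = len(SCHEMA_EVENTS)  # first code not used by a schema event
    return (escape, GENERIC_EVENTS.index(event))  # ValueError if unknown


if __name__ == "__main__":
    for event in ("SE(quantity)", "CM"):
        print(event, event_code(event))
```

Expected content stays cheap, while unexpected content such as a comment is still encodable at the cost of a longer code.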

Other Major EXI Features • Simple Type Values • optimized codecs • type assignment through grammar • generic text coding always available • string / value tables • Bit-Packed vs Byte-Aligned codec • biased input into “deflate” compression
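The string / value table feature can be sketched as follows: the first occurrence of a string is written out literally, and every repeat is replaced by its table index. This is a minimal sketch; the token names "miss" and "hit" and the functions `encode_strings` and `decode_strings` are hypothetical, and the partitioning of tables used by the real format is not modelled.

```python
def encode_strings(values):
    """Replace repeated strings by the index of their first occurrence."""
    table = {}   # string -> index assigned on first occurrence
    tokens = []
    for value in values:
        if value in table:
            tokens.append(("hit", table[value]))   # repeat: index only
        else:
            table[value] = len(table)
            tokens.append(("miss", value))         # first time: literal string
    return tokens


def decode_strings(tokens):
    """Rebuild the original strings; the table is reconstructed on the fly."""
    table = []
    values = []
    for kind, payload in tokens:
        if kind == "miss":
            table.append(payload)
            values.append(payload)
        else:
            values.append(table[payload])
    return values


if __name__ == "__main__":
    tokens = encode_strings(["red", "blue", "red", "red"])
    print(tokens)
    print(decode_strings(tokens))
```

Because the decoder rebuilds the same table from the stream itself, no table needs to be transmitted separately.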

Impact on the XML Stack • Questions • content negotiation, header • HTTP integration? • what do you need? what would be a problem? • pre-shared schemas • which formats? samples? • (X)HTML? AJAX? • need ‘hooks’ in the specification? • options / variables • different schemas, different options?
