High-Performance Content-Based Event Routing with XML Data Binding

XML Data Binding: Encoding for High-Performance Content-Based Event Routing • Gail Kaiser • Phil Gross • Columbia University Programming Systems Lab

Overview • PSL Intro • MEET Project • Encoding Conversion Efficiency • Encoding Size Efficiency • Encoding Classification Efficiency

Programming Systems Lab • “PSL conducts research on Web technologies, collaborative work, virtual worlds, process/workflow, extended transaction models, software development environments and tools, software engineering, information management, and distributed programming systems” • Lately, lots of XML stuff

PSL XML-related Research • FlexML: Flexible XML • Open-ended XML streams that may include “new” tags • Dynamic schema and semantics discovery and composition • XUES: XML-based Universal Event Service • Event Packager: Data mining over XML structured data • Event Distiller: XML event poset pattern matching • Learning new application-domain events to recognize • DISCUS: Decentralized Information Spaces for Composition and Unification of Services • Rapid and secure application composition using Web Services • Trust Evolution: PGP Trust + KeyNote + real-world business

MEET • Multiply Extensible Event Transport • Content-based multicast routing • Must be efficient enough for embedded and high-performance applications

MEET Motivations • Personal Life Recorder (sensor oriented) • GroupWork Recorder (computer/DB oriented) • Parallel/Grid computing • Distributed simulation • Battlefield C4I • Last, but not least: • Dissertation submission

Relationship to Other Work • Generally modeling communication like [diagram: Machine A, Relational; Machine B, XML] • What actually goes over the line is an afterthought • But with N-way Internet-scale communication • Millions of publishers and subscribers • We can (must!) do better than ASCII text… • Line speed => ≈250 assembly instructions per packet

MEET Extensibility • Want to scale up, to millions of pubs and subs • Want to scale down, to embedded and wireless • No single solution satisfactory at all scales • Composed of hot-swappable subsystems • Router, transports, clock/causality, types, etc.

Why Types • Event data is not just an opaque bag of bits • Subscriptions are Boolean functions over events • Type safety would be nice • What type system to use?
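As a rough illustration of "subscriptions are Boolean functions over events", the sketch below models a subscription as a predicate on a typed event and composes predicates with and/not. The event class, its fields, and the helper names (`both`, `negate`, `matching_subscribers`) are hypothetical and are not part of MEET.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class TemperatureEvent:
    """Hypothetical typed event; the fields are illustrative only."""
    sensor_id: int
    celsius: float


# A subscription is simply a Boolean function over an event.
Subscription = Callable[[TemperatureEvent], bool]


def both(a: Subscription, b: Subscription) -> Subscription:
    """Logical AND of two subscriptions."""
    return lambda event: a(event) and b(event)


def negate(a: Subscription) -> Subscription:
    """Logical NOT of a subscription."""
    return lambda event: not a(event)


def matching_subscribers(event: TemperatureEvent,
                         subs: Dict[str, Subscription]) -> List[str]:
    """Return the names of all subscribers whose predicate accepts the event."""
    return [name for name, pred in subs.items() if pred(event)]


def is_hot(event: TemperatureEvent) -> bool:
    return event.celsius > 30.0


def from_sensor_7(event: TemperatureEvent) -> bool:
    return event.sensor_id == 7


subscriptions = {
    "hot_but_not_7": both(is_hot, negate(from_sensor_7)),
    "sensor_7_only": from_sensor_7,
}
```

Because the event is typed, a predicate that names a missing field fails when it is written or type-checked rather than silently at routing time, which is one reading of "type safety would be nice".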

Initial MEET Type Design • Initial design calls for supporting Java, C#, and XML Schema defined objects “out of the box” • XML Schema used as Ur-language/Esperanto for conversions • Subscriptions are arbitrary boolean functions on datatypes • XML Schema is not ideal ur-type • Excessively complex, verbose, etc.

Encodings for Efficiency • Java, C#, XML, ASN.1 have well-defined but proprietary encodings for instances • Would be nice to have an independent encoding scheme with some desirable properties missing from the above • Fast serialization/deserialization • Elimination of redundant information from message sequences • Data organized for rapid classification/routing

Conversion Efficiency • Need to get to and from wire format as fast as possible • Leverage homogeneity to eliminate unnecessary conversions, e.g., network byte order • ECho system from Eisenhauer et al., Georgia Tech • Using “native data” for ultra-low latency • Necessary for HPC
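One way to picture "leverage homogeneity" is a receiver-converts scheme: the sender writes integers in its own native byte order and marks which order it used, so two machines with the same byte order never swap anything. The sketch below is a generic illustration of that idea under assumed names and an assumed one-byte order flag; it is not ECho's actual wire format.

```python
import struct
import sys

# Assumed one-byte flag values identifying the sender's byte order.
LITTLE, BIG = 0, 1
NATIVE = LITTLE if sys.byteorder == "little" else BIG


def _prefix(order: int) -> str:
    """Map an order flag to the struct module's byte-order character."""
    return "<" if order == LITTLE else ">"


def encode(values, order: int = NATIVE) -> bytes:
    """Write 32-bit signed ints in the given order, preceded by the order flag."""
    body = struct.pack(f"{_prefix(order)}{len(values)}i", *values)
    return struct.pack("B", order) + body


def decode(buf: bytes):
    """Read ints using the order the sender declared.

    When the declared order equals the receiver's native order, no byte
    swapping is needed; a swap happens only for heterogeneous pairs.
    """
    order = buf[0]
    count = (len(buf) - 1) // 4
    return list(struct.unpack_from(f"{_prefix(order)}{count}i", buf, 1))
```

Always converting to network byte order, by contrast, costs two swaps on a pair of little-endian machines even though neither swap is needed.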

Size Efficiency • Ideal for single message is self-describing data • With multiple messages of same type, one can pull out redundant type info, e.g., schema • Goal is to go further: If 90% of content of messages is the same, generate a new subtype with fixed values • From self-describing to all-schema is a continuum
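The continuum from self-describing to all-schema can be sketched with three encodings of the same message: field names in every message, values only against a shared schema, and values only for the fields that actually vary once a subtype fixes the rest. This is a simplified illustration: JSON stands in for the wire format, all function names are hypothetical, and the subtype here fixes only fields that are identical across every sampled message rather than applying a 90% threshold.

```python
import json


def self_describing(msg: dict) -> bytes:
    """Every message carries its own field names."""
    return json.dumps(msg, sort_keys=True).encode()


def with_schema(msg: dict, schema: list) -> bytes:
    """Field names live in a shared schema; only values are sent."""
    return json.dumps([msg[f] for f in schema]).encode()


def derive_subtype(messages: list, schema: list) -> dict:
    """Return the fields whose value is the same in every sampled message."""
    first = messages[0]
    return {f: first[f] for f in schema
            if all(m[f] == first[f] for m in messages)}


def with_subtype(msg: dict, schema: list, fixed: dict) -> bytes:
    """Send only the fields the subtype does not fix."""
    return json.dumps([msg[f] for f in schema if f not in fixed]).encode()


def decode_subtype(payload: bytes, schema: list, fixed: dict) -> dict:
    """Rebuild the full message from the varying values plus the fixed ones."""
    varying = iter(json.loads(payload))
    return {f: fixed[f] if f in fixed else next(varying) for f in schema}


schema = ["unit", "site", "value"]
samples = [
    {"unit": "C", "site": "lab", "value": 21.5},
    {"unit": "C", "site": "lab", "value": 22.0},
]
fixed = derive_subtype(samples, schema)
```

With these samples, `unit` and `site` move into the subtype definition and each message shrinks to its single varying value.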

Classification Efficiency • When bits start arriving serially at the router, would like to begin cut-through routing as soon as possible • Avoid the curse of IP/IPv6: source address first • Want key routing bits as close to the front as possible • Want data in fixed locations

Fast Classifying: First Things First • In the packet, type info first (after magic) • Would like to represent type codes as bit string with “most significant” info e.g. parent type first, followed by subtype identifier, sub-subtype, etc. • Need access to type hierarchy • Popular classification fields at the front • Need to tag with popularity metadata • “subscribers will want to select on me”
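A minimal sketch of the parent-first type code idea: if each type's code is its parent's code followed by a local identifier, then "is this event a subtype of T?" becomes a prefix test on the leading bits. The hierarchy, the bit assignments, and the function names below are invented for illustration; the sketch also assumes sibling identifiers have equal length so that prefixes are unambiguous.

```python
# Hypothetical hierarchy: type name -> (parent name, local identifier bits).
# Sibling identifiers are assumed to be the same length and distinct.
HIERARCHY = {
    "Event": (None, "1"),
    "Sensor": ("Event", "01"),
    "Log": ("Event", "10"),
    "Temperature": ("Sensor", "10"),
    "Pressure": ("Sensor", "11"),
}


def type_code(name: str) -> str:
    """Build the full code: ancestors' bits first, then this type's own bits."""
    parent, bits = HIERARCHY[name]
    return bits if parent is None else type_code(parent) + bits


def is_subtype_code(code: str, ancestor_code: str) -> bool:
    """A type is a subtype (or the same type) when the ancestor's code is a prefix."""
    return code.startswith(ancestor_code)
```

A router subscribed to all `Sensor` events only needs to compare the first three bits of the code and can decide before the rest of the packet has arrived.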

Fast Classifying: Fixed Positions • Would like to avoid scanning through long or variable-length fields • Long/variable data needs to be in a separate channel/section • Primitives and fixed-length references at the front • References point into data section • Classifier can jump over large, uninteresting data quickly
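The layout described above might look roughly like the following: a fixed-size header holding the magic number, the type code, a primitive routing field, and a fixed-length (offset, length) reference, followed by a data section holding the variable-length body. The field choices, the magic value, and the function names are assumptions made for this sketch, not MEET's actual packet format.

```python
import struct

# Hypothetical header: magic (2 bytes), type code (2 bytes), priority (4 bytes),
# body offset (4 bytes), body length (4 bytes). "=" means native byte order
# with standard sizes and no padding.
HEADER = struct.Struct("=HHIII")
MAGIC = 0x4D45  # arbitrary value chosen for this sketch


def pack_event(type_code: int, priority: int, body: bytes) -> bytes:
    """Put fixed-size routing fields first and the variable-length body last."""
    offset = HEADER.size
    return HEADER.pack(MAGIC, type_code, priority, offset, len(body)) + body


def classify(packet: bytes):
    """Read only the fixed-position routing fields; the body is never scanned."""
    magic, type_code, priority, _offset, _length = HEADER.unpack_from(packet, 0)
    if magic != MAGIC:
        raise ValueError("not an event packet")
    return type_code, priority


def read_body(packet: bytes) -> bytes:
    """Follow the fixed-length reference into the data section."""
    _magic, _type, _prio, offset, length = HEADER.unpack_from(packet, 0)
    return packet[offset:offset + length]
```

Since `classify` needs only the first `HEADER.size` bytes, a router could make its decision as soon as the header has arrived, which is the precondition for cut-through routing.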

Plus: Schema Format • We’d like the schema format to be amenable to programmatic manipulation and analysis • For instance, when negotiating formats, we’d like to be able to compute how our original format offer differs from the counter-offer • XML Schema is pretty good for this
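To make "compute how our original format offer differs from the counter-offer" concrete, the sketch below treats a format as a flat mapping from field name to type name and reports added, removed, and retyped fields. A real XML Schema is far richer than a flat mapping; the representation and the function name `diff_formats` are simplifying assumptions.

```python
def diff_formats(offer: dict, counter: dict) -> dict:
    """Compare two formats given as {field name: type name} mappings."""
    added = {f: t for f, t in counter.items() if f not in offer}
    removed = {f: t for f, t in offer.items() if f not in counter}
    changed = {f: (offer[f], counter[f])
               for f in offer
               if f in counter and offer[f] != counter[f]}
    return {"added": added, "removed": removed, "changed": changed}


# Illustrative offer and counter-offer using XML Schema built-in type names.
offer = {"id": "xs:int", "reading": "xs:double", "note": "xs:string"}
counter = {"id": "xs:long", "reading": "xs:double", "site": "xs:string"}
difference = diff_formats(offer, counter)
```

A negotiating party could accept the counter-offer outright when all three parts of the difference are empty and otherwise inspect only the fields that changed.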

Conclusions • Efficient instance transfer is an interesting case for data-binding • Special needs for efficiency • But we can negotiate our own format among the communicating parties • Some explicit support for this in a general data-binding solution could help acceptance
