Engine Design: Stream Operators Everywhere
Theodore Johnson
AT&T Labs – Research
johnsont@research.att.com
Contributors: Chuck Cranor, Vladislav Shkapenyuk, Oliver Spatscheck

Early Data Reduction
• Goal: Query high-speed links using inexpensive off-the-shelf servers.
  • OC48: 2 x 2.4 Gb/sec, 7 million packets/sec.
  • OC192: 2 x 7.2 Gb/sec, 21 million packets/sec.
• Goal: Evaluate queries over every bit of every packet.
• Problem: Not enough cycles in a second.
  • 3 GHz / 21 Mpackets/sec = 142 cycles / packet
• Solution: Push data reduction operators as far down the protocol stack as possible.
  • Into the hardware if possible.
  • View hardware bit twiddling as stream operators.

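The cycle budget on this slide is a single division. A minimal sketch of that arithmetic, using the slide's 3 GHz clock and its packet rates (the function name is only for illustration):

```python
def cycles_per_packet(clock_hz: float, packets_per_sec: float) -> float:
    """Average CPU cycles available to process one packet."""
    return clock_hz / packets_per_sec


CLOCK_HZ = 3e9  # 3 GHz processor, as on the slide

oc48_budget = cycles_per_packet(CLOCK_HZ, 7e6)    # 7 million packets/sec
oc192_budget = cycles_per_packet(CLOCK_HZ, 21e6)  # 21 million packets/sec

print(f"OC48:  {oc48_budget:.1f} cycles/packet")   # 428.6
print(f"OC192: {oc192_budget:.1f} cycles/packet")  # 142.9
```

The OC192 result is about 142.9 cycles per packet, which the slide gives as 142. That budget must cover every query in the query set, which is why reduction has to happen before the packet reaches the host CPU.
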
Early Data Reduction in Gigascope
• Gigascope was designed to monitor very high speed (optical) links using complex query sets.
• Multiple levels of data reduction:
  • Data reduction in the NIC: depends on NIC capabilities
    • Snap length (projection)
    • BPF filters
    • Approximate filtering (bitmasks)
    • Data reduction queries (replace the NIC run time system)
  • Low level queries
    • Run queries on kernel input buffers
    • Preliminary filter for the query set
  • Other possibilities …

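To illustrate what "approximate filtering (bitmasks)" could look like, here is a minimal sketch of a mask-and-compare test on raw frame bytes. It is not Gigascope's NIC code. The function names are hypothetical, and the byte offset assumes an untagged Ethernet II frame carrying IPv4.

```python
def mask_match(packet: bytes, offset: int, mask: bytes, value: bytes) -> bool:
    """Return True if the masked bytes at `offset` equal the masked `value`."""
    window = packet[offset:offset + len(mask)]
    if len(window) < len(mask):
        return False  # packet too short to contain the field
    return all((b & m) == (v & m) for b, m, v in zip(window, mask, value))


# Assumption: untagged Ethernet II + IPv4. The IP protocol field is byte 9
# of the IP header, so byte 14 + 9 = 23 of the frame.
PROTO_OFFSET = 23
UDP = bytes([17])


def looks_like_udp(frame: bytes) -> bool:
    # Approximate: the EtherType is not checked, so a non-IPv4 frame with
    # 0x11 at this offset also passes. An exact filter must run later.
    return mask_match(frame, PROTO_OFFSET, b"\xff", UDP)


frame = bytearray(64)
frame[PROTO_OFFSET] = 17
print(looks_like_udp(bytes(frame)))
```

The filter admits false positives but, under the stated framing assumption, no false negatives. That is what makes it safe to run in front of an exact filter higher in the stack.
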
Example: Router Monitoring
[Figure: layered data path, top to bottom, with labels as recovered from the slide]
• High Level Queries
• Low Level Queries
  • Selection/projection/aggregation
  • Pre-filter
• Kernel: libpcap / BPF filters, circular buffer
• Network Interface Card
  • Snap length (projection)
  • Approximate filter (selection)
  • Selection/projection/aggregation queries (replace run time system)
• Router: select stream
• Network Tap

Stream Operators
• Problem: Great heterogeneity in the specifics of manipulating the hardware mechanism
  • Stream selection vs. NIC filters vs. kernel filters, etc.
  • Programmable NIC vs. bit-twiddling NIC vs. non-programmable NIC, etc.
• Solution:
  • Define a set of stream operators to evaluate the stream query.
    • Selection, projection, (partial) aggregation
    • Merge, join, reorder?
  • Define hardware capabilities as the types of queries they can execute
  • Multiple query optimization over the query set
    • Low level query nodes feed multiple user queries

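One way to read "define hardware capabilities as the types of queries they can execute" is as a capability table that a planner consults when deciding where each operator runs. The sketch below is a toy illustration of that idea only. The level names, the capability sets, and the `place` function are all hypothetical and are not taken from Gigascope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    name: str
    supports: frozenset  # operator kinds this level can execute


# Hypothetical capability table, lowest level of the stack first.
LEVELS = [
    Level("nic", frozenset({"projection", "selection"})),
    Level("low_level", frozenset({"projection", "selection",
                                  "partial_aggregation"})),
    Level("high_level", frozenset({"projection", "selection",
                                   "partial_aggregation", "aggregation",
                                   "merge", "join"})),
]


def place(operators, levels=LEVELS):
    """Assign each operator, in pipeline order, to the lowest level that can
    run it, without moving back down once an operator was placed higher."""
    placement = []
    floor = 0  # index of the lowest level still allowed
    for op in operators:
        for i in range(floor, len(levels)):
            if op in levels[i].supports:
                floor = i
                placement.append((op, levels[i].name))
                break
        else:
            raise ValueError(f"no level supports {op!r}")
    return placement


plan = place(["projection", "selection", "partial_aggregation", "aggregation"])
for op, level in plan:
    print(f"{op:20s} -> {level}")
```

Swapping in a different NIC then means changing one row of the table, not the queries. A real planner would also have to share low-level nodes across the whole query set, which this sketch does not attempt.
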
Example (network monitoring)
  select timestamp, sourceIP, destIP, source_port, dest_port, len, total_length, gp_header
  from GAMEPROTOCOL
  where sample_hash[50, sourceIP, destIP] and protocol=17 and offset=0
• NIC: snap_len = 134 (projection)
• Pre-filter: protocol=17 and offset=0
• Low-level query:
  select timestamp, sourceIP, destIP, source_port, dest_port, len, total_length, gp_header
  from GAMEPROTOCOL
  where sample_hash[50, sourceIP, destIP] and protocol=17 and offset=0

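The three stages of this example can be mimicked in a few lines of Python to show how each one shrinks the data. This is a sketch under stated assumptions, not Gigascope code. Packets are modeled as dictionaries of already-parsed fields. The slides do not define the semantics of `sample_hash`, so the version here (keep a packet when a hash of the listed fields, modulo 100, falls below the first argument) is an assumption made purely for illustration.

```python
import zlib

SNAP_LEN = 134  # bytes kept by the NIC, from the slide

FIELDS = ("timestamp", "sourceIP", "destIP", "source_port", "dest_port",
          "len", "total_length", "gp_header")


def snap(raw: bytes, snap_len: int = SNAP_LEN) -> bytes:
    """NIC-level projection: keep only the first snap_len bytes."""
    return raw[:snap_len]


def prefilter(pkt: dict) -> bool:
    """Cheap predicate shared by the query set: UDP, first fragment."""
    return pkt["protocol"] == 17 and pkt["offset"] == 0


def sample_hash(percent: int, *fields) -> bool:
    """ASSUMED semantics: deterministic hash-based sampling on the fields."""
    key = "|".join(str(f) for f in fields).encode()
    return zlib.crc32(key) % 100 < percent


def low_level_query(pkts, percent: int = 50):
    """Selection plus projection, as in the low-level query on the slide."""
    for pkt in pkts:
        if prefilter(pkt) and sample_hash(percent, pkt["sourceIP"], pkt["destIP"]):
            yield {name: pkt[name] for name in FIELDS}
```

Because the sampling decision depends only on the address pair, every packet between the same two hosts is either kept or dropped together, so sampled conversations stay complete.
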
Other Operators?
• Merge: Some NICs deliver packets out of order …
• Optical links are not duplex
[Figure: two NICs, each with an in buffer and an out buffer, emit timestamped, almost ordered streams; Stream Merge combines them into an ordered stream.]

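A merge operator over almost ordered inputs can be sketched with a bounded reorder buffer per input followed by a timestamp merge. The code below assumes one capture stream per NIC, as the two-NIC figure suggests, and assumes no packet arrives more than `window` positions late. The function names and the window size are illustrative choices, not anything specified in the slides.

```python
import heapq


def reorder(stream, window: int):
    """Sort an almost ordered stream with a bounded min-heap.

    Correct as long as no item is more than `window` positions out of place.
    """
    heap = []
    for item in stream:
        heapq.heappush(heap, item)
        if len(heap) > window:
            yield heapq.heappop(heap)
    while heap:
        yield heapq.heappop(heap)


def stream_merge(streams, window: int = 1):
    """Merge several almost ordered streams into one ordered stream."""
    return heapq.merge(*(reorder(s, window) for s in streams))


# Items are (timestamp, source) tuples; nic_a delivers one packet late.
nic_a = [(1, "a"), (3, "a"), (2, "a"), (5, "a")]
nic_b = [(2, "b"), (4, "b"), (6, "b")]

merged = list(stream_merge([nic_a, nic_b], window=1))
print([ts for ts, _ in merged])
```

Both stages are generators, so memory use stays bounded by the window size per input and does not grow with the length of the stream.
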
Summary
• Early data reduction is critical for monitoring very high-speed streams
  • Selection, projection, aggregation.
• Use stream operators to mask the complexity and heterogeneity of hardware / kernel data reduction.
• Issues:
  • Multiple query optimization
  • Push more complex operators down the stack?
    • Join? Stratified sampling? Sketches?
  • Optimization at low level / hardware level
    • Approximate filters
    • Avoid duplicate filters. Where to place them?
  • Re-organization when the query set changes.