This project aims to make continuous queries over endless data streams more efficient by extending Eddies so that a single eddy serves many queries, sharing common work between modules and reducing the memory burden. It also addresses intra-query scheduling. Joins are not the focus, since endless streams need new operators.
Eddies for Continuous Queries Sam Madden CS286 Project S01
Motivation • Want many queries over continuous streams of data • Current Eddies • Thread per query • Scanner per source • Share common work between modules • Reduce memory burden • Intra-query scheduling • (Not focusing on joins – Need new operators to deal with endless streams)
Data Structures • One Eddy per Telegraph instance • Only one source module for each source (over all queries) • One Filter per source field (over all queries) • Per-Source State • Source -> Reachable modules • Query -> Completion bitmask • Per-Tuple State • Output query mask • Per-Query State • Output queues • Aggregate information
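The state listed above can be sketched roughly as follows. This is a minimal illustration, not the Telegraph implementation: the class names, the `done` bitmask, and the helper `ready_queries` are all assumptions introduced here, and the slide itself only names the reachability map, the completion bitmask, and the output query mask.

```python
from dataclasses import dataclass

@dataclass
class SourceState:
    reachable: list   # modules a tuple from this source may visit
    completion: dict  # query id -> bitmask of filters that query requires

@dataclass
class TupleState:
    source_id: str
    values: dict
    done: int = 0         # assumed: bit i set once filter i has processed the tuple
    output_mask: int = 0  # bit q set once query q is finished with the tuple

def ready_queries(src, tup):
    """Queries whose required filters have all run and that are not yet marked."""
    return [q for q, need in src.completion.items()
            if (tup.done & need) == need and not (tup.output_mask >> q) & 1]
```

Under these assumptions, a query is marked in `output_mask` either when the tuple is delivered to it or when one of its predicates rejects the tuple, so one integer per tuple is enough to track every query.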
Tuple Flow • Tuple Arrives • Tagged with source id • Routing policy chooses a filter to route to, based on modules reachable from source • Filter marks query state as “output” for tuples which don’t pass • Tuple output to queries which have completed, using source • If more filters to check, tuple re-inserted into eddy • Works for Joins Too (Somewhat Inefficiently?) • Extend reachability graph across joins • Project out unused sources when tuples are output
Combining Filters • Given a Filter F over some field S.a, with n predicates generalized to be over ranges [a,b] (plus not-equals) • Interval tree for >, >=, <, <= predicates, inserting the interval (a, ∞), [a, ∞), (-∞, b), or (-∞, b]. (O(log n)) • When a tuple arrives, find the intervals which it intersects. (O(n)) • For = and ≠, use a hash table • For ≠, output all tuples except those in table • Saves routing, tuple parsing cost • Simplifies optimization space
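A combined filter along these lines can be sketched as below. Note the substitution: since every range predicate on the slide is one-sided, this sketch uses sorted lists with `bisect` instead of an interval tree, which gives the same answers for this case. The class name `GroupedFilter`, its methods, and the restriction to one predicate per query on the field are assumptions made for illustration.

```python
import bisect
from collections import defaultdict

class GroupedFilter:
    """All predicates over one field, evaluated together for every query."""

    def __init__(self):
        # For each range operator: parallel sorted constants and query ids.
        self.ranges = {op: ([], []) for op in (">", ">=", "<", "<=")}
        self.eq = defaultdict(set)  # constant -> queries with "= constant"
        self.ne = defaultdict(set)  # constant -> queries with "!= constant"
        self.ne_all = set()         # every query that has a "!=" predicate

    def add(self, query, op, const):
        if op in self.ranges:
            keys, qs = self.ranges[op]
            i = bisect.bisect_right(keys, const)
            keys.insert(i, const)
            qs.insert(i, query)
        elif op == "=":
            self.eq[const].add(query)
        elif op == "!=":
            self.ne[const].add(query)
            self.ne_all.add(query)
        else:
            raise ValueError(f"unsupported operator: {op}")

    def matches(self, value):
        """Return the set of queries whose predicate on this field passes."""
        passed = set()
        keys, qs = self.ranges[">"]
        passed.update(qs[:bisect.bisect_left(keys, value)])   # constants < value
        keys, qs = self.ranges[">="]
        passed.update(qs[:bisect.bisect_right(keys, value)])  # constants <= value
        keys, qs = self.ranges["<"]
        passed.update(qs[bisect.bisect_right(keys, value):])  # constants > value
        keys, qs = self.ranges["<="]
        passed.update(qs[bisect.bisect_left(keys, value):])   # constants >= value
        passed |= self.eq.get(value, set())
        passed |= self.ne_all - self.ne.get(value, set())     # all except the table hit
        return passed
```

The field value is read once and each lookup costs a binary search plus the size of the answer, so the tuple is routed to one module instead of one module per predicate.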
Routing Policy • Random policy routes to each module with equal probability • Ticket policy: from Eddy paper • Route to modules with highest selectivity • Estimate selectivity based on ratio of in/out tuples • Use back-pressure to adjust delivery rates • Multi-query Ticket policy • Estimate selectivity based on ratio of (number of applied predicates / number of passed predicates) • Based on Shankar’s implementation: back-pressure not applied properly
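A lottery-style ticket router in the spirit of these bullets could be sketched as follows. The class `TicketRouter` and the exact crediting rule (tickets grow by applied minus passed predicates) are assumptions chosen for illustration, not the slide's formula; with one predicate per module the rule reduces to crediting a ticket per tuple routed in and debiting one per tuple returned. Back-pressure is not modeled.

```python
import random

class TicketRouter:
    def __init__(self, modules, rng=None):
        self.tickets = {m: 0 for m in modules}
        self.rng = rng or random.Random()

    def choose(self, eligible):
        """Hold a lottery among eligible modules, weighted by tickets held."""
        # +1 keeps modules with no tickets in the lottery so estimates can adapt.
        weights = [self.tickets[m] + 1 for m in eligible]
        return self.rng.choices(eligible, weights=weights, k=1)[0]

    def record(self, module, applied, passed):
        """Credit the module for every predicate that rejected the tuple."""
        self.tickets[module] += applied - passed
```

For example, a filter that applies four predicates per tuple and passes only one gains three tickets per tuple, so it is chosen earlier and more often than a filter that passes everything.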
Preliminary Results • Simple, four-query test:
from s select s.index where s.a > 30
from s select s.index where s.b > 30 and s.a > 30
from s select s.index where s.c > 30 and s.b > 30 and s.a > 30
from s select s.index where s.d > 30 and s.c > 30 and s.b > 30 and s.a > 30
• Becomes five modules: one scanner and four filters
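The count of five modules follows from grouping the ten predicates by field. A small sketch of that grouping, where the `QUERIES` encoding and the function `build_modules` are hypothetical names used only for this illustration:

```python
from collections import defaultdict

# The four test queries as lists of (field, operator, constant) predicates.
QUERIES = {
    1: [("a", ">", 30)],
    2: [("b", ">", 30), ("a", ">", 30)],
    3: [("c", ">", 30), ("b", ">", 30), ("a", ">", 30)],
    4: [("d", ">", 30), ("c", ">", 30), ("b", ">", 30), ("a", ">", 30)],
}

def build_modules(queries, source="s"):
    """One scanner for the source plus one filter per referenced field."""
    filters = defaultdict(list)
    for qid, preds in queries.items():
        for field, op, const in preds:
            filters[field].append((qid, op, const))
    modules = [f"scan({source})"]
    modules += [f"filter({source}.{f})" for f in sorted(filters)]
    return modules, dict(filters)
```

Here the filter on `s.a` carries predicates for all four queries, while the filter on `s.d` carries one.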