Operator Placement for In-Network Stream Query Processing
Problem Setup
• Figure: hierarchy of nodes with data acquisition at the bottom and query results at the top; computational power and bandwidth increase toward the top
PODS '05
Examples
• High Fan-In Environment
  • HiFi Project – Berkeley
• Supply Chain Management
• Network Monitoring
  • Limited processing at routers
Query Processing
• Simple approach
  • Transmit all data to root
  • Process query at root
• Often highly wasteful of network bandwidth
  • Aggregate queries
  • Filter queries
  • Low-selectivity joins
• In-Network Query Processing: push some query processing tasks to nodes lower in the hierarchy
Computation – Communication Tradeoff
• Pushing query processing to lower nodes
  • Reduces network transmission
  • More computation on weaker devices
• When computation cost << communication cost
  • Push computation as far down as possible
  • E.g., aggregation, cheap filters
• What about expensive operators?
Example Scenario: Video Surveillance
• Sensors gather images
• User looking for “suspicious” images, e.g.,
  • Monitored area is dimly lit (filter F1)
  • Lots of motion between successive frames (filter F2)
• F1 relatively cheap, F2 expensive
Video Surveillance (Contd.)
• Figure: image acquisition (I) at the bottom; both filters run at the node above it (F1 ∧ F2); the result is F1 ∧ F2(I)
Video Surveillance (Contd.)
• Figure: image acquisition (I) at the bottom; F1 runs at the lower node and passes F1(I) upward; F2 runs at the upper node; the result is F1 ∧ F2(I)
Problem Statement
• S: stream of data acquired at leaf nodes
• Conjunctive filters query: SELECT * FROM S WHERE F1 ∧ F2 ∧ … ∧ Fn
• Choice: decide at which level to execute each filter (no cost constraints at nodes)
• Objective: minimize the total cost of computation and communication
Outline
• Notation and Cost Model
• Optimal Algorithm
• Extension to Handle Joins
• Conclusions and Future Work
Conversion to Linear Topology
In-Network Query Plan
• In-network query plan: a mapping of each of F1, F2, …, Fn to one of N1, N2, …, Nm
• A total of m^n plans are possible
• Choose the one with the least cost
• Figure: linear chain N1, N2, …, Nm; data acquisition (stream) enters at N1, results leave at Nm
Cost Model 1. Filter Selectivity
• Filter F has selectivity s(F)
  • Expected fraction of tuples that satisfy F
• For this talk, assume filters are independent
• Technique extends to correlated filters; see paper
Cost Model 2. Filter Cost
• Filter F has per-tuple execution cost c(F, i) on node N_i
• Cost scales down by a factor γ_i on moving from N_i to N_{i+1} ⇒ c(F, i+1) = γ_i · c(F, i)
• Figure: chain N1, N2, …, Nm above the data acquisition (stream), annotated with costs c(F,1), c(F,2), …, c(F,m) and cost scaledowns γ_1, …, γ_{m-1} between consecutive nodes; results leave at Nm
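The scaledown rule compounds along the chain, so a filter's cost on any node follows from its cost on N1. A minimal sketch, assuming the scaledown factors are written gamma_1, …, gamma_{m-1} (a notational choice for this sketch) and using the numbers from the example later in the talk; the function name is hypothetical.

```python
from fractions import Fraction
from math import prod

def cost_at_node(cost_on_n1, gammas, i):
    """Per-tuple cost of a filter on node N_i (1-based), given its cost on N_1.

    gammas[j-1] is the assumed scaledown factor for the hop N_j -> N_{j+1},
    so c(F, i) = c(F, 1) * gamma_1 * ... * gamma_{i-1}.
    """
    return cost_on_n1 * prod(gammas[: i - 1], start=Fraction(1))

# Scaledowns taken from the example slide: 1/5, 1/2, 1/4.
gammas = [Fraction(1, 5), Fraction(1, 2), Fraction(1, 4)]
print(cost_at_node(1300, gammas, 2))  # 260
```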
Cost Model 3. Network Transmission Cost
• Per-tuple transmission cost from N_i to N_{i+1} is L_i
• Assume L_i is scaled to be comparable to processing cost
• Figure: chain N1, N2, …, Nm above the data acquisition (stream), annotated with transmission costs L_1, …, L_{m-1} between consecutive nodes; results leave at Nm
Centralized Filter Ordering [HS93]
• The optimal way to execute a set of filters at one node is in increasing order of rank
• Assume filters at any node are executed in rank order
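The slide does not spell out the rank formula. The rank values in the example table later in the talk are consistent with rank(F) = c(F) / (1 − s(F)), so the sketch below assumes that definition; the function name is hypothetical. It orders the four example filters (all with selectivity 1/2) by rank.

```python
from fractions import Fraction

def rank(cost, selectivity):
    """Assumed rank: per-tuple cost divided by the fraction of tuples dropped."""
    return cost / (1 - selectivity)

half = Fraction(1, 2)
costs = {"F3": 1300, "F1": 200, "F4": 2500, "F2": 400}  # c(F, 1) from the example

# Execute in increasing order of rank.
order = sorted(costs, key=lambda name: rank(costs[name], half))
print(order)  # ['F1', 'F2', 'F3', 'F4']
```

The formula requires selectivity strictly below 1; a filter that drops nothing would have an undefined rank under this definition.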
Example Cost Calculation
• All filters have selectivity 1/2
• N1 runs F1 (cost 200) and F2 (cost 400); N2 runs F3 (cost 1300); γ_1 = 1/5, L_1 = 700
• Total = 200 + (1/2) · 400 + (1/4) · 700 + (1/4) · (1/5) · 1300 = 640
• Want to minimize total cost; no cost constraints at nodes
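The arithmetic above can be reproduced with a short routine that walks up the chain, tracking the surviving fraction of tuples and the accumulated cost scaledown. This is a sketch of the cost model as described on the slides; the function name and data layout are assumptions.

```python
from fractions import Fraction as Fr

def plan_cost(plan, gammas, links):
    """Expected per-tuple cost of an in-network plan.

    plan[i]   : filters run on node N_{i+1}, as (cost_on_N1, selectivity) in execution order
    gammas[i] : assumed cost scaledown for the hop N_{i+1} -> N_{i+2}
    links[i]  : per-tuple transmission cost for the same hop
    """
    total = Fr(0)
    surviving = Fr(1)  # expected fraction of tuples not yet filtered out
    scale = Fr(1)      # product of scaledowns below the current node
    for i, node_filters in enumerate(plan):
        for cost, selectivity in node_filters:
            total += surviving * scale * cost
            surviving *= selectivity
        if i < len(plan) - 1:  # ship the surviving tuples to the next node
            total += surviving * links[i]
            scale *= gammas[i]
    return total

half = Fr(1, 2)
plan = [[(200, half), (400, half)],  # N1: F1, F2
        [(1300, half)]]              # N2: F3
print(plan_cost(plan, gammas=[Fr(1, 5)], links=[700]))  # 640
```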
Optimal Algorithm
• Idea: set up an equivalence between the cost of an in-network query plan P and the cost of executing a sequence of filters
• Model the network link from N_i to N_{i+1} as a filter F_{n,i} such that s(F_{n,i}) = γ_i and c(F_{n,i}, i) = L_i
• ℱ_i = set of filters executed at N_i; r(ℱ_i) = the filters of ℱ_i in rank order; ∘ = concatenation
• Lemma: c(P) = c( r(ℱ_1) ∘ F_{n,1} ∘ r(ℱ_2) ∘ … ∘ F_{n,m-1} ∘ r(ℱ_m), 1 )
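The lemma can be checked on the earlier two-node example. The sketch below assumes that a link's cost is converted to node-1 terms by dividing L_i by the product of the scaledowns below it, which is one reading of c(F_{n,i}, i) = L_i combined with the scaledown rule; it matches the link costs listed in the example table later in the talk. Function names are hypothetical.

```python
from fractions import Fraction as Fr

def sequence_cost(seq):
    """Expected per-tuple cost of running (cost, selectivity) filters in order on one node."""
    total, surviving = Fr(0), Fr(1)
    for cost, selectivity in seq:
        total += surviving * cost
        surviving *= selectivity
    return total

def link_filter(i, gammas, links):
    """Pseudo-filter for the hop N_i -> N_{i+1} (1-based), costed as if run on N_1."""
    scale = Fr(1)
    for g in gammas[: i - 1]:
        scale *= g
    # Selectivity gamma_i makes everything after the link cheaper by that factor.
    return (links[i - 1] / scale, gammas[i - 1])

half = Fr(1, 2)
gammas, links = [Fr(1, 5), half, Fr(1, 4)], [700, 500, 300]

# r(F_1) = F1, F2 on N1; then the link N1 -> N2; then r(F_2) = F3 on N2.
seq = [(200, half), (400, half), link_filter(1, gammas, links), (1300, half)]
print(sequence_cost(seq))  # 640
```

The result equals the total computed directly from the plan in the example cost calculation, which is what the lemma states.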
Optimal Algorithm (Contd.)
• c(P) = c( r(ℱ_1) ∘ F_{n,1} ∘ r(ℱ_2) ∘ … ∘ F_{n,m-1} ∘ r(ℱ_m), 1 )
• Suppose rank(F_{n,i-1}) < rank(F_{n,i}) ∀ i
• Choose the set ℱ_i of filters run at N_i as ℱ_i = { F | rank(F_{n,i-1}) < rank(F) < rank(F_{n,i}) }
• Theorem: the above choice of ℱ_i is optimal (minimizes c(P))
• Proof: any other plan P′ has c(P′) = c( r(ℱ_1′) ∘ F_{n,1} ∘ r(ℱ_2′) ∘ … ∘ F_{n,m-1} ∘ r(ℱ_m′), 1 ); the sequence for the chosen P is rank-ordered as a whole, and a rank-ordered sequence is optimal
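Short-Circuiting
• What if rank(F_{n,i-1}) > rank(F_{n,i})? ⇒ ℱ_i = ∅ (no filter is placed at N_i)
• Suppose some filter F were placed at N_i, i.e. between F_{n,i-1} and F_{n,i} in the sequence
  • Case 1: rank(F) < rank(F_{n,i-1}), so F is out of rank order with the link below it
  • Case 2: rank(F) > rank(F_{n,i-1}) ⇒ rank(F) > rank(F_{n,i}), so F is out of rank order with the link above it
• Either way F does not belong at N_i ⇒ ℱ_i = ∅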
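Short-Circuiting (Contd.)
• ℱ_i = { F | rank(F_{n,i-1}) < rank(F) < rank(F_{n,i}) }
• Figure: N_i is bypassed. The links N_{i-1} → N_i (cost L_{i-1}, scaledown γ_{i-1}) and N_i → N_{i+1} (cost L_i, scaledown γ_i) are replaced by one link N_{i-1} → N_{i+1} with cost L_{i-1} + L_i and scaledown γ_{i-1} · γ_i
• Continue short-circuiting until rank(F_{n,i-1}) < rank(F_{n,i}) ∀ i
• Complexity: O( (n+m) log(n+m) )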
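The two steps above (short-circuit links until their ranks increase, then slot each filter between the links by rank) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it assumes rank = cost / (1 − selectivity), costs links in node-1 terms, requires all selectivities and scaledowns to be strictly below 1, and uses a hypothetical function name and data layout. Ties between a filter and a link are resolved toward the lower node.

```python
from bisect import bisect_left
from fractions import Fraction as Fr

def rank(cost, selectivity):
    return cost / (1 - selectivity)

def place_filters(filters, gammas, links):
    """filters: {name: (cost_on_N1, selectivity)}. Returns {name: node number (1-based)}."""
    groups = []  # merged links as [cost_on_N1, selectivity, upper_node]
    scale = Fr(1)
    for i, (g, l) in enumerate(zip(gammas, links), start=1):
        groups.append([l / scale, Fr(g), i + 1])
        scale *= g
        # Short-circuit while a link outranks the link above it.
        while len(groups) >= 2 and rank(*groups[-2][:2]) > rank(*groups[-1][:2]):
            c2, s2, top = groups.pop()
            c1, s1, _ = groups.pop()
            groups.append([c1 + s1 * c2, s1 * s2, top])
    link_ranks = [rank(c, s) for c, s, _ in groups]  # non-decreasing by construction
    placement = {}
    for name, (c, s) in filters.items():
        k = bisect_left(link_ranks, rank(c, s))  # links with strictly smaller rank
        placement[name] = 1 if k == 0 else groups[k - 1][2]
    return placement

half = Fr(1, 2)
filters = {"F1": (200, half), "F2": (400, half), "F3": (1300, half), "F4": (2500, half)}
print(place_filters(filters, [Fr(1, 5), half, Fr(1, 4)], [700, 500, 300]))
# {'F1': 1, 'F2': 1, 'F3': 2, 'F4': 4}
```

The merge step uses the sequence-cost identity for two consecutive pseudo-filters, which in node-local terms corresponds to the L_{i-1} + L_i and γ_{i-1} · γ_i of the figure.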
Example
• All filters have selectivity 1/2
• Filters:
  F     c(F,1)   rank
  F1      200     400
  F2      400     800
  F3     1300    2600
  F4     2500    5000
• Network (stream enters at N1):
  N1 → N2: L_1 = 700, γ_1 = 1/5
  N2 → N3: L_2 = 500, γ_2 = 1/2
  N3 → N4: L_3 = 300, γ_3 = 1/4
Example (Contd.)
• Links modeled as filters:
  F         s(F)   c(F,1)   rank
  F_{n,1}   1/5      700     875
  F_{n,2}   1/2     2500    5000
  F_{n,3}   1/4     3000    4000
• rank(F_{n,2}) > rank(F_{n,3}), so N3 is short-circuited: the merged link N2 → N4 has L_2 = 800 and γ_2 = 1/8
  F         s(F)   c(F,1)   rank
  F_{n,2}   1/8     4000    4571
• Resulting plan: N1 | F1, F2; N2 | F3; N4 | F4
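For an instance this small, the resulting plan can be cross-checked by brute force over all m^n = 4^4 = 256 plans. The sketch below assumes the cost model of the earlier slides, with rank = cost / (1 − selectivity) used to order filters within a node; all names are hypothetical.

```python
from fractions import Fraction as Fr
from itertools import product

HALF = Fr(1, 2)
FILTERS = {"F1": (200, HALF), "F2": (400, HALF), "F3": (1300, HALF), "F4": (2500, HALF)}
GAMMAS = [Fr(1, 5), Fr(1, 2), Fr(1, 4)]  # assumed scaledown per hop
LINKS = [700, 500, 300]                  # transmission cost per hop

def plan_cost(assignment):
    """Expected per-tuple cost of a plan given as {filter name: node number}."""
    total, surviving, scale = Fr(0), Fr(1), Fr(1)
    for node in range(1, len(LINKS) + 2):
        # Filters on the same node run in increasing rank order.
        here = sorted((FILTERS[f] for f, n in assignment.items() if n == node),
                      key=lambda cs: cs[0] / (1 - cs[1]))
        for cost, selectivity in here:
            total += surviving * scale * cost
            surviving *= selectivity
        if node <= len(LINKS):
            total += surviving * LINKS[node - 1]
            scale *= GAMMAS[node - 1]
    return total

names = list(FILTERS)
nodes = range(1, len(LINKS) + 2)
plans = (dict(zip(names, choice)) for choice in product(nodes, repeat=len(names)))
best = min(plans, key=plan_cost)
print(best, float(plan_cost(best)))
```

Exhaustive search is only practical for tiny instances; it is shown here purely as a sanity check on the rank-based placement.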
Extension to Handle Joins
• Query predicate: WHERE F(S1) AND G(S2)
• r_i: rate of stream S_i
• Rate of output of join = f · r_1 · r_2
• Cost of join = a · r_1 + b · r_2 + c · r_1 · r_2
• Decide placement of join operator and filters
• Theorem: filters in F (and G) must be executed in rank order
• Figure: chain N1, N2, …, Nm; streams S1 and S2 enter at N1, results leave at Nm
Extension to Handle Joins: Algorithm
• Guess
  • Position of join in r(F) and r(G)
  • Node at which the join shall be executed
• For each guess, place filters optimally; calculate cost
• Choose the least-cost plan
• Complexity: O( |F| · |G| · m · (|F|+|G|+m) log(|F|+|G|+m) )
• Future work: extension to join trees
Related Work
• Distributed Stream Processing
  • Borealis, HourGlass, IrisNet, NiagaraCQ
• In-Network Processing for Aggregate Queries
• Acquisitional Query Processing
• Expensive Filters in Relational Database Queries
Conclusions
• In-network query processing for environments where
  • Data is collected at low-power nodes
  • Data is transmitted through a hierarchy of nodes with progressively increasing power and bandwidth
• Optimal operator placement for
  • Independent filters
  • Correlated filters (4-approximation)
  • Multiway joins
• Considered simplest model; some interesting variations
  • Cost-constrained nodes (NP-hard)
  • Load balancing across nodes (NP-hard)
  • Non-uniform changes in filter costs