Processing Continuous Join Queries in Sensor Networks: A Filtering Approach by Mirco Stern, Klemens Böhm, Erik Buchmann

Processing Continuous Join Queries in Sensor Networks: A Filtering Approach by Mirco Stern, Klemens Böhm, Erik Buchmann SIGMOD Conference 2010: 267-278 Presenter: Bryan Guthrie

Wireless Sensor Networks • Consist of battery-operated nodes equipped with sensors • Constrained communication/computation capabilities

WSN Query Processing • Abstract the network into a relation, with one tuple per node and attributes representing the sensors of the node • Previous work: processing selections and projections well understood, but less attention to joins (until recently)

Continuous queries • Report current sensor readings periodically • Joins allow us to combine data from different nodes • Applications: monitoring and surveillance

Example query • Acquire data from nodes observing similar temperature and humidity conditions
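To make the example query concrete, the sketch below evaluates such a join at the base station over one tuple per node. The thresholds, the field names, and the sample readings are assumptions chosen for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Reading:
    node_id: int
    temp: float      # degrees Celsius
    humidity: float  # percent relative humidity


# Hypothetical thresholds defining "similar" conditions.
MAX_TEMP_DIFF = 0.3
MAX_HUM_DIFF = 2.0


def similar(a: Reading, b: Reading) -> bool:
    """Join predicate: both attributes must lie within their thresholds."""
    return (abs(a.temp - b.temp) < MAX_TEMP_DIFF
            and abs(a.humidity - b.humidity) < MAX_HUM_DIFF)


def self_join(readings):
    """Return pairs of node ids whose current readings join."""
    return [(a.node_id, b.node_id)
            for a, b in combinations(readings, 2)
            if similar(a, b)]


readings = [
    Reading(1, 22.0, 40.0),
    Reading(2, 22.2, 41.0),
    Reading(3, 25.0, 40.5),
]
pairs = self_join(readings)
```

In a continuous query this evaluation would repeat once per execution with fresh readings.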

Goal: Minimize Energy Costs • Sensing and communication costs dominate other areas – thus, goal is to minimize communications • IDEAL: Each node discards non-joining tuples, then sends remaining tuples to base station (where computation occurs) • Infeasible because each node would need to know if its tuple joins - expensive!

Prior work • Precompute the set of tuples that join • Not optimal for continuous queries, because of the cost of updating this set prior to each execution

Continuous Join Filtering • Maintain filters at nodes • Discard tuples whose attribute value is within filter interval, send the rest • Filter size needs to be optimized for efficiency

Maintaining filters • Base station continuously computes filters that minimize communication costs • For each execution, the base station decides which filters to update • Updates require sending them to nodes, so small changes may not pay off

Some filter definitions • A filter is a multidimensional interval [ai, bi] • Node j's filter is filterj • If the attribute values of node j are within the interval of filterj in all dimensions of the filter, then j does not send its data
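The node-side rule above can be sketched as follows. The class name `Filter`, the dictionary layout, and the choice of closed intervals are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Filter:
    # One (low, high) interval per filtered attribute,
    # e.g. {"temp": (22.0, 23.0), "humidity": (40.0, 45.0)}.
    bounds: dict

    def contains(self, values: dict) -> bool:
        """True if the tuple lies inside the interval in every dimension."""
        return all(lo <= values[attr] <= hi
                   for attr, (lo, hi) in self.bounds.items())


def should_send(filter_j: Filter, tuple_j: dict) -> bool:
    """Node j suppresses its tuple only when the filter covers it."""
    return not filter_j.contains(tuple_j)


filter_j = Filter({"temp": (22.0, 23.0), "humidity": (40.0, 45.0)})
inside = {"temp": 22.5, "humidity": 42.0}
outside = {"temp": 23.4, "humidity": 42.0}
```

Here `should_send(filter_j, inside)` is false, so the tuple is suppressed, while `outside` leaves the temperature interval and is sent.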

Ensuring correctness • If node j has filtered its tuple tj, note that this means tj must be within filterj • Base station can check if any values in filterj would join with data from other nodes, and if so retrieves tj from node j (ensuring correctness)

Example • Say node j's filter is [22ºC, 23ºC] and the join condition is, for instance, |A.temp – B.temp| < 0.3ºC • Node h sends a tuple with temperature value 24ºC → cannot join with j's tuple • Node h sends a tuple with temperature value 23.1ºC → could join with j's tuple, so we need to retrieve it
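The base-station check in this example can be sketched as below, assuming a band-join condition |A.temp – B.temp| < eps with eps = 0.3ºC (the condition used in the later update example). The function name `may_join` is hypothetical.

```python
def may_join(filter_interval, received_value, eps):
    """True if some value inside the filter could join with received_value.

    Assumes the join condition |a - b| < eps on a single attribute.
    """
    lo, hi = filter_interval
    # Distance from the received value to the nearest point of the interval.
    if received_value < lo:
        gap = lo - received_value
    elif received_value > hi:
        gap = received_value - hi
    else:
        gap = 0.0
    return gap < eps


filter_j = (22.0, 23.0)
needs_retrieval_24 = may_join(filter_j, 24.0, eps=0.3)
needs_retrieval_23_1 = may_join(filter_j, 23.1, eps=0.3)
```

When the check is true, the base station cannot rule out a join and has to retrieve the filtered tuple from node j.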

Filter size • How big should filters be? • Not so small that unneeded tuples are sent, not so large that needed tuples aren't sent • Avoid collisions between filters, which happen when some of the values in the filters join • Optimal filter size is not uniform across nodes • Smaller filters if there are more potential join partners for a node

Optimizing filters • Goal: minimize communication costs for next query execution • Therefore, we want to find the filter size that minimizes communications • If there are several minima, pick the one that has the smallest filter size (less risk of collisions) • Continuously updated based on previous filter size and projected sensor readings
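The tie-breaking rule above (among several minima, take the smallest filter) can be sketched as follows. The per-size cost estimates are hypothetical inputs; the paper's cost model is not reproduced here.

```python
def pick_filter_size(expected_cost):
    """Choose a filter size from candidate sizes.

    expected_cost maps a candidate filter size to its estimated
    communication cost for the next execution (hypothetical input).
    Among all sizes with minimal cost, the smallest size is returned.
    """
    best = min(expected_cost.values())
    return min(size for size, cost in expected_cost.items() if cost == best)


# Two candidates tie at the minimal cost of 3.0.
candidates = {0.2: 5.0, 0.5: 3.0, 1.0: 3.0, 2.0: 4.5}
chosen = pick_filter_size(candidates)
```

With these numbers the sizes 0.5 and 1.0 tie, and the smaller one is chosen because it carries less risk of collisions.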

Predicting measurements • CJF is not tied to any particular model for predicting measurements • For evaluation purposes, the authors used a linear regression model • Known to work well with sensor data sets • No need to fit model to data • Low maintenance costs

Updating filters • Redistributing new filter sizes every execution will cost more than it saves • Therefore, only send updates if the expected savings outweigh update costs • This can't be done in isolation; whether filterj is updated or not affects all nodes depending on filterj

Example • filterj = [22ºC, 23ºC], filterh = [23.5ºC, 23.9ºC] • filterj wants to shrink to [22ºC, 22.7ºC], filterh wants to grow to [23.1ºC, 23.9ºC] • Assuming the join condition |A.temp – B.temp| < 0.3ºC, h can't update unless j updates, because the old filterj and the new filterh can contain joining tuples • j is a blocking node, h is a dependent node
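The dependency in this example can be checked with a small collision test between two one-dimensional filters under the join condition |A.temp – B.temp| < 0.3ºC. The function name `filters_collide` is hypothetical.

```python
def filters_collide(f_a, f_b, eps):
    """True if some value in f_a could join with some value in f_b.

    Assumes the join condition |a - b| < eps on a single attribute.
    """
    (a_lo, a_hi), (b_lo, b_hi) = f_a, f_b
    # Smallest distance between the two intervals (0 if they overlap).
    gap = max(b_lo - a_hi, a_lo - b_hi, 0.0)
    return gap < eps


EPS = 0.3
old_j, new_j = (22.0, 23.0), (22.0, 22.7)
old_h, new_h = (23.5, 23.9), (23.1, 23.9)

before = filters_collide(old_j, old_h, EPS)       # gap 0.5: no collision
h_alone = filters_collide(old_j, new_h, EPS)      # gap 0.1: collision
both_updated = filters_collide(new_j, new_h, EPS)  # gap 0.4: no collision
```

Only `h_alone` is true: enlarging filterh while filterj keeps its old bounds would create a collision, which is why h depends on j.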

Updating filters • Must be done in the following order: • Resolve filter collisions (because these double communication costs, since the base station must retrieve the colliding tuples) • Shrinking filters – but only if the cost of a suboptimal filter + the cost of suboptimal filters on blocked nodes is greater than the update cost • Enlarging filters – if the cost of a suboptimal filter is greater than the update cost
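The ordering and the two cost comparisons above can be sketched as follows. All cost figures, the dictionary layout, and the assumption that collisions are always resolved are illustrative choices, not details from the paper.

```python
def plan_updates(candidates, update_cost):
    """Return the nodes whose filters get updated, in processing order.

    Each candidate is a dict with keys:
      node          - node identifier
      kind          - "collision", "shrink", or "enlarge"
      own_cost      - cost of keeping this node's suboptimal filter
      blocked_costs - costs of suboptimal filters on nodes it blocks
    """
    order = {"collision": 0, "shrink": 1, "enlarge": 2}
    plan = []
    # sorted() is stable, so candidates of the same kind keep their order.
    for c in sorted(candidates, key=lambda c: order[c["kind"]]):
        if c["kind"] == "collision":
            # Assumption: collisions are always resolved.
            plan.append(c["node"])
        elif c["kind"] == "shrink":
            if c["own_cost"] + sum(c["blocked_costs"]) > update_cost:
                plan.append(c["node"])
        elif c["own_cost"] > update_cost:  # enlarge
            plan.append(c["node"])
    return plan


candidates = [
    {"node": "h", "kind": "enlarge", "own_cost": 1.5, "blocked_costs": []},
    {"node": "j", "kind": "shrink", "own_cost": 1.0, "blocked_costs": [1.5]},
    {"node": "k", "kind": "collision", "own_cost": 0.0, "blocked_costs": []},
]
plan = plan_updates(candidates, update_cost=2.0)
```

In this run, node j's own cost (1.0) would not justify an update, but adding the cost on the node it blocks (1.5) exceeds the update cost of 2.0, so j shrinks; h's own cost alone does not justify enlarging.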

Evaluation • Used publicly available LUCE data set (environmental sensors) • Compared against 5 alternatives • External joins • SENS-Join (uses precomputation) • IDEAL • Adaptive Precision Setting – like CJF but does not directly account for dependencies • UNIFORM (all filters are the same size)

Eval. – Communications Needed • CJF outperforms other methods and is closest to IDEAL

Evaluation - Dependencies • Considering dependencies reduces collisions, and thus reduces communications

Eval. – Individual optimization • Using individual filter sizes is better than uniform sizes in all cases

Conclusions • Continuous Join Filtering incurs lower communication (and therefore energy) costs than the other methods evaluated • CJF comes closest to IDEAL for continuous queries • Considering dependencies improves performance
