CS662 Paper Presentation
Query Processing for Sensor Networks
Yong Yao, Johannes Gehrke
Jie Li
Nov. 20, 2008
Background
• Developments in hardware have enabled the widespread deployment of sensor networks
• Sensor networks promise plenty of future applications, most of which are naturally data-driven
Sensor Networks
• A sensor network consists of a large number of sensor nodes
• A sensor node has one or more attached sensors, which produce data to be processed
• Declarative queries are preferred for data interaction in sensor networks
Constraints of Sensor Networks
• Communication
• Power consumption
• Computation
• Uncertainty in sensor readings
Why do we need a query layer?
• To support declarative query processing
• Limited resources require highly efficient data management
• It can help generate query plans with different tradeoffs for different users
Declarative Query Example
• Support long-running, periodic queries:
  • DURATION: specifies the life time of a query
  • EVERY: determines the rate of query submission

    SELECT AVG(R.concentration)
    FROM ChemicalSensor R
    WHERE R.loc IN region
    HAVING AVG(R.concentration) > T
    DURATION (now, now + 3600)
    EVERY 10
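To make the DURATION and EVERY clauses concrete, here is a minimal sketch of how a long-running query could be unrolled into rounds. The function names `query_rounds` and `evaluate_round` are hypothetical, and starting the first round at the query's start time is an assumption, not something the slides specify.

```python
def query_rounds(start, duration, every):
    """Times (in seconds) at which the query is evaluated.

    Assumption: the first round fires at `start` and rounds stop
    before `start + duration`.
    """
    return list(range(start, start + duration, every))


def evaluate_round(readings, threshold):
    """AVG over the in-region readings of one round.

    Returns the average only if the HAVING condition (avg > threshold)
    holds; otherwise returns None, meaning nothing is reported.
    """
    if not readings:
        return None
    avg = sum(readings) / len(readings)
    return avg if avg > threshold else None


rounds = query_rounds(start=0, duration=3600, every=10)
```

Under these assumptions, DURATION (now, now + 3600) with EVERY 10 yields 360 rounds, and each round either reports an average or stays silent.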
Structure of a Query Plan
• The query plan:
  • Decides how much computation is pushed into the network
  • Specifies the role of each sensor node, as well as the coordination among them
• A flow block:
  • Consists of a collection of data from a set of sensors
  • Its tasks are coordinated by the leader node of the flow block
• Computes the aggregate at the leader node, or computes partial aggregates at intermediate nodes
• Sets up suitable communication routes for delivery of sensor records within the network
[Figure: query plan diagram; surviving label: Coordinate]
Simple Aggregate Query Processing
• In-Network Aggregation
  • Basic idea: compute partial aggregates at intermediate nodes, rather than aggregating all source readings at a single destination node (the leader node)
  • Synchronization required
• Mechanisms for simple aggregates
  • Direct delivery
    • Each source sensor sends data to the leader
    • Computation only happens at the leader
  • Packet merging
    • Each single-record packet is usually small in size
    • Merge several packets into a larger packet
  • Partial aggregation
    • Compute partial results at intermediate nodes
    • Partial results are then forwarded to the leader
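Partial aggregation for AVG can be sketched as follows. Each node folds its own reading into a (sum, count) pair, merges the pairs received from its children, and forwards a single pair toward the leader. The class `PartialAvg`, the function `aggregate_subtree`, and the example tree are illustrative assumptions, not code from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartialAvg:
    """Partial state for AVG: enough to merge without losing precision."""
    total: float = 0.0
    count: int = 0

    def add(self, reading):
        return PartialAvg(self.total + reading, self.count + 1)

    def merge(self, other):
        return PartialAvg(self.total + other.total, self.count + other.count)

    def result(self):
        return self.total / self.count if self.count else None


def aggregate_subtree(node, children, readings):
    """Return the partial aggregate a node would forward to its parent."""
    partial = PartialAvg().add(readings[node])
    for child in children.get(node, []):
        partial = partial.merge(aggregate_subtree(child, children, readings))
    return partial


# Hypothetical routing tree: leader <- a, b ; a <- c
children = {"leader": ["a", "b"], "a": ["c"]}
readings = {"leader": 10.0, "a": 20.0, "b": 30.0, "c": 40.0}
final = aggregate_subtree("leader", children, readings)
```

With direct delivery, the leader would receive three separate records; here every link carries exactly one fixed-size pair regardless of subtree size.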
Synchronization
• Required for packet merging and partial aggregation
• Task: determine how many sensor readings to wait for in each query round
• Communication structure:
  • Spanning tree (duplicate-sensitive aggregates)
  • DAG (duplicate-insensitive aggregates)
• Simple algorithm: incremental time slots
  • Large cost
Synchronization (Cont.)
• A more pragmatic approach
  • When a parent node receives a record from a child node, it adds the child node to its waiting list for the next round (prediction)
  • The parent node sets a timer to recover from a false prediction (when the child node does not actually send a record to the parent in the next round)
  • The child can also send a notification packet to its parent about a false prediction
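The prediction-based approach above can be sketched as a small state machine at the parent. The class `ParentNode` and its method names are hypothetical, and the timer is modeled as a `timed_out` flag passed by the caller rather than a real clock.

```python
class ParentNode:
    """Tracks which children are expected to report in the current round."""

    def __init__(self):
        self.waiting = set()   # children predicted to send a record this round
        self.received = {}     # child -> record received this round

    def on_record(self, child, record):
        self.received[child] = record

    def on_false_prediction_notice(self, child):
        # The child announces it has nothing to send, so stop waiting for it.
        self.waiting.discard(child)

    def round_complete(self):
        return self.waiting <= set(self.received)

    def close_round(self, timed_out=False):
        """Return this round's records once complete or once the timer fires.

        Every child that reported is predicted to report again next round.
        """
        if not (timed_out or self.round_complete()):
            return None
        records = dict(self.received)
        self.waiting = set(records)
        self.received = {}
        return records


parent = ParentNode()
parent.on_record("a", 1.0)
parent.on_record("b", 2.0)
round1 = parent.close_round(timed_out=True)   # first round ends on the timer

parent.on_record("a", 3.0)
still_waiting = parent.close_round()          # "b" is predicted but silent
parent.on_false_prediction_notice("b")
round2 = parent.close_round()                 # notice lets the round close early
```

The notification packet saves the parent from waiting for the full timeout whenever a predicted child has nothing to report.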
Complex Query Optimization
• Extension to GROUP BY and HAVING clauses

    (Q1) SELECT D.gid, AVG(D.value)
         FROM SensorData D
         GROUP BY D.gid
         HAVING AVG(D.value) > Threshold

• Two alternative plans:
  • Create a flow block for each group
  • Create a flow block that is shared by multiple groups
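For the shared-flow-block plan, one way to picture the computation is a per-group table of partial aggregates that is merged on the way to the leader, with the HAVING predicate applied only at the end. This is a sketch under assumptions: the helper names `merge_group_partials` and `apply_having` and the (sum, count) representation are illustrative, not taken from the paper.

```python
def merge_group_partials(partials):
    """Merge per-group (sum, count) tables coming from several children."""
    merged = {}
    for partial in partials:
        for gid, (total, count) in partial.items():
            prev_total, prev_count = merged.get(gid, (0.0, 0))
            merged[gid] = (prev_total + total, prev_count + count)
    return merged


def apply_having(merged, threshold):
    """At the leader: keep only groups whose average exceeds the threshold."""
    return {
        gid: total / count
        for gid, (total, count) in merged.items()
        if count and total / count > threshold
    }


from_child_1 = {1: (10.0, 2), 2: (3.0, 1)}
from_child_2 = {1: (20.0, 1)}
merged = merge_group_partials([from_child_1, from_child_2])
answer = apply_having(merged, threshold=5.0)
```

HAVING cannot be applied at intermediate nodes here, because a group's average is only known once all of its partial sums have been merged.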
Complex Query Optimization (Cont.)
• Joins

    (Q2) SELECT oid
         FROM SensorData D1, SensorData D2
         WHERE D1.loc IN R1 AND D2.loc IN R2
           AND D1.oid = D2.oid

• The join operation can reduce or increase the resulting data size (depending on the selectivity)
• If the join increases the data size, computing it at the in-network leader node is more expensive; if it reduces the data size, computing it at the leader node is cheaper
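The selectivity tradeoff can be illustrated with a deliberately simple cost model that counts tuple-hops (tuples transmitted times hops traveled). Everything here is an assumption made for illustration: the function names, the hop-count parameters, and the model itself, which ignores packet merging, headers, and retransmissions.

```python
def gateway_join_cost(n1, n2, hops_region_to_gateway):
    """Ship every input tuple to the gateway and join there."""
    return (n1 + n2) * hops_region_to_gateway


def in_network_join_cost(n1, n2, n_result, hops_region_to_leader,
                         hops_leader_to_gateway):
    """Ship inputs to a nearby leader, join there, ship only the result."""
    return ((n1 + n2) * hops_region_to_leader
            + n_result * hops_leader_to_gateway)


def choose_join_plan(n1, n2, n_result, hops_region_to_leader,
                     hops_leader_to_gateway, hops_region_to_gateway):
    at_gateway = gateway_join_cost(n1, n2, hops_region_to_gateway)
    in_network = in_network_join_cost(n1, n2, n_result,
                                      hops_region_to_leader,
                                      hops_leader_to_gateway)
    return "in-network" if in_network < at_gateway else "gateway"


# Selective join: 200 input tuples shrink to 10 result tuples.
selective = choose_join_plan(100, 100, 10, 2, 10, 10)
# Expanding join: 200 input tuples grow to 5000 result tuples.
expanding = choose_join_plan(100, 100, 5000, 2, 10, 10)
```

In this toy model the selective join favors the in-network plan (500 versus 2000 tuple-hops), while the expanding join favors joining at the gateway.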
Wireless Routing Protocols
• Main tasks of a routing protocol
  • Route discovery
  • Route maintenance
• Distributed and adaptive routing protocols
  • Proactive (e.g. DSDV)
  • Reactive (e.g. AODV)
  • Hybrid (e.g. ZRP)
• Ad hoc On-Demand Distance Vector (AODV)
  • Scales to large networks
  • Does not generate duplicate data packets
Extensions to the Network Interface
• Packet merging and partial aggregation require internal nodes to intercept data packets
• This is not supported by the traditional "send and receive" interfaces of the network layer
• This capability is provided through filters
• The network layer first passes a packet through a set of registered functions that can modify the packet
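A filter interface of this kind might look like the sketch below: the network layer runs each incoming packet through the registered functions, and a filter may rewrite the packet or hold it back. The names `NetworkLayer`, `register_filter`, and `make_merging_filter`, the dictionary packet layout, and the convention that returning None means "consumed" are all assumptions for illustration.

```python
class NetworkLayer:
    """Toy network layer that lets upper layers intercept packets in transit."""

    def __init__(self):
        self.filters = []

    def register_filter(self, fn):
        self.filters.append(fn)

    def handle(self, packet):
        """Run the packet through all filters; return what should be forwarded.

        A filter returning None means it consumed (buffered) the packet.
        """
        for fn in self.filters:
            packet = fn(packet)
            if packet is None:
                return None
        return packet


def make_merging_filter(batch_size):
    """Packet merging: buffer records, emit one larger packet per batch."""
    buffered = []

    def merge(packet):
        buffered.extend(packet["records"])
        if len(buffered) < batch_size:
            return None
        merged = {"dest": packet["dest"], "records": list(buffered)}
        buffered.clear()
        return merged

    return merge


layer = NetworkLayer()
layer.register_filter(make_merging_filter(batch_size=3))
out1 = layer.handle({"dest": "leader", "records": [1]})
out2 = layer.handle({"dest": "leader", "records": [2]})
out3 = layer.handle({"dest": "leader", "records": [3]})
```

A partial-aggregation filter would follow the same pattern, replacing the buffered records with a single combined record before forwarding.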
Crash Recovery (Route Management)
• Two main enhancements to the AODV protocol
  • Route initialization
    • The leader of the aggregation broadcasts a route initialization message to create all the routes
  • Route maintenance
    • Local repair
    • Bunch repair
Experimental Evaluation
• A prototype of the query processing layer was tested in the ns-2 network simulator
• Prototype characteristics:
  • High degree of precision, including collisions at the MAC layer, detailed energy models, etc.
  • Communication range of each sensor: 50 m
  • Bi-directional links assumed
  • Receive power dissipation: 395 mW
  • Transmit power dissipation: 660 mW
  • Sensor reading size: 30 bytes per tuple
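As a back-of-the-envelope illustration of how these power figures turn into energy per tuple, energy is power multiplied by airtime. The radio bandwidth is not given in the slides, so the 1 Mbps value below is purely a hypothetical parameter, and the calculation ignores headers, MAC overhead, and idle listening.

```python
def radio_energy_joules(num_bytes, power_mw, bandwidth_bps):
    """Energy = power x airtime, with airtime = bits / bandwidth."""
    airtime_s = (num_bytes * 8) / bandwidth_bps
    return (power_mw / 1000.0) * airtime_s


def hop_energy_joules(num_bytes, bandwidth_bps, tx_mw=660.0, rx_mw=395.0):
    """One hop costs a transmission at the sender plus a reception at the receiver."""
    return (radio_energy_joules(num_bytes, tx_mw, bandwidth_bps)
            + radio_energy_joules(num_bytes, rx_mw, bandwidth_bps))


ASSUMED_BANDWIDTH_BPS = 1_000_000  # hypothetical; not stated in the slides
tx_energy = radio_energy_joules(30, 660.0, ASSUMED_BANDWIDTH_BPS)
hop_energy = hop_energy_joules(30, ASSUMED_BANDWIDTH_BPS)
```

Under the assumed bandwidth, one 30-byte tuple costs about 0.16 mJ to transmit and about 0.25 mJ per hop once reception is included, which is why reducing the number of transmitted records matters.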
Experimental Evaluation (Cont.)
• Simple aggregate query
  • Computes the average value over all sensors
  • Assumes a fixed density of sensor nodes
[Figures: Average Dissipated Energy vs. Network Size; Average Delay vs. Network Size]
Experimental Evaluation (Cont.)
• Routing
[Figures: Improved Local Repair Algorithm; Effect of Bunch Repair; Result Accuracy]
Experimental Evaluation (Cont.)
• Query Plans (Q1)

    SELECT D.gid, AVG(D.value)
    FROM SensorData D
    GROUP BY D.gid
    HAVING AVG(D.value) > Threshold

  • Plan 1: Creates one big flow block to be shared by all groups
  • Plan 2: Creates a separate flow block for each group in the aggregation
[Figures: Impact of Sensor Distributions; panels: Distributed Topology, Overlap Topology]
Experimental Evaluation (Cont.)
• Query Plans (Q2)

    SELECT oid
    FROM SensorData D1, SensorData D2
    WHERE D1.loc IN R1 AND D2.loc IN R2
      AND D1.oid = D2.oid

  • Plan 1: Sensors send all tuples back to the gateway
  • Plan 2: Creates a flow block for the join operator inside the query region
[Figure: Join Query]
Experimental Evaluation (Cont.)
• Query Plans (Q3)

    SELECT AVG(value)
    FROM Sensor D
    WHERE D.loc IN [(400,400), (500,500)]
    HAVING AVG(value) > t

  • Plan 1: Uses an existing flow block that covers the whole network
  • Plan 2: Creates a new flow block for aggregation inside the query region
[Figure: Aggregate Query]
Weaknesses
• The prototype was tested in simulation, not in a real implementation
• Some important mechanisms (such as synchronization) rely on bidirectional links, which the authors themselves say are not common in practice
• Some discussions (such as query optimization) are rather preliminary and lack quantitative, in-depth analysis
Conclusion
• Data management remains challenging in resource-constrained sensor networks
• Query processing is very common in data-driven sensor network applications
• Higher power efficiency can be achieved through in-network aggregation, enhancements to the routing layer, query optimization, etc.