Aggregate Query Processing in Ad-Hoc Sensor Networks • Yong Yao • Database lunch, Apr. 15th
Outline • Motivating Example • Sensor and Sensor Network • Query Model • In-Network Aggregates • Routing and Aggregation • Summary
Motivating Example • A several-hundred-node ad-hoc network of sensors (Cougar) is deployed in Rhodes Hall and Upson Hall • The network is shared by all occupants • The network is dynamic: people can add and remove sensors, and sensors frequently run out of power or crash
Motivating Example • People extract information from the environment by querying the network • What is the temperature of my office? • How many people are in the system lab? • What’s the quietest conference room? • Where is Johannes?
Next generation sensors • Data source: Sensors respond to physical stimuli (heat, light, or motion) and produce events • Computation ability: Sensors are active, full-fledged computers • Communication ability: Wirelessly connected over a broadcast channel, self-organized into a multi-hop network topology • Limitation: Energy constrained and prone to crashing
Today’s Hardware - Motes • Assembled from off-the-shelf components • 4 MHz, 8-bit MCU (ATMEL) • 512 bytes RAM, 8K ROM • 900 MHz radio (RF Monolithics) • 10-100 ft. range • Temperature sensor and light sensor • LED outputs • Serial port • Size: 1.5” x 1.5”
Sensor Network • Consists of many sensors and gateway nodes (sinks) • As an ad-hoc network • Static or quasi-static • Dynamically changing • Large scale • As a distributed database system with in-network query processing
Query Model • SELECT {agg(attr), attrs} FROM sensors WHERE {spatial constraint} GROUP BY {attrs} HAVING {havingPreds} DURATION {time} EVERY {period} • Example: What is the temperature of my office? • SELECT AVG(temperature) FROM TemperatureSensor s WHERE s IN MY_OFFICE DURATION 1h EVERY 10s • Open Problem: What is the best model for general queries?
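The clauses of the example query can be pictured as a small record. The following is only a sketch: the class name `AggregateQuery` and its field names are hypothetical and not part of the query language on the slide. It shows how DURATION and EVERY together fix the number of results the query produces.

```python
from dataclasses import dataclass

@dataclass
class AggregateQuery:
    """Hypothetical in-memory form of the example query on the slide."""
    agg: str          # aggregate name, e.g. "AVG"
    attr: str         # aggregated attribute, e.g. "temperature"
    region: str       # spatial constraint from the WHERE clause
    duration_s: int   # DURATION, in seconds
    every_s: int      # EVERY, in seconds

    def rounds(self) -> int:
        """Number of results produced over the query lifetime."""
        return self.duration_s // self.every_s

# SELECT AVG(temperature) ... WHERE s IN MY_OFFICE DURATION 1h EVERY 10s
q = AggregateQuery("AVG", "temperature", "MY_OFFICE", duration_s=3600, every_s=10)
print(q.rounds())  # one hour sampled every 10 seconds
```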
Aggregate Operator • Agg is implemented via three functions • Merging function f: <z> = f(<x>, <y>), where <x> and <y> are multi-valued partial state records; for AVG, each is a two-tuple <SUM, COUNT> • Initializer i specifies how to instantiate a state record for a single sensor value • Evaluator e takes a partial state record and computes the actual value of the aggregate • AVG: • f(<S1,C1>, <S2,C2>) = <S1+S2, C1+C2> • i(x) = <x, 1> • e(<S,C>) = S/C
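The three AVG functions above translate almost directly into code. This is a minimal sketch; the function names `init`, `merge`, and `evaluate` and the sample readings are assumptions chosen for illustration.

```python
from functools import reduce

def init(x):
    """Initializer i: one sensor value -> partial state record <SUM, COUNT>."""
    return (x, 1)

def merge(a, b):
    """Merging function f: combine two partial state records."""
    return (a[0] + b[0], a[1] + b[1])

def evaluate(state):
    """Evaluator e: partial state record -> actual value of AVG."""
    s, c = state
    return s / c

readings = [20.0, 22.0, 24.0, 26.0]  # hypothetical temperature readings

# Two groups of sensors are merged separately, then combined.
left = reduce(merge, map(init, readings[:2]))
right = reduce(merge, map(init, readings[2:]))
avg = evaluate(merge(left, right))
```

Because `merge` is associative and commutative, the records can be combined in any order and at any node, which is what makes in-network aggregation possible.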
In-Network Aggregation • Traditional Sensor Network (Fjord Architecture) • Centralized server-based approach: All data is sent back to the server. Sensors are unaware of the content of user queries. • Example: • What’s the temperature of my office? • Tuple: <SensorID, Sensor Type, Value, Position, Time Stamp> • Problems: • Not scalable • Energy inefficient • Improvement • Install a filter on each sensor
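The suggested improvement, a filter installed on each sensor, can be sketched as a predicate applied before a tuple is transmitted. The tuple fields follow the slide; the names `Reading` and `make_filter` and the sample data are hypothetical.

```python
from typing import NamedTuple

class Reading(NamedTuple):
    """Tuple layout from the slide."""
    sensor_id: int
    sensor_type: str
    value: float
    position: str
    timestamp: int

def make_filter(sensor_type, region):
    """Build a per-sensor predicate from the parts of the query it needs."""
    def accept(r):
        return r.sensor_type == sensor_type and r.position == region
    return accept

readings = [
    Reading(1, "temperature", 21.5, "MY_OFFICE", 100),
    Reading(2, "temperature", 19.0, "SYSTEM_LAB", 100),
    Reading(3, "light", 300.0, "MY_OFFICE", 100),
]

# Only tuples that can contribute to the query are sent to the server.
accept = make_filter("temperature", "MY_OFFICE")
sent = [r for r in readings if accept(r)]
```

In this sketch two of the three tuples are never transmitted, which is where the energy saving would come from.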
In-Network Aggregation • <z> = f(<x>, <y>) • Computation plan: How to divide sensors into partitions • Communication plan: How to determine the next hop • Key problem: Match the computation plan to the communication plan • Example: What’s the temperature of the fourth floor of Upson Hall? • Plan: Compute the temperature of each office first, and then compute the final result
In-Network Aggregation • Two algorithms • Cluster-based algorithm • Divide and conquer: Divide the whole query region into smaller clusters, and execute the query in each cluster. Repeat the process until the cluster size is small enough.
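The divide-and-conquer idea can be sketched as a recursive split of the query region. This is only an illustration under stated assumptions: sensors are modeled as `(x, y, value)` tuples, the region is split at the median of its wider axis, and `max_size` is a hypothetical threshold for "small enough". Leader election and message passing are left out.

```python
def cluster_aggregate(sensors, max_size=2):
    """Return a <SUM, COUNT> record for a list of (x, y, value) sensors."""
    max_size = max(1, max_size)  # guard so every recursive call shrinks
    if len(sensors) <= max_size:
        # Cluster is small enough: aggregate it directly.
        return (sum(v for _, _, v in sensors), len(sensors))
    xs = [s[0] for s in sensors]
    ys = [s[1] for s in sensors]
    # Split along the wider axis so clusters stay geographically compact.
    axis = 0 if max(xs) - min(xs) >= max(ys) - min(ys) else 1
    ordered = sorted(sensors, key=lambda s: s[axis])
    mid = len(ordered) // 2
    left = cluster_aggregate(ordered[:mid], max_size)
    right = cluster_aggregate(ordered[mid:], max_size)
    # Merge the two partial state records.
    return (left[0] + right[0], left[1] + right[1])

sensors = [(0, 0, 20.0), (1, 0, 21.0), (5, 1, 22.0), (6, 4, 23.0), (9, 9, 24.0)]
total, count = cluster_aggregate(sensors)
average = total / count
```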
In-Network Aggregation • Cluster-based algorithm • Assumption: Sensors that are geographically close are usually close in hops • This assumption does not always hold • Cluster leader election and maintenance
In-Network Aggregation • Tree-based algorithm • Create a spanning tree over the query region • Aggregate children's data at the parent node
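A minimal sketch of the tree-based approach, assuming the connectivity graph is known as an adjacency dictionary and that the root (sink) also contributes a reading. The spanning tree is built breadth-first here, which is one possible choice and not necessarily the one used in the talk; all names and values are hypothetical.

```python
from collections import deque

def build_tree(adj, root):
    """Breadth-first spanning tree: returns parent pointers and visit order."""
    parent = {root: None}
    order = [root]
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                order.append(v)
                queue.append(v)
    return parent, order

def tree_aggregate(adj, root, values):
    """Merge <SUM, COUNT> records from the leaves up to the root."""
    parent, order = build_tree(adj, root)
    state = {n: (values[n], 1) for n in order}
    # Reversed BFS order visits every child before its parent.
    for n in reversed(order):
        p = parent[n]
        if p is not None:
            state[p] = (state[p][0] + state[n][0], state[p][1] + state[n][1])
    return state[root]

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
values = {0: 20.0, 1: 22.0, 2: 24.0, 3: 26.0}
total, count = tree_aggregate(adj, 0, values)
```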
In-Network Aggregation • Pipelined Aggregation (TAG) • Two phases: • Flooding phase: the routing tree is built and aggregate queries are pushed down into the sensor network • Aggregate phase: the aggregate values are continually routed up from children to parents • Epoch: the smallest time unit. It must be longer than the transmission time of a packet
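The aggregate phase can be simulated epoch by epoch. This is a simplified sketch and not the TAG implementation: the routing tree is assumed to be already built and is given as parent pointers, readings are constant, and in each epoch a node merges its own reading with whatever its children sent in the previous epoch.

```python
def run_pipeline(parent, reading, epochs):
    """Return the root's <SUM, COUNT> record at the end of each epoch."""
    inbox = {n: [] for n in parent}  # records received in the previous epoch
    history = []
    for _ in range(epochs):
        next_inbox = {n: [] for n in parent}
        for n, p in parent.items():
            s, c = reading[n], 1
            for cs, cc in inbox[n]:      # merge the children's partial states
                s, c = s + cs, c + cc
            if p is None:
                history.append((s, c))   # root: report the current aggregate
            else:
                next_inbox[p].append((s, c))  # arrives at the parent next epoch
        inbox = next_inbox
    return history

# A chain of depth 2: node 2 -> node 1 -> node 0 (root).
parent = {0: None, 1: 0, 2: 1}
reading = {0: 10.0, 1: 10.0, 2: 10.0}
history = run_pipeline(parent, reading, epochs=4)
```

In this run the root's count grows by one level per epoch, so the result only covers all three nodes once as many epochs as the tree depth have passed after the first; from then on a complete result arrives every epoch.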
In-Network Aggregation • An Example
In-Network Aggregation • Problems with the pipelined approach • Epoch = ? • Delay = Epoch * Depth of the tree • Interval = Epoch • Fault tolerance • Each link and node is a single point of failure • If a link close to the root is down, then … • If the query region occupies only a small part of the network, it is wasteful to create and maintain a global spanning tree
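As a worked instance of the two relations above, assume a 2-second epoch and a routing tree of depth 5. Both numbers are hypothetical and do not come from the slides.

```latex
\[
\text{Delay} = \text{Epoch} \times \text{Depth} = 2\,\text{s} \times 5 = 10\,\text{s},
\qquad
\text{Interval} = \text{Epoch} = 2\,\text{s}.
\]
```

Under these assumptions the first result reaches the root after 10 s and a new one follows every 2 s. A shorter epoch reduces both figures, which is why choosing the epoch is listed as an open question.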
In-Network Aggregation • Solution: • Local repair: Find a new route to the tree • Aggregate once all data from the children has been received • Requirements: • Monitor the network continuously • React quickly to network topology changes
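One possible reading of local repair is sketched below: a node whose parent has failed picks, among its radio neighbors, a new parent whose own path to the root does not pass through the failed node. The slide does not specify the procedure, so the selection rule and the names `path_avoids` and `local_repair` are assumptions.

```python
def path_avoids(parent, node, failed):
    """True if the tree path from `node` to the root skips `failed`."""
    cur = node
    while cur is not None:
        if cur == failed:
            return False
        cur = parent[cur]
    return True

def local_repair(parent, adj, orphan, failed):
    """Reattach `orphan` to a usable neighbor; return the new parent or None."""
    for cand in adj[orphan]:
        # Rejecting paths through `failed` also rejects the orphan's own
        # descendants, so the repair cannot create a cycle.
        if cand in parent and path_avoids(parent, cand, failed):
            parent[orphan] = cand
            return cand
    return None

# Tree: 0 is the root; 1 and 2 are its children; 3 hangs off 1; 4 hangs off 3.
parent = {0: None, 1: 0, 2: 0, 3: 1, 4: 3}
adj = {3: [1, 4, 2]}  # radio neighbors of node 3 only, for brevity
new_parent = local_repair(parent, adj, orphan=3, failed=1)
```

Here node 3 skips the failed node 1 and its own child 4, and reattaches to node 2.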
In-Network Aggregation • Go deep into the protocol stack • Sensor network is task specific • Layers: Application Layer, Routing Layer, Link Layer, MAC Layer
Routing and Aggregation • Many existing ad-hoc routing algorithms: AODV, DSDV, DSR, ZRP, Directed Diffusion, etc. • Classified into two main categories: • Table-driven: DSDV, WRP • Source-initiated on-demand: AODV, DSR, TORA, SSR • Two main tasks: • Route discovery • Route maintenance
Routing and Aggregation • Can we use any existing ad-hoc routing protocol directly? • Centralized algorithm and Cluster algorithm • Tree based algorithm • Different communication pattern • Ad-hoc network: Randomly selected source and destination pair • Sensor network: Query dissemination, data collection • Predictable traffic workload
Routing and Aggregation • New Routing Algorithm • Route discovery: Similar to table-driven algorithms, the route information propagates from the destination to the source • Route maintenance: Similar to source-initiated on-demand algorithms, supports local repair and cooperative repair. Periodically recreates all routes. • New Interface • Send (Packet* p) • Receive (Packet* p) • Filter (Packet* p)
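The slide lists only the three interface names, so the semantics below are an assumption: `send` hands a packet to the next hop, `receive` delivers a packet at its destination, and `filter` lets an intermediate node intercept a packet in transit, here to merge its own reading into a `<SUM, COUNT>` record. The `Node` class and its fields are hypothetical.

```python
class Node:
    """Hypothetical node exposing the Send / Receive / Filter interface."""

    def __init__(self, name, next_hop=None, reading=0.0):
        self.name = name
        self.next_hop = next_hop  # next node toward the sink; None at the sink
        self.reading = reading
        self.inbox = []           # packets delivered to the application

    def send(self, packet):
        """Hand a packet to the next hop toward the sink."""
        if self.next_hop is not None:
            self.next_hop._arrive(packet)

    def receive(self, packet):
        """Deliver a packet that has reached its destination."""
        self.inbox.append(packet)

    def filter(self, packet):
        """Intercept a packet in transit and merge in the local reading."""
        s, c = packet
        return (s + self.reading, c + 1)

    def _arrive(self, packet):
        # The sink receives; every other node filters and forwards.
        if self.next_hop is None:
            self.receive(packet)
        else:
            self.send(self.filter(packet))

sink = Node("sink")
mid = Node("mid", next_hop=sink, reading=24.0)
leaf = Node("leaf", next_hop=mid, reading=20.0)
leaf.send((leaf.reading, 1))
```

The point of the sketch is that aggregation happens inside the routing path: the sink sees one merged packet instead of one packet per sensor.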
Ongoing Research • Query language and data model • High level query processing algorithm • Low level routing algorithm • Multiple query optimization • Heterogeneous sensor network • Approximate query processing
Summary • A sensor network is a large-scale distributed database system. Each sensor is an independent data source • Cluster-based vs. tree-based algorithm • Performance • Fault tolerance • Applications • How many people are in the system lab? • Interaction between in-network query processing and routing