Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks


Abstract: • This paper proposes a generic, query-based scheme for extracting data from sensor networks.

Main Aim of the Paper • The main idea behind this paper is to show how a generic query interface for data aggregation can be applied to ad-hoc networks of sensor devices. • The authors emphasize that this technique allows arbitrary data in a sensor network to be queried without building a custom application. • To study generic aggregation techniques.

Introduction • Advances in computing technology have led to the production of wireless, battery-powered smart sensors. • As large sensor networks are deployed, a need arises for tools to collect and query data from these networks.

Aggregation • It is an important issue from the standpoint of network performance and longevity. • It drastically reduces the amount of data routed through the network. • It increases throughput and extends the life of battery-powered sensor networks. • It also makes it possible to optimize the computation and lets programmers issue declarative, SQL-style queries.

Background The authors discuss the relevant design aspects of the following • Motes • TinyOS • Ad-hoc Sensor Networks • Aggregation in Database Systems The paper then summarizes aggregation in database systems & discusses how these techniques provide a useful & well defined framework for computing aggregates in sensor networks.

Motes • Configuration: 4 MHz Atmel microprocessor, 512 bytes of RAM, 8 kB of code space, 917 MHz RFM radio running at 10 kb/s, 32 kB of EEPROM. • Sensor options: light, temperature, magnetic field, acceleration, sound and power. • Tradeoff: the power consumption of each sensor node is dominated by the cost of transmitting and receiving messages. • Message delivery is unreliable by default.

A TinyOS Sensor Mote

TinyOS • TinyOS makes it possible to deploy ad-hoc networks of sensors that can locate each other and route data without any prior knowledge of the network topology. • It helps in writing programs that capture and process sensor data and transmit messages over the radio.

Ad-hoc Sensor Networks • Each sensor has a unique id. Sensors route data by building a routing tree. • One sensor is appointed as the root, which interfaces the querying user to the rest of the network. • Constant topology maintenance makes it easy to adapt to network changes caused by the mobility of certain nodes or by the addition or deletion of sensors.

The root broadcasts a message asking sensors to organize into a routing tree. • The message contains the root's id and its level, i.e. its distance from the root, which is zero. • Each sensor that hears the message chooses the sender as its parent, through which it will route messages to the root. • This approach makes it possible to route data efficiently towards the root. • It does not address point-to-point routing.
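The tree-building step above can be sketched as a breadth-first flood. This is a minimal illustration rather than the TinyOS protocol itself: the `neighbors` graph and node names are hypothetical, and it assumes that each sensor adopts the first sender it hears as its parent and sets its own level to the sender's level plus one.

```python
from collections import deque

def build_routing_tree(neighbors, root):
    """Flood a tree-building message from the root; return (parent, level) maps."""
    parent = {root: None}
    level = {root: 0}          # the root is at distance zero from itself
    queue = deque([root])
    while queue:
        sender = queue.popleft()
        for node in neighbors[sender]:
            if node not in parent:          # assumption: first message heard wins
                parent[node] = sender
                level[node] = level[sender] + 1
                queue.append(node)
    return parent, level

# Hypothetical radio connectivity (who can hear whom).
neighbors = {
    "root": ["a", "b"],
    "a": ["root", "b", "c"],
    "b": ["root", "a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c"],
}
parent, level = build_routing_tree(neighbors, "root")
```

Every non-root node ends up with exactly one parent, so messages always have a single path towards the root.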

Aggregation in Database Systems • In SQL-based database systems, aggregation is defined by an aggregate function and a grouping predicate. • Aggregate function: specifies how a set of values should be combined to compute an aggregate, e.g. COUNT, MIN, MAX, AVERAGE and SUM. • E.g.: SELECT AVERAGE(temp) FROM sensors

Most database systems allow user-defined functions (UDFs) that specify more complex aggregates. • Grouping predicate: partitions values into groups based on certain attributes. • E.g.: SELECT TRUNC(temp/10), AVERAGE(light) FROM sensors GROUP BY TRUNC(temp/10) HAVING AVERAGE(light) > 50 • The above query partitions the sensor readings into groups by their temperature reading, computes the average light within each group, and keeps only the groups whose average light exceeds 50.
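To make the semantics of the grouped query concrete, the following sketch evaluates the same GROUP BY / HAVING logic over a small list of readings. The readings are invented for illustration, and the function name is hypothetical.

```python
from collections import defaultdict
import math

def average_light_by_temp_band(readings, threshold=50):
    """Group (temp, light) readings by TRUNC(temp/10) and average the light."""
    groups = defaultdict(list)
    for temp, light in readings:
        groups[math.trunc(temp / 10)].append(light)   # GROUP BY TRUNC(temp/10)
    result = {}
    for band, lights in groups.items():
        avg = sum(lights) / len(lights)               # AVERAGE(light)
        if avg > threshold:                           # HAVING AVERAGE(light) > 50
            result[band] = avg
    return result

# Invented (temp, light) readings.
readings = [(21, 40), (25, 80), (33, 30), (38, 50), (12, 90)]
result = average_light_by_temp_band(readings)
```

Here the 30-39 degree band is dropped because its average light (40) does not satisfy the HAVING clause.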

Generic Aggregation Techniques • One implementation of sensor network aggregation would be a centralized, server-based approach. • However, the focus is on the distributed in-network approach, since it has the potential for both lower latency and lower power consumption than the server-based approach. • It is assumed throughout that the user is stationed at a desktop PC with a large memory.

Advantages of in-network approach • Consider computing an aggregate over a group of sensors as arranged in the following figure. • Dotted lines represent connections between sensors • Solid lines represent the routing tree imposed on top of this graph to allow sensors to propagate data to the root along a single path.

Sensors in fig. (a) are labeled with their distance from the root, i.e. the number of messages required to get each sensor's data to the host PC. • Summing these numbers gives the 16 messages required to route all aggregation information to the root. • In fig. (b), sensors with no children simply transmit their readings to their parents. • Only one message is sent along each edge, because aggregation is performed by the sensors themselves. • Intermediate nodes combine their own readings with the readings of their children via the aggregation function f and propagate the partial aggregate, along with any additional data required to update the aggregate, up the tree.
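The difference between the two message counts can be checked on any routing tree. The sketch below uses a small hypothetical tree, not the one from the figure: the centralized cost is the sum of node depths, while the in-network cost is one message per tree edge.

```python
def message_counts(parent):
    """parent maps each node to its parent (the root maps to None)."""
    def depth(node):
        d = 0
        while parent[node] is not None:
            node = parent[node]
            d += 1
        return d

    # Centralized: every reading is forwarded hop by hop to the root.
    centralized = sum(depth(n) for n in parent)
    # In-network: each non-root node sends one partial aggregate to its parent.
    in_network = sum(1 for n in parent if parent[n] is not None)
    return centralized, in_network

# Hypothetical six-node routing tree.
parent = {"root": None, "a": "root", "b": "root", "c": "a", "d": "a", "e": "c"}
centralized, in_network = message_counts(parent)
```

For this tree the centralized approach needs 9 messages and the in-network approach needs 5; the gap widens as the tree gets deeper.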

The amount of data transmitted in this solution depends on the aggregate. • The focus is on a class of aggregation predicates that is particularly well suited to the in-network regime. Such aggregates can be expressed as an aggregate function f over the sets a and b such that f(a U b) = g(f(a), f(b)). • We assume that aggregate queries do not specify groups.
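AVERAGE is a convenient example of the property f(a U b) = g(f(a), f(b)), provided the partial state carried up the tree is a (sum, count) pair rather than the average itself. The sketch below is illustrative only; `finalize` is a hypothetical helper that turns the partial state into the final answer at the root.

```python
def f(values):
    """Partial state for AVERAGE over a set of readings: (sum, count)."""
    return (sum(values), len(values))

def g(x, y):
    """Combine two partial states without access to the raw readings."""
    return (x[0] + y[0], x[1] + y[1])

def finalize(state):
    """Turn the partial state into the average (done once, at the root)."""
    return state[0] / state[1]

a = [20, 22]   # readings seen in one subtree
b = [30]       # readings seen in another subtree
combined = g(f(a), f(b))
avg = finalize(combined)
```

Averaging the two sub-averages directly (21 and 30) would give 25.5, which is wrong; carrying the count alongside the sum is the "additional data required to update the aggregate".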

Injecting a Query • Computing an aggregate consists of two phases. • Propagation phase: aggregate queries are pushed down into the sensor network. • Aggregation phase: aggregate values are propagated up from the children to the parents. • In the network discovery algorithm, leaf nodes must discover that they are leaves and propagate singular aggregates up to their parents.

Thus when a sensor p receives an aggregate a, either from another sensor or from the user, it retransmits a and begins listening. • If p has any children, it will hear them retransmit a to their own children and will know that it is not a leaf. • If, after some time t, p has heard no children, it concludes that it is a leaf and transmits its current sensor value up the routing tree. • If p has children, they will report within time t, and p then computes the value of a applied to its own value and the values reported by its children.

Choosing too short a duration for t leads to missed reports from children. • When injecting a query, the time interval is therefore set long enough for messages to propagate down to the leaves and back up the routing tree. • t = 2 * (d_p - d_tree) * (t_xmit + t_process), where t_xmit is the time to send a message and t_process is the time to process an aggregation request. • This approach is undesirable because the computation takes a long time. • The major limitation of the tree-based routing approach is that it is not suitable for peer-to-peer routing.
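As a rough worked example of the timeout formula, the sketch below treats the depth term as a non-negative number of levels between the node and the deepest leaf. That reading is an assumption, as the slide does not define the two depth symbols; the parameter name `levels_below` and the millisecond values are hypothetical.

```python
def aggregation_timeout(levels_below, t_xmit, t_process):
    """Time for a request to travel down `levels_below` hops and back up.

    levels_below: assumed number of tree levels between this node and the
                  deepest leaf (stands in for the depth difference).
    t_xmit:       time to send one message.
    t_process:    time to process an aggregation request at one node.
    """
    return 2 * levels_below * (t_xmit + t_process)

# Hypothetical timings in milliseconds.
timeout = aggregation_timeout(levels_below=3, t_xmit=100, t_process=50)
```

With these numbers a node three levels above the deepest leaf would wait 900 ms, which illustrates why the wait grows quickly with tree depth.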

Streaming Aggregates • Sensor networks are inherently unreliable. • Individual radio transmissions can fail. • Nodes can move. • All this makes it difficult to guarantee that a given portion of the network was not detached during a particular aggregate computation. • E.g.: if p broadcasts a and its only child c misses the message for some reason, p will never hear c rebroadcast, so the entire network below p is excluded from the aggregate computation and the end result is probably incorrect.

This problem can be addressed by double-checking aggregates, i.e. requesting that the aggregate be computed multiple times at the root of the network. • The drawback of this technique is that it requires retransmitting the aggregate request down the network multiple times, at a significant message overhead. • Instead, a pipelined aggregate scheme is proposed, in which time is divided into intervals of duration i.

Properties of Pipelined Aggregate • After aggregates have propagated up from the leaves, a new aggregate arrives every i seconds. • t is the total time for an aggregation request to propagate down to the leaves and back to the root, but the user begins to see approximations of the aggregate after the first interval has elapsed. • The reported aggregate changes as the sensor readings and the underlying network change. • The drawback of this approach is that a number of additional messages are transmitted to extract the first aggregate over all sensors. • This scheme improves the robustness and throughput of aggregates.
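The behaviour described above can be simulated for a COUNT aggregate. This is a simplified sketch under stated assumptions: in every interval each node sends its own contribution combined with whatever its children sent in the previous interval, no messages are lost, and the tree and function name are hypothetical.

```python
def pipelined_count(parent, intervals):
    """Return the COUNT value seen at the root after each interval."""
    children = {n: [] for n in parent}
    for node, p in parent.items():
        if p is not None:
            children[p].append(node)
    root = next(n for n, p in parent.items() if p is None)

    last_sent = {n: 0 for n in parent}   # partial count each node sent last interval
    seen_at_root = []
    for _ in range(intervals):
        # Own contribution (1) plus the children's values from the previous interval.
        now = {n: 1 + sum(last_sent[c] for c in children[n]) for n in parent}
        last_sent = now
        seen_at_root.append(now[root])
    return seen_at_root

# Hypothetical five-node tree with depth 3.
parent = {"root": None, "a": "root", "b": "root", "c": "a", "d": "c"}
counts = pipelined_count(parent, intervals=5)
```

The root sees an approximation after the first interval, and the count becomes exact once values from the deepest node have had time to travel up, after which a fresh aggregate arrives every interval.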

Pipelined Computation of Aggregates

Shared Channel • The previous algorithms have ignored the fact that sensors communicate over a shared radio channel. • A shared channel increases message efficiency. • It increases the number of sensors participating in any aggregate. • It reduces the number of messages sent, through snooping. • The inherently broadcast nature of radio offers communication redundancy, which improves reliability.

Multiple parents issue • Consider the case where sensor s sends a count c to a single parent: if p is the probability that the message is delivered, the expected value of the transmitted count is p * c and its variance is c^2 * p * (1 - p).
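The expectation and variance can be worked through numerically. The sketch below also covers splitting the count equally across several parents; the equal split and the independence of losses between parents are assumptions made for illustration, and `count_stats` is a hypothetical helper.

```python
def count_stats(c, p, parents=1):
    """Mean and variance of the count received when c is split across parents.

    Assumes each parent receives its share independently with probability p.
    """
    share = c / parents
    mean = parents * p * share                       # = p * c
    variance = parents * share**2 * p * (1 - p)      # = c^2 * p * (1 - p) / parents
    return mean, variance

single = count_stats(c=10, p=0.5)            # one parent receives the whole count
split = count_stats(c=10, p=0.5, parents=2)  # two parents receive c/2 each
```

Under these assumptions the expected value is unchanged by splitting, while the variance is halved with two parents.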

Hypothesis Testing • The main drawback of the shared channel approach is that it requires input from every node in the network to compute an aggregate. • Hypothesis testing: when computing a MAX or MIN, a sensor can snoop on the values its peers report and omit its own value if it is aware that it cannot affect the final value of the aggregate.

In this approach, when computing a MIN, leaf nodes need not send a message if their value is greater than the minimum observed over the top k levels. • If we assume sensor values to be independent and randomly distributed, then a particular node must transmit with probability 1/2^k, which is low even for small values of k.
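A minimal sketch of hypothesis testing for MIN follows. It assumes the minimum over the top levels is broadcast as a hypothesis and that a lower node reports only if its value is strictly smaller; the function name and the readings are hypothetical.

```python
def min_with_hypothesis(top_values, lower_values):
    """Compute MIN while suppressing reports that cannot change the result."""
    hypothesis = min(top_values)                      # minimum over the top levels
    # Only nodes that can lower the minimum need to transmit.
    reports = [v for v in lower_values if v < hypothesis]
    result = min([hypothesis] + reports)
    return result, len(reports)

# Invented readings: three nodes near the root, five nodes further down.
result, messages = min_with_hypothesis([12, 9, 15], [20, 7, 11, 9, 30])
```

Only one of the five lower nodes has to send a message here, yet the result is still the true minimum.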

Summary of Techniques • Pipelining aggregates increases throughput and smooths over the intermittent losses inherent in radio communication. • Snooping on the radio reduces message load and improves the accuracy of aggregates. • Hypothesis testing inverts problems and further reduces the number of messages sent.

Grouping • Grouping computes aggregates over partitions of sensor readings. The basic technique is to push down a set of predicates that specify group membership, ask sensors to choose the group they belong to, and then, as answers flow back, update the aggregate values in the appropriate groups. • Each group predicate specifies a group id, a sensor attribute and a range of sensor values. • Groups are assumed to be disjoint and defined over the same attribute, which may not be the attribute being aggregated.

When a sensor that is not a leaf receives a message from a child, it checks the group number. • If the child is in the same group as the sensor, it combines the two values. Otherwise it stores the value of the child's group along with its own value for forwarding in the next interval. • The HAVING predicate is only sent into the network if it can potentially be used to reduce the number of messages that must be sent. E.g.: with the predicate MAX(attr) > x, information about groups with MAX(attr) <= x need not be transmitted up the tree.
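The group assignment and merge steps can be sketched as follows. The predicate layout (group id, lower bound, upper bound), the half-open ranges, and the use of (sum, count) partial states are assumptions made for illustration.

```python
def assign_group(value, predicates):
    """predicates: list of (group_id, low, high); a value matches if low <= value < high."""
    for group_id, low, high in predicates:
        if low <= value < high:
            return group_id
    return None   # the reading falls outside every group

def merge_partial(own, child):
    """Merge {group_id: (sum, count)} maps from a node and one of its children."""
    merged = dict(own)
    for gid, (s, n) in child.items():
        if gid in merged:                    # same group: combine the two values
            ms, mn = merged[gid]
            merged[gid] = (ms + s, mn + n)
        else:                                # different group: store for forwarding
            merged[gid] = (s, n)
    return merged

# Hypothetical predicates over temperature: group 1 is [0, 20), group 2 is [20, 40).
predicates = [(1, 0, 20), (2, 20, 40)]
own_gid = assign_group(25, predicates)       # this node reads temperature 25
own = {own_gid: (60, 1)}                     # and light 60
child = {2: (40, 1), 1: (90, 2)}             # partial light sums reported by a child
merged = merge_partial(own, child)
```

Note that the grouping attribute (temperature) differs from the aggregated attribute (light), matching the assumption stated in the slide.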

Since the number of groups can exceed the available storage on any one sensor, a way to evict groups is needed. • Evicting partially computed groups is known as partial pre-aggregation. Evicting groups with low membership is likely a good policy, as they are the least likely to be combined with other sensor readings. • Evicting groups forces information about the current time interval into higher-level nodes in the tree.
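One way to express the low-membership eviction policy is sketched below. The storage layout (partial value plus member count per group) and the function name are hypothetical; the evicted entry would be forwarded to the parent to free space.

```python
def evict_smallest_group(groups):
    """Remove and return the group with the fewest members.

    groups: {group_id: (partial_value, member_count)}; modified in place.
    """
    victim = min(groups, key=lambda gid: groups[gid][1])
    return victim, groups.pop(victim)

# Hypothetical partial aggregates held by one sensor.
groups = {1: (90, 2), 2: (100, 5), 3: (40, 1)}
victim, partial = evict_smallest_group(groups)
```

Group 3 is evicted because it has only one member, leaving room for the groups that are more likely to absorb further readings.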

In the above diagram of the standard pipelined scheme, aggregates are computed over values from the previous time interval; this presents an inconsistency.

Related Work • The Cougar project at Cornell discusses queries over sensor networks, but only considers moving selection operators onto sensors. • USC/ISI and UCLA have contributed work on networking within the sensor network community. • Our solution builds on a number of papers published by the TinyOS group at UC Berkeley, which describe the design of motes, TinyOS, and the implementation of the networking protocols used to construct ad-hoc sensor networks.

Future Work • Researchers at UC Berkeley are currently working with the sensor testbed built by the TinyOS group to empirically verify algorithms presented in this paper. • There is a need for experimental & mathematical validation of many techniques presented in this paper.

Challenges • The authors have not explored the tradeoffs between fully pipelined communication and techniques such as sending values only when sensor readings change. • It is not clear how this approach will behave when sensors move. The routing tree construction algorithm allows moving nodes to reattach, but it is unclear how movements and disconnections affect the value of aggregates. • The problem of computing multiple simultaneous aggregates over a single sensor network has yet to be explored.

Conclusion • By applying generic aggregation operations, this approach offers the ability to query arbitrary data in a sensor network. • It is possible to compute aggregates robustly while providing rapid and continuous updates of their value to the user by pipelining the flow of data through the sensor network. • By snooping on messages in the shared channel and applying techniques for hypothesis testing, it is possible to improve performance.
