140 likes | 264 Views
Optimizing Query Processing In Sensor Networks. Ross Rosemark. Our Research. We argue that the sensor network can be viewed as a database where each node is a table.
E N D
Optimizing Query Processing In Sensor Networks Ross Rosemark
Our Research • We argue that the sensor network can be viewed as a database where each node is a table. • Under this view.. we argue that a query plan can now dictate how to abstract data from the sensor network rather than a sensor node.
Our Goal • Given a query Q2 • Define how to process the query • What metadata (if any) should be collected • What query plan should a node utilize to abstract data from it’s local sensors • What is the routing infrastructure of the query • Example • Given Q2 • Define cost of • Collecting metadata + Execution Cost + Routing cost • Define cost of • Not collecting metadata + execution cost + routing cost • Choose the lowest cost
Idea • Evaluate multiple different infrastructures for a query • Choose the infrastructure that utilizes the least energy • The * operator means aggregation • Not database aggregation (i.e. Sum, Count) but rather aggregation that is discussed in networks
Research Issue • We use metadata to evaluate different query plans • Metadata becomes an important research issue • Which nodes should send metadata to the AP • What metadata does the AP require • We do an on demand approach in terms of collecting metadata
Metadata Collection • Algorithm to collect metadata • Only nodes participating in query send metadata • Algorithm • Access Point routes query to spatial area utilizing GPSR • Utilizing GPSR query is routed around spatial area • Each node on perimeter of spatial area floods msg inside spatial area • Each node in spatial area sends metadata to the AP utilizing GPSR
Metadata For a given query Q1 • Initially the access point knows: • The number of nodes in the network (N) • The spatial area of the network (SA) • The query area (QA)…. (we only consider spatial queries) • A histogram that represents the selectivity of each attribute • Bad representation • Using this information • Query Plan 1 • Estimate metadata collection cost • Estimate query execution cost if metadata is collected • Estimate result collection cost • Query Plan 2 • Estimate query execution cost if metadata is not collected • Estimate result collection cost if metadata is not collected • If (Query Plan 1 > Query Plan 2) • Choose Query Plan 1 • Else • Choose Query Plan 2
Metadata Collection • When metadata is collected • nodes participating in a query send • the selectivity of each of the queries predicates • It’s longitude and latitude • Example • Query 1-> Select * From Sensors Where Light > 10 • Node participating in query send the selectivity of the predicate Light > 10 • Node participating in query send Longitude/Latitude (i.e. Longitude = 1002.3 Latitude = 2003.1
Metadata • If query 2 now comes in and covers the same/subset of the spatial area of query 1 then we evaluate the following: • Should we collect more metadata, or just optimize with our current metadata • Estimate the metadata collection cost • Estimate query execution cost • Estimate routing cost • This is a repeat of the initial problem • Our estimates are now better though
Results • Estimated mathematically the energy associated with • Metadata collection • Query Execution • Ran simulations to get real values for these metrics • In simulations inserted 1 spatial query into the network • Ran this experiment varying • The query (spatial area) (predicates) • Topology (5 different topologies) • Metadata (5 different metadata distributions)