
Data-centric view of sensornets: An Overview




  1. Data-centric view of sensornets: An Overview Puru Kulkarni Vijay Sundaram Bhuvan Urgaonkar

  2. Motivation • Ubiquitous presence of sensor networks • Communication, computation, limited storage, sensing capabilities • Used to sense, actuate, control • Sensors everywhere = Data everywhere! • Require an infrastructure for data access and storage

  3. Overview • Sensors sense/generate data • Users/Applications interested in data or some measure of data • Common user operations are: • Queries and Monitoring • Actuate and Control

  4. Typical Queries • Historical • What is the average rainfall over the past 2 days? • Current • What is the current temperature in Rm# 226? • Long Running • Temperature in Rm# 226 over the next 4 hours, every 30 seconds

  5. Issues • How to identify relevant sensors? • Computation vs. Communication tradeoff • Where to process query? • inside the sensor network (route query) • Need new techniques • at a centralized location (route data) • Large amounts of data transfer (not efficient) • Data gathering may not reflect query rate • How to process query? • queries on streaming data

  6. DataSpace: Querying and Monitoring Deeply Networked Collections in Physical Space (T. Imielinski and S. Goel, Rutgers University) • Billions of objects populate space • Each produces and locally stores data • Location aware • Can be selectively monitored, queried and controlled • Physical world enhanced with data

  7. Characteristics • Dataspace • Data lives on the object • Users access not only “local” information but can navigate entire dataspace • Spatial world divided in 3-D datacubes • CS Bldg. , street, block etc • Communication, messaging and computation techniques for querying and monitoring required

  8. Querying and Monitoring • Queries are spatially driven • Steps: • Identify relevant datacubes • Identify relevant nodes (dataflocks) • Datacube directory service • Aggregation for queries on several datacubes • e.g.: Information about Manhattan taxi cabs

  9. Architecting DataSpace • Network as DataSpace engine • multicast mechanisms (each node has an IP address!) • group membership based on • physical location • attribute (temperature, #vehicles etc) • multicast fits selective node addressing criteria to access relevant data • e.g.: what is the average temperature in CS Bldg? • Query reaches only sensors that are in the CS Bldg datacube and have the corresponding group address

  10. Network as DataSpace engine • DataSpace address = <space-handle> <subject-handle> • <space-handle>: based on location of datacube • <subject-handle>: based on attribute of interest • Space Handle encodes datacube information • Subject Handle encodes the attributes that are part of a multicast group • DataSpace address is an IPv6 multicast address • E.g.: Space handle: 224.4.5, Subject handle: 8, DataSpace address: 224.4.5.8
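
The address composition on this slide can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the dotted notation and the 0 to 255 range for the subject handle are assumptions taken from the slide's example rather than from the DataSpace design itself.

```python
def dataspace_address(space_handle: str, subject_handle: int) -> str:
    """Append a subject handle to a space handle, as in the slide's example."""
    # Assumption: the subject handle fits in one dotted component (0..255).
    if not 0 <= subject_handle <= 255:
        raise ValueError("subject handle must be in the range 0..255")
    return f"{space_handle}.{subject_handle}"


def split_dataspace_address(address: str) -> tuple[str, int]:
    """Recover (space handle, subject handle) from a composed address."""
    space_handle, _, subject = address.rpartition(".")
    return space_handle, int(subject)
```

With the slide's values, `dataspace_address("224.4.5", 8)` gives `"224.4.5.8"`: the space handle selects the datacube and the subject handle selects the attribute group inside it.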

  11. Geographic Routing infrastructure • Route message based on physical location rather than IP address • Use GPS coordinates for locations • Avoids use of multicast for routing queries to datacubes • Once query reaches a region use multicast

  12. Geographic Routing infrastructure • Geo-router (routes based on datacube location) • Geo-node (issues query to nodes in datacube) • Geo-host (processes geographic messages) • Approach • Route query to datacube • Geo-nodes route query within datacube • multicast with a TTL of 1

  13. The Sensor Network as a Database • Govindan, Hellerstein, Hong, Madden, Franklin, Shenker • Querying the Physical World • Bonnet, Gehrke, Seshadri

  14. Sensornet Database architecture • Given a routing and access mechanism, how to process queries? • Provide a DB-view to users/apps • well understood programming interface • common data operations use computation in network • help energy-efficiency • allow users to be unaware of actual network, but treat it as a database • Sensor Network + Data => Sensor Network Database

  15. What is required? • Core DB operations tailored for sensor networks • Design appropriate building blocks for DB operations • Join, aggregation, grouping, selection etc

  16. Sensornet Database Architecture • Two important ideas: • in-network implementations of primitive database query operators such as grouping, aggregation, and joins • group communication and routing protocols, with possible processing at intermediate nodes, implement the operator in an application-independent way

  17. Sensornet Database Architecture • Relax the semantics of database queries to allow approximate results • relaxation enables energy-efficient implementations even given the expected high level of network dynamics • A sensor network is a proxy for a continuous real-world phenomenon, and by nature samples that phenomenon discretely at some rate, with some degree of error.

  18. In-network Implementation • JOIN operator • selection over cross-product of a pair of tables • Tuples generated at different nodes might be joined at a single node • Some JOIN implementations are blocking • Blocking is infeasible in sensor networks • tables can contain unbounded streams of data • amount of memory available is limited • Need to retool these operations • Pipelining • Partitioning

  19. Non-Blocking Pipelined Joins • Symmetric hash-join: • Maintains two hash tables (keyed by the column(s) used for the join) • On an input tuple, looks up matching tuples from other input’s hash table • Outputs any matching results • Ripple joins: • Statistically sample the two tables to be joined, in order to produce a stream of joined tuples • Relative rates at which the two tables are sampled adapt to match the variance produced by the data in each • low energy approach to obtain approximate answers
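
A minimal sketch of the symmetric hash-join described on this slide, in Python. The class name, the `"left"`/`"right"` labels and the tuple layout are assumptions made for illustration; the hash tables here grow without bound, so a real sensor node would also need some eviction or windowing policy that the slide does not specify.

```python
from collections import defaultdict


class SymmetricHashJoin:
    """Non-blocking join: each arriving tuple is stored, then probed against the other side."""

    def __init__(self):
        # One hash table per input, keyed by the join column value.
        self.tables = {"left": defaultdict(list), "right": defaultdict(list)}

    def insert(self, side, key, row):
        """Add `row` to `side` and return the joined pairs it produces right away."""
        if side not in self.tables:
            raise ValueError("side must be 'left' or 'right'")
        other = "right" if side == "left" else "left"
        self.tables[side][key].append(row)
        matches = self.tables[other].get(key, [])
        # Always emit pairs as (left row, right row), whichever side arrived last.
        if side == "left":
            return [(row, match) for match in matches]
        return [(match, row) for match in matches]
```

Because every call to `insert` returns whatever matches exist so far, results stream out as tuples arrive and neither input has to be read completely first, which is the pipelining property the slide asks for.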

  20. Partitioning • Partitioning: • tuples are partitioned based on their join-column values and redistributed on the fly across multiple nodes; • the work of joining the individual partitions is done in parallel by each of the nodes • Partitions can be defined by value, geographically, or by sensor type, and a node (or nodes) can be designated to perform the join for the partition
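
The partitioning idea can be sketched as follows. This is an illustrative Python fragment, not the scheme from the cited papers: partitions are defined by value using `hash(key) % num_nodes`, the function names are hypothetical, and the loop over nodes stands in for work that would run in parallel on separate devices.

```python
def assign_node(key, num_nodes):
    """Map a join-column value to the node responsible for its partition."""
    return hash(key) % num_nodes


def partitioned_join(left, right, num_nodes):
    """Join two lists of (key, value) tuples, one partition at a time."""
    left_parts = [[] for _ in range(num_nodes)]
    right_parts = [[] for _ in range(num_nodes)]
    # Redistribute tuples so that equal keys always land on the same node.
    for key, value in left:
        left_parts[assign_node(key, num_nodes)].append((key, value))
    for key, value in right:
        right_parts[assign_node(key, num_nodes)].append((key, value))

    results = []
    # Each iteration is independent of the others, so nodes could do this in parallel.
    for node in range(num_nodes):
        for left_key, left_value in left_parts[node]:
            for right_key, right_value in right_parts[node]:
                if left_key == right_key:
                    results.append((left_key, left_value, right_value))
    return results
```

Since tuples with the same join value always land in the same partition, the union of the per-node results equals the full join.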

  21. In-network Implementation • Aggregation operators • summarization of a column(s) into a single numerical value, e.g. SUM, COUNT, AVERAGE, MIN, MAX etc • query flooded in the network and the responses are routed on the reverse path trees • results aggregated across several nodes • E.g.: to calculate AVERAGE each node returns (SUM, COUNT) values to parent • Can be a very common operator
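
The AVERAGE example on this slide can be sketched in Python as below. The tree is given as a hypothetical `children` dictionary and each node holds one reading in `readings`; both structures, and the function names, are assumptions for illustration only.

```python
def partial_average(node, readings, children):
    """Return the (SUM, COUNT) pair that `node` would send to its parent."""
    total = readings[node]
    count = 1
    # Merge the partial states reported by each child subtree.
    for child in children.get(node, []):
        child_sum, child_count = partial_average(child, readings, children)
        total += child_sum
        count += child_count
    return total, count


def network_average(root, readings, children):
    """Compute the final AVERAGE at the root from the merged (SUM, COUNT)."""
    total, count = partial_average(root, readings, children)
    return total / count
```

Each node forwards a single (SUM, COUNT) pair regardless of how many descendants it has, which is why AVERAGE is carried as two values rather than as a running average: averages of subtrees cannot be combined correctly without their counts.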

  22. Distributed Sensnet DBs • How to represent devices in DBs on sensornets? • ADTs (Abstract Data Types) • Methods correspond to sensing functionality • Virtual Relations (VRs) store local data • Network used for query operations

  23. Virtual Relation • VR with attributes as • Inputs to an ADT (device) function • Arguments to an ADT function • Output of the function • Timestamp of the function

  24. Virtual Relation • Some VR properties • records are never updated or deleted • is naturally partitioned over the sensnet (each device takes care of its set of VR records) • What does this mean? – a distributed DB • Records from the VRs (distributed over the devices) are processed using distributed query execution plans
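
One way to picture a virtual relation is as an append-only log held on each device. The Python sketch below is an assumption-laden illustration: the field names, the `VirtualRelation` class and its methods are hypothetical and only mirror the attributes and properties listed on the two slides above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a record cannot be updated after creation
class VRRecord:
    device_id: str
    function: str
    arguments: tuple
    output: float
    timestamp: float


class VirtualRelation:
    """The fragment of a virtual relation stored on a single device."""

    def __init__(self, device_id):
        self.device_id = device_id
        self._records = []

    def append(self, function, arguments, output, timestamp):
        """Record one invocation of a device function; there is no update or delete."""
        record = VRRecord(self.device_id, function, tuple(arguments), output, timestamp)
        self._records.append(record)
        return record

    def select(self, predicate):
        """Evaluate a selection locally, as one step of a distributed query plan."""
        return [record for record in self._records if predicate(record)]

    def records(self):
        return tuple(self._records)
```

A query plan would run `select` on each device and combine the partial results in the network, which is what makes the collection of per-device fragments behave like one distributed table.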

  25. Approximate Results • Energy-efficiency can be achieved using approximate aggregates • Uniform sampling: • Tuples are uniformly sampled and the resulting average is assumed to represent the actual average • Packet loss might invalidate the statistical assumptions that the resulting confidence intervals depend on. • Logarithmic sampling: • The number of respondents (or the size of memory needed for the count) scales logarithmically with the size of the network • Provides looser error bounds but uses significantly less memory or communication.
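
Uniform sampling for an approximate AVERAGE can be sketched as below. This Python fragment is illustrative only: the function name and the `rng` parameter are hypothetical, and it ignores packet loss, which the slide notes can break the underlying statistical assumptions.

```python
import random


def sampled_average(readings, sample_size, rng=None):
    """Estimate the mean of `readings` from a uniform sample without replacement."""
    if not 1 <= sample_size <= len(readings):
        raise ValueError("sample_size must be between 1 and len(readings)")
    rng = rng or random.Random()
    # Only `sample_size` nodes need to respond, which is where the energy saving comes from.
    sample = rng.sample(readings, sample_size)
    return sum(sample) / len(sample)
```

The estimate becomes exact when `sample_size` equals the number of readings; smaller samples trade accuracy for fewer transmissions.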

  26. Complex query evaluation • R x S x T • What order to follow? • (R x S) x T or R x (S x T) or (R x T) x S • Decided by query optimizer • Usually depends on table size • With Sensornet DB • Need adaptive policy to route tuples based on • Energy consumption • Topology • Loss rates
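
As a rough illustration of ordering by table size, the Python sketch below picks the pair of tables with the smallest cross-product to join first. This is a deliberately simple heuristic chosen for the example, not the optimizer from the cited work, and it ignores the energy, topology and loss-rate factors that the slide says a sensornet would also need to consider.

```python
from itertools import combinations


def pick_first_join(sizes):
    """Return the pair of table names whose cross-product is smallest."""
    # `sizes` maps a table name to its (estimated) number of tuples.
    pairs = combinations(sorted(sizes), 2)
    return min(pairs, key=lambda pair: sizes[pair[0]] * sizes[pair[1]])
```

For sizes R = 1000, S = 10 and T = 50, the function returns `("S", "T")`, which corresponds to the order R x (S x T).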

  27. Conclusions • Explosion of data from sensor networks needs an infrastructure for access, storage etc • Organizing sensors • Datacubes • Other techniques? • Identifying relevant sensors is a prerequisite to fetching data • Dataspace provided two solutions • Other approaches?

  28. Conclusions • Sensornets as Distributed DB • Provide a database view to sensornet data • Pros • App development easy • In-network processing helps resource usage • Cons • Distributed DB can be difficult • Requires retooling DB operations for sensornets • Other approaches?

  29. Representations for Device Functions • Internal Representation • We can’t use traditional OO DB methods • - they all demand immediate access • - with the asynchronous nature of sensnets this is unacceptable

  30. Overview • Direction of sensor networks progress • Small form-factor devices • On-board computation • Wireless communication • Increased sensing capabilities • Improved OS and networking functionalities • Prediction: • Every device (> $1) will have some sensor • Ubiquitous presence of sensor networks

  31. Overview • Typical sensor networks usage: • Sense, collect and convey data • Provides a ubiquitous computing platform • Applications query/monitor sensed data • Ecosystem dynamics • Temperature/weather sensing • Automobile traffic analysis • Data-centric network, generated data more important than node identity

  32. Requirements • Addressing • Identify relevant sensors • How to access/process data? • Communicate data and process centrally • Compute query at node and perform DB operations • Interface for querying/monitoring and control

  33. What to do with data? • Answer queries/give useful info • How ?? • Centralized approach • Communicate data • Store and process all data at central location (traditional DB approach) • Is all temporal data to be stored? • Communication overhead?

  34. What to do with data? • De-centralized approach • Communicate query (query routing) • Required data attribute of node • Node stores and communicates data to queries • Processing at node • Computation overhead • Computation overhead smaller than communication! • How to aggregate data? • How to route queries? • How to map nodes to addresses for communication purposes?

  35. Need for Decentralization • Centralized (Traditional databases) • Inefficient use of resources • Large amounts of data communicated to central location • All sensors send data all the time • Dissociates access to device from query load • Communication more expensive than computation • Decentralized (Distributed DBs) • Data on devices • In-network query processing

  36. Pipelining Benefits • Provide streamed partial answers, hence, can enable query refinement • Schemes like ripple joins form a low energy approach to obtain approximate answers and can be used together with sampling
