Data Centric Storage: GHT • Brad Karp, UCL Computer Science • CS 4C38 / Z25 • 17th January, 2006
One View of Sensor Networks: Querying Zebra Sightings • User is remote, connected via a base station • How do users pose queries? • By event name (e.g., “Zebras?”): Query(“Zebra”) returns {(“Zebra”, i, [u, v]); (“Zebra”, j, [x, y])} • Geographic Hash Table (GHT) • In-network storage of data • Data placement and query routing built on geographic routing [Figure: a remote user asks “Zebra?”; node i at (u, v) and node j at (x, y) have recorded zebra sightings]
Problem: Data Dissemination in Sensornets • Sensors numerous and widely dispersed • Sensed data must reach remote user • Data dissemination problem: • How best can we supply measured data to users? • Design drivers for system: • Energy scarce • Wireless medium prone to contention
Context: Directed Diffusion [Estrin et al., 2000] • Data-centric routing: flood queries (interests) by name, e.g., “Zebra?” • Return any responses along reverse paths [Figure: node i at (u, v) returns (“Zebra”, i, [u,v]) and node j at (x, y) returns (“Zebra”, j, [x,y]) toward the querier]
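The flood-then-reverse-path pattern can be sketched on a toy graph. This is a minimal illustration, assuming an idealized connectivity graph given as an adjacency dict, loss-free broadcasts, and BFS order standing in for whichever neighbor's copy of the interest arrives first. The function names are hypothetical, and Directed Diffusion's gradient and reinforcement machinery is deliberately left out.

```python
from collections import deque

def flood_interest(adj, sink):
    """Flood an interest from `sink` (BFS); record each node's reverse-path next hop."""
    parent = {sink: None}
    queue = deque([sink])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in parent:  # first copy of the interest to arrive wins
                parent[nbr] = node
                queue.append(nbr)
    return parent

def reverse_path(parent, source):
    """Nodes a response visits travelling from `source` back to the sink."""
    path = [source]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

def diffusion_query(adj, sink, detections, name):
    """Flood an interest for `name`; each node holding matching events replies
    along its reverse path. Returns (responses, packet transmissions)."""
    parent = flood_interest(adj, sink)
    tx = len(parent)  # naive flood: every reached node broadcasts the interest once
    responses = []
    for node, events in detections.items():
        if node not in parent:
            continue  # node never heard the interest
        for event in events:
            if event[0] == name:
                tx += len(reverse_path(parent, node)) - 1  # one tx per hop
                responses.append(event)
    return responses, tx

# line topology 0-1-2-3, querier at node 0
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
detections = {3: [("Zebra", 3, (30.0, 0.0))], 1: [("Cat", 1, (10.0, 0.0))]}
print(diffusion_query(adj, 0, detections, "Zebra"))  # ([('Zebra', 3, (30.0, 0.0))], 7)
```

The flood costs one transmission per node regardless of how many nodes hold matching events, which is why flooding-based schemes scale poorly in total message count as the network grows.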
Assumptions, Metrics, Terminology • Large-scale networks with known geographic boundaries • Users on the WAN; a few access points (APs) with WAN uplinks • Nodes know their own geographic locations (often needed anyway to annotate sensed data) • Energy metrics • Total usage: total number of packet transmissions • Hotspot usage: maximum number of transmissions by any one node • Event: discrete, named object recognized by a sensor (e.g., “Zebra”) • Query: request from a user for data under the same naming scheme
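Both energy metrics can be read off a log of transmissions. A minimal sketch, assuming a hypothetical log format with one node id per packet transmitted; `energy_metrics` is an illustrative name, not part of any GHT codebase.

```python
from collections import Counter

def energy_metrics(tx_log):
    """Return (total_usage, hotspot_usage, hotspot_node) for a transmission log.

    tx_log -- iterable of node ids, one entry per packet transmission (assumed format)
    """
    counts = Counter(tx_log)
    total = sum(counts.values())  # total usage: every transmission in the network
    if total == 0:
        return 0, 0, None
    node, hotspot = counts.most_common(1)[0]  # hotspot usage: busiest single node
    return total, hotspot, node

# e.g., the access point relays three of the five packets
print(energy_metrics(["ap", "a", "ap", "b", "ap"]))  # (5, 3, 'ap')
```

Tracking the hotspot separately matters because the busiest node (typically the access point) exhausts its battery first, even when total usage looks modest.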
Outline • Motivation and Context • Canonical Data Dissemination Approaches • Geographic Hash Table (GHT) Service • Evaluation in Simulation • Summary
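Canonical Approach: Local Storage • Each node stores the events it detects; queries (e.g., “Zebra?”) are flooded to all nodes • For n nodes, Q event names queried for, and $D_q$ events detected with those names, cost (in pkts): • Total: $Qn + D_q\sqrt{n}$ • Hotspot (at access point): $Q + D_q$ [Figure: node i at (u, v) and node j at (x, y) answer the flooded query with (“Zebra”, i, [u,v]) and (“Zebra”, j, [x,y])]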
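Canonical Approach: External Storage • Every detected event is sent out of the network to external storage via the access point • For n nodes and $D_t$ total events detected, cost (in pkts): • Total: $D_t\sqrt{n}$ • Hotspot (at access point): $D_t$ [Figure: events (“Cat”, k, [s,t]) from node k at (s, t), (“Zebra”, i, [u,v]) from node i at (u, v), and (“Zebra”, j, [x,y]) from node j at (x, y) all flow out to external storage]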
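Canonical Approach: Data-Centric Storage (DCS) • Events stored in-network at a node determined by event name; the user's query (e.g., “Zebra?”) is routed directly to that node • For n nodes, $D_t$ events detected in total, Q names queried, and $D_q$ events detected with those names, cost (in pkts): • Total (full enumeration): $Q\sqrt{n} + D_t\sqrt{n} + D_q\sqrt{n}$ • Total (summarization): $Q\sqrt{n} + D_t\sqrt{n} + Q\sqrt{n}$ • Hotspot (full enumeration, at access point): $Q + D_q$ • Hotspot (summarization, at access point): $2Q$ [Figure: node i at (u, v) and node j at (x, y) store their zebra events at the node responsible for “Zebra”, where the user's query is answered]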
Cost Comparison of Canonical Approaches • Local storage incurs greatest total message count as n grows • External storage always sends fewer total messages than DCS • When many more event types are detected than queried for, DCS incurs least hotspot message count • DCS permits summarization of events (return multiple events in one packet)
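The comparison above can be checked numerically. This sketch assumes the usual asymptotic DCS cost model: flooding a query costs about n transmissions, routing one packet to a single point costs about √n hops, and the hotspot is the access point. The function name `dissemination_costs` and its dictionary keys are hypothetical.

```python
import math

def dissemination_costs(n, Q, Dq, Dt):
    """Return {approach: (total_pkts, hotspot_pkts)} under the assumed cost model.

    n  -- number of nodes
    Q  -- number of event names queried for
    Dq -- number of detected events with those names
    Dt -- total number of detected events (all names)
    """
    hop = math.sqrt(n)  # assumed hop count for one point-to-point route
    return {
        # flood each query to all n nodes; route each matching event back
        "local": (Q * n + Dq * hop, Q + Dq),
        # route every detected event out through the access point
        "external": (Dt * hop, Dt),
        # store every event at its home node, route each query there,
        # and return each matching event individually
        "dcs_list": (Q * hop + Dt * hop + Dq * hop, Q + Dq),
        # as above, but one summary packet answers each query
        "dcs_summary": (Q * hop + Dt * hop + Q * hop, 2 * Q),
    }

costs = dissemination_costs(n=1_000_000, Q=5, Dq=50, Dt=1_000)
for name, (total, hotspot) in costs.items():
    print(f"{name:12s} total={total:>12,.0f} hotspot={hotspot}")
```

With these inputs local storage has the largest total (about 5.05 million packets), external storage sends fewer total packets than either DCS variant, and summarized DCS places the smallest load on the access point (10 packets versus 1,000 for external storage).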
Outline • Motivation and Context • Canonical Data Dissemination Approaches • Geographic Hash Table (GHT) Service • Evaluation in Simulation • Summary
Geographic Hash Table: A Sketch • Two operations: • Put(k, v) stores event v under key k • Get(k) retrieves events associated with key k • Hash key k into geographic coordinates; store and retrieve events for that key at that location • Spreads key-space storage load evenly across the network! [Figure: H(“Zebra”) = (a, b); nodes i at (u, v) and j at (x, y) store their sightings at (a, b), where the user's query “Zebra?” is also routed]
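A toy, single-process sketch of Put/Get follows. It assumes a rectangular deployment area with known bounds, uses SHA-1 as the hash (our choice for illustration, not necessarily GHT's), and replaces geographic routing with global knowledge of node positions. The home node is taken as the node closest to H(k); replication, perimeters, and failures are ignored. `ToyGHT` and its methods are hypothetical names.

```python
import hashlib
import math

class ToyGHT:
    """Hypothetical in-memory model of GHT Put/Get (no routing, no replication)."""

    def __init__(self, nodes, width, height):
        self.nodes = nodes  # node id -> (x, y) position, assumed globally known
        self.width, self.height = width, height
        self.store = {nid: {} for nid in nodes}  # per-node key -> list of events

    def hash_to_point(self, key):
        """H(k): map a key name to a point inside the width x height area."""
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        # bytes 0-3 give x and bytes 4-7 give y, each scaled into [0, extent)
        x = int.from_bytes(digest[0:4], "big") / 2**32 * self.width
        y = int.from_bytes(digest[4:8], "big") / 2**32 * self.height
        return (x, y)

    def home_node(self, key):
        """Node closest to H(k); stands in for where geographic routing would stop."""
        point = self.hash_to_point(key)
        return min(self.nodes, key=lambda nid: math.dist(self.nodes[nid], point))

    def put(self, key, value):
        self.store[self.home_node(key)].setdefault(key, []).append(value)

    def get(self, key):
        return list(self.store[self.home_node(key)].get(key, []))

nodes = {"i": (10.0, 10.0), "j": (90.0, 90.0), "k": (50.0, 50.0)}
ght = ToyGHT(nodes, width=100.0, height=100.0)
ght.put("Zebra", ("Zebra", "i", (10.0, 10.0)))
ght.put("Zebra", ("Zebra", "j", (90.0, 90.0)))
print(ght.hash_to_point("Zebra"), ght.home_node("Zebra"), ght.get("Zebra"))
```

Because every node computes the same H(k), a sensor storing an event and a user querying for it independently arrive at the same home node without any directory lookup.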
Design Criteria for Scalable, Robust DCS • Storage system must offer persistence despite node and link failures • If the node holding k changes, queries and data must make a consistent choice of new node • Storage shouldn’t concentrate at any one node • Storage capacity should increase with node count • As ever, avoid traffic concentration and minimize message count
GHT: Home Nodes and Perimeters • Likely no node exactly at H(k); hash function ignorant of topology • Home node: the node closest to the point output by H(k) • Home perimeter: the perimeter enclosing the point output by H(k)
Consistency: Perimeter Refresh Protocol (PRP) • (k, v) pairs replicated at all nodes on the home perimeter • Non-home nodes on the home perimeter: replica nodes • Home node sends refresh packets every $T_h$ seconds, containing all (k, v) pairs for k, addressed to H(k) • A node receiving a refresh that is closer to H(k) than the refresh's originator consumes it and initiates its own • A replica node becomes home node if its own refresh returns to it (having traversed the home perimeter without finding a closer node) • Upon forwarding a refresh, a node resets its takeover timer for $T_t$ seconds; upon expiration, the node generates a refresh for k • Death timer: all nodes expire the (k, v) pairs they cache after $T_d$ seconds; the timer is reset every time a refresh for k is received
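The PRP timer rules can be sketched as a per-key state machine at a single node. This is a simplified sketch, not GHT's implementation: it assumes the caller routes packets around the home perimeter, tells the node whether a received refresh is its own, and supplies distances to H(k). It also assumes Th < Tt < Td, so that a live home node's refreshes keep replica nodes from taking over. `PrpKeyState` and its fields are hypothetical.

```python
from dataclasses import dataclass

INF = float("inf")

@dataclass
class PrpKeyState:
    """Hypothetical PRP state for one key k at one node; all times in seconds."""
    my_dist: float  # this node's distance to H(k)
    Th: float       # home node refresh period
    Tt: float       # takeover timeout
    Td: float       # death timeout for cached (k, v) pairs
    is_home: bool = False
    next_refresh: float = INF
    takeover_at: float = INF
    death_at: float = INF

    def become_home(self, now):
        self.is_home = True
        self.next_refresh = now + self.Th
        self.takeover_at = INF

    def on_refresh(self, now, originator_is_me, originator_dist):
        """Handle a refresh for k; returns 'home', 'consume', or 'forward'."""
        self.death_at = now + self.Td  # any refresh for k renews the cached data
        if originator_is_me:
            # own refresh came back around the perimeter: no closer node exists
            self.become_home(now)
            return "home"
        if self.my_dist < originator_dist:
            # closer than the originator: consume it; caller sends own refresh now
            self.become_home(now)
            return "consume"
        # forward toward H(k) as a replica and restart the takeover timer
        self.is_home = False
        self.takeover_at = now + self.Tt
        return "forward"

    def tick(self, now):
        """Return the actions due at time `now`."""
        if now >= self.death_at:
            self.is_home = False
            self.next_refresh = self.takeover_at = INF
            return ["expire"]  # drop cached (k, v) pairs
        if self.is_home and now >= self.next_refresh:
            self.next_refresh = now + self.Th
            return ["send_refresh"]  # periodic home-node refresh
        if not self.is_home and now >= self.takeover_at:
            self.takeover_at = INF
            return ["send_refresh"]  # home silent for Tt: attempt takeover
        return []

node = PrpKeyState(my_dist=5.0, Th=10, Tt=25, Td=60)
print(node.on_refresh(0, originator_is_me=False, originator_dist=3.0))  # forward
print(node.tick(25))  # ['send_refresh']
print(node.on_refresh(26, originator_is_me=True, originator_dist=5.0))  # home
```

In this run the original home node (distance 3.0) falls silent, the replica's takeover timer fires, and its own refresh returning to it makes it the new home node, mirroring the bullets above.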
Outline • Motivation and Context • Canonical Data Dissemination Approaches • Geographic Hash Table (GHT) Service • Evaluation in Simulation • Summary
Further Scaling and Robustness Results • Mean and maximum storage load per node decrease as node population increases • Query success rate above 96% for mobility rates of 0.1 m/s and 1 m/s • Query success rate degrades gracefully as nodes alternate between up and down states more rapidly • Relative message costs of the three canonical approaches validated in simulations of up to 100,000 nodes
Follow-On Work in DCS • Mapping the geographic boundaries of a network, to support hashing to points inside a network whose boundaries change • DCS without geographic routing: GEM [NeSo03] • Range queries for GHT using K-D trees: DIM [LiGo03] • Assigning coordinates for geographic routing using only topological knowledge (not, e.g., GPS) [RaRa03] • Dealing with non-uniform node distributions; multiple hash functions [GaEs03]
DCS: Summary • Three canonical approaches will be useful in data dissemination for sensor networks: local storage, external storage, and data-centric storage • Summarization is a key advantage of the DCS approach, reducing both hotspot usage and total usage; the home node is a useful aggregation point • DCS offers the most attractive performance relative to the other canonical approaches in sensor applications with many nodes and many event types, not all of which are queried • GHT spreads storage load evenly across sensor networks • GHT offers robust persistence under node failures and mobility because it binds data to fixed locations rather than to “volatile” nodes