Reliable Transport and Code Distribution in Wireless Sensor Networks Thanos Stathopoulos CS 213 Winter 04
Reliability: Introduction • Not an issue on wired networks • TCP does a good job • Link error rates are usually ~10^-15 • No energy cost concerns • However, WSNs have: • Low-power radios • Error rates of up to 30% or more • Limited range • Energy constraints • Retransmissions reduce the lifetime of the network • Limited storage • Buffer size cannot be too large • Highly application-specific requirements • No ‘single’ TCP-like solution
Approaches • Loss-tolerant algorithms • Leverage spatial and temporal redundancy • Good enough for some applications • But what about code updates? • Add retransmission mechanism • At the link layer (e.g. SMAC) • At the routing/transport layer • At the application layer • Hop-by-hop or end-to-end?
Relevant papers • PSFQ: A Reliable Transport Protocol for Wireless Sensor Networks • RMST: Reliable Data Transport in Sensor Networks • ESRT: Event-to-Sink Reliable Transport in Wireless Sensor Networks
PSFQ: Overview • Key ideas • Slow data distribution (pump slowly) • Quick error recovery (fetch quickly) • NACK-based • Data caching guarantees ordered delivery • Assumption: no congestion, losses due only to poor link quality • Goals • Ensure data delivery with minimum support from transport infrastructure • Minimize signaling overhead for detection/recovery operations • Operate correctly in poor link quality environments • Provide loose delay bounds for data delivery to all intended receivers • Operations • Pump • Fetch • Report
End-to-end considered harmful? • Probability of reception degrades exponentially over multiple hops • Not an issue in the Internet • Serious problem if error rates are considerable • ACKs/NACKs are also affected
Proposed solution: Hop-by-Hop error recovery • Intermediate nodes now responsible for error detection and recovery • NACK-based loss detection probability is now constant • Not affected by network size (scalability) • Exponential decrease in end-to-end • Cost: Keeping state on each node • Potentially not as bad as it sounds! • Cluster/group based communication • Intermediates are usually receivers as well
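To make the “exponential decrease” concrete, here is a small standalone C sketch (not from the paper) that compares the end-to-end success probability over n lossy hops with the constant per-hop probability that hop-by-hop recovery relies on; loss rates and hop counts are example values.

```c
/* Illustrative only: end-to-end delivery probability decays exponentially
 * with hop count, while hop-by-hop recovery only ever faces the (constant)
 * per-hop probability. Loss rates and hop counts are example values. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double loss_rates[] = { 0.05, 0.11, 0.30 };   /* per-link packet loss   */
    int hops[] = { 1, 4, 8, 12 };

    for (int i = 0; i < 3; i++) {
        double p_hop = 1.0 - loss_rates[i];       /* per-hop success prob.  */
        for (int j = 0; j < 4; j++) {
            double p_e2e = pow(p_hop, hops[j]);   /* success over n hops    */
            printf("loss=%2.0f%%  hops=%2d  per-hop=%.2f  end-to-end=%.3f\n",
                   loss_rates[i] * 100.0, hops[j], p_hop, p_e2e);
        }
    }
    return 0;
}
```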
Pump operation • Node broadcasts a packet to its neighbors every Tmin • Data cache used for duplicate suppression • Receiver checks for gaps in sequence numbers • If all is fine, it decrements TTL and schedules a transmission • Tmin < Ttransmit < Tmax • By delaying transmission, quick fetch operations are possible • Reduce redundant transmissions (don’t transmit if 4 or more nodes have forwarded the packet already) • Tmax can provide a loose delay bound for the last hop • D(n)=Tmax * (# of fragments) * (# of hops)
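The forwarding decision can be summarized in a few lines. The sketch below is a hypothetical rendering of the pump rules above (random forwarding delay in (Tmin, Tmax), suppression after 4 overheard forwards, and the loose delay bound D(n)); the constants and function names are mine, not PSFQ's implementation.

```c
/* Hypothetical sketch of PSFQ's pump-side forwarding decision and the loose
 * delay bound D(n). Constants and names are illustrative. */
#include <stdio.h>
#include <stdlib.h>

#define T_MIN_MS   100
#define T_MAX_MS   300
#define SUPPRESS_THRESHOLD 4   /* don't forward if >= 4 neighbors already did */

/* Pick a random forwarding delay in (T_MIN, T_MAX), or -1 to hold the packet. */
static int schedule_forward(int ttl, int dup_count, int gap_detected)
{
    if (gap_detected || ttl <= 0 || dup_count >= SUPPRESS_THRESHOLD)
        return -1;  /* fetch first, TTL expired, or transmission suppressed */
    return T_MIN_MS + rand() % (T_MAX_MS - T_MIN_MS);
}

/* Loose delay bound for full delivery: D(n) = Tmax * fragments * hops. */
static double delay_bound_ms(int fragments, int hops)
{
    return (double)T_MAX_MS * fragments * hops;
}

int main(void)
{
    printf("forward in %d ms\n", schedule_forward(5, 1, 0));
    printf("delay bound, 100 fragments over 6 hops: %.1f s\n",
           delay_bound_ms(100, 6) / 1000.0);
    return 0;
}
```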
Fetch operation • Sequence number gap is detected • Node will send a NACK message upstream • ‘Window’ specifies range of sequence numbers missing • NACK receivers will randomize their transmissions to reduce redundancy • It will NOT forward any packets downstream • NACK scope is 1 hop • NACKs are generated every Tr if there are still gaps • Tr < Tmax • This is the pump/fetch ratio • NACKs can be cancelled if neighbors have sent similar NACKs
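A minimal sketch of the gap-detection step that triggers a fetch, assuming a simple "last in-order sequence number" state variable; the structure and function names are illustrative, not PSFQ's code.

```c
/* Turn a sequence-number gap into a NACK "window": the range of missing
 * fragments between the last in-order fragment and the one just received. */
#include <stdio.h>

struct nack_window {
    int left;    /* first missing sequence number */
    int right;   /* last missing sequence number  */
};

/* Returns 1 and fills 'w' if a gap is found, 0 if the packet is in order. */
static int detect_gap(int last_in_order, int received_seq, struct nack_window *w)
{
    if (received_seq <= last_in_order + 1)
        return 0;                       /* in order or duplicate: no NACK */
    w->left  = last_in_order + 1;
    w->right = received_seq - 1;
    return 1;
}

int main(void)
{
    struct nack_window w;
    if (detect_gap(7, 12, &w))          /* received 12, had everything up to 7 */
        printf("NACK window: fragments %d-%d\n", w.left, w.right);
    return 0;
}
```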
Proactive Fetch • Last segments of a file can get lost • Loss detection impossible; no ‘next’ segment exists! • Solution: timeouts (again) • Node enters ‘proactive fetch’ mode if last segment hasn’t been received and no packet has been delivered after Tpro • Timing must be right • Too early: wasted control messages • Too late: increased delivery latency for the entire file • Tpro = a * (Smax - Smin) * Tmax • A node will wait long enough until all upstream nodes have received all segments • If data cache isn’t infinite • Tpro = a * k * Tmax (Tpro is proportional to cache size)
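The two timeout formulas above are easy to sanity-check numerically. The sketch below simply evaluates them using the slide's notation (a, Smax, Smin, k, Tmax); the numeric values are examples.

```c
/* Back-of-the-envelope evaluation of the proactive-fetch timeout. */
#include <stdio.h>

#define T_MAX_S 0.3

/* Tpro = a * (Smax - Smin) * Tmax: wait long enough for upstream nodes to
 * have received the remaining segments before asking for them. */
static double tpro_full_cache(double a, int s_max, int s_min)
{
    return a * (s_max - s_min) * T_MAX_S;
}

/* With a bounded data cache of k segments, Tpro is proportional to k instead. */
static double tpro_bounded_cache(double a, int k)
{
    return a * k * T_MAX_S;
}

int main(void)
{
    printf("Tpro (full cache, 100 segments, 60 received): %.1f s\n",
           tpro_full_cache(1.0, 100, 60));
    printf("Tpro (cache of 10 segments): %.1f s\n",
           tpro_bounded_cache(1.0, 10));
    return 0;
}
```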
Report Operation • Used as a feedback/monitoring mechanism • Only the last hop will respond immediately (create a new packet) • Other nodes will piggyback their state info when they receive the report reply • If there is no space left in the message, a new one will be created
Experimental results • Tmax = 0.3s, Tr = 0.1s • 100 30-byte packets sent • Exponential increase in delay happens at 11% loss rate or higher
PSFQ: Conclusion • Slow data dissemination, fast data recovery • All transmissions are broadcast • NACK-based, hop-by-hop recovery • End-to-end behaves poorly in lossy environments • NACKs are superior to ACKs in terms of energy savings • No out-of-order delivery allowed • Uses data caching extensively • Several timers and duplicate suppression mechanisms • Implementing any of those on motes is challenging (non-preemptive FIFO scheduler)
RMST: Overview • A transport layer protocol • Uses diffusion for routing • Selective NACK-based • Provides • Guaranteed delivery of all fragments • In-order delivery not guaranteed • Fragmentation/reassembly
Placement of reliability for data transport • RMST considers 3 layers • MAC • Transport • Application • Focus is on MAC and Transport
MAC Layer Choices • No ARQ • All transmissions are broadcast • No RTS/CTS or ACK • Reliability deferred to upper layers • Benefits: no control overhead, no erroneous path selection • ARQ always • All transmissions are unicast • RTS/CTS and ACKs used • One-to-many communication done via multiple unicasts • Benefits: packets traveling on established paths have high probability of delivery • Selective ARQ • Use broadcast for one-to-many and unicast for one-to-one • Data and control packets traveling on established paths are unicast • Route discovery uses broadcast
Transport Layer Choices • End-to-End Selective Request NACK • Loss detection happens only at sinks (endpoints) • Repair requests travel on reverse (multihop) path from sinks to sources • Hop-by-Hop Selective Request NACK • Each node along the path caches data • Loss detection happens at each node along the path • Repair requests sent to immediate neighbors • If data isn’t found in the caches, NACKs are forwarded to next hop towards source
Application Layer Choices • End-to-End Positive ACK • Sink requests a large data entity • Source fragments data • Sink keeps sending interests until all fragments have been received • Used only as a baseline
RMST details • Implemented as a Diffusion Filter • Takes advantage of Diffusion mechanisms for • Routing • Path recovery and repair • Adds • Fragmentation/reassembly management • Guaranteed delivery • Receivers responsible for fragment retransmission • Receivers aren’t necessarily end points • Caching or non-caching mode determines classification of node
RMST Details (cont’d) • NACKs triggered by • Sequence number gaps • Watchdog timer inspects fragment map periodically for holes that have aged for too long • Transmission timeouts • ‘Last fragment’ problem • NACKs propagate from sinks to sources • Unicast transmission • NACK is forwarded only if segment not found in local cache • Back-channel required to deliver NACKs to upstream neighbors
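A hypothetical version of the watchdog pass described above: scan the fragment map for holes and NACK only those that have aged past a threshold. RMST is actually implemented as a diffusion filter, so this is only a structural sketch; the names and the aging threshold are assumptions.

```c
/* Watchdog pass: find holes in the fragment map and NACK aged ones. */
#include <stdio.h>
#include <time.h>

#define NUM_FRAGMENTS 50
#define HOLE_MAX_AGE  5                 /* seconds a hole may age before a NACK */

static int  received[NUM_FRAGMENTS];            /* 1 if fragment is cached       */
static long first_missing_seen[NUM_FRAGMENTS];  /* when the hole was first noticed */

static void watchdog_scan(long now)
{
    for (int i = 0; i < NUM_FRAGMENTS; i++) {
        if (received[i])
            continue;
        if (first_missing_seen[i] == 0)
            first_missing_seen[i] = now;         /* start aging the hole */
        else if (now - first_missing_seen[i] >= HOLE_MAX_AGE)
            printf("NACK fragment %d (aged %lds)\n",
                   i, now - first_missing_seen[i]);
    }
}

int main(void)
{
    long now = (long)time(NULL);
    for (int i = 0; i < NUM_FRAGMENTS; i++)
        received[i] = (i != 17 && i != 18);      /* two example holes */
    watchdog_scan(now);                 /* first pass: holes start aging     */
    watchdog_scan(now + HOLE_MAX_AGE);  /* later pass: aged holes get NACKed */
    return 0;
}
```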
Evaluation • NS-2 simulation • 802.11 MAC • 21 nodes • single sink, single source • 6 hops • MAC ARQ set to 4 retries • Image size: 5k • 50 100-byte fragments • Total cost of sending the entire file: 87,818 bytes • Includes diffusion control message overhead • All results normalized to this value
Results: Baseline (no RMST) • ARQ and S-ARQ have high overhead when error rates are low • S-ARQ is better in terms of efficiency • Also helps with route selection • No ARQ results drop considerably as error rates increase • Exponential decay of end-to-end reliability mechanisms
Results: RMST with H-b-H Recovery and Caching • Slight improvement for ARQ and S-ARQ results over baseline • No ARQ is better even in the 10% error rate case • But, many more exploratory packets were sent before the route was established
Results: RMST with E-2-E Recovery • No ARQ doesn't work for the 10% error rate case • Numerous holes that required NACKs couldn't make it from source to sink without link-layer retransmissions • ARQ and S-ARQ results are statistically indistinguishable from the H-b-H results • NACKs were very rare when any form of ARQ was used
Results: Performance under High Error Rates • No ARQ doesn’t work for the 30% error rate case • Diffusion control messages could not establish routes most of the time • In the 20% case, it took several minutes to establish routes
RMST: Conclusion • ARQ helps with unicast control and data packets • In high error-rate environments, routes cannot be established without ARQ • Route discovery packets shouldn’t use ARQ • Erroneous path selection can occur • RMST combines a NACK-based transport layer protocol with S-ARQ to achieve the best results
Congestion Control • Sensor networks are usually idle… • …Until an event occurs • High probability of channel overload • Information must reach users • Solution: congestion control
ESRT: Overview • Places interest on events, not individual pieces of data • Application-driven • Application defines what its desired event reporting rate should be • Includes a congestion-control element • Runs mainly on the sink • Main goal: Adjust reporting rate of sources to achieve optimal reliability requirements
Problem Definition • Assumption: • Detection of an event is related to number of packets received during a specific interval • Observed event reliability ri: • # of packets received in decision interval I • Desired event reliability R: • # of packets required for reliable event detection • Application-specific • Goal: configure the reporting rate of nodes • Achieve required event detection • Minimize energy consumption
Reliability vs Reporting frequency • Initially, reliability increases linearly with reporting frequency • There is an optimal reporting frequency (fmax), after which congestion occurs • fmax decreases when the # of nodes increases
Characteristic Regions • n: normalized reliability indicator (n = r / R) • (NC, LR): No congestion, Low reliability • f <= fmax, n < 1-e • (NC, HR): No congestion, High reliability • f <= fmax, n > 1+e • (C, HR): Congestion, High reliability • f > fmax, n > 1 • (C, LR): Congestion, Low reliability • f > fmax, n <= 1 • OOR: Optimal Operating Region • f <= fmax, 1-e <= n <= 1+e
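Putting the region boundaries into code makes the decision procedure explicit. The sketch below classifies an (f, n) pair using the slide's notation (fmax, e); it is an illustration of the five regions, not the paper's implementation.

```c
/* Classify the current (reporting frequency, normalized reliability) pair. */
#include <stdio.h>

enum region { NC_LR, NC_HR, C_HR, C_LR, OOR };

static enum region classify(double f, double fmax, double eta, double eps)
{
    int congested = (f > fmax);
    if (!congested && eta >= 1.0 - eps && eta <= 1.0 + eps)
        return OOR;                              /* optimal operating region */
    if (!congested)
        return (eta < 1.0 - eps) ? NC_LR : NC_HR;
    return (eta > 1.0) ? C_HR : C_LR;
}

int main(void)
{
    const char *names[] = { "(NC,LR)", "(NC,HR)", "(C,HR)", "(C,LR)", "OOR" };
    printf("%s\n", names[classify(10.0, 20.0, 0.6, 0.1)]);  /* (NC,LR) */
    printf("%s\n", names[classify(25.0, 20.0, 0.8, 0.1)]);  /* (C,LR)  */
    return 0;
}
```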
ESRT Requirements • Sink is powerful enough to reach all source nodes (i.e. single-hop) • Nodes must listen to the sink broadcast at the end of each decision interval and update their reporting rates • A congestion-detection mechanism is required
Congestion Detection and Reliability Level • Both done at the sink • Congestion: • Nodes monitor their buffer queues and inform the sink if overflow occurs • Reliability Level • Calculated by the sink at the end of each interval based on packets received
ESRT Protocol Operation • (NC, LR): increase the reporting frequency aggressively (multiplicatively) until the required reliability is reached • (NC, HR): decrease the reporting frequency cautiously to save energy while staying above the required reliability • (C, HR): decrease the reporting frequency to relieve congestion without dropping below the required reliability • (C, LR): decrease the reporting frequency aggressively (exponentially) to recover from congestion as quickly as possible
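A simplified sink-side control loop tying the regions to rate updates is sketched below. The qualitative policy (push the rate up when uncongested and unreliable, back off hard when congested and unreliable, hold in OOR) follows ESRT, but the specific update expressions here should be treated as illustrative approximations rather than the paper's exact equations.

```c
/* Simplified ESRT-style rate update at the sink; the formulas are
 * illustrative approximations of the policy, not the paper's equations. */
#include <stdio.h>
#include <math.h>

static double next_rate(double f, double fmax, double eta, double eps, int k)
{
    if (f <= fmax && eta < 1.0 - eps)        /* (NC,LR): push the rate up      */
        return f / eta;
    if (f <= fmax && eta > 1.0 + eps)        /* (NC,HR): ease off, save energy */
        return (f / 2.0) * (1.0 + 1.0 / eta);
    if (f > fmax && eta > 1.0)               /* (C,HR): relieve congestion     */
        return f / eta;
    if (f > fmax)                            /* (C,LR): back off aggressively  */
        return pow(f, eta / k);              /* k = consecutive (C,LR) rounds  */
    return f;                                /* OOR: hold the rate             */
}

int main(void)
{
    printf("new f = %.2f\n", next_rate(10.0, 20.0, 0.5, 0.1, 1)); /* 20.00 */
    printf("new f = %.2f\n", next_rate(25.0, 20.0, 0.8, 0.1, 1)); /* 13.13 */
    return 0;
}
```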
ESRT: Conclusion • Reliability notion is application-based • No delivery guarantees for individual packets • Reliability and congestion control achieved by changing the reporting rate of nodes • Pushes all complexity to the sink • Single-hop operation only
Code Distribution: Introduction • Nature of sensor networks • Expected to operate for long periods of time • Human intervention impractical or detrimental to sensing process • Nevertheless, code needs to be updated • Add new functionality • Incomplete knowledge of environment • Predicting right set of actions is not always feasible • Fix bugs • Maintenance
Approaches • Transfer the entire binary to the motes • Advantage • Maximum flexibility • Disadvantage • High energy cost due to large volume of data • Use a VM and transfer capsules • Advantage • Low energy cost • Disadvantages • Not as flexible as full binary update • VM required • Reliability is required regardless of approach
Papers • A Remote Code Update Mechanism for Wireless Sensor Networks • Trickle: A Self-Regulating Algorithm for Code Propagation and Maintenance in Wireless Sensor Networks
MOAP: Overview • Code distribution mechanism specifically targeted for Mica2 motes • Full binary updates • Multi-hop operation achieved through recursive single-hop broadcasts • Energy and memory efficient
Requirements and Properties of Code Distribution • The complete image must reach all nodes • Reliability mechanism required • If the image doesn’t fit in a single packet, it must be placed in stable storage until transfer is complete • Network lifetime shouldn’t be significantly reduced by the update operation • Memory and storage requirements should be moderate
Resource Prioritization • Energy: the most important resource • Radio operations are expensive • TX: 12 mA • RX: 4 mA • Stable storage (EEPROM) • The entire image must be stored, and write()s are expensive • Memory usage • Static RAM • Only 4K available on the current generation of motes • The code update mechanism should leave ample space for the real application • Program memory • MOAP must transfer itself • A large image size means more packets transmitted! • Latency • Updates don't respond to real-time phenomena • Update rate is infrequent • Can be traded off for reduced energy usage
Design Choices • Dissemination protocol: How is data propagated? • All at once (flooding) • Fast • Low energy efficiency • Neighborhood-by-neighborhood (ripple) • Energy efficient • Slow • Reliability mechanism • Repair scope: local vs global • ACKs vs NACKs • Segment management • Indexing segments and gap detection: Memory hierarchy vs sliding window
Ripple Dissemination • Transfer data neighborhood-by-neighborhood • Single-hop • Recursively extended to multi-hop • Very few sources at each neighborhood • Preferably, only one • Receivers attempt to become sources when they have the entire image • Publish-subscribe interface prevents nodes from becoming sources if another source is present • Leverage the broadcast medium • If data transmission is in progress, a source will always be one hop away! • Allows local repairs • Increased latency
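The handoff from receiver to source can be captured by a small predicate. The sketch below is a hypothetical rendering of the source-election rule described above (only publish once the full image is held, and stay quiet if another source is already active); the structure and field names are assumptions, not MOAP's code.

```c
/* Ripple handoff: a receiver offers to become a source only once it holds
 * the complete image, and backs off if another source is already present. */
#include <stdio.h>

struct node {
    int segments_held;
    int segments_total;
    int heard_other_source;   /* set when a neighbor's "publish" is overheard */
};

static int should_become_source(const struct node *n)
{
    if (n->segments_held < n->segments_total)
        return 0;             /* image incomplete: keep receiving             */
    if (n->heard_other_source)
        return 0;             /* suppressed: ideally one source per neighborhood */
    return 1;                 /* publish the image to the next neighborhood   */
}

int main(void)
{
    struct node a = { 128, 128, 0 };
    struct node b = { 128, 128, 1 };
    printf("node a becomes source: %d\n", should_become_source(&a));  /* 1 */
    printf("node b becomes source: %d\n", should_become_source(&b));  /* 0 */
    return 0;
}
```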
Reliability Mechanism • Loss responsibility lies on receiver • Only one node to keep track of (sender) • NACK-based • In line with IP multicast and WSN reliability schemes • Local scope • No need to route NACKs • Energy and complexity savings • All nodes will eventually have the same image
Retransmission Policies • Broadcast RREQ, no suppression • Simple • High probability of successful reception • Highly inefficient • Zero latency • Broadcast RREQ, suppression based on randomized timers • Quite efficient • Complex • Latency and successful reception based on randomization interval
Retransmission Policies (cont’d) • Broadcast RREQ, fixed reply probability • Simple • Good probability of successful reception • Latency depends on probability of reply • Average efficiency • Broadcast RREQ, adaptive reply probability • More complex than the static case • Similar latency/reception behavior • Unicast RREQ, single reply • Smallest probability of successful reception • Highest efficiency • Simple • Complexity increases if source fails • Zero latency • High latency if source fails
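As an example of the "suppression based on randomized timers" policy from the previous slide, the sketch below schedules a randomly delayed reply to an overheard retransmission request (RREQ) and cancels it if another node answers first; the delay bound and function names are illustrative assumptions.

```c
/* Randomized reply suppression for a broadcast retransmission request. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define MAX_REPLY_DELAY_MS 200

/* Called when an RREQ for a segment is overheard. */
static int schedule_reply(int have_segment)
{
    if (!have_segment)
        return -1;                          /* nothing to offer          */
    return rand() % MAX_REPLY_DELAY_MS;     /* wait before replying      */
}

/* Called when another node's reply to the same RREQ is overheard. */
static int suppress_reply(int pending_delay_ms)
{
    (void)pending_delay_ms;
    return -1;                              /* cancel our pending reply  */
}

int main(void)
{
    srand((unsigned)time(NULL));
    int delay = schedule_reply(1);
    printf("reply scheduled in %d ms\n", delay);
    delay = suppress_reply(delay);          /* a neighbor answered first */
    printf("reply after suppression: %d\n", delay);
    return 0;
}
```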
Segment Management: Discovering if a segment is present • No indexing • Nothing kept in RAM • Need to read from EEPROM to find if segment i is missing • Full indexing • Entire segment (bit)map is kept in RAM • Look at entry i (in RAM) to find if segment is missing • Partial indexing • Map kept in RAM • Each entry represents k consecutive segments • Combination of RAM and EEPROM lookup needed to find if segment i is missing
Segment Management (cont’d) • Hierarchical full indexing • First-level map kept in RAM • Each entry points to a second-level map stored in EEPROM • Combination of RAM and EEPROM lookup needed to find if segment i is missing • Sliding window • Bitmap of up to w segments kept in RAM • Starting point: last segment received in order • RAM lookup • Limited out-of-order tolerance!
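A minimal sketch of the sliding-window bookkeeping, assuming a window of W segments anchored at the last segment received in order; it also shows the "limited out-of-order tolerance": segments beyond the window are simply rejected. Names and the window size are illustrative.

```c
/* Sliding-window segment bitmap kept entirely in RAM. */
#include <stdio.h>
#include <string.h>

#define W 8                      /* window size (bits kept in RAM) */

struct seg_window {
    int base;                    /* next expected in-order segment      */
    unsigned char bitmap[W];     /* bitmap[i] covers segment base + i   */
};

/* Returns 1 if the segment was accepted, 0 if it falls outside the window. */
static int mark_received(struct seg_window *w, int seg)
{
    if (seg < w->base || seg >= w->base + W)
        return 0;                           /* out of order beyond tolerance */
    w->bitmap[seg - w->base] = 1;
    while (w->bitmap[0]) {                  /* slide forward past in-order data */
        memmove(w->bitmap, w->bitmap + 1, W - 1);
        w->bitmap[W - 1] = 0;
        w->base++;
    }
    return 1;
}

int main(void)
{
    struct seg_window w = { 0, { 0 } };
    int a = mark_received(&w, 0);    /* in order: accepted, window slides    */
    int b = mark_received(&w, 3);    /* within window: accepted out of order */
    int c = mark_received(&w, 20);   /* beyond the window: rejected          */
    printf("seg 0: %d, seg 3: %d, seg 20: %d, base now %d\n", a, b, c, w.base);
    return 0;
}
```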