Fountain Codes Based Distributed Storage Algorithms for Large-scale Wireless Sensor Networks Salah A. Aly, Zhenning Kong, Emina Soljanin IEEE IPSN 2008
Outline • Introduction • Fountain Codes • LT-Codes Based Distributed Storage (LTCDS) Algorithms • With limited global information - LTCDS-I • Without any global information - LTCDS-II • Performance Evaluation • Conclusion
Introduction • Nodes in wireless sensor networks have limited resources, e.g., CPU power, bandwidth, memory, and lifetime. • They can monitor objects and detect fires, floods, and other disaster phenomena. • We consider a network with n randomly distributed nodes; among them are k sensor nodes, with k/n = 10%. • Our goal is to find techniques to redundantly distribute data from the k source nodes to the n storage nodes, • so that, by querying any (1+ε)k nodes, one can retrieve the original information acquired by the k sources with low computational complexity.
Introduction • There are 25 sensors monitoring an area. • There are 225 additional storage nodes. • Information acquired by the sensors should • 1) be available in any neighborhood • 2) be easy to compute from storage • 3) be extractable from any 25+ nodes Fig. 1. A sensor network with 25 sensors (big dots) monitoring an area and 225 storage nodes (small dots). A good distributed storage algorithm should enable us to recover the original 25 source packets from any 25+ nodes (e.g., the set of nodes within any one of the three illustrated circular regions).
Introduction • We know how to solve the centralized version of this problem by coding (e.g., Fountain codes, MDS codes, linear codes). • Our contribution: solve the problem in a distributed, decentralized, randomized way over the network. • Problem: Find an efficient strategy to • add some redundancy • distribute information randomly through the network • decode easily from (1+ε)k nodes
Network Model • Suppose a network with n randomly distributed storage nodes. • k source nodes have information to be disseminated randomly throughout the network for storage. • Every source node si generates an independent packet. • We use Fountain codes and random walks to disseminate information from the k sources to the n storage nodes. • The idea is to build a system of n equations in k variables. • For example, y1 = x1 ⊕ x2 ⊕ x3 y2 = x2 ⊕ x3 ⊕ x5 ⊕ xi y3 = x1 ⊕ x3 ⊕ … ⊕ xk ... yn = x4 ⊕ x6 ⊕ xi ⊕ … ⊕ xk • Decoding is easy from any (1+ε)k equations, for ε > 0; a minimal worked example follows.
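To make the equation-system idea concrete, here is a minimal sketch (not from the paper; packet values are hypothetical, with Python ints and bitwise XOR standing in for packets and packet-wise XOR) showing how XOR equations are solved by substitution:

```python
x1, x2, x3 = 0b1010, 0b0110, 0b1100   # hypothetical source packets

# Storage nodes hold XOR combinations of the sources (the "equations"):
y1 = x1                # degree 1
y2 = x1 ^ x2           # degree 2
y3 = x1 ^ x2 ^ x3      # degree 3

# Solve by substitution: each recovered packet reduces the next equation.
r1 = y1                # x1 recovered directly
r2 = y2 ^ r1           # x2 = y2 XOR x1
r3 = y3 ^ r1 ^ r2      # x3 = y3 XOR x1 XOR x2
assert (r1, r2, r3) == (x1, x2, x3)
```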
Fountain codes • Assume k source blocks S = {x1, x2,…,xk}. • Each output block yi is obtained by XORing some source blocks from S. • d(yi) is the number of source blocks XORed in the ith equation, 1 ≤ d(yi) ≤ k. The Fountain code idea: choose d(yi) randomly according to a probability distribution such that it is easy to decode from any (1+ε)k output blocks. Easy to decode: x1, x1 ⊕ x2, x1 ⊕ x2 ⊕ x3. Hard to decode: x1 ⊕ x2, x2 ⊕ x3, x1 ⊕ x4 ⊕ x5. • LT and Raptor codes are two classes of Fountain codes.
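The difference between the two sets is whether the peeling decoder can make progress. Below is a sketch of the standard peeling (belief-propagation) decoder for such XOR equations; representing outputs as (index set, XOR value) pairs is our own illustration, not the paper's code:

```python
def peel_decode(outputs, k):
    """Recover source blocks by repeatedly consuming degree-1 equations."""
    recovered = {}
    eqs = [[set(ids), val] for ids, val in outputs]
    progress = True
    while progress and len(recovered) < k:
        progress = False
        for eq in eqs:
            ids, val = eq
            # Substitute every already-recovered block into this equation.
            for i in list(ids):
                if i in recovered:
                    ids.discard(i)
                    val ^= recovered[i]
            eq[1] = val
            # A degree-1 equation directly reveals one source block.
            if len(ids) == 1:
                (i,) = ids
                if i not in recovered:
                    recovered[i] = val
                    progress = True
    return recovered

x = {1: 0b001, 2: 0b010, 3: 0b100, 4: 0b011, 5: 0b101}
easy = [({1}, x[1]), ({1, 2}, x[1] ^ x[2]), ({1, 2, 3}, x[1] ^ x[2] ^ x[3])]
hard = [({1, 2}, x[1] ^ x[2]), ({2, 3}, x[2] ^ x[3]),
        ({1, 4, 5}, x[1] ^ x[4] ^ x[5])]
print(peel_decode(easy, 3))  # recovers x1, x2, x3
print(peel_decode(hard, 3))  # recovers nothing: no degree-1 equation
```

On the easy set, the degree-1 block x1 starts a chain of substitutions; the hard set has no degree-1 block, so peeling cannot even start.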
Fountain codes • For k source blocks {x1, x2,…,xk} and a probability distribution Ω(d) with 1 ≤ d ≤ k, a Fountain code with parameters (k, Ω) is a potentially limitless stream of output blocks {y1, y2,…}. • Each output block yi is obtained by XORing d randomly and independently chosen source blocks. Figure 1. The encoding operations of Fountain codes: each output is obtained by XORing d source blocks chosen uniformly and independently at random from the k source inputs, where d is drawn according to a probability distribution Ω(d).
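A minimal encoder sketch for a Fountain code with parameters (k, Ω), assuming Ω is given as a dict mapping degrees to probabilities (an illustration under our own conventions, not the authors' implementation):

```python
import random

def fountain_encode(sources, omega):
    """Yield an endless stream of output blocks for a Fountain code (k, Omega).

    `sources` is the list [x1, ..., xk]; `omega` maps each degree d in 1..k
    to its probability. Each output XORs d distinct, uniformly chosen sources.
    """
    k = len(sources)
    degrees = list(omega)
    weights = [omega[d] for d in degrees]
    while True:
        d = random.choices(degrees, weights)[0]   # draw degree d ~ Omega
        chosen = random.sample(range(k), d)       # d distinct source indices
        block = 0
        for i in chosen:
            block ^= sources[i]
        yield set(chosen), block                  # (neighbors, XOR value)
```

Feeding these (indices, block) pairs into the peeling decoder sketched earlier recovers the sources once roughly (1+ε)k outputs with favorable degrees have arrived.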
LT Codes • Definition 2. (Code Degree) For Fountain codes, the number of source blocks used to generate an encoded output y is called the code degree of y, denoted by dc(y). By construction, the code degree distribution Ω(d) is the probability distribution of dc(y). • LT (Luby Transform) codes are a special class of Fountain codes that use the Ideal Soliton or Robust Soliton distribution. The Ideal Soliton distribution Ωis(d) for k source blocks is given by Ωis(1) = 1/k and Ωis(d) = 1/(d(d−1)) for d = 2,…,k. • The Robust Soliton distribution modifies the Ideal Soliton distribution with some further adjustments to make decoding more robust.
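The Ideal Soliton distribution is simple to generate; a sketch follows (the Robust Soliton variant, which adds a correction term τ(d) before renormalizing, is omitted for brevity):

```python
def ideal_soliton(k):
    """Ideal Soliton: Omega(1) = 1/k, Omega(d) = 1/(d(d-1)) for d = 2..k."""
    omega = {1: 1.0 / k}
    for d in range(2, k + 1):
        omega[d] = 1.0 / (d * (d - 1))
    return omega

# Sanity check: 1/k + sum_{d=2}^{k} 1/(d(d-1)) telescopes to exactly 1.
assert abs(sum(ideal_soliton(100).values()) - 1.0) < 1e-9
```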
LT Codes • Lemma 3 (Luby [12]). For LT codes with the Robust Soliton distribution, the k original source blocks can be recovered from any k + O(√k ln²(k/δ)) encoded output blocks with probability 1 − δ. • Both encoding and decoding complexity is O(k ln(k/δ)).
LT-Codes Based Distributed Storage (LTCDS) Algorithms • We propose two LT-Codes Based Distributed Storage (LTCDS) algorithms. In both, the source packets are disseminated throughout the network by simple random walks and encoded using the Robust Soliton distribution. • (i) Algorithm 1, called LTCDS-I, assumes that each node in the network knows the global values k and n. • (ii) Algorithm 2, called LTCDS-II, is fully distributed: the values of n and k are not known. The price we pay is extra transmissions of the source packets to obtain estimates of n and k.
Previous Work • Previous work focused on techniques based on pre-assumptions about the network, such as geographical locations or routing tables. • Lin et al. [INFOCOM07] studied the question "how to retrieve historical data that the sensors have gathered even if some nodes failed or disappeared?" They proposed two decentralized algorithms using Fountain codes to guarantee the persistence and reliability of cached data on unreliable sensors. But they assume that the maximum degree of a node is known and the source sends b packets (high complexity). • Dimakis et al. [ICASSP06] used a decentralized implementation of Fountain codes that relies on geographic routing, so every node has to know its location. They applied their work to grid networks. • Kamra et al. [SIGCOMM06] proposed a novel technique called growth codes to increase data persistence, i.e., to increase the amount of information that can be recovered at the sink.
Algorithm 1: Knowing global information k and n • We use a simple random walk for each source to disseminate its information. • Each node u that has packets to transmit chooses one node v among its neighbors uniformly and independently at random. • We let each node accept each source packet with equal probability. • Each source packet should visit each node in the network at least once; a minimal sketch of the walk appears below.
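A minimal sketch of the dissemination primitive, assuming `neighbors` maps each node ID to its adjacency list (all names are ours, for illustration):

```python
import random

def random_walk(source, neighbors, steps):
    """Simple random walk: at each step the packet moves to a neighbor
    chosen uniformly at random; returns the sequence of visited nodes."""
    u, visited = source, [source]
    for _ in range(steps):
        u = random.choice(neighbors[u])
        visited.append(u)
    return visited
```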
Algorithm 1: Knowing global information k and n • The algorithm consists of three phases. • (i) Initialization Phase: • (1) Each node u draws a random number dc(u) according to the distribution Ωis(d). Each source node si, i = 1,…,k, generates a header for its source packet xsi, putting its ID and a counter c(xsi) = 0 in it. • (2) Each source node si sends out its own source packet xsi to one of its neighbors u, chosen uniformly at random among all its neighbors N(si). • (3) The node u accepts this source packet xsi with probability dc(u)/k and updates its storage as yu = yu ⊕ xsi.
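A sketch of this initialization phase under the slide's description; the `Packet` and `Node` classes and helper names are our own assumptions, not the authors' code:

```python
import random

class Packet:
    def __init__(self, source_id, payload):
        self.source_id = source_id   # source ID carried in the header
        self.payload = payload       # the source data x_si
        self.counter = 0             # c(x_si) = 0 initially

class Node:
    def __init__(self, node_id, k, omega):
        self.id = node_id
        self.k = k
        degrees = list(omega)
        weights = [omega[d] for d in degrees]
        self.dc = random.choices(degrees, weights)[0]  # draw d_c(u) ~ Omega
        self.storage = 0     # y_u, running XOR of accepted packets
        self.seen = set()    # sources whose packets have visited u
        self.queue = []      # forward queue of in-transit packets

def source_send(si, payload, neighbors):
    """Source s_i sends its packet to one uniformly chosen neighbor."""
    pkt = Packet(si, payload)
    return random.choice(neighbors[si]), pkt
```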
Algorithm 1: Knowing global information k and n • (ii) Encoding Phase: • (1) In each round, a node u that has received at least one source packet before the current round forwards the head-of-line (HOL) packet x in its forward queue to one of its neighbors v, chosen uniformly at random. • (2) The node v makes its decision: • If it is the first time that x visits v, then v accepts this source packet with probability dc(v)/k and updates its storage as yv = yv ⊕ x. • Else, if c(x) < C1 n log n, where C1 is a system parameter, then v puts the packet into its forward queue and increases its counter by one: c(x) = c(x) + 1. • If c(x) ≥ C1 n log n, then v discards the packet x forever. • (iii) Storage Phase: • When a node u has made its decisions for all the source packets xs1, xs2,…, xsk, i.e., all these packets have visited node u at least once, u finishes its encoding process, and yu is the storage packet of u.
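Continuing the sketch, the encoding-phase decision at a receiving node v might look as follows (C1 and the stopping rule follow the slide; everything else is illustrative and builds on the `Node`/`Packet` classes above):

```python
import math
import random

C1 = 3  # system parameter; the evaluation below suggests C1 >= 3 suffices

def on_receive(v, pkt, n):
    """Encoding-phase decision at node v for an incoming packet."""
    if pkt.source_id not in v.seen:
        v.seen.add(pkt.source_id)
        # First visit of this packet: accept with probability d_c(v)/k.
        if random.random() < v.dc / v.k:
            v.storage ^= pkt.payload        # y_v <- y_v XOR x
        pkt.counter += 1                    # c(x) <- c(x) + 1
        v.queue.append(pkt)                 # keep the walk going
    elif pkt.counter < C1 * n * math.log(n):
        pkt.counter += 1                    # c(x) <- c(x) + 1
        v.queue.append(pkt)                 # forward later in HOL order
    # else: c(x) >= C1 * n * log n, discard the packet forever
```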
Algorithm 1: Knowing global information k and n • Theorem 7. Suppose a sensor network has n nodes and k sources, and the LTCDS-I algorithm uses the Robust Soliton distribution Ωrs. Then, when n and k are sufficiently large, the k original source packets can be recovered from any k + O(√k ln²(k/δ)) storage nodes with probability 1 − δ. The decoding complexity is O(k ln(k/δ)). • Theorem 8. Denote by TI the total number of transmissions of the LTCDS-I algorithm. Then TI = Θ(k n log n), where k is the total number of sources and n is the total number of nodes in the network; intuitively, each of the k source packets walks for at most C1 n log n steps.
Algorithm 2: Without knowing global information n and k • In the previous algorithm, the values of n and k are known. • Here we do not assume anything about the network topology. • Nodes need not maintain a routing table or know the maximum degree of the graph. • We design the LTCDS-II algorithm for large values of n and k.
Algorithm 2: Without knowing global information n and k • We design a fully distributed storage algorithm that does not require any global information, i.e., the values of k and n are not known. • The idea is to use simple random walks for inference, so that each node obtains its own estimates of n and k. • We use the inter-visit time of random walks on the graph. • Definition 9. (Inter-Visit Time or Return Time) For a random walk on a graph, the inter-visit time of node u, Tvisit(u), is the amount of time between any two consecutive visits of the random walk to node u. This inter-visit time is also called the return time. • Our goal is to compute estimates ñ(u) of n and k̃(u) of k at every node u.
Algorithm 2: Without knowing global information n and k • Lemma 10. For a node u with node degree dn(u) in a random geometric graph, the mean inter-visit time is given by E[Tvisit(u)] = μn / dn(u), where μ is the mean degree of the graph. • The total number of nodes n can then be estimated by ñ(u) = dn(u) T̄visit(u) / μ, where T̄visit(u) is the measured average inter-visit time at u. • However, the mean degree μ is global information and may be hard to obtain. Thus, with the further approximation μ ≈ dn(u), we have ñ(u) = T̄visit(u).
Algorithm 2: Without knowing global information n and k • Definition 11. (Inter-Packet Time) For k random walks on a graph, the inter-packet time of node u, Tpacket(u), is the amount of time between any two consecutive visits by any of those k random walks to node u. • Lemma 12. For a node u with node degree dn(u) in a random geometric graph with k simple random walks, the mean inter-packet time is given by E[Tpacket(u)] = E[Tvisit(u)] / k = μn / (k dn(u)). • An estimate of k can then be obtained by k̃(u) = T̄visit(u) / T̄packet(u). • After obtaining estimates of both n and k, we can employ techniques similar to those in LTCDS-I to do LT coding and storage; a sketch of this inference step follows.
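A sketch of the resulting estimators at a single node u, assuming the node logs the arrival times of each source's walk (the data layout and names are our own illustration):

```python
def estimate_n_and_k(visit_times_by_source):
    """Inference-phase sketch at a single node u.

    `visit_times_by_source` maps each source ID to the sorted list of times
    at which that source's walk visited u. With the approximation
    mu ~ d_n(u), Lemma 10 gives n-tilde(u) = mean inter-visit time, and
    Lemma 12 gives k-tilde(u) = mean inter-visit / mean inter-packet time.
    """
    # Mean inter-visit time, averaged over sources and consecutive visits.
    intervisit = [t2 - t1
                  for times in visit_times_by_source.values()
                  for t1, t2 in zip(times, times[1:])]
    mean_visit = sum(intervisit) / len(intervisit)

    # Inter-packet times: gaps between consecutive visits by *any* walk.
    all_times = sorted(t for times in visit_times_by_source.values()
                       for t in times)
    interpacket = [t2 - t1 for t1, t2 in zip(all_times, all_times[1:])]
    mean_packet = sum(interpacket) / len(interpacket)

    n_est = mean_visit                 # n-tilde(u) = T-bar_visit(u)
    k_est = mean_visit / mean_packet   # k-tilde(u) = T-bar_visit / T-bar_packet
    return round(n_est), round(k_est)
```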
Algorithm 2: Without knowing global information n and k • The algorithm consists of four phases. • (i) Initialization Phase • (ii) Inference Phase • (iii) Encoding Phase • (iv) Storage Phase
Performance Evaluation • Definition 15. (Decoding Ratio) The decoding ratio η is the ratio between the number of queried nodes h and the number of sources k: η = h/k. • Definition 16. (Successful Decoding Probability) The successful decoding probability Ps is the probability that all k source packets are recovered from the h queried nodes. Empirically, Ps = Ms / M, where Ms is the number of successful decodings over M decoding experiments.
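In code, the two evaluation metrics are one-liners (the numbers below are hypothetical):

```python
def decoding_ratio(h, k):
    """Decoding ratio: eta = h / k."""
    return h / k

def successful_decoding_probability(outcomes):
    """P_s = M_s / M, with `outcomes` one boolean per decoding experiment."""
    return sum(outcomes) / len(outcomes)

# Hypothetical example: query h = 50 nodes for k = 25 sources (eta = 2),
# and suppose 99 of M = 100 decoding experiments succeed.
assert decoding_ratio(50, 25) == 2.0
assert successful_decoding_probability([True] * 99 + [False]) == 0.99
```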
Performance Evaluation • When the decoding ratio is above 2, the successful decoding probability is about 99%. • When the total number of nodes increases but the ratio between k and n and the decoding ratio η are kept constant, the successful decoding probability Ps increases when η ≥ 1.5 and decreases when η < 1.5. Figure 3. Decoding performance of the LTCDS-I algorithm with a small number of nodes and sources
Performance Evaluation Figure 4. Decoding performance of the LTCDS-I algorithm with a medium number of nodes and sources
Performance Evaluation • Fixing the ratio between k and n at 10% (k/n = 0.1): • as n grows, the successful decoding probability increases until it reaches a plateau, which is the successful decoding probability of real LT codes. Figure 5. Decoding performance of the LTCDS-I algorithm with different numbers of nodes
Performance Evaluation • Studying values of the constant C1: for C1 ≥ 3, Ps is almost constant and close to 1. This means that after 3n log n steps, almost all source packets have visited each node at least once. • Figure 6. Decoding performance of the LTCDS-I algorithm with different values of the system parameter C1
Performance Evaluation • The decoding performance of the LTCDS-II algorithm is slightly worse than that of LTCDS-I when the decoding ratio η is small, and almost the same when η is large. Figure 7. Decoding performance of the LTCDS-II algorithm with a small number of nodes and sources
Performance Evaluation Figure 8. Decoding performance of the LTCDS-II algorithm with a medium number of nodes and sources
Performance Evaluation • Figure 9. Estimation results of the LTCDS-II algorithm with n = 200 nodes and k = 20 sources: (a) estimates of n; (b) estimates of k. • The estimates of k are more accurate and more concentrated than the estimates of n.
Performance Evaluation Figure 10. Estimation results of the LTCDS-II algorithm with n = 1000 nodes and k = 100 sources: (a) estimates of n; (b) estimates of k.
Performance Evaluation • When C2 is chosen to be small, the performance of the LTCDS-II algorithm is very poor. • This is due to each node's inaccurate estimates of k and n. • When C2 is large, for example C2 ≥ 30, the performance is almost the same. Figure 11. Decoding performance of the LTCDS-II algorithm with different values of the system parameter C2
Conclusion • We proposed two new decentralized algorithms, LTCDS-I and LTCDS-II, that use Fountain codes and random walks to distribute information sensed by k source nodes to n storage nodes, so that the original data can be recovered by querying any (1+ε)k nodes.
References • [1] D. Aldous and J. Fill. Reversible Markov Chains and Random Walks on Graphs. Preprint, available at http://statwww.berkeley.edu/users/aldous/RWG/book.html, 2002. • [6] A. G. Dimakis, V. Prabhakaran, and K. Ramchandran. Distributed fountain codes for networked storage. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2006), May 2006. • [9] A. Kamra, V. Misra, J. Feldman, and D. Rubenstein. Growth codes: Maximizing sensor network data persistence. In Proc. ACM SIGCOMM 2006, pages 255–266, Pisa, Italy, 2006. • [10] Y. Lin, B. Li, and B. Liang. Differentiated data persistence with priority random linear codes. In Proc. 27th International Conference on Distributed Computing Systems (ICDCS 2007), Toronto, Canada, June 2007. • [11] Y. Lin, B. Liang, and B. Li. Data persistence in large-scale sensor networks with decentralized fountain codes. In Proc. 26th IEEE INFOCOM 2007, Anchorage, Alaska, May 6–12, 2007. • [12] M. Luby. LT codes. In Proc. 43rd Symposium on Foundations of Computer Science (FOCS 2002), Vancouver, BC, Canada, November 16–19, 2002. • [13] D. S. Lun, N. Ratnakar, R. Koetter, M. Médard, E. Ahmed, and H. Lee. Achieving minimum-cost multicast: A decentralized approach based on network coding. In Proc. 24th IEEE INFOCOM, volume 3, pages 1607–1617, March 2005. • [14] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, 1995. • [17] S. Ross. Stochastic Processes. Wiley, New York, second edition, 1995.