Symmetric Allocations for Distributed Storage. Derek Leong (1), Alexandros G. Dimakis (2), Tracey Ho (1). (1) California Institute of Technology, USA; (2) University of Southern California, USA. GLOBECOM 2010, 2010-12-09.
A Motivating Example Suppose you have a distributed storage system comprising 5 storage devices (“nodes”)… [Figure: nodes 1 to 5]
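A Motivating Example Each node independently fails with probability 1/3, and survives with probability 2/3… [Figure: nodes 1 to 5 with nodes 2 and 4 marked; a given pattern of 2 failed and 3 surviving nodes occurs with probability $(1/3)^2 (2/3)^3 \approx 0.0329218$]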
A Motivating Example Each node independently fails with probability 1/3, and survives with probability 2/3… [Figure: nodes 1 to 5; all 5 nodes fail with probability $(1/3)^5 \approx 0.00411523$]
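The two probabilities quoted on these slides can be checked with exact rational arithmetic. The sketch below is illustrative only; the helper name `pattern_probability` is an assumption and does not come from the slides.

```python
from fractions import Fraction

def pattern_probability(n_failed, n_survived, p_fail=Fraction(1, 3)):
    """Probability of one specific pattern of independent node failures.

    Each node fails with probability p_fail and survives otherwise.
    """
    return p_fail ** n_failed * (1 - p_fail) ** n_survived

# One particular pattern with 2 failed and 3 surviving nodes.
two_fail = pattern_probability(2, 3)
# The single pattern in which all 5 nodes fail.
all_fail = pattern_probability(5, 0)

print(two_fail, float(two_fail))
print(all_fail, float(all_fail))
```

These are probabilities of individual failure patterns, not of "exactly 2 failures" overall; the latter would also need the binomial coefficient.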
A Motivating Example You are given a single data object of unit size, and a total storage budget of 7/3… [Figure: nodes 1 to 5]
A Motivating Example You can use any coding scheme to store any amount of coded data in each node, as long as the total amount of storage used is at most the given budget 7/3… [Figure: nodes 1 to 5]
A Motivating Example [Figure: nodes 1 to 5 holding bit strings of coded data; the stored amounts differ from node to node]
A Motivating Example [Figure: nodes 1 to 5 holding coded data, shown with a failure pattern of probability $(1/3)^2 (2/3)^3 \approx 0.0329218$; can the data object still be recovered?]
A Motivating Example For maximum reliability, we need to find (1) an optimal allocation of the given budget over the nodes, and (2) an optimal coding scheme that jointly maximize the probability of successful recovery
A Motivating Example Using an appropriate code, successful recovery occurs whenever the data collector accesses at least a unit amount of data (= size of the original data object) [Figure: nodes 1 to 5]
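A Motivating Example Three allocations of the budget 7/3 over the 5 nodes, with their recovery probabilities for p = 2/3:
A: (7/15, 7/15, 7/15, 7/15, 7/15), recovery probability 0.79012
B: (7/6, 7/6, 0, 0, 0), recovery probability 0.88889
C: (2/3, 2/3, 1/3, 1/3, 1/3), recovery probability 0.90535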
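The three recovery probabilities can be reproduced by brute force. The sketch below is a minimal illustration, not the authors' code: it assumes each node is accessed independently with probability p and that recovery succeeds exactly when the accessed nodes hold at least one unit of data in total. The function name `recovery_probability` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def recovery_probability(allocation, p):
    """Exact recovery probability by enumerating every subset of accessed nodes.

    Assumption: each node is accessed independently with probability p, and
    recovery succeeds when the accessed nodes store >= 1 unit in total.
    Runs in O(2^n) time, so it is only meant for small n.
    """
    n = len(allocation)
    total = Fraction(0)
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            if sum(allocation[i] for i in subset) >= 1:
                total += p ** r * (1 - p) ** (n - r)
    return total

p = Fraction(2, 3)
allocations = {
    "A": [Fraction(7, 15)] * 5,
    "B": [Fraction(7, 6), Fraction(7, 6), 0, 0, 0],
    "C": [Fraction(2, 3), Fraction(2, 3), Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)],
}
results = {name: recovery_probability(x, p) for name, x in allocations.items()}
for name, prob in results.items():
    print(name, prob, round(float(prob), 5))
```

Each allocation uses exactly the budget 7/3. The asymmetric allocation C beats both symmetric ones in this instance.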
Problem Formulation Given n nodes, access probability p, and total storage budget T, find an optimal allocation $(x_1, \ldots, x_n)$ that maximizes the probability of successful recovery, subject to the budget constraint $\sum_{i=1}^{n} x_i \le T$ • The recovery probability is #P-hard to compute for a given allocation and choice of p • The optimal allocation also tells us whether coding is beneficial for reliable storage • Trivial cases of minimum and maximum budgets: • when T = 1, the allocation (1, 0, …, 0) is optimal • when T = n, the allocation (1, 1, …, 1) is optimal
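The formula on this slide did not survive extraction. One way to write the problem, consistent with the description above, is sketched below. The notation is an assumption: $\mathbf{r}$ denotes the random subset of accessed nodes and $\mathbf{1}[\cdot]$ is the indicator function.

```latex
\begin{aligned}
\underset{x_1,\ldots,x_n}{\text{maximize}} \quad
  & \sum_{\mathbf{r} \subseteq \{1,\ldots,n\}}
    p^{|\mathbf{r}|} (1-p)^{n-|\mathbf{r}|} \cdot
    \mathbf{1}\!\left[ \sum_{i \in \mathbf{r}} x_i \ge 1 \right] \\
\text{subject to} \quad
  & \sum_{i=1}^{n} x_i \le T, \qquad x_i \ge 0 \quad \text{for } i = 1,\ldots,n.
\end{aligned}
```

Under this formulation the two trivial cases evaluate to $p$ for the allocation (1, 0, …, 0) and to $1 - (1-p)^n$ for the allocation (1, 1, …, 1).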
Related Work • Discussion between R. Karp, R. Kleinberg, C. Papadimitriou, E. Friedman, and others at UC Berkeley, 2005 • S. Jain, M. Demmer, R. Patra, K. Fall, “Using redundancy to cope with failures in a delay tolerant network,” SIGCOMM 2005
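Symmetric Allocations • We are particularly interested in symmetric allocations because they are easy to describe and implement • A symmetric allocation spreads the budget T evenly over m nonempty nodes, storing T/m in each, and leaves the remaining n − m nodes empty • Successful recovery for the symmetric allocation occurs if and only if at least $\lceil m/T \rceil$ out of the m nonempty nodes are accessed • Therefore, the recovery probability of the symmetric allocation is $\mathrm{P}[\mathrm{B}(m,p) \ge \lceil m/T \rceil]$, where $\mathrm{B}(m,p)$ denotes a binomial random variable with m trials and success probability p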
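For a symmetric allocation the recovery probability reduces to a binomial tail, which is cheap to evaluate. The sketch below assumes the allocation stores T/m in each of m nonempty nodes, so that at least ceil(m/T) of them must be accessed. The function name is hypothetical, and `Fraction` inputs are used to avoid rounding errors in the ceiling.

```python
from fractions import Fraction
from math import ceil, comb

def symmetric_recovery_probability(p, T, m):
    """P[Binomial(m, p) >= ceil(m / T)] for the symmetric allocation that
    stores T/m in each of m nonempty nodes (exact for Fraction inputs)."""
    k = ceil(Fraction(m) / T)  # fewest nonempty nodes whose data totals >= 1
    return sum(comb(m, r) * p ** r * (1 - p) ** (m - r) for r in range(k, m + 1))

p, T = Fraction(2, 3), Fraction(7, 3)
by_m = {m: symmetric_recovery_probability(p, T, m) for m in range(1, 6)}
for m, prob in by_m.items():
    print(m, prob, float(prob))
```

With p = 2/3 and T = 7/3, the cases m = 5 and m = 2 correspond to allocations A and B of the motivating example.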
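Asymptotic Optimality of Max Spreading The symmetric allocation that spreads the budget maximally over all n nodes is asymptotically optimal when the budget T is sufficiently large. RESULT 1 The gap between the recovery probabilities for an optimal allocation and for the symmetric allocation that spreads the budget over all n nodes is at most $\mathrm{P}[\mathrm{B}(n,p) \le \lceil n/T \rceil - 1]$, where $\mathrm{B}(n,p)$ denotes a binomial random variable with n trials and success probability p. If p and T are fixed such that $T > 1/p$, then this gap approaches zero as $n \to \infty$.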
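Asymptotic Optimality of Max Spreading Proof Idea: Bounding the optimal recovery probability… • By conditioning on the number of accessed nodes r, we can express the probability of successful recovery as $\sum_{r=0}^{n} p^r (1-p)^{n-r} S_r$, where $S_r$ is the number of successful r-subsets • We can in turn bound $S_r$ by observing that we have $S_r$ inequalities of the form $\sum_{i \in \mathbf{r}} x_i \ge 1$, one per successful r-subset $\mathbf{r}$, which can be summed up to produce $\sum_{i=1}^{n} a_i x_i \ge S_r$, where $0 \le a_i \le \binom{n-1}{r-1}$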
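The counting argument can be made concrete on allocation C of the motivating example. The sketch below is an illustration under stated assumptions: it counts the successful r-subsets $S_r$ by enumeration and checks them against the bound $S_r \le \binom{n-1}{r-1} T$, which follows because each node lies in at most $\binom{n-1}{r-1}$ of the r-subsets. The helper name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def successful_subset_counts(allocation):
    """S_r for r = 0..n: the number of r-subsets of nodes storing >= 1 unit."""
    n = len(allocation)
    return [
        sum(1 for amounts in combinations(allocation, r) if sum(amounts) >= 1)
        for r in range(n + 1)
    ]

T, p = Fraction(7, 3), Fraction(2, 3)
alloc_c = [Fraction(2, 3), Fraction(2, 3), Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
n = len(alloc_c)
S = successful_subset_counts(alloc_c)

# Conditioning on the number of accessed nodes r recovers the probability.
prob = sum(S[r] * p ** r * (1 - p) ** (n - r) for r in range(n + 1))

# Each x_i appears in at most C(n-1, r-1) of the summed inequalities.
bounds = [0] + [min(comb(n - 1, r - 1) * T, comb(n, r)) for r in range(1, n + 1)]
print(S, prob, [float(b) for b in bounds])
```

For this allocation the bound is loose only at r = 2, where 7 of the 10 pairs are successful.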
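Asymptotic Optimality of Max Spreading Proof Idea: Bounding the optimal recovery probability… • Since $\sum_{i=1}^{n} x_i \le T$ and there are only $\binom{n}{r}$ r-subsets, we therefore have $S_r \le \min\left( \binom{n-1}{r-1} T, \binom{n}{r} \right)$ • Applying this bound to the expression for the recovery probability, and using $\binom{n-1}{r-1} / \binom{n}{r} = r/n$, leads to the conclusion that the optimal recovery probability is at most $\sum_{r=0}^{n} \min(rT/n, 1) \, \mathrm{P}[\mathrm{B}(n,p) = r]$, where $\mathrm{B}(n,p)$ denotes a binomial random variable with n trials and success probability p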
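As a sanity check, an upper bound of this kind can be evaluated for the motivating example. The sketch below assumes the bound takes the form $\sum_r \min(rT/n, 1) \, \mathrm{P}[\mathrm{B}(n,p) = r]$, which is what the subset-counting argument yields; the function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def optimal_recovery_upper_bound(n, p, T):
    """Sum over r of min(r*T/n, 1) * P[Binomial(n, p) = r]."""
    total = Fraction(0)
    for r in range(n + 1):
        pmf = comb(n, r) * p ** r * (1 - p) ** (n - r)
        total += min(Fraction(r) * T / n, 1) * pmf
    return total

bound = optimal_recovery_upper_bound(5, Fraction(2, 3), Fraction(7, 3))
print(bound, float(bound))
```

For n = 5, p = 2/3, and T = 7/3 the bound sits above the recovery probability 220/243 ≈ 0.90535 of allocation C, as it must.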
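Asymptotic Optimality of Max Spreading Proof Idea: Bounding the suboptimality gap for max spreading… • The recovery probability of the symmetric allocation that spreads the budget over all n nodes is $\mathrm{P}[\mathrm{B}(n,p) \ge \lceil n/T \rceil]$, where $\mathrm{B}(n,p)$ denotes a binomial random variable with n trials and success probability p • The suboptimality gap for this allocation is therefore at most the difference between the upper bound for the optimal recovery probability and this recovery probability, which is at most $\mathrm{P}[\mathrm{B}(n,p) \le \lceil n/T \rceil - 1]$ • For $T > 1/p$, we can apply the Chernoff bound to obtain an upper bound on this probability that decays exponentially in n • As $n \to \infty$, this upper bound approaches zero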
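The vanishing gap can be observed numerically. The sketch below assumes the gap is bounded by the binomial lower tail $\mathrm{P}[\mathrm{B}(n,p) \le \lceil n/T \rceil - 1]$ and evaluates it exactly for growing n, with p and T held fixed so that T > 1/p. The function name and the chosen values of n are illustrative.

```python
from fractions import Fraction
from math import ceil, comb

def gap_upper_bound(n, p, T):
    """P[Binomial(n, p) <= ceil(n / T) - 1], computed exactly."""
    k = ceil(Fraction(n) / T)
    return sum(comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(k))

p, T = Fraction(2, 3), Fraction(7, 3)  # T > 1/p = 3/2, as the result requires
gaps = {n: gap_upper_bound(n, p, T) for n in (5, 20, 80, 320)}
for n, g in gaps.items():
    print(n, float(g))
```

For n = 5 the bound is fairly loose; it only becomes informative as n grows.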
Optimal Symmetric Allocation The problem is nontrivial even when restricted to symmetric allocations… [Figure: plot; axis label: number of nonempty nodes in the symmetric allocation]
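Optimal Symmetric Allocation Maximal spreading is optimal among symmetric allocations when the budget T is sufficiently large. RESULT 2 If [condition on p and T], then either the symmetric allocation with $m = \lfloor \lfloor n/T \rfloor T \rfloor$ nonempty nodes or the symmetric allocation with $m = n$ nonempty nodes is an optimal symmetric allocation.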
Optimal Symmetric Allocation Minimal spreading is optimal among symmetric allocations when the budget T is sufficiently small. Coding is unnecessary for such an allocation. RESULT 3 If [condition on p and T], then the symmetric allocation with $m = \lfloor T \rfloor$ nonempty nodes is an optimal symmetric allocation.
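Optimal Symmetric Allocation Proof Idea: Finding the optimal symmetric allocation… • Observe that we can find an optimal m* from among $\lceil n/T \rceil$ candidates: $m = \lfloor kT \rfloor$ for $k = 1, \ldots, \lfloor n/T \rfloor$, and $m = n$ • For $m = \lfloor kT \rfloor$, where $k \in \{1, \ldots, \lfloor n/T \rfloor\}$, the recovery probability is $\mathrm{P}[\mathrm{B}(\lfloor kT \rfloor, p) \ge k]$ • RESULT 2 (max spreading optimal) is a sufficient condition on p and T for $\mathrm{P}[\mathrm{B}(\lfloor kT \rfloor, p) \ge k]$ to be nondecreasing in k • To obtain RESULT 3 (min spreading optimal), we first establish a sufficient condition on p and T for $\mathrm{P}[\mathrm{B}(\lfloor kT \rfloor, p) \ge k]$ to be nonincreasing in k; we subsequently expand the condition to include other points for which the symmetric allocation with $m = \lfloor T \rfloor$ nonempty nodes remains optimal • Recall that the recovery probability of the symmetric allocation with m nonempty nodes is given by $\mathrm{P}[\mathrm{B}(m,p) \ge \lceil m/T \rceil]$, where $\mathrm{B}(m,p)$ denotes a binomial random variable with m trials and success probability p; for constant p and k, $\mathrm{P}[\mathrm{B}(m,p) \ge k]$ is a nondecreasing function of m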
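The candidate reduction can be sketched in a few lines. The code below is a hedged illustration rather than the authors' procedure: it assumes T ≥ 1, takes the candidates to be m = floor(kT) for k = 1, …, floor(n/T) together with m = n, and compares the result against an exhaustive search over all m. All function names are hypothetical.

```python
from fractions import Fraction
from math import ceil, comb, floor

def symmetric_recovery_probability(p, T, m):
    """P[Binomial(m, p) >= ceil(m / T)] for m nonempty nodes storing T/m each."""
    k = ceil(Fraction(m) / T)
    return sum(comb(m, r) * p ** r * (1 - p) ** (m - r) for r in range(k, m + 1))

def candidate_m_values(n, T):
    """Largest m for each threshold k: floor(k*T) for k = 1..floor(n/T), plus n.

    Assumes T >= 1 so that every candidate is a positive integer.
    """
    ks = range(1, floor(Fraction(n) / T) + 1)
    return sorted({floor(k * T) for k in ks} | {n})

def best_symmetric_m(n, p, T):
    """Candidate m with the highest recovery probability (first one on ties)."""
    return max(candidate_m_values(n, T),
               key=lambda m: symmetric_recovery_probability(p, T, m))

n, p, T = 5, Fraction(2, 3), Fraction(7, 3)
candidates = candidate_m_values(n, T)
m_star = best_symmetric_m(n, p, T)
best_value = symmetric_recovery_probability(p, T, m_star)
exhaustive = max(symmetric_recovery_probability(p, T, m) for m in range(1, n + 1))
print(candidates, m_star, best_value, exhaustive)
```

Restricting the search to these candidates is safe because, for a fixed threshold k, adding nonempty nodes never lowers the binomial tail probability.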
Optimal Symmetric Allocation [Figure: plot with three labeled regions: a region where maximal spreading is optimal among symmetric allocations, a gap where other symmetric allocations may be optimal, and a region where minimal spreading is optimal among symmetric allocations]
Conclusion • The optimal allocation is not necessarily symmetric • However, the symmetric allocation that spreads the budget maximally over all n nodes is asymptotically optimal when the budget is sufficiently large • Furthermore, we are able to specify the optimal symmetric allocation for a wide range of parameter values of p and T