Distributed Storage Allocation Problems
Derek Leong, Alexandros G. Dimakis, Tracey Ho
California Institute of Technology
NetCod 2009, 2009-06-16
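Motivation
[Figure: five storage nodes with unknown allocations (?), annotated "0.1" and "2"; question: is the total amount of data accessed Σ ≥ 1?]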
Motivation
Three candidate allocations over five storage nodes:
A: 1, 1, 0, 0, 0
B: 2/5, 2/5, 2/5, 2/5, 2/5
C: 1/2, 1/2, 1/2, 1/2, 0
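Motivation
Allocation A: 1, 1, 0, 0, 0
Success probability
= $0.9^0 \times 0.1^5 \times 0$ (successful 0-subsets)
+ $0.9^1 \times 0.1^4 \times 2$ (successful 1-subsets)
+ $0.9^2 \times 0.1^3 \times 7$ (successful 2-subsets)
+ $0.9^3 \times 0.1^2 \times 9$ (successful 3-subsets)
+ $0.9^4 \times 0.1^1 \times 5$ (successful 4-subsets)
+ $0.9^5 \times 0.1^0 \times 1$ (successful 5-subsets)
= 0.99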
Motivation
Allocation B: 2/5, 2/5, 2/5, 2/5, 2/5
Success probability
= $0.9^0 \times 0.1^5 \times 0$ (successful 0-subsets)
+ $0.9^1 \times 0.1^4 \times 0$ (successful 1-subsets)
+ $0.9^2 \times 0.1^3 \times 0$ (successful 2-subsets)
+ $0.9^3 \times 0.1^2 \times 10$ (successful 3-subsets)
+ $0.9^4 \times 0.1^1 \times 5$ (successful 4-subsets)
+ $0.9^5 \times 0.1^0 \times 1$ (successful 5-subsets)
= 0.99144
Motivation
Allocation C: 1/2, 1/2, 1/2, 1/2, 0
Success probability
= $0.9^0 \times 0.1^5 \times 0$ (successful 0-subsets)
+ $0.9^1 \times 0.1^4 \times 0$ (successful 1-subsets)
+ $0.9^2 \times 0.1^3 \times 6$ (successful 2-subsets)
+ $0.9^3 \times 0.1^2 \times 10$ (successful 3-subsets)
+ $0.9^4 \times 0.1^1 \times 5$ (successful 4-subsets)
+ $0.9^5 \times 0.1^0 \times 1$ (successful 5-subsets)
= 0.9963
Motivation
A: 1, 1, 0, 0, 0 (success probability 0.99)
B: 2/5, 2/5, 2/5, 2/5, 2/5 (success probability 0.99144)
C: 1/2, 1/2, 1/2, 1/2, 0 (success probability 0.9963)
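The three success probabilities above can be reproduced by brute-force enumeration of node subsets. The following is a minimal sketch, assuming each node is accessed independently with probability 0.9 (as the 0.9 and 0.1 factors in the sums suggest); the function name `success_probability` is a hypothetical helper, not something from the slides.

```python
from fractions import Fraction
from itertools import combinations


def success_probability(allocation, p):
    """Probability that the accessed nodes together hold at least 1 unit of data,
    when each node is accessed independently with probability p."""
    n = len(allocation)
    total = 0.0
    for k in range(n + 1):
        # Count the k-subsets whose stored amounts sum to at least 1.
        successful = sum(
            1 for subset in combinations(allocation, k) if sum(subset) >= 1
        )
        total += successful * p**k * (1 - p) ** (n - k)
    return total


# Exact fractions avoid rounding problems in the ">= 1" comparison.
half, two_fifths = Fraction(1, 2), Fraction(2, 5)
allocations = {
    "A": [Fraction(1), Fraction(1), Fraction(0), Fraction(0), Fraction(0)],
    "B": [two_fifths] * 5,
    "C": [half] * 4 + [Fraction(0)],
}

for name, allocation in allocations.items():
    print(name, round(success_probability(allocation, 0.9), 5))
```

Enumeration costs on the order of 2^n subset checks, so this is only practical for small n such as the five-node example.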
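Motivation
[Figure: five storage nodes with unknown allocations (?), annotated "0.1" and "2" and labeled "allocation model" and "access model"; question: is the total amount of data accessed Σ ≥ 1?]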
Problem Description
How do we use storage nodes to store a data object reliably, subject to an aggregate storage budget?
• Storage Allocation
• Access by the Data Collector
• Objective
Problem Description
• Storage Allocation
  • Source s has a data object of unit size
  • It can use n storage nodes to store x_1, x_2, …, x_n amounts of data
  • But it faces an aggregate storage budget T, i.e. $\sum_{i=1}^{n} x_i \le T$
Problem Description
• Access by the Data Collector
  • Data collector t attempts to recover the data object by accessing a subset r of storage nodes
  • It succeeds when the total amount of data accessed is at least the size of the data object, i.e. $\sum_{i \in \mathbf{r}} x_i \ge 1$
Problem Description
• Objective
  • We seek the optimal allocation that maximizes the probability of successful recovery
Problem Description
• Difficulty
  • The problem is nonconvex
  • There is a large space of possible symmetric and nonsymmetric allocations (an allocation is symmetric if all its nonzero elements are equal, and nonsymmetric otherwise)
[1] Deterministic Allocation with Probabilistic Access
The data collector accesses each storage node independently with constant probability p
[1] Deterministic Allocation with Probabilistic Access
• Symmetric allocations can be suboptimal
  • Given n = 5 storage nodes, budget T = 12/5, and p = 0.9, a nonsymmetric allocation performs better than the optimal symmetric allocation†
• Finding the optimal symmetric allocation is also nontrivial
† Originally from a discussion among R. Karp, R. Kleinberg, C. Papadimitriou, E. Friedman, and others at UC Berkeley
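The suboptimality claim for n = 5, T = 12/5, p = 0.9 can be checked numerically. The sketch below compares every symmetric allocation that spends the full budget against one nonsymmetric allocation, (3/5, 3/5, 2/5, 2/5, 2/5). That allocation is an assumption chosen here for illustration and is not necessarily the one shown on the original slide; the helper names are hypothetical as well.

```python
from fractions import Fraction
from itertools import combinations


def success_probability(allocation, p):
    """Success probability when each node is accessed independently with probability p."""
    n = len(allocation)
    total = 0.0
    for k in range(n + 1):
        successful = sum(1 for s in combinations(allocation, k) if sum(s) >= 1)
        total += successful * p**k * (1 - p) ** (n - k)
    return total


def symmetric_allocation(n, m, budget):
    """Spread the whole budget evenly over m of the n nodes."""
    return [budget / m] * m + [Fraction(0)] * (n - m)


n, p, T = 5, 0.9, Fraction(12, 5)

# Success probability of each symmetric allocation, indexed by the number of nodes used.
symmetric = {
    m: success_probability(symmetric_allocation(n, m, T), p) for m in range(1, n + 1)
}
best_m = max(symmetric, key=symmetric.get)

# Hypothetical nonsymmetric allocation with the same total budget of 12/5.
nonsymmetric = [Fraction(3, 5)] * 2 + [Fraction(2, 5)] * 3
nonsymmetric_prob = success_probability(nonsymmetric, p)

print("best symmetric: m =", best_m, "->", round(symmetric[best_m], 5))
print("nonsymmetric:", round(nonsymmetric_prob, 5))
```

Under these assumptions the best symmetric allocation uses four nodes, and the nonsymmetric allocation recovers the data slightly more often because one large share plus one small share already reaches a full unit.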
[2] Deterministic Allocation with Fixed Access
The data collector accesses an r-subset of storage nodes, selected uniformly at random from the collection of all possible r-subsets, where r < n is a constant
[2] Deterministic Allocation with Fixed Access
• Equivalently, we can seek the allocation that minimizes the budget T, among all allocations that achieve a given probability of successful recovery
[2] Deterministic Allocation with Fixed Access
• Example: (n, r) = (6, 2)
• Question: For any budget T, is there always a symmetric allocation that produces the maximum success probability?
[2] Deterministic Allocation with Fixed Access
• Question: What is the optimal symmetric allocation?
• For most choices of (n, r, T), the optimal allocation either concentrates the budget over a minimal number of nodes, or spreads it out maximally
• An example of an exception is (n, r, T) = (15, 3, 4.6), for which the optimal number of nodes to use, 9, is neither of the extremes
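The exception (n, r, T) = (15, 3, 4.6) can be checked with a short search over symmetric allocations. This is a sketch under the assumption that a symmetric allocation splits the full budget evenly over m nodes, so that the number of nonempty nodes in a uniformly random r-subset follows a hypergeometric distribution; the function name is hypothetical.

```python
from fractions import Fraction
from math import ceil, comb


def symmetric_success_probability(n, r, budget, m):
    """Success probability when the budget is split evenly over m of n nodes
    and the collector reads a uniformly random r-subset."""
    # Each nonempty node stores budget/m, so ceil(m / budget) of them are needed.
    needed = ceil(Fraction(m) / budget)
    # Count r-subsets containing at least `needed` of the m nonempty nodes.
    favorable = sum(
        comb(m, j) * comb(n - m, r - j) for j in range(needed, min(m, r) + 1)
    )
    return Fraction(favorable, comb(n, r))


n, r, T = 15, 3, Fraction(23, 5)  # T = 4.6 as an exact fraction
probs = {m: symmetric_success_probability(n, r, T, m) for m in range(1, n + 1)}
best_m = max(probs, key=probs.get)

for m in sorted(probs):
    print(m, float(probs[m]))
print("best number of nodes:", best_m)
```

The search reports m = 9 as the best choice, in line with the slide, while m = 4 and m = 13 are local optima that fall slightly short.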
[2] Deterministic Allocation with Fixed Access
• For probability-1 recovery, the problem reduces to a simple LP
• Result 1: If we require all possible r-subsets to allow successful recovery, then we need a minimum budget of $T = n/r$, which corresponds to the allocation $x_1 = x_2 = \dots = x_n = 1/r$, i.e. it is optimal to spread the budget maximally
• We can also bound the success probability above which this allocation is optimal
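The formulas for Result 1 did not survive in the slide text, so the following is a reconstruction rather than the authors' own derivation. Assuming the minimum budget is T = n/r, a standard counting argument shows why: summing the recovery constraint over all r-subsets counts each node in exactly $\binom{n-1}{r-1}$ of them.

```latex
\begin{align*}
\sum_{i \in \mathbf{r}} x_i &\ge 1
  && \text{for every $r$-subset } \mathbf{r} \\
\sum_{\mathbf{r}} \sum_{i \in \mathbf{r}} x_i
  = \binom{n-1}{r-1} \sum_{i=1}^{n} x_i &\ge \binom{n}{r}
  && \text{summing over all } \binom{n}{r} \text{ subsets} \\
\sum_{i=1}^{n} x_i &\ge \binom{n}{r} \Big/ \binom{n-1}{r-1} = \frac{n}{r}
\end{align*}
```

The allocation $x_i = 1/r$ for every node meets this bound with equality and satisfies every subset constraint, so no smaller budget can guarantee recovery.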
[3] Symmetric Probabilistic Allocation with Fixed Access
Each storage node is used independently with constant probability s/n to store the same amount of data 1/ℓ, and the total storage used must be at most budget T in expectation
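[3] Symmetric Probabilistic Allocation with Fixed Access
• The probability of successful recovery can be written as $\Pr[\mathrm{Bin}(r, s/n) \ge \ell]$, where "Bin(n, p)" denotes the binomial random variable with n trials and success probability p
• The expected storage used is $n \cdot (s/n) \cdot (1/\ell) = s/\ell$, so spending the full budget means $s = \ell T$; reparameterizing in terms of budget T gives the success probability $P(r, T, \ell) = \Pr[\mathrm{Bin}(r, \ell T / n) \ge \ell]$
[Figure annotation: each nonempty node stores 1/ℓ amount of data]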
[3] Symmetric Probabilistic Allocation with Fixed Access
• Result 2: For any r ≥ 2, and at any budget T large enough to support a success probability P(r, T, ℓ) > 0.9 for some ℓ, the choice ℓ = r is optimal, i.e. it is best to spread the budget maximally
[Figure annotation: each nonempty node stores 1/ℓ amount of data]
[3] Symmetric Probabilistic Allocation with Fixed Access
• As we increase the budget T, we observe a sharp change in the optimal allocation
• For small budgets and therefore low success probabilities, it is optimal to store the data object in its entirety (ℓ = 1) and hope the data collector accesses at least one of the nonempty nodes
• For large budgets and therefore high success probabilities, it is optimal to store only 1/r amount of data in each node used (ℓ = r) and hope the data collector accesses r of them
[Figure: r = 5]
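The switch from ℓ = 1 to ℓ = r can be explored numerically. The sketch below assumes the success probability takes the form Pr[Bin(r, min(ℓT/n, 1)) ≥ ℓ], which follows from the model described above if the full expected budget is used; this form is a reconstruction, and n = 100 as well as the function names are hypothetical choices for illustration.

```python
from math import comb


def binomial_tail(trials, q, at_least):
    """Pr[Bin(trials, q) >= at_least]."""
    return sum(
        comb(trials, j) * q**j * (1 - q) ** (trials - j)
        for j in range(at_least, trials + 1)
    )


def success_probability(n, r, budget, ell):
    """Assumed form of P(r, T, ell): each node is nonempty with probability
    min(ell * T / n, 1) and recovery needs ell nonempty nodes among the r accessed."""
    q = min(ell * budget / n, 1.0)
    return binomial_tail(r, q, ell)


def best_ell(n, r, budget):
    """Value of ell in 1..r that maximizes the assumed success probability."""
    return max(range(1, r + 1), key=lambda ell: success_probability(n, r, budget, ell))


n, r = 100, 5  # n is a hypothetical value; r = 5 matches the figure label
for budget in (5, 10, 15, 19, 20):
    ell = best_ell(n, r, budget)
    print(budget, ell, round(success_probability(n, r, budget, ell), 4))
```

With these assumed parameters, a small budget such as T = 5 favors ℓ = 1, while a budget close to n/r = 20 favors ℓ = r = 5.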
[3] Symmetric Probabilistic Allocation with Fixed Access
• We conjecture that for any r and T, the optimal choice of ℓ that maximizes the success probability P(r, T, ℓ) is either ℓ = 1 or ℓ = r
[Figure: r = 5; annotation: each nonempty node stores 1/ℓ amount of data]
[3] Symmetric Probabilistic Allocation with Fixed Access
[Figure: r = 5; annotations: increasing budget per node; each nonempty node stores 1/ℓ amount of data; store less; store more]
Summary & Future Work
[1] Deterministic Allocation with Probabilistic Access
• Suboptimality of symmetric allocations
[2] Deterministic Allocation with Fixed Access
• Optimal allocation for high-probability recovery
• Extreme point solutions not necessarily optimal for symmetric allocations
• Is there always a symmetric optimal allocation?
[3] Symmetric Probabilistic Allocation with Fixed Access
• Optimal allocation in high-probability regime
• Is there a phase transition in optimal allocation with increasing budget?