A Dynamic Data Grid Replication Strategy to Minimize the Data Missed Ming Lei, Susan Vrbsky, Xiaoyan Hong University of Alabama
Agenda • Background & Previous Work • Motivation • System Model • Results • Conclusion • Future Work
Background • Large-scale, geographically distributed systems are becoming more and more popular • Replication of data is the most common solution to improve file access time • The dynamic behavior of Grid users makes it difficult to decide how to replicate data so that the system availability goal is met
Previous work: • Several replica schemes compared for saving access latency and bandwidth – unlimited storage [Ranganathan, et al. 2002] • HotZone algorithm to minimize the client-to-replica latency [Szymaniak et al. 2005] • HBR - dynamic replication strategy to reduce data access time by avoiding network congestion [Park et al. 2003]
Motivation: • As bandwidth and computing capacity have become relatively cheap, data access latency can drop dramatically • System reliability and availability become the focus • Any data file access failure can lead to an incorrect result or a job crash • People can tolerate a small delay but not system unreliability
Motivation: • Goal: replicate data to maximize system data availability • Under the assumption of limited storage resources • Without sacrificing data access latency
System Model: • Note that system-level data availability is more important than an individual file's availability • Two new measurements are proposed: • System File Missing Rate: $\mathrm{SFMR} = \frac{\text{number of files potentially unavailable}}{\text{number of all the files requested by all the jobs}}$ • System Bytes Missing Rate: $\mathrm{SBMR} = \frac{\text{number of bytes potentially unavailable}}{\text{total number of bytes requested by all jobs}}$
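System Model: • Given a set of jobs J = (j1, j2, j3, …, jN), each job accesses one file set F = (f1, f2, …, fk) • Each file must be stored at a Storage Element (SE) • File availability depends on SE availability • For any file fi with Ci copies, assuming the SEs holding them fail independently, its availability is $P_i = 1 - \prod_{k=1}^{C_i} (1 - P_{SE_k})$, where $P_{SE_k}$ is the availability of the SE holding the k-th copy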
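The dependence of file availability on SE availability can be sketched in code. This is a minimal illustration, not the authors' implementation: it assumes that the SEs holding copies of a file fail independently, so the file is unavailable only when every one of those SEs is down. The function name `file_availability` is hypothetical.

```python
from math import prod


def file_availability(se_availabilities):
    """Probability that at least one copy of a file is reachable.

    se_availabilities: availability (between 0 and 1) of each SE that
    holds a copy of the file. Independence between SEs is an assumption
    of this sketch, not something stated on the slides.
    """
    if not se_availabilities:
        # No copy anywhere: the file can never be served.
        return 0.0
    # The file is missing only if every SE holding a copy is down.
    return 1.0 - prod(1.0 - a for a in se_availabilities)
```

Under this assumption, adding a second copy on an SE with availability 0.9 raises the file's availability from 0.9 to 0.99, which is why the replication decision matters for the system-level missing rates.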
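System Model: • Job requests can be converted to a series of file access operations; the set O denotes these file access operations • $\mathrm{SFMR} = \frac{\sum_{i \in O} (1 - P_i)}{|O|}$ • $\mathrm{SBMR} = \frac{\sum_{i \in O} (1 - P_i)\, S_i}{\sum_{i \in O} S_i}$, where $P_i$ is the availability and $S_i$ the size of the file accessed by operation i
System Model: • We assume the storage limit of the whole grid system is S, so we have: $\sum_i C_i S_i \le S$, where Ci denotes the number of copies of fi, Si its size, and S the total storage available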
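The two metrics and the storage constraint can be sketched as follows. This is an illustrative reading rather than the authors' code: it takes "potentially unavailable" to mean the expected number of unavailable files (or bytes), so each access contributes its unavailability 1 - P. The function names and the `(availability, size)` tuple layout are hypothetical.

```python
def missing_rates(accesses):
    """Return (SFMR, SBMR) for a list of file access operations.

    accesses: list of (availability, size) pairs, one per file access
    operation in the set O. Each access contributes (1 - availability)
    expected missing files and (1 - availability) * size missing bytes.
    """
    total_ops = len(accesses)
    total_bytes = sum(size for _, size in accesses)
    if total_ops == 0 or total_bytes == 0:
        return 0.0, 0.0
    sfmr = sum(1.0 - p for p, _ in accesses) / total_ops
    sbmr = sum((1.0 - p) * size for p, size in accesses) / total_bytes
    return sfmr, sbmr


def within_storage_limit(copies, sizes, limit):
    """Check that the sum of copies * size over all files stays within the limit S."""
    return sum(c * s for c, s in zip(copies, sizes)) <= limit
```

For example, `missing_rates([(0.9, 100), (0.5, 300)])` gives an SFMR of 0.3 and an SBMR of 0.4: the less available file is also the larger one, so it weighs more heavily in the byte-based metric.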
System Model: • For each file access operation ri at instant T, we associate a value Vi, set to the number of times the file will be accessed in the future • Four ways to estimate Vi: • No Prediction: Vi = 1 at all times • Bio Prediction: Vi is predicted from the file access history using a binomial distribution • Zipf Prediction: Vi is predicted from the file access history using a Zipf distribution • Queue Prediction: Vi is predicted from the current job queue; if the queue is empty, Queue Prediction behaves the same as No Prediction
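Two of the four predictors are simple enough to sketch directly. The code below is an assumption-laden illustration: it represents the job queue as a list of jobs, each a list of file identifiers, and takes Queue Prediction to mean counting how often the file appears in the queued jobs. The slides do not give the formulas for the Bio and Zipf predictors, so they are not sketched here. All function names are hypothetical.

```python
from collections import Counter


def no_prediction(file_id):
    """No Prediction: every file has value 1 at all times."""
    return 1


def queue_prediction(file_id, job_queue):
    """Queue Prediction: value = number of queued accesses to the file.

    job_queue: list of jobs, each job a list of the file ids it will
    access (this representation is an assumption of the sketch).
    Falls back to No Prediction when the queue is empty.
    """
    jobs = list(job_queue)
    if not jobs:
        return no_prediction(file_id)
    counts = Counter(f for job in jobs for f in job)
    # A file that no queued job needs gets value 0 in this sketch.
    return counts[file_id]
```

With a queue of `[["a", "b"], ["a"]]`, file `"a"` gets value 2 and file `"b"` gets value 1, so a copy of `"a"` is the more valuable one to keep.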
System Model: • To achieve the optimal SFMR and SBMR, we have to maximize the following values: $\sum_i P_i V_i$ (for SFMR) and $\sum_i P_i V_i S_i$ (for SBMR) • If the file sizes are all the same, SFMR = SBMR • To better describe our scheme and algorithm, we introduce a weight value: $W_i = \frac{P_i V_i}{C_i S_i}$
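The weight can be computed as below. This is a minimal sketch that reads the symbols as availability P, predicted value V, number of copies C, and file size S; reading S as the file size is an assumption, since the slide does not define it. The function name `file_weight` is hypothetical.

```python
def file_weight(availability, value, copies, size):
    """Weight W = (P * V) / (C * S) of a file.

    A low weight marks a file whose local copy contributes little
    value per unit of storage it occupies across the grid.
    """
    if copies <= 0 or size <= 0:
        raise ValueError("copies and size must be positive")
    return (availability * value) / (copies * size)
```

A file that is already widely replicated (large C) or large (large S) gets a lower weight, and so appears early when files are sorted by weight in ascending order for replacement.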
Algorithm: MinDmr Optimizer (): • If the requested file fi exists at the site, continue • If fi does not exist at the site and the site has enough free space, retrieve fi from a remote site and store it • If fi does not exist at the site and the site does not have enough free space: • Sort the files in the current SE by file weight Wi in ascending order • Take files from the sorted list in order and add them to the candidate list until the accumulated size of the candidate files is greater than or equal to the size of the requested file • Replicate fi only if the value gained by replicating it exceeds the accumulated value lost by deleting the candidate files fj from the SE: ΔPi*Vi > ∑ΔPj*Vj
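The decision procedure above can be sketched as a single function. This is an illustration under stated assumptions, not the OptorSim implementation: the caller supplies the precomputed gain ΔPi*Vi for the requested file and the loss ΔPj*Vj for each stored file, and the `StoredFile` type, the function name, and the `"hit"`, `"replicate"`, and `"remote_read"` labels are all hypothetical. The slides do not say what happens when replication is rejected; the sketch assumes the file is read remotely without being stored.

```python
from dataclasses import dataclass


@dataclass
class StoredFile:
    name: str
    size: float
    weight: float      # W = (P * V) / (C * S)
    value_loss: float  # delta-P * V lost if this local copy is deleted


def min_dmr_decide(requested_name, requested_size, gain, stored, capacity):
    """Return (action, victims) for one file request at a site."""
    # Case 1: the file is already at the site.
    if any(f.name == requested_name for f in stored):
        return "hit", []
    # Case 2: enough free space, so replicate without deleting anything.
    free = capacity - sum(f.size for f in stored)
    if free >= requested_size:
        return "replicate", []
    # Case 3: pick deletion candidates in ascending order of weight
    # until their accumulated size covers the requested file.
    candidates, freed = [], 0.0
    for f in sorted(stored, key=lambda f: f.weight):
        if freed >= requested_size:
            break
        candidates.append(f)
        freed += f.size
    if freed < requested_size:
        return "remote_read", []
    # Replicate only if the gain exceeds the accumulated loss.
    loss = sum(f.value_loss for f in candidates)
    if gain > loss:
        return "replicate", candidates
    return "remote_read", []
```

For a full site holding a low-weight file `a` and a high-weight file `b`, a request for a new file with a large enough gain evicts only `a`; with a small gain the site keeps both files and leaves the new file unreplicated.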
Simulation Setting • OptorSim: developed by the EU DataGrid Project to test dynamic replica schemes • Eco optimizer (economic model: a file is replicated if doing so maximizes the profit of the SE) • Simulation configuration: • File set size: 200 • Job set size: 10000 • Files per job: 3 to 20 • File size: 1 GB
Results • [Figure: SFMR with varying replica optimizers]
Results • [Figure: Total job time with sequential access] • [Figure: SFMR with varying job schedulers]
Results • [Figure: SFMR with varying job queue length] • [Figure: Total job time with varying job queue length]
Results • [Figure: SFMR with sequential access pattern] • [Figure: Missing rate gap (SBMR-SFMR)]
Conclusion • Proposed two data availability metrics, SFMR and SBMR, to evaluate the reliability of system data in a Data Grid • Discussed how we model the system availability problem • Developed four prediction-based replica optimizers under the assumption that Grid storage space is limited • Presented our greedy replica algorithm, which treats hot and cold data files differently and uses a weighting factor for the replacement scheme • Simulation results indicate that our new strategies outperform the other replica optimizers tested overall in terms of data availability
Future Work: • When file sizes are not uniform, enhance our scheme to distinguish between the system file missing rate and the system bytes missing rate • Work on new measurements to evaluate the job missing rate • Design new schemes and prediction functions to minimize the new measurements