190 likes | 464 Views
Dynamic Data Grid Replication Strategy based on Internet Hierarchy. Sang Min Park , Jai-Hoon Kim, and Young-Bae Ko Ajou University South Korea. Contents. Introduction to Data Grid Optimizations in Data Grid Novel Replication Strategy based on Internet Hierarchy Simulation
E N D
Dynamic Data Grid Replication Strategy based on Internet Hierarchy Sang Min Park , Jai-Hoon Kim, and Young-Bae Ko Ajou University South Korea
Contents • Introduction to Data Grid • Optimizations in Data Grid • Novel Replication Strategy based on Internet Hierarchy • Simulation • Simulation Results • Conclusions
Introduction to Data Grid • Data Grid Motivations • Petabyte scale data production • Distributed data storage to store parts of data • Distributed computing resources which process the data • Two Most Important Approaches for Data Grid • Secure, reliable, and efficient data transport protocol (ex. GridFTP) • Replication (ex. Replica catalog) • Replication • Large size files are partially replicated among sites • Reduce data access time • Application Scheduling, Dynamic replication issues are emerging
Introduction to Data Grid • Typical Job Execution Scenario
Reducing the Overall Job Execution Time Scheduling Optimization Deciding where to allocate the job Considering location of replicas and computational capabilities of sites Short-term Optimization Deciding from where to fetch replicas Considering available network bandwidth between sites Long-term Optimization (Dynamic Replication Strategy) Shortage of storage in a site Deciding which file should be remaining as a replica Better to replicate popular files because of its future usage Optimizations in Data Grid
Existing Dynamic Replication Strategies • Replica Optimization based on Site-level Locality • Replicate the file that is predicted to be used in future from the perspective of a site • Try to reduce the number of fetch • Delete Oldest, Delete LRU Method • Economic Strategy from European Data Grid • Developing OptorSim –Data Grid Optimization Simulator • Using Auction Protocol to trigger Long-term Optimization • Site-level Locality based on File access patterns
Existing Dynamic Replication Strategies • The Limitations of the site-level optimization • A Site certainly have limitations of their storage size, which means that the rate of data request locality is also limited • There should be predictable file access patterns, but we do not know if there will be.
Replication Strategy based on Bandwidth Hierarchy (BHR) • Network-level Locality • A site is not the only possible source of locality • Another source of locality : Network-level locality • If the replica is located in a close site, not long delay would be taken to fetch this replica Slow Replica Transmission Fast Replica Transmission Network Region (e.g., a country)
Replication Strategy based on Bandwidth Hierarchy (BHR) • Bandwidth Hierarchy
A X Delete this one! Replication Strategy based on Bandwidth Hierarchy (BHR) • Maximizing Network-level locality 1. Avoiding Replica Duplication in a region 2. Considering popularity of file request at the region-level No space here! We should remove some file Replica X is duplicated here! Receiving New Replica X a Site a Site A Region
Simulation • OptorSim • Data Grid Dynamic Replication Simulation tool • Developed as part of European Data Grid Project • Implemented in Java • Implemented Our own Region-based Optimizer in OptorSim
Simulation • Simulation Environment
Simulations General configuration of parameters Bandwidth and Storage Size
Simulation Results Total Job times of three strategies
Simulation Results Total job time with varying bandwidth and storage size
Conclusions • The existing dynamic replication strategies are based only on site-level locality of file request • BHR strategy is based on the network-locality • BHR shows quite good performance when hierarchy of bandwidth clearly appears, and size of storage at a site is small • We extend current site-level replica optimization study to more scalable way
References • William H. Bell, David G. Cameron, Luigi Capozza, A. Paul Millar, Kurt Stockinger, and Floriano Zini.: Simulation of Dynamic Grid Replication Strategies in OptorSim. In Proc. of the 3rd Int'l. IEEE Workshop on Grid Computing (Grid'2002), Baltimore, USA, November 2002. Springer Verlag, Lecture Notes in Computer Science. • William H. Bell, David G. Cameron, Ruben Carvajal-Schiaffino, A. Paul Millar, Kurt Stockinger, and Floriano Zini.: Evaluation of an Economy-Based File Replication Strategy for a Data Grid. In International Workshop on Agent based Cluster and Grid Computing at CCGrid 2003, Tokyo, Japan, May 2003. IEEE Computer Society Press. • Mark Carman, Floriano Zini, Luciano Serafini, and Kurt Stockinger.: Towards an Economy-Based Optimisation of File Access and Replication on a Data Grid. In International Workshop on Agent based Cluster and Grid Computing at International Symposium on Cluster Computing and the Grid (CCGrid'2002), Berlin, Germany, May 2002. IEEE Computer Society Press. • Ann Chervenak, Ian Foster, Carl Kesselman, Charles Salisbury and Steven Tuecke.: The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets. Journal of Network and Computer Applications, 23:187-200, 2001. • EU Data Grid Project: http://www.eu-datagrid.org
References • I. Foster, C. Kesselman and S. Tuecke.: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International J. Supercomputer Applications, 15(3), 2001. • Wolfgang Hoschek, Javier Jaen-Martinez, Asad Samar, Heinz Stockinger and Kurt Stockinger.: Data Management in an International Data Grid Project. 1st IEEE/ACM International Workshop on Grid Computing (Grid'2000), Bangalore, India, Dec 2000. • OptorSim – A Replica Optimizer Simulation: http://edg-wp2.web.cern.ch/edg-wp2/optimization/optorsim.html • Sang-Min Park and Jai-Hoon Kim.: Chameleon: A Resource Scheduler in a Data Grid Environment. 2003 IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID'2003), Tokyo, Japan, May 2003. IEEE Computer Society Press. • Kavitha Ranganathan and Ian Foster.: Design and Evaluation of Dynamic Replication Strategies for a High Performance Data Grid. International Conference on Computing in High Energy and Nuclear Physics, Beijing, September 2001. • Kavitha Ranganathan and Ian Foster.: Identifying Dynamic Replication Strategies for a High Performance Data Grid. International Workshop on Grid Computing, Denver, November 2001.