Fusion Active Storage for Write-intensive Big Data Applications*
Greg Thorsness, Chao Chen, and Yong Chen
Department of Computer Science, Texas Tech University
{greg.thorsness, chao.chen, yong.chen}@ttu.edu

Introduction
• Data movement is a major bottleneck in data-intensive high-performance computing
• We propose a Fusion Active Storage System (FASS) to address the data movement bottleneck, specifically for write-intensive big data applications
• The idea of the proposed FASS is to identify write-intensive operations, offload them, and carry them out on storage nodes
• The FASS moves write-intensive computations to storage nodes, so data are generated and written in place on storage

Model: Traditional HPC
Variable denotations:
T – execution time
m – number of phases
Wc – computational workload
Wd – data workload
N – total nodes
b – bandwidth

Model: FASS
Variable denotations:
Sn – storage nodes
Cn – compute nodes
Wdi – instruction data workload
Mw – write-intensive phases
Mc – computational phases
T' – FASS execution time
Restrictions: N = Sn + Cn; Sn > 0; Cn > 0; M = Mw + Mc

FASS: Components
• The Offload Analysis Module (OAM) estimates whether an operation would perform better if it were offloaded
• The Instruction Decoupling Module (IDM) provides an extended API that lets the programmer flag sections of code as write-intensive, determining what is offloaded (a hypothetical usage sketch appears below, after the offload heuristic)
• The Kernel Processing Module (KPM) carries out the offloaded instructions on the storage nodes and handles the communication needed

Offload Analysis Module Algorithm
We determine whether or not to offload the computation with a heuristic algorithm: if the time saved by offloading the instructions instead of the entire data is greater than the time lost by computing on the storage nodes, then offload the operations. A sketch of this decision follows.
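The poster lists the model variables but presents the formulas themselves as figures, so the closed forms below are an assumption rather than the authors' equations: we read each traditional phase as costing Wc/N + Wd/b, each FASS compute phase as Wc/Cn + Wd/b, and each FASS write-intensive phase as Wc/Sn + Wdi/b (instructions shipped over the network instead of data). The C sketch below encodes that reading together with the OAM decision; the function names are ours, not part of the FASS.

```c
/* Sketch of the OAM offload decision under an assumed form of the models.
 * The per-phase cost formulas are our reading of the poster's variable
 * lists, not code taken from the FASS itself. */
#include <stdio.h>

/* Traditional HPC model: every phase computes on all N nodes and moves
 * the full data workload Wd over bandwidth b. */
static double traditional_time(int m, double Wc, double Wd, int N, double b)
{
    return m * (Wc / N + Wd / b);
}

/* FASS model: compute phases run on the Cn compute nodes and still move Wd;
 * write-intensive phases run on the Sn storage nodes and move only the
 * instruction workload Wdi. */
static double fass_time(int Mc, int Mw, double Wc, double Wd, double Wdi,
                        int Cn, int Sn, double b)
{
    return Mc * (Wc / Cn + Wd / b) + Mw * (Wc / Sn + Wdi / b);
}

/* OAM heuristic: offload when shipping instructions instead of data saves
 * more time than is lost by computing on the storage nodes. */
static int should_offload(int m, int Mw, double Wc, double Wd, double Wdi,
                          int N, int Cn, int Sn, double b)
{
    double T  = traditional_time(m, Wc, Wd, N, b);
    double Tp = fass_time(m - Mw, Mw, Wc, Wd, Wdi, Cn, Sn, b);
    return Tp < T;
}

int main(void)
{
    /* Constants from test 1 on the poster: N = 24, b = 24, Wc = 100,
     * Wd = 200, Wdi = 20, Sn = Cn = 12, m = 10 phases. */
    for (int Mw = 0; Mw <= 10; Mw++)
        printf("write phases = %3d%%  offload = %s\n", Mw * 10,
               should_offload(10, Mw, 100.0, 200.0, 20.0, 24, 12, 12, 24.0)
                   ? "yes" : "no");
    return 0;
}
```

Under this reading and the test 1 constants, the FASS comes out ahead from roughly 60% write phases onward and is about 1.36X faster at 100% write phases, which lines up with the results reported below.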
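The poster does not specify the IDM's extended API itself, so the following is a hypothetical sketch of how a programmer might flag a write-intensive region: the fass_write_intensive_begin/end markers are invented names, and the flagged loop mimics the write-intensive random number generator code used in the emulation.

```c
/* Hypothetical IDM usage. The marker functions below are placeholders we
 * invented to illustrate flagging a region as write-intensive; the actual
 * FASS interface is not given on the poster. */
#include <stdio.h>
#include <stdlib.h>

/* Placeholder markers: a real IDM would record the flagged region so the
 * OAM can decide whether the KPM should execute it on the storage nodes. */
static void fass_write_intensive_begin(const char *tag) { printf("flag begin: %s\n", tag); }
static void fass_write_intensive_end(const char *tag)   { printf("flag end:   %s\n", tag); }

int main(void)
{
    const size_t n = 3 * 1000 * 1000;   /* 3 million values, as in the emulation */
    FILE *out = fopen("random.out", "wb");
    if (!out) return 1;

    /* The programmer flags the generate-and-write loop as write-intensive;
     * under the FASS this region would be offloaded to the storage nodes. */
    fass_write_intensive_begin("rng-write");
    for (size_t i = 0; i < n; i++) {
        int r = rand();                  /* generate data in place */
        fwrite(&r, sizeof r, 1, out);    /* write directly to storage */
    }
    fass_write_intensive_end("rng-write");

    fclose(out);
    return 0;
}
```

In a full FASS, such markers would let the IDM hand the flagged region to the OAM and, if offloading wins, to the KPM running on the storage nodes.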
Comparison and Analysis
Constants for all test cases: N = 24; b = 24; Wc = 100; m = 10

Constants for test 1: Sn = 12; Cn = 12; Wc = 100; Wd = 200; Wdi = 20
In this test we observed the effect of varying the percentage of write phases on the runtime of the FASS. As can be seen from the graph, the FASS performed better than the traditional method when the write phases exceeded 60% of all phases, and it sped up the runtime by 1.36X at 100% write phases.

Constants for test 2: Sn = 12; Cn = 12; Wc = 100; Wd = 1000; Wdi = 100
In this test we observed the effect of varying the data workload on the runtime of the FASS. As can be seen from the graph, the FASS performed better than the traditional method when the write phases exceeded 10% of all phases, and it sped up the runtime by 3.66X at 100% write phases. This test showed a larger speedup than the first, as the FASS is especially useful when dealing with a large volume of data.

Evaluations on DISCFarm Cluster
• We emulated the FASS using a 16-node DISCFarm cluster at Texas Tech University to evaluate its benefits. These tests were conducted with a write-intensive random number generator code to measure the potential of the FASS compared with the traditional method
• As can be observed from the graph, data movement impaired the performance of the traditional method: the FASS was over 4 times faster than the traditional method at writing 3 million random numbers

Conclusion
• The results of these analyses and evaluations show that the FASS clearly enhances the performance of write-intensive applications
• Note that the models and emulations are simplified: they do not take into account factors such as data dependencies. Despite these simplified assumptions, we believe the potential of the FASS is promising and that it would enhance the performance of real-world write-intensive applications

Future Work
• Build a fully functioning and automated FASS prototype and conduct further evaluations
• Examine the viability of using the KPM to detect and act upon data dependencies

References
[1] C. Chen, Y. Chen, and P. C. Roth. DOSAS: Mitigating the Resource Contention in Active Storage Systems. In Proc. of the IEEE International Conference on Cluster Computing 2012 (Cluster'12), 2012.
[2] E. J. Felix, K. Fox, K. Regimbal, and J. Nieplocha. Active Storage Processing in a Parallel File System. In 6th LCI International Conference on Linux Clusters: The HPC Revolution, Chapel Hill, North Carolina, 2005.
[3] S. W. Son, S. Lang, P. Carns, R. Ross, and R. Thakur. Enabling Active Storage on Parallel I/O Software Stacks. In 26th IEEE Symposium on Mass Storage Systems and Technologies (MSST), 2010.

* DISCLAIMER: This material is based upon work supported by the National Science Foundation and the Department of Defense under Grant No. CNS-1263183. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or the Department of Defense.