OmniStorage: Performance Improvement by Data Management Layer on a Grid RPC System
Yoshihiro Nakajima, Yoshiaki Aida, Mitsuhisa Sato, Osamu Tatebe @ University of Tsukuba
In collaboration with the BitDew team @ INRIA, Paris-Sud Univ.
Outline
• Motivation
• OmniStorage: a data management layer for grid RPC applications
• Implementation details
• Synthetic grid RPC workload programs
• Early performance evaluation
• Conclusion
Background: Grid RPC
[Figure: OmniRPC architecture — the master invokes remote executables (rex) through an OmniRPC agent over the Internet/network.]
• Grid RPC: an RPC system extended to exploit computing resources on the Grid
  • An effective programming model for Grid applications
  • Makes it easy to implement Grid-enabled applications
  • Grid RPC can be applied to the Master/Worker programming model
• We have developed OmniRPC [msato03] as a prototype Grid RPC system
  • Provides a seamless programming environment from a local cluster to multiple clusters in a Grid environment
  • Its main target is Master/Worker-type parallel programs
Performance issues in the RPC model
• The RPC mechanism performs point-to-point communication between the master and a worker
  • Transmission is NOT network-topology-aware
  • There is no direct communication between workers
• Issues learned from real grid RPC applications:
  • Case 1: parametric-search applications — a large amount of initial data is transferred to all workers as RPC parameters, so the data transfer from the master becomes a bottleneck (O(n) data transfers from the master are required for n workers)
  • Case 2: task-farming applications — processing a set of RPCs in a pipeline manner requires data transfer between workers, which costs two extra RPCs (see the sketch below): one to send data from a worker to the master, and one to send it from the master to the next worker
→ We introduce a data management layer to solve these issues
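To make the pipeline issue concrete, here is a minimal sketch of the relay pattern in plain OmniRPC. The procedure names and data sizes are hypothetical, and a synchronous OmniRpcCall variant is assumed alongside the OmniRpcCallAsync used in the example slides later in this talk:

    /* Hypothetical task-farming pipeline in plain OmniRPC: the
       intermediate data must travel worker -> master -> worker. */
    double intermediate[1000*1000];  /* relay buffer held by the master */
    double result[1000];

    /* Extra RPC 1: worker A computes and returns the data to the master. */
    OmniRpcCall("ProduceData", intermediate);

    /* Extra RPC 2: the master forwards the same data to worker B. */
    OmniRpcCall("ConsumeData", intermediate, result);

Every byte of intermediate data crosses the master's network link twice, which is exactly the overhead a data management layer can remove.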
Objective
• Propose a programming model that decouples the data transfer layer from the RPC layer
  • Enables optimization of data transfers among the master and workers using several data transfer methods
  • Provides an easy-to-use data repository for grid RPC applications
• Propose a set of benchmark programs organized by communication pattern, to serve as a common benchmark for similar middleware
  • To compare performance between OmniStorage and BitDew (by INRIA)
Overview of OmniStorage
• A data management layer for grid RPC data transfer
  • Decouples the data transfer layer from the RPC layer
  • Independent of RPC communication
• Enables topology-aware data transfer and optimizes data communication
  • Data is transferred by an independent process
• Users access OmniStorage through simple APIs
• Provides multiple data transfer methods for different communication patterns
  • Users can choose a suitable data transfer method
• Exploits hint information about the data communication patterns required by the application (BROADCAST, WORKER2WORKER, …)
What OmniStorage will solve
[Figure: the RPC model vs. the goal of this study. With plain RPC, direct communication between workers is not achievable; relaying data through the master is achievable by RPC, but not efficient, and broadcasting resends the data with every RPC (not efficient). With OmniStorage, the master issues RPCs for control while the data layer moves data directly between workers and broadcasts initial data efficiently.]
An example of a typical OmniRPC application
Master program (the initial data is sent as an RPC parameter):

    int main(){
        double initialdata[1000*1000], output[100][1000];
        ...
        for(i = 0; i < 100; i++){
            req[i] = OmniRpcCallAsync("MyProcedure", i, initialdata, output[i]);
        }
        OmniRpcWaitAll(100, req);
        ...
    }

Worker program (worker's IDL):

    Define MyProcedure(int IN i, double IN initialdata[1000*1000],
                       double OUT output[1000]){
        ... /* Worker's program in C language */
    }
An example of an OmniRPC application with OmniStorage
Master program (the user writes the OmstPutData call explicitly; the last argument is the hint describing the communication pattern, and the initial data is no longer sent as an RPC parameter):

    int main(){
        double initialdata[1000*1000], output[100][1000];
        ...
        OmstPutData("MyInitialData", initialdata, 8*1000*1000, OMSTBROADCAST);
        for(i = 0; i < 100; i++){
            req[i] = OmniRpcCallAsync("MyProcedure", i, output[i]);
        }
        OmniRpcWaitAll(100, req);
        ...
    }

Worker program (worker's IDL):

    Define MyProcedure(int IN i, double OUT output[1000]){
        OmstGetData("MyInitialData", initialdata, 8*1000*1000);
        ... /* Worker's program in C language */
    }
Direct communication among workers using OmniStorage
[Figure: the OmniRPC layer carries only the control sequence between the master and the workers. The OmniStorage data management layer stores key/value pairs such as ("dataA", "abcdefg"), ("dataB", 3.1242), ("dataC", 42.32), ("dataD", 321.1). A worker registers data with the OmstPutData(id, data, hint) data registration API, and another worker retrieves it with the OmstGetData(id, data, hint) data retrieval API, without going through the master.]
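As a sketch of how the figure's put/get pattern would look in code, worker A registers its output under an ID and worker B fetches it directly. The OMSTWORKER2WORKER hint constant is an assumption based on the hint names listed in the overview slide; the slides themselves only show OMSTBROADCAST:

    /* On worker A: publish the intermediate result to OmniStorage
       instead of returning it through the RPC result. */
    OmstPutData("Intermediate42", intermediate, 8*1000*1000, OMSTWORKER2WORKER);

    /* On worker B: fetch the data directly from the data layer,
       without relaying it through the master. */
    OmstGetData("Intermediate42", intermediate, 8*1000*1000);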
Data broadcast from the master to workers using OmniStorage
[Figure: the OmniRPC layer carries only the control sequence. The master registers data such as ("dataA", "abcdefg") with the OmstPutData(id, data, hint) data registration API, and each worker retrieves its copy with the OmstGetData(id, data) data retrieval API; the OmniStorage layer propagates the data to all workers.]
Implementations of OmniStorage
• Provides three data transfer methods for different data transmission patterns:
  • Omst/Tree — uses our tree-network-topology-aware data transmission implementation
  • Omst/BT — uses BitTorrent, which is designed for large-scale file distribution across widely distributed peers
  • Omst/GF — uses Gfarm, a Grid-enabled distributed file system developed by AIST and Tatebe
Omst/Tree
• Supports only broadcast communication from the master to the workers, taking the tree network topology into account
• A relay node relays the communication between the master and the workers
  • The user specifies the network topology in the configuration (a hypothetical example follows)
• The relay node works as a data cache server
  • Reduces data transmission over links where the network bandwidth is lower
  • Reduces the number of access requests to the master
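The slides do not show the actual configuration syntax, so the following is a purely hypothetical illustration of what a tree topology description could look like, using the cluster names from the testbed slide; every keyword here is invented:

    # Hypothetical Omst/Tree topology file (invented syntax)
    master  cTsukuba                              # node running the OmniRPC master
    relay   dennis00  workers dennis01-dennis08   # cache server for the Dennis cluster
    relay   gfm00     workers gfm01-gfm08         # cache server for the Gfm cluster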
Omst/BT
• Omst/BT uses BitTorrent as a data transfer method for OmniStorage
• BitTorrent: a P2P file sharing protocol
  • Automatically optimizes data transfer among the peers
  • As the number of peers increases, file distribution becomes more effective
• Omst/BT automates the steps needed to use the BitTorrent protocol
Omst/GF
• Omst/GF uses the Gfarm file system [Tatebe02] as a data transfer method for OmniStorage
  • Gfarm is a grid-enabled, large-scale distributed file system for data-intensive applications
• OmniStorage data are stored in and accessed through the Gfarm file system
• Exploits Gfarm's data replication to improve scalability and performance
  • Gfarm may optimize the data transmission
Collaboration with BitDew @ LRI
• During the last visit at LRI:
  • Ported OmniStorage to the Grid5000 platform
  • Created an execution environment for OmniStorage
  • Discussed common benchmark programs for performance comparison of data repository systems such as BitDew and OmniStorage
• Current status of the synthetic benchmarks:
  • OmniStorage: all three programs are implemented and performance results have been obtained
  • BitDew: two programs are implemented
  • However, Gilles Fedak reported that the ALL-EXCHANGE benchmark is hard to implement on BitDew
Synthetic benchmark programs for performance comparison of similar data repository middleware
• W-To-W
  • Models a program in which the output of one RPC becomes the input of the next RPC
  • Transfers one file from one worker to another worker
• BROADCAST
  • Models a program that broadcasts common initial data from the master to the workers
  • Broadcasts one file from the master to all workers
• ALL-EXCHANGE
  • Models a program in which every worker exchanges its own data file with every other worker for subsequent processing
  • Each worker broadcasts its own file to every other worker
A sketch of the BROADCAST benchmark in terms of the earlier API follows.
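The skeleton below shows how the BROADCAST benchmark maps onto the API from the example slides; FILE_SIZE, NWORKERS, and the procedure name BcastTask are placeholders, not names from the actual benchmark code:

    /* Master: register the file once with a broadcast hint, then
       issue one lightweight RPC per worker. */
    OmstPutData("BcastFile", filedata, FILE_SIZE, OMSTBROADCAST);
    for(i = 0; i < NWORKERS; i++)
        req[i] = OmniRpcCallAsync("BcastTask", i, output[i]);
    OmniRpcWaitAll(NWORKERS, req);

    /* Each worker retrieves its copy through the data layer. */
    OmstGetData("BcastFile", filedata, FILE_SIZE);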
Testbed configuration
• Cluster computers:
  • "Dennis" with 8 nodes @ hpcc.jp — dual Xeon 2.4 GHz, 1 GB memory, 1 GbE
  • "Alice" with 8 nodes @ hpcc.jp — dual Xeon 2.4 GHz, 1 GB memory, 1 GbE
  • "Gfm" with 8 nodes @ apgrid.org — dual Xeon 3.2 GHz, 1 GB memory, 1 GbE
• The OmniRPC master program is executed on "cTsukuba" @ Univ. of Tsukuba
• Two testbed configurations for the performance evaluation:
  • Two clusters connected by a high-bandwidth network: Dennis (8 nodes) + Alice (8 nodes)
  • Two clusters connected by a lower-bandwidth network: Dennis (8 nodes) + Gfm (8 nodes)
W-To-W: transfers one file from one worker to another worker
• Omst/BT could not achieve better performance than OmniRPC
• Omst/GF is 3x faster than plain OmniRPC
BROADCAST: broadcasts one file from the master to all 16 workers
• With plain OmniRPC, many communications between the master and the workers occur
• Omst/Tree broadcasts effectively (6.7x faster)
• Omst/BT broadcasts effectively (5.7x faster) despite a big overhead; it performs better with 1 GB data
ALL-EXCHANGE: each of the 16 workers broadcasts its own file to every other worker
• Omst/GF achieves better performance than the others: 21x faster than OmniRPC
• Omst/BT is 7x faster than OmniRPC
Discussion of performance by basic communication pattern
• W-To-W
  • Omst/GF is preferred
  • Omst/BT could not benefit from the BitTorrent protocol because the execution platform had too few workers
• BROADCAST
  • Omst/Tree achieves better performance when the network topology is known
  • Omst/BT is preferred when the network topology is unknown
  • With more than 1000 workers, Omst/BT is the suitable choice
• ALL-EXCHANGE
  • Omst/GF is the better solution
  • Omst/BT could improve its performance by tuning BitTorrent's parameters
Discussion of the merit of exploiting hint information about communication patterns
• By exploiting hint information for each datum:
  • OmniStorage can select a suitable data transfer method for the data communication pattern
  • OmniStorage can achieve better data transfer performance
• If OmniStorage had no hint information:
  • It could use only point-to-point communication
  • It could not achieve efficient data transfer through process cooperation
Conclusion
• We have proposed a new programming model that decouples data transfer from the RPC mechanism
• We have designed and implemented OmniStorage as a prototype data management layer
  • OmniStorage enables effective, topology-aware data transfer
• We characterized OmniStorage's performance according to data communication patterns
Future work
• Security is not yet addressed
• Parameter optimization of BitTorrent in Omst/BT
• Performance comparison between OmniStorage and BitDew using the same benchmarks
• Benchmarking on larger-scale distributed computing platforms such as Grid5000 in France or InTrigger in Japan
Any questions?
• E-mail:
  • ynaka@hpcs.cs.tsukuba.ac.jp or
  • omrpc@omni.hpcc.jp
• Our websites:
  • HPCS Laboratory, University of Tsukuba: http://www.hpcs.cs.tsukuba.ac.jp/
  • OmniStorage (to be released soon): http://www.omni.hpcc.jp/
Case study: performance improvement of an OmniRPC application with OmniStorage
• Benchmark program: a master/worker parallel eigenvalue solver written with OmniRPC (developed by Prof. Sakurai @ Univ. of Tsukuba)
  • 80 RPCs are issued
  • Each RPC takes about 30 sec
  • Initial data size: about 50 MB for each RPC
• Evaluation details:
  • Since the transmission pattern of the initial data is a broadcast, we chose Omst/Tree as the data transfer method
  • We examine application scalability with and without Omst/Tree
  • We measure the execution time while varying the number of nodes from 1 to 64
Performance evaluation with a real application (parallel eigenvalue solver)
[Figure: execution time and speedup of the solver with and without Omst/Tree.]
Overview of Omst/Tree
• Supports only broadcast communication from the master to the workers over a tree-topology network
• A relay node relays the communication between the master and the workers
• The relay node works as a data cache server
  • Reduces data transmission over links where the network bandwidth is lower
  • Reduces the number of access requests to the master
[Figure: the master sends data to a relay, which forwards it to Worker 1, Worker 2, …, Worker N.]
Overview of Omst/BT
• Omst/BT uses BitTorrent as a data transfer method for OmniStorage
• BitTorrent: a P2P file sharing protocol
  • Specialized for sharing large amounts of data across large numbers of nodes
  • Automatically optimizes data transfer among the peers
  • The more peers there are, the more effective the file distribution becomes
• Omst/BT automates the registration of the "torrent" file, which normally has to be done manually
Overview of Omst/GF
• Omst/GF uses the Gfarm file system [Tatebe02] as a data transfer method for OmniStorage
  • Gfarm is a grid-enabled, large-scale distributed file system for data-intensive applications
• Omst/GF reads and writes its data through the Gfarm file system
  • Gfarm hooks the standard file system calls (open, close, write, read)
  • Gfarm may optimize the data transfer
[Figure: an application links against the Gfarm I/O library; it obtains file information from the metadata server (gfmd) and performs remote file access against the file system nodes (gfsd daemons).]
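Because Gfarm hooks the standard file I/O calls, a worker could in principle read such data with ordinary POSIX code. This is a minimal sketch under that assumption; the "gfarm:" path prefix and the file layout are illustrative, not taken from the slides:

    #include <fcntl.h>
    #include <unistd.h>

    /* With the Gfarm I/O library intercepting open/read/close, this
       plain POSIX sequence is served by the distributed file system.
       The path below is a hypothetical example. */
    int fd = open("gfarm:/omst/MyInitialData", O_RDONLY);
    read(fd, initialdata, sizeof(double)*1000*1000);
    close(fd);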
Point-to-point communication (one worker sends data to one worker in the other cluster)
• Configuration 1: clusters connected by a high-bandwidth network
• Configuration 2: clusters connected by a lower-bandwidth network
• Omst/GF enables direct communication between workers, so its performance is good
• Omst/GF is 2.5 times faster than OmniRPC and 6 times faster than Omst/BT
Broadcast from the master to all workers (the master sends data to 16 workers)
• Configuration 1: clusters connected by a high-bandwidth network
• Configuration 2: clusters connected by a lower-bandwidth network
• With plain OmniRPC, many communications between the master and the workers occur, and the execution times vary widely
• Omst/BT has a big overhead, but the bigger the data, the better its performance
• Omst/Tree broadcasts effectively (2.5 times faster)
All-to-all communication between 16 workers (each worker sends its own data to the other 15 workers)
• Configuration 1: clusters connected by a high-bandwidth network
• Configuration 2: clusters connected by a lower-bandwidth network
• Omst/GF is about 2 times faster than Omst/BT