Extending ProActive for QosCosGrid: Support for Advance Reservation and Multi-Cluster Allocation
Krzysztof Kurowski and Mariusz Mamonski (Poznan), Gabor Szemes and George Kampis (Collegium Budapest), Walter De Back and Laszlo Gulyas (Aitia), Christian Delbé (ActiveEon)
QosCosGrid Project
• EU 6th Framework Programme STREP Project
• 2.5 years, ends in 03/2009
• 11 partners (2 private companies) from 10 countries
• Aims at providing Quasi-Opportunistic Supercomputing for COmplex Systems on GRIDs
  • Quasi (i.e. not really) opportunistic
  • Reservations, predictable performance
  • Framework for complex systems on Grids
  • Very broad application class with widely varying requirements (no implicit restrictions on applications)
  • For the Grids…
QosCosGrid Project Status
• GRMS Grid Scheduler
  • Reservation and orchestration of resources
  • Specific XML job description (Job Profile)
  • Resource needs, process affinity, …
[Diagram labels: Job Profile, GRMS Portal, OpenDSP/LSF (one per cluster), Cluster 1 … Cluster N]
QosCosGrid Project Status
• GRMS Grid Scheduler
  • Reservation and orchestration of resources
  • Specific XML job description (Job Profile)
  • Resource needs, process affinity, …
• Programming Framework
  • Fault-tolerant cluster-to-cluster message passing libraries based on Open MPI (FORTRAN/C/C++) and ProActive (Java)
• 9 Use Cases
  • Written in C/MPI and in Java/ProActive
  • Benchmarked on a multi-cluster testbed
QosCosGrid/ProActive challenge 1: Deployment
• Preserve ProActive deployment properties (Nodes, Virtual Nodes, …)
• Give end users the Job Profile as a single description (no explicit deployment descriptor)
• Avoid the need for direct connections to remote cluster machines (ssh, …)
• Provide a GRMS deployment process? Unfortunately…
  • The main process must stay connected to the deployed processes during execution
• Provide a two-step submission, i.e. submit the main process, which then submits the rest of the application? Unfortunately…
  • Sub-jobs are not supported by GRMS (reservation and accounting)
• Solution: submit the ProActive application as a whole, with a specific asynchronous deployment process
Deployment for QosCosGrid: ProActive Node Coordinator (PNC)
• Submit Job Profile
• Create reservation and submit QCG-PA Wrappers
• Start QCG-PA Wrappers
• Main is started and registered with the PNC
• Runtimes are started and registered with the PNC
[Diagram labels: Job Profile, GRMS Portal, OpenDSP/LSF (one per cluster), WRAP main, WRAP rt, WRAP rt, … WRAP rt, ProActive Node Coordinator]
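The registration steps on the ProActive Node Coordinator slide can be illustrated with a small sketch. This is not the QCG-ProActive implementation: the class and method names (`NodeCoordinatorSketch`, `registerRuntime`, `awaitRuntimes`) are hypothetical, the runtime URLs are placeholders, and everything runs inside one JVM using plain Java concurrency. It only shows the asynchronous idea: runtimes register in any order, and main blocks until the expected number has arrived.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the ProActive Node Coordinator (PNC).
// None of these names come from ProActive or QosCosGrid.
class NodeCoordinatorSketch {
    private final int expectedRuntimes;
    private final List<String> runtimeUrls = new ArrayList<>();

    NodeCoordinatorSketch(int expectedRuntimes) {
        this.expectedRuntimes = expectedRuntimes;
    }

    // Called by each runtime once its wrapper has started it.
    synchronized void registerRuntime(String url) {
        runtimeUrls.add(url);
        notifyAll(); // wake up main if it is already waiting
    }

    // Called by main: block until every expected runtime has registered,
    // or fail with IllegalStateException when the timeout expires.
    synchronized List<String> awaitRuntimes(long timeout, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        try {
            while (runtimeUrls.size() < expectedRuntimes) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    throw new IllegalStateException("only " + runtimeUrls.size()
                            + " of " + expectedRuntimes + " runtimes registered");
                }
                TimeUnit.NANOSECONDS.timedWait(this, remaining);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for runtimes", e);
        }
        return new ArrayList<>(runtimeUrls);
    }

    public static void main(String[] args) {
        NodeCoordinatorSketch pnc = new NodeCoordinatorSketch(3);
        for (int i = 0; i < 3; i++) {
            final String url = "rt-" + i; // placeholder, not a real ProActive URL
            new Thread(() -> pnc.registerRuntime(url)).start();
        }
        List<String> nodes = pnc.awaitRuntimes(5, TimeUnit.SECONDS);
        System.out.println("main sees " + nodes.size() + " runtimes");
    }
}
```

Because registration and waiting are decoupled, the scheduler is free to start the wrappers in any order, which is the property a single-submission deployment needs.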
QosCosGrid/ProActive challenge 2: Connectivity
• QCG is multi-cluster, without restriction on applications
  • Communication must be possible from anywhere to anywhere
  • But clusters are usually behind a firewall and/or a NAT
• Use ProActive's RMISSH on port 22? Unfortunately…
  • It does not deal with NATs
• Solution: provide a new protocol that supports NATs: RMIQCG
Extending inter-cluster communications for QosCosGrid
• RMIQCG uses the SOCKS protocol instead of SSH
• SOCKS server deployed on the front-end node
• One port must be externally available
• A single proxy per cluster implies contention…
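To make the SOCKS idea concrete, here is a minimal sketch of an RMI socket factory whose outgoing connections go through a SOCKS proxy, using only standard Java classes (`java.net.Proxy`, `java.rmi.server.RMISocketFactory`). It is an assumption about how such a transport could look, not the RMIQCG code: the class name `SocksRmiSocketFactorySketch` is hypothetical, and the sketch covers only outgoing connections, not how a node behind a NAT is reached.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.Socket;
import java.rmi.server.RMISocketFactory;

// Sketch only: routes outgoing RMI connections through a SOCKS proxy,
// for example one running on a cluster front-end node.
class SocksRmiSocketFactorySketch extends RMISocketFactory {
    private final Proxy proxy;

    SocksRmiSocketFactorySketch(String proxyHost, int proxyPort) {
        // createUnresolved avoids a DNS lookup when the factory is built.
        this.proxy = new Proxy(Proxy.Type.SOCKS,
                InetSocketAddress.createUnresolved(proxyHost, proxyPort));
    }

    Proxy proxy() {
        return proxy;
    }

    @Override
    public Socket createSocket(String host, int port) throws IOException {
        Socket socket = new Socket(proxy);
        // The target is left unresolved so that the proxy, not the caller,
        // resolves the host name.
        socket.connect(InetSocketAddress.createUnresolved(host, port));
        return socket;
    }

    @Override
    public ServerSocket createServerSocket(int port) throws IOException {
        // Incoming connections are accepted with a plain server socket.
        return new ServerSocket(port);
    }
}
```

Such a factory could be installed JVM-wide with `RMISocketFactory.setSocketFactory(...)`. Every connection then passes through the one proxy, which is where the contention noted on the slide would come from.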
Benchmarks Testbed
• Testbed portal @ node2.qoscosgrid.man.poznan.pl/gridsphere/gridesphere
Use Case 9: Distributed Multi-Agent Simulation
• Cellular Automata
  1. Partition
  2. Deploy
  3. Iterate
[Diagram labels: Network Communication, Active Object]
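The partition and iterate steps can be sketched in plain Java. The slides do not say which automaton Use Case 9 runs, so this sketch assumes a one-dimensional automaton with periodic boundaries and the XOR-of-neighbours rule (rule 90). Each block stands in for an active object, and reading a neighbour's edge cell stands in for network communication. The deploy step is omitted: all blocks live in one JVM. All names are hypothetical.

```java
import java.util.Arrays;

// Illustration only: blocks play the role of active objects and the
// edge-cell reads play the role of inter-node messages.
class PartitionedAutomatonSketch {

    // 1. Partition: split the cells into `parts` contiguous blocks of equal size.
    static int[][] partition(int[] cells, int parts) {
        if (parts <= 0 || cells.length % parts != 0) {
            throw new IllegalArgumentException("cell count must be divisible by parts");
        }
        int size = cells.length / parts;
        int[][] blocks = new int[parts][];
        for (int p = 0; p < parts; p++) {
            blocks[p] = Arrays.copyOfRange(cells, p * size, (p + 1) * size);
        }
        return blocks;
    }

    // One rule-90 step on a block, given the cells just outside its two edges.
    static int[] stepBlock(int[] block, int leftGhost, int rightGhost) {
        int[] next = new int[block.length];
        for (int i = 0; i < block.length; i++) {
            int left = (i == 0) ? leftGhost : block[i - 1];
            int right = (i == block.length - 1) ? rightGhost : block[i + 1];
            next[i] = left ^ right;
        }
        return next;
    }

    // 3. Iterate: every step, each block fetches its neighbours' edge cells
    // and then updates itself; the blocks are joined again at the end.
    static int[] run(int[] cells, int parts, int steps) {
        int[][] blocks = partition(cells, parts);
        for (int s = 0; s < steps; s++) {
            int[][] next = new int[parts][];
            for (int p = 0; p < parts; p++) {
                int[] leftNb = blocks[(p + parts - 1) % parts];
                int[] rightNb = blocks[(p + 1) % parts];
                next[p] = stepBlock(blocks[p], leftNb[leftNb.length - 1], rightNb[0]);
            }
            blocks = next;
        }
        int[] out = new int[cells.length];
        int pos = 0;
        for (int[] b : blocks) {
            System.arraycopy(b, 0, out, pos, b.length);
            pos += b.length;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] cells = new int[16];
        cells[8] = 1;
        System.out.println(Arrays.toString(run(cells, 4, 3)));
    }
}
```

Running with `parts = 1` gives the unpartitioned reference result, so comparing it with a run using several parts is a quick check that the edge exchange is correct.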
Use Case 9: Distributed Multi-Agent Simulation
• On 8 machines (8 and 4+4)
[Chart annotation: Scalability issue with RMIQCG]
Conclusion
• Important external contribution
  • Dedicated QCGProActive version (based on 3.9)
  • Ongoing integration in official ProActive 4
  • Provides solutions to scalability problems inherent to RMIQCG
  • Similar solutions are studied in the OASIS team
• Successful partnership between QosCosGrid and ActiveEon
  • Support for QCGProActive deployment design
  • Support for upgrade from 3.2 to 3.9
  • Support for use case application and design
QosCosGrid Project
• EU 6th Framework Programme STREP Project
• 2.5 years, ends in 03/2009
• 2 800 000 Euro
• Strong QCG Consortium: 11 partners (2 private companies) from 10 countries
Use Case 8
[Diagram labels: Network Communication, Active Object, Active Object]