Explore the integration of UNICORE and Condor in grid computing for workflow management, job submission, and task execution. Learn about the architecture, TSI interface, implications, and Condor-G overview.
Bridging Unicore and Condor
Hidemoto Nakada
National Institute of Advanced Industrial Science and Technology, Japan
Background: The NAREGI project (1)
• NAtional REsearch Grid Initiative
  • Japanese national Grid project, funded by the Ministry of Education
  • 5-year project, starting April 2003
  • ~17M $ / year
• The Goals
  • To develop Grid middleware and an upper layer
  • To construct a production Grid for nano-science simulations
• Organization
  • Corp.: Fujitsu, Hitachi, NEC
  • Academic: AIST, NII, Titech, Institute for Molecular Science
Condor Week @ UW
Background: The NAREGI project (2)
• For the first 2 years, employ Condor, UNICORE and Globus as the bases
• Construct a Grid testbed using these three
• They must be interoperable
[Figure: Condor, UNICORE, Globus]
The UNICORE System
• A Grid middleware developed mainly by Fujitsu Lab. Europe
  • Owned by UNICORE Forum
• Designed to use supercomputers installed at distributed supercomputer centers
• SSL-based security model
  • No proxy certificates
• Firewall-aware (cf. Globus)
  • Connection is one-way
  • Can be used from a privately addressed network
• Submit and disconnect!
• Written entirely in Java (except for small Perl scripts)
The UNICORE System
• Workflow management
  • Everything is a task
    • Invocation of an executable file
    • File transfer: staging in, out
  • The workflow (task flow) is represented as a Java object (AJO: Abstract Job Object)
  • Flow control structures are provided
    • 'If' branch
    • 'For' loop
  • The workflow graph can be cyclic!
• No scheduler/broker included
  • NAREGI will provide it
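The "everything is a task" model with branches and loops can be illustrated with a minimal sketch. This is not the AJO API (which is a Java object model); the `Task` class, `run_workflow`, and the task names below are hypothetical and only show how a cyclic task graph can be walked when each task chooses its successor.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Task:
    """Hypothetical task node: an action plus a rule that picks the successor."""
    name: str
    action: Callable[[dict], None]
    # Returns the name of the next task, or None to end the flow.
    successor: Optional[Callable[[dict], Optional[str]]] = None


def run_workflow(tasks: Dict[str, Task], start: str, state: dict,
                 max_steps: int = 100) -> List[str]:
    """Walk the graph from `start`; cycles are allowed, bounded by max_steps."""
    trace: List[str] = []
    current: Optional[str] = start
    while current is not None:
        if len(trace) >= max_steps:
            raise RuntimeError("step limit reached; possible endless cycle")
        task = tasks[current]
        task.action(state)
        trace.append(current)
        current = task.successor(state) if task.successor else None
    return trace


def stage_in(state: dict) -> None:
    state["staged"] = True


def simulate(state: dict) -> None:
    state["iteration"] = state.get("iteration", 0) + 1


def stage_out(state: dict) -> None:
    state["done"] = True


tasks = {
    "stage_in": Task("stage_in", stage_in, lambda s: "simulate"),
    # The edge back to "simulate" is the loop; the other edge is the branch.
    "simulate": Task("simulate", simulate,
                     lambda s: "simulate" if s["iteration"] < 3 else "stage_out"),
    "stage_out": Task("stage_out", stage_out),
}

trace = run_workflow(tasks, "stage_in", {})
```

Because each task selects its own successor from shared state, a 'for' loop is just an edge back to an earlier task, which is why the graph need not be acyclic.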
The UNICORE architecture
• Gateway
  • Application-level router
  • Runs on a firewall
  • Relays all communications
  • SSL-based security
• NJS (Network Job Supervisor)
  • Workflow engine
  • Interprets the AJO and executes it
• TSI (Target System Interface)
  • Wraps the batch subsystem
  • Implemented in Perl
[Figure: Client, Firewall, Gateway, NJS, TSI, Batch Subsystem; Vsite and Usite boundaries]
GUI Client (UNICORE Pro Client)
• GUI
  • Edits workflows
  • Monitors jobs
• Not freely available right now
  • Provided by a company called 'Pallas'
  • Will be soon ...
Bridging Condor and UNICORE
• UNICORE → Condor
  • Use Condor as the local scheduler within a single site
  • EASY: just write a TSI Perl script
  • cf. Condor job manager for Globus
• Condor → UNICORE
  • Use Condor as a global scheduler
  • UNICORE serves as the local scheduler
  • NOT SO EASY: bridging modules have to be implemented
  • cf. Condor-G for Globus
Unicore-C: Overview
[Figure: Client, Firewall, Gateway, NJS, two TSIs, Condor Submit, Condor Pool, PBS; Vsite and Usite boundaries]
TSI – Target System Interface
• Written in Perl
• Takes care of
  • Job invocation
  • File placement
NJS-TSI interface
[Figure: two parallel paths in which the NJS passes a script to a TSI; one TSI submits with qsub to PBS, the other hands a script to Condor Submit for Condor]
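The idea that a TSI only has to translate a job script into the local batch system's submit call can be sketched as follows. The real TSI is a Perl script; this Python version is purely illustrative. `qsub` and `condor_submit` are the actual submit commands of PBS and Condor, but the function names, the `.sub` file naming, and the choice of the vanilla universe are assumptions made for this sketch. Nothing is executed; the functions only build the text and argument lists.

```python
def condor_submit_description(script_path: str, stem: str) -> str:
    """Build a minimal Condor submit description that runs the job script.

    Assumption: the script is run as a plain vanilla-universe executable.
    """
    lines = [
        "universe = vanilla",
        f"executable = {script_path}",
        f"output = {stem}.out",
        f"error = {stem}.err",
        f"log = {stem}.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


def submit_command(backend: str, script_path: str) -> list:
    """Return the argv a TSI-like wrapper would run for the given backend."""
    if backend == "pbs":
        # PBS accepts the job script directly.
        return ["qsub", script_path]
    if backend == "condor":
        # Condor needs a submit description file; the name is hypothetical.
        return ["condor_submit", script_path + ".sub"]
    raise ValueError(f"unknown backend: {backend}")
```

Keeping the backend-specific part this small is what makes the UNICORE-to-Condor direction "easy": the NJS side is unchanged and only the wrapper differs.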
Implication of Unicore-C
• UNICORE serves as a workflow engine for Condor
  • cf. DAGMan
  • Users can use the GUI to edit workflow graphs
• UNICORE as a submission tool for Condor
  • Users can submit jobs from outside the site
  • Can submit from a privately addressed network
  • Submit and disconnect
Condor-U overview
• Condor-G
  • Condor-Globus bridge
• Replace G (Globus) with U (UNICORE)
Condor-G Overview
[Figure: Condor submit machine (User, Schedd, Grid Manager, Globus GAHP Server) connected via the GRAM protocol to the Globus world (GateKeeper, JobManager, Batch System, Job)]
What is the GAHP (Grid ASCII Helper Protocol)?
• Simple text-based protocol
• Introduced to cope with the instability of the Globus modules
  • Encapsulates the Globus modules inside the GAHP server
  • Originally, it was the Globus ASCII Helper Protocol
• Separates Return and Result
  • To enable asynchronous operation
  • 'Return' comes immediately
  • 'Result' comes later
[Figure: Grid Manager sends a Request to the GAHP Server; the GAHP Server answers with a Return and, later, a Result]
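The Return/Result split can be shown with a small sketch. This is not the GAHP wire format: the `HelperServer` class, the `"S"` acknowledgment string, and the `(request_id, value)` result tuples are simplifications assumed here only to show an immediate acknowledgment followed by a result that is collected later.

```python
import queue
import threading


class HelperServer:
    """Toy helper: acknowledges requests at once, delivers results later."""

    def __init__(self) -> None:
        self._results: "queue.Queue" = queue.Queue()
        self._threads: list = []

    def request(self, request_id: int, work) -> str:
        """Start `work` in the background and return immediately."""
        def run() -> None:
            self._results.put((request_id, work()))

        thread = threading.Thread(target=run)
        thread.start()
        self._threads.append(thread)
        return "S"  # the immediate 'Return' (placeholder acknowledgment)

    def wait_idle(self) -> None:
        """Block until all outstanding work has finished (for the demo only)."""
        for thread in self._threads:
            thread.join()
        self._threads.clear()

    def results(self) -> list:
        """Drain whatever 'Results' have arrived so far, without blocking."""
        drained = []
        while True:
            try:
                drained.append(self._results.get_nowait())
            except queue.Empty:
                return drained


server = HelperServer()
ack = server.request(1, lambda: "pong")  # returns before the work is reported
server.wait_idle()
finished = server.results()
```

The caller never blocks on slow or unstable remote operations; it matches each later result to its request through the request id.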
Condor-U Overview
[Figure: Condor submit machine (User, Schedd, Grid Manager, UNICORE GAHP Server) connected via the UNICORE protocol to the UNICORE world (Firewall, Gateway, NJS, TSI, Batch Subsystem, Job; Vsite and Usite boundaries)]
Almost the Same!
• Can we do it just by re-implementing the GAHP server?
• NO!
  • The GAHP command set is Globus-specific
  • It cannot be used for UNICORE
GAHP command set for Globus
• INITIALIZE_FROM_FILE, INITIALIZE_FROM_MYPROXY
• COMMANDS, VERSION, ASYNC_MODE_ON, ASYNC_MODE_OFF, QUIT, RESULTS
• GRAM_CALLBACK_ALLOW, GRAM_ERROR_STRING, GRAM_JOB_REQUEST, GRAM_JOB_CANCEL, GRAM_JOB_STATUS, GRAM_JOB_SIGNAL, GRAM_PING, GRAM_JOB_CALLBACK_REGISTER
• GASS_SERVER_INIT
• REFRESH_PROXY_FROM_FILE, MYPROXY_REFRESH, MYPROXY_RETRIEVE, PROXY_INFO, MYPROXY_DESTROY, MYPROXY_DELEGATE
What we did
• Redesigned the GAHP command set
  • As simple and generic as possible
  • Note: the GAHP protocol itself is not changed
• Implemented the Grid Manager for UNICORE
  • Can be reused for other systems
  • Done by Jaime Frey @ Condor Team
• Implemented the GAHP server for UNICORE
Design principle of the GAHP commands for UNICORE
• Simple and generic
  • High-level commands
  • Just 4 commands (cf. 23 commands for Globus)
• Hide UNICORE-specific logic in the GAHP server, not the Grid Manager
  • So that the Grid Manager can be used for other systems
• Use a ClassAd as the command argument and the return value
  • To ensure the generality of the command set
  • System-specific things are encapsulated in the ClassAd
  • Functionality can be extended just by adding ClassAd attributes, without touching the command set itself
Command set for Unicore GAHP
• Job Create
  • Create a job
  • Input: ClassAd
  • Output: job handle
• Job Start
  • Invoke the job
  • Input: job handle
• Job Status
  • Query the job status
  • Input: job handle
  • Output: status ClassAd
• Job Destroy
  • Destroy information stored in the GAHP server
  • Input: job handle
[Figure: sequence between GridManager and Unicore GAHP: Job Create (returns Job Handle), Job Start, Job Status (Running), Job Status (Complete), Job Destroy]
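The four-command lifecycle can be sketched as a small in-memory state machine. This is an illustration under stated assumptions, not the UNICORE GAHP server: ClassAds are modeled as plain dictionaries, and the class name, handle format, status strings, `JobStatus` attribute, and the `simulate_completion` helper are all hypothetical.

```python
import itertools


class GenericGahpSketch:
    """Toy model of the create / start / status / destroy command set."""

    def __init__(self) -> None:
        self._jobs: dict = {}
        self._ids = itertools.count(1)

    def job_create(self, job_ad: dict) -> str:
        """Job Create: store the job description and hand back a job handle."""
        handle = f"job-{next(self._ids)}"
        self._jobs[handle] = {"ad": dict(job_ad), "status": "Created"}
        return handle

    def job_start(self, handle: str) -> None:
        """Job Start: invoke the job identified by the handle."""
        self._jobs[handle]["status"] = "Running"

    def job_status(self, handle: str) -> dict:
        """Job Status: return a status description (a dict standing in for a ClassAd)."""
        return {"JobStatus": self._jobs[handle]["status"]}

    def job_destroy(self, handle: str) -> None:
        """Job Destroy: drop everything the server remembers about the job."""
        del self._jobs[handle]

    def simulate_completion(self, handle: str) -> None:
        """Demo-only stand-in for the remote system finishing the job."""
        self._jobs[handle]["status"] = "Complete"


gahp = GenericGahpSketch()
handle = gahp.job_create({"Executable": "a.out"})
gahp.job_start(handle)
running = gahp.job_status(handle)
gahp.simulate_completion(handle)
complete = gahp.job_status(handle)
gahp.job_destroy(handle)
```

Since system-specific details travel inside the description passed to `job_create`, new attributes can be added without changing any of the four method signatures.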
ClassAd attributes (1): generic
ClassAd attributes (2): UNICORE-specific
Submit file sample for Condor-U
    Universe        = globus
    +SubUniverse    = unicore
    Executable      = a.out
    output          = tmpOut
    error           = tmpErr
    log             = tmp.log
    +UnicoreUsite   = fujitsu.com:1234
    +UnicoreVsite   = NaReGI
    +KeystoreFile   = /home/foo/key
    +PassphraseFile = /home/foo/passwd
    Queue
Slide callouts: Specifies GAHP Server; Historical reason; Specifies Site will be used; To get certificate
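In a Condor submit description, a line beginning with `+` adds a custom attribute to the job's ClassAd, which is how the UNICORE-specific settings above reach the GAHP server without new submit commands. The sketch below separates the two kinds of lines. It is a deliberately simplified parser written for this example (no macros, comments, or multi-queue handling), and `parse_submit` is a hypothetical helper, not part of Condor.

```python
SAMPLE = """\
Universe = globus
+SubUniverse = unicore
Executable = a.out
output = tmpOut
error = tmpErr
log = tmp.log
+UnicoreUsite = fujitsu.com:1234
+UnicoreVsite = NaReGI
+KeystoreFile = /home/foo/key
+PassphraseFile = /home/foo/passwd
Queue
"""


def parse_submit(text: str):
    """Split a submit description into standard commands and '+' attributes."""
    commands: dict = {}
    custom: dict = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.lower() == "queue":
            break  # everything up to Queue describes this job
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key.startswith("+"):
            custom[key[1:]] = value        # custom job ClassAd attribute
        else:
            commands[key.lower()] = value  # submit commands are case-insensitive
    return commands, custom


commands, custom = parse_submit(SAMPLE)
```

Here `custom` ends up holding the site, keystore, and passphrase settings, while `commands` holds the ordinary Condor submit commands.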
Implication of Condor-U
• Condor users can use resources managed by UNICORE
  • From outside the sites
• Users can use Condor as a job scheduler for UNICORE-managed resources
• Condor GlideIn might be used on it
  • Communication between nodes has to be assured, which is not common in UNICORE setups
Current Status
• UNICORE-C
  • Done
  • Will be available soon from our Web site
  • http://www.naregi.org/
• Condor-U
  • Under implementation
  • Will be available by this summer, I hope.
Summary
• Condor, UNICORE, and Globus are bridged together!
  • Users can submit jobs from one system to another
• The UNICORE GAHP server command set is generic and simple
  • Can be used to bridge to other systems