
Building simple, easy-to-use grids with Styx Grid Services and SSH

Jon Blower, Keith Haines. Reading e-Science Centre, Environmental Systems Science Centre, University of Reading.


Presentation Transcript


  1. Building simple, easy-to-use grids with Styx Grid Services and SSH • Jon Blower, Keith Haines • Reading e-Science Centre, Environmental Systems Science Centre, University of Reading

  2. Motivation • Grid computing is “distributed computing performed transparently across multiple administrative domains” • Implies both ease of use and security • hard to get both simultaneously! • Ease of setup and maintenance also highly desirable • Difficulty in achieving this is major block to uptake of Grid computing • Currently hard for science projects to build their own Grids without very significant technical help • In this talk: • Ease of use (transparency) comes from Styx Grid Services • Security comes from SSH

  3. Running jobs on a Grid • Basic use of a Grid boils down to: • uploading the required input files for a job • running the job • downloading the output • (More advanced use includes workflows, delegation, etc) • Many users ask – why not just use SSH? • File transfer with SFTP/SCP • Execution through SSH exec or SSH login
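
The three steps on this slide map directly onto standard OpenSSH commands. The following is a minimal sketch in Python, assuming the `scp` and `ssh` clients are installed; the host name, directory and file names are hypothetical, and the function only builds the command lines rather than running them.

```python
import shlex


def build_ssh_job_commands(host, remote_dir, inputs, command, outputs):
    """Return the argv lists for upload, execution and download over SSH.

    Nothing is executed here; pass each list to subprocess.run() to run it.
    """
    commands = []
    # 1. Upload the required input files with scp.
    if inputs:
        commands.append(["scp", *inputs, f"{host}:{remote_dir}/"])
    # 2. Run the job in the upload directory (SSH exec).
    remote = f"cd {shlex.quote(remote_dir)} && {command}"
    commands.append(["ssh", host, remote])
    # 3. Download each output file into the current directory.
    for name in outputs:
        commands.append(["scp", f"{host}:{remote_dir}/{name}", "."])
    return commands


# Hypothetical example values, not taken from the talk.
EXAMPLE = build_ssh_job_commands(
    host="user@cluster.example.org",
    remote_dir="/tmp/job1",
    inputs=["input.dat"],
    command="myprog -i input.dat -o output.dat",
    outputs=["output.dat"],
)
```

Every input and output file has to be named by hand here, which is the bookkeeping that the rest of the talk aims to remove.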

  4. Advantages of SSH • Trusted and understood by system administrators • Very widely used, bugs get fixed quickly • Lots of implementations and good tools • e.g. WinSCP (Windows Explorer-like interface to remote systems) • Choice of authentication methods including: • password • public-private key pair • pluggable (e.g. GSI-SSH for Globus logins) • Can mount remote filesystems exposed through SSH • sshfs for Linux (analogous to NFS) • SftpDrive for Windows • Hence can work on a remote file without downloading all of it: potentially important in environmental sciences • Can execute remote programs with SSH • Hence SSH can be the nucleus of a simple Grid

  5. What’s wrong with Globus security? • Globus uses X.509 certificates and time-limited proxies • Proxies can be used to temporarily delegate authority to a third party • Certificates have typical life-time of 1 year • High level of security but proven usability problem • Lots of certificate formats – different tools require different formats • Users can’t “remember” the certificate so need to have a copy on every computer they use (or on USB stick or shared disk) • Illegal sharing of certificates: “mine doesn’t work, can I use yours?” • Users known to run SSH as a grid job, then log on to that to get a familiar environment! • Therefore poor usability leads to poor security in practice • and annoys users no end • Conclusion – user certificates should be avoided if possible • MyProxy can help with these problems • cf. NERC DataGrid • but is an extra server to manage

  6. Styx Grid Services • Simple, lightweight system for exposing executables as services • Executable is installed on a service provider (host) • SGSs are executed just like local programs • myprog -i input.dat -o output.dat • (myprog is a wrapper script that masquerades as the original executable) • files transferred automatically, user doesn’t have to know where • Supports interactive use • including computational steering • But executables exposed through SGS must be non-graphical • “Workflows” can be constructed with shell scripts • data can be streamed directly between the services • extract | process | render • Supported by Taverna • Emphasis on ease of deployment and use, not feature completeness
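
The `extract | process | render` idea, where each stage streams its output straight into the next, can be illustrated with ordinary local processes. This is only a sketch of the piping pattern, not of SGS itself: the two stages below are hypothetical stand-ins written as Python one-liners, and `run_pipeline` is a name invented for this example.

```python
import subprocess
import sys


def run_pipeline(stages, data):
    """Chain the given commands with pipes and return the final stdout.

    Suitable for small inputs only: all of `data` is written before any
    output is read, so very large inputs could fill the pipe buffers.
    """
    procs = []
    prev_stdout = None
    for argv in stages:
        stdin = subprocess.PIPE if prev_stdout is None else prev_stdout
        proc = subprocess.Popen(argv, stdin=stdin, stdout=subprocess.PIPE)
        if prev_stdout is not None:
            # The child now owns this pipe end; drop the parent's copy.
            prev_stdout.close()
        prev_stdout = proc.stdout
        procs.append(proc)
    procs[0].stdin.write(data)
    procs[0].stdin.close()
    result = procs[-1].stdout.read()
    for proc in procs:
        proc.wait()
    return result


# Hypothetical stand-in stages: upper-case the text, then reverse it.
UPPER = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]
REVERSE = [sys.executable, "-c",
           "import sys; sys.stdout.write(sys.stdin.read()[::-1])"]
```

Calling `run_pipeline([UPPER, REVERSE], b"abc")` passes the data through both stages without writing an intermediate file, which is the property the slide claims for chained services.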

  7. How SGS works • Server contains complete description of executable in XML • includes input and output files, command-line parameters • SGSRun program downloads XML description and parses the command line • Creates new service instance and uploads input files • Starts the service and monitors progress • Uploads stdin and downloads stdout and stderr as the service runs, redirecting them from and to the console • Downloads output files when the service finishes • Example description:
     <gridservice name="gulp">
       <params>
         <param type="unflaggedOption" name="inputfile"/>
         <param type="unflaggedOption" name="outputfile"/>
       </params>
       <inputs>
         <input type="fileFromParam" name="inputfile"/>
       </inputs>
       <outputs>
         <output type="fileFromParam" name="outputfile"/>
         <output type="stream" name="stderr"/>
       </outputs>
     </gridservice>
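
To make the role of the XML description concrete, here is a rough Python sketch of how a client could use it to decide which files to upload and download. It is not the SGSRun implementation (which is Java); `plan_transfers` is a hypothetical name, and the rule that `unflaggedOption` parameters are matched to command-line arguments in declaration order is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# The service description shown on the slide.
DESCRIPTION = """
<gridservice name="gulp">
  <params>
    <param type="unflaggedOption" name="inputfile"/>
    <param type="unflaggedOption" name="outputfile"/>
  </params>
  <inputs>
    <input type="fileFromParam" name="inputfile"/>
  </inputs>
  <outputs>
    <output type="fileFromParam" name="outputfile"/>
    <output type="stream" name="stderr"/>
  </outputs>
</gridservice>
"""


def plan_transfers(description_xml, args):
    """Work out uploads, downloads and streams for one invocation."""
    root = ET.fromstring(description_xml)
    # Assumption: unflagged options are positional, in declaration order.
    names = [p.get("name") for p in root.findall("params/param")
             if p.get("type") == "unflaggedOption"]
    if len(args) != len(names):
        raise ValueError(f"expected {len(names)} arguments, got {len(args)}")
    values = dict(zip(names, args))
    outputs = root.findall("outputs/output")
    return {
        "service": root.get("name"),
        # Files named by a parameter: uploaded before the job starts.
        "uploads": [values[i.get("name")]
                    for i in root.findall("inputs/input")
                    if i.get("type") == "fileFromParam"],
        # Files named by a parameter: downloaded when the job finishes.
        "downloads": [values[o.get("name")] for o in outputs
                      if o.get("type") == "fileFromParam"],
        # Streams are relayed to the console while the job runs.
        "streams": [o.get("name") for o in outputs
                    if o.get("type") == "stream"],
    }
```

For a command line such as `gulp in.dat out.dat`, calling `plan_transfers(DESCRIPTION, ["in.dat", "out.dat"])` marks `in.dat` for upload, `out.dat` for download and `stderr` as a live stream.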

  8. SGS and security • SGS server can be run in two modes: • Daemon mode: • Standalone server (a container for services) • Traffic optionally encrypted through Secure Sockets (SSL) • Authentication through custom protocol • need to maintain own user database • Jobs run as a generic user • Tunnelled mode: • Server process executed through Secure Shell (SSH) • Client and server communicate down the encrypted channel • Authentication through SSH • No separate user database – just need login on host system • Jobs run with permissions of the specific user • analogous to other systems e.g. Subversion • Client interface is the same in both cases • Choice is purely down to service providers

  9. SGS + SSH = … • You can execute remote jobs with SSH alone, but only stdin, stdout and stderr are communicated down the line • Need to upload and download input and output files "manually" • Styx allows an arbitrary number of channels to be sent down the secure line … • Data streams • Input and output files • Progress and status messages • Steering messages • … through use of the Styx protocol for distributed systems • File-sharing protocol similar to NFS • We have pure-Java implementation of Styx (http://jstyx.sf.net) • Any resource can be represented as a URL: styx+ssh://myhost/myservice/instances/1/outputs/stdout
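
The URL on this slide can be taken apart with Python's standard library. The sketch below is illustrative only: `parse_styx_url` is a hypothetical helper, and reading the scheme as "protocol + transport" is an interpretation of the single example URL rather than documented JStyx behaviour.

```python
from urllib.parse import urlsplit


def parse_styx_url(url):
    """Split a styx+ssh style URL into protocol, transport, host and path."""
    parts = urlsplit(url)
    # "styx+ssh" -> protocol "styx" carried over transport "ssh".
    protocol, _, transport = parts.scheme.partition("+")
    return {
        "protocol": protocol,
        "transport": transport or None,
        "host": parts.hostname,
        # Each path segment names one level of the Styx file hierarchy.
        "path": [segment for segment in parts.path.split("/") if segment],
    }


RESOURCE = parse_styx_url(
    "styx+ssh://myhost/myservice/instances/1/outputs/stdout"
)
```

Because Styx is a file-sharing protocol, the path segments read like a directory tree: here the `stdout` output of instance `1` of `myservice`.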

  10. Demo 1: A basic Grid job • Remote execution of GULP (General Utility Lattice Program) • Julian Gale • Calculates lots of properties of crystal lattices • e.g. Helmholtz free energy • Reads input from stdin, prints output to stdout • gulp < infile • Running remote job exactly the same as running locally • Client-side stub and server-side SGS framework communicate through Styx messages on the secure channel • [Diagram: client-side GULP stub exchanging Styx messages with the server-side GULP SGS over the SSH channel]

  11. Demo 2: Condor job • SGS system can be installed on a Condor submit host • If user specifies a directory of input files instead of a single file, jobs are split across worker nodes in the pool • gulp inputs outputs • One job per file in the inputs directory • SGS system automatically creates Condor submit file and monitors progress • Progress is displayed on the client's console • Easy way to specify parameter sweep jobs, ensemble data processing etc. • Could apply to Sun Grid Engine and other DRMs • Interactive use may not be possible depending on DRM • [Diagram: client GULP stub connects over SSH to the GULP SGS on the Condor submit host, which dispatches jobs to the Condor worker nodes]
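
The "one job per file" rule can be sketched as a small generator of a Condor submit description. This is not the file SGS actually writes, whose contents the talk does not show; the function name, the `.out`/`.err` naming and the `job.log` file are assumptions. It uses the standard submit commands `input`, `output`, `error`, `log` and `queue`, and relies on GULP reading stdin and writing stdout as described in Demo 1.

```python
from pathlib import PurePosixPath


def make_submit_description(executable, input_files, output_dir):
    """Build a Condor submit description with one queued job per input file."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        "log = job.log",  # hypothetical shared event log
        "",
    ]
    for path in input_files:
        stem = PurePosixPath(path).stem
        lines += [
            f"input = {path}",                      # fed to the job's stdin
            f"output = {output_dir}/{stem}.out",    # captured stdout
            f"error = {output_dir}/{stem}.err",     # captured stderr
            "queue",
            "",
        ]
    return "\n".join(lines)


# Hypothetical directory contents for `gulp inputs outputs`.
SUBMIT = make_submit_description(
    "gulp", ["inputs/a.dat", "inputs/b.dat"], "outputs"
)
```

Each `queue` statement submits one job using the settings that precede it, so a directory of N input files yields N independent jobs for the pool.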

  12. Submission to Globus resources • Two options: • Use GSI-SSH instead of SSH • SSH with Globus authentication • (thanks to CCLRC for Java code to GSI-SSHTerm) • doesn’t quite work yet… ;-) • Submit to Condor-G instead of Condor (see diagram) • OxGrid uses Condor-G to submit jobs to National Grid Service • Very similar to normal Condor operation • [Diagram: client GULP stub connects over SSH to the GULP SGS on the Condor-G submit host, which submits jobs to Globus resources]

  13. Long-running jobs and robustness • Client might disconnect the SSH connection deliberately or accidentally • This might bring down the SGS server process! • Client would not be able to re-connect • (In daemon mode this is less of a problem as the server is persistent) • We have designed but not yet implemented a solution to this • A little coding and a lot of thinking and testing is required! • This is also needed to support workflows properly (services need to connect to one another to transfer data directly) • In progress!

  14. Case study: GCEP project • Grid for Coupled-model Ensemble Prediction • Uses clusters in Reading, British Antarctic Survey and RAL • Run climate models (MPI jobs) then analyse output (single-machine jobs) • Focuses on ensembles, so want to run same program over different inputs • Scientists write programs in whatever language they like • Deploy on the GCEP servers and create the XML description • Anyone with SSH access to the servers can then run the programs through SGS as if they were local • programs can be run on clusters through Sun Grid Engine • Data transfers happen automatically • MPI jobs on clusters • Trivially parallel jobs on Condor pool of ordinary desktops (Reading Campus Grid)

  15. Limitations • Robustness • Slow data transfers because encrypted • could use alternative transport • There are ways to improve this but need more testing • SGS does not provide a resource broker • But can use Condor-G for this • Users can't (yet) submit arbitrary executables • Complex executables (that spawn other exes) might be hard to deploy in SGS • But we haven't really tried yet • Can't deploy a GUI app as an SGS

  16. Conclusions • To use SGS-SSH all you need is: • An SSH login to the remote system • The SGS software (5 MB of pure Java libraries) • Users run Grid jobs securely just like ordinary local programs • Can submit to Condor, Globus and other DRMs • Can create "workflows" of Styx Grid Services with shell scripts • Data can be transferred directly between services • SGS already available: SGS-SSH needs more work • Version 0.2.0 of JStyx downloaded 218 times so far • (most of them probably just want Styx implementation, not SGS)

  17. Future work • Case studies! • Robustness • Optimize data transfer speed • GridSAM integration (possible) • already has framework for submission to various DRMs • but limited by JSDL limitation of “one job at a time” • Compare with my_condor_submit • From e-Minerals project

  18. Acknowledgements and references • Thanks to… • David Wallom of OERC for helping to integrate with OxGrid • Tom Oinn of Taverna project for Taverna integration • Vita Nuova Holdings Ltd for technical help with Styx protocol • See also… • Reading e-Science Centre booth • Papers in AHM proceedings 2004, 2005 and 2006 • http://jstyx.sf.net
