
Distributed computing at the Facility level: applications and attitudes

Tom Griffin, STFC ISIS Facility, tom.griffin@stfc.ac.uk. NOBUGS 2008, Sydney.


Presentation Transcript


  1. Distributed computing at the Facility level: applications and attitudes Tom Griffin STFC ISIS Facility tom.griffin@stfc.ac.uk NOBUGS 2008, Sydney

  2. Spare cycles • Typical PC CPU usage is about 10% • Usage minimal 5pm – 8am • Most desktop PCs are really fast • Waste of energy • How can we use (“steal?”) unused CPU cycles to solve computational problems?

  3. Types of Application • CPU Intensive • Low to moderate memory use • Not too much file output • Coarse grained • Command line / batch driven • Licensing issues?

  4. Distributed computing solutions • Lots of choice: CONDOR, GridEngine, Grid MP… • Grid MP server hardware: two dual-Xeon 2.8 GHz servers, RAID 10 • Software: servers run RedHat Linux Enterprise Server / DB2; unlimited Windows (and other) clients • Programming: Web Services interface (XML, SOAP), accessed with C++, Java, C# • Management console: web browser based; can manage services, jobs, devices etc. • Large industrial user base: GSK, J&J, Novartis etc.

  5. Installing and Running Grid MP • Server installation: 2 hours • Client installation: create MSI and RPM using ‘setmsiprop’, 30 seconds • Manual install: better security on Linux and Macs

  6. Adapting a program for GridMP • Fairly easy to write • Interface to grid via Web Services • C++, Java, C# • Think about how to split your data • Wrap your executable • Write the application service • Pre and Post processing
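The "split your data" step can be pictured with a job made of many independent runs, cut into workunits of a few runs each. The following is a minimal Python sketch of that idea only, not Grid MP code; the function name `split_runs` and the chunk size of 10 are hypothetical choices made for illustration.

```python
def split_runs(total_runs, runs_per_workunit):
    """Return 1-based, inclusive (first_run, last_run) ranges, one per workunit.

    Hypothetical helper: the real split depends on the application and its data.
    """
    if total_runs < 0 or runs_per_workunit < 1:
        raise ValueError("total_runs must be >= 0 and runs_per_workunit >= 1")
    ranges = []
    first = 1
    while first <= total_runs:
        # The last workunit may be shorter than the others.
        last = min(first + runs_per_workunit - 1, total_runs)
        ranges.append((first, last))
        first = last + 1
    return ranges


if __name__ == "__main__":
    # 64 independent runs, at most 10 runs per workunit (both values illustrative).
    for first, last in split_runs(64, 10):
        print(f"workunit covers runs {first}-{last}")
```

Coarse chunks keep the per-workunit overhead small; finer chunks spread the load more evenly across slow and fast PCs. Which matters more depends on the application.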

  7. Package your executable • Executable, DLLs, standard data files and environment variables are bundled into a program module executable • Compress? Encrypt? • Uploaded to, and resident on, the server

  8. Create / run a job • Datasets: Proteins, Molecules • Packages: Pkg1, Pkg2, Pkg3, Pkg4 • Client side (https://): create job, generate cross product of datasets • Server side: workunits • Start job
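"Generate cross product" means pairing every item of one dataset with every item of the other, so that each pair becomes one workunit. A small Python sketch of that pairing, assuming two datasets; the item names below are made up for illustration and do not come from the slides, and `make_workunits` is a hypothetical helper, not part of Grid MP.

```python
from itertools import product


def make_workunits(*datasets):
    """Return one tuple per workunit: every combination of one item per dataset."""
    return list(product(*datasets))


if __name__ == "__main__":
    # Hypothetical dataset contents, for illustration only.
    proteins = ["protein_A", "protein_B"]
    molecules = ["mol_1", "mol_2", "mol_3"]

    workunits = make_workunits(proteins, molecules)
    print(len(workunits), "workunits")  # 2 proteins x 3 molecules
    for protein, molecule in workunits:
        print(protein, "against", molecule)
```

The workunit count is the product of the dataset sizes, so it grows quickly as datasets are added or enlarged.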

  9. Code examples
     // Describe the job and register it with the server
     Mgsi.Job job = new Mgsi.Job();
     job.application_gid = app.application_gid;
     job.description = txtJobName.Text.Trim();
     job.state_id = 1;
     job.job_gid = ud.createJob(auth, job);
     // Describe a job step belonging to that job
     Mgsi.JobStep js = new Mgsi.JobStep();
     js.job_gid = job.job_gid;
     js.state_id = 1;
     js.max_concurrent = 1;
     js.max_errors = 20;
     js.num_results = 1;
     js.program_gid = prog.program_gid;

  10. Code examples
     // Create a data set attached to the job
     Mgsi.DataSet ds = new Mgsi.DataSet();
     ds.job_gid = job.job_gid;
     ds.data_set_name = job.description + "_ds_" + DateTime.Now.Ticks;
     ds.data_set_gid = ud.createDataSet(auth, ds);
     // Array declaration is not shown on the slide; assumed here
     Mgsi.Data[] datas = new Mgsi.Data[(int)numWorkunits.Value];
     // Upload one data file per workunit and record its hash and size
     for (int i = 1; i <= numWorkunits.Value; i++) {
         FileTransfer.UploadData uploadD = ft.uploadFile(auth, Application.StartupPath + "\\testdata.tar");
         Mgsi.Data data = new Mgsi.Data();
         data.data_set_gid = ds.data_set_gid;
         data.index = i;
         data.file_hash = uploadD.hash;
         data.file_size = long.Parse(uploadD.size);
         datas[i - 1] = data;
     }
     ud.createDatas(auth, datas);
     // Generate the workunits for the job step from the data set
     ud.createWorkunitsFromDataSetsAsync(auth, js.job_step_gid, new string[] { ds.data_set_gid }, options);

  11. Performance • Famotidine form B: 13 degrees of freedom, P2₁/c, V = 1421 • Synchrotron data to 1.64 Å • 1 × 10⁷ moves per run, 64 runs • Standard DASH (2.4 GHz Core 2 Quad, using a single core): job complete = 9 hrs • Gdash submitted to test grid of 5 in-use PCs (4 × 2.4 GHz Core 2 Quad, 1 × 2.8 GHz Core 2 Quad): job complete = 24 minutes • Speedup = 22.5×
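The quoted speedup follows directly from the two completion times on this slide: 9 hours on a single core against 24 minutes on the test grid. A short Python check of that arithmetic; the function name `speedup` is just an illustrative label.

```python
def speedup(serial_minutes, parallel_minutes):
    """Ratio of single-machine run time to grid run time."""
    if parallel_minutes <= 0:
        raise ValueError("parallel_minutes must be positive")
    return serial_minutes / parallel_minutes


if __name__ == "__main__":
    serial = 9 * 60   # standard DASH: 9 hours, in minutes
    grid = 24         # test grid of 5 PCs: 24 minutes
    print(f"speedup = {speedup(serial, grid)}x")
```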

  12. Performance – 999 SA runs, full grid • 4 days 18 hours of CPU time in ~40 minutes elapsed time • 317 cores from 163 devices: 42 Athlons (1.6–2.2 GHz), 168 Core 2 Duos (1.8–3 GHz), 36 Core 2 Quads (2.4–2.8 GHz), 1 Duron (1.2 GHz), 42 P4s (2.4–3.6 GHz), 27 Xeons (2.5–3.6 GHz) • [Plot: workunits against time]
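Dividing the total CPU time by the elapsed time gives a rough measure of how many cores were busy on average during the job. The Python sketch below does that arithmetic with the figures from this slide. The elapsed time is quoted only as "~40 minutes", so the results are approximate, and the utilisation figure assumes all 317 cores were available throughout, which the slide does not state.

```python
def cpu_minutes(days, hours):
    """Convert a CPU time given in days and hours to minutes."""
    return (days * 24 + hours) * 60


def average_busy_cores(total_cpu_minutes, elapsed_minutes):
    """Average number of cores working at once over the elapsed time."""
    return total_cpu_minutes / elapsed_minutes


if __name__ == "__main__":
    total = cpu_minutes(4, 18)               # 4 days 18 hours of CPU time
    busy = average_busy_cores(total, 40)     # ~40 minutes elapsed (approximate)
    print(f"CPU time: {total} minutes")
    print(f"average busy cores: {busy:.0f}")
    print(f"share of 317 cores: {busy / 317:.0%}")
```

The gap between the average and the full core count is expected on a grid of in-use desktops of mixed speed, where slower machines finish their last workunits while faster ones sit idle.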

  13. A Particular Success - McStas • HRPD supermirror guide design • Complex design • Meaningful simulations take a long time • Want to try lots of ideas • Many runs of >200 CPU days • Simpler model was best value • Massive improvement in flux • Significant cost savings

  14. Problems • McStas • Interactions in the wild • Symantec Anti-Virus • Did not show up in testing • McStas restricted to night running only

  15. User Attitudes • A range • Theft • “I’m not having that on my machine” • First thing to get blamed • Gaining more trust • Evangelism by users

  16. Flexibility with virtualisation • Request to run ‘GARefl’ code • ISIS is Windows based • Few Linux PCs • VMware Server is freeware • 8 hosts gave 26 cores • More cores = more demand • 56 real cores recruited from servers, 64-core Beowulf • 10 Mac cores • Run Linux as a job

  17. Flexibility with virtualisation

  18. The Future • Grid growing in power every day: new machines added, old ones still left on • Electricity: energy-saving drive at STFC, switch machines off • Wake-on-LAN ‘Magic Packets’ + remote hibernate • Laptops: good or bad?
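A Wake-on-LAN magic packet is a simple frame: six 0xFF bytes followed by the target machine's MAC address repeated sixteen times, usually sent as a UDP broadcast. The Python sketch below builds and sends one. It is a generic illustration rather than the mechanism used at ISIS; the MAC address, the broadcast address and UDP port 9 are placeholder assumptions, and the target PC must have Wake-on-LAN enabled in its firmware and network card settings.

```python
import socket


def build_magic_packet(mac):
    """Build the 102-byte magic packet for a MAC such as '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError("MAC address must contain 12 hex digits")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # Six 0xFF bytes, then the 6-byte MAC repeated 16 times.
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the magic packet over UDP (address and port are assumptions)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


if __name__ == "__main__":
    # Placeholder MAC address; only builds the packet, does not send it.
    packet = build_magic_packet("00:11:22:33:44:55")
    print(len(packet), "bytes")
```

Calling `send_magic_packet` before a job is submitted, and hibernating machines remotely once the queue is empty, is one way the "powered down and up" idea could be combined with a grid.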

  19. Summary • Distributed computing: perfect for coarse-grained, CPU-intensive, ‘disk-lite’ applications • Resources: use existing resources; power increases with time, no need to write off assets • Scalable • Not just faster: allows one to try different scenarios • Virtualisation: Linux under Windows, Windows under Linux • Green credentials: PCs are running anyway, better to utilise them; can be powered down and up

  20. Acknowledgements • ISIS Data Analysis Group • Kenneth Shankland • Damian Flannery • STFC FBU IT Service Desk and ISIS Computing Group • Key Users • Richard Ibberson (HRPD) • Stephen Holt (GARefl) • Questions?
