Distributed Grid Computing at ISIS using the Grid MP System Tom Griffin, ISIS Facility & University of Manchester / UMIST
What do I mean by ‘Distributed Grid’? • A way of speeding up large, compute intensive tasks • Break large jobs into smaller chunks • Send these chunks out to (distributed) machines • Distributed machines do the work • Collate and merge the results
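The split, distribute and merge pattern above can be sketched in a few lines of C++. This is a minimal, local illustration only: the function names are hypothetical, the "work" is just a sum, and nothing here talks to a real grid.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Split `items` into at most `chunkCount` contiguous chunks of near-equal size.
std::vector<std::vector<int>> splitIntoChunks(const std::vector<int>& items,
                                              std::size_t chunkCount) {
    std::vector<std::vector<int>> chunks;
    if (items.empty() || chunkCount == 0) return chunks;
    chunkCount = std::min(chunkCount, items.size());
    const std::size_t base = items.size() / chunkCount;
    const std::size_t extra = items.size() % chunkCount;  // first `extra` chunks get one more item
    std::size_t start = 0;
    for (std::size_t i = 0; i < chunkCount; ++i) {
        const std::size_t len = base + (i < extra ? 1 : 0);
        const auto first = items.begin() + static_cast<std::ptrdiff_t>(start);
        chunks.emplace_back(first, first + static_cast<std::ptrdiff_t>(len));
        start += len;
    }
    assert(start == items.size());  // every item landed in exactly one chunk
    return chunks;
}

// Stand-in for the work one distributed machine would do on its chunk (here: a sum).
long processChunk(const std::vector<int>& chunk) {
    return std::accumulate(chunk.begin(), chunk.end(), 0L);
}

// Collate the partial results returned by the machines into one answer.
long mergeResults(const std::vector<long>& partials) {
    return std::accumulate(partials.begin(), partials.end(), 0L);
}
```

In a real deployment each chunk would be processed on a different machine; here `processChunk` would simply be called once per chunk and the return values passed to `mergeResults`.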
Spare Cycles Concept • Typical PC usage is about 10% • Most PCs not used at all after 5pm • Even with ‘heavily used’ (Outlook, Word, IE) PCs, the CPU is still grossly underutilised • Everyone wants a fast PC! • Can we use (“steal?”) their unused CPU cycles? • SETI@home, World Community Grid (www.worldcommunitygrid.org)
Possible Software Implementations • Toolkit e.g. COSM • Low-level toolkit – source-code-level integration • Time-consuming work, repeated for each application • Entropia DC Grid • Trial run at ISIS two years ago. Some success • Company bought out and in limbo (?) • United Devices Grid MP • What we’re currently using • Quite expensive • Condor • Free (academic research project) • In our experience two years ago, not reliable with Windows
The United Devices System • Server hardware • We use two, dual Xeon servers + 280 client licenses • Could (will) easily cope with more clients • Software • Servers run RedHat Linux Advanced Server / DB2 • Clients available for Windows, Linux, SPARCs and Macs • Programming • MGSI – Web Services interface – XML, SOAP • Accessed with C++ and Java classes etc • Management Console • Web browser based • Can manage services, jobs, devices etc
Installing and Deploying the System • Servers • Complete set up in under 3 hours • Virtually self maintaining • Clients • Windows only so far • MSI Installer • approx 20 seconds • SMS • MP Agent User • Installation on other OSs looks straightforward
Suitable / Unsuitable Applications • CPU Intensive • Low to moderate memory use • Not too much file output • Coarse grained • Command line / batch driven • Licensing issues?
Objects within the Grid • Program • Job • Jobstep • Data Set • Data • Workunit • Client
How to write Grid Programs • Fairly easy to write • Interface to grid via Web Services • So far used: C++, Java, Perl, C# (any .Net language) • Think about how to split your data and merge results • Wrap and upload your executable • Write the application service • Pre and Post processing • Use the Grid
Wrapping Your Executable • Executable + any dlls etc • Standard data files • Compression • Encryption • Capture screen output • Set Environmental Variables • Command Line
Application Service • Pre-processing • Partition data • Package data partitions • Log in to the Grid server • Create a Job and Job Step • Create a Data Set • Create Datas and upload data packages • Create Workunits • Set the Job running • Post-Processing • Retrieve results • Merge results
Example Application: HMC Hybrid Monte Carlo method of global optimisation to solve molecular crystal structures from powder diffraction data • Parametric problem • e.g. vary parameters such as acceptance ratio, to scan a 3D grid • each run completely independent of any other • Send one run to each machine on the grid
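A parametric scan of this kind amounts to enumerating every point of a 3D grid and treating each point as one independent run. The sketch below assumes three generic parameter axes; the struct, field names and function name are hypothetical and are not taken from the HMC program.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One independent run: a single point in the 3D parameter grid.
struct RunParameters {
    double first;
    double second;
    double third;
};

// Enumerate every combination of the three parameter axes.
// Each element of the result could be sent to a different machine.
std::vector<RunParameters> buildParameterGrid(const std::vector<double>& firstValues,
                                              const std::vector<double>& secondValues,
                                              const std::vector<double>& thirdValues) {
    const std::size_t expected =
        firstValues.size() * secondValues.size() * thirdValues.size();
    std::vector<RunParameters> runs;
    runs.reserve(expected);
    for (double a : firstValues)
        for (double b : secondValues)
            for (double c : thirdValues)
                runs.push_back(RunParameters{a, b, c});
    assert(runs.size() == expected);  // one run per grid point
    return runs;
}
```

Because the runs do not depend on each other, the order in which they are produced or executed does not matter, which is what makes this problem a good fit for a distributed grid.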
Running HMC on the Grid • Unchanged exe • User edits or creates an appropriate settings file • User runs “my” HMC submit program • Splits bat file into one line per machine • Uploads chunks to the Grid server • Grid server distributes Workunits to clients • User monitors the job with their web browser • Clients return results to the Grid server • User runs HMC retrieve program • Downloads results
More on HMC Submit… • Split the batch file into lines • Create a dataset (to hold our data) • Package data (command line and zmatrix files etc) • Associate data with dataset • Upload data packages to Grid server • Create Workunits from the dataset • Create a Job to hold the Workunits
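The first step above, splitting the batch file into one command line per Workunit, might look roughly like this. It is a sketch under assumptions: the function name is hypothetical, blank lines are assumed to carry no work, and the packaging and upload steps that follow are not shown.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split the text of a batch file into individual command lines,
// dropping blank lines and tolerating Windows (CRLF) line endings.
std::vector<std::string> splitBatchFile(const std::string& batchText) {
    std::vector<std::string> commands;
    std::istringstream in(batchText);
    std::string line;
    while (std::getline(in, line)) {
        // Remove the trailing '\r' left behind by CRLF line endings.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        // Skip lines that are empty or contain only spaces and tabs.
        if (line.find_first_not_of(" \t") == std::string::npos) continue;
        assert(!line.empty());
        commands.push_back(line);
    }
    return commands;
}
```

Each returned string would then be packaged with its input files and become the data for one Workunit.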
Yet more… • Program written in C++ • Uses C++ classes to ‘hide’ SOAP calls

    dsHMC.data_set_gid = mgsi->createDataSet(dsHMC);

    ud::uuid MgsiClient::createDataSet(const DataSet &data_set) throw(MgsiException)
    {
        SOAPMethod request("createDataSet", "urn://ud.com/mgsi");
        request.AddParameter("authkey") << authkey;
        request.AddParameter("data_set") << data_set;
        const SOAPResponse &response = call(request,
            const_cast<SOAPParameter *>(&request.GetParameter((size_t)0)));
        ud::uuid retval;
        response.GetReturnValue() >> retval;
        return retval;
    }

• Auto generated by ‘Axis C++’ from WSDL file • Also a C++ HTTPS file transfer program
Performance • Linear: 50 devices ≈ 50 times faster • Affected by size of Workunit • Overhead for distribution is ≈ 1 minute • Risk of device being switched off
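To see why Workunit size matters, a very simple model can help. The model below is an assumption made for illustration, not something measured in this work: the serial job divides evenly across the devices, every device runs exactly one Workunit at the same time, and each Workunit pays a fixed distribution overhead.

```cpp
#include <cassert>

// Estimated speed-up under the simple model described above.
//   serialMinutes   : run time of the whole job on one machine
//   devices         : number of machines, one Workunit each
//   overheadMinutes : fixed distribution cost per Workunit (about 1 minute here)
double estimatedSpeedup(double serialMinutes, int devices, double overheadMinutes) {
    assert(serialMinutes > 0.0 && devices > 0 && overheadMinutes >= 0.0);
    const double gridMinutes = serialMinutes / devices + overheadMinutes;
    return serialMinutes / gridMinutes;
}
```

Under this model a 600-minute job on 50 devices with a 1-minute overhead gives a speed-up of about 46, close to linear, while a 50-minute job on the same 50 devices gives only 25, because each Workunit then spends as long being distributed as it does computing.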
Example 2: MD Manager • Molecular Dynamics simulation(s) • Program written in C# • Generated from WSDL (and modified) C# classes to hide SOAP • Wrote generic C# HTTP file transfer classes • ‘Interactive’ program • Typical runtime ~10 hours per single simulation • Need to investigate ‘grids’ of simulations
[Figure: grids of simulations labelled A to I, with axes Temperature and Pressure] • But in 3 dimensions • and with ‘ordering restrictions’ • plus a post processing stage
Who Else Does This? • Johnson & Johnson • Novartis • GSK • National Physical Laboratory • Accelrys • IBM • World Community Grid • http://www.worldcommunitygrid.org/ • Currently the Human Proteome Folding project
Problems Encountered & Support • Technical Problems • Mercifully few! • Main issue has been RAM thresholding (now resolved) • Encryption of certain files causes a problem • Support • So far has been very good • Responses to queries always next day (time difference) and always insightful • Ease of setup / maintenance • Installed and fully running in ~3 hours • Next to no maintenance required, other than backup
‘Social’ Issues • Easiest thing to blame • Too abstract for some users (no big box) • Stealing my cycles • Expansion leads to political problems
Future Developments - Expansion [slide legend: Completed, Funded, Seeking funding] • Expansion • Proposal accepted for an additional 400 licenses • Giving us a total of 480 • Change in licensing model $50k $45k • Bottom Line: Costs • Setup, server licenses, 80 client licenses + support – $18k – CMSD $50k • Total ≈ $250k $83k
Summary • Grid is here and running smoothly • Easy to use • Excellent performance • Vast amount of compute power available • Future looks good