Distributed Monte Carlo Instrument Simulations at ISIS Tom Griffin, ISIS Facility & University of Manchester
Introduction • What is Distributed Computing • The software we use • VITESS Specifics • McStas Specifics • Conclusions
What do I mean by ‘Distributed Grid’? • A way of speeding up large, compute intensive tasks • Break large jobs into smaller chunks • Send these chunks out to (distributed) machines • Distributed machines do the work • Collate and merge the results
Spare Cycles Concept • Typical PC usage is about 10% • Most PCs not used at all after 5pm • Even with ‘heavily used’ (Outlook, Word, IE) PCs, the CPU is still grossly underutilised • Everyone wants a fast PC! • Can we use (“steal?”) their unused CPU cycles? • SETI@home, World Community Grid (www.worldcommunitygrid.org)
Possible Software Implementations • Toolkit, e.g. COSM • Low-level toolkit – source-code-level integration • So time-consuming work for each application • Entropia DC Grid • Trial run at ISIS two years ago. Some success • Company bought out and now in limbo (?) • United Devices Grid MP • What we’re currently using • Quite expensive • Condor • Free (academic research project) • In our experience two years ago, not reliable on Windows
The United Devices System • Server hardware • We use two dual-Xeon servers + 280 client licenses • Could (will) easily cope with more clients • Software • Servers run RedHat Linux Advanced Server / DB2 • Clients available for Windows, Linux, SPARCs and Macs • Programming • MGSI – Web Services interface – XML, SOAP • Accessed with C++ and Java classes etc. • Management Console • Web browser based • Can manage services, jobs, devices etc.
Suitable / Unsuitable Applications • CPU Intensive • Low to moderate memory use • Not too much file output • Coarse grained • Command line / batch driven • Licensing issues?
Objects within the Grid • Program • McStas • Job • wish_simulation • Jobstep • Workunit • sent to a Device • Data Set • Data
How to write Grid Programs • Fairly easy to write • Interface to grid via Web Services • So far used: C++, Java, Perl, Fortran, C# • Think about how to split your data and merge results • Wrap and upload your executable • Write the application service • Pre and Post processing • Use the Grid
Wrapping Your Executable • Executable + any DLLs etc. • Standard data files • Compression • Encryption • Capture screen output • Set environment variables • Command line
Application Service • Pre-processing • Partition data • Package data partitions • Log in to the Grid server • Create a Job and Job Step • Create a Data Set • Create Datas and upload data packages • Create Workunits • Set the Job running • Post-Processing • Retrieve results • Merge results
Monte Carlo Speed-up Ideas • Two scenarios: • Single large simulation run • Split the neutrons into smaller numbers and execute separately • Merge results in some way • Many smaller runs • Parameter scan
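A minimal sketch of the first scenario, assuming the total neutron count is simply divided as evenly as possible among the chunks. The function name `split_count` is hypothetical and is not part of VITESS, McStas or Grid MP.

```python
from typing import List


def split_count(total: int, chunks: int) -> List[int]:
    """Divide `total` neutrons into `chunks` near-equal integer parts.

    Hypothetical helper for illustration only.
    """
    if chunks <= 0:
        raise ValueError("chunks must be positive")
    if total < 0:
        raise ValueError("total must be non-negative")
    base, extra = divmod(total, chunks)
    # The first `extra` chunks take one additional neutron so that
    # the parts always sum back to `total`.
    return [base + 1 if i < extra else base for i in range(chunks)]


if __name__ == "__main__":
    print(split_count(30_000_000, 20)[:3])
```

Handing out the remainder one neutron at a time keeps the chunk sizes within one of each other, so no single workunit runs noticeably longer than the rest.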
VITESS – Splitting It • Easy mode of operation: fixed executables + data files • Executables held on server • Split command line into bits – divide Ncount • Vary the random seed • Create data packages • Upload data packages
VITESS – Running It • Use GUI to create instrument – Save As Command • “Parameter directory” set to “.” • Submit program parses bat file • Substitutes ‘V’ and ‘P’ • Removes ‘header’ and ‘footer’ • Creates many new bat files with different ‘--Z’s and
VITESS – Running It • Submit program creates many bat files
C:\My_GRID\VITESSE\VITESSE\build>Vitess-Submit.exe example_job example.bat req_files 20
logging in to https://bruce.nd.rl.ac.uk:18443/mgsi/rpc_soap.fcgi as tom....
Adding Vitesse dataset....
Adding Vitesse datas....
3e+007 neutrons split into 20 chunks, of -n1500000 neutrons
Total number of Vitesse 'runs' = 20
Uploading data for run #1...
Uploading data for run #2...
.
.
Uploading data for run #19...
Uploading data for run #20...
Adding Vitesse datas to system....
Adding job....
Adding jobstep....
Turning on automatic workunit generation....
Closing jobstep....
All done
Your job_id is 4878
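The bat-file generation step could be sketched roughly as below. This is not the actual Vitess-Submit code: the template placeholders `{ncount}` and `{seed}`, the function name and the example command are all hypothetical. Only the `-n` and `--Z` flags are taken from the slides, and using `--Z` as the per-chunk random seed is an assumption based on the "Vary the random seed" bullet.

```python
from typing import List


def make_chunk_commands(template: str, total: int, chunks: int,
                        first_seed: int = 1) -> List[str]:
    """Return one command line per chunk, each with its own seed.

    `template` is a hypothetical format string containing the
    placeholders {ncount} and {seed}.
    """
    if chunks <= 0:
        raise ValueError("chunks must be positive")
    if total % chunks != 0:
        raise ValueError("total must divide evenly into chunks")
    per_chunk = total // chunks
    # Every chunk simulates the same number of neutrons but uses a
    # different seed, so the chunks are statistically independent.
    return [template.format(ncount=per_chunk, seed=first_seed + i)
            for i in range(chunks)]


if __name__ == "__main__":
    # "module.exe" is a placeholder, not a real VITESS executable name.
    for cmd in make_chunk_commands("module.exe -n{ncount} --Z{seed}",
                                   30_000_000, 20)[:2]:
        print(cmd)
```

With 3e7 neutrons and 20 chunks this gives 1,500,000 neutrons per command, matching the split reported in the submit log.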
VITESS – Monitoring It • Web Interface
VITESS – Merging It • Download the ‘chunks’ • Merge data files • DetectedNeutrons.dat: concatenate • vpipes: trajectories & count rate • Two classes of files • 1D – Values: sum and divide by number of chunks • 1D – Errors: square, sum and divide • 2D – Sum and divide by number of chunks
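One possible reading of the 1D merge rules, written out for a single bin. The slide only says "square, sum and divide" for the errors. Taking the square root of the summed squares before dividing is an assumption here, chosen because it gives the usual error on the mean of $N$ independent chunks. The symbols $v_i$ and $e_i$ (value and error of the bin in chunk $i$) are introduced for illustration.

```latex
\bar{v} = \frac{1}{N} \sum_{i=1}^{N} v_i ,
\qquad
\bar{e} = \frac{1}{N} \sqrt{\sum_{i=1}^{N} e_i^{2}}
```

For example, two chunks with errors 3 and 4 in the same bin would give a merged error of $\sqrt{9 + 16}/2 = 2.5$ under this reading.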
VITESS – Advantages and Problems • Many times faster: linear increase • Needs verification runs (x3) • Typically 11, potentially 30+ times faster • 12-hour runs in 1 hour! • Very large simulations reach random limits
VITESS – Some Results • [Figure: run times shown: 176 hours; 59 hours; 6hrs 20mins]
McStas – Splitting It • Different executable for every run • Executable must be uploaded at run time • Split –n into chunks • or run many instances (parameter scan) • Create data (+ executable) packages • Upload packages
McStas – Running It • Use McGui to create and compile executable • Create input file for Submit program
McStas – Running It • Large run • Submit program breaks up –n##### • Uploads new command line + data + executable • Parameter Scan • Send each run to a separate machine
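A parameter scan, where each run goes to a separate machine, could be expanded into command lines along these lines. This is a sketch under assumptions: the executable name and parameter names are hypothetical, and the `-n COUNT name=value` layout follows the usual McStas command-line convention as I understand it, which should be checked against the McStas version in use.

```python
from itertools import product
from typing import Dict, List, Sequence


def scan_commands(executable: str, ncount: int,
                  scan: Dict[str, Sequence]) -> List[str]:
    """Build one command line per point of a parameter scan.

    `scan` maps a parameter name to the list of values to try;
    every combination of values becomes one independent run.
    """
    names = list(scan)
    commands = []
    for values in product(*(scan[name] for name in names)):
        params = " ".join(f"{n}={v}" for n, v in zip(names, values))
        commands.append(f"{executable} -n {ncount} {params}")
    return commands


if __name__ == "__main__":
    # Hypothetical instrument executable and parameter names.
    for cmd in scan_commands("instrument.exe", 1000000,
                             {"lambda": [1.0, 2.0], "slit": [5]}):
        print(cmd)
```

Since each scan point is a complete, independent simulation, no merge step is needed for this scenario, only collection of the per-run outputs.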
McStas – Merging It • Many output files • Separate merge program • PGPLOT and Matlab implemented • Very similar • PGPLOT • 1D – intensities: sum and divide. Errors: square, sum and divide. Events: sum • 2D – intensities: sum and divide. Errors: square, sum and divide. Events: sum • Matlab • 1D – Same maths, different format • 2D – Virtually the same • ‘Metadata’: leave untouched
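The 1D merge rules above might look like this in code. It is an illustrative sketch, not the actual merge program: the row layout `(x, intensity, error, events)` and the function name are assumptions, file parsing is left out, and the square root in the error term is an interpretation of "square, sum and divide".

```python
import math
from typing import List, Tuple

Row = Tuple[float, float, float, int]  # (x, intensity, error, events)


def merge_1d(chunks: List[List[Row]]) -> List[Row]:
    """Merge the same 1D monitor output from several chunk runs."""
    n = len(chunks)
    if n == 0:
        raise ValueError("no chunks to merge")
    if any(len(c) != len(chunks[0]) for c in chunks):
        raise ValueError("chunks have different numbers of bins")
    merged = []
    for rows in zip(*chunks):  # the same bin taken from every chunk
        x = rows[0][0]
        if any(r[0] != x for r in rows):
            raise ValueError("bin positions do not match")
        intensity = sum(r[1] for r in rows) / n               # sum and divide
        error = math.sqrt(sum(r[2] ** 2 for r in rows)) / n   # square, sum, divide
        events = sum(r[3] for r in rows)                      # events: sum
        merged.append((x, intensity, error, events))
    return merged


if __name__ == "__main__":
    a = [(0.0, 2.0, 3.0, 10), (1.0, 4.0, 0.0, 5)]
    b = [(0.0, 4.0, 4.0, 20), (1.0, 6.0, 0.0, 7)]
    print(merge_1d([a, b]))
```

The 2D case would apply the same three rules element by element across the arrays.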
McStas – Advantages and Problems • Security: Do we trust users? • 100 times faster[?] • Linux version much faster than Windows [?] • How do we merge certain fields? • values = '1.44156e+006 10459.9 30748'; • statistics = 'X0=3.5418; dX=1.52975; Y0=0.000822474; dY=1.0288;'; • Some issue related to randomness of moderator file
Future Developments - Expansion • Status key: Completed, Funded, Seeking funding • Expansion • Proposal accepted for an additional 400 licenses • Giving us a total of 480 • Change in licensing model $50k $45k • Bottom Line: Costs • Setup, server licenses, 80 client licenses + support – $18k – CMSD $50k • Total ≈ $250k $83k
Conclusions • Both VITESS and McStas run well under Grid MP • Submit and Retrieve programs: a few hours’ work • Merge program: a bit more • Needs to merge more output formats [?] • Issues with very large simulations • More info on Grid MP at www.ud.com