ATLAS@home
Wenjing Wu, Andrej Filipčič, David Cameron, Eric Lancon, Claire Adam Bourdarios & others
ATLAS: Elementary Particle Physics
• One of the biggest experiments at CERN
• Trying to understand the origin of mass, which completes the Standard Model
• In 2012, ATLAS and CMS discovered the Higgs boson
Data processing flow in ATLAS (diagram)
Why ATLAS@home
• It's free! Well, almost.
• Public outreach: volunteers want to know more about the project they participate in
• Good for ATLAS visibility
• Can add significant computing power to WLCG
A brief history
• Started at the end of 2013 on a test instance at IHEP, Beijing
• Migrated to CERN and officially launched in June 2014
• Jobs have been running continuously since then
ATLAS@home
• Goal: to run ATLAS simulation jobs on volunteer computers
• Challenges:
  • The ATLAS software base is big (~10 GB) and very platform dependent; it runs on Scientific Linux
  • Volunteer computing resources should be integrated into the current Grid computing infrastructure: all the volunteer computers should appear as a WLCG site, with jobs submitted from PanDA (the ATLAS Grid computing portal)
  • Grid computing relies heavily on personal credentials, but these credentials should not be put on volunteer computers
Solutions
• Use VirtualBox + vmwrapper to virtualize volunteer hosts
• Use the CVMFS network file system to distribute the ATLAS software; since CVMFS supports on-demand file caching, it helps to reduce the image size
• To avoid placing credentials on the volunteer hosts, an ARC CE is introduced into the architecture together with BOINC
• The ARC CE is Grid middleware: it interacts with the ATLAS central Grid services and manages different LRMS (Local Resource Management Systems), such as Condor or PBS, through specific LRMS plugins
• A BOINC plugin is developed to forward Grid jobs to the BOINC server and convert the job results back into Grid format (see the sketch below)
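Conceptually, every LRMS backend exposes the same small set of operations to the ARC CE, and the BOINC plugin simply implements them against the BOINC server. A minimal sketch of that contract, in illustrative Python (the class and method names are assumptions; the real ARC backends are implemented as per-LRMS scripts, not this class):

```python
# Conceptual sketch of the contract an ARC CE LRMS backend fulfils.
# Names are illustrative only; the slides describe the BOINC plugin in
# terms of exactly these operations (submit, scan, cancel, info provider).

class LRMSBackend:
    def submit(self, grid_job):
        """Convert an accepted Grid job into a local job (here: a BOINC workunit)."""
        raise NotImplementedError

    def scan(self):
        """Poll finished local jobs and report exit codes/outputs back to the CE."""
        raise NotImplementedError

    def cancel(self, local_job_id):
        """Kill the corresponding local job when the Grid job is cancelled."""
        raise NotImplementedError

    def info(self):
        """Report total CPUs, CPU usage and job states for the volunteer 'site'."""
        raise NotImplementedError
```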
Architecture (diagram: ATLAS Workload Management System → ARC CE → BOINC server → volunteer hosts)
BOINC ARC CE plugin (1)
• Converts an ARC CE job into a BOINC job
• The plugin includes:
  • Submit/scan/cancel job
  • Information provider (total CPUs, CPU usage, job status)
• Submit:
  • ARC CE job: all input files are packed into one tar.gz file
  • Copy the input file from the ARC CE session directory into the BOINC internal directory
  • Set up the BOINC environment and call the BOINC command to generate a job based on the job templates and input files (see the sketch below)
  • Write the job id back to the ARC CE job control directory
  • Upon job completion, the BOINC services put the desired output files back into the ARC CE session directory
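To make the submit flow concrete, here is a minimal, hedged Python sketch. It is not the actual plugin code: the project directory, app name and template paths are assumptions, and a real BOINC server stages input files into a hashed download hierarchy (e.g. via stage_file) rather than the flat copy shown here. Only the create_work tool and its basic options come from standard BOINC.

```python
#!/usr/bin/env python3
"""Illustrative sketch of the 'submit' step: pack an ARC CE job's inputs
and hand them to the BOINC server via its create_work tool."""

import shutil
import subprocess
import tarfile
from pathlib import Path

BOINC_PROJECT_DIR = Path("/home/boincadm/projects/atlasathome")  # assumed location
APP_NAME = "ATLAS"                                               # assumed app name

def submit_to_boinc(session_dir: Path, job_id: str) -> None:
    # 1. Pack all input files from the ARC session directory into one tar.gz
    tarball = session_dir / f"{job_id}-input.tar.gz"
    with tarfile.open(tarball, "w:gz") as tar:
        for f in session_dir.iterdir():
            if f.is_file() and f != tarball:
                tar.add(f, arcname=f.name)

    # 2. Copy it into the BOINC download area (real servers use a hashed
    #    directory layout; flattened here for brevity)
    staged = BOINC_PROJECT_DIR / "download" / tarball.name
    shutil.copy(tarball, staged)

    # 3. Call BOINC's create_work with workunit/result templates
    subprocess.run(
        [str(BOINC_PROJECT_DIR / "bin" / "create_work"),
         "--appname", APP_NAME,
         "--wu_name", job_id,
         "--wu_template", "templates/ATLAS_in",      # assumed template paths
         "--result_template", "templates/ATLAS_out",
         tarball.name],
        cwd=BOINC_PROJECT_DIR, check=True)

    # 4. Record the BOINC workunit name so scan/cancel can find it later
    (session_dir / "boinc_job_id").write_text(job_id + "\n")
```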
BOINC ARC CE plugin (2)
• Scan:
  • Scan the job diag file (in the session directory), get the exit code, upload output files to the designated SE, and update the ARC CE job status
• Cancel:
  • Cancel the corresponding BOINC job
• Information provider:
  • Query the BOINC DB for the total CPU number, CPU usage and the status of each job (see the sketch below)
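As an illustration of what the information provider might do, here is a hedged Python sketch that pulls CPU and job-state counts from the BOINC MySQL database. The connection parameters, database name and the "active in the last day" cut are assumptions; only the table and column names (host.p_ncpus, host.rpc_time, result.server_state) follow the standard BOINC schema.

```python
#!/usr/bin/env python3
"""Illustrative sketch of the information provider: query the BOINC DB
for CPU counts and job states."""

import MySQLdb  # mysqlclient; mysql.connector would work the same way

SERVER_STATE = {2: "unsent", 4: "in progress", 5: "over"}

def boinc_site_info():
    db = MySQLdb.connect(host="localhost", user="boincadm",
                         passwd="***", db="atlasathome")  # assumed credentials
    cur = db.cursor()

    # Total CPUs offered by hosts that contacted the server in the last day
    cur.execute("""SELECT COALESCE(SUM(p_ncpus), 0) FROM host
                   WHERE rpc_time > UNIX_TIMESTAMP() - 86400""")
    total_cpus = int(cur.fetchone()[0])

    # Job counts per server-side state
    cur.execute("SELECT server_state, COUNT(*) FROM result GROUP BY server_state")
    jobs = {SERVER_STATE.get(state, str(state)): n for state, n in cur.fetchall()}

    db.close()
    return {"total_cpus": total_cpus, "jobs": jobs}

if __name__ == "__main__":
    print(boinc_site_info())
```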
Current status
• CPU hours gained: 103,355
• Daily resources: 3% of Grid computing
ATLAS jobs
• Full ATLAS simulation jobs
  • 10 events/job initially, now 100 events/job
• A typical ATLAS simulation job:
  • 40~80 MB input data
  • 10~30 MB output data
  • On average, 92 minutes CPU time, 114 minutes elapsed time
• CPU efficiency (92/114 ≈ 81%) is lower than on the Grid:
  • Slow home network → significant initialization time
  • CPUs not available all the time
• Jobs run in an SLC5 64-bit VM → upgraded to SLC6 (µCernVM)
  • Virtualization on Windows, Linux, Mac
  • ANY kind of job could run on ATLAS@home
How Grid people see ATLAS@home
• Volunteers want to earn credits for their contribution; they want their PCs to work optimally
  • This is true for the Grid sites as well, or at least it should be
  • But volunteers are better shifters than we are
• Different from what we are used to:
  • On the Grid: "jobs are failing, please fix the sites!"
  • On BOINC: "jobs suck, please fix your code!"
• ATLAS@home is the first BOINC project with massive I/O demands, even for less intensive jobs
  • Server infrastructure needs to be carefully planned to cope with a high load
• Credentials must not be passed to the PCs
• Jobs can stay in execution for a long time, depending on the volunteer's computer preferences, so they are not suitable for high-priority tasks
ATLAS outreach
• Outreach website: https://atlasphysathome.web.cern.ch/
• Feedback mailing list: atlas-comp-contact-home@cern.ch
Future Effort (1)
• Customize the VM image to reduce network traffic and speed up initialization
• Optimize the file transfers, server load and job efficiency on the PCs
• Test and migrate to the LHC@home infrastructure
• Test whether BOINC can replace the small Grid sites
• Investigate the use of BOINC on local batch clusters to run ATLAS jobs
• Investigate running various workflows (longer jobs, multi-core jobs) on virtual machines
Future Effort (2)
• Provide an event display & possibly a screen saver that would let people see what they are running
Acknowledgements
• David and Rom, for all the support and suggestions
• CERN IT, for providing server and storage resources for ATLAS@home and for working on integrating ATLAS@home with LHC@home