
Converting ASGARD into a MC-Farm for Particle Physics



Presentation Transcript


  1. Converting ASGARD into a MC-Farm for Particle Physics, Beowulf-Day 17.01.05, A. Biland, IPP/ETHZ

  2. Beowulf Concept Three Main Components: Adrian Biland, IPP/ETHZ

  3. Beowulf Concept Three Main Components: CPU Nodes Adrian Biland, IPP/ETHZ

  4. Beowulf Concept Three Main Components: CPU Nodes Network Adrian Biland, IPP/ETHZ

  5. Beowulf Concept Three Main Components: CPU Nodes Network Fileserver Adrian Biland, IPP/ETHZ

  6. Beowulf Concept Three Main Components: CPU Nodes ($$$$$$$$$ ?) Network ($$$$ ?) Fileserver ($$$ ?) How much of the (limited) money should be spent on what? Adrian Biland, IPP/ETHZ

  7. Beowulf Concept Intended (main) usage: “Eierlegende Woll-Milch-Sau” (German idiom for one machine that is supposed to do everything; one size fits all). Put a roughly equal amount of money into each component ==> OK for (almost) any possible use, but a waste of money for most applications. Adrian Biland, IPP/ETHZ

  8. Beowulf Concept Intended (main) usage: CPU-bound jobs with limited I/O and inter-CPU communication ==> ~80% CPU nodes / ~10% network / ~10% fileserver [ASGARD, HREIDAR-I]. Adrian Biland, IPP/ETHZ

  9. Beowulf Concept Intended (main) usage: Jobs with high inter-CPU communication needs (parallel processing) ==> ~50% CPU nodes / ~40% network / ~10% fileserver [HREIDAR-II]. Adrian Biland, IPP/ETHZ

  10. Beowulf Concept Intended (main) usage: Jobs with high I/O needs or large datasets (data analysis) ==> ~50% CPU nodes / ~10% network / ~40% fileserver. Adrian Biland, IPP/ETHZ
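
Reading slides 8-10 as budget splits across (CPU nodes, network, fileserver), a minimal sketch of what those fractions mean in absolute terms; the 500 kFr total and the Python form are purely illustrative and not from the talk:

```python
# Budget fractions per workload profile, read from slides 8-10 as
# (CPU nodes, network, fileserver). The total budget is hypothetical.
PROFILES = {
    "CPU-bound (ASGARD, HREIDAR-I)":    (0.80, 0.10, 0.10),
    "parallel processing (HREIDAR-II)": (0.50, 0.40, 0.10),
    "data analysis / high I/O":         (0.50, 0.10, 0.40),
}
budget_kfr = 500  # hypothetical total hardware budget in kFr

for name, (cpu, net, fs) in PROFILES.items():
    print(f"{name:35s} CPU {cpu*budget_kfr:4.0f}  "
          f"network {net*budget_kfr:4.0f}  fileserver {fs*budget_kfr:4.0f} kFr")
```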

  11. Fileserver Problems: a) Speed (parallel access). Inexpensive fileservers reach a disk I/O of ~50 MB/s. With 500 single-CPU jobs ==> 50 MB/s / 500 jobs = 100 kB/s per job (an upper limit; the values typically reached are much smaller). Using several fileservers in parallel: -- difficult data management (where is which file?) [use parallel filesystems?] -- hot spots (all jobs want to access the same dataset) [data replication ==> $$$] Adrian Biland, IPP/ETHZ
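
One common way to handle the "where is which file?" problem when spreading data over several fileservers is a deterministic hash mapping. This is only an illustrative sketch with hypothetical host names, not something proposed in the talk, and it does not by itself remove the hot-spot problem:

```python
# Sketch: deterministic placement of dataset files across several fileservers,
# so every node can compute where a file lives without a central catalogue.
# The fileserver host names are hypothetical.
import hashlib

FILESERVERS = ["fs01", "fs02", "fs03", "fs04"]

def server_for(filename: str) -> str:
    """Map a file name to one fileserver with a stable hash."""
    digest = hashlib.md5(filename.encode()).hexdigest()
    return FILESERVERS[int(digest, 16) % len(FILESERVERS)]

# every node gets the same answer for the same file name
print(server_for("run0042_photons.mc"))
```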

  12. Fileserver Problems: a) Speed (parallel access). How (not) to read/write the data: Bad: NFS (constant transfer of small chunks of data) ==> constant disk repositioning ==> disk I/O --> 0 (somewhat improved with a large cache (>>100 MB) in memory; but if the write cache is full, it takes a long time to flush to disk ==> the server blocks). ~OK: rcp (transfer of large blocks from/to the local /scratch); but /scratch is rather small on ASGARD, and what if many jobs want to transfer at the same time? Best: the fileserver initiates rcp transfers on request; this needs user discipline and is not very transparent, … Adrian Biland, IPP/ETHZ
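
A minimal sketch of the "~OK" pattern above: stage the input to the node-local /scratch with one bulk copy, run the job against the local copy, and copy the result back in one go, so the fileserver only ever sees large sequential transfers. Host names, paths, the reconstruct executable and the use of scp are hypothetical placeholders:

```python
# Stage-in / stage-out via node-local /scratch instead of streaming over NFS.
import os
import subprocess

FILESERVER = "fileserver01"                    # hypothetical host
REMOTE_IN  = "/work/mc/run0042_input.dat"      # hypothetical dataset path
SCRATCH    = "/scratch/run0042"

os.makedirs(SCRATCH, exist_ok=True)

# one large sequential transfer instead of many small NFS reads
subprocess.run(["scp", f"{FILESERVER}:{REMOTE_IN}", SCRATCH + "/"], check=True)

# run the job entirely against the local copy (hypothetical executable)
subprocess.run(["./reconstruct", f"{SCRATCH}/run0042_input.dat",
                "--out", f"{SCRATCH}/run0042.reco"], check=True)

# one large transfer back: again only bulk I/O on the fileserver side
subprocess.run(["scp", f"{SCRATCH}/run0042.reco",
                f"{FILESERVER}:/work/reco/"], check=True)
```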

  13. Fileserver Problems: b) Capacity. 500 jobs producing data, each writing 100 kB/s ==> 50 MB/s into the fileserver ==> 4.2 TB / day! Adrian Biland, IPP/ETHZ
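
A quick worked check of the numbers above (decimal units; the slide rounds to ~4.2 TB/day):

```python
# Aggregate data rate and daily volume from the slide's figures.
jobs         = 500
rate_per_job = 100e3                 # bytes/s per job, the upper limit from slide 11
total_rate   = jobs * rate_per_job   # aggregate rate into the fileserver
per_day      = total_rate * 86400    # seconds per day

print(f"{total_rate/1e6:.0f} MB/s  ->  {per_day/1e12:.2f} TB/day")
# prints: 50 MB/s  ->  4.32 TB/day
```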

  14. Particle Physics MC: Need a huge number of statistically independent events, #events >> #CPUs ==> an ‘embarrassingly parallel’ problem ==> 5x500 MIPS is as good as 1x2500 MIPS. Usually two sets of programs: a) Simulation: produces huge, very detailed MC-files (adapted standard packages [GEANT, CORSIKA, …]); b) Reconstruction: reads the MC-files and writes much smaller reco-files (selected events, physics data; special SW developed by each experiment). Mass production: only the reco-files are needed ==> combine both tasks in one job and use /scratch. Adrian Biland, IPP/ETHZ
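
A sketch of the mass-production pattern described above, assuming hypothetical simulate/reconstruct executables, paths and host names: the huge detailed MC file stays on the node-local /scratch and only the small reco-file is shipped to a fileserver:

```python
# Combine simulation and reconstruction in one job; only the reco-file leaves the node.
import os
import subprocess

SCRATCH   = f"/scratch/mcjob_{os.getpid()}"
os.makedirs(SCRATCH, exist_ok=True)
mc_file   = f"{SCRATCH}/events.mc"     # huge, very detailed, never leaves the node
reco_file = f"{SCRATCH}/events.reco"   # small: selected events, physics data

# step a) simulation (hypothetical wrapper around a GEANT/CORSIKA-style package)
subprocess.run(["./simulate", "--events", "10000", "--out", mc_file], check=True)

# step b) reconstruction reads the local MC file and writes the small reco-file
subprocess.run(["./reconstruct", mc_file, "--out", reco_file], check=True)

os.remove(mc_file)   # discard the detailed MC file on the node
subprocess.run(["scp", reco_file, "fileserver01:/work/reco/"], check=True)
```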

  15. ASGARD Status: 10 frames, 24 nodes per frame; shared filesystems: /home, /work, /arch; local disk per node: 1 GB /, 1 GB swap, 4 GB /scratch. Adrian Biland, IPP/ETHZ

  16. ASGARD Status: 10 frames, 24 nodes per frame; shared filesystems: /home, /work, /arch; local disk per node: 1 GB /, 1 GB swap, 4 GB /scratch. Needed: fileserver with ++bandwidth and ++capacity; /scratch with guaranteed space per job. Adrian Biland, IPP/ETHZ

  17. ASGARD Upgrade: 4x 400 GB SATA, RAID-10 (800 GB usable) per frame; 10 frames, 24 nodes per frame; shared filesystems: /home, /work, /arch; local disk per node: 0.2 GB /, 0.3 GB swap, 2.5 GB /scr1, 2.5 GB /scr2. Adrian Biland, IPP/ETHZ

  18. ASGARD Upgrade: 4x 400 GB SATA, RAID-10 (800 GB usable) per frame; 10 frames, 24 nodes per frame; shared filesystems: /home, /work, /arch; local disk per node: 0.2 GB /, 0.3 GB swap, 2.5 GB /scr1, 2.5 GB /scr2. Adding 10 fileservers (~65 kFr), ASGARD can serve for ~2 years as a MC farm and GRID testbed … Adrian Biland, IPP/ETHZ
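
A quick check of how the upgrade numbers fit together with the capacity problem from slide 13 (decimal units; illustrative only):

```python
# Total usable capacity of the new per-frame fileservers, and how fast it
# would fill if the detailed MC files were kept instead of only reco-files.
frames           = 10
usable_per_frame = 800e9     # 4 x 400 GB SATA in RAID-10 -> 800 GB usable per frame
total_usable     = frames * usable_per_frame
print(f"total new fileserver capacity: {total_usable/1e12:.0f} TB")     # 8 TB

raw_mc_per_day = 4.32e12     # detailed MC output rate from slide 13
print(f"days until full with raw MC: {total_usable/raw_mc_per_day:.1f}")
# ~1.9 days -> only the small reco-files should go to the fileservers (slide 14)
```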
