Converting ASGARD into a MC-Farm for Particle Physics
Beowulf-Day, 17.01.05
A. Biland, IPP/ETHZ
Beowulf Concept
Three Main Components:
- CPU Nodes ($$$$$$$$$ ?)
- Network ($$$$ ?)
- Fileserver ($$$ ?)
How much of the (limited) money to spend on what?
Beowulf Concept
Intended (main) usage: "Eierlegende Woll-Milch-Sau" (an egg-laying wool-milk-sow, i.e. one size fits all)
Put a roughly equal amount of money into each component
==> OK for (almost) any possible use, but a waste of money for most applications
Beowulf Concept
Intended (main) usage: CPU-bound jobs with limited I/O and inter-CPU communication
CPU Nodes ~80%, Network ~10%, Fileserver ~10%
[ ASGARD, HREIDAR-I ]
Beowulf Concept
Intended (main) usage: Jobs with high inter-CPU communication needs (parallel processing)
CPU Nodes ~50%, Network ~40%, Fileserver ~10%
[ HREIDAR-II ]
Beowulf Concept
Intended (main) usage: Jobs with high I/O needs or large datasets (data analysis)
CPU Nodes ~50%, Network ~10%, Fileserver ~40%
Fileserver Problems: a) Speed (parallel access)
Inexpensive fileservers reach a disk I/O of ~50 MB/s.
With 500 single-CPU jobs ==> 50 MB/s / 500 jobs = 100 kB/s per job
(this is an upper limit; the values typically reached are much smaller).
Using several fileservers in parallel:
-- difficult data management (which file is where?) [ use parallel filesystems? ]
-- hot spots (all jobs want to access the same dataset) [ data replication ==> $$$ ]
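As a quick sanity check of the bandwidth argument, a minimal back-of-the-envelope sketch in Python, using only the numbers quoted above:

```python
# Back-of-the-envelope estimate of the per-job bandwidth quoted above
# (numbers from the slide; this is an upper limit, since concurrent access
# also costs extra disk repositioning).

DISK_IO_MB_S = 50.0    # aggregate disk I/O of an inexpensive fileserver
N_JOBS = 500           # concurrent single-CPU jobs

per_job_kb_s = DISK_IO_MB_S * 1000.0 / N_JOBS   # equal share per job, in kB/s

print(f"{per_job_kb_s:.0f} kB/s per job")       # -> 100 kB/s per job
```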
Fileserver Problems: a) Speed (parallel access)
How (not) to read/write the data:
Bad: NFS (constant transfer of small chunks of data)
==> constant disk repositioning ==> disk I/O --> 0
(somewhat improved with a large cache (>>100 MB) in memory;
if the write cache is full, flushing to disk takes a long time ==> the server blocks)
~OK: rcp (transfer of large blocks from/to a local /scratch)
but /scratch is rather small on ASGARD, and what if many jobs want to transfer at the same time?
Best: the fileserver initiates rcp transfers on request
(needs user discipline, not very transparent, …)
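The "~OK" pattern is simple enough to show as a sketch. Everything below is a hypothetical placeholder (host name, paths, the `process` binary), and scp stands in for rcp; whichever remote-copy tool the cluster actually provides would do.

```python
#!/usr/bin/env python
"""Minimal sketch of the '~OK' access pattern above: copy the input in one
large block to the node-local /scratch, do all heavy I/O there, then push
only the result back."""

import os
import subprocess

FILESERVER = "fileserver"        # hypothetical fileserver host
INPUT = "mc/run0042.dat"         # hypothetical dataset on the fileserver
SCRATCH = "/scratch/run0042"     # node-local work area

os.makedirs(SCRATCH, exist_ok=True)

# 1) one large transfer instead of many small NFS reads
subprocess.run(["scp", f"{FILESERVER}:{INPUT}", f"{SCRATCH}/input.dat"], check=True)

# 2) all I/O-heavy work goes against the local disk, not the fileserver
subprocess.run(["./process", f"{SCRATCH}/input.dat", f"{SCRATCH}/output.dat"], check=True)

# 3) one large transfer back: only the result leaves the node
subprocess.run(["scp", f"{SCRATCH}/output.dat", f"{FILESERVER}:results/run0042.out"], check=True)
```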
Fileserver Problems: b) Capacity
500 jobs producing data, each writing 100 kB/s
==> 50 MB/s to the Fileserver ==> 4.2 TB / day !
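The same numbers as a short worked calculation (decimal units, so the result lands slightly above the ~4.2 TB/day quoted on the slide):

```python
# Daily data volume if all 500 jobs write their 100 kB/s continuously
# (numbers from the slide).

N_JOBS = 500
WRITE_KB_S = 100.0                 # per-job write rate
SECONDS_PER_DAY = 24 * 3600

total_mb_s = N_JOBS * WRITE_KB_S / 1000.0          # -> 50 MB/s to the fileserver
tb_per_day = total_mb_s * SECONDS_PER_DAY / 1.0e6  # MB/day -> TB/day

print(f"{total_mb_s:.0f} MB/s  ->  {tb_per_day:.2f} TB/day")   # ~4.32 TB/day
```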
Particle Physics MC
Need a huge number of statistically independent events: #events >> #CPUs
==> an 'embarrassingly parallel' problem
==> 5x 500 MIPS is as good as 1x 2500 MIPS
Usually two sets of programs:
a) Simulation: produces huge, very detailed MC files
(adapted standard packages [GEANT, CORSIKA, …])
b) Reconstruction: reads the MC files, writes much smaller reco files
(selected events, physics data; special SW developed by each experiment)
Mass production: only the reco files are needed
==> combine both tasks in one job, use /scratch
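To make the mass-production scheme concrete, a sketch of such a combined job; the binary names, options and paths are hypothetical placeholders (the real packages and options differ per experiment):

```python
#!/usr/bin/env python
"""Sketch of the combined mass-production job described above: simulation and
reconstruction run back-to-back, the huge intermediate MC file lives only on
the node-local /scratch, and just the small reco file goes to the fileserver."""

import os
import subprocess

SCRATCH = "/scratch/mc_job_0042"          # node-local work area (hypothetical)
DEST = "fileserver:reco/"                 # reco-file destination (hypothetical)

os.makedirs(SCRATCH, exist_ok=True)
mc_file = os.path.join(SCRATCH, "events.mc")      # huge, very detailed MC output
reco_file = os.path.join(SCRATCH, "events.reco")  # small, physics-level output

# a) Simulation: write the detailed MC file to local /scratch only
subprocess.run(["./simulate", "--output", mc_file], check=True)

# b) Reconstruction: read the MC file, write the much smaller reco file
subprocess.run(["./reconstruct", "--input", mc_file, "--output", reco_file], check=True)

# Only the reco file is shipped off the node; the intermediate MC file is dropped
subprocess.run(["scp", reco_file, DEST], check=True)
os.remove(mc_file)
```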
ASGARD Status:
10 frames, 24 nodes/frame
Fileservers: /home, /work, /arch
Local disk per node: 1 GB /, 1 GB swap, 4 GB /scratch
Needed:
Fileserver: ++bandwidth, ++capacity
/scratch: guaranteed space per job
ASGARD Upgrade:
4x 400 GB SATA, RAID-10 (800 GB usable) per frame
10 frames, 24 nodes/frame
Fileservers: /home, /work, /arch
Local disk per node: 0.2 GB /, 0.3 GB swap, 2.5 GB /scr1, 2.5 GB /scr2
Adding 10 fileservers (~65 kFr), ASGARD can serve for ~2 years as an MC farm and GRID testbed …
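A quick check of the upgrade arithmetic, assuming (as the per-frame numbers suggest) one new fileserver per frame and the usual RAID-10 rule that mirroring halves the raw capacity:

```python
# Quick check of the upgrade numbers (disk counts and sizes from the slide):
# RAID-10 mirrors the striped set, so only half the raw capacity is usable.

DISKS_PER_FRAME = 4
DISK_GB = 400
N_FRAMES = 10                 # assumed: one new fileserver per frame

usable_per_frame_gb = DISKS_PER_FRAME * DISK_GB // 2     # -> 800 GB per frame
total_tb = usable_per_frame_gb * N_FRAMES / 1000.0       # across the whole farm

print(f"{usable_per_frame_gb} GB per frame, ~{total_tb:.0f} TB in total")  # 800 GB, ~8 TB
```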