BNL / CASPUR / CERN Price/Performance estimates for some compute platforms (AMD, Intel)

Pavel Nevski, BNL / CERN, October 2004

Participants
• BNL: P.Nevski
• CASPUR: A.Maslennikov (*), M.Rosati
• CERN: E.McIntosh, I.McLaren
(*) Project Coordinator
P.Nevski - Oct. 2004 - BNL

Goals
• We compared the two most interesting platforms of the day: Intel Nocona and AMD Opteron.
• In short, we wanted to check which of the two delivers more compute power per dollar.

Hardware under test (CASPUR)
• Intel Nocona: 2 CPUs at 3.4 GHz, 1 GB RAM
  • RedHat Enterprise 3
  • Kernel: 2.4.21+
  • Estimated price (*): 3200 Euro (1 GB RAM, incl. IPMI card, rack-mounted)
• AMD Opteron: 2 CPUs at 2 GHz, 1 GB RAM
  • SuSE Enterprise Edition 9.0
  • Kernel: 2.6.5-7+
  • SuSE numactl
  • Estimated price (*): 3200 Euro (1 GB RAM, incl. IPMI function, rack-mounted)
(*) Prices as calculated by E4 Computer Engineering, Italy

Benchmark suites
• ATLSIM: a full-scale GEANT3 simulation of the ATLAS detector (P.Nevski), typical LHC Higgs events
• SixTrack: tracking of two particles in a 6-dimensional phase space, including synchrotron oscillations (F.Schmidt)
  • http://frs.home.cern.ch/frs/sixtrack.html
  • SixTrack benchmark code: E.McIntosh (http://frs.home.cern.ch/frs/Benchmark/benchmark.html)
• CERN U: the ancient “CERN Units” benchmark (E.McIntosh)

What was measured
• On both platforms, we ran one or two simultaneous jobs for each of the benchmarks.
• On Opteron, we used the SuSE “numactl” interface to make sure that each of the two processors always uses its own local bank of memory.
• Example of submission, 2 simultaneous jobs:
  • Intel: ./TestJob & ./TestJob &
  • AMD: numactl --cpubind=0 --membind=0 ./TestJob & numactl --cpubind=1 --membind=1 ./TestJob &
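The submission pattern on this slide can be sketched as a small timing driver. This is only an illustration, not the harness used for the measurements: the function names `build_commands` and `run_simultaneously` are hypothetical, the stand-in job replaces `./TestJob` so that the sketch runs anywhere, and the `numactl` options are the ones quoted on the slide (they require `numactl` to be installed, so the demo below leaves them switched off).

```python
import subprocess
import sys
import time


def build_commands(job, n_jobs, use_numactl):
    """Return one argv list per job; with use_numactl, job i is pinned to node i."""
    commands = []
    for node in range(n_jobs):
        if use_numactl:
            # Options as quoted on the slide: CPU and memory bound to the same node.
            prefix = ["numactl", f"--cpubind={node}", f"--membind={node}"]
            commands.append(prefix + list(job))
        else:
            commands.append(list(job))
    return commands


def run_simultaneously(commands):
    """Start all commands at once, wait for all of them, return wall-clock seconds."""
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for cmd in commands]
    codes = [p.wait() for p in procs]
    elapsed = time.perf_counter() - start
    if any(codes):
        raise RuntimeError(f"job failed, exit codes: {codes}")
    return elapsed


if __name__ == "__main__":
    # Stand-in job (assumption); on the test machines this would be ["./TestJob"].
    job = [sys.executable, "-c", "sum(range(10**6))"]
    one = run_simultaneously(build_commands(job, 1, use_numactl=False))
    two = run_simultaneously(build_commands(job, 2, use_numactl=False))
    print(f"1 job: {one:.2f} s, 2 jobs: {two:.2f} s, ratio: {two / one:.2f}")
```

On the Opteron host one would pass `use_numactl=True`, so that job 0 runs on node 0 with node 0 memory and job 1 on node 1 with node 1 memory. A ratio close to 1 means the second job costs nothing extra.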

Results
• Both machines behave similarly when only one job is run, but the picture changes visibly with two jobs.
• Running two simultaneous jobs may take up to 30% more time on Intel, while on AMD there is no visible performance drop.

Conclusions
• The results we obtained were predictable.
• The Intel platform is a plain SMP architecture, and we observe an apparent contention effect on the memory bus and/or the single memory controller when running two simultaneous tasks.
• The new AMD platform implements a scalable NUMA architecture that allows independent, and hence non-interfering, CPU-to-memory data exchanges over a pair of memory controllers and HyperTransport buses.
• As both machines under test cost practically the same, we conclude that going Opteron today may save us up to 30% for the same amount of Teraflops.
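The price/performance arithmetic behind the last bullet can be sketched as follows. This is one possible reading, under stated assumptions: both machines cost 3200 Euro (from the hardware slide), two simultaneous jobs on Opteron take a normalised time `T`, and Intel takes the quoted upper bound of 30% longer. The helper `cost_per_throughput` and the constant names are hypothetical.

```python
def cost_per_throughput(price_eur, n_jobs, wall_time):
    """Price divided by throughput, where throughput = jobs per unit of wall time."""
    return price_eur * wall_time / n_jobs


PRICE_EUR = 3200.0      # estimated price of either machine (from the slides)
T = 1.0                 # normalised time for two simultaneous jobs on Opteron (assumption)
INTEL_PENALTY = 0.30    # upper bound quoted in the results: up to 30% more time

amd = cost_per_throughput(PRICE_EUR, 2, T)
intel = cost_per_throughput(PRICE_EUR, 2, T * (1.0 + INTEL_PENALTY))

extra_on_intel = intel / amd - 1.0    # extra spend on Intel for equal throughput
saving_on_amd = 1.0 - amd / intel     # same gap, as a fraction of the Intel spend

print(f"Intel costs {extra_on_intel:.0%} more per unit of throughput")
print(f"Opteron saves {saving_on_amd:.1%} relative to the Intel spend")
```

Under these assumptions the same gap reads as about 30% extra spend when Opteron is the baseline, or roughly 23% saved when Intel is the baseline, so the "up to 30%" figure depends on which machine is taken as the reference.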
