Why it might be interesting to look at ARM
Ben Couturier, Vijay Kartik, Niko Neufeld, PH-LBC
SFT Technical Group Meeting, 08/10/2012

The challenge for LHCb
• Major upgrade during LS2
• Read out the detector at the bunch-crossing rate of 40 MHz
• No more hardware-based trigger: need to filter 40 million events/s (32 Tbit/s) in software
Why look at ARM? N. Neufeld
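
The two figures on this slide imply an average event size. A minimal sketch of that arithmetic, assuming the 32 Tbit/s is the total data rate at 40 MHz (the constant and function names are illustrative, not from the slides):

```python
# Figures taken from the slide above; names are illustrative.
BUNCH_CROSSING_RATE_HZ = 40e6   # 40 MHz readout rate
TOTAL_BANDWIDTH_BIT_S = 32e12   # 32 Tbit/s


def mean_event_size_bytes(bandwidth_bit_s: float, event_rate_hz: float) -> float:
    """Average event size in bytes implied by a bandwidth and an event rate."""
    return bandwidth_bit_s / event_rate_hz / 8


size = mean_event_size_bytes(TOTAL_BANDWIDTH_BIT_S, BUNCH_CROSSING_RATE_HZ)
print(f"{size / 1e3:.0f} kB per event")
```

Under that assumption each event averages about 100 kB.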

Dataflow
[Diagram labels: Detector, Readout Units, 100 m rock, DAQ network, Compute Units]
• GBT: custom radiation-hard link over MMF, 3.2 Gbit/s (about 10000 links)
• Input into DAQ network (10/40 Gigabit Ethernet or FDR IB) (1000 to 4000 links)
• Output from DAQ network into compute unit clusters (100 Gbit Ethernet / EDR IB) (200 to 400 links)
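
The link counts can be cross-checked against the 32 Tbit/s quoted earlier. This is only a rough sketch: it assumes the full rate is spread evenly over the links and ignores protocol overhead, neither of which the slides state. The names are illustrative.

```python
# Figures from the slides; the even-spread assumption is ours.
GBT_LINKS = 10_000
GBT_RATE_GBIT_S = 3.2
TOTAL_TBIT_S = 32.0

# Aggregate capacity of all GBT links, in Tbit/s.
gbt_total_tbit_s = GBT_LINKS * GBT_RATE_GBIT_S / 1000


def mean_load_per_link_gbit_s(total_tbit_s: float, n_links: int) -> float:
    """Average load per link in Gbit/s if traffic is spread evenly."""
    return total_tbit_s * 1000 / n_links


print(f"GBT aggregate: {gbt_total_tbit_s:.0f} Tbit/s")
for n_links in (1000, 4000):
    load = mean_load_per_link_gbit_s(TOTAL_TBIT_S, n_links)
    print(f"{n_links} DAQ input links: {load:.0f} Gbit/s per link")
```

With these assumptions, 1000 input links would carry about 32 Gbit/s each and 4000 links about 8 Gbit/s each, which is in line with the 40 and 10 Gigabit Ethernet options listed.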

What will be the Compute Unit?
• A compute unit is a destination for the event-data fragments from the readout units
• It assembles the fragments into a complete “event” and runs various selection algorithms on this event
• About 0.1 % of events are retained
• Baseline option: a high-density server platform (mainboard with standard CPUs)
  • Using Moore’s law and some estimates on the algorithms, we need 4000 to 5000 servers of the 2018 type!
  • The baseline could possibly be augmented with a co-processor card (like Intel MIC or a GPU); lots of interest from various groups
• Alternative 1: use lower-power, cheaper x86 processors such as Intel Atom, AMD
  • Optimize HEPSpec/CHF/W
• Alternative 2: use non-Intel processors. Try to profit from the highly competitive and innovative market for processors for portable devices: ARM
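
To get a feel for these numbers, the sketch below works out the retained-event rate and the input rate each server would have to sustain. It assumes events are distributed evenly over the farm, which is a simplification of ours; the names are illustrative.

```python
# Figures from the slides; even distribution over servers is an assumption.
INPUT_RATE_HZ = 40e6          # events per second into the farm
RETAINED_FRACTION = 0.001     # about 0.1 % of events are kept

retained_rate_hz = INPUT_RATE_HZ * RETAINED_FRACTION


def events_per_server(input_rate_hz: float, n_servers: int) -> float:
    """Events per second one server must process under an even split."""
    return input_rate_hz / n_servers


print(f"retained: {retained_rate_hz / 1e3:.0f} kHz")
for n_servers in (4000, 5000):
    rate = events_per_server(INPUT_RATE_HZ, n_servers)
    print(f"{n_servers} servers: {rate:.0f} events/s each")
```

So roughly 40 kHz of events survive the filter, and each 2018-type server would handle on the order of 8000 to 10000 events/s.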

ARM
• A “pure” RISC architecture (with some enhancements)
• A long tradition in the embedded market
• Billions of cores sold
  • in many variants
  • number of cores / power vs performance
• Produced by various licensees
• Has a reputation for the best power efficiency in the market
[Roadmap figure. “We are here”: 32-bit, IEEE floats, SIMD, native Java offload. Announced: 64-bit, SIMD with DP floats]

So what would a compute unit look like?

Operational constraints
• The Online farms are very big
  • O(2000) servers of different generations and vendors
• Like a traditional data-centre with all the problems and very few administrators, but with some simplifications:
  • A single client
  • In Online operation at least, mostly a single work-load
• But we want rack-mountable, remote-manageable hardware with good mechanics, decent powering, vendor support etc., and of course low cost!
• Don’t want to build this ourselves: it needs to fit in a traditional data-centre structure

Embedded in the data-centre
• Boston Viridis (projects also from DELL and HP)
• Consists of 48 SoCs, each with:
  • 4 cores, 4 GB RAM
  • ARM Cortex-A9, 1.4 GHz
• 80 Gb Ethernet switch
• Total: 192 cores / 192 GB RAM / 300 W
• Similar systems also exist from DELL/HP
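
The chassis totals follow directly from the per-SoC figures. A small sketch of the arithmetic, which also derives the power per SoC and per core; it assumes the 300 W is shared evenly, which ignores the switch and other overheads (names are illustrative):

```python
# Per-chassis figures from the slide; the even power split is an assumption.
SOCS_PER_CHASSIS = 48
CORES_PER_SOC = 4
RAM_GB_PER_SOC = 4
CHASSIS_POWER_W = 300

total_cores = SOCS_PER_CHASSIS * CORES_PER_SOC
total_ram_gb = SOCS_PER_CHASSIS * RAM_GB_PER_SOC
watts_per_soc = CHASSIS_POWER_W / SOCS_PER_CHASSIS
watts_per_core = CHASSIS_POWER_W / total_cores

print(f"{total_cores} cores, {total_ram_gb} GB RAM")
print(f"{watts_per_soc:.2f} W per SoC, {watts_per_core:.2f} W per core")
```

That is about 6 W per SoC and about 1.6 W per core, including each core's share of everything else in the box.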

How fast is a core?
• So we’ll need many

Is it worth it?
• ARM v7: 192 cores need 300 W and 2 U for about 520 HEPSpec
• X5650: 96 hyperthreads need about 1400 W and 2 U for 900 HEPSpec
• If this ratio continues to hold into 2018, LHCb could do the upgrade with a 600 kW data-centre instead of a new (!) 2 MW one
• And maybe at some point we will need to pay for the power
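
The comparison is easiest to read as HEPSpec per watt. The sketch below computes that from the two rows on the slide; it uses only the quoted figures and says nothing about how they were measured (names are illustrative):

```python
# (HEPSpec, watts) per 2 U chassis, as quoted on the slide.
SYSTEMS = {
    "ARM v7 (192 cores)": (520, 300),
    "X5650 (96 hyperthreads)": (900, 1400),
}


def hepspec_per_watt(hepspec: float, watts: float) -> float:
    """Power efficiency in HEPSpec per watt."""
    return hepspec / watts


efficiency = {name: hepspec_per_watt(hs, w) for name, (hs, w) in SYSTEMS.items()}
for name, value in efficiency.items():
    print(f"{name}: {value:.2f} HEPSpec/W")

ratio = efficiency["ARM v7 (192 cores)"] / efficiency["X5650 (96 hyperthreads)"]
print(f"ARM advantage: {ratio:.1f}x")
```

On these numbers the ARM chassis delivers roughly 1.7 HEPSpec/W against about 0.6 for the x86 one, while the x86 chassis still gives more HEPSpec per 2 U.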

The acid test
• HEPSpec is not necessarily a good test for Online usage
• Online we (currently) run n instances of the same application in parallel, where n is the number of cores/hyperthreads
• No “mixed” work-load: hyperthreading typically adds more in the Online “mono-culture”
• Need to benchmark using the High Level Trigger code

Project: “Moore on ARM”
• Need to compile the LHCb software stack (beginning from ROOT)
• Can compare with natively compiled code: everything works fine on the FC17 test node, but compilation is slow
• ROOT 5.34.02: ./configure linuxarm --enable-c++11; make -j 4 takes 30m43s
• Team (part-time only): Ben Couturier, Vijay Kartik, Niko Neufeld

Future plans
• Cross-compiler chain ready
• Will now go on to compile the stack
• Verification and benchmarking
• Then: full-scale test on a fully loaded 192-core system with a faster ARM (currently using A8, will have A9 or A15), possibly including real network input (for fun)