Hepmark project

Hepmark project. Evaluation of HEP worker nodes. Michele Michelotto at pd.infn.it.

Presentation Transcript


  1. Hepmark project: Evaluation of HEP worker nodes. Michele Michelotto at pd.infn.it

  2. Computing model [Diagram of the tiered grid computing model. Labels: CERN Tier 0; CERN Tier 1; Tier-1: UK, USA FNAL, USA BNL, France, Italy, Germany, Japan; Tier-2: Lab a, Lab b, Lab c, Lab m, Uni a, Uni b, Uni n, Uni x, Uni y; Tier3 physics department; Desktop; grid for a regional group; grid for a physics study group] michele michelotto - INFN PD

  3. Computing Needs • Tape storage: very easy, events → terabytes • Disk storage: easy again, events → terabytes • (1000x1000 or 1024x1024?) • RAID protected or raw size? • Computing power: tricky. Events/sec? Sim or Reco? • MIPS, CernUnit, MHz, SPEC, SI2K…

  4. T1 + T2 cpu budget - LHC

  5. FZK Measurement • In 2001 SPEC measured with gcc was 80% of the average published data • In 2006 the gap was much wider

  6. The SI2K inflation • The main problem with SI2000 in our community: it is no longer proportional to the performance of HEP codes (as it once was) • You can buy processors with a huge SI2K number but with a smaller increase in real performance

  7. Nominal SI vs real SI • SI2K results for the latest generation of processors are affected by inflation • So CERN (and FZK) started to use a new currency: SI2K measured with “gcc”, the GNU C compiler, using two flavours of optimization • High tuning: gcc -O3 -funroll-loops -march=$ARCH • Low tuning: gcc -O2 -fPIC -pthread

  8. Nominal SI vs real SI • CERN proposal: use as site rating the “Real SI”, obtained from the SI measured with gcc-low and increased by 50% • Actually this makes sense only for a short period of time and for the latest generation of processors • Run n copies in parallel • Where n is the number of cores in the worker node • To take into account the drop in performance of a multicore machine when fully loaded
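
The "run n copies in parallel" step above can be sketched roughly as follows. This is only an illustration, not the harness used at CERN or FZK: the function name `run_copies` and the placeholder workload are assumptions, and `os.cpu_count()` reports logical CPUs, which may differ from the physical core count the slide refers to.

```python
import os
import subprocess
import sys
import time


def run_copies(cmd, n=None):
    """Start n identical copies of cmd at once and wait for all of them.

    Returns (n, elapsed wall-clock seconds). When n is omitted it defaults
    to the number of logical CPUs reported by the OS (an assumption: this
    can exceed the number of physical cores).
    """
    if n is None:
        n = os.cpu_count() or 1
    start = time.perf_counter()
    # Launch every copy before waiting, so that they run concurrently.
    procs = [subprocess.Popen(cmd) for _ in range(n)]
    exit_codes = [p.wait() for p in procs]
    elapsed = time.perf_counter() - start
    if any(code != 0 for code in exit_codes):
        raise RuntimeError(f"a copy failed, exit codes: {exit_codes}")
    return n, elapsed


if __name__ == "__main__":
    # Placeholder workload (hypothetical): a short CPU-bound Python loop.
    workload = [sys.executable, "-c", "sum(i * i for i in range(10**6))"]
    n, seconds = run_copies(workload)
    print(f"{n} copies finished in {seconds:.2f} s")
```

Comparing the elapsed time for n copies with the time for a single copy gives a feel for the drop in per-core performance on a fully loaded node.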

  9. Too many SI2K • Take as an example a worker node with two Intel Woodcrest dual core 5160 at 3.06 GHz • SI2K nominal: 2929 – 3089 (min – max) • SI2K sum on 4 cores: 11716 - 12536 • SI2K gcc-low: 5523 • SI2K gcc-high: 7034 • SI2K gcc-low + 50%: 8284
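
The arithmetic behind these figures can be reproduced with a few lines. This is a sketch that assumes the nominal values are per core and that the node has 4 cores in total; the helper name `real_si` is made up for illustration. Note that 4 × 3089 comes out as 12356, whereas the slide quotes 12536 for the upper bound; the sketch simply prints the computed value.

```python
def real_si(si2k_gcc_low):
    """'Real SI' as in the CERN proposal: the gcc-low result plus 50%."""
    return si2k_gcc_low * 1.5


CORES = 4  # assumption: two dual-core processors in the worker node
nominal_min, nominal_max = 2929, 3089  # nominal SI2K per core, from the slide
gcc_low = 5523  # SI2K measured with gcc, low tuning, from the slide

print(CORES * nominal_min)  # 11716
print(CORES * nominal_max)  # 12356
print(real_si(gcc_low))     # 8284.5, quoted as 8284 on the slide

# Fraction of the nominal rating (lower bound) that survives as "Real SI".
print(real_si(gcc_low) / (CORES * nominal_min))
```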

  10. Which is better? • I started to measure the performance of HEP codes on several machines • The goal was to find a “commercially maintained” benchmark to replace SI2K • I compared HEP code with • SI2K published results • SI2K measured with gcc and “CERN” tuning • SI2006 and SI2006 rate published results • SI2006 and SI2006 with gcc4 (32 and 64 bit)

  11. CMS sw SIM and Pythia • CMS Monte Carlo simulation (32 bit) and Pythia (64 bit) show the same performance once normalized • Both published SPECint 2006 and SPECint 2006 with gcc show the same behaviour • Published SI2K does not match HEP software • SI2K CERN is better but not as good as SI2006

  12. Babar TierA Results • If you normalize by core and clock, all new processors have the same performance • Double that of the older generation CPUs • SI2006 matches this pattern (published and gcc ratio constant) • SI2000-cern better than SI2K nominal • SI2000 clearly doesn’t work
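
Normalizing "by core and clock" can be read as dividing the measured throughput by cores × clock frequency. The sketch below assumes that reading; the function name and every number in it are hypothetical placeholders chosen only to illustrate the calculation, not measurements from this talk.

```python
def per_core_per_ghz(events_per_sec, cores, clock_ghz):
    """Throughput normalized by number of cores and clock frequency."""
    return events_per_sec / (cores * clock_ghz)


# HYPOTHETICAL machines: (events/sec, cores, clock in GHz). Not real data.
machines = {
    "older-generation box": (10.0, 2, 2.5),
    "newer-generation box": (48.0, 4, 3.0),
}

reference = per_core_per_ghz(*machines["older-generation box"])
for name, (rate, cores, ghz) in machines.items():
    normalized = per_core_per_ghz(rate, cores, ghz)
    # Express each machine relative to the reference machine.
    print(f"{name}: {normalized:.2f} evt/s per core per GHz "
          f"({100 * normalized / reference:.0f}% of reference)")
```

A benchmark tracks HEP code well if its own score, normalized the same way, shows the same ratios between machines.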

  13. Many gaps • Easy to find published SPEC results • But only for new machines • Difficult to measure: • Not easy to get machines on loan from server resellers or manufacturers • Not easy to borrow machines from colleagues • Always for short periods of time • A SPEC run can last 15-20 hours • Need a set of dedicated worker nodes to make SPEC and HEP application measurements

  14. Cache • In the ’80s the latency was 3-10 clock cycles • Now latency is 1000s of clock cycles • Importance of the cache architecture • 1st level, 2nd level, 3rd level • Cache latency • Cache bandwidth • Shared or exclusive?

  15. 4 core processor

  16. Intel 54xx

  17. AMD 4core

  18. Load • Transactional performance doesn’t drop in the new 4-core processors • Clovertown drops with respect to Harpertown • A dual core processor keeps up only to Load3

  19. Perf/watt • AMD Barcelona at 65 nm: performance per watt similar to Intel Xeon at 45 nm

  20. Cache behaviour • 54xx has lower latency even with a bigger cache • The 3 processors behave very differently in the 4 MB to 64 MB range • If your (HEP) application works in this range you will see a big change in performance when changing processor

  21. Memory intel vs amd • Access time very similar • At 1 GB (typical footprint of a HEP application) the new AMD behaves better • But the new Xeon 54xx are much better than the 53xx

  22. Mem intel vs amd • Who is faster? • It depends on the block size • In the red zones Intel is better • In the green zone AMD is better

  23. Cache behaviour • We need to study the behaviour of typical HEP applications • Simulation, event generation, reconstruction, analysis • To understand how to write more efficient applications

  24. Power issues • Power consumption changes from one processor to another • Clock, high-k dielectric, active power management, clock throttling

  25. Power consumption

  26. An HEP data center • Need to measure the power usage of HEP applications • Example: a big Tier2 with 500 boxes needs 100 kW • Like the whole CED of INFN Padova • About 800 MWh in one year • Energy cost 0.12 Euro per kWh → energy bill of 100 kEuro/year • A 10% improvement in power efficiency means 10 kEuro/year savings • And savings on the infrastructure (power distribution, UPS, cooling)
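
The estimate above can be checked with a short calculation. This sketch assumes the 100 kW is drawn continuously all year (8760 hours), which gives 876 MWh; the slide rounds this to about 800 MWh and the bill to about 100 kEuro. The function names are illustrative only.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def annual_energy_mwh(power_kw):
    """Energy used in one year at a constant power draw, in MWh."""
    return power_kw * HOURS_PER_YEAR / 1000.0


def annual_cost_keuro(energy_mwh, euro_per_kwh):
    """Yearly bill in kEuro.

    MWh -> kWh multiplies by 1000 and Euro -> kEuro divides by 1000,
    so the two factors cancel.
    """
    return energy_mwh * euro_per_kwh


power_kw = 100        # big Tier2 with 500 boxes, from the slide
boxes = 500
euro_per_kwh = 0.12   # from the slide

print(f"{power_kw * 1000 / boxes:.0f} W per box")
energy = annual_energy_mwh(power_kw)
print(f"{energy:.0f} MWh/year at constant load")
print(f"{annual_cost_keuro(energy, euro_per_kwh):.0f} kEuro/year at constant load")
print(f"{annual_cost_keuro(800, euro_per_kwh):.0f} kEuro/year for the slide's 800 MWh")
print(f"{0.10 * annual_cost_keuro(800, euro_per_kwh):.1f} kEuro/year saved by a 10% improvement")
```

Either way the bill lands close to the 100 kEuro/year quoted on the slide, so a 10% efficiency gain is worth roughly 10 kEuro/year before counting infrastructure savings.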

  27. Power meter • Need a device to measure voltage and current • With logging capabilities • E.g. Fluke 1735

  28. Financial request • Need to buy a new worker node each time a new processor is released in the dual-processor market segment • Only if significant new features are present • One or two each for Intel and AMD per year • 4 kEuro each (dual proc, 2 GB/core, 1 disk) • 2 boxes to start with

  29. Manpower • Padova: • Michele Michelotto (Primo Tecnologo) 70% • Alberto Crescente (CTER) 30% • Roberto Ferrari (CTER) 30% • Ferrara: • Alberto Gianoli (Primo Tecnologo): 20% • Bologna: • Franco Brasolin (CTER): 20%

  30. Milestones • 2009 • Understand SPEC 2006. Propose a new benchmark to replace SI2K • Measure the performance of the current architectures for Monte Carlo SIM (evt/sec vs SPEC) • 2009/2010 • Power performance • 2010 • Cache profiling

  31. Questions?

  32. Backup slides

  33. SI2K frozen • SI2K is the benchmark used up to now to measure computing power for all the HEP experiments • Computing power requested by an experiment • Computing power provided by a Tier-[0,1,2] • SI2K is the nickname for the SPEC CPU Int 2000 benchmark • Came after Spec89, Spec Int 92 and Spec Int 95 • Declared obsolete by SPEC in 2006 • Replaced by SPEC with CPU Int 2006

  34. Transition problem • Impossible to find published SPEC Int 2000 results for the new processors (e.g. the not so new Clovertown 4-core) • Impossible to find published SPEC Int 2006 results for old processors (before 2006) • E.g. old P4 Xeon, P4, AMD 2xx • You can’t convert from SI2000 to SI2006, but the ratio for the x86 architecture is in the 137 – 172 range

  35. Even more • Actually all the gcc results in the previous slide are on i386 (32 bit) • If you would like to know how your code runs on a 64-bit machine, you can measure SPECint 2000 with gcc on x86_64 • So for the worker node with two Intel Woodcrest dual core 5160 at 3.06 GHz • SI2K nominal: 2929 – 3089 (min – max) • SI2K on 4 cores: 11716 - 12536 • SI2K gcc-low: 6021 • SI2K gcc-high: 6409 • SI2K gcc-low + 50%: 9031

  36. Atlas • Here 100% is Xeon 5160 • Few results for SI2006+gcc but no difference from CMS and Babar • Few results also for published SI2006 because of several old architectures • SI2K+gcc not bad • Published SI2K heavily overestimates the new Xeon • Normalized Atlas simulation performs the same on the new Intel “Core” or AMD “Opteron” (like CMS, Babar)
