This presentation discusses the impact of new HPC architectures on code development for material science simulations, with a focus on preparing codes for future exascale systems. It presents a pragmatic approach to the challenges of exascale computing and highlights the expected outcomes and impact of these efforts.
New HPC architectures landscape and impact on code developments. Carlo Cavazzoni, Cineca & MaX
Enabling Exascale Transition • GOAL: "modernize" community codes and make them ready to best exploit future exascale systems for material science simulations (MaX Obj. 1.4) • CHALLENGE: there is not yet a solution that fits all needs, and this is common to all computational science domains • STRATEGY: a pragmatic approach based on building knowledge about exascale-related problems, running proofs of concept to field-test solutions, and finally deriving best practices that can consolidate into real solutions for the full applications and make their way into the public code releases • OUTCOME: new, validated code versions, with libraries and modules publicly available beyond MaX, and extensive dissemination activities • IMPACT: modern codes, exploitation of today's HPC systems, benefits for other application fields as well as for technology providers.
Changes in the road-map to Exascale. While describing the company's exascale strategy and other topics Intel is presenting at the SC17 conference, Intel Data Center Group GM Trish Damkroger offhandedly mentioned that the Knights Hill product is dead. More specifically, she said the chip will be replaced in favor of "a new platform and new microarchitecture specifically designed for exascale."
Exascale: how serious is the situation? Peak performance of 10^18 Flops = number of FPUs (Moore's law) x FPU performance (Dennard's law). Working hypothesis: 10^9 FPUs delivering 10^9 Flops each, arranged either as 10^5 FPUs in 10^4 servers or as 10^4 FPUs in 10^5 servers (written out below). Exascale architectures: heterogeneous.
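Written out, the working hypothesis is just the product of the two factors above, and both server/FPU splits give the same totals:

$$
10^{18}\ \mathrm{Flops} \;=\; \underbrace{10^{9}\ \mathrm{FPUs}}_{\text{number of FPUs}} \times \underbrace{10^{9}\ \tfrac{\mathrm{Flops}}{\mathrm{FPU}}}_{\text{FPU performance}},
\qquad
10^{9}\ \mathrm{FPUs} \;=\; 10^{4}\ \text{servers}\times 10^{5}\ \tfrac{\text{FPUs}}{\text{server}} \;=\; 10^{5}\ \text{servers}\times 10^{4}\ \tfrac{\text{FPUs}}{\text{server}} .
$$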
General Considerations • Exascale is not (only) about scalability and Flops performance! • In an exascale machine there will be ~10^9 FPUs; bringing data in and out will be the main challenge. • 10^4 nodes, but 10^5 FPUs inside each node! • There is no silver bullet (so far) • heterogeneity is here to stay • deeper memory hierarchies
Exascale… some guesses • From GPUs to specialized cores (tensor cores) • Specialized memory modules (HBM) • Specialized non-volatile memory (NVRAM) • Performance modelling (a minimal example follows) • Refactor codes to better fit architectures with specialized HW • Avoid wrong turns!
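To make the performance-modelling bullet concrete, here is a minimal roofline-style sketch in C. The peak-Flops, bandwidth, and kernel numbers are placeholders, not measurements from any MaX code.

```c
#include <stdio.h>

/* Minimal roofline-style estimate: a kernel is either compute-bound or
 * memory-bound, so its runtime is bounded below by the larger of the two. */
static double roofline_time(double flops, double bytes,
                            double peak_flops, double peak_bw)
{
    double t_compute = flops / peak_flops;   /* seconds if compute-bound   */
    double t_memory  = bytes / peak_bw;      /* seconds if bandwidth-bound */
    return t_compute > t_memory ? t_compute : t_memory;
}

int main(void)
{
    /* Placeholder hardware numbers: ~3 TFlop/s per socket, ~400 GB/s of HBM. */
    double peak_flops = 3.0e12, peak_bw = 4.0e11;

    /* Hypothetical kernel: 1e12 flops touching 5e11 bytes. */
    double t = roofline_time(1.0e12, 5.0e11, peak_flops, peak_bw);
    printf("estimated lower bound: %.3f s\n", t);
    return 0;
}
```

Comparing the two bounds also tells you whether refactoring should target data movement or arithmetic, which is the question the specialized-HW bullets above are really about.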
Paradigm and co-design • Identify the latency and throughput subroutines/modules/classes in the application workflow • Re-factor and map them to the HW: latency code on the host, throughput code on the KNL accelerator • Map and overlap communication with the latency and throughput work (see the sketch below) • Heterogeneous
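A generic sketch of the "map and overlap" step with non-blocking MPI: the halo exchange (latency part) is started, the bulk of the computation (throughput part) proceeds while it is in flight, and only the boundary update waits. The kernels and sizes are hypothetical, not taken from any MaX application.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N   1000000   /* interior points per rank (throughput part)   */
#define NB  1024      /* boundary/halo points per rank (latency part) */

/* Toy kernels standing in for the real throughput and latency code. */
static void compute_interior(double *u, int n) {
    for (int i = 0; i < n; ++i) u[i] = u[i] * 0.5 + 1.0;
}
static void compute_boundary(double *u, const double *halo, int nb) {
    for (int i = 0; i < nb; ++i) u[i] += halo[i];
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int left  = (rank - 1 + size) % size;
    int right = (rank + 1) % size;

    double *u         = calloc(N,  sizeof *u);
    double *halo_send = calloc(NB, sizeof *halo_send);
    double *halo_recv = calloc(NB, sizeof *halo_recv);

    MPI_Request req[2];
    /* Start the halo exchange (latency) ... */
    MPI_Irecv(halo_recv, NB, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
    MPI_Isend(halo_send, NB, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[1]);
    /* ... and overlap it with the bulk (throughput) computation. */
    compute_interior(u, N);
    /* Only the boundary update has to wait for the communication. */
    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
    compute_boundary(u, halo_recv, NB);

    if (rank == 0) printf("step done on %d ranks\n", size);
    free(u); free(halo_send); free(halo_recv);
    MPI_Finalize();
    return 0;
}
```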
MaX Activities • Programming paradigms, libraries, co-design • Profiling: hot spots, performance issues, bottlenecks, Flops and Watts efficiency • DSLs, kernel libraries, modules, performance models • New architectures (vendors), new MPI and OpenMP standards, new paradigms (OmpSs, CUDA) • New, more efficient code versions; libraries/modules shared across codes, communities and vendors • Feedback to scientists/developers (WP1) • Dissemination of best practices: schools and workshops, collaborations, e.g. CoE/FET/PRACE
Performance modelling, results: absolute time estimates for MnSi (bulk, 64 atoms, 14 k-points).
CoE questions: what is my target architecture? How can I cope with GPUs, many-cores, FPGAs? I like homogeneous architectures, why should I care about heterogeneous ones? Heterogeneity is here to stay! The answer: DSLs and kernel libraries, modularization, APIs, encapsulation, separation of concerns (sketched below). DSL: Sirius, CheSS (SIESTA), SDDK, FFTXlib (QE, YAMBO), LAXlib (QE, ~YAMBO), FLEUR-LA (FLEUR). Kernel lib: ELPA (QE, YAMBO, FLEUR).
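One minimal way to picture encapsulation and separation of concerns in C: the physics code calls a narrow kernel interface, and the backend (CPU, GPU, FPGA) is selected once behind it. The interface and backend names below are hypothetical, not the actual APIs of the libraries listed above.

```c
#include <stdio.h>

/* Narrow kernel interface: the caller never sees which backend runs it. */
typedef struct {
    const char *name;
    void (*fft_forward)(double *data, int n);   /* hypothetical kernel entry */
} fft_backend;

/* Reference CPU backend (placeholder body standing in for real work). */
static void fft_forward_cpu(double *data, int n) {
    for (int i = 0; i < n; ++i) data[i] *= 1.0;
}

/* A GPU or FPGA backend would provide the same entry point; here it simply
 * falls back to the CPU version so the sketch stays self-contained. */
static void fft_forward_accel(double *data, int n) { fft_forward_cpu(data, n); }

static const fft_backend backends[] = {
    { "cpu",   fft_forward_cpu   },
    { "accel", fft_forward_accel },
};

int main(void) {
    double data[8] = {0};
    /* Backend selection is a single, encapsulated decision point. */
    const fft_backend *b = &backends[0];
    b->fft_forward(data, 8);
    printf("ran kernel on backend: %s\n", b->name);
    return 0;
}
```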
One size does not fit all • 10^9 FPUs to leverage • The best algorithm for 1 FPU is not the best algorithm for 10^9 FPUs • Implement the best algorithm for each scale, e.g. the two FFT and data-distribution schemes in QE 6.2 • Autotuning: choose the best at runtime (a minimal sketch follows)
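A minimal sketch of runtime autotuning, assuming two interchangeable variants of the same kernel: benchmark each once on a representative problem, then use the winner for the rest of the run. This is generic C, not the actual QE 6.2 selection logic.

```c
#include <stdio.h>
#include <time.h>

/* Two placeholder implementations of the same operation; which one wins
 * depends on the machine, the problem size and the data distribution. */
static void algo_a(double *x, int n) { for (int i = 0; i < n; ++i) x[i] += 1.0; }
static void algo_b(double *x, int n) { for (int i = n - 1; i >= 0; --i) x[i] += 1.0; }

typedef void (*algo_fn)(double *, int);

/* Time one candidate on a small representative problem. */
static double bench(algo_fn f, double *x, int n) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    f(x, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + 1e-9 * (t1.tv_nsec - t0.tv_nsec);
}

int main(void) {
    static double x[1 << 20];
    algo_fn candidates[] = { algo_a, algo_b };

    /* Autotune once, then reuse the chosen variant for the production run. */
    algo_fn best = candidates[0];
    double best_t = bench(candidates[0], x, 1 << 20);
    double t = bench(candidates[1], x, 1 << 20);
    if (t < best_t) { best = candidates[1]; best_t = t; }

    printf("selected variant %s (%.3e s)\n", best == algo_a ? "A" : "B", best_t);
    best(x, 1 << 20);   /* production call uses the tuned choice */
    return 0;
}
```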
Beyond Modularization • QE libraries: FFTXlib, LAXlib • reused by other codes • distilled into mini-apps
Scaling-out YAMBO. A single GW calculation has run on 1000 Intel Knights Landing (KNL) nodes of the new Tier-0 MARCONI KNL partition, corresponding to 68,000 cores and ~3 PFlops. The simulation, related to the growth of complex graphene nanoribbons on a metal surface, is part of an active research project combining computational spectroscopy with cutting-edge experimental data from teams in Austria, Italy, and Switzerland. Simulations were performed exploiting computational resources granted by PRACE (via call 14). http://www.max-centre.eu/2017/04/19/a-new-scalability-record-in-a-materials-science-application/
Planning for Exascale: performance models, co-design, POCs, code re-factoring • Today: Yambo @ 3 PFlops, socket perf 3-5 TFlops • Pre-exascale: Yambo @ 10-15 PFlops, socket perf 10-15 TFlops • Exascale: Yambo @ 50 PFlops, socket perf 20-40 TFlops (a rough consistency check follows)
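As a rough consistency check (plain arithmetic, assuming the application occupies the whole partition), the socket counts implied by these targets are:

$$
N_{\text{sockets}} \;\approx\; \frac{\text{target performance}}{\text{socket performance}}:\qquad
\frac{3\ \mathrm{PFlops}}{3\ \text{to}\ 5\ \mathrm{TFlops}} \approx 600\ \text{to}\ 1000,
\qquad
\frac{50\ \mathrm{PFlops}}{20\ \text{to}\ 40\ \mathrm{TFlops}} \approx 1250\ \text{to}\ 2500 .
$$

The first figure is consistent with the ~1000-node KNL run on Marconi reported above, so the exascale target asks for roughly the same node count at 10x the per-socket performance.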
Marconi, a convergent HPC solution • Cloud/Data Proc. Scale Out: 792 Lenovo NeXtScale servers, Intel E5-2697 v4 Broadwell (216 Ethernet nodes for cloud HT INFN, 216 Ethernet nodes for cloud HPC/DP, 360 QDR nodes for Tier-1) • HPC: 100 Lenovo NeXtScale servers, Intel E5-2630 v3 Haswell, QDR + Nvidia K80 • 2300 Lenovo Stark servers, >7 PFlops, Intel Skylake, 24 cores @ 2.1 GHz, 196 GByte per node • 3600 Intel/Lenovo servers, >11 PFlops, Intel Phi code name Knights Landing, 68 cores @ 1.4 GHz, single-socket node: 96 GByte DDR4 + 16 GByte MCDRAM • 720 Lenovo NeXtScale servers, Intel E5-2697 v4 Broadwell, 18 cores @ 2.3 GHz, 128 GByte per node • Storage: Lenovo GSS + SFA12K + IBM Flash, >30 PByte
Cineca "sustainable" roadmap toward exascale • 5x: >250 PF + >20 PF, ~8 MW • 5x: 50 PF + 10 PF, ~4 MW • 2x (latency cores): 11 PF + 9 PF, 3.5 MW • 5x, 1x (latency cores): 2 PF, 1 MW • 20x, 10x (in total): 100 TF, 1 MW • Paradigm change • Pre-exascale
What does 5x really mean? Peak performance? Linpack? HPCG? Time to solution? Energy to solution? Time to science? A combination of all of the above? We need to define the right metric!
The data centers at the Science Park. ECMWF DC main characteristics: • 2 power lines up to 10 MW (one a backup of the other) • Expansion to 20 MW • Photovoltaic cells on the roofs (500 MWh/year) • Redundancy N+1 (mechanical and electrical) • 5 x 2 MW DRUPS • Cooling: 4 dry coolers (1850 kW each), 4 groundwater wells, 5 refrigerator units (1400 kW each) • Peak PUE 1.35 / maximum annualized PUE 1.18. Site layout: electrical substation (HV/MV), outdoor chillers + mechanics, diesel generators, ECMWF plants, ECMWF DC 1, ECMWF DC 2, INFN DC, CINECA DC, ECMWF expansion. INFN – CINECA DC main characteristics: • up to 20 MW (one power line a backup of the other) • Possible use of Combined Heat and Power Fuel Cell technology • Redundancy strategy under study • Cooling, still under study: dry coolers, groundwater wells, refrigerator units • PUE < 1.2 – 1.3. Plant rooms: outdoor chillers, electrical plant rooms, DRUPS rooms, mechanical plant rooms, POP 1 + POP 2 switch rooms, gas storage rooms, general utilities.
HPC and Verticals: value delivered to users • Applications integration (Meteo, Astro, Materials, Visit, Repo, Eng., Analytics, etc.) • Big Data • Accelerated computing • Co-design • 3D Viz • Cloud services • AI • HW infrastructure (clusters, storage, network, devices) • Toward an end-to-end optimized infrastructure
Exascale “node”, according to Intel https://www.hpcwire.com/2018/01/25/hpc-ai-two-communities-future/
Memory! Analysis of memory allocation during an SCF cycle, and of memory bandwidth usage on the different types of memory. Critical behaviour: the code is slowed down; better memory access patterns and communications are needed (a generic illustration follows).
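As a generic illustration of what a better memory access pattern means, the textbook fix is to block (tile) loops so data is reused while it still sits in fast memory instead of being streamed repeatedly. The sketch below is not taken from the profiled code.

```c
#include <stdio.h>

#define N    2048
#define TILE 64   /* chosen so a TILE x TILE block fits in fast memory */

/* Naive transpose-accumulate: the column-wise access of 'in' walks memory
 * with stride N and wastes bandwidth on every cache line it touches. */
static void naive(double *out, const double *in) {
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            out[i * N + j] += in[j * N + i];
}

/* Blocked version: the same arithmetic, but each TILE x TILE block of 'in'
 * is reused while it is still resident in cache/HBM. */
static void blocked(double *out, const double *in) {
    for (int ii = 0; ii < N; ii += TILE)
        for (int jj = 0; jj < N; jj += TILE)
            for (int i = ii; i < ii + TILE; ++i)
                for (int j = jj; j < jj + TILE; ++j)
                    out[i * N + j] += in[j * N + i];
}

int main(void) {
    static double a[N * N], b[N * N];
    naive(b, a);
    blocked(b, a);
    printf("done (b[0] = %f)\n", b[0]);
    return 0;
}
```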
Exascale system. Al Gara's vision for the unification of the "3 pillars" of HPC is currently underway: "The convergence of AI, data analytics and traditional simulation will result in systems with broader capabilities and configurability as well as cross pollination."
RMA with Intel MPI 2017 • RMA as a substitute for Alltoall (sketched below) • Source code and data shared with Intel • BDW, 36 MPI
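A minimal sketch of the RMA idea with MPI-3 one-sided operations: each rank exposes its receive buffer through a window and puts its contribution directly into the matching slot on every peer, replacing the MPI_Alltoall call. This is generic code, not the source actually shared with Intel.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* One element per destination rank, as in MPI_Alltoall with count = 1. */
    double *sendbuf = malloc(size * sizeof *sendbuf);
    double *recvbuf;   /* exposed to remote puts through the window */
    MPI_Win win;
    MPI_Win_allocate((MPI_Aint)(size * sizeof *recvbuf), sizeof *recvbuf,
                     MPI_INFO_NULL, MPI_COMM_WORLD, &recvbuf, &win);

    for (int p = 0; p < size; ++p) sendbuf[p] = rank * 100.0 + p;

    /* Each rank writes its contribution directly into slot 'rank' of every
     * peer's window instead of going through a collective Alltoall. */
    MPI_Win_fence(0, win);
    for (int p = 0; p < size; ++p)
        MPI_Put(&sendbuf[p], 1, MPI_DOUBLE, p, rank, 1, MPI_DOUBLE, win);
    MPI_Win_fence(0, win);

    printf("rank %d received first element %.1f\n", rank, recvbuf[0]);

    MPI_Win_free(&win);
    free(sendbuf);
    MPI_Finalize();
    return 0;
}
```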
Bologna Big Data Science Park • Protezione Civile and the regional agency for development and innovation • CINECA & INFN exascale supercomputing center • ECMWF Data Center • Conference and education center • Big Data Foundation • Agenzia Nazionale Meteo • University centers • «Ballette innovation and creativity center» • IOR biobank • ENEA center • Competence center Industry 4.0
New Cineca HPC infrastructure design point • D.A.V.I.D.E. (prototype) • Marconi A4, A3, A2, A1 (OPA) • GSS on OPA, GSS • Login and gateway nodes • PPI4HPC & EuroHPC • External connectivity: Internet, HBP, Eurofusion, PRACE/EUDAT, CNAF • ViZ • ETH core + Mellanox gateway IB + ETH 25/100 Gbit • Tape, fibre switch • Ex-PICO 5100, NFS servers • Cloud, Big Data, AI, interactive and data-processing cluster (Mellanox FDR servers) • FEC servers, TMS
Al Gara (Intel): the same architecture will cover HPC, AI, and data analytics through configuration, which means there needs to be a consistent software story across these different hardware backends to address HPC plus AI workloads.
D.A.V.I.D.E. Intelligenza Artificiale: Dall'Università alle Aziende (Artificial Intelligence: from University to Companies), Bologna. http://images.nvidia.com/content/tesla/pdf/nvidia-tesla-p100-datasheet.pdf
HPC and Verticals: value delivered to users • Applications integration (Meteo, Astro, Materials, Visit, Repo, Eng., Analytics, etc.) • Big Data • Accelerated computing • AI • Co-design • 3D Viz • High-throughput connectors to other infrastructures • Procurement • HW infrastructure (clusters, storage, network, devices) • Toward an end-to-end optimized infrastructure
Power projection: peak performance (DP) @ 10 MW
Technical Project: goal of the procurement • New PRACE Tier-0 system • Target: 5x increase of system capability • Maximize efficiency (capability/W) • Sustain production for at least 3 years • Integrated in the current infrastructure • Possibly hosted in the same data center as ECMWF