Trends and Perspectives for HPC infrastructures

Carlo Cavazzoni, CINECA

Outline
• HPC resources in Europe (PRACE)
• Today's HPC architectures
• Technology trends
• Cineca roadmaps (toward 50 PFlops)
• EuroExa project

The PRACE RI provides access to distributed, persistent, pan-European, world-class HPC computing and data management resources and services. Expertise in efficient use of the resources is available through participating centers throughout Europe. Available resources are announced for each Call for Proposals.
Tiers:
• Tier 0: European
• Tier 1: National
• Tier 2: Local
Peer reviewed open access:
• PRACE Projects (Tier-0)
• PRACE Preparatory (Tier-0)
• DECI Projects (Tier-1)

TIER-0 Systems, PRACE regular calls
• CURIE (GENCI, Fr): BULL cluster, Intel Xeon, Nvidia cards, Infiniband network
• FERMI (CINECA, It) & JUQUEEN (Juelich, D): IBM BGQ, Power processors, custom 5D torus net.
• MARENOSTRUM (BSC, S): IBM iDataPlex, Intel Xeon nodes, Infiniband net.
• HERMIT (HLRS, D): Cray XE6, AMD procs, custom 3D torus net., 1 PFlops
• SuperMUC (LRZ, D): IBM iDataPlex, Intel Xeon nodes, Infiniband net.

TIER-1 Systems, DECI calls

HPC Architectures: two models
Hybrid:
• Server class processors:
  • Server class nodes
  • Special purpose nodes
• Accelerator devices:
  • Nvidia
  • Intel
  • AMD
  • FPGA
Homogeneous:
• Server class nodes:
  • Standard processors
• Special purpose nodes:
  • Special purpose processors

Networks
• Standard/switched: Infiniband
• Special purpose/topology: BGQ, CRAY, TOFU (Fujitsu), TH Express-2 (Tianhe-2)

Programming Models
• Fundamental paradigms: message passing, multi-threads
• Consolidated standards: MPI & OpenMP
• New task-based programming models
• Special purpose for accelerators: CUDA, Intel offload directives, OpenACC, OpenCL, etc. (NO consolidated standard)
• Scripting: Python

Roadmap to Exascale (architectural trends)

Dennard scaling law (downscaling)
Classical Dennard scaling, from the old VLSI generation (L, V, F, D, P) to the new one (L’, V’, F’, D’, P’):
• L’ = L / 2
• V’ = V / 2
• F’ = F * 2
• D’ = 1 / L’^2 = 4 * D
• P’ = P
These relations do not hold anymore! The core frequency and performance no longer grow following Moore’s law. Today:
• L’ = L / 2
• V’ = ~V
• F’ = ~F * 2
• D’ = 1 / L’^2 = 4 * D
• P’ = 4 * P
The power crisis!
Increase the number of cores to keep the evolution of architectures on Moore’s law: the programming crisis!
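The two sets of scaling relations on this slide can be checked with a few lines of Python. This is only a sketch under an assumption that is not on the slide: dynamic power per transistor is taken as proportional to C * V^2 * F, with the capacitance C proportional to the feature size L. The function name `power_density_factor` is hypothetical.

```python
# Sketch: relative power per unit area (P'/P) after one VLSI generation.
# Assumption (not stated on the slide): per-transistor dynamic power
# scales as C * V^2 * F, with capacitance C proportional to L.

def power_density_factor(l_scale, v_scale, f_scale):
    """Return P'/P per unit area for the ratios L'/L, V'/V and F'/F."""
    density_scale = 1.0 / l_scale ** 2   # D'/D = 1 / (L'/L)^2
    capacitance_scale = l_scale          # C'/C ~ L'/L (assumed)
    per_transistor = capacitance_scale * v_scale ** 2 * f_scale
    return density_scale * per_transistor

# Classical Dennard scaling: L/2, V/2, F*2
ideal = power_density_factor(0.5, 0.5, 2.0)

# Voltage no longer scales: L/2, ~V, ~F*2
no_voltage_scaling = power_density_factor(0.5, 1.0, 2.0)

print(ideal, no_voltage_scaling)
```

Under this assumption the first case reproduces P’ = P and the second reproduces P’ = 4 * P, which is the power crisis named on the slide.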

Moore’s Law: economic and market law
Stacy Smith, Intel’s chief financial officer, later gave some more detail on the economic benefits of staying on the Moore’s Law race. The cost per chip “is going down more than the capital intensity is going up,” Smith said, suggesting Intel’s profit margins should not suffer because of heavy capital spending. “This is the economic beauty of Moore’s Law.”
And Intel has a good handle on the next production shift, shrinking circuitry to 10 nanometers. Holt said the company has test chips running on that technology. “We are projecting similar kinds of improvements in cost out to 10 nanometers,” he said.
So, despite the challenges, Holt could not be induced to say there’s any looming end to Moore’s Law, the invention race that has been a key driver of electronics innovation since first defined by Intel’s co-founder in the mid-1960s. (From WSJ)
It is all about the number of chips per Si wafer!

But!
• 14 nm VLSI, 0.54 nm Si lattice, 300 atoms!
• There will be still 4~6 cycles (or technology generations) left until we reach 11 ~ 5.5 nm technologies, at which we will reach the downscaling limit, in some year between 2020-30 (H. Iwai, IWJT2008).

What about Applications?
In a massively parallel context, an upper limit for the scalability of parallel applications is determined by the fraction of the overall execution time spent in non-scalable operations (Amdahl's law).
• Maximum speedup tends to 1 / (1 − P), where P = parallel fraction
• 1,000,000 cores: P = 0.999999, serial fraction = 0.000001
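As a rough illustration of the numbers on this slide, Amdahl's law can be evaluated directly. The sketch below uses the standard finite-core form, speedup = 1 / ((1 − P) + P / N), of which the slide quotes only the limit for large N; the function names are hypothetical.

```python
# Sketch of Amdahl's law. P is the parallel fraction, N the number of cores.

def amdahl_speedup(parallel_fraction, cores):
    """Speedup on `cores` cores when `parallel_fraction` of the run time scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

def max_speedup(parallel_fraction):
    """Upper limit of the speedup for an unlimited number of cores: 1 / (1 - P)."""
    return 1.0 / (1.0 - parallel_fraction)

if __name__ == "__main__":
    p = 0.999999
    print(max_speedup(p))                # limit quoted on the slide
    print(amdahl_speedup(p, 1_000_000))  # speedup on one million cores
```

With P = 0.999999 the limit is about 1,000,000, while on 1,000,000 cores the finite-core formula gives a speedup of about 500,000: even a serial fraction of one part in a million costs half of the machine.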

Architectural trends
• Peak performance: Moore's law
• FPU performance: Dennard's law
• Number of FPUs: Moore + Dennard
• App. parallelism: Amdahl's law

HPC Architectures: two models
• Hybrid, but…
• Homogeneous, but…
What 100 PFlops systems will we see … my guess:
• IBM (hybrid): Power8 + Nvidia GPU
• Cray (homo/hybrid): with Intel only!
• Intel (hybrid): Xeon + MIC
• Arm (homo): only Arm chip, but…
• Nvidia/Arm (hybrid): Arm + Nvidia
• Fujitsu (homo): Sparc, high density, low power
• China (homo/hybrid): with Intel only
• Room for AMD console chips

Chip Architecture
Strongly market driven: mobile, TV sets, screens, video/image processing
• Intel: new arch. to compete with ARM; less Xeon, but PHI
• ARM: main focus on low power mobile chips (Qualcomm, Texas Inst., Nvidia, ST, etc.); new HPC market, server market
• NVIDIA: GPU alone will not last long; ARM+GPU, Power+GPU
• Power: embedded market; Power+GPU, only chance for HPC
• AMD: console market; still some chance for HPC

CINECA Roadmaps

Roadmap 50PFlops

Tier 1 CINECA Procurement Q2014
High-level system requirements
• Electrical power consumption: 400 kW
• Physical size of the system: 5 racks
• System peak performance (CPU+GPU): on the order of 1 PFlops
• System peak performance (CPU only): on the order of 300 TFlops

Tier 1 CINECA
High-level system requirements
• CPU architecture: Intel Xeon Ivy Bridge
• Number of cores per CPU: 8 @ >3 GHz, or 12 @ 2.4 GHz. The choice of frequency and number of cores depends on the socket TDP, the system density and the cooling capacity.
• Number of servers: 500 - 600 (Peak perf = 600 * 2 sockets * 12 cores * 3 GHz * 8 Flop/clk = 345 TFlops). The number of servers may depend on the cost or on the geometry of the configuration, in terms of the number of CPU-only nodes and CPU+GPU nodes.
• GPU architecture: Nvidia K40
• Number of GPUs: >500 (Peak perf = 700 * 1.43 TFlops = 1 PFlops). The number of GPU cards may depend on the cost or on the geometry of the configuration, in terms of the number of CPU-only nodes and CPU+GPU nodes.
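The two peak-performance estimates on this slide are simple products, and a short sketch makes the arithmetic explicit. All input values are the ones quoted on the slide; the function names are hypothetical.

```python
# Sketch: peak-performance arithmetic from the slide, in TFlops.

def cpu_peak_tflops(servers, sockets_per_server, cores_per_socket,
                    clock_ghz, flop_per_clock):
    """Peak CPU performance: servers * sockets * cores * GHz * Flop/clk."""
    gflops = (servers * sockets_per_server * cores_per_socket
              * clock_ghz * flop_per_clock)
    return gflops / 1000.0  # GFlops -> TFlops

def gpu_peak_tflops(gpus, tflops_per_gpu):
    """Peak GPU performance: number of cards * peak of one card."""
    return gpus * tflops_per_gpu

if __name__ == "__main__":
    # 600 servers * 2 sockets * 12 cores * 3 GHz * 8 Flop/clk
    print(cpu_peak_tflops(600, 2, 12, 3.0, 8))
    # 700 cards * 1.43 TFlops per card
    print(gpu_peak_tflops(700, 1.43))
```

The CPU product evaluates to 345.6 TFlops, which the slide rounds to 345 TFlops, and the GPU product to about 1001 TFlops, that is roughly 1 PFlops.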

Tier 1 CINECA
High-level system requirements
• Identified vendors: IBM, Eurotech
• DRAM memory: 1 GByte/core. The option of a subset of nodes with a larger amount of memory will be requested.
• Local non-volatile memory: >500 GByte SSD/HD, depending on the cost and on the system configuration
• Cooling: liquid cooling system with free cooling option
• Scratch disk space: >300 TByte (provided by CINECA)

Thank you
