This article discusses examples of Petaflop supercomputers from 2008 and the ranking system used to measure their performance. It also explores the possibility of matching the computational power of the human brain.
Advanced Computer Architecture 5MD00 / 5Z033: TOP500 supercomputers • Henk Corporaal • www.ics.ele.tue.nl/~heco/courses/aca • h.corporaal@tue.nl • TU Eindhoven 2011
Topics • How to cross the Petaflop boundary • Ranking • Nov 2008 • Nov 2009 / Nov 2010: what has changed • Examples • Roadrunner (IBM) • Jaguar (Cray) • SGI Altix • BlueGene ACA H.Corporaal
How to build a Petaflop supercomputer? Some examples from 2008: • Opteron cluster (e.g. ~2X Ranger/TACC) • 32,000 quad-core Opterons (~130K cores) • Cray XT3/4 (e.g. Baker/ORNL sooner) • 32,000 quad-core Opterons (~130K cores) • IBM BlueGene/P (bigger sooner) • 80,000 BG/P PPC processors (320K cores) • IBM Cell-accelerated Roadrunner cluster • 10,000 Cells (80K Cell SPEs)
Supercomputer Ranking • Started in 1993 • Jack Dongarra, University of Tennessee • Based on the LINPACK benchmark • dense linear algebra: solving Ax = b by LU factorization • the LINPACK library itself has been superseded by LAPACK • based on BLAS (Basic Linear Algebra Subprograms) • exploits caches • Measures floating-point performance • Fortran code • see http://www.top500.org
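To make the LINPACK idea concrete, here is a minimal pure-Python sketch. It is not the actual benchmark code (which is tuned Fortran/C on top of BLAS); it solves a dense system Ax = b by LU factorization with partial pivoting and converts the run time into a rate using the conventional LINPACK operation count of 2/3 n^3 + 2 n^2 floating-point operations. The names `lu_solve` and `linpack_gflops` and the problem size are illustrative choices, not part of any benchmark API.

```python
import random
import time


def lu_solve(a, b):
    """Solve a x = b by LU factorization with partial pivoting."""
    n = len(a)
    a = [row[:] for row in a]  # work on copies; the caller's data is untouched
    b = b[:]
    for k in range(n):
        # Partial pivoting: bring the row with the largest |a[i][k]| to row k
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]  # multiplier (the L factor entry)
            a[i][k] = m
            for j in range(k + 1, n):
                a[i][j] -= m * a[k][j]  # update the trailing U part
            b[i] -= m * b[k]  # forward substitution fused into elimination
    # Back substitution with the upper-triangular U
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i] - sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / a[i][i]
    return x


def linpack_gflops(n, seconds):
    """LINPACK-style rate: (2/3 n^3 + 2 n^2) operations divided by run time."""
    return (2.0 / 3.0 * n**3 + 2.0 * n**2) / seconds / 1e9


if __name__ == "__main__":
    n = 120
    rng = random.Random(0)
    a = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1, 1) for _ in range(n)]
    t0 = time.perf_counter()
    x = lu_solve(a, b)
    dt = time.perf_counter() - t0
    resid = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))
    print(f"n={n}: {linpack_gflops(n, dt):.4f} GFlop/s, max residual {resid:.2e}")
```

Interpreted Python will report a tiny rate; the point is the operation count and the residual check, which is how a LINPACK result is validated before its rate is accepted.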
Single-Chip GPU vs. Fastest Supercomputers ref: http://www.llnl.gov/str/JanFeb05/Seager.html
Performance Ranking Nov. 2008
Performance Ranking 2008: we crossed the Petaflop boundary
Update November 2009
Update November 2010
Alternative ranking: Green500 • Most power-efficient supercomputers • 2008: best result = 536 MFlops/Watt => 1.87 nJ per floating-point operation • 2009: best result = 723 MFlops/Watt => 1.38 nJ per floating-point operation • Cell cluster, ranking 110 in TOP500 • 2010: best result = 1684 MFlops/Watt => 594 pJ per floating-point operation • IBM BlueGene/Q • See www.green500.org
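The conversion on the slide is an inversion: 1 W is 1 J/s, so MFlops/Watt is millions of operations per joule, and its reciprocal is the energy per operation. A small sketch (the helper name `joules_per_flop` is hypothetical) reproduces the three figures:

```python
def joules_per_flop(mflops_per_watt):
    """Energy per floating-point operation, in joules."""
    # 1 W = 1 J/s, so (flop/s) per W = flop per J; the reciprocal is J per flop.
    return 1.0 / (mflops_per_watt * 1e6)


GREEN500_BEST = {2008: 536, 2009: 723, 2010: 1684}  # MFlops/Watt, from the slide

for year, eff in GREEN500_BEST.items():
    print(f"{year}: {eff} MFlops/W -> {joules_per_flop(eff) * 1e12:.0f} pJ per flop")
```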
Nr. 1 (2008): Roadrunner • IBM cluster • 6480 nodes, each with • dual-core Opteron 1.8 GHz • 2 * PowerXCell 8i 3.2 GHz (12.8 GFlops) • InfiniBand connection fabric (16 Gbit/s per link) • fat-tree interconnect • 100 TByte DRAM memory • 216 I/O nodes • MPI programming • 2.35 MW power !! • Size: 296 racks, 5,500 ft² This is huge !!
Cell/B.E. – the architecture • 1 x PPE 64-bit PowerPC • L1: 32 KB I$ + 32 KB D$ • L2: 512 KB • 8 x SPE cores: • Local store: 256 KB • 128 x 128 bit vector registers • Hybrid memory model: • PPE: Rd/Wr • SPEs: Asynchronous DMA • EIB: 205 GB/s sustained aggregate bandwidth • Processor-to-memory bandwidth: 25.6 GB/s • Processor-to-processor: 20 GB/s in each direction
Roadrunner: TriBlade = 2 nodes. For more details: presentation slides of Ken Koch, March 2008
Nr. 2 (2008): Jaguar Cray XT5 QC • I guess 5 times • 7832 quad-core 2.1 GHz AMD Opteron • 62 TB memory (= 2 GB / core) • 600 TB file system • 250 TFlops • In total 150,152 cores • SeaStar2+ interconnect (from Cray) • Note 2009: quad-cores replaced by six-cores • now nr. 1 • 224,256 cores • 1.75 PetaFlops on LINPACK (Rmax) • paper: Bland A.S., Kendall R.A., Kothe D.B., Rogers J.H., Shipman G.M. Jaguar: The World's Most Powerful Computer
Jaguar
Nr. 3 (2008): SGI Altix ICE 8200 • 92 racks of Altix ICE 8200EX • with 3.0 GHz Intel Xeon quad-core processors • 47,104 cores • 8 racks of Altix ICE 8200 • with 2.66 GHz Intel quad-core • 4096 cores • 51 TB main memory • DDR InfiniBand
Nr. 4 (2008): BlueGene/L (IBM) • Based on ASIC with PowerPC 440, 700 MHz, each 2.8 GFlops • 106,496 nodes • 3D torus interconnect for point-to-point communication + collective network (Figures: 3D torus; complete system rack)
BlueGene/L ASIC node
BlueGene/L Node board • 16 cards with 2 ASICs each • 8 GB • 180 GFlops
2009: BlueGene/P • System: 256 racks, up to 1 PB, 3.56 PFlops • Rack: 32 node cards, 13.9 TF/s, 2-4 TB • Node card: 32 processor cards, 64-128 GB, 435 GFlops • Processor card: one 4-processor chip, 13.6 GFlops, 2-4 GB • ASIC: 13.6 GFlops, 8 MB EDRAM
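Each packaging level multiplies the level below it, so the system figures follow from the chip figures. A minimal sketch using only the numbers on this slide (the `LEVELS` table and the `scale_up` helper are illustrative names); memory uses the upper value of 4 GB per processor card:

```python
# Packaging hierarchy from the slide: (level name, units of the level below)
LEVELS = [("node card", 32), ("rack", 32), ("system", 256)]

CHIP_GFLOPS = 13.6   # one 4-core BG/P ASIC = one processor card
CARD_MEMORY_GB = 4   # upper value of the 2-4 GB per processor card


def scale_up(gflops, mem_gb, levels=LEVELS):
    """Return {level: (peak GFlops, memory GB)}, multiplying level by level."""
    result = {"processor card": (gflops, mem_gb)}
    for name, count in levels:
        gflops *= count
        mem_gb *= count
        result[name] = (gflops, mem_gb)
    return result


for level, (gf, gb) in scale_up(CHIP_GFLOPS, CARD_MEMORY_GB).items():
    print(f"{level:>14}: {gf:>12,.1f} GFlops  {gb:>10,} GB")
```

The system line comes out at about 3.57 million GFlops (3.56 PFlops) and 1,048,576 GB, i.e. the "up to 1 PB" on the slide.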
BlueGene/P ASIC
PPC450: Exploiting SIMD • Two FPUs • 2 x 32 64-bit registers • SIMD • Datapath width = 16 bytes • Feeds two FPUs with 8 bytes each every cycle • Two FP multiply-add operations per cycle • 3.4 GFLOP/s peak performance per core
BlueGene/P ASIC • 208M transistors • 850 MHz • 16 W • 90 nm
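The 3.4 GFLOP/s figure follows from the 850 MHz clock and the two FPUs, assuming each FPU completes one fused multiply-add (counted as 2 flops) per cycle. The same formula reproduces the 2.8 GFlops quoted for the 700 MHz PowerPC 440 of BlueGene/L, which is consistent with that core also having two FPUs. The helper name `peak_gflops` is hypothetical:

```python
def peak_gflops(clock_ghz, fpus=2, flops_per_fpu_cycle=2):
    """Peak rate of one core, assuming each FPU completes one fused
    multiply-add (counted as 2 flops) every cycle."""
    return clock_ghz * fpus * flops_per_fpu_cycle


bgp_core = peak_gflops(0.85)  # PowerPC 450 in BlueGene/P
bgp_chip = 4 * bgp_core       # 4 cores per BG/P ASIC
bgl_core = peak_gflops(0.70)  # PowerPC 440 in BlueGene/L
print(f"BG/P core {bgp_core:.1f}, BG/P chip {bgp_chip:.1f}, BG/L core {bgl_core:.1f} GFlops")
```

The 4-core chip value matches the 13.6 GFlops per ASIC on the BlueGene/P hierarchy slide.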
BlueGene/P node card
Next: BlueGene/Q • 10 PFlops in 2011-2012 • see www.research.ibm.com/bluegene
Can we match the human brain??? • Performance = 100 billion (10^11) neurons * 1000 (10^3) connections/neuron * 200 (2 * 10^2) calculations per second per connection = 2 * 10^16 calculations per second • Memory = 100 billion (10^11) neurons * 1000 (10^3) connections/neuron * 10 bytes (information about connection strength, address of output neuron, type of synapse) = 10^15 bytes = 1 PB = 1000 TB • How far off are we?
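The estimate is a chain of multiplications. The sketch below redoes it and answers "how far off?" against two numbers from earlier slides: Jaguar's 1.75 PetaFlops (2009) and Roadrunner's 100 TByte DRAM. Treating one synaptic "calculation" as one floating-point operation, and a TByte as 10^12 bytes, are assumptions made here for the comparison, not claims from the slide:

```python
# Brain estimate from the slide
NEURONS = 1e11
CONNECTIONS_PER_NEURON = 1e3
CALCS_PER_CONNECTION_PER_S = 2e2
BYTES_PER_CONNECTION = 10

brain_ops_per_s = NEURONS * CONNECTIONS_PER_NEURON * CALCS_PER_CONNECTION_PER_S
brain_bytes = NEURONS * CONNECTIONS_PER_NEURON * BYTES_PER_CONNECTION

# Machines from earlier slides; "1 calculation == 1 flop" is an assumption
JAGUAR_FLOPS_2009 = 1.75e15     # Jaguar, Nov 2009
ROADRUNNER_DRAM_BYTES = 100e12  # Roadrunner, 100 TByte (decimal TB assumed)

print(f"compute gap: {brain_ops_per_s / JAGUAR_FLOPS_2009:.1f}x")
print(f"memory gap:  {brain_bytes / ROADRUNNER_DRAM_BYTES:.1f}x")
```

Under these assumptions the 2009 machines are roughly one order of magnitude short on both compute and memory.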
Blue Brain research • Software replica of one column of the neocortex • neocortex: 85% of the brain's total mass, required for language, learning, memory and complex thought • the essential first step to simulating the whole brain • Next: include circuitry from other brain regions and, eventually, the whole brain
Latest news: factorization of RSA-768 • RSA is used to encipher text using a public and a private key • EPFL, CWI and others have broken RSA-768 • i.e., they factorized a 768-bit number into its 2 prime factors • Using 1700 AMD 2.2 GHz cores for 1 year => 15M hours of (single-core) compute time • The current RSA standard uses 1024 bits • still safe for some years
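The 15M figure is simply cores times wall-clock hours; this small sketch ignores leap years:

```python
CORES = 1700
HOURS_PER_YEAR = 365 * 24  # 8,760 h; leap years ignored

core_hours = CORES * HOURS_PER_YEAR
print(f"{core_hours:,} single-core hours")  # roughly 15 million
```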
RSA (Rivest, Shamir, Adleman) • choose 2 (large) primes p and q • n = p*q • choose e such that e and (p-1)(q-1) are coprime (i.e. do not share prime factors) • choose d such that d*e = 1 mod ((p-1)(q-1)) • public key = (n, e), private key = (n, d) • Encryption of message m: c = m^e mod n • Decryption of cipher c: m = c^d mod n • see Wikipedia for details and a worked example
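The key-generation steps above fit in a few lines of Python, whose three-argument `pow` does modular exponentiation and `pow(e, -1, m)` computes a modular inverse (Python 3.8+). This toy sketch uses the small textbook values p = 61, q = 53, e = 17 (chosen here for illustration; far too small to be secure) and follows the slide in using (p-1)(q-1) rather than the Carmichael function. The helper names `make_keys`, `encrypt` and `decrypt` are hypothetical:

```python
from math import gcd


def make_keys(p, q, e):
    """Toy RSA key generation following the steps on the slide."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to (p-1)(q-1)")
    d = pow(e, -1, phi)  # d*e = 1 mod (p-1)(q-1)
    return (n, e), (n, d)


def encrypt(m, public):
    n, e = public
    return pow(m, e, n)  # c = m^e mod n


def decrypt(c, private):
    n, d = private
    return pow(c, d, n)  # m = c^d mod n


public, private = make_keys(61, 53, 17)
c = encrypt(65, public)
print(public, private, c, decrypt(c, private))
```

Here n = 3233 and d = 2753; the message 65 encrypts to 2790 and decrypts back to 65.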
RSA factorization result • Factorization of RSA-768, the following 768-bit, 232-digit number from RSA's challenge list: • 123018668453011775513049495838496272077285356959533479219732242151726400507263657518745202199786469389956474942774063845925192557326303453731548268507917026122142913461670429214311602221240479274737794080665351419597459856902143413 = 33478071698956898786044169848212690817704794983713768568912431388982883793878002287614711652531743087737814467999489 * 36746043666799590428244633799627952632279158164343087642676032283815739666511279233373417143396810270092798736308917