The Future of Parallel Computing

The Future of Parallel Computing: Special Purpose Mesh Architectures (SA, ISA, PIPS, RM, OH). Heiko Schröder, 1998

Contents • Why meshes? • Application-specific parallel mesh architectures • - Systolic Arrays (SA) • - Instruction Systolic Arrays (ISA) • - PIPS • - Reconfigurable mesh (RM) • - Optical Highway (OH) [Figure: fine grain (1983) and coarse grain (1997).]

Physical limits • c = 300 000 km/sec • 10^12 OPS -- 0.3 mm/OP • 1000 PEs with 10^9 OPS each -- 30 cm/OP • massive parallelism • distributed memory
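The two distances on this slide follow from dividing the speed of light by the operation rate. The sketch below redoes that arithmetic; the function name `distance_per_op_m` is introduced here for illustration, and the rates 10^12 and 10^9 OPS are the values that reproduce the slide's 0.3 mm and 30 cm.

```python
C_M_PER_S = 3.0e8  # speed of light: 300 000 km/sec, as given on the slide


def distance_per_op_m(ops_per_second: float) -> float:
    """Distance (in metres) a signal can travel at light speed during one operation."""
    return C_M_PER_S / ops_per_second


# One processor delivering 10^12 operations per second.
single = distance_per_op_m(1e12)
# One of 1000 processing elements, each delivering 10^9 operations per second.
per_pe = distance_per_op_m(1e9)

print(f"single processor: {single * 1e3:.1f} mm per operation")
print(f"each of 1000 PEs: {per_pe * 1e2:.0f} cm per operation")
```

A single processor at that rate would have to fit within a fraction of a millimetre, while each of the 1000 slower processing elements has a budget of tens of centimetres, which is the case for massive parallelism with distributed memory.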

Processor power [Figure: performance in MIPS (logarithmic axis, 0.01 to 1000) versus year (1970 to 1995) for the 4004, 8080, 80286, 80386, 80486, Pentium and Pentium 2.]

Scaling • Factor 2: • 1/2 width • 1/2 height • 1/2 switching time • 0.5 µm to 0.25 µm: 8 x performance!
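The factor of 8 can be read as a product: halving width and height gives four times as many transistors per unit area, and halving the switching time doubles the speed of each. A minimal sketch of that calculation, assuming performance is simply density times switching speed (the function name `performance_gain` is illustrative):

```python
def performance_gain(shrink: float) -> float:
    """Gain when width, height and switching time all shrink by the factor `shrink`."""
    density_gain = shrink * shrink  # transistors per unit area (width x height)
    speed_gain = shrink             # switching operations per unit time
    return density_gain * speed_gain


print(performance_gain(2))  # 8
```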

CMOS transistors [Figure: size of the minimal transistor (logarithmic axis: 10 µm, 1 µm, 0.1 µm, 0.01 µm; marker at ca. 0.03 µm) versus year (1960 to 2030).]

Mesh/Torus [Figure: 2D mesh, annotated with its diameter and bisection width.]

Hypercube • diameter: log n • bisection width: n [Figure: hypercubes of dimension 0-D to 4-D; nodes labelled 0, 1 (1-D), 00 to 11 (2-D) and 000 to 111 (3-D).]

VLSI (Very Large Scale Integration) • simple cells • few types • regular architecture • short connections • mesh -- torus

Pin limitations [Figure: diameter 256 versus diameter 16; 16 pins, 16x12 pins, 16x16 pins.]

Bisection width [Figure: bisection width 256 (25 cm) versus bisection width 32K (32 m).]
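The numbers on the mesh, hypercube, pin-limitation and bisection-width slides are consistent with a network of n = 65 536 nodes, arranged either as a 256 x 256 array or as a 16-dimensional hypercube. That value of n is an assumption; the slides do not state it. The sketch below uses the standard textbook formulas for diameter and bisection width (function names are illustrative). Note that the exact bisection width of a hypercube is n/2, which agrees with the slide's "n" up to a constant factor.

```python
import math


def mesh_2d(n: int) -> dict:
    """Square 2D mesh with n nodes and no wrap-around links."""
    side = math.isqrt(n)
    assert side * side == n, "n must be a perfect square"
    return {"diameter": 2 * (side - 1), "bisection_width": side}


def torus_2d(n: int) -> dict:
    """Square 2D torus with n nodes (mesh plus wrap-around links)."""
    side = math.isqrt(n)
    assert side * side == n and side % 2 == 0, "n must be an even perfect square"
    return {"diameter": 2 * (side // 2), "bisection_width": 2 * side}


def hypercube(n: int) -> dict:
    """Hypercube with n = 2^d nodes."""
    d = n.bit_length() - 1
    assert 1 << d == n, "n must be a power of two"
    return {"diameter": d, "bisection_width": n // 2}


n = 65536  # assumed network size, chosen to match the slide figures
for name, metrics in (("mesh", mesh_2d(n)), ("torus", torus_2d(n)), ("hypercube", hypercube(n))):
    print(name, metrics)
```

With these formulas the hypercube has diameter 16 and bisection width 32K, the mesh has bisection width 256, and the torus has diameter 256. The hypercube's short diameter is paid for with far more wires across the bisection, which is the trade-off the two slides illustrate.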

Programming • SA -- Systolic Array • SIMD -- Single Instruction Multiple Data • ISA -- Instruction Systolic Array • MIMD -- Multiple Instruction Multiple Data

Parallel merge • initial situation: x1 x2 x3 x4 x5 x6 ... x7 ... x17 x18 and y1 y2 y3 y4 y5 y6 ... y7 ... y17 y18 • 1.) sort columns (odd-even transposition sort) • 2.) sort rows (odd-even transposition sort) • sorted!
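The two steps can be sketched sequentially as follows. The layout is an assumption taken from the systolic merge example later in the deck: the first sorted sequence fills the upper rows in ascending row-major order, and the second sequence is placed underneath in reverse order. The function names and the restriction that both sequences fill whole rows are choices made for this sketch, not something the slides specify.

```python
def odd_even_transposition_sort(values):
    """Sort with n phases of compare-exchange between neighbouring positions."""
    a = list(values)
    n = len(a)
    for phase in range(n):
        # Even phases compare pairs (0,1), (2,3), ...; odd phases compare (1,2), (3,4), ...
        for i in range(phase % 2, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a


def mesh_merge(xs, ys, width):
    """Merge two sorted sequences on a mesh that is `width` cells wide."""
    # Assumption of this sketch: each sequence fills whole rows.
    assert len(xs) % width == 0 and len(ys) % width == 0
    # Assumed layout: xs ascending on top, ys reversed underneath.
    cells = list(xs) + list(reversed(ys))
    grid = [cells[r:r + width] for r in range(0, len(cells), width)]
    # 1.) sort columns
    columns = [odd_even_transposition_sort(col) for col in zip(*grid)]
    grid = [list(row) for row in zip(*columns)]
    # 2.) sort rows
    grid = [odd_even_transposition_sort(row) for row in grid]
    # Read the mesh row by row.
    return [value for row in grid for value in row]


xs = [1, 3, 3, 4, 5, 5, 6, 7]
ys = [2, 3, 4, 4, 7, 8, 8, 9]
merged = mesh_merge(xs, ys, width=4)
print(merged)  # [1, 2, 3, 3, 3, 4, 4, 4, 5, 5, 6, 7, 7, 8, 8, 9]
```

On a mesh, all columns (and then all rows) are sorted at the same time, so the loops over columns and rows here stand in for work that the hardware does in parallel.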

0-1 principle • If a sorter sorts every sequence of 0s and 1s correctly, then it is a correct sorter for arbitrary inputs. • The sorter must be based on moving data. [Figure: regions of 0s and 1s initially, after the vertical sort, and after the horizontal sort.]
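The principle turns a correctness argument into a finite check: for n inputs, only the 2^n sequences of 0s and 1s need to be tried. The sketch below applies this to odd-even transposition sort, a sorter that only moves data by compare-exchange. The `phases` parameter and the helper name `sorts_all_zero_one_inputs` are introduced here for illustration.

```python
from itertools import product


def odd_even_transposition_sort(values, phases=None):
    """Run `phases` rounds of neighbour compare-exchange (default: one per element)."""
    a = list(values)
    n = len(a)
    if phases is None:
        phases = n
    for phase in range(phases):
        for i in range(phase % 2, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a


def sorts_all_zero_one_inputs(sorter, n):
    """Check the sorter on every 0-1 sequence of length n."""
    return all(sorter(list(bits)) == sorted(bits) for bits in product((0, 1), repeat=n))


# With n phases every 0-1 input of length 8 is sorted, so by the 0-1 principle
# the procedure sorts every input of length 8.
full = sorts_all_zero_one_inputs(odd_even_transposition_sort, 8)


# With too few phases some 0-1 input stays unsorted, which exposes the bug.
def too_few_phases(values):
    return odd_even_transposition_sort(values, phases=len(values) - 2)


short = sorts_all_zero_one_inputs(too_few_phases, 8)
print(full, short)  # True False
```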

MIMD-mesh (clocked) • Time: 2n [Figure: mesh cells labelled min and max.]

Systolic merge: successive states of the 4 x 4 mesh

initial situation:
1 3 3 4
5 5 6 7
9 8 8 7
4 4 3 2

step 1:
1 3 3 4
5 5 6 7
4 4 3 2
9 8 8 7

step 2:
1 3 3 4
4 4 3 2
5 5 6 7
9 8 8 7

step 3 (columns sorted):
1 3 3 2
4 4 3 4
5 5 6 7
9 8 8 7

step 4:
1 3 2 3
4 3 4 4
5 5 6 7
9 8 8 7

step 5:
1 2 3 3
3 4 4 4
5 5 6 7
9 8 8 7

step 6:
1 2 3 3
3 4 4 4
5 5 6 7
8 9 7 8

step 7:
1 2 3 3
3 4 4 4
5 5 6 7
8 7 9 8

step 8 (sorted!):
1 2 3 3
3 4 4 4
5 5 6 7
7 8 8 9

Characteristics of SAs • Extremely good cost-performance ratio • No flexibility -- long development time • Suitable for special signal processing tasks?

Systolic architectures I

Systolic architectures II

ISA merge • instructions: C := min{C, C_E} and C := max{C, C_W}

initial situation:
1 3 3 4
5 5 6 7
9 8 8 7
4 4 3 2

step 1:
1 3 3 4
5 5 6 7
4 8 8 7
9 4 3 2

step 2:
1 3 3 4
4 5 6 7
5 4 8 7
9 8 3 2

step 3:
1 3 3 4
4 4 6 7
5 5 3 7
9 8 8 2
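The three ISA snapshots differ from the systolic ones in that the columns do not all act at once: the changes sweep across the mesh as a diagonal wavefront. The sketch below is one reading of that behaviour, assuming each column runs the same odd-even transposition phases as its left neighbour but one time step later. It models compare-exchange between vertical neighbours directly rather than the min/max register instructions named on the slide, and the function name `skewed_column_sort` is illustrative.

```python
def skewed_column_sort(grid):
    """Column sort in which column c starts c time steps after column 0.

    Returns a snapshot of the mesh after every time step.
    """
    rows, cols = len(grid), len(grid[0])
    g = [row[:] for row in grid]
    snapshots = []
    for t in range(rows + cols - 1):
        for c in range(cols):
            phase = t - c  # assumed skew: one step of delay per column
            if 0 <= phase < rows:
                # Even phases pair rows (0,1), (2,3), ...; odd phases pair (1,2), ...
                for r in range(phase % 2, rows - 1, 2):
                    if g[r][c] > g[r + 1][c]:
                        g[r][c], g[r + 1][c] = g[r + 1][c], g[r][c]
        snapshots.append([row[:] for row in g])
    return snapshots


start = [
    [1, 3, 3, 4],
    [5, 5, 6, 7],
    [9, 8, 8, 7],
    [4, 4, 3, 2],
]
snapshots = skewed_column_sort(start)
for step, state in enumerate(snapshots[:3], start=1):
    print(f"step {step}:")
    for row in state:
        print(" ".join(str(v) for v in row))
```

Under this assumption the first three snapshots match the three states shown on the slides, and after rows + cols - 1 = 7 steps every column is sorted.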
