The Future of Parallel Computing: Special Purpose Mesh Architectures. Heiko Schröder, 1998.
Contents • Why meshes ??? • Application specific parallel mesh architectures: Systolic Arrays (SA), Instruction Systolic Arrays (ISA), PIPS, Reconfigurable mesh (RM), Optical Highway (OH) [Figure labels: Fine grain 1983, Coarse grain 1997]
Physical limits • c = 300 000 km/sec • 10^12 OPS -- 0.3 mm/OP • 1000 PEs with 10^9 OPS -- 30 cm/OP • massive parallelism • distributed memory
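The two distances on this slide follow from dividing the speed of light by the operation rate. The sketch below reproduces them. It assumes rates of 10^12 OPS for a single processor and 10^9 OPS for each of the 1000 PEs. These exponents are not legible in the slide text, but they are the values that yield 0.3 mm and 30 cm.

```python
# Speed of light as given on the slide: 300 000 km/sec, converted to m/s.
SPEED_OF_LIGHT_M_PER_S = 300_000 * 1_000


def distance_per_operation_m(ops_per_second):
    """Upper bound on how far a signal can travel during one operation (metres)."""
    return SPEED_OF_LIGHT_M_PER_S / ops_per_second


# Assumed rates (see lead-in): one processor at 10^12 OPS,
# or 1000 PEs at 10^9 OPS each for the same total throughput.
single_fast_processor_mm = distance_per_operation_m(1e12) * 1_000
one_of_1000_pes_cm = distance_per_operation_m(1e9) * 100

print(f"single processor: {single_fast_processor_mm:.1f} mm/OP")
print(f"one of 1000 PEs:  {one_of_1000_pes_cm:.0f} cm/OP")
```

A signal budget of 0.3 mm per operation is far smaller than a chip, whereas 30 cm per operation leaves room for a board-sized machine. This is the slide's argument for massive parallelism with distributed memory.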
Processor power [Figure: performance in MIPS (logarithmic axis, 0.01 to 1000) versus year (1970 to 1995) for the 4004, 8080, 80286, 80386, 80486, Pentium and Pentium 2]
Scaling • Factor 2: • 1/2 width • 1/2 height • 1/2 switching time • 8 x performance! (4 x as many transistors on the same area, each switching 2 x as fast) • Example: 0.5 µm to 0.25 µm
CMOS transistors [Figure: size of minimal transistor versus year (1960 to 2030); logarithmic axis from 10 µm down to 0.01 µm, with a mark at ca. 0.03 µm]
Mesh/Torus • 2D mesh • diameter • bisection width
Hypercube • diameter log n • bisection width n [Figure: hypercubes of dimension 0-D, 1-D, 2-D, 3-D and 4-D; nodes labelled with binary addresses 0, 1; 00, 01, 10, 11; 000 to 111]
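The hypercube figures can be checked by brute force on small cubes. The sketch below assumes the usual construction, in which two nodes are linked when their binary addresses differ in exactly one bit. The function name `hypercube_by_enumeration` is made up for this illustration. It counts the links crossing the cut that separates the two halves by their top address bit, which gives exactly n/2 links and agrees with the slide's "n" up to a constant factor.

```python
from itertools import combinations


def hypercube_by_enumeration(dim):
    """Return (nodes, diameter, links across the top-bit cut) for dim >= 1."""
    n = 2 ** dim

    # The shortest path between two nodes has length equal to the
    # Hamming distance of their addresses (flip one differing bit per hop).
    diameter = max(bin(a ^ b).count("1") for a, b in combinations(range(n), 2))

    # Count each link once and keep those whose end points lie in
    # different halves, where the halves are defined by the top address bit.
    top_bit = 1 << (dim - 1)
    cut_links = 0
    for a in range(n):
        for bit in range(dim):
            b = a ^ (1 << bit)
            if a < b and (a & top_bit) != (b & top_bit):
                cut_links += 1
    return n, diameter, cut_links


for dim in range(1, 5):
    print(hypercube_by_enumeration(dim))
```

For every dimension tried, the diameter equals log2 of the node count and the cut contains half as many links as there are nodes.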
VLSI Very Large Scale Integration • simple cells • few types • regular architecture • short connections mesh -- torus
Pin limitations • diameter 256 • diameter 16 • 16 pins • 16x12 pins • 16x16 pins
Bisection width • Bisection width 256 -- 25 cm • Bisection width 32K -- 32 m
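The numbers on the pin-limitation and bisection-width slides are consistent with comparing two machines of 65 536 nodes each: a 256 x 256 mesh or torus and a 16-dimensional hypercube. That pairing is inferred from the figures, not stated in the slides. The sketch below evaluates the standard closed-form expressions under that assumption. The function names are illustrative.

```python
def mesh_metrics(side):
    """Diameter and bisection width of a side x side mesh without wrap-around."""
    return {"nodes": side * side, "diameter": 2 * (side - 1), "bisection_width": side}


def torus_metrics(side):
    """Same quantities for a side x side torus (even side length assumed)."""
    return {"nodes": side * side, "diameter": 2 * (side // 2), "bisection_width": 2 * side}


def hypercube_metrics(dim):
    """Same quantities for a hypercube with 2**dim nodes."""
    return {"nodes": 2 ** dim, "diameter": dim, "bisection_width": 2 ** (dim - 1)}


for name, metrics in (
    ("mesh 256 x 256", mesh_metrics(256)),
    ("torus 256 x 256", torus_metrics(256)),
    ("hypercube 16-D", hypercube_metrics(16)),
):
    print(name, metrics)
```

Under these assumptions the torus has diameter 256 and the mesh has bisection width 256, while the hypercube has diameter 16 and bisection width 32K. The hypercube therefore needs 128 times as many wires across the cut, the same ratio as 25 cm to 32 m.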
Programming • SA --- Systolic Array • SIMD ---Single Instruction Multiple Data • ISA ---Instruction Systolic Array • MIMD ---Multiple Instruction Multiple Data
parallel merge • initial situation: two sequences x1, x2, ..., x18 and y1, y2, ..., y18 stored in the mesh • 1.) sort columns (odd-even-transposition sort) • 2.) sort rows (odd-even-transposition sort) • sorted !!!!
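The two steps can be sketched sequentially as follows. The sketch uses the 4 x 4 example values from the systolic merge slides and assumes the layout those values suggest: the first sorted sequence fills the upper rows in ascending order and the second is stored in reverse order in the lower rows. The function names are illustrative, and a real mesh would run all columns (then all rows) in parallel.

```python
def odd_even_transposition_sort(values):
    """Sort with n alternating phases of neighbour compare-exchange steps."""
    a = list(values)
    n = len(a)
    for phase in range(n):
        # Even phases compare pairs (0,1), (2,3), ...; odd phases (1,2), (3,4), ...
        for i in range(phase % 2, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a


def mesh_merge(mesh):
    """Step 1: sort every column. Step 2: sort every row."""
    sorted_columns = [odd_even_transposition_sort(col) for col in zip(*mesh)]
    rows_after_step_1 = [list(row) for row in zip(*sorted_columns)]
    return [odd_even_transposition_sort(row) for row in rows_after_step_1]


# Rows 1-2 hold 1 3 3 4 5 5 6 7 (ascending), rows 3-4 hold 9 8 8 7 4 4 3 2.
mesh = [
    [1, 3, 3, 4],
    [5, 5, 6, 7],
    [9, 8, 8, 7],
    [4, 4, 3, 2],
]
merged = mesh_merge(mesh)
for row in merged:
    print(*row)
```

Read row by row, the result is the merged sequence in ascending order.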
0-1 principle • The 0-1 principle states that if a sorter sorts all sequences of 0s and 1s properly, then it is a correct sorter for arbitrary inputs. • The sorter must be based on moving data. [Figure: regions of 0s and 1s initially, after vertical sort, and after horizontal sort]
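The principle turns a correctness proof into a finite check, which can be sketched as an exhaustive test over all 0-1 inputs of a given length. It applies to sorters built from a fixed sequence of compare-exchange steps, such as odd-even transposition sort. The helper names `sorts_all_zero_one_inputs` and `single_phase_only` are made up for this illustration.

```python
from itertools import product


def odd_even_transposition_sort(values):
    """n alternating phases of neighbour compare-exchange steps."""
    a = list(values)
    n = len(a)
    for phase in range(n):
        for i in range(phase % 2, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a


def single_phase_only(values):
    """Deliberately incomplete sorter: performs one even phase and stops."""
    a = list(values)
    for i in range(0, len(a) - 1, 2):
        if a[i] > a[i + 1]:
            a[i], a[i + 1] = a[i + 1], a[i]
    return a


def sorts_all_zero_one_inputs(sorter, length):
    """Check the sorter on all 2**length sequences of 0s and 1s."""
    return all(
        sorter(bits) == sorted(bits) for bits in product((0, 1), repeat=length)
    )


print(sorts_all_zero_one_inputs(odd_even_transposition_sort, 8))
print(sorts_all_zero_one_inputs(single_phase_only, 4))
```

The check covers 2^n inputs instead of all n! orderings. The incomplete sorter is rejected because it leaves an input such as 1 1 0 0 unchanged.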
MIMD-mesh (clocked) min max Time: 2n
systolic merge (4 x 4 mesh, one row per group)
1 3 3 4 | 5 5 6 7 | 9 8 8 7 | 4 4 3 2
1 3 3 4 | 5 5 6 7 | 4 4 3 2 | 9 8 8 7
1 3 3 4 | 4 4 3 2 | 5 5 6 7 | 9 8 8 7
1 3 3 2 | 4 4 3 4 | 5 5 6 7 | 9 8 8 7
1 3 2 3 | 4 3 4 4 | 5 5 6 7 | 9 8 8 7
1 2 3 3 | 3 4 4 4 | 5 5 6 7 | 9 8 8 7
1 2 3 3 | 3 4 4 4 | 5 5 6 7 | 8 9 7 8
1 2 3 3 | 3 4 4 4 | 5 5 6 7 | 8 7 9 8
1 2 3 3 | 3 4 4 4 | 5 5 6 7 | 7 8 8 9
• sorted !!!
Characteristics of SAs • Extremely high cost-performance • no flexibility -- long development time • Suitable for special signal processing tasks ???
ISA merge • C:=min{C, CE} • C:=max{C, CW} (4 x 4 mesh, one row per group)
1 3 3 4 | 5 5 6 7 | 9 8 8 7 | 4 4 3 2
1 3 3 4 | 5 5 6 7 | 4 8 8 7 | 9 4 3 2
1 3 3 4 | 4 5 6 7 | 5 4 8 7 | 9 8 3 2
1 3 3 4 | 4 4 6 7 | 5 5 3 7 | 9 8 8 2