Parallel Architectures: Topologies Heiko Schröder, 2003
Types of sequential processors (SISD)
[Figure: sequential processor organizations built from processor, cache, and memory blocks]
Von Neumann bottleneck
SIMD, MIMD, SPMD
[Figure: two organizations, each over an interconnection network]
• PEs driven by a global control unit (SIMD)
• PEs that each have their own control unit (MIMD)
Message passing / shared address space
[Figure: two organizations over an interconnection network: PE + control unit nodes each with a local memory M, and processors P connected to memory modules M; label: P/M]
• Various communication networks
• State of the art technology
• Important aspects of routing schemes
• Known results (theory)
• The internet
Desirable features of a network
1. Algorithmic
• Low diameter (1 for the complete graph)
• High bisection width (complete graph)
• The complete graph needs n(n-1)/2 edges and degree n-1
2. Technical
• Low degree (pin limitations, constant, modular; mesh)
• Short wires (mesh)
• Small area (mesh)
• Regular structure (mesh)
Connection networks I
1-D mesh (linear array)
• Diameter: n-1
• Bisection width: 1
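The two measures used throughout these slides can be checked by brute force on small graphs. The sketch below is only an illustration: the helper names (diameter, bisection_width, linear_array, complete_graph) are made up here, the graph is assumed to be connected with an even number of nodes, and the bisection search tries every equal split, so it is only usable for very small n.

```python
from collections import deque
from itertools import combinations

def diameter(adj):
    """Longest shortest-path distance over all node pairs (BFS from every node)."""
    best = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best

def bisection_width(adj):
    """Fewest edges cut by any split into two equal halves (exhaustive search)."""
    nodes = sorted(adj)
    half = len(nodes) // 2
    best = None
    for part in combinations(nodes, half):
        side = set(part)
        cut = sum(1 for u in side for v in adj[u] if v not in side)
        best = cut if best is None else min(best, cut)
    return best

def linear_array(n):
    """1-D mesh: node i is connected to i-1 and i+1."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

def complete_graph(n):
    """Every node is connected to every other node."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}

if __name__ == "__main__":
    for name, g in (("linear array", linear_array(8)), ("complete graph", complete_graph(8))):
        print(name, "diameter:", diameter(g), "bisection width:", bisection_width(g))
```

For n = 8 the linear array gives diameter n-1 and bisection width 1, while the complete graph reaches diameter 1 at the price of n(n-1)/2 edges.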
Tree
• Diameter: 2 log n
• Bisection width: 1
H-tree
• Area: O(n)
• Longest wire: O(sqrt(n))
• Clock distribution
2-D Mesh (sqrt(n) x sqrt(n), n nodes)
• Diameter: 2(sqrt(n) - 1)
• Bisection width: sqrt(n)
Torus
[Figure: 8-node ring laid out in natural order 1 2 3 4 5 6 7 8 and in interleaved order 1 8 2 7 3 6 4 5]
• Reduced diameter
• Increased bisection width
• All nodes equivalent
• Long wires?
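The interleaved order 1 8 2 7 3 6 4 5 in the figure suggests one answer to the "Long wires?" question: place the ring nodes alternately from both ends so that the wrap-around edge no longer spans the whole row. A small sketch of that idea for a single ring follows; the function names are invented for this example, and wire length is simply measured as the distance between slot positions.

```python
def folded_order(n):
    """Interleave the ring from both ends: 1, n, 2, n-1, ..."""
    order = []
    lo, hi = 1, n
    while lo <= hi:
        order.append(lo)
        if lo != hi:
            order.append(hi)
        lo += 1
        hi -= 1
    return order

def longest_wire(order):
    """Longest distance (in slots) between ring neighbours k and k+1, with n wrapping to 1."""
    n = len(order)
    pos = {node: i for i, node in enumerate(order)}
    return max(abs(pos[k] - pos[k % n + 1]) for k in range(1, n + 1))

if __name__ == "__main__":
    natural = list(range(1, 9))
    folded = folded_order(8)
    print(natural, "longest wire:", longest_wire(natural))
    print(folded, "longest wire:", longest_wire(folded))
```

In the natural order the wrap-around wire has length n-1; in the folded order no wire is longer than 2 slots.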
3-D Mesh (cubic, n nodes)
• Diameter: 3(n^(1/3) - 1)
• Bisection width: n^(2/3)
Hypercube
[Figure: hypercubes of dimension 0 to 4; node labels 0, 1 (1-D); 00, 01, 10, 11 (2-D); 000 to 111 (3-D)]
• Diameter: log n
• Bisection width: n/2
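With binary node labels, both hypercube figures fall out of simple bit operations: neighbours differ in exactly one bit, a route fixes the differing bits one at a time, and cutting along one dimension separates the two halves. The sketch below is illustrative only; the function names and the lowest-bit-first routing order are choices made for this example.

```python
def neighbors(u, k):
    """Neighbours of node u in a k-D hypercube: flip one bit."""
    return [u ^ (1 << d) for d in range(k)]

def route(u, v, k):
    """Bit-fixing route from u to v, correcting differing bits from dimension 0 upward."""
    path = [u]
    for d in range(k):
        if ((u ^ v) >> d) & 1:
            u ^= 1 << d
            path.append(u)
    return path

def cut_edges(k):
    """Number of edges crossing the split on the highest dimension."""
    top = 1 << (k - 1)
    return sum(1 for u in range(1 << k) if not u & top)

if __name__ == "__main__":
    k = 3
    print([format(x, "03b") for x in route(0b000, 0b111, k)])
    longest = max(len(route(u, v, k)) - 1 for u in range(1 << k) for v in range(1 << k))
    print("longest route:", longest, "edges across the cut:", cut_edges(k))
```

The longest route has log n hops and the cut contains n/2 edges, matching the values on the slide.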
Cube Connected Cycles
• # nodes: n = k 2^k (each node of a k-D hypercube is replaced by a cycle of k nodes)
• Diameter: Θ(log n)
• Bisection width: Θ(n / log n)
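A cube-connected-cycles network can be built directly from its definition, which makes the node count and the constant degree easy to check. This is a sketch under the usual definition (node (w, i) sits at position i of the cycle that replaces hypercube node w); the function names are invented here and k is assumed to be at least 3 so that the two cycle neighbours are distinct.

```python
from collections import deque

def ccc(k):
    """Adjacency lists of the cube-connected cycles of dimension k (k >= 3)."""
    if k < 3:
        raise ValueError("k must be at least 3 so that every node has degree 3")
    adj = {}
    for w in range(1 << k):
        for i in range(k):
            adj[(w, i)] = [
                (w, (i + 1) % k),    # next node on the cycle
                (w, (i - 1) % k),    # previous node on the cycle
                (w ^ (1 << i), i),   # hypercube edge along dimension i
            ]
    return adj

def diameter(adj):
    """Longest shortest-path distance, by BFS from every node."""
    best = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best

if __name__ == "__main__":
    g = ccc(3)
    print("nodes:", len(g), "degree:", {len(v) for v in g.values()}, "diameter:", diameter(g))
```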
8-node shuffle-exchange graph
[Figure: nodes 000 to 111]
• Exchange: flip the least significant bit (lsb)
• Shuffle: rotate the label (left or right)
• Degree: 3
• Diameter: 2 log n - 1 (at most log n - 1 shuffles + log n exchanges)
• Bisection width: Θ(n / log n)
16-node shuffle-exchange graph
[Figure: nodes 0000 to 1111]
• Exchange (ex): flip the lsb
• Shuffle: rotate the label (left or right); ls = left shuffle
• Routing from u1u2…uk to v1v2…vk: u1u2…uk-1uk → (ex) u1u2…uk-1v1 → (ls+ex) u2…uk-1v1v2 → … → (ls+ex) v1v2…vk
• Degree: 3
• Diameter: 2 log n - 1 (at most log n - 1 shuffles + log n exchanges)
• Bisection width: Θ(n / log n)
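The routing scheme on this slide can be sketched in a few lines: set the lsb to the next target bit with an exchange if needed, then shuffle left to bring the next position into the lsb. The code below is one possible reading of the slide, not a definitive implementation; the names shuffle_left and route are invented here, and shuffles that would stay on the same node (labels 00…0 and 11…1) are not counted as hops.

```python
def shuffle_left(x, k):
    """Rotate the k-bit label x left by one position."""
    mask = (1 << k) - 1
    return ((x << 1) & mask) | (x >> (k - 1))

def route(u, v, k):
    """Path from u to v using at most k exchanges and k-1 left shuffles."""
    path = [u]
    cur = u
    for i in range(k):
        want = (v >> (k - 1 - i)) & 1   # i-th bit of v, most significant first
        if (cur & 1) != want:
            cur ^= 1                     # exchange: flip the lsb
            path.append(cur)
        if i < k - 1:
            nxt = shuffle_left(cur, k)   # shuffle: rotate left
            if nxt != cur:
                path.append(nxt)
            cur = nxt
    return path

if __name__ == "__main__":
    print([format(x, "03b") for x in route(0b000, 0b111, 3)])
```

Routing 000 to 111 needs every exchange and every shuffle, which gives the 2 log n - 1 hops quoted as the diameter.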
3-dimensional de Bruijn graph
[Figure: nodes 000 to 111 with edges labelled 0 and 1]
• Edge 0: u1u2…uk-1uk → u2u3…uk-1uk0
• Edge 1: u1u2…uk-1uk → u2u3…uk-1uk1
• In-degree = out-degree = 2
• Diameter: log n
• Bisection width: Θ(n / log n)
• Each Eulerian tour = De Bruijn sequence = contains each possible sub-string of length 4 exactly once
• De Bruijn sequence: 1111001011010000
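The link between Eulerian tours and De Bruijn sequences can be tried out directly. The sketch below walks an Eulerian tour of the k-dimensional de Bruijn graph with Hierholzer's algorithm (a standard technique chosen for this example, not something the slide prescribes) and checks the sequence from the slide. Function names are invented here, and the sequence is read cyclically.

```python
def de_bruijn_sequence(k):
    """Edge labels along an Eulerian tour of the k-dimensional de Bruijn graph."""
    n = 1 << k
    mask = n - 1
    next_label = [0] * n          # next unused outgoing edge (0, then 1) per node
    stack, tour = [0], []
    while stack:
        u = stack[-1]
        if next_label[u] < 2:
            b = next_label[u]
            next_label[u] += 1
            stack.append(((u << 1) & mask) | b)   # follow edge b: shift left, append b
        else:
            tour.append(stack.pop())
    tour.reverse()                # closed walk that starts and ends at node 0
    return "".join(str(v & 1) for v in tour[1:])

def is_de_bruijn(seq, m):
    """True if the cyclic string seq contains every m-bit string exactly once."""
    n = len(seq)
    padded = seq + seq[: m - 1]
    windows = {padded[i:i + m] for i in range(n)}
    return n == 2 ** m and len(windows) == n

if __name__ == "__main__":
    print(is_de_bruijn("1111001011010000", 4))
    print(is_de_bruijn(de_bruijn_sequence(3), 4))
```

The 3-dimensional graph has 16 edges, so each tour yields a cyclic sequence of length 16 covering all sub-strings of length 4.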
Butterfly network
• FFT
• Routing
• Sorting
• Unique path
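The "unique path" property can be made concrete with a small model. The sketch assumes a butterfly with levels 0 to k and 2^k rows, where the edges leaving level l either keep the row or flip bit l of the row; which bit belongs to which level is a labelling convention picked for this example. Function names are invented here.

```python
def butterfly_path(u, v, k):
    """The path from input row u (level 0) to output row v (level k)."""
    path = [(0, u)]
    row = u
    for level in range(k):
        if ((row ^ v) >> level) & 1:
            row ^= 1 << level           # take the cross edge
        path.append((level + 1, row))   # otherwise the straight edge
    return path

def count_paths(u, v, k, level=0):
    """Number of level-0-to-level-k paths from row u to row v (exhaustive)."""
    if level == k:
        return 1 if u == v else 0
    return (count_paths(u, v, k, level + 1)
            + count_paths(u ^ (1 << level), v, k, level + 1))

if __name__ == "__main__":
    print(butterfly_path(0b000, 0b101, 3))
    print({count_paths(u, v, 3) for u in range(8) for v in range(8)})
```

Every input/output pair has exactly one path, because the choice at level l is forced by bit l of the destination.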
Mesh of trees
• Diameter: Θ(log n)
• Bisection width: Θ(sqrt(n))
The Power of Hypercubes
[Figure: 4-D hypercube]
• Hamiltonian cycle
• Gray codes
• k-D meshes (tori), N nodes
• Simulates mesh of trees
• Simulates hypercubic networks
• Contains complete binary tree, almost
• Normal algorithms
Hamiltonian Cycle
A hypercube contains a Hamiltonian cycle (proof by induction). Each Hamiltonian cycle corresponds to a Gray code (only one bit is changed per link).
Gray code (built by reflection)
1 bit: 0 1
2 bits: 00 01 11 10
3 bits: 000 001 011 010 110 111 101 100
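The reflection step is short enough to write out: take the list for k-1 bits, prefix it with 0, then append the reversed list prefixed with 1. A minimal sketch, with function names invented for this example:

```python
def gray_code(k):
    """Reflected Gray code on k bits, as a list of bit strings."""
    codes = [""]
    for _ in range(k):
        codes = ["0" + c for c in codes] + ["1" + c for c in reversed(codes)]
    return codes

def is_hamiltonian_cycle(codes):
    """True if consecutive codes (cyclically) differ in exactly one bit and none repeats."""
    n = len(codes)
    one_bit = all(
        sum(a != b for a, b in zip(codes[i], codes[(i + 1) % n])) == 1
        for i in range(n)
    )
    return one_bit and len(set(codes)) == n

if __name__ == "__main__":
    print(gray_code(3))
    print(is_hamiltonian_cycle(gray_code(4)))
```

Because the last code also differs from the first in one bit, the list closes into a Hamiltonian cycle of the hypercube.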
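Hypercube contains meshes/tori
[Figure: 4 x 4 mesh with wrap-around; rows and columns are labelled in Gray-code order 0, 1, 3, 2]
00 01 03 02
10 11 13 12
30 31 33 32
20 21 23 22
Theorem: Any n1 x n2 x … x nk mesh (with or without wrap-arounds) is a sub-graph of an n-D hypercube if every ni is a power of 2, ni = 2^mi, and n = m1 + … + mk.
Proof: see Leighton (each sub-cube has a Hamiltonian cycle).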
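The 4 x 4 labelling in the figure can be reproduced by Gray-coding each coordinate separately and concatenating the results. The sketch below does this for a two-dimensional torus whose side lengths are powers of 2 and checks that every torus edge, including the wrap-around edges, lands on a hypercube edge. The function names and the parameters m1, m2 (bits per coordinate) are assumptions made for this example.

```python
def gray(i):
    """i-th word of the binary reflected Gray code."""
    return i ^ (i >> 1)

def embed(i, j, m2):
    """Hypercube label of torus node (i, j); the column coordinate uses the low m2 bits."""
    return (gray(i) << m2) | gray(j)

def is_hypercube_edge(a, b):
    """True if labels a and b differ in exactly one bit."""
    x = a ^ b
    return x != 0 and x & (x - 1) == 0

def torus_is_subgraph(m1, m2):
    """Check every edge of the 2^m1 x 2^m2 torus against the (m1+m2)-D hypercube."""
    rows, cols = 1 << m1, 1 << m2
    for i in range(rows):
        for j in range(cols):
            here = embed(i, j, m2)
            if not is_hypercube_edge(here, embed((i + 1) % rows, j, m2)):
                return False
            if not is_hypercube_edge(here, embed(i, (j + 1) % cols, m2)):
                return False
    return True

if __name__ == "__main__":
    for i in range(4):
        print(" ".join(f"{gray(i)}{gray(j)}" for j in range(4)))
    print(torus_is_subgraph(2, 2))
```

Printing the two Gray-coded coordinates side by side gives the same rows as the figure (00 01 03 02, 10 11 13 12, and so on).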
Hypercube contains double-rooted trees
[Figure: double-rooted tree in the hypercube; label: double roots (different dimension)]
The hypercube (HC) can implement all tree algorithms and also all mesh-of-trees algorithms (possibly with minor delay).
Normal algorithms
• A hypercube algorithm is said to be normal if only one dimension of hypercube edges is used at any step and consecutive dimensions are used in consecutive steps.
• Most hypercube algorithms are normal.
• Normal algorithms can be embedded efficiently on hypercubic networks.
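A simple example of a normal algorithm is an all-to-all sum: in step d every node exchanges its partial sum with its neighbour along dimension d, so exactly one dimension is used per step and the dimensions appear in consecutive order. The sketch simulates this sequentially; the function name is invented here, and the choice of summation as the example is an assumption, not something taken from the slides.

```python
def hypercube_sum(values):
    """Simulate an all-to-all sum on a hypercube; returns (per-node results, dimensions used)."""
    n = len(values)
    k = n.bit_length() - 1
    if n == 0 or (1 << k) != n:
        raise ValueError("number of nodes must be a power of 2")
    vals = list(values)
    dims_used = []
    for d in range(k):
        # Step d: every node u combines with its neighbour across dimension d only.
        vals = [vals[u] + vals[u ^ (1 << d)] for u in range(n)]
        dims_used.append(d)
    return vals, dims_used

if __name__ == "__main__":
    totals, dims = hypercube_sum(range(8))
    print(totals, dims)
```

After log n steps every node holds the full sum, and the recorded dimensions 0, 1, 2, ... show the access pattern that makes the algorithm normal.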
Josephus graph
[Figure: 32 nodes, numbered 0 to 31, arranged in a ring; edges labelled 1 and 2]
• Every even node k is connected to k + 2^i - 3
• Diameter: about (log n) / 2
Star graph
[Figure: 24-node star graph on the permutations of 1234]
• Set of nodes: the k! permutations of k elements, each node of degree k-1
• Set of edges: exchange of the first element with one other
• Small degree, diameter about 2 log n
• Open problems: e.g. are there (k-1)/2 edge-disjoint Hamiltonian cycles?
Number of nodes versus degree (Star/HC), k = 4 to 9:
Star (k! nodes): 24, 120, 720, 5040, 40320, 362880
HC (2^k nodes): 16, 32, 64, 128, 256, 512
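The definition translates directly into code: nodes are permutations, and each neighbour is obtained by swapping the first element with one of the other k-1 positions. The sketch below builds the graph and reproduces the node counts compared on the slide; the function names are invented for this example.

```python
from itertools import permutations
from math import factorial

def star_graph(k):
    """Adjacency lists of the star graph on the permutations of 1..k."""
    adj = {}
    for p in permutations(range(1, k + 1)):
        nbrs = []
        for i in range(1, k):
            q = list(p)
            q[0], q[i] = q[i], q[0]   # exchange the first element with position i
            nbrs.append(tuple(q))
        adj[p] = nbrs
    return adj

def node_counts(k):
    """(star graph nodes, hypercube nodes) for parameter k."""
    return factorial(k), 2 ** k

if __name__ == "__main__":
    g = star_graph(4)
    print("nodes:", len(g), "degree:", {len(v) for v in g.values()})
    print("neighbours of 1234:", g[(1, 2, 3, 4)])
    for k in range(4, 10):
        print(k, node_counts(k))
```

The node count k! grows much faster than 2^k while the degree stays at k-1, which is the point of the comparison.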
Pin limitations
[Figure: labels recovered from the figure: 4-D, 12, 192, 256, 16, 16, 1]
Wiring limitations
[Figure: labels recovered from the figure: 4-D, 12, 1]
• 2^16 nodes
• Bisection width: 256, 32 K
• 25 cm, 32 m
Improve the topology? The internet
Against parallelism
• cost(large) < cost(2 small)
• All the FORTRAN / C software
• Let’s stick to pipelining
• Let’s wait for faster machines
• Amdahl’s Law
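The slide only names Amdahl's Law. In its usual form, if a fraction f of the work can be parallelised perfectly over p processors, the speedup is 1 / ((1 - f) + f / p). A small sketch of that formula follows; the function and parameter names are chosen for this example.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speedup predicted by Amdahl's Law for a given parallel fraction and processor count."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)

if __name__ == "__main__":
    for p in (2, 16, 1024, 10 ** 6):
        print(p, round(amdahl_speedup(0.9, p), 2))
```

With 90% of the work parallel, the speedup stays below 10 however many processors are added, which is why the law is listed as an argument against parallelism.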