Hardware programming
Hardware programming languages August 20, 2014
• Hardware and programming models
• SIMD
• MIMD
• Dataflow
• Transformation grammars
SIMD hardware
SIMD array processor
SIMD instructions
Drawing a rectangle using one proc/pixel
• Each array processor has an id (IDx, IDy)
• Drawing (x1, y1) (x2, y2)
• Use R0, R1 as condition registers
  • SUB R0 ← IDx - x1
  • SUB R1 ← x2 - IDx
  • OR R0 ← R0 | R1
  • SHR R0 ← R0 >> 31
  • LD if !R0 then pixel ← 1
  • LD if R0 then pixel ← 0
• Shown for x only; the y test uses IDy with y1, y2 and is ORed into R0 before the shift
• Similar operations for triangles, circles, etc.
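The sign-bit range test can be simulated on an ordinary CPU. The following is a minimal Python sketch, assuming 32-bit two's-complement registers and a logical right shift. The names `outside_flag` and `draw_rect` are illustrative and do not come from the slides. It ORs the two differences so that the sign bit is set whenever either bound is violated, and applies the same test to the y coordinate.

```python
MASK32 = 0xFFFFFFFF  # assumption: registers are 32 bits wide


def outside_flag(ident, low, high):
    """Return 1 if ident lies outside [low, high], else 0, without branching."""
    r0 = (ident - low) & MASK32   # sign bit set if ident < low
    r1 = (high - ident) & MASK32  # sign bit set if ident > high
    r0 = r0 | r1                  # sign bit set if either bound is violated
    return r0 >> 31               # logical shift leaves only the sign bit


def draw_rect(width, height, x1, y1, x2, y2):
    """Simulate one processor per pixel; each computes its own pixel value."""
    image = []
    for id_y in range(height):
        row = []
        for id_x in range(width):
            outside = outside_flag(id_x, x1, x2) | outside_flag(id_y, y1, y2)
            row.append(1 - outside)  # 1 inside the rectangle, 0 outside
        image.append(row)
    return image


for row in draw_rect(4, 3, 1, 0, 2, 1):
    print(row)
```

On real SIMD hardware the two nested loops disappear: every processor runs the same instruction sequence on its own (IDx, IDy) at the same time.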
High-pass filter
• Subtract adjacent pixels
  • LD R0 ← pixel
  • LD Nleft ← R0
  • SUB R0 ← R0 - Nright
  • LD pixel ← R0
• Other kernel applications are similar
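A hedged Python sketch of the same filter on a single row of pixels follows. It assumes that each processor sends its value to its left neighbor, that the value therefore arrives from the right, and that the row wraps around at the edges. The function name `high_pass` is illustrative.

```python
def high_pass(pixels):
    """Each element subtracts the value held by its right neighbor."""
    n = len(pixels)
    r0 = list(pixels)                                  # LD R0 <- pixel
    # Every processor sends R0 to its left neighbor, so processor i
    # receives the value of processor i + 1 (wraparound is an assumption).
    from_right = [r0[(i + 1) % n] for i in range(n)]
    return [r0[i] - from_right[i] for i in range(n)]   # SUB, then LD pixel <- R0


print(high_pass([1, 1, 5, 5]))
```

Flat regions map to zero and edges map to non-zero values, which is the high-pass behavior the slide describes.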
Matrix multiplication
• Assume the matrix elements are in R1, R2 and the result should be in R0
• O(N) time steps for square matrices of dimension NxN, using one processor per element
• Wraparound neighbors
  • LD R0 ← #0
  • LD Nleft ← R1
  • LD Ntop ← R2
  • --cpu only-- for(i = 0; i != N; i++) {
  •   LD R3 ← Nbot
  •   LD Ntop ← R3
  •   LD R4 ← Nright
  •   LD Nleft ← R4
  •   ADD R0 += R3 * R4
  • }
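The shift-and-accumulate loop resembles Cannon's algorithm, a standard scheme for multiplying matrices on a wraparound processor grid. The Python sketch below simulates Cannon's algorithm, not necessarily the exact sequence on the slide: it adds an initial skew of the operands, which the slide does not show. The name `cannon_matmul` is illustrative.

```python
def cannon_matmul(a, b):
    """Simulate an n x n wraparound grid; cell (i, j) accumulates c[i][j]."""
    n = len(a)
    # Initial skew (assumption, not on the slide): row i of A is rotated
    # left by i positions and column j of B is rotated up by j positions.
    a_loc = [[a[i][(j + i) % n] for j in range(n)] for i in range(n)]
    b_loc = [[b[(i + j) % n][j] for j in range(n)] for i in range(n)]
    c = [[0] * n for _ in range(n)]
    for _ in range(n):  # the CPU-side loop: N steps
        # Every cell multiplies its two local operands in the same step.
        for i in range(n):
            for j in range(n):
                c[i][j] += a_loc[i][j] * b_loc[i][j]
        # A values move one cell left, B values one cell up (wraparound).
        a_loc = [[a_loc[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        b_loc = [[b_loc[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return c


print(cannon_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

The inner `i`, `j` loops exist only because this is a serial simulation; on the array each cell does one multiply-add per step, which gives the O(N) step count.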
SIMD source languages
• Major changes
  • Some data is SIMD array data stored on the array processors
  • The CPU alternates between CPU operations and SIMD operations
  • The array processors do not branch, so both branches of conditionals are always executed
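The "both branches are always executed" rule can be illustrated with a small Python sketch. It assumes a per-lane condition flag that selects which result is kept; the helper name `simd_if` is hypothetical.

```python
def simd_if(cond, then_fn, else_fn, data):
    """Evaluate both branches on every lane, then select by the condition."""
    then_vals = [then_fn(x) for x in data]  # every lane runs the 'then' branch
    else_vals = [else_fn(x) for x in data]  # every lane also runs the 'else' branch
    # The condition flag only decides which of the two results each lane keeps.
    return [t if c else e for c, t, e in zip(cond, then_vals, else_vals)]


print(simd_if([True, False, True], lambda x: x + 1, lambda x: x * 10, [1, 2, 3]))
```

The cost of a conditional is therefore the sum of both branches, not the cost of the branch taken.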
SIMD language examples
• Matrix multiplication
    void mmult(simd int[N][N] m1, m2, m3) {
      for(i : [0..N])
        m1[i][i] = m2[-][i] * m3[i][-];
    }
• Drawing
    void rect(int x1, int y1, int x2, int y2) {
      for(i : [0..N])
        pixel[i][i] = (x1 <= i <= x2) && (y1 <= i <= y2);
    }
Programming language history
• Late 70s – early 90s
• Tremendous effort to parallelize FORTRAN
       DO 10 I=1, 100
       DO 10 J=1, 100
    10 M(I, J) += A(I, J) * B(J, I)
• This was really hard, and ultimately ineffective
• Much code was rewritten in C
• SIMD programs require explicit arrays
• But most current code is written in C…
• Define a new language, or parallelize C?
Vector processors
Vector instructions
• Vector arrays v[0..N]
  • ADD for i in [0..N] do v1[i] ← v2[i] + v3[i]
  • MUL for i in [0..N] do v1[i] ← v2[i] * v3[i]
  • ACC for i in [0..N] do v1 += v2[i] * v3[i]
• Conditional operations
  • ADDcc for i in [0..N] do if v4[i] then v1[i] ← v2[i] + v3[i]
• Scatter (some ~1985 Toshiba machines)
  • ADD for i in [0..N] do v1[v4[i]] ← v2[i] + v3[i]
• Gather (same)
  • ADD for i in [0..N] do v1[i] ← v2[v4[i]] + v3[i]
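The vector instructions listed here can be modeled with plain Python lists. This is a minimal sketch of their semantics only; the function names are illustrative, and a real vector unit would process the elements in hardware lanes instead of a loop.

```python
def vadd(v2, v3):
    """ADD: v1[i] <- v2[i] + v3[i]"""
    return [a + b for a, b in zip(v2, v3)]


def vacc(v2, v3):
    """ACC: scalar v1 += v2[i] * v3[i], starting from zero."""
    return sum(a * b for a, b in zip(v2, v3))


def vadd_cc(v1, v2, v3, v4):
    """ADDcc: v1[i] <- v2[i] + v3[i] only where the mask v4[i] is set."""
    return [a + b if m else old for old, a, b, m in zip(v1, v2, v3, v4)]


def vadd_scatter(v1, v2, v3, v4):
    """Scatter: v1[v4[i]] <- v2[i] + v3[i]"""
    out = list(v1)
    for i, index in enumerate(v4):
        out[index] = v2[i] + v3[i]
    return out


def vadd_gather(v2, v3, v4):
    """Gather: v1[i] <- v2[v4[i]] + v3[i]"""
    return [v2[index] + v3[i] for i, index in enumerate(v4)]


print(vadd_scatter([0, 0, 0], [1, 2, 3], [10, 20, 30], [2, 0, 1]))
print(vadd_gather([1, 2, 3], [10, 20, 30], [2, 0, 1]))
```

Scatter applies the index vector on the destination side and gather applies it on the source side, which is the only difference between the two.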
Vector programming languages
• As before, the basic data type is a vector of numbers (floats or ints)
• Matrix multiply in a serial language
• Inner product method O(n³)
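A serial inner-product matrix multiply might look like the following Python sketch. It assumes the familiar i, j, k loop order, where each result element is the dot product of a row of the first matrix and a column of the second. The name `matmul_inner` is illustrative.

```python
def matmul_inner(a, b):
    """Inner product method: c[i][j] is the dot product of row i and column j."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):  # innermost loop is a scalar reduction
                c[i][j] += a[i][k] * b[k][j]
    return c


print(matmul_inner([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

For square matrices the three nested loops each run n times, which is where the O(n³) cost comes from.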
MMult with middle products
• Exchange the loop nesting
Vectorizing the middle product
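One way the middle product form could be vectorized is sketched below in Python. The sketch assumes that "middle product" means moving the k loop to the middle (i, k, j order), so that the innermost loop becomes a whole-row update of the form y ← αx + y. The helper `axpy` and the name `matmul_middle` are illustrative.

```python
def axpy(alpha, x, y):
    """Vector operation: return alpha * x + y, element by element."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def matmul_middle(a, b):
    """Middle product form: the innermost j loop is replaced by one vector op."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            # Row i of C gains a[i][k] times row k of B in a single vector step.
            c[i] = axpy(a[i][k], b[k], c[i])
    return c


print(matmul_middle([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Compared with the inner product order, the innermost work is an element-wise vector update with no reduction, which maps directly onto a vector ADD/MUL unit.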
MMult with outer products
• Move the “k” loop to the outside
Vectorizing the outer product
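With the k loop outermost, each iteration adds the outer product of one column of the first matrix and one row of the second. The Python sketch below is a hedged illustration of that idea, and the name `matmul_outer` is an assumption, not taken from the slides.

```python
def matmul_outer(a, b):
    """Outer product form: C accumulates one rank-1 update per value of k."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for k in range(m):
        column = [a[i][k] for i in range(n)]  # column k of A
        row = b[k]                            # row k of B
        # Whole-matrix update: C += column (outer product) row.
        for i in range(n):
            c[i] = [cij + column[i] * r for cij, r in zip(c[i], row)]
    return c


print(matmul_outer([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Every element of C is updated independently within one value of k, so the whole update can be issued as a single matrix-wide vector operation.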
Vector processing
• The language is a language of vectors (arrays, matrices, multi-dimensional matrices, etc.)
• Most current code is not written with vectors (but it could be)
• Different vector organizations will give different performance on different hardware
• Vector ops take (highly optimized) linear time
• Vector programming is a form of SIMD programming
• Can add conditionals (if c[0..N] then a[0..N] + b[0..N])
Dataflow machines
• Suppose you have a system with:
  • A programmable hardware device
  • Some finite amount of memory
  • Multiple special-purpose processors
• What do you have?
  • An nVIDIA GPU…
  • A processor-in-memory…
  • An FPGA…
• Good:
  • Easy to build (current tech gives 1000s of processors)
  • Extreme speed
• Bad:
  • How do you program this thing?
Dataflow model
Dataflow vs von Neumann
System-on-a-chip
• Programmable FPGAs
• Small finite storage
• Master CPU
• I/O
• Large amount of interconnect
Extracting dataflow from serial programs
• Build the dataflow graph (for example, CS134b register allocation)
• This method doesn’t really work
  • Poor parallelism
  • Memory allocation is a problem
Writing dataflow programs
• Draw “circuit” diagrams
• Use discrete-event-simulation models
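A dataflow "circuit" can be modeled as a graph in which a node fires as soon as all of its inputs are available. The Python sketch below is a toy evaluator under that assumption. The function `run_dataflow` and the graph layout are hypothetical and do not represent any particular dataflow language.

```python
import operator


def run_dataflow(nodes, inputs):
    """nodes maps a name to (function, input names); inputs maps names to values."""
    values = dict(inputs)
    pending = dict(nodes)
    waves = []  # nodes that fire together in one step
    while pending:
        # A node is ready when every one of its inputs already has a value.
        ready = [name for name, (_, deps) in pending.items()
                 if all(dep in values for dep in deps)]
        if not ready:
            raise ValueError("graph has a cycle or a missing input")
        for name in ready:
            func, deps = pending.pop(name)
            values[name] = func(*(values[dep] for dep in deps))
        waves.append(sorted(ready))
    return values, waves


# Circuit for (a + b) * (a - b): the add and subtract nodes can fire in parallel.
graph = {
    "sum": (operator.add, ["a", "b"]),
    "diff": (operator.sub, ["a", "b"]),
    "prod": (operator.mul, ["sum", "diff"]),
}
values, waves = run_dataflow(graph, {"a": 5, "b": 3})
print(values["prod"], waves)
```

There is no program counter here: the order of execution falls out of data availability, and each wave lists the nodes that hardware could run at the same time.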
Dataflow programs
PL and hardware
• Hardware-specific programming languages
  • Can be fast, often not portable
  • Software usually has to be rewritten when platform changes
  • A lot of C programs are still rewritten for new platforms
• Hardware
  • Extreme hardware design is limited by the software community
  • How does nVIDIA get away with it?
Transformation grammars