Some symbols

William Sandqvist william@kth.se

ISR, Task, Timer, Binary Semaphore, Mailbox, Message Queue, Counting Semaphore, Event Flag, Mutex Semaphore

6-5 Synchronization with semaphores
Task 1:
    …
    a = f1(…);
    // Synchronization point
    f2(b);
    …
Task 2:
    …
    b = g1(…);
    // Synchronization point
    g2(a);
    …
Operations: accessSem(Sem) and releaseSem(Sem).
Synchronize the code with binary semaphores!

Binary semaphores Sem1 and Sem2
Sem1 and Sem2 are created with the value "1" at start!
Task 1:
    …
    accessSem(Sem1);
    …
    a = f1(…);
    releaseSem(Sem1);
    // Synchronization point
    accessSem(Sem2);
    f2(b);
    releaseSem(Sem2);
    …
Task 2:
    …
    accessSem(Sem2);
    …
    b = g1(…);
    releaseSem(Sem2);
    // Synchronization point
    accessSem(Sem1);
    g2(a);
    releaseSem(Sem1);
    …
Note: with the start value "1", accessSem(Sem2) in Task 1 can succeed before Task 2 has computed b, so this version protects a and b but does not force the tasks to wait for each other.

Binary semaphores Sem1 and Sem2
Sem1 and Sem2 are created with the value "0" at start!
Task 1:
    …
    a = f1(…);
    releaseSem(Sem1);
    // Synchronization point
    accessSem(Sem2);
    f2(b);
    releaseSem(Sem2);
    …
Task 2:
    …
    b = g1(…);
    releaseSem(Sem2);
    // Synchronization point
    accessSem(Sem1);
    g2(a);
    releaseSem(Sem1);
    …
With the start value "0", accessSem blocks until the other task has released the semaphore, that is, until the value it needs has been computed.
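The start-at-0 pattern can be tried out on a desktop system. The following is a minimal sketch, assuming POSIX threads and POSIX semaphores stand in for the RTOS tasks and for accessSem/releaseSem (sem_wait and sem_post). The constants 1 and 2 replace f1(…) and g1(…), run_rendezvous is a hypothetical helper name, and the trailing releaseSem calls after f2 and g2 are left out because each semaphore is used only once here.

```c
#include <pthread.h>
#include <semaphore.h>

static sem_t sem1, sem2;          /* both created with the value 0 */
static int a, b;                  /* values exchanged between the tasks */
static int b_seen_by_task1, a_seen_by_task2;

static void *task1(void *arg)
{
    (void)arg;
    a = 1;                        /* stands in for a = f1(...) */
    sem_post(&sem1);              /* releaseSem(Sem1): a is ready */
    sem_wait(&sem2);              /* accessSem(Sem2): wait until b is ready */
    b_seen_by_task1 = b;          /* stands in for f2(b) */
    return NULL;
}

static void *task2(void *arg)
{
    (void)arg;
    b = 2;                        /* stands in for b = g1(...) */
    sem_post(&sem2);              /* releaseSem(Sem2): b is ready */
    sem_wait(&sem1);              /* accessSem(Sem1): wait until a is ready */
    a_seen_by_task2 = a;          /* stands in for g2(a) */
    return NULL;
}

/* Runs both tasks once; returns 1 if each task saw the other's value. */
int run_rendezvous(void)
{
    pthread_t t1, t2;

    sem_init(&sem1, 0, 0);
    sem_init(&sem2, 0, 0);
    pthread_create(&t1, NULL, task1, NULL);
    pthread_create(&t2, NULL, task2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    sem_destroy(&sem1);
    sem_destroy(&sem2);

    return b_seen_by_task1 == 2 && a_seen_by_task2 == 1;
}
```

Whichever thread is scheduled first, neither can pass its sem_wait before the other has posted, so f2 never reads b and g2 never reads a before the value has been written.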

Task triplet P(max execution time, period, deadline)
Create periodic tasks:
A soft timer could release a semaphore periodically.
A task could access a semaphore before execution.

Finite Impulse Response filter
You have programmed a FIR filter in LAB 2. Every filter stage needs a MAC operation. MAC = Multiply and ACcumulate.
    sample = input();
    x[oldest] = sample;
    y = 0;
    for (k = 0; k < N; k++) {
        y += h[k] * x[(oldest + k) % N];
    }
    oldest = (oldest + 1) % N;
    output(y);

7-1 Hardware Accelerators
DSP application. 15% of the execution time is spent in calls to a function that performs a MAC operation (Multiply and ACcumulate). An alternative is to use another processor that has a MAC instruction.
Suppose that we have the ratio: (the ratio is not preserved here; the figures below correspond to a MAC instruction that is 10 times faster than the function call)
How much could the total execution time be decreased if the processor with the MAC instruction is used?
Without MAC: 15% + 85% = 100%
With MAC: 1.5% + 85% + 13.5% = 100%
The last 13.5% is freed time: a program could do 13.5% more in the same execution time.
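The arithmetic behind the 1.5% + 85% + 13.5% split can be written out as a small helper. This is only a sketch: the function names are made up for illustration, and the factor 10 is an assumption inferred from the MAC share shrinking from 15% to 1.5%.

```c
/* Fraction of the original execution time that remains when a part
   `accel_fraction` of the program is made `factor` times faster. */
double remaining_time(double accel_fraction, double factor)
{
    return (1.0 - accel_fraction) + accel_fraction / factor;
}

/* Fraction of the original execution time that is freed. */
double saved_time(double accel_fraction, double factor)
{
    return 1.0 - remaining_time(accel_fraction, factor);
}
```

With accel_fraction = 0.15 and factor = 10, remaining_time gives 0.85 + 0.015 = 0.865 and saved_time gives 0.135, matching the 13.5% on the slide.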

7-3 Hardware accelerator
X = A * B + C * D

Processor only
X = A * B + C * D
    load  p1,A      # 2 time units
    load  p2,B      # 2
    load  p3,C      # 2
    load  p4,D      # 2
    mul   p5,p1,p2  # 8
    mul   p6,p3,p4  # 8
    add   p7,p5,p6  # 1
    store p7,X      # 2
Grand total = 27 time units
Can the hardware accelerator improve on this?

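Processor and Accelerator
A DFG (data flow graph) detects possible parallelism.
T = C * D
X = A * B + T
Processor:
    load  p1,A      # 2
    load  p2,B      # 2
    mul   p3,p1,p2  # 8
Accelerator, in parallel with the processor:
    load  a1,C      # 2
    load  a2,D      # 2
    mul   a3,a1,a2  # 1
    store T,a3      # 2 (=7)
Processor, when both are done:
    load  p4,T      # 2
    add   p5,p4,p3  # 1
    store p5,X      # 2
Grand total = 12 + 5 = 17 time units (the accelerator's 7 units overlap the processor's first 12).
Parallelism!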

Speedup = 27 / 17 ≈ 1.59

All mul's with the accelerator
S = A * B
T = C * D
X = S + T
Accelerator:
    load  a1,A      # 2
    load  a2,B      # 2
    load  a3,C      # 2
    load  a4,D      # 2
    mul   a5,a1,a2  # 1
    mul   a6,a3,a4  # 1
    store S,a5      # 2
    store T,a6      # 2
Processor:
    load  p1,S      # 2
    load  p2,T      # 2
    add   p3,p2,p1  # 1
    store p3,X      # 2
Grand total = 21 time units
No parallelism!

Speedup = 27 / 21 ≈ 1.29
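The three grand totals can be re-derived from the per-instruction costs on the slides. The sketch below uses hypothetical function names and takes speedup as the processor-only time divided by the new time (the usual definition, assumed here).

```c
/* Instruction costs in time units, as given on the slides. */
enum { LOAD = 2, STORE = 2, ADD = 1, MUL_CPU = 8, MUL_ACC = 1 };

static int max_int(int x, int y) { return x > y ? x : y; }

/* Everything on the processor: 4 loads, 2 muls, add, store. */
int processor_only(void)
{
    return 4 * LOAD + 2 * MUL_CPU + ADD + STORE;
}

/* Processor computes A*B while the accelerator computes T = C*D. */
int processor_and_accelerator(void)
{
    int cpu = 2 * LOAD + MUL_CPU;             /* 12 */
    int acc = 2 * LOAD + MUL_ACC + STORE;     /* 7, overlaps with cpu */
    return max_int(cpu, acc) + LOAD + ADD + STORE;
}

/* Both multiplications on the accelerator, then the add on the processor. */
int accelerator_all_muls(void)
{
    int acc = 4 * LOAD + 2 * MUL_ACC + 2 * STORE;   /* 14 */
    int cpu = 2 * LOAD + ADD + STORE;               /* 7, no overlap */
    return acc + cpu;
}

/* Speedup relative to an older, slower time. */
double speedup(int t_old, int t_new)
{
    return (double)t_old / (double)t_new;
}
```

The functions return 27, 17 and 21 time units. The split solution wins even though it uses the slow processor multiplier once, because its 8-unit mul runs at the same time as the accelerator's work.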

Accelerators in the Cyclone II chip
The Cyclone II chip has Embedded Multipliers to use as hardware accelerators. (They could be connected to the embedded Nios II processor with the Avalon bus.)
Up to 150 18-bit × 18-bit multiplier units can be used!

5-9 Cache performance
This is an example of a problem from part B of the written exam.
    int i;
    int y = 0;
    int u[60];
    int v[60];
    . . .
    for (i = 0; i < 60; i++)
        y += u[i] * v[i];
    . . .
Data cache size 128 bytes, cache line/block 32 bytes (8 int).
u and v are located in sequence in memory.
Variables i and y are stored in processor registers.

Hit rate estimation
Draw the memory and the cache organized in cache lines/blocks. A block is then 8 int. Vectors u and v each occupy 7.5 blocks in memory.
We don't know if the mapping looks exactly this way, but the conflicts will be the same.
u[0] M, v[0] M, u[1…3] HHH, v[1…3] HHH
u[4] H, v[4] M, conflict misses u[5…7] MMM, v[5…7] MMM …
MM HHH HHH H M MMM MMM … 50%
(The loop stops at 59, numbers 60…63 are not included, so the hit rate will actually be > 50%.)

Program changes for max hit rate
    int i;
    int y = 0;
    int u[72]; /* +12 dummy */
    int v[60];
    . . .
    for (i = 0; i < 60; i++)
        y += u[i] * v[i];
    . . .
v is moved 12 int's by extending u with dummy elements.
MHHHHHHH. Hit rate 88%.
Is 100% possible? No, there must always be one cold miss for every cache line. The index i counts forwards: every int is used only once, no int is reused!
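Both layouts can be checked by counting hits in a tiny cache model. This is a sketch under stated assumptions, not the exam's reference solution: the cache is taken to be direct-mapped, u is taken to start at byte address 0 (block-aligned), an int is 4 bytes, and count_hits is a hypothetical helper name.

```c
#define CACHE_BYTES 128
#define LINE_BYTES  32
#define NUM_LINES   (CACHE_BYTES / LINE_BYTES)   /* 4 cache lines */
#define INT_BYTES   4                            /* assumed sizeof(int) */
#define N           60                           /* loop iterations */

/* Counts cache hits for the loop  y += u[i] * v[i],  i = 0..N-1.
   u starts at byte address 0 and v starts v_offset ints after u:
   v_offset = 60 for int u[60], v_offset = 72 for int u[72]. */
int count_hits(int v_offset)
{
    long tag[NUM_LINES];   /* memory block held by each cache line */
    int hits = 0;

    for (int line = 0; line < NUM_LINES; line++)
        tag[line] = -1;    /* cache starts empty */

    for (int i = 0; i < N; i++) {
        /* each iteration reads u[i] first, then v[i] */
        long addr[2] = { (long)INT_BYTES * i,
                         (long)INT_BYTES * (v_offset + i) };
        for (int k = 0; k < 2; k++) {
            long block = addr[k] / LINE_BYTES;
            int line = (int)(block % NUM_LINES);
            if (tag[line] == block)
                hits++;               /* hit: block already in the cache */
            else
                tag[line] = block;    /* miss: load block, evict old one */
        }
    }
    return hits;
}
```

Under these assumptions count_hits(60) gives 62 hits out of 120 accesses (about 52%, slightly above 50%), and count_hits(72) gives 104 out of 120 (about 87%). The per-block pattern MHHHHHHH alone would give 7/8; the simulated figure is a little lower because the last block of each vector is only half used.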

Good Luck!
