Exercises Embedded Systems William Sandqvist william@kth.se
1.1 The C-function

    int fac_c(int x)
    {
        int f;
        if (x <= 0)
            f = 0;
        else {
            f = 1;
            while (x > 1) {
                f = f * x;
                x--;
            }
        }
        return f;
    }

fac_c(5) calculates 1*5*4*3*2*1 = 120.

Flowchart (figure): int fac_c(int x); if x <= 0 (Y: f = 0; N: f = 1); while x > 1 (Y: f = f * x, x = x - 1; N: leave the loop); return f; End.

We should document our code. You can find a flowchart tool in Word or PowerPoint. This could be useful for lab reports.
main in C
Message to the linker: fac_asm() is an external function (from another file).

    #include <stdio.h>

    extern int fac_asm(int);
    int fac_c(int);

    int main(void)
    {
        int c_result, asm_result;
        int x;
        while (1) {
            printf("Enter a number: ");
            scanf("%d", &x);
            c_result = fac_c(x);
            asm_result = fac_asm(x);
            printf("C-result: %d\n", c_result);
            printf("Asm-result: %d\n", asm_result);
        }
        return 0;
    }
Structure diagram?
To document the program structure, a structure diagram could be useful. It can be translated directly into structured programming (while, if, else …). But in assembler, we are not interested in the program structure, but in the program flow.
The Flowchart The flowchart could be directly translated to assembler code.
How to program the Nios processor?
The Nios processor is Altera's MIPS-like soft-core processor. It is designed to make efficient use of the resources in an FPGA. It comes in three versions: Small – Medium – Large …
Nios II registers 0…15
r0 always reads as zero: use it as the constant "0"! r8…r15 are caller-saved: if you call a subroutine, first save the contents of the registers you've used on the stack!
Nios II registers 16…31
sp (r27) points to the stack!
Register operations, R-type instructions
Program constants, I-type instructions
Some pseudoinstructions:

    movi  rB, IMMED     is assembled as   addi rB, r0, IMMED
    movia rB, label     is assembled as   orhi rB, r0, %hiadj(label)
                                          addi rB, rB, %lo(label)
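Why movia needs %hiadj rather than the plain upper half can be shown with a small C model. This is only a sketch: the helper names hiadj, lo and movia are hypothetical, and the model assumes that addi sign-extends its 16-bit immediate, which is why the upper half is adjusted by one whenever bit 15 of the value is set.

```c
#include <stdint.h>

/* %hiadj(value): upper 16 bits, plus 1 when bit 15 is set, to compensate
   for the sign extension of the lower half in the following addi. */
uint16_t hiadj(uint32_t value)
{
    return (uint16_t)(((value >> 16) & 0xFFFFu) + ((value >> 15) & 1u));
}

/* %lo(value): lower 16 bits. */
uint16_t lo(uint32_t value)
{
    return (uint16_t)(value & 0xFFFFu);
}

/* Models: orhi rB, r0, %hiadj(value) followed by addi rB, rB, %lo(value). */
uint32_t movia(uint32_t value)
{
    uint32_t rB = (uint32_t)hiadj(value) << 16;   /* orhi: load upper half  */
    int32_t imm = (int16_t)lo(value);             /* sign-extended IMM16    */
    return rB + (uint32_t)imm;                    /* addi, wraps at 32 bits */
}
```

For the value 0x00018000, hiadj gives 2 and lo gives 0x8000; the sign-extended low half subtracts 0x8000 from 0x00020000, which restores 0x00018000.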
I-type, Branch
Pseudoinstruction: ble, branch if less than or equal (signed). ble is bge with registers A and B swapped: ble rA, rB, label is assembled as bge rB, rA, label. IMM16 is a signed byte offset relative to the following instruction; because instructions must be word-aligned, its two least significant bits are always zero.
Conditional operators of C Compare two registers and branch relative if the expression is true. All C-language conditional operators have assembly instructions (or pseudoinstructions).
Memory content, Load and Store Store in memory … stw r6, 100(rA)
The call and ret instructions
From Flowchart to assembler
Assembler
fac_asm has to be made known to other files (.global).

            .global fac_asm
            .text
    # Parameter in r4 (and if needed in r5, r6, r7)
    # Return value in r2 (and r3 if long or double)
    # we can use r2 and r3 for calculations until return
    # r8 … r15 must be saved by caller of a sub
    fac_asm:                        # int r2 fac_asm(int r4 x), the function prototype
                                    # r3 : for constant "1"
    if:     ble   r4, r0, else      # if(x <= 0)
            movi  r3, 1             # constant "1"
            mov   r2, r3            # f = 1
    while:  ble   r4, r3, endsub    # while(x>1){
            mul   r2, r2, r4        #   f = f*x
            sub   r4, r4, r3        #   x = x - 1
            br    while             # }
    else:   mov   r2, r0            # f = 0
    endsub: ret                     # return r2
            .end
2.1 Prioritized interrupts
2.2 Input/Output
Connect an 8-register memory-mapped peripheral to the CPU. The CPU has 8-bit address and data buses. The peripheral should have register addresses 0x10…0x17.
R/W reverses the direction of the data bus. CS (Chip Select) enables the chip.
Decode - doorlock How to open the doorlock? Press 4 (d) and 8 (h) simultaneously but don’t press any other key!
Connections
Decoder:

    Address   CS      RS2 RS1 RS0
    0x10  =   00010 . 000
    0x11  =   00010 . 001
    0x12  =   00010 . 010
    0x13  =   00010 . 011
    0x14  =   00010 . 100
    0x15  =   00010 . 101
    0x16  =   00010 . 110
    0x17  =   00010 . 111
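The decoder table can be checked with a few lines of C. This is a sketch only: chip_select and register_select are hypothetical names, and CS is modelled as active-high, which the slide does not specify.

```c
#include <stdbool.h>
#include <stdint.h>

/* CS is asserted when the five most significant address bits are 00010
   (assumption: active-high chip select). */
bool chip_select(uint8_t address)
{
    return (address >> 3) == 0x02;
}

/* RS2..RS0 are the three least significant address bits. */
uint8_t register_select(uint8_t address)
{
    return address & 0x07;
}
```

Every address in 0x10…0x17 asserts CS and selects registers 0…7 in turn; 0x0F and 0x18, the neighbours on either side, leave the chip deselected.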
Why memory cache?
3.2 Hit rate and access time
h is the hit rate.
a) tAVG = 8 ns, h = ?
b) tAVG = 15 ns, h = ?
c) tAVG = 6 ns, h = ?
Hit rate calculations for tAVG = 8, 15 and 6 ns
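The cache and memory access times for this exercise are not reproduced in this transcript, so the following is only a sketch of the usual calculation. It assumes the simple model tAVG = h*tCache + (1 - h)*tMem; the function names and any numbers fed into them are hypothetical.

```c
/* Average access time for hit rate h (0..1), under the assumed model
   t_avg = h * t_cache + (1 - h) * t_mem. */
double average_access_time(double h, double t_cache, double t_mem)
{
    return h * t_cache + (1.0 - h) * t_mem;
}

/* The same model solved for h. Requires t_mem != t_cache. */
double hit_rate(double t_avg, double t_cache, double t_mem)
{
    return (t_mem - t_avg) / (t_mem - t_cache);
}
```

With purely hypothetical times tCache = 5 ns and tMem = 55 ns, hit_rate(8.0, 5.0, 55.0) gives 0.94. The real answers to a), b) and c) depend on the times given in the exercise.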
3.1 Memory system
In this example, the block transfer is a cache line of 2 words. The memory is byte-organized, but we can draw it as if it were organized in memory lines of the same size as the cache line. This will simplify all figures.
Direct address mapping: memory line i maps to cache line j = i % K, where K is the number of cache lines.
Why block transfer?
• To transfer 1 "random" word in memory takes three bus cycles: 3 TBus/word (2 TBus are wait states).
• To transfer a "burst" of 2 words takes 3+1 bus cycles: 4/2 = 2 TBus/word.
• To transfer a "burst" of 4 words takes 3+1+1+1 bus cycles: 6/4 = 1.5 TBus/word.
• To transfer a "burst" of 8 words takes 3+1+1+1+1+1+1+1 bus cycles: 10/8 = 1.25 TBus/word.
Remember, to make these gains you must have use for most of the transferred words; otherwise block transfer could be even slower than random transfer!
This is just an example. Other access patterns exist, e.g. 5+3+3+3 and so on. The bus clock is derived from the processor clock, perhaps TBus = 10*TCPU.
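The burst arithmetic above follows one pattern: three bus cycles for the first word and one for each following word. A minimal C sketch of that pattern, with hypothetical function names and assuming a burst of at least one word:

```c
/* Bus cycles for a burst of `words` words (words >= 1):
   3 cycles for the first word, 1 cycle for each following word. */
int burst_cycles(int words)
{
    return 3 + (words - 1);
}

/* Average cost in bus cycles (TBus) per transferred word. */
double cycles_per_word(int words)
{
    return (double)burst_cycles(words) / words;
}
```

cycles_per_word(1), (2), (4) and (8) reproduce the 3, 2, 1.5 and 1.25 TBus/word from the list. The cost per word approaches 1 TBus as the burst grows, but only pays off if the extra words are actually used.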
Mapping of memory address
Memory: 4 kB, 4*2^10 = 2^12 bytes. Memory address: mmmmmmmmmmmm (12 bits)
Cache: 8 words, 8*4 = 32 bytes. Cache line: 2 words, 2*4 = 8 bytes. Cache address: ll.w.bb (line, word, byte)
Memory – cache mapping, with the address tag t:

    mmmmmmm.mm.m.mm
    ttttttt.ll.w.bb

The address in the cache is independent of the tag bits!
Our example: the data addresses are accessed four times in this order: 0x010, 0x1FC, 0x168, 0x008, 0x014, 0x1F8, 0x00C
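The ttttttt.ll.w.bb split can be expressed with shifts and masks. The sketch below uses a hypothetical struct and function name; the field widths (7 tag bits, 2 line bits, 1 word bit, 2 byte bits) are the ones from the mapping above.

```c
#include <stdint.h>

typedef struct {
    unsigned tag;    /* 7 bits: which memory block occupies the line */
    unsigned line;   /* 2 bits: cache line 0..3                      */
    unsigned word;   /* 1 bit:  word within the 2-word line          */
    unsigned byte;   /* 2 bits: byte within the 4-byte word          */
} CacheFields;

/* Splits a 12-bit memory address into ttttttt.ll.w.bb. */
CacheFields split_address(uint16_t address)
{
    CacheFields f;
    f.byte = address & 0x3;
    f.word = (address >> 2) & 0x1;
    f.line = (address >> 3) & 0x3;
    f.tag  = (address >> 5) & 0x7F;
    return f;
}
```

For example, 0x168 splits into tag 0x0B, line 1, word 0, and 0x1FC into tag 0x0F, line 3, word 1. The line field is the same value as the direct mapping j = i % K with memory line i = address / 8 and K = 4.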
Memory and Cache
Data is accessed from three different locations (tags), but they will map to the same lines in this small cache!
Direct mapped Cache
Program execution
The data addresses are accessed four times in this order: 0x010, 0x1FC, 0x168, 0x008, 0x014, 0x1F8, 0x00C
Cache access, line#(tag#), one pass per row with the result underneath. tag# numbers the three memory locations: 0 = 0x00x/0x01x, 1 = 0x1Fx, 2 = 0x16x.

    2(0) 3(1) 1(2) 1(0) 2(0) 3(1) 1(0)
     C    C    C    M    H    H    H
    2(0) 3(1) 1(2) 1(0) 2(0) 3(1) 1(0)
     H    H    M    M    H    H    H
    2(0) 3(1) 1(2) 1(0) 2(0) 3(1) 1(0)
     H    H    M    M    H    H    H
    2(0) 3(1) 1(2) 1(0) 2(0) 3(1) 1(0)
     H    H    M    M    H    H    H

C, Cold miss = line entry to a previously unused cache line (this counts as a miss)
M, Miss = the previous line entry was from another location (tag)
H, Hit = the previous line entry was from the same location (tag)
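The access sequence can be replayed in a small simulation of the direct-mapped cache. This is a sketch under the ttttttt.ll.w.bb mapping; count_hits, example_addresses and NUM_LINES are hypothetical names. Note that 0x008 and 0x00C decode to the same cache line (line 1, tag 0, words 0 and 1), so the second of them finds its line already loaded.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_LINES 4   /* 8-word cache with 2-word lines */

/* Replays `passes` passes over `addresses` through a direct-mapped cache
   and returns the number of hits. Cold misses count as misses. */
int count_hits(const uint16_t *addresses, size_t count, int passes)
{
    int valid[NUM_LINES] = {0};       /* line has been loaded at least once */
    unsigned tags[NUM_LINES] = {0};   /* tag currently stored in each line  */
    int hits = 0;

    for (int p = 0; p < passes; p++) {
        for (size_t i = 0; i < count; i++) {
            unsigned line = (addresses[i] >> 3) & 0x3;
            unsigned tag  = (addresses[i] >> 5) & 0x7F;
            if (valid[line] && tags[line] == tag) {
                hits++;
            } else {                  /* miss: load the line, replace the tag */
                valid[line] = 1;
                tags[line] = tag;
            }
        }
    }
    return hits;
}

/* The access sequence from the slide. */
const uint16_t example_addresses[7] = {
    0x010, 0x1FC, 0x168, 0x008, 0x014, 0x1F8, 0x00C
};
```

count_hits(example_addresses, 7, 4) returns 18, that is 18 hits in 28 accesses, a hit rate of about 0.64. The only repeating misses are 0x168 and 0x008, which evict each other from line 1 on every pass.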
2-way set associative cache
Memory address: mmmmmmmm.m.m.mm
Address mapping: tttttttt.l.w.bb
OBSERVE! The set number is not included in the address map. Logic circuits within the associative cache take care of the set number and connect the CPU to the correct set. (Tags are stored in the associative cache for each line in every set. All sets are searched in parallel for the tag.)
Example of how an associative cache can boost performance

    Memory: 0x010, Tag: 0x01, Cache: 0x0 = 0b0.0.00
    Memory: 0x1FC, Tag: 0x1F, Cache: 0xC = 0b1.1.00
    Memory: 0x168, Tag: 0x16, Cache: 0x8 = 0b1.0.00
    Memory: 0x008, Tag: 0x00, Cache: 0x8 = 0b1.0.00
    Memory: 0x014, Tag: 0x01, Cache: 0x4 = 0b0.1.00
    Memory: 0x1F8, Tag: 0x1F, Cache: 0x8 = 0b1.0.00
    Memory: 0x00C, Tag: 0x00, Cache: 0xC = 0b1.1.00

(Nice example. The cache part is one full hex digit.)
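Because the cache part is exactly one hex digit, the tttttttt.l.w.bb split for the 2-way cache is easy to check in C. A minimal sketch with hypothetical function names:

```c
#include <stdint.h>

/* Tag: the upper 8 bits of the 12-bit address (the two upper hex digits). */
unsigned tag_2way(uint16_t address)
{
    return (address >> 4) & 0xFF;
}

/* Cache part l.w.bb: the lowest hex digit of the address. */
unsigned cache_part_2way(uint16_t address)
{
    return address & 0xF;
}

/* Line number: the single l bit of the cache part. */
unsigned line_2way(uint16_t address)
{
    return (address >> 3) & 0x1;
}
```

For 0x1FC this gives tag 0x1F, cache part 0xC and line 1, matching the list above. Of the seven addresses, only 0x010 and 0x014 fall in line 0.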
Fewer conflict misses
Memory locations 0x010 and 0x014 are stored in cache line 0. But there are two sets! Both can be stored simultaneously. 0x1FC, 0x168, 0x008, 0x1F8 and 0x00C are stored in cache line 1. Two of them can be stored simultaneously.
To analyse this example in full detail you have to consider the exchange (replacement) policy, which is not given here: FIFO, RANDOM, LRU … If the exchange policy were known, we could follow the cache accesses step by step to calculate the hit rate: line,set(tag) line,set(tag) …