ECE 2560 L12-Algorithm timing Department of Electrical and Computer Engineering The Ohio State University ECE 3561 - Lecture 1
Execution time • Execution time for instructions • Algorithm execution time • Dumb code • Smarter code • Constant time • Information relevant to this lecture can be found in the MSP430 User's Guide at the end of chapter 3.
The first code • Computing A*B • Use A as the number of times to add B to SUM • SUM = 0; • WHILE A > 0 LOOP • SUM = SUM + B; • A = A-1; • END LOOP; • How long does this take to execute? • A and B are integers from 0 to 127 • On average the loop will be repeated about 64 times, minimum 0 times, maximum 127 times. • We need to look at instruction execution cycles.
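The WHILE loop above translates almost line for line into C. This is a minimal sketch. The function name `mult_repeated` is hypothetical, and the 0 to 127 operand range is taken from the slide.

```c
#include <assert.h>
#include <stdint.h>

/* Repeated addition: add B to SUM, A times (hypothetical helper name). */
uint16_t mult_repeated(uint16_t a, uint16_t b)
{
    assert(a <= 127 && b <= 127);        /* operand range assumed on the slide */
    uint16_t sum = 0;                    /* SUM = 0 */
    while (a > 0) {                      /* runs A times; 0 times when A == 0 */
        sum = (uint16_t)(sum + b);       /* SUM = SUM + B */
        a = (uint16_t)(a - 1);           /* A = A - 1 */
    }
    return sum;                          /* 127 * 127 = 16129 fits in 16 bits */
}
```

The running time is proportional to A, which is exactly what the cycle count has to capture.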
The code • Code of subroutine • srmult mov 2(SP),R6 ;B to R6 • mov 4(SP),R5 ;A to R5 • clr R7 ;R7 for sum • mlp add R6,R7 • dec R5 • jne mlp ;at end R7 is A*B (needs A > 0) • mov R7,4(SP) ;R7 for rtn • ret ;return from sr • Need to look at each instruction for the cycles it takes to execute.
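The assembly loop is not quite the WHILE loop. At `mlp` the add and `dec R5` run before `jne` tests R5, so it behaves like a do-while on a 16-bit register. The C model below is a sketch that mirrors the registers. The names `srmult_model` and `srmult_result` are hypothetical. It shows that A = 0 does not give 0 passes: `dec R5` wraps to 0xFFFF and the loop runs 65536 times.

```c
#include <stdint.h>

/* Result of the hypothetical model: R7 at the end and the number of passes. */
typedef struct {
    uint16_t product;      /* R7, truncated to 16 bits like the register */
    uint32_t iterations;   /* passes through mlp */
} srmult_result;

srmult_result srmult_model(uint16_t a, uint16_t b)
{
    uint16_t r5 = a, r6 = b, r7 = 0;     /* mov 4(SP),R5 / mov 2(SP),R6 / clr R7 */
    uint32_t passes = 0;
    do {
        r7 = (uint16_t)(r7 + r6);        /* add R6,R7 */
        r5 = (uint16_t)(r5 - 1);         /* dec R5: 0 wraps to 0xFFFF */
        passes++;
    } while (r5 != 0);                   /* jne mlp */
    srmult_result res = { r7, passes };
    return res;
}
```

One way to get the 0-pass behavior of the WHILE loop is to test R5 (for example with `tst R5` and `jz`) before entering the loop. That adds a few setup cycles to the analysis.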
Time for instructions • Tables 3-14, 3-15, and 3-16 of the user's guide give the time it takes instructions to execute. • Table 3-14 is for interrupts and will be discussed later. • Table 3-15 is for Format-II (single-operand) instructions: RRA, RRC, SWPB, SXT, PUSH, and CALL. • Of these, PUSH is the only one of note here, and none of them appear in the code on the previous slide.
Table 3-16 (Format-I, double-operand instructions) • Note it is organized by src and dst addressing modes. • Examples: • mov R5,R6 – 1 cycle, 1 word • mov 2(SP),Rx – 3 cycles, 2 words • mov R7,4(SP) – 4 cycles, 2 words
Side note • This processor has instructions that occupy 1 to 3 words in memory and take 1 to 6 cycles to execute. • Such a processor is considered a CISC processor even though it may have many RISC features. A true RISC processor has instructions that are 1 or 2 words (machine instruction and operand address) and take 1 or 2 cycles to execute.
Analyze our program • Instructions and cycles • srmult mov 2(SP),R6 – 3 • mov 4(SP),R5 – 3 • clr R7 – 1 • mlp add R6,R7 – 1 • dec R5 – 1 • jne mlp – 2 • mov R7,4(SP) – 4 • ret – 3 • Note: ret is not listed directly because it is an emulated instruction (mov @SP+,PC), so its timing comes from the Table 3-16 entry for @Rn+ to PC, which is 3 cycles.
Time for routine • Setup – 7 cycles (two mov at 3 each, clr at 1) • Loop – 4 cycles per pass • Cleanup/return – 7 cycles • Total cycles = setup + cleanup + n*loop = 14 + 4n • What is n? – n is the loop count A, one of the two operands of A*B • What is the time? • The largest product is 127 x 127, so • Max cycles = 14 + 4*127 = 522 • Min (A = 1) = 14 + 4*1 = 18 cycles • On average the loop runs about 64 times, so • Average cycles = 14 + 4*64 = 270
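The cycle equation can be checked with a small helper. This sketch counts `clr R7` as 1 cycle, as the time analysis of the second routine does, so setup is 7 cycles and the total is 14 + 4n. It also assumes `ret` takes 3 cycles. The name `srmult_cycles` is hypothetical, and the CALL that enters the routine is not counted.

```c
#include <assert.h>

/* Cycle counts for the repeated-addition routine (CALL not included). */
enum {
    SRMULT_SETUP   = 3 + 3 + 1,   /* mov 2(SP),R6 + mov 4(SP),R5 + clr R7 */
    SRMULT_LOOP    = 1 + 1 + 2,   /* add R6,R7 + dec R5 + jne mlp, per pass */
    SRMULT_CLEANUP = 4 + 3        /* mov R7,4(SP) + ret (3 cycles assumed) */
};

unsigned srmult_cycles(unsigned n)
{
    assert(n >= 1 && n <= 127);   /* n = A; the loop does not handle A = 0 */
    return SRMULT_SETUP + SRMULT_CLEANUP + SRMULT_LOOP * n;
}
```

For example, `srmult_cycles(127)` gives the worst case and `srmult_cycles(64)` the average case.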
Modification to routine • To shorten the time, the code could be modified so that A, the loop count, is always the smaller of the two operands. • However, this would not change the equation just derived, and the worst case (127 x 127) stays the same. • A better algorithm is needed.
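The modification can be sketched in C with a hypothetical `mult_smaller_first` helper. It swaps the operands so the smaller one drives the loop, and it reports how many passes were made. The pass count becomes min(A, B), but for 127 x 127 it is still 127, which is why the worst case does not improve.

```c
#include <stdint.h>

/* Hypothetical result type: product plus number of loop passes. */
typedef struct {
    uint16_t product;
    unsigned passes;
} mult_result;

mult_result mult_smaller_first(uint16_t a, uint16_t b)
{
    if (a > b) {                         /* make a the smaller operand */
        uint16_t t = a;
        a = b;
        b = t;
    }
    mult_result r = { 0, 0 };
    while (a > 0) {                      /* same repeated addition as before */
        r.product = (uint16_t)(r.product + b);
        a--;
        r.passes++;
    }
    return r;
}
```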
Multiplication • Consider multiplication in base 10 • 1006 x 32 • 1006 x 2 = 2012 • 1006 x 3 = 3018, shifted one place left (30180) • Sum: 2012 + 30180 = 32192 • Binary is much the same, if not easier
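Hand multiplication in base 10 forms one partial product per digit of the multiplier and shifts each one a decimal place further left. Below is a sketch of that procedure. The function name `long_mult_base10` is hypothetical.

```c
#include <stdint.h>

/* Base-10 long multiplication: one partial product per multiplier digit. */
uint32_t long_mult_base10(uint32_t multiplicand, uint32_t multiplier)
{
    uint32_t sum = 0;
    uint32_t place = 1;                          /* 1, 10, 100, ... */
    while (multiplier > 0) {
        uint32_t digit = multiplier % 10;        /* lowest remaining digit */
        sum += digit * multiplicand * place;     /* 2 x 1006, then 3 x 1006 x 10 */
        multiplier /= 10;
        place *= 10;                             /* next partial product shifts left */
    }
    return sum;
}
```

In binary each "digit" is 0 or 1, so the multiply by `digit` becomes either skipping the partial product or adding the shifted multiplicand.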
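Binary multiplication • Each binary digit is either 0 or 1, so each partial product is either 0 or the shifted multiplicand • 1100111 multiplicand (103) • x 0100101 multiplier (37) • 1100111 (bit 0) • 110011100 (bit 2, shifted left 2) • 110011100000 (bit 5, shifted left 5) • 111011100011 product (3811) • An algorithm can be developed to do essentially this. • The multiplicand is shifted as each bit of the multiplier is examined. If the bit is 1, the shifted multiplicand is added to the sum. The number of steps is fixed by the number of multiplier bits, not by the operand values.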
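A minimal shift-and-add sketch of the binary procedure follows. The name `shift_add_mult` is hypothetical. Unlike the slides' routine, which moves a mask left, this version shifts the multiplier right so that its low bit is always the one being examined. Both approaches produce the same partial products.

```c
#include <stdint.h>

/* Binary long multiplication: add the shifted multiplicand for each 1 bit. */
uint32_t shift_add_mult(uint32_t multiplicand, uint32_t multiplier)
{
    uint32_t sum = 0;
    while (multiplier != 0) {
        if (multiplier & 1u)            /* bit is 1: partial product = multiplicand */
            sum += multiplicand;
        multiplicand <<= 1;             /* next bit position is worth twice as much */
        multiplier >>= 1;               /* bring the next multiplier bit to bit 0 */
    }
    return sum;
}
```

With the example above, `shift_add_mult(103, 37)` adds the three partial products for bits 0, 2 and 5.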
The code • Put arguments in R5 and R6 • Sum will be in R7 • srmult mov 2(SP),R5 ;A multiplier • mov 4(SP),R6 ;B multiplicand • clr R7 ;R7 for sum • mov #0x0001,R9 ;the mask for testing • mov #8,R8 ;loop body will execute 7 times • tol dec R8 • jeq done • bit R9,R5 ;test bit of multiplier • jeq nxtbit ;jump if zero
What is bit test • The BIT instruction logically ANDs the source and destination and discards the result; only the status flags change. If only 1 bit of one operand is set, the result tells you whether that bit of the other operand is 0 or 1. • Set up a mask with a single 1 bit in a register. It is used to check one bit of the multiplier at a time. • After BIT, the status register (SR) flags are: N = MSB of the result, Z = 1 if the result is zero, C = Z' (NOT Z), and V reset.
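The flag behavior of BIT can be modeled in C. This sketch is hypothetical (`bit_instr` and `sr_flags` are illustrative names) and covers the word form only. For `bit R9,R5`, `src` is the mask in R9 and `dst` is the multiplier in R5.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status-register flags affected by BIT (hypothetical model). */
typedef struct { bool n, z, c, v; } sr_flags;

sr_flags bit_instr(uint16_t src, uint16_t dst)
{
    uint16_t r = src & dst;            /* AND result; BIT does not store it */
    sr_flags f;
    f.n = (r & 0x8000u) != 0;          /* N: MSB of the result */
    f.z = (r == 0);                    /* Z: result is zero */
    f.c = !f.z;                        /* C = NOT Z */
    f.v = false;                       /* V: reset */
    return f;
}
```

With the multiplier 0100101 (0x25), mask 0x0001 leaves Z clear (bit 0 is 1), and mask 0x0002 sets Z (bit 1 is 0). A following `jz`/`jeq` therefore skips the add exactly when the tested bit is 0.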
The overall scheme • Setup • Register 5 has the multiplier • Register 6 has the multiplicand • Register 9 has the mask • Clear sum (R7) • Loop 7 times • BIT R9 and R5 • If not zero THEN add_multiplicand_to_sum • Shift R9 left 1 position • Shift R6 left 1 position (*2) • Back to Loop
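The scheme can be sketched in C with variables named after the registers, assuming 7-bit operands (0 to 127) as in the lecture. The function name `srmult_shift_add` is hypothetical. The loop always makes 7 passes, whatever the operand values.

```c
#include <assert.h>
#include <stdint.h>

/* Register-level model of the shift-and-add scheme (hypothetical name). */
uint16_t srmult_shift_add(uint16_t a, uint16_t b)
{
    assert(a <= 127 && b <= 127);        /* 7-bit operands */
    uint16_t r5 = a;                     /* multiplier */
    uint16_t r6 = b;                     /* multiplicand */
    uint16_t r7 = 0;                     /* sum */
    uint16_t r9 = 0x0001;                /* mask, starts at bit 0 */
    for (int pass = 0; pass < 7; pass++) {
        if ((r9 & r5) != 0)              /* BIT R9,R5: result not zero */
            r7 = (uint16_t)(r7 + r6);    /* add multiplicand to sum */
        r9 = (uint16_t)(r9 << 1);        /* shift mask left 1 position */
        r6 = (uint16_t)(r6 << 1);        /* multiplicand x2 */
    }
    return r7;
}
```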
The complete code • Put arguments in R5 and R6 • Sum will be in R7 • srmult mov 2(SP),R5 ;A multiplier • mov 4(SP),R6 ;B multiplicand • clr R7 ;R7 for sum • mov #0x0001,R9 ;the mask for testing • mov #8,R8 ;loop body executes 7 times • tol dec R8 • jeq done • bit R9,R5 ;test bit of multiplier • jz nxtbit ;jump if zero • add R6,R7 ;bit is 1: add multiplicand to sum • nxtbit clrc ;clear carry so rlc shifts in a 0 • rlc R9 ;mask to next bit • rlc R6 ;multiplicand x2 (carry out of R9 is 0) • jmp tol • done mov R7,4(SP) ;R7 (product) for rtn • ret ;return from sr
Time analysis • Before the loop • 2 mov using x(Rn) to Rm – 3 cycles each, 6 cycles • 1 clr instruction – 1 cycle • 2 mov immediate to Rm – Table 3-16 lists 2 cycles, but #1 and #8, like the #0 behind clr, are supplied by the constant generator, so each takes 1 cycle • Total setup time: 9 CPU cycles
Time analysis – the loop • Within the loop • 1 dec instruction – 1 cycle • 1 conditional jump – 2 cycles • NOTE: all jump instructions take 2 CPU cycles to execute regardless of whether the jump is taken or not. • The BIT instruction (Rn,Rm) – 1 cycle • Conditional jump – 2 cycles • If the jump is not taken, add instruction (Rn,Rm) – 1 cycle • The clrc instruction – 1 cycle • 2 rlc instructions – 1 cycle each, 2 cycles • Unconditional jump – 2 cycles • Loop cycles = 11, or 12 when the add executes • Final pass: dec and jeq done – 3 cycles • Total loop time = 7 × loop cycles + 3 • Max loop time = 7 × 12 + 3 = 87 cycles; min loop time = 7 × 11 + 3 = 80 cycles • After the loop: mov R7,4(SP) – 4 cycles, ret – 3 cycles • Total algorithm time: from 9 + 80 + 7 = 96 to 9 + 87 + 7 = 103 cycles, nearly constant regardless of the operand values
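The timing of the shift-and-add routine can be captured in a small helper. This is a sketch under stated assumptions:

* 9 setup cycles (the immediates #1 and #8 come from the constant generator).
* 11 cycles per pass, plus 1 when the add executes.
* 3 cycles for the final `dec`/`jeq` that exits.
* 7 cycles for `mov R7,4(SP)` and `ret`.
* The CALL is not counted.

The name `shift_add_cycles` is hypothetical.

```c
#include <assert.h>

/* Cycle model for the shift-and-add routine; constants are the assumptions above. */
unsigned shift_add_cycles(unsigned multiplier)
{
    assert(multiplier <= 127);                 /* 7-bit multiplier */
    unsigned ones = 0;                         /* 1 bits = passes where add runs */
    for (unsigned m = multiplier; m != 0; m >>= 1)
        ones += m & 1u;
    return 9            /* setup */
         + 7 * 11       /* 7 passes without the add */
         + ones         /* +1 cycle for each add */
         + 3            /* exit: dec R8 + jeq done */
         + 7;           /* mov R7,4(SP) + ret */
}
```

The count depends only on how many 1 bits the multiplier has, so it varies by at most 7 cycles. Compare this with the repeated-addition routine, whose time grows with the operand value.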
Summary - Assignment • No new assignment. • But try coding up the loop.