Generating a software loop with memory accesses: TigerSHARC assembly syntax
Concepts
• Learning just enough TigerSHARC assembly code to make a software loop “work”
• Comparing the timings for rectification of integer and floating-point arrays, using debug C++ code, Release C++ code, and our FIRST_ASM code
• Looking in “MIXED mode” at the code generated by the compiler
TigerSHARC assemble code 2, M. Smith, ECE, University of Calgary, Canada
Test Driven Development
Work with the customer to check that the tests properly express what the customer wants done. This is an iterative process with the customer “heavily involved”: the “Agile” methodology. (Diagram labels: CUSTOMER, DEVELOPER)
Note the special marker.
Compiler optimization:
• FLOATS: 927 vs 304, THREE FOLD
• INTS: 960 vs 150, SIX FOLD
Why the difference? Can we do better, and do we want to?
Note the test failures: what are they?
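The “THREE FOLD” and “SIX FOLD” labels are simply the ratio of the two timings for each data type. A quick C++ check of that arithmetic, assuming the first number in each pair is the unoptimized (debug) timing and the second the optimized (release) timing; `Speedup` is a hypothetical helper name:

```cpp
#include <cassert>
#include <cmath>

// Speed-up factor = unoptimized timing / optimized timing.
double Speedup(double debugTiming, double releaseTiming) {
    return debugTiming / releaseTiming;
}

// With the numbers from the slide:
//   floats: Speedup(927, 304) is about 3.05
//   ints:   Speedup(960, 150) is 6.4
```

The ratios come out at roughly 3.05 for floats and 6.4 for ints, which is where the rounded “three fold” and “six fold” labels come from.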
Write tests for passing values back from an assembly code routine
More detailed look at the code
• As with the 68K and Blackfin, a .section directive is needed, but the name and format are different
• As with the 68K, an .align statement is needed. Is the “4” in bytes (8 bits) or words (32 bits)?
• As with the 68K, .global is needed to tell other code that this function exists
• Both single semicolons and double semicolons are used
• Start-of-function and end-of-function labels are used for “profiling code”
• The label format is similar to the 68K: it needs a leading underscore and a final colon
Return registers
• There are many, depending on what you need to return
• Here we need to use J8 as the return register to pass back an “integer” pointer
• Many registers are available, so we need the ability to control their usage
• J0 to J31: registers (integers and pointers) (SISD mode)
• XR0 to XR31: registers (integers) (SISD mode)
• XFR0 to XFR31: registers (floats) (SISD mode)
• Did I also mention
• K0 to K31: registers (integers and pointers) (SISD mode)
• YR0 to YR31, YFR0 to YFR31 (SIMD mode)
• XYR, YXR and R registers (SIMD mode)
• And also the MIMD modes
• And the double registers and the quad registers …
#define return_pt_J8 J8 // J8 is a VOLATILE, NON-PRESERVED register
Parameter passing
• SPACES for the first four parameters ARE ALWAYS present on the stack (as with the 68K)
• But the first four parameters are passed in registers (J4, J5, J6 and J7 most of the time) (as with MIPS and Blackfin)
• The parameters passed in registers are often stored into the spaces on the stack (like MIPS) as the first step when assembly code functions call other assembly code functions
• J4, J5, J6 and J7 are volatile, non-preserved registers
Can we pass back the start of the final array?
We are still passing tests by accident, and this needs to be a conditional return value.
What we need to know, based on experiences with other processors
• Can we return from an assembly language routine without crashing the processor?
• Return a parameter from an assembly language routine (is it the same for ints and floats?)
• Pass parameters into assembly language (is it the same for ints and floats?)
• Do IF THEN ELSE statements
• Read and write values to memory
• Read and write values in a loop
• Do some mathematics on the values fetched from memory
All this stuff is demonstrated by coding HalfWaveRectifyASM( )
Why is ELSE a keyword?
A FOUR-PART ELSE INSTRUCTION IS LEGAL:
IF JLT; ELSE, J1 = J2 + J3;   // Conditional execution: if true
ELSE, XR1 = XR2 + XR3;        // Conditional: if true
YFR1 = YFR2 + YFR3;;          // Unconditional: always
IF JLT; DO, J1 = J2 + J3;     // Conditional execution: if true
DO, XR1 = XR2 + XR3;          // Conditional: if true
YFR1 = YFR2 + YFR3;;          // Unconditional: always
Having this sort of format means that the instruction pipeline is not disrupted when we do IF statements.
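The idea behind conditional execution is that the condition selects which result is kept, rather than which path the instruction fetch follows. The C++ sketch below is our own illustration, not a description of the TigerSHARC hardware: it models `IF JLT; DO, J1 = J2 + J3;;` as a branch-free select using a bit mask. `ConditionalAdd` and its parameters are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

// Model of a predicated add: the sum is always computed, and the
// condition only decides whether J1 receives it. No jump is involved.
int32_t ConditionalAdd(bool conditionTrue, int32_t j1, int32_t j2, int32_t j3) {
    const int32_t sum = j2 + j3;                                 // computed unconditionally
    const int32_t mask = -static_cast<int32_t>(conditionTrue);   // all 1s if true, all 0s if false
    return (sum & mask) | (j1 & ~mask);                          // new value if true, old J1 if false
}
```

For example, `ConditionalAdd(true, 100, 2, 3)` gives 5, while `ConditionalAdd(false, 100, 2, 3)` leaves the old value 100 in place.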
The label name is not the problem.
NOTE: This is “C-like” syntax, but it is not “C”.
A statement must end in ;; not ;
ONE semicolon = end of instruction
TWO semicolons = end of parallel instruction line
Add dual semicolons everywhere; worry about “multiple issues” later.
This dual semicolon is so important that you MUST code review for it all the time, or else you waste a great deal of time in the lab. Key in exams / quizzes.
At last, an error I know how to fix.
Well, I thought I understood it!!!
• Speed issue: JUMP instructions can’t be too close together when stored in memory
• Not normally a problem when the “if” code is larger
Add a single instruction line of 4 NOPs (TEMPORARY):
nop; nop; nop; nop;;
• Fix the remaining error in handling the IF THEN ELSE as part of Assignment 1
• Worry about code efficiency later (refactor) when all the code is working
Target: changing this C++ code into assembly (to get “more” speed)
• The code we generated yesterday was similar to parts of this, but not equivalent
• Re-factor the code to make the assembly code and C++ functionality equivalent
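The C++ code for this slide is not reproduced in this text. As a rough stand-in, here is a minimal C++ sketch of a half-wave rectifier matching the behaviour described across these slides: negative values become zero, the start of the final array is passed back, and NULL (0) is returned when there is nothing to process. The name `HalfWaveRectify`, its parameter list, and the `N <= 0` test are assumptions for illustration, not the course’s actual code:

```cpp
#include <cassert>

// Hypothetical C++ reference for HalfWaveRectifyASM( ).
// Copies initialArray into finalArray, replacing negative values by 0.
// Returns the start of finalArray, or nullptr when N <= 0 (assumed test).
int* HalfWaveRectify(const int initialArray[], int finalArray[], int N) {
    if (N <= 0) {
        return nullptr;                        // "IF NOT TRUE return NULL (0)"
    }
    for (int count = 0; count < N; count++) {
        if (initialArray[count] < 0) {
            finalArray[count] = 0;             // negative: rectify to zero
        } else {
            finalArray[count] = initialArray[count];  // non-negative: copy unchanged
        }
    }
    return finalArray;                         // the assembly version passes this back in J8
}
```

Keeping a plain C++ version like this alongside the assembly gives the tests a known-good behaviour to compare against while re-factoring.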
The code was not exactly what we designed (the C++ equivalent): re-factor, and retest after the re-factoring.
NEXT STEP
Refactored C++ code
I THINK I UNDERSTAND ENOUGH TO CHANGE THE FORMAT OF THE IF-THEN-ELSE TO OPTIMIZE THIS PARTICULAR BIT OF CODE.
USE: IF TRUE, EXECUTE THIS STATEMENT (SINGLE LINE)
Avoiding JUMPs in the main flow of the code will speed up the flow of the code.
Almost right: SYNTAX ERROR. Look in the manual to find the correct syntax.
IF NJLE; DO, J8 = 0
No syntax errors (no CODE ERRORS), but the code does not work (CODE DEFECTS).
We don’t have enough code to pass all the tests, but we are failing tests we did not expect to fail.
Run “forensic tests” to find out where the DEFECT is being introduced.
Identify the mistake by removing “code sections”: here, run the code without the IF.
Add another line to the code; we can now spot the error.
The new format of IF-THEN-ELSE is doing exactly the opposite of what we want.
IF NOT TRUE, return NULL (0)
Need JLE, not NJLE
Assignment 1: code the following as a software loop, following the MIPS / Blackfin approach (DONE DURING TUTORIAL)

int CalculateSum(void) {
    int sum = 0;
    for (int count = 0; count < 6; count++) {
        sum = sum + count;
    }
    return sum;
}
Reminder: a software for-loop becomes a “while loop” with an initial test

int CalculateSum(void) {
    int sum = 0;
    int count = 0;
    while (count < 6) {
        sum = sum + count;
        count++;
    }
    return sum;
}

Do a line-by-line translation into assembly code.
USE A SOFTWARE LOOP HERE; do the loop control first
• Have some jumps too close together
NOTE: JGE is ILLEGAL; USE NJLT
Customize? #define JGE NJLT
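A C++ sketch of doing the loop control first, written with labels and `goto` so that each statement corresponds roughly to one assembly instruction line. The exit test is written as “not less than”, mirroring the use of NJLT in place of the illegal JGE. The function and label names are hypothetical:

```cpp
#include <cassert>

// Line-by-line style version of CalculateSum( ): test at the top,
// jump out when the test fails, loop body, then jump back.
int CalculateSumGoto(void) {
    int sum = 0;                          // sum register set to 0
    int count = 0;                        // count register set to 0
loop_start:
    if (!(count < 6)) goto loop_end;      // exit when NOT (count < 6): the NJLT test
    sum = sum + count;                    // loop body
    count++;                              // count = count + 1
    goto loop_start;                      // jump back to the loop test
loop_end:
    return sum;                           // value handed back to the caller
}
```

Writing the exit test as `!(count < 6)` rather than `count >= 6` gives the same result for integers, and keeps the C++ close to the condition the TigerSHARC actually provides.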
Run the tests with 4-nop padding to check that we get out of the loop as expected.
Adding 4 nops: lose 1 cycle, gain an hour not trying to solve the problem.
If we need the 1 cycle, refactor the code later.
Accessing memory
• Basic mode
• Special register J31: acts as zero when used in additions
• Pt_J5 is a pointer register into an array
• Value_J1 is being used as a data register
• J registers are like MIPS registers (used as both pointer and data registers). NOT like 68K or Blackfin registers: each of those is either a data register or an address register, but not both
• NOTE: Later we will find that using TigerSHARC J registers for data operations is a BAD idea
• Value_J1 = [Pt_J5];; reads the value from the memory location pointed to by J5. Compare to Blackfin: Value_R0 = [Pt_P0];
• Value_J1 = [Pt_J5 + J31];; also reads the value from the memory location pointed to by J5. I read somewhere that this CAN be faster than just Value_J1 = [Pt_J5];; NEED TO CONFIRM
Accessing memory – step 2
• Basic mode
• Pt_J5 is a pointer register into an array
• Offset_J4 is used as an offset
• Value_J1 is being used as a data register to receive the memory value (load / store architecture)
• Value_J1 = [Pt_J5 + Offset_J4];; reads the value from the memory location pointed to by (J5 + J4). PRE-MODIFY: the address used is J5 + J4, and J5 is not changed
• Value_J1 = [Pt_J5 += Offset_J4];; reads the value from the memory location pointed to by J5, and then adds J4 to the J5 register (so, with J4 = 1, J5 points to the NEXT location). POST-MODIFY: the address used is J5, then J5 = J5 + J4
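The difference between the two addressing modes can be modelled with C++ pointer arithmetic. This sketch assumes the offset counts 32-bit words, so that adding 1 moves to the next array element. `ReadPreModify` and `ReadPostModify` are hypothetical names:

```cpp
#include <cassert>

// Value_J1 = [Pt_J5 + Offset_J4];;
// PRE-MODIFY: read from pt + offset; the pointer itself is not changed.
int ReadPreModify(const int* pt, int offset) {
    return *(pt + offset);
}

// Value_J1 = [Pt_J5 += Offset_J4];;
// POST-MODIFY: read from pt, then advance the pointer by offset.
int ReadPostModify(const int*& pt, int offset) {
    const int value = *pt;   // address used is the current pointer
    pt += offset;            // pointer updated after the read
    return value;
}
```

Repeated post-modify reads with an offset of 1 walk through an array one element at a time, which is exactly what the software loop needs.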
Add in the memory accesses
FORGOT: TigerSHARC = RISC PROCESSOR, LOAD/STORE ONLY, like MIPS and Blackfin.
We must place the value into a register, and then copy the register to memory.
NO:  [J5 + J0] = 0;;
NO:  J3 = 0; [J5 + J0] = J3;;  uses the wrong (old) J3. Remember that TigerSHARC can handle parallel instructions
YES: J3 = 0;; [J5 + J0] = J3;;
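Why the one-line version “uses the wrong J3” can be modelled in C++: within one instruction line, every instruction reads its source registers before any results are written back. This is a simplified model under that assumption, not a cycle-accurate description of the processor; the `Machine` struct and the function names are hypothetical:

```cpp
#include <cassert>

// Minimal register model: J0 as an index, J3 as data, J5 as a pointer.
struct Machine {
    int J0 = 0;
    int J3 = 0;
    int* J5 = nullptr;
};

// J3 = 0; [J5 + J0] = J3;;   ONE instruction line
void OneLine(Machine& m) {
    const int oldJ3 = m.J3;   // all sources are read at the start of the line
    m.J5[m.J0] = oldJ3;       // so the store sees the OLD value of J3
    m.J3 = 0;                 // register write takes effect at the end of the line
}

// J3 = 0;; [J5 + J0] = J3;;  TWO instruction lines
void TwoLines(Machine& m) {
    m.J3 = 0;                 // first line completes
    m.J5[m.J0] = m.J3;        // second line reads the updated J3
}
```

With J3 holding 7 beforehand, the one-line version stores 7 to memory, while the two-line version stores the intended 0.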
Understand the error message: too many J resource usages = missing ;;
We were unintentionally writing the parallel instruction line
[J5 + J0] = J2; J0 = J0 + 1;;
Note: a missing label is not an assembler error, it’s a linker error.
Fix warnings: a DEFECT may go unnoticed for days until you try to link, and then it is hard to find.
NOW that the assembler knows where “CONTINUE” is, it can tell you that you have two JUMP instructions too close together
• Fix with the magic 4 nops, and lose one cycle per loop
Not getting the expected test results: something is logically wrong (DEFECT)
Obvious question: are we even getting into the loop? Add a BREAKPOINT to TEST the code flow. (We don’t add BREAKPOINTS to follow the code in detail.)
The CODE NEVER GOT TO the BREAKPOINT, which means the code never entered the loop.
We forgot to do count = 0, so we are not even getting into the loop, as there is already a garbage value in Count_J0 from code we executed earlier: DEFECT
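The effect of the missing `count = 0` can be reproduced in C++ by letting the caller supply whatever value happened to be left in the count register. This is a hypothetical sketch of the defect, with the parameter standing in for the garbage value in Count_J0:

```cpp
#include <cassert>

// Same loop as CalculateSum( ), but the counter is never reset:
// it starts from whatever value the register already held.
int CalculateSumWithLeftoverCount(int leftoverCount) {
    int sum = 0;
    int count = leftoverCount;   // DEFECT: should be count = 0
    while (count < 6) {          // a leftover value >= 6 means the loop is never entered
        sum = sum + count;
        count++;
    }
    return sum;
}
```

With a leftover value of 0 the function happens to work (15), but any leftover value of 6 or more skips the loop and returns 0, which matches the breakpoint never being reached.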
Not bad for a first effort: faster than the compiler in debug mode
Where did the float ASM code suddenly appear from?
• Integer 0 has bit pattern 0x0000 0000
• Float 0.0 has bit pattern 0x0000 0000
• Integer +6 has format b 0??? ???? ???? ???? ???? ???? ???? ????
• Float +6.0 has format b 0 EEEEEEEE MMM MMMM MMMM MMMM MMMM MMMM (sign bit, 8 EXPONENT bits, 23 mantissa bits)
• Integer -6 has format b 1??? ???? ???? ???? ???? ???? ???? ????
• Float -6.0 has format b 1 EEEEEEEE MMM MMMM MMMM MMMM MMMM MMMM
• The formats are very different, but the sign bit is in the same place
• Float algorithm: if S == 1 (negative), set to zero; otherwise leave unchanged. This is the same as the integer algorithm
• Just re-use the integer algorithm with a change of name
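The sign-bit argument can be checked in C++, assuming IEEE 754 single-precision floats and 32-bit integers. `FloatBits` shows the bit patterns, and `RectifyFloatViaIntBits` applies the integer rule (negative means set to zero) directly to the bits of a float; both names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Bit pattern of a 32-bit float, copied without changing any bits.
uint32_t FloatBits(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Rectify a float using only an integer test on its bits:
// if the int32 view is negative (sign bit set), replace it with 0,
// whose bit pattern is also float +0.0. Otherwise leave it unchanged.
float RectifyFloatViaIntBits(float f) {
    int32_t asInt;
    std::memcpy(&asInt, &f, sizeof asInt);
    if (asInt < 0) {
        asInt = 0;
    }
    float result;
    std::memcpy(&result, &asInt, sizeof result);
    return result;
}
```

Here `FloatBits(6.0f)` is 0x40C00000 and `FloatBits(-6.0f)` is 0xC0C00000: only the top bit differs. Note that this rule also maps -0.0f (bit pattern 0x8000 0000) to +0.0f, since its integer view is negative.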
Final code: the float rectify code just has a different name
What we NOW KNOW
• Can we return from an assembly language routine without crashing the processor?
• Return a parameter from an assembly language routine (is it the same for ints and floats?)
• Pass parameters into assembly language (is it the same for ints and floats?)
• Do IF THEN ELSE statements
• Read and write values to memory
• Read and write values in a loop
• Do some mathematics on the values fetched from memory
All this stuff is demonstrated by coding HalfWaveRectifyASM( )