
Generating a software loop with memory accesses


dirk

Presentation Transcript


  1. Generating a software loop with memory accesses: TigerSHARC assembly syntax

  2. Concepts
     • Learning just enough TigerSHARC assembly code to make a software loop “work”
     • Comparing the timings for rectification of integer and floating point arrays, using
       • Debug C++ code
       • Release C++ code
       • Our FIRST_ASM code
     • Looking in “MIXED mode” at the code generated by the compiler
     TigerSHARC assemble code 2, M. Smith, ECE, University of Calgary, Canada

  3. Test Driven Development
     Work with the customer to check that the tests properly express what the customer wants done.
     Iterative process with the customer “heavily involved” (“Agile” methodology).
     [Diagram labels: CUSTOMER, DEVELOPER]

  4. Note the special marker. Compiler optimization:
     • FLOATS: 927 → 304 (THREE FOLD)
     • INTS: 960 → 150 (SIX FOLD)
     Why the difference, can we do better, and do we want to?
     Note the failures: what are they?

  5. Write tests about passing values back from an assembly code routine

  6. More detailed look at the code
     • As with 68K and Blackfin, needs a .section, but the name and format are different
     • As with 68K, needs an .align statement. Is the “4” in bytes (8 bits) or words (32 bits)?
     • As with 68K, needs .global to tell other code that this function exists
     • Single semi-colons
     • Double semi-colons
     • Start function label
     • End function label
     • Used for “profiling code”
     • Label format similar to 68K: needs leading underscore and final colon

  7. Return registers
     • There are many, depending on what you need to return
     • Here we need to use J8 as the return register to pass back an “integer” pointer
     • Many registers available – need the ability to control usage
       • J0 to J31 – registers (integers and pointers) (SISD mode)
       • XR0 to XR31 – registers (integers) (SISD mode)
       • XFR0 to XFR31 – registers (floats) (SISD mode)
     • Did I also mention
       • I0 to I31 – registers (integers and pointers) (SISD mode)
       • YR0 to YR31, YFR0 to YFR31 (SIMD mode)
       • XYR, YXR and R registers (SIMD mode)
       • And also the MIMD modes
       • And the double registers and the quad registers …….
     #define return_pt_J8 J8 // J8 is a VOLATILE, NON-PRESERVED register

  8. Parameter passing
     • SPACES for the first four parameters ARE ALWAYS present on the stack (as with 68K)
     • But the first four parameters are passed in registers (J4, J5, J6 and J7 most of the time) (as with MIPS and Blackfin)
     • The parameters passed in registers are often stored into the spaces on the stack (like the MIPS) as the first step when assembly code functions call assembly code functions
     • J4, J5, J6 and J7 are volatile, non-preserved registers

  9. Can we pass back the start of the final array?
     Still passing tests by accident, and this needs to be a conditional return value

  10. What we need to know based on experiences from other processors
      • Can we return from an assembly language routine without crashing the processor?
      • Return a parameter from an assembly language routine
        • (Is it the same for ints and floats?)
      • Pass parameters into assembly language
        • (Is it the same for ints and floats?)
      • Do IF THEN ELSE statements
      • Read and write values to memory
      • Read and write values in a loop
      • Do some mathematics on the values fetched from memory
      All this stuff is demonstrated by coding HalfWaveRectifyASM( )

  11. Why is ELSE a keyword? FOUR PART ELSE INSTRUCTION IS LEGAL
        IF JLT; ELSE, J1 = J2 + J3;     // Conditional execution – if true
                ELSE, XR1 = XR2 + XR3;  // Conditional – if true
                YFR1 = YFR2 + YFR3;;    // Unconditional -- always
        IF JLT; DO, J1 = J2 + J3;       // Conditional execution -- if true
                DO, XR1 = XR2 + XR3;    // Conditional -- if true
                YFR1 = YFR2 + YFR3;;    // Unconditional -- always
      Having this sort of format means that the instruction pipeline is not disrupted when we do IF statements

  12. Label name is not the problem
      NOTE: This is “C-like” syntax, but it is not “C”
      Statement must end in ;; not ;
      ONE semicolon = end of instruction
      TWO semicolons = end of parallel instruction line

  13. Add dual-semicolons everywhere. Worry about “multiple issues” later
      This dual semi-colon is so important that you MUST code review for it all the time, or else you waste so much time in the Lab.
      Key in exams / quizzes
      At last an error I know how to fix

  14. Well I thought I understood it !!!
      • Speed issue – JUMP instructions can’t be too close together when stored in memory
      • Not normally a problem when the “if” code is larger

  15. Add a single instruction of 4 NOPs: nop; nop; nop; nop;; (TEMPORARY)
      • Fix the last error as part of Assignment 1
      Fix the remaining error in handling the IF THEN ELSE as part of Assignment 1
      Worry about code efficiency later (refactor) when all code is working

  16. What we need to know based on experiences from other processors
      • Can we return from an assembly language routine without crashing the processor?
      • Return a parameter from an assembly language routine
        • (Is it the same for ints and floats?)
      • Pass parameters into assembly language
        • (Is it the same for ints and floats?)
      • Do IF THEN ELSE statements
      • Read and write values to memory
      • Read and write values in a loop
      • Do some mathematics on the values fetched from memory
      All this stuff is demonstrated by coding HalfWaveRectifyASM( )

  17. Target: changing this C++ code into assembly (to get “more” speed)
      • Code we generated yesterday was similar to parts of this, but not equivalent.
      • Re-factor the code to make the assembly code and C++ functionality equivalent
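The C++ source shown on this slide did not survive in the transcript. As a rough stand-in, here is a minimal sketch of what a half-wave rectifier with the behaviour described in the surrounding slides might look like: it copies an input array to a final array, replaces negative values with zero, passes back the start of the final array, and returns NULL when there is nothing to process. The name `HalfWaveRectifyCPP`, the parameter names and the exact `N <= 0` test are assumptions for illustration, not the code from the original slide.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical C++ reference model; name, signature and the N <= 0 rule are
// assumptions, not taken from the original slide.
int *HalfWaveRectifyCPP(const int initial_array[], int final_array[], int N) {
    if (N <= 0) {
        return nullptr;                  // conditional return value: NULL (0)
    }
    assert(initial_array != nullptr && final_array != nullptr);

    for (int count = 0; count < N; count++) {
        int value = initial_array[count];   // read value from memory
        if (value < 0) {
            value = 0;                      // rectify: negative becomes zero
        }
        final_array[count] = value;         // write value back to memory
    }
    return final_array;                  // start of the final array
}
```

A test against the assembly version could then compare the two routines on the same input arrays, including the `N <= 0` case.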

  18. The code was not exactly what we designed (C++ equivalent) – re-factor, and retest after the re-factoring
      NEXT STEP

  19. Refactored C++ code
      I THINK I UNDERSTAND ENOUGH TO CHANGE THE FORMAT OF THE IF-THEN-ELSE TO OPTIMIZE THIS PARTICULAR CODE BIT
      USE: IF TRUE EXECUTE THIS STATEMENT – SINGLE LINE
      Avoiding JUMPS in the main flow of the code will speed the flow of the code
      Almost right. SYNTAX ERROR. Look in the manual to find the correct syntax
      IF NJLE; DO, J8 = 0

  20. No syntax errors (No CODE ERRORS). Code does not work (CODE DEFECTS)
      We don’t have enough code to pass all the tests, but we are failing tests we did not expect to fail

  21. Run “forensic tests” to find out where the DEFECT is being introduced
      Identify the mistake by removing “code sections”
      Without the IF

  22. Add another line to the code. Can now spot the error
      New format of IF-THEN-ELSE is doing exactly the opposite of what we want
      IF NOT TRUE return NULL (0)
      Need JLE not NJLE

  23. Assignment 1 – code the following as a software loop – follow MIPS / Blackfin approach
      DONE DURING TUTORIAL
        int CalculateSum(void) {
            int sum = 0;
            for (int count = 0; count < 6; count++) {
                sum = sum + count;
            }
            return sum;
        }

  24. Reminder – software for-loop becomes “while loop” with initial test
        int CalculateSum(void) {
            int sum = 0;
            int count = 0;
            while (count < 6) {
                sum = sum + count;
                count++;
            }
            return sum;
        }
      Do line by line translation into assembly code

  25. USE SOFTWARE LOOP HERE. Do loop control first
      • Have some jumps too close together
      NOTE: JGE is ILLEGAL. USE NJLT
      Customize? #define JGE NJLT
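One way to picture the “loop control first” step is to rewrite the while loop with explicit labels and jumps, so each C++ line maps onto roughly one assembly line. The sketch below is only a C++ illustration of that shape; the label names `LOOP_TEST` and `END_LOOP` and the function name are made up here. The exit test is written as “not less than”, which mirrors the NJLT condition mentioned above.

```cpp
#include <cassert>   // only needed if the function is checked with assert()

// Label-and-jump form of CalculateSum(); labels are hypothetical names.
int CalculateSumLineByLine(void) {
    int sum = 0;
    int count = 0;                       // initialise the loop counter

LOOP_TEST:
    if (!(count < 6)) goto END_LOOP;     // leave when count is NOT less than 6
    sum = sum + count;                   // loop body
    count = count + 1;                   // loop increment
    goto LOOP_TEST;                      // jump back to the initial test

END_LOOP:
    return sum;
}
```

Called as written, the function adds 0 + 1 + 2 + 3 + 4 + 5 and should return 15, the same value as the for-loop version.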

  26. Run the tests with 4 nop padding to check that we get out of the loop as expected
      Adding 4 nops: lose 1 cycle, gain an hour not trying to solve the problem
      If we need the 1 cycle, refactor the code later

  27. Accessing memory
      • Basic mode
      • Special register J31 – acts as zero when used in additions
      • Pt_J5 is a pointer register into an array
      • Value_J1 is being used as a data register
      • J registers are like MIPS registers (used as pointer and data). NOT like 68K or Blackfin registers – those can be used as either data or address registers but not both
      • NOTE: Later we will find that using TigerSHARC registers for data operations is a BAD idea
      • Value_J1 = [Pt_J5];; read value from memory location pointed to by J5 -- Compare to Blackfin Value_R0 = [Pt_P0];;
      • Value_J1 = [Pt_J5 + J31];; read value from memory location pointed to by J5 – but read somewhere that this CAN be faster than just Value_J1 = [Pt_J5];; -- NEED TO CONFIRM

  28. Accessing memory – step 2
      • Basic mode
      • Pt_J5 is a pointer register into an array
      • Offset_J4 is used as an offset
      • Value_J1 is being used as a data register to receive the memory value – load / store architecture
      • Read_J1 = [Pt_J5 + Offset_J4];; read value from memory location pointed to by (J5 + J4). PRE-MODIFY – address used is J5 + J4, no change in J5
      • Read_J1 = [Pt_J5 += Offset_J4];; read value from memory location pointed to by J5, and then perform the add operation on the J5 register (points to NEXT location). POST-MODIFY – address used is J5, then perform J5 = J5 + J4
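The difference between the two addressing forms can be modelled in a few lines of C++. This is a sketch under stated assumptions, not TigerSHARC code: memory is treated as a plain `int` array, the pointer register holds an element index, and the `Registers` struct and both function names are hypothetical.

```cpp
#include <cassert>   // only needed if the model is checked with assert()

// Hypothetical register model: Pt_J5 and Offset_J4 count array elements.
struct Registers {
    int Pt_J5;
    int Offset_J4;
};

// Models  Read_J1 = [Pt_J5 + Offset_J4];;   (PRE-MODIFY)
// The address used is J5 + J4 and J5 itself is left unchanged.
int ReadPreModify(const int memory[], const Registers &regs) {
    return memory[regs.Pt_J5 + regs.Offset_J4];
}

// Models  Read_J1 = [Pt_J5 += Offset_J4];;  (POST-MODIFY)
// The address used is J5, and afterwards J5 = J5 + J4.
int ReadPostModify(const int memory[], Registers &regs) {
    int value = memory[regs.Pt_J5];
    regs.Pt_J5 = regs.Pt_J5 + regs.Offset_J4;
    return value;
}
```

With an offset of 1, repeated post-modify reads walk through an array one element at a time, which is the pattern a rectify loop needs.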

  29. Add in the memory accesses
      FORGET: TigerSHARC = RISC PROCESSOR, LOAD/STORE ONLY. Like MIPS and Blackfin
      Must place the value into a register, and then copy the register to memory
      NO:  [J5 + J0] = 0;
      NO:  J3 = 0; [J5 + J0] = J3;    Uses wrong J3 – remember TigerSHARC can handle parallel instructions
      YES: J3 = 0;; [J5 + J0] = J3;;

  30. Understand the error message
      Too many J resource usage = missing ;;
      Unintentionally doing the parallel instruction line
        [J5 + J0] = J2; J0 = J0 + 1;;

  31. Note: Missing label is not an assembler error, it’s a linker error
      Fix warnings
      DEFECT: may be days before you try to link, then hard to find

  32. NOW that the assembler knows where “CONTINUE” is, it can tell you that you have two JUMP instructions too close together
      • Fix with the magic 4 nops, and lose one cycle / loop

  33. Not getting expected test results
      Something is logically wrong (DEFECT)

  34. Obvious question – are we even getting into the loop?
      Add a BREAKPOINT to TEST code flow. (We don’t add BREAKPOINTS to follow code in detail)
      CODE NEVER GOT TO BREAKPOINT means the code never entered the loop
      Forgot to do count = 0, so we are not even getting into the loop, as there is a garbage value already in Count_J0 from code we executed earlier (DEFECT)
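The effect of the missing `count = 0` can be reproduced in a small C++ model of the loop control. Everything here is hypothetical scaffolding: `leftover_J0` stands in for whatever earlier code happened to leave in the count register, `N` for the loop limit, and the function only counts how many times the loop body would run.

```cpp
#include <cassert>   // only needed if the model is checked with assert()

// Hypothetical model of the loop control only; it counts passes through the body.
int CountLoopPasses(int leftover_J0, int N, bool initialize_count) {
    int Count_J0 = leftover_J0;          // garbage left behind by earlier code
    if (initialize_count) {
        Count_J0 = 0;                    // the "count = 0" that was forgotten
    }

    int passes = 0;
    while (Count_J0 < N) {               // same initial test as the real loop
        passes++;
        Count_J0++;
    }
    return passes;
}
```

With a large leftover value and no initialisation the loop body never runs, which matches the breakpoint never being reached. A small or negative leftover value would instead give the wrong number of passes, so the defect need not always show up the same way.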

  35. Not bad for a first effort
      Faster than the compiler in debug mode

  36. Where did the float ASM code suddenly appear from?
      • Integer 0 has bit pattern 0x0000 0000
      • Float 0.0 has bit pattern 0x0000 0000
      • Integer +6 has format b 0??? ???? ???? ???? ???? ???? ???? ????
      • Float +6.0 has format b 0??? ???????? ???? ???? ???? ???? ????
      • Integer -6 has format b 1??? ???? ???? ???? ???? ???? ???? ????
      • Float -6.0 has format b 1??? ???????? ???? ???? ???? ???? ????
      • Formats are very different, but the sign bit is in the same place
      • Float algorithm: if S == 1 (negative) set to zero, otherwise leave unchanged – same as the integer algorithm
      • Just re-use the integer algorithm with a change of name
      [Diagram label on the float format: EXPONENT]
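The sign-bit argument can be checked directly in C++. The sketch below assumes 32-bit IEEE 754 floats and two's-complement 32-bit integers, which holds on common platforms but is still an assumption; the function names are made up for this illustration. It copies the float's bit pattern into an integer, applies the integer rectify rule, and copies the result back.

```cpp
#include <cassert>   // only needed if the functions are checked with assert()
#include <cstdint>
#include <cstring>

// Return the raw 32-bit pattern of a float (assumes 32-bit IEEE 754 floats).
std::uint32_t FloatBits(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Integer rectify rule for one 32-bit word: sign bit set means "replace with zero".
std::int32_t RectifyWord(std::int32_t word) {
    return (word < 0) ? 0 : word;
}

// Float rectify done purely by re-using the integer rule on the bit pattern.
float RectifyFloatViaInteger(float value) {
    std::int32_t word;
    std::memcpy(&word, &value, sizeof word);
    word = RectifyWord(word);
    float result;
    std::memcpy(&result, &word, sizeof result);
    return result;
}
```

Because an all-zero word is both integer 0 and float 0.0, writing zero through the integer path gives a valid float result, which is why a renamed copy of the integer routine is enough.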

  37. Final code – Float rectify code just has a different name

  38. What we NOW KNOW
      • Can we return from an assembly language routine without crashing the processor?
      • Return a parameter from an assembly language routine
        • (Is it the same for ints and floats?)
      • Pass parameters into assembly language
        • (Is it the same for ints and floats?)
      • Do IF THEN ELSE statements
      • Read and write values to memory
      • Read and write values in a loop
      • Do some mathematics on the values fetched from memory
      All this stuff is demonstrated by coding HalfWaveRectifyASM( )
