Generating a software loop with memory accesses: TigerSHARC assembly syntax
Concepts
• Learning just enough TigerSHARC assembly code to make a software loop “work”
• Comparing the timings for rectification of integer and floating point arrays, using
  • Debug C++ code
  • Release C++ code
  • Our FIRST_ASM code
• Looking in “MIXED mode” at the code generated by the compiler
TigerSHARC assemble code 2, M. Smith, ECE, University of Calgary, Canada
Test Driven Development
Work with the customer to check that the tests properly express what the customer wants done. This is an iterative process with the customer “heavily involved”: the “Agile” methodology.
(Diagram labels: CUSTOMER, DEVELOPER)
Note
• Special marker
• Compiler optimization
  • FLOATS: 927, 304 -- THREE FOLD
  • INTS: 960, 150 -- SIX FOLD
• Why the difference, can we do better, and do we want to?
• Note the failures. What are they?
Write tests about passing values back from an assembly code routine
More detailed look at the code
• As with 68K and Blackfin, the code needs a .section, but the name and format are different
• As with 68K, the code needs an .align statement. Is the “4” in bytes (8 bits) or words (32 bits)?
• As with 68K, the code needs .global to tell other code that this function exists
• Single semi-colons
• Double semi-colons
• Start function label
• End function label
• Used for “profiling code”
• Label format is similar to 68K: needs a leading underscore and a final colon
Return registers
• There are many, depending on what you need to return
• Here we need to use J8
• Many registers are available, so we need the ability to control usage
  • J0 to J31: registers (integers and pointers) (SISD mode)
  • XR0 to XR31: registers (integers) (SISD mode)
  • XFR0 to XFR31: registers (floats) (SISD mode)
• Did I also mention
  • K0 to K31: registers (integers and pointers) (SISD mode)
  • YR0 to YR31, YFR0 to YFR31 (SIMD mode)
  • XYR, YXR and R registers (SIMD mode)
  • And also the MIMD modes
  • And the double registers and the quad registers ...
    #define return_pt_J8 J8 // J8 is a VOLATILE, NON-PRESERVED register
Parameter passing
• Spaces for the first four parameters ARE ALWAYS present on the stack (as with 68K)
• But the first four parameters are passed in registers (J4, J5, J6 and J7 most of the time) (as with MIPS)
• The parameters passed in registers are often stored into the spaces on the stack (like the MIPS) when assembly code functions call assembly code functions
• J4, J5, J6 and J7 are volatile, non-preserved registers
Can we pass back the start of the final array?
• Still passing tests by accident
• This needs to be a conditional return value
What we need to know based on experiences from other processors
• Can we return from an assembly language routine without crashing the processor?
• Return a parameter from an assembly language routine
  • (Is it the same for ints and floats?)
• Pass parameters into assembly language
  • (Is it the same for ints and floats?)
• Do IF THEN ELSE statements
• Read and write values to memory
• Read and write values in a loop
• Do some mathematics on the values fetched from memory
All this stuff is demonstrated by coding HalfWaveRectifyASM( )
Why is ELSE a keyword?
FOUR PART ELSE INSTRUCTION IS LEGAL
    IF JLT; ELSE, J1 = J2 + J3;    // Conditional execution -- if the condition is false
            ELSE, XR1 = XR2 + XR3; // Conditional -- if the condition is false
            YFR1 = YFR2 + YFR3;;   // Unconditional -- always
    IF JLT; DO, J1 = J2 + J3;      // Conditional execution -- if true
            DO, XR1 = XR2 + XR3;   // Conditional -- if true
            YFR1 = YFR2 + YFR3;;   // Unconditional -- always
Having this sort of format means that the instruction pipeline is not disrupted when we do IF statements.
Label name is not the problem
NOTE: This is “C-like” syntax, but it is not “C”. A statement must end in ;; and not ;
Add dual semi-colons everywhere. Worry about “multiple issues” later
• This dual semi-colon is so important that you MUST code review for it all the time, or else you waste so much time in the Lab. Key in exams / quizzes
• At last, an error I know how to fix
Well, I thought I understood it !!!
• Speed issue: JUMPS can’t be too close together
• Not normally a problem when the “if” is larger
Add a single instruction of 4 NOPs
    nop; nop; nop; nop;;
• Fix the remaining error in handling the IF THEN ELSE as part of Assignment 1
• Worry about code efficiency later (refactor), when all the code is working
What we need to know based on experiences from other processors
• Can we return from an assembly language routine without crashing the processor?
• Return a parameter from an assembly language routine
  • (Is it the same for ints and floats?)
• Pass parameters into assembly language
  • (Is it the same for ints and floats?)
• Do IF THEN ELSE statements
• Read and write values to memory
• Read and write values in a loop
• Do some mathematics on the values fetched from memory
All this stuff is demonstrated by coding HalfWaveRectifyASM( )
Target for this week: changing this code into assembly (more speed)
• The code we generated yesterday was similar to parts of this, but not equivalent. Refactor to make it equivalent
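The C++ code shown on this slide did not survive extraction. As a rough sketch only, half-wave rectification of an integer array might look like the following. The name HalfWaveRectifySketch, its parameter list, and the choice to return a null pointer for bad input are assumptions made for illustration, not the course's actual code.

```cpp
#include <cstddef>

// Hypothetical reference version (names and signature are assumptions).
// Copies each input value to the output array, replacing negative values
// with zero. Returns the start of the output array, or nullptr when the
// arguments are unusable (a "conditional return value").
int* HalfWaveRectifySketch(const int* initial_array, int* final_array, int count) {
    if (initial_array == nullptr || final_array == nullptr || count <= 0) {
        return nullptr;
    }
    for (int i = 0; i < count; ++i) {
        int value = initial_array[i];          // read from memory
        final_array[i] = (value < 0) ? 0 : value;  // IF THEN ELSE, then write
    }
    return final_array;
}
```

This one routine exercises every item on the "what we need to know" list: parameter passing, a return value, an IF THEN ELSE, memory reads and writes, and a loop.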
The code was not exactly what we designed (C++ equivalent). Refactor, and retest after the refactoring.
NEXT STEP
Refactored C++ code
• I THINK I UNDERSTAND ENOUGH TO CHANGE THE FORMAT OF THE IF-THEN-ELSE IN THIS CASE
• Avoiding JUMPS in the main flow of the code will speed the flow of the code
• Almost right. Look in the manual to find the correct syntax
    IF NJLE; DO, J8 = 0
No syntax errors (No ERRORS). Code does not work (DEFECTS)
Run “forensic tests” to find out where DEFECT is being introduced
Add another line to the code. Can now spot the error
• The new format of the IF-THEN-ELSE is doing exactly the opposite of what we want
• Need JLE, not NJLE
Assignment 1: code the following as a software loop, following the MIPS approach
    int CalculateSum(void) {
        int sum = 0;
        for (int count = 0; count < 6; count++) {
            sum = sum + count;
        }
        return sum;
    }
Reminder: a software for-loop becomes a “while loop” with an initial test
    int CalculateSum(void) {
        int sum = 0;
        int count = 0;
        while (count < 6) {
            sum = sum + count;
            count++;
        }
        return sum;
    }
Do a line by line translation
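One way to prepare for the line by line translation is to rewrite the while loop using only a conditional jump and an unconditional jump, which is the shape the assembly code will take. The sketch below is an illustration in C++; the function name CalculateSumGotoSketch and the label names are made up for this example.

```cpp
// Same algorithm as CalculateSum(), written with explicit jumps so that
// each C++ line maps onto roughly one assembly instruction line.
int CalculateSumGotoSketch(void) {
    int sum = 0;
    int count = 0;        // forgetting this leaves garbage in the counter
LOOP_TEST:
    if (count >= 6) goto LOOP_END;   // initial test: leave when (count < 6) fails
    sum = sum + count;
    count++;
    goto LOOP_TEST;                  // jump back to the loop control
LOOP_END:
    return sum;
}
```

For the loop limits used here, the function returns 0 + 1 + 2 + 3 + 4 + 5 = 15, the same as the for-loop and while-loop versions.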
USE SOFTWARE LOOP HERE. Do loop control first
• Have some jumps too close together
Run the tests with 4 nop padding to check that we get out of the loop as expected
Accessing memory
• Basic mode
• Special register J31 acts as zero when used in additions
• Pt_J5 is a pointer register into an array
• Value_J1 is being used as a data register
• J registers are like MIPS registers (used as both pointer and data), NOT like 68K or Blackfin registers (either data or address, but not both)
• Value_J1 = [Pt_J5];; reads the value from the memory location pointed to by J5. Compare to Blackfin: Value_R0 = [Pt_P0];;
• Value_J1 = [Pt_J5 + J31];; reads the value from the memory location pointed to by J5. I read somewhere that this CAN be faster than just Value_J1 = [Pt_J5];; -- NEED TO CONFIRM
Accessing memory: step 2
• Basic mode
• Pt_J5 is a pointer register into an array
• Offset_J4 is used as an offset
• Value_J1 is being used as a data register
• Read_J1 = [Pt_J5 + Offset_J4];; reads the value from the memory location pointed to by (J5 + J4). PRE-MODIFY: the address used is J5 + J4, with no change in J5
• Read_J1 = [Pt_J5 += Offset_J4];; reads the value from the memory location pointed to by J5, and then performs the add. POST-MODIFY: the address used is J5, then J5 = J5 + J4 is performed
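The difference between the two addressing modes can be modelled in C++ with pointer arithmetic. This is only an analogy: the struct ReadResult and both function names are invented for this sketch, and it assumes the offset counts array elements, in the same way that the assembly offset counts 32-bit words.

```cpp
// Result of a modelled memory read: the value fetched and the pointer
// register contents after the instruction completes.
struct ReadResult {
    int value;
    int* pointer_after;
};

// Models  Read_J1 = [Pt_J5 + Offset_J4];;  (PRE-MODIFY)
// Address used is pt + offset; the pointer itself is left unchanged.
ReadResult ReadPreModify(int* pt, int offset) {
    return { *(pt + offset), pt };
}

// Models  Read_J1 = [Pt_J5 += Offset_J4];;  (POST-MODIFY)
// Address used is pt; the pointer is advanced by offset afterwards.
ReadResult ReadPostModify(int* pt, int offset) {
    int value = *pt;
    pt += offset;
    return { value, pt };
}
```

Post-modify is the natural fit for walking through an array in a loop, since the read and the pointer update happen in one instruction.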
Add in the memory accesses
• Forgot that TigerSHARC is a RISC PROCESSOR: LOAD/STORE ONLY, like MIPS
• Must place the value into a register, and then copy the register to memory
Understand the error message
“Too many J resource usage” = missing ;;
Note: Missing label is not an assembler error, it’s a linker error
Now that the assembler knows where “CONTINUE” is, it can tell you that you have two JUMPs too close together
• Fix with the magic 4 nops, and lose one cycle
Not getting the expected test results. Something is logically wrong (DEFECT)
Obvious question: are we even getting into the loop? Add BREAKPOINT to test (not to code follow)
• NEVER GOT TO BREAKPOINT means we never entered the loop
• Forgot to do count = 0, so we are not even getting into the loop, as there is a garbage value in Count_J0
Not bad for a first effort. Faster than the compiler in debug mode
Where did the float ASM code suddenly appear from?
• Integer 0 has bit pattern 0x0000 0000
• Float 0.0 has bit pattern 0x0000 0000
• Integer +6 has format b 0??? ???? ???? ???? ???? ???? ???? ????
• Float +6.0 has format b 0??? ???? ???? ???? ???? ???? ???? ????
• Integer -6 has format b 1??? ???? ???? ???? ???? ???? ???? ????
• Float -6.0 has format b 1??? ???? ???? ???? ???? ???? ???? ????
• The formats are very different, but the sign bit is in the same place
• Float algorithm: if S == 1 (negative), set to zero; otherwise leave unchanged. This is the same as the integer algorithm
• Just re-use the integer algorithm with a change of name
(Figure label: EXPONENT)
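The sign-bit argument above can be checked from C++ by treating the bits of a float as a 32-bit integer. The sketch below assumes a 32-bit IEEE-754 float, which the static_assert only partly verifies; the function name RectifyFloatViaIntegerTest is invented for this example.

```cpp
#include <cstdint>
#include <cstring>

// Rectifies a float using only an integer comparison on its bit pattern.
// Assumes float is a 32-bit IEEE-754 value (sign bit in the top bit).
float RectifyFloatViaIntegerTest(float x) {
    static_assert(sizeof(float) == sizeof(std::int32_t),
                  "sketch assumes a 32-bit float");
    std::int32_t bits;
    std::memcpy(&bits, &x, sizeof bits);   // reinterpret the bits, no conversion

    if (bits < 0) {   // sign bit set, exactly the integer "negative" test
        bits = 0;     // all-zero pattern is both integer 0 and float 0.0
    }

    float result;
    std::memcpy(&result, &bits, sizeof result);
    return result;
}
```

Because the comparison looks only at the sign bit, any pattern with that bit set, including -0.0, is replaced by +0.0, and positive values pass through untouched.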
Final code: the float rectify code just has a different name
What we NOW KNOW
• Can we return from an assembly language routine without crashing the processor?
• Return a parameter from an assembly language routine
  • (Is it the same for ints and floats?)
• Pass parameters into assembly language
  • (Is it the same for ints and floats?)
• Do IF THEN ELSE statements
• Read and write values to memory
• Read and write values in a loop
• Do some mathematics on the values fetched from memory
All this stuff is demonstrated by coding HalfWaveRectifyASM( )