CSC 3210 Computer Organization and Programming

CSC 3210 Computer Organization and Programming • Chapter 2: SPARC Architecture • D.M. Rasanjalee Himali

Outline • Introduction • Registers • SPARC Assembly Language Programming • Pipelining • The Debugger gdb • Filling Delay Slots • Branching • Control Statements

Introduction • The SPARC architecture is a load/store architecture. • All arithmetic and logical operations are carried out between operands located in registers. • Load and store instructions are provided to load register contents from memory and to store them back to memory. • The machine has 32 registers available to the programmer at any one time.

Registers • Registers provide rapid, direct access to operands in computation. • These registers are logically divided into four sets: • Global (%g0 - %g7): • for global data, data that have meaning to an entire program and are accessible from any function. • In (%i0 - %i7): • contain the arguments passed by the calling function. • Local (%l0 - %l7): • for local function variables; we will store our program variables in these registers. • Out (%o0 - %o7): • for use as temporaries, for passing arguments to functions, and for obtaining returned values from functions. • Special registers: %o6, %o7, %g0 • Each register stores a signed integer n, $-2^{31} \le n < 2^{31}$, or approximately $|n| < 2 \times 10^{9}$.


SPARC Assembly Language Programming • The SPARC assembler, as, is in effect a two-pass assembler. • First pass: • The assembler updates the location counter as it processes machine statements, without paying attention to undefined labels that might be used as operands. • Whenever it sees a label followed by a colon (:), it defines the label symbol to have the current value of the location counter. • Second pass: • The program is read a second time. • All the symbols and labels have now been defined, and whenever a label is encountered as an operand, its value is substituted for the symbol. • Labels followed by a colon (label definitions) are ignored.
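The two passes described above can be modeled in a few lines of Python. This is only a toy sketch, not how as is implemented: the function name assemble_labels and the sample program are hypothetical, every statement is assumed to occupy one 4-byte word, and a label operand is replaced by its absolute address (a real SPARC branch encodes a pc-relative displacement instead).

```python
def assemble_labels(lines, word_size=4):
    """Toy two-pass label resolution for a list of assembly source lines."""
    symbols = {}
    location = 0
    # Pass 1: advance the location counter and record each label's address.
    # Operands are not inspected, so forward references cause no trouble.
    for line in lines:
        text = line.split("!")[0].strip()      # drop '!' comments
        if ":" in text:
            label, text = text.split(":", 1)
            symbols[label.strip()] = location
            text = text.strip()
        if text:
            location += word_size              # assumed: one word per statement

    # Pass 2: label definitions are skipped; label operands are substituted.
    resolved = []
    for line in lines:
        text = line.split("!")[0].strip()
        if ":" in text:
            text = text.split(":", 1)[1].strip()
        if not text:
            continue
        parts = text.split(None, 1)
        operands = []
        if len(parts) > 1:
            operands = [op.strip() for op in parts[1].split(",")]
        operands = [str(symbols[op]) if op in symbols else op for op in operands]
        resolved.append((parts[0] + " " + ", ".join(operands)).strip())
    return symbols, resolved


program = [
    "main:   mov 0, %l0        ! hypothetical sample program",
    "loop:   add %l0, 1, %l0",
    "        ba loop           ! backward reference",
    "        nop",
    "        ba done           ! forward reference",
    "        nop",
    "done:   nop",
]
symbols, resolved = assemble_labels(program)
print(symbols)
print(resolved[4])
```

The forward reference to done is resolved only because the first pass has already recorded every label before any operand is looked at.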

SPARC Assembly Language Programming • Assembly language programs are line based: • Each statement typically specifies a single instruction or data element. • Statements may be labeled: • Label: an identifier followed by a colon. • Labels start at the beginning of a line; the instruction or data specification starts one tab stop in.

SPARC Assembly Language Programming • Comments: • Start at about the center of the line. • Commence with an exclamation point (!). • C-style comments may also be used, opening with a /* and closing with a */. These comments may extend over many lines. • Example:

SPARC Assembly Language Programming • Pseudo-Ops: • All machine instructions have mnemonics such as add and sub. • There are other statements, called pseudo-ops, that do not generate machine instructions: • Ex: data definitions, statements that provide the assembler with information. • Pseudo-ops generally start with a period. • Pseudo-ops may be labeled. • The .global pseudo-op defines a label to be accessible outside the program in which it is defined. • Ex: Define the label main to be global:

SPARC Assembly Language Programming • We use the C compiler to call the assembler as and to load our program. • All C programs have a “.c” file name extension. • The C compiler produces object files (with a “.o” extension). • Object files contain the machine code corresponding to the C source files. • The C compiler then calls the linker to combine all the object files with library routines to make an executable program. • This executable program is by default stored in a file called “a.out”.

SPARC Assembly Language Programming • Compiling a C program: • two-step process: • First, the compiler translates the C program into assembly language, placing the code in a file with a “.s” extension to indicate that it is assembly language. • The compiler then calls as to assemble this file to produce the “.o” file.

SPARC Assembly Language Programming • To see the assembly language for one of your C programs, call the compiler with the “-S” switch and it will produce only the “.s” assembly language file. • To have this assembled and made ready for execution, we would type: • This will assemble our program and place it in a file called expr ready for execution

SPARC Assembly Language Programming • The C compiler expects to start execution at an address main. • Thus, the label ‘main’ : • must appear in our program at the first statement we want executed, and • must be declared global by using the .global pseudo-op. • The first instruction to be executed should be: • The save instruction provides space to save our registers when the debugger is running.

SPARC Assembly Language Programming • Macros need to be expanded before we assemble our program • We write our program in a file with a .m extension, indicating that m4 must first be run to produce the .s file:

SPARC Assembly Language Programming • Ex: • Write a program to evaluate the equation from Chapter 1.

SPARC Assembly Language Programming • Most SPARC instructions take three operands: • two registers and a literal constant, or three registers: • The contents of the first source register, reg_rs1, are combined with the literal or with the contents of the second source register, reg_rs2, to produce a result. • The result is stored in the destination register, reg_rd. • The contents of the source registers are unchanged. • A literal constant, c, must lie in the range $-4096 \le c < 4096$.
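The limit on literal constants comes from the 13-bit signed immediate field of the instruction word, which can hold the values -4096 through 4095. A minimal Python sketch of that range check follows; the helper names fits_simm13 and encode_simm13 are hypothetical and are not part of any assembler.

```python
def fits_simm13(c):
    """True if c can be held in a 13-bit two's-complement immediate field."""
    return -(1 << 12) <= c < (1 << 12)


def encode_simm13(c):
    """Return the 13-bit field pattern for c, or raise if c is out of range."""
    if not fits_simm13(c):
        raise ValueError("literal %d does not fit in 13 signed bits" % c)
    return c & 0x1FFF   # keep the low 13 bits (two's complement)


print(fits_simm13(4095), fits_simm13(4096))
print(hex(encode_simm13(-1)))
```

A constant outside this range cannot be written as a literal operand and has to be built in a register first.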

SPARC Assembly Language Programming • Some other instructions: • clear a register to zero • copy the contents of one register to another • combine the contents of the two source registers, or of a source register and a literal, with the sum or difference going into the destination register • the second operand is subtracted from the first and the result is placed in the destination register

SPARC Assembly Language Programming • Multiplication and Division: • The SPARC architecture as presented here does not have a multiply or divide instruction. These operations are carried out by functions, such as .mul and .div, invoked with the call instruction. • To multiply: • To divide: • The result is placed in %o0. • The called function may use any of the first six out registers, %o0-%o5, possibly changing their contents. • These registers are for temporary results, and their contents are not preserved over function calls.

Pipelining • To achieve very fast execution, computers are pipelined. • The von Neumann cycle is broken up into its component parts. • For a RISC architecture the components are:

Pipelining • Sequential / Non-Pipelined Execution: • If each component of the cycle takes one machine cycle, it will take four cycles to execute each instruction. • Each component remains idle 75% of the time. • Pipelined Execution: • Each component operates independently and concurrently. • Ex: the instruction fetch component fetches the next instruction immediately after it has finished fetching the current instruction. • Once the pipeline is full, the pipelined machine can complete one instruction every machine cycle, four times the rate of the non-pipelined machine.
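The factor of four can be checked with a little arithmetic. The sketch below assumes an ideal four-stage pipeline with one machine cycle per stage and no stalls; the function names are hypothetical.

```python
STAGES = 4  # the four components of the cycle, one machine cycle each (assumed)


def sequential_cycles(n_instructions):
    """Each instruction passes through all stages before the next one starts."""
    return STAGES * n_instructions


def pipelined_cycles(n_instructions):
    """The first instruction fills the pipeline; each later one adds one cycle."""
    if n_instructions == 0:
        return 0
    return STAGES + (n_instructions - 1)


def speedup(n_instructions):
    return sequential_cycles(n_instructions) / pipelined_cycles(n_instructions)


for n in (1, 10, 1000):
    print(n, sequential_cycles(n), pipelined_cycles(n), round(speedup(n), 2))
```

For a single instruction there is no gain; the speedup approaches 4 only as the number of instructions grows, and any wasted cycle (see the delay problems that follow) lowers it.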

Pipelining • Problem 1: Load Delay • Ex: • When a load instruction that loads %o1 is executed, the data is not obtained until the end of the M cycle. • If the next instruction (add %o1, %o2, %o2) attempts to use this data, it would obtain the prior contents of the register! • The machine detects this and waits a cycle to allow the data to be obtained. • If you can insert an independent instruction between the load and the next instruction that uses the result of the load, no cycles are wasted.
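The load delay rule above can be expressed as a simple scan over an instruction sequence. This is a hedged sketch: the tuple format (mnemonic, destination, sources), the mnemonic ld, and the register choices in the sample sequences are assumptions made for illustration, and the model charges exactly one wasted cycle per load whose result is used by the very next instruction.

```python
def count_load_stalls(instructions):
    """Count cycles wasted because a load's result is used immediately.

    Each instruction is a tuple: (mnemonic, destination, sources).
    """
    stalls = 0
    for previous, current in zip(instructions, instructions[1:]):
        prev_op, prev_dest, _ = previous
        _, _, cur_sources = current
        if prev_op == "ld" and prev_dest in cur_sources:
            stalls += 1
    return stalls


# The add uses %o1 right after it is loaded: one cycle is wasted.
dependent = [
    ("ld", "%o1", ("%o0",)),
    ("add", "%o2", ("%o1", "%o2")),
]

# An unrelated instruction placed between the two hides the delay.
rescheduled = [
    ("ld", "%o1", ("%o0",)),
    ("sub", "%o3", ("%o4", "%o5")),
    ("add", "%o2", ("%o1", "%o2")),
]

print(count_load_stalls(dependent), count_load_stalls(rescheduled))
```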

Pipelining • Problem 2: Branch Delay • Occurs when a branch instruction is encountered, as a branch instruction changes the %pc. • Unfortunately, the branch target address is not available until after the execution of the branch instruction, and this is not until after the following instruction has been fetched. • Once again a cycle must be wasted. In this case, however, the machine does not insert a wait cycle but expects the programmer to insert some instruction that may be executed after the branch instruction. This is called a branch delay slot instruction.

Pipelining • It is frequently possible to place an instruction after the branch that can be usefully executed. • The machine makes the instruction following a branch usable by maintaining two program counters, %pc and %npc, the program counter and the next program counter. • The machine executes the instruction to which the %pc is pointing while at the same time fetching the instruction to which the %npc is pointing. • The instruction fetched is generally the one following the instruction being executed. • When a branch occurs, the instruction following the branch has already been fetched and will be executed.
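The interplay of %pc and %npc can be traced with a small Python model. It is a sketch under simplifying assumptions: instructions are 4 bytes, a taken branch only replaces the value that %npc would otherwise receive, and the program dictionary and its labels are hypothetical.

```python
def trace(program, steps):
    """Return the addresses executed, modeling the %pc and %npc pair.

    program maps an address to (kind, target); kind is "branch" or anything else.
    """
    pc, npc = 0, 4
    executed = []
    for _ in range(steps):
        kind, target = program[pc]
        executed.append(pc)
        # A branch changes where %npc goes next; it does not touch %pc directly.
        next_npc = target if kind == "branch" else npc + 4
        pc, npc = npc, next_npc
    return executed


program = {
    0: ("branch", 16),     # branch to address 16
    4: ("delay", None),    # delay slot: already fetched, so it executes
    8: ("skipped", None),
    12: ("skipped", None),
    16: ("target", None),
    20: ("after", None),
}

print(trace(program, 4))
```

The instruction at address 4 runs before control reaches address 16, while the instructions at 8 and 12 are never executed.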

Pipelining The left half and right half of the diagram execute simultaneously, with time running down the page.

Pipelining • Independent of what happens to the %npc, the instruction that was fetched before the branch is always executed. • When we call a function we are branching to another address in memory, and the instruction following the call instruction will be executed before the first instruction of the called function is executed. • The simplest thing to do following any branch instruction is to insert a nop. This is a mnemonic for “no operation” and is an instruction that does nothing to change the state of the machine:

Pipelining • We can now write our program to compute the expression for x = 9 given in Eq.1:

Pipelining • Trap Instruction: • The last two instructions in the program return us to the operating system. • The trap instruction ta calls the operating system with the service request encoded in register %g1. • A few of the traps are as follows:

Pipelining • Save the program as expr.m and run it through m4: • With the output redirected into expr.s, the following assembly code would be produced: • This could then be assembled and the executable output put into a file expr by: • If the program is then executed:

The Debugger gdb • A debugger is used to verify correctness, and to find bugs • The debugger gdb may also be used to execute a program, to stop execution at any point and to single-step execution • Having assembled the program, placing the output into expr as we did in the example above, gdb may be entered by typing: • To run the program in gdb, type “r”:

The Debugger gdb • A breakpoint may be set at any address. • When the computer is about to execute the instruction at which the breakpoint was set, it stops and returns to gdb, whereupon the program and its state of execution may be examined. • Typing “c” will tell gdb to continue execution from the breakpoint. • To set a breakpoint at a memory address, we need to type: • Ex: • The command “b” followed by a label sets a breakpoint at the instruction following the labeled instruction; gdb assumes the labeled instruction to be a save instruction:

The Debugger gdb • If we then run the program: • gdb informs us that we are at breakpoint 1, which should be the first instruction in our program. • The %pc will hold the address of that instruction, 0x106a8. • We can examine memory by typing “x” followed by an address: • The “i” format specifier states that the contents of the memory location should be interpreted as a machine instruction.

The Debugger gdb • In gdb all machine registers are referred to by a $ in place of the % used in as. • By typing a return we repeat the last command but with the address incremented by the size of the last data element typed out: • We may print the entire program by typing x/12i main, which will repeat the examine command 12 times:

The Debugger gdb • If we want to see whether the program ran correctly, we can set another breakpoint at the trap instruction located at main+44: • We would then command gdb to continue execution by typing “c” (remember we are currently stopped at the first location in our program): • The program executes and stops at the last breakpoint we set. At this point the result should be stored in register %l1. To print the contents of a register, we use the print command “p”: • This tells us that register %l1 contains -8, the correct value.

The Debugger gdb • What would happen if our program were incorrect and did not compute the correct value? • We could single-step the program starting at the beginning by typing “ni” for next machine instruction. • To do this at this point we would need to run the program again: • To see which instruction is being executed, examine the memory location to which the %pc is pointing:

The Debugger gdb • We have just executed the first instruction. • If we execute the second instruction by typing ni, %l0 should contain the value 9: • The “display” command prints the %pc value every time a command is executed:

The Debugger gdb • We are now about to execute the call to .mul: • Note that the delay slot instruction is executed before the first instruction of .mul. • To quit gdb and to return to the operating system:

Filling Delay Slots • The call instruction is a delayed control transfer instruction. • A delayed control transfer instruction changes the address from which future instructions will be fetched only after the instruction following it has been executed. • The instruction following the delayed control transfer instruction is called the delay instruction, and it is located in the delay slot. • Whenever a branch or call instruction is executed: • it changes the contents of the %npc, not the %pc. • The instruction that follows the branching instruction will be executed before the branch or call happens. • By filling the delay slot with a nop instruction we have not accomplished very much; the pipelined machine wastes an instruction execution every time it branches. • However, we may be able to move the instruction prior to the branch instruction into the delay slot.

Filling Delay Slots • In the following version of the program we have moved the sub instructions, which compute the final argument to .mul and .div, into the delay slots, thereby eliminating the nop instructions. • The resulting code does not lose any cycles at all.

Filling Delay Slots • Assume that we are executing the mov 9, %l0 instruction while at the same time fetching the sub %l0, 1, %o0 instruction. • Having fetched an instruction, we will execute it in the next cycle. • As the instruction executed was not a branch instruction, the next instruction following will be fetched.

Filling Delay Slots • Having fetched the call instruction, we will execute it in the next cycle. • As the instruction executed, sub %l0, 1, %o0, was not a branch instruction, the next instruction following will be fetched. • The execution of the call instruction will cause the next instruction to be fetched from the first location labeled by .mul. • Having fetched the sub %l0, 7, %o1 instruction, we will execute it. • Its execution occurs before the first instruction from .mul has even been fetched.

Filling Delay Slots • As the instruction executed, sub %l0, 7, %o1, was not a branching instruction, the next instruction following the instruction addressed by the %npc will be fetched while the instruction just fetched will be executed • Filling the delay slots in this manner makes reading the program more difficult, • but by filling the delay slots the resulting execution is faster and the size of the program smaller. • Care must be taken in filling delay slots to ensure that the algorithm is not changed. • In general, when we write assembly language programs we will be expected to fill all possible delay slots.

Branching • Branching is used in conjunction with testing. • Testing: • The state of execution after an instruction is saved in four variables, the integer condition codes: Z (zero), N (negative), V (overflow), and C (carry).

Branching • Moving instructions around could eliminate empty delay slots. • This causes a problem when we wish to branch conditionally, based on the result of a prior instruction, if that instruction was not executed immediately before the branch instruction. • This problem is solved in the SPARC architecture by having a duplicate set of computational instructions, such as add and sub, which, in addition to performing the arithmetic operation, set the condition codes. • These instructions have “cc” appended to the mnemonic (addcc, subcc), which indicates that the instruction is to set the condition codes Z, N, V, and C to save the state of the instruction execution.
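To make the four condition codes concrete, the following Python sketch models what a 32-bit subtraction that sets the condition codes would record. The function names are hypothetical; the model assumes 32-bit two's-complement registers, C set when the subtraction borrows, and a signed "less than" test of N xor V.

```python
MASK = 0xFFFFFFFF  # 32-bit registers


def subcc(a, b):
    """Return (result, flags) for the 32-bit subtraction a - b."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    flags = {
        "Z": int(result == 0),                    # result is zero
        "N": (result >> 31) & 1,                  # sign bit of the result
        # Overflow: operand signs differ and the result's sign differs from a's.
        "V": (((a ^ b) & (a ^ result)) >> 31) & 1,
        "C": int(a < b),                          # borrow (unsigned comparison)
    }
    return result, flags


def signed_less_than(flags):
    """Condition for a signed 'less than' branch after a - b: N xor V."""
    return bool(flags["N"] ^ flags["V"])


result, flags = subcc(3, 11)
print(hex(result), flags, signed_less_than(flags))
```

Because the codes are stored separately from the result, a branch can test them several instructions after they were set, provided no other instruction that sets the condition codes has run in between.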

Branching • If the condition specified is met, a branch instruction branches to the specified label. • Like the call instruction, branch instructions are delayed control transfer instructions: the following instruction will be executed before the effect of the branch takes place. • The delay slot of a conditional branch instruction may not be filled with another branching instruction. • Branch instructions test the condition codes in order to determine whether the branching condition exists:

Branching • Ex: evaluate the expression in Chapter 1 for integer values of x from 0 up to 10. • Translated Assembly language program: • C program: • The bl instruction is followed by a nop instruction in the delay slot. • We cannot fill the delay slot as we did in the case of the call instruction, simply by moving the instruction immediately before the branch into the slot, as that instruction sets the condition codes to be evaluated by the bl instruction.
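The shape of this loop, with the test at the bottom and a "branch if less" back to the top, can be sketched in Python. Several things here are assumptions rather than content recovered from the slides: the Chapter 1 expression is taken to be y = (x - 1)(x - 7)/(x - 11), which is consistent with the sub instructions and the result -8 for x = 9 mentioned elsewhere in this chapter; division is assumed to truncate toward zero as C integer division does; and the loop bound is written as x < 11.

```python
def sdiv(a, b):
    """Signed integer division truncating toward zero (assumed behavior)."""
    quotient = abs(a) // abs(b)
    return quotient if (a >= 0) == (b >= 0) else -quotient


def expr(x):
    """Assumed Chapter 1 expression: (x - 1) * (x - 7) / (x - 11)."""
    product = (x - 1) * (x - 7)     # stands in for the multiply call
    return sdiv(product, x - 11)    # stands in for the divide call


def run_loop(limit=11):
    """Evaluate expr for x = 0 .. limit-1 with the test at the loop bottom."""
    results = []
    x = 0
    while True:                     # top of the loop body
        results.append(expr(x))
        x += 1                      # increment x
        if not (x < limit):         # compare, then branch back if less
            break
    return results


print(run_loop())
```

Placing the test at the bottom means the body always runs at least once, which is acceptable here because the starting value 0 is known to be below the limit.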

Branching • Modified .s version: • It may be possible to rearrange the code before the conditional branch statement: • We are now free to move the mov instruction into the delay slot.

Branching • When we execute the program we will need to set a breakpoint at loop to print out the value of y: • This works well but involves a lot of typing.

Branching • We can program gdb to do this for us with the commands command. • This command specifies a number of commands to be executed when the breakpoint is reached. • Its argument is the breakpoint at which the commands are to be executed. In our case it is breakpoint 2 (the first breakpoint is set at main).

Branching • This informs gdb that when it reaches breakpoint 2, it is to print out the contents of register %l1 and then to continue:

Control Statements • While: • The while loop causes some problems in assembly language. • Consider translating the following while statement into assembly language:
