Computer Architecture and System Programming Laboratory TA Session 7 x87 FPU
The x87 Floating-Point Unit (FPU) provides high-performance floating-point processing capabilities:
• floating-point, integer, and packed BCD integer data types
• floating-point processing algorithms
• exception handling
• IEEE Standard 754 conformance
http://home.agh.edu.pl/~amrozek/x87.pdf
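The x87 FPU is a separate execution environment. It consists of eight 80-bit data registers (R0 to R7) and the following special-purpose registers: the status register, control register, tag word, last instruction pointer, last data (operand) pointer, and opcode register. A value loaded from memory into an x87 FPU data register is automatically converted into double extended-precision floating-point format.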
x87 FPU instructions treat the eight x87 FPU data registers as a register stack. The register number of the current top-of-stack register is stored in the TOP (stack TOP) field of the x87 FPU status word.
Load operations decrement TOP by one and load a value into the new top-of-stack register (a push). Store-and-pop operations such as FSTP store the value from the current TOP register to memory and then increment TOP by one (a pop). TOP wraps around modulo 8.
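The push/pop behaviour of TOP can be modelled in a few lines of C. This is a minimal sketch, not how the hardware is built. The names `FpuStack`, `fpu_push`, `fpu_pop`, and `fpu_st` are hypothetical. It assumes TOP starts at 0, as after FINIT, and that ST(i) names physical register (TOP + i) mod 8.

```c
/* Hypothetical model of the x87 register stack: eight physical
   registers R0..R7 plus the 3-bit TOP field of the status word. */
typedef struct {
    long double r[8];   /* physical registers R0..R7 */
    unsigned top;       /* TOP field, 0..7 */
} FpuStack;

/* Load (push): decrement TOP modulo 8, then write the new top register. */
void fpu_push(FpuStack *s, long double v) {
    s->top = (s->top - 1u) & 7u;
    s->r[s->top] = v;
}

/* Store-and-pop: read the current top register, then increment TOP modulo 8. */
long double fpu_pop(FpuStack *s) {
    long double v = s->r[s->top];
    s->top = (s->top + 1u) & 7u;
    return v;
}

/* ST(i) is relative to TOP: physical register (TOP + i) mod 8. */
long double fpu_st(const FpuStack *s, unsigned i) {
    return s->r[(s->top + i) & 7u];
}
```

Starting from TOP = 0, the first push wraps TOP to 7 and the second sets it to 6. After those two pushes, ST(1) refers to physical register R7.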
The 16-bit x87 FPU status register indicates the current state of the x87 FPU, including the 3-bit TOP field (bits 13 to 11).
The 16-bit tag word indicates the contents of each of the 8 registers in the x87 FPU data-register stack (one 2-bit tag per register). Each tag in the tag word corresponds to a physical register (R0 to R7), not to a stack position. The TOP pointer is used to associate tags with registers relative to ST(0).
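Because tags are indexed by physical register, finding the tag of ST(i) takes two steps. First read TOP from the status word, then map ST(i) to (TOP + i) mod 8. The sketch below shows this. It assumes the full 16-bit tag word (for example, as stored by FSTENV). The function names are hypothetical. The tag values are 00 valid, 01 zero, 10 special, and 11 empty.

```c
#include <stdint.h>

/* 2-bit tag values */
enum { TAG_VALID = 0, TAG_ZERO = 1, TAG_SPECIAL = 2, TAG_EMPTY = 3 };

/* TOP occupies bits 13..11 of the status word. */
unsigned fpu_top(uint16_t status_word) {
    return (status_word >> 11) & 7u;
}

/* Tag of ST(i): map ST(i) to physical register (TOP + i) mod 8,
   then extract that register's 2-bit field from the tag word. */
unsigned fpu_tag_of_st(uint16_t tag_word, uint16_t status_word, unsigned i) {
    unsigned phys = (fpu_top(status_word) + i) & 7u;
    return (tag_word >> (2u * phys)) & 3u;
}
```

For example, tag word 3FFFH with TOP = 7 means that only R7 holds a valid value. ST(0) is therefore valid and ST(1), which is R0, is empty.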
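var1: dt 5.6        ; 80-bit double extended precision
var2: dq 2.4        ; FMUL accepts only 32- or 64-bit memory operands, so a 64-bit double is used
var3: dt 3.8
var4: dq 10.3
; assume TOP = 5 before the first load
fld  tword [var1]   ; st0 = 5.6, TOP = 4
fmul qword [var2]   ; st0 = st0 * 2.4 = 13.44, TOP = 4
fld  tword [var3]   ; st0 = 3.8, st1 = 13.44, TOP = 3
fmul qword [var4]   ; st0 = st0 * 10.3 = 39.14, st1 = 13.44, TOP = 3
fadd st1            ; st0 = st0 + st1 = 52.58, st1 = 13.44, TOP = 3
gdb command to view the data-register stack: tui reg float (or info float)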
x87 FPU recognizes and operates on the following seven data types: single-precision floating point, double-precision floating point, double extended-precision floating point, signed word integer, signed doubleword integer, signed quadword integer, and packed BCD decimal integers.
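Packed BCD is the least familiar of these types. In the x87 format (used by FBLD/FBSTP), it occupies 10 bytes:
• bytes 0 to 8 hold 18 decimal digits, two per byte, least significant first;
• bit 7 of byte 9 is the sign.

The encoder below is a minimal sketch of that layout. The function name `to_packed_bcd` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Encode value in the 80-bit x87 packed BCD layout.
   Returns 0 (and leaves out unspecified) if |value| needs more than 18 digits. */
int to_packed_bcd(int64_t value, uint8_t out[10]) {
    const int64_t limit = 999999999999999999LL;   /* 10^18 - 1 */
    if (value > limit || value < -limit) return 0;
    memset(out, 0, 10);
    uint64_t mag = value < 0 ? (uint64_t)(-value) : (uint64_t)value;
    for (int i = 0; i < 9; i++) {                  /* two digits per byte */
        unsigned lo = (unsigned)(mag % 10); mag /= 10;
        unsigned hi = (unsigned)(mag % 10); mag /= 10;
        out[i] = (uint8_t)((hi << 4) | lo);
    }
    if (value < 0) out[9] = 0x80;                  /* sign bit */
    return 1;
}
```

For example, 1234 is stored as bytes 34H, 12H followed by zeros, and -9 sets bit 7 of the last byte.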
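IEEE 754 standard
(Figure: an integer number in RAM and the corresponding floating-point number in the x87 data-register stack.)
Example:
mov qword [n], 9   ; store the 64-bit integer 9 in memory (there is no 80-bit integer MOV)
fild qword [n]     ; FILD accepts 16-, 32- or 64-bit integers; push 9.0 onto the register stack
9 = 1001b = 1.001b × 2^3, so in ST(0): sign bit = 0, exponent = 11b = 3 (stored with the double extended-precision bias 16383, i.e. as 16386), significand = 1.001b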
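The same decomposition can be checked in C on a 64-bit IEEE 754 double. A double uses bias 1023 and an implicit leading 1, while the x87 register format uses bias 16383 and stores the leading 1 explicitly. The sign, unbiased exponent, and fraction of 9 come out the same way. This is a minimal sketch. `DoubleParts` and `split_double` are hypothetical names, and the input is assumed to be a normal (non-denormal, finite) number.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    unsigned sign;       /* 1 bit */
    int exponent;        /* unbiased exponent */
    uint64_t fraction;   /* 52 stored fraction bits; the leading 1 is implicit */
} DoubleParts;

/* Split a normal IEEE 754 double into its fields. */
DoubleParts split_double(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);                     /* reinterpret the bytes safely */
    DoubleParts p;
    p.sign = (unsigned)(bits >> 63);
    p.exponent = (int)((bits >> 52) & 0x7FF) - 1023;    /* remove the double bias */
    p.fraction = bits & ((UINT64_C(1) << 52) - 1);
    return p;
}
```

For 9.0 the result is sign 0 and exponent 3. The fraction has only bit 49 set, which is the binary fraction .001.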
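FPU INSTRUCTION SET
x87 FPU instructions belong to the ESC (escape) instructions. They share a common opcode format in which the first byte of the opcode is one of D8H through DFH.
Load-constant instructions push commonly used constants onto ST(0): FLDZ (+0.0), FLD1 (+1.0), FLDPI (π), FLDL2T (log2 10), FLDL2E (log2 e), FLDLG2 (log10 2), FLDLN2 (ln 2).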
Basic Arithmetic Instructions
Reverse instructions (FSUBR, FDIVR and their variants) swap the order of the operands. For example, FSUB computes ST(0) - source, whereas FSUBR computes source - ST(0).
Operands in memory can be in single-precision floating-point, double-precision floating-point, word-integer, or doubleword-integer format. They are converted to double extended-precision floating-point format automatically.
The pop versions of the instructions (FADDP, FSUBP, FMULP, FDIVP, ...) pop the x87 FPU register stack after the arithmetic operation. They operate on the values in ST(i) and ST(0), store the result in ST(i), and pop ST(0).
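The operand order of the reverse and pop forms is easy to mix up, so here is a toy model. `ToyStack` and the function names are hypothetical. In the model, st[0] plays the role of ST(0), and a pop is modelled by shifting entries down. The real FPU only increments TOP.

```c
/* Toy register stack: st[0] is ST(0), st[1] is ST(1), ...; depth = used entries. */
typedef struct { double st[8]; int depth; } ToyStack;

/* FSUB m64fp: ST(0) <- ST(0) - m */
void fsub_mem(ToyStack *s, double m)  { s->st[0] = s->st[0] - m; }

/* FSUBR m64fp (reverse): ST(0) <- m - ST(0) */
void fsubr_mem(ToyStack *s, double m) { s->st[0] = m - s->st[0]; }

/* FSUBP ST(i), ST(0): ST(i) <- ST(i) - ST(0), then pop ST(0) */
void fsubp(ToyStack *s, int i) {
    s->st[i] = s->st[i] - s->st[0];
    for (int k = 0; k + 1 < s->depth; k++)   /* after the pop, ST(k+1) becomes ST(k) */
        s->st[k] = s->st[k + 1];
    s->depth--;
}
```

The example starts with ST(0) = 10 and ST(1) = 4:
• FSUB with 3 gives 7;
• FSUBR with 3 then gives 3 - 7 = -4;
• FSUBP ST(1), ST(0) leaves 4 - (-4) = 8 as the only entry.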
Control Instructions
FINIT/FNINIT initialize the x87 FPU and its internal registers to default values.
Stack overflow and underflow exceptions
• Stack overflow: an instruction attempts to load a value (for example, from memory) into a non-empty x87 FPU register. A non-empty register is one containing a zero (tag value 01), a valid value (tag value 00), or a special value (tag value 10).
• Stack underflow: an instruction references an empty x87 FPU register as a source operand, including an attempt to write the contents of an empty register to memory. An empty register has tag value 11.
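Both conditions can be detected from the tags alone:
• a push overflows when the register it would write, physical (TOP - 1) mod 8, is not empty;
• a read underflows when ST(0) is empty.

The model below is a minimal sketch with hypothetical names. It only reports the condition. The real FPU also sets the IE and SF flags in the status word, and it does not tag special values (10) here.

```c
#include <stdbool.h>

enum { TAG_VALID = 0, TAG_ZERO = 1, TAG_SPECIAL = 2, TAG_EMPTY = 3 };

typedef struct {
    double r[8];        /* physical registers R0..R7 */
    unsigned tag[8];    /* one tag per physical register */
    unsigned top;
} Fpu;

/* Like FINIT: TOP = 0 and every register tagged empty (tag word FFFFH). */
void fpu_init(Fpu *f) {
    f->top = 0;
    for (int i = 0; i < 8; i++) { f->r[i] = 0.0; f->tag[i] = TAG_EMPTY; }
}

/* Push v; returns false on stack overflow (destination register not empty). */
bool fpu_load(Fpu *f, double v) {
    unsigned dst = (f->top - 1u) & 7u;
    if (f->tag[dst] != TAG_EMPTY) return false;
    f->top = dst;
    f->r[dst] = v;
    f->tag[dst] = (v == 0.0) ? TAG_ZERO : TAG_VALID;
    return true;
}

/* Store ST(0) to *out and pop; returns false on stack underflow (ST(0) empty). */
bool fpu_store_pop(Fpu *f, double *out) {
    if (f->tag[f->top] == TAG_EMPTY) return false;
    *out = f->r[f->top];
    f->tag[f->top] = TAG_EMPTY;
    f->top = (f->top + 1u) & 7u;
    return true;
}
```

After FINIT, the following happens:
• popping immediately underflows;
• eight loads succeed;
• a ninth load overflows because it would overwrite the register holding the first value.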
Magic square http://www.1728.org/magicsq1.htm
An n x n magic square uses the numbers 1 to n² so that each row, each column, and both diagonals sum to n(n² + 1) ÷ 2. For the 3 x 3 magic square this is 3 × (3² + 1) ÷ 2 = 15. For odd n, the square is filled as follows:
1) '1' goes in the middle of the top row.
2) Each subsequent number is placed one column to the right and one row up from the previous number.
3) Whenever the next number placement is above the top row, stay in that column and place the number in the bottom row.
4) Whenever the next number placement is outside the rightmost column, stay in that row and place the number in the leftmost column.
5) When encountering a filled-in square, place the next number directly below the previous number.
6) When the next number position is outside both a row and a column, place the number directly below the previous number.
section .data
    fs_usage:         db "Call with a single, positive, odd number", 10, 0
    fs_malloc_failed: db "A call to calloc() failed", 10, 0
    fs_long:          db "%*ld", 0
    fs_newline:       db 10, 0

section .bss
    argv:   resq 1
    n:      resq 1
    n2:     resq 1
    a:      resq 1
    b:      resq 1
    table:  resq 1
    width:  resq 1
extern printf, atoi, calloc
global main

section .text
main:
    enter 0, 0
    finit                        ; initialize the x87 FPU and its internal registers to default values;
                                 ; the tag word is set to FFFFH, which marks all x87 FPU data registers as empty
    mov qword [argv], rsi        ; save argv
    cmp rdi, 2                   ; argc must be 2
    jne .error
    mov rdi, qword [argv]
    mov rdi, qword [rdi + 8*1]   ; argv[1]
    call atoi
    movsxd rax, eax              ; atoi returns a 32-bit int; sign-extend it to 64 bits
    cmp rax, 2
    jle .error                   ; n must be at least 3
    test rax, 1                  ; is n odd? same test as "and rax, 1", but without changing rax
    jz .error
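    mov qword [n], rax
    mov rdi, rax                 ; calloc(n, 8): array of n row pointers
    mov rsi, 8
    call calloc
    cmp rax, 0
    je .malloc_failed
    mov qword [table], rax
    mov rdx, rax                 ; rdx = pointer to the current row slot
    mov rax, 0                   ; rax = number of rows allocated so far
    mov rbx, qword [n]
.allocate_table:
    cmp rax, rbx                 ; check if we reached the end of the table
    je .fill_table               ; if yes, finish the allocation and start filling the table
    mov rdi, rbx
    mov rsi, 8                   ; the assembler encodes this as "mov esi, 8" (a 32-bit write zero-extends into rsi), which is what gdb shows
    push rax                     ; rax and rdx are caller-saved (rbx is callee-saved);
    push rdx                     ; two pushes also keep rsp 16-byte aligned for the call
    call calloc                  ; allocate a single row of the table: calloc(n, 8)
    pop rdx
    mov qword [rdx], rax         ; store the row pointer
    pop rax
    add rdx, 8
    add rax, 1
    jmp .allocate_table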
.fill_table:
    mov rbx, 0                       ; a = 0 (row)
    mov r9, qword [n]                ; n
    mov rcx, r9
    shr rcx, 1                       ; b = n / 2 (column)
    mov r8, 1                        ; i = 1
    mov rax, r9
    mul rax                          ; rdx:rax = n * n
    mov r10, rax                     ; r10 = n^2
.fill_table_loop:
    cmp r8, r10                      ; stop once i > n^2
    jg .fill_table_done
    mov rdi, qword [table]           ; rdi = pointer to table
    mov rdi, qword [rdi + 8 * rbx]   ; rdi = pointer to row a of the table
    mov qword [rdi + 8 * rcx], r8    ; table[a][b] = i
    inc r8                           ; i = 1, 2, 3, ...
    lea rax, [rbx + r9 - 1]          ; one row up: a = (a + n - 1) mod n
    xor edx, edx                     ; div uses rdx:rax, so clear rdx
    div r9
    mov rbx, rdx
    lea rax, [rcx + 1]               ; one column right: b = (b + 1) mod n
    xor edx, edx
    div r9
    mov rcx, rdx
    mov rdi, qword [table]
    mov rdi, qword [rdi + 8 * rbx]
    cmp qword [rdi + 8 * rcx], 0     ; is the new cell empty?
    je .fill_table_loop
    lea rax, [rbx + 2]               ; cell taken: place the number below the previous one
    xor edx, edx
    div r9
    mov rbx, rdx                     ; a = (a + 2) mod n
    lea rax, [rcx + r9 - 1]
    xor edx, edx
    div r9
    mov rcx, rdx                     ; b = (b + n - 1) mod n
    jmp .fill_table_loop
.fill_table_done:
    fild qword [n]    ; FILD (load integer) converts an integer operand in memory into double extended-precision
                      ; format and pushes it onto the register stack: st0 = n
    fld st0           ; FLD pushes a floating-point operand (here a copy of ST(0)) onto the stack: st0 = st1 = n
    fmulp             ; multiply ST(1) by ST(0), store the result in ST(1), and pop: st0 = n^2
    fxtract           ; extract exponent and significand (base 2): st0 = significand m, st1 = exponent e
    fld1              ; push +1.0: st0 = 1.0, st1 = m, st2 = e
    fxch              ; with no operand, exchange ST(0) and ST(1): st0 = m, st1 = 1.0, st2 = e
    fyl2x             ; FYL2X replaces ST(1) with ST(1) * log2(ST(0)) and pops: st0 = log2(m), st1 = e
    faddp             ; add ST(0) to ST(1), store the result in ST(1), and pop: st0 = e + log2(m) = log2(n^2)
    fldl2t            ; push log2(10): st0 = log2(10), st1 = log2(n^2)
    fdivp             ; divide ST(1) by ST(0), store the result in ST(1), and pop: st0 = log10(n^2)
                      ; we want log10(x), not log2(x), hence the division by log2(10)
    ; add 1.5 to st0, store the nearest integer at the label width (i.e., round it), and pop the stack
    jmp .continue_voodoo      ; jump over the constant stored in the code
.voodoo: dq 1.5
.continue_voodoo:
    fld qword [.voodoo]       ; st0 = 1.5, st1 = log10(n^2)
    faddp                     ; add ST(0) to ST(1), store the result in ST(1), and pop: st0 = log10(n^2) + 1.5
    fistp qword [width]       ; store ST(0) as a 64-bit integer (m64int) and pop the register stack;
                              ; the conversion uses the current rounding mode (round-to-nearest after FINIT),
                              ; so width = round(log10(n^2) + 1.5)
    ;;; PRINT THE MAGIC SQUARE
    mov rbx, 0                       ; row index
.outer_loop:
    cmp rbx, qword [n]
    je .end
    mov rcx, 0                       ; column index
.inner_loop:
    cmp rcx, qword [n]
    je .end_inner_loop
    mov rdi, fs_long                 ; "%*ld": the field width is taken from the next argument
    mov rsi, qword [width]
    mov rdx, qword [table]
    mov rdx, qword [rdx + 8 * rbx]
    mov rdx, qword [rdx + 8 * rcx]   ; table[row][column]
    mov rax, 0                       ; variadic call: no vector registers used
    push rbx
    push rcx                         ; rcx is caller-saved; two pushes keep rsp 16-byte aligned
    call printf
    pop rcx
    pop rbx
    inc rcx
    jmp .inner_loop
.end_inner_loop:
    mov rdi, fs_newline
    mov rax, 0
    call printf                      ; rbx is callee-saved, so printf preserves it (no push needed)
    inc rbx
    jmp .outer_loop
.error:
    mov rdi, fs_usage
    mov rax, 0
    call printf
    jmp .end
.malloc_failed:
    mov rdi, fs_malloc_failed
    mov rax, 0
    call printf
.end:
    leave
    ret