Lecture 5 Assembly Language. CSCE 212 Computer Architecture. Topics Assembly Language Lab 2 -. January 27, 2011. Overview. Last Time Covered through slides 11… of Lecture 4 Floating point: Review, rounding to even, multiplication, addition Compilation steps New
Lecture 5: Assembly Language CSCE 212 Computer Architecture Topics • Assembly Language • Lab 2 January 27, 2011
Overview Last Time • Covered through slides 11… of Lecture 4 • Floating point: Review, rounding to even, multiplication, addition • Compilation steps New • Architecture (Fred Brooks): Assembly Programmer’s View Address Modes • Swap Next Time: • Lab02 - Datalab
Pop Quiz - denormals • What is the representation of the largest denormalized IEEE float (in binary)? • Denormal: expField = 0000 0000 • Largest denormal: all frac bits are 1, i.e., frac = 111 1111 … 1111 • Largest denormal representation = 0 0000 0000 111 … 1 • In hex? 0x007FFFFF • What is its value as an expression, i.e., (-1)^sign × m × 2^exp? • Largest denormal’s value = 0.111 1111 … 1111 × 2^(1-BIAS) = 0.111 1111 … 1111 × 2^(-126) • How many floats are there between 1.0 and 2.0?
What is a/the representation of minus infinity? • expField = 0xFF, sign bit = 1, frac = 0x000000 (23 zeroes) • -infinity = 0xFF80 0000 • In C are there more ints or doubles? • #doubles = (distinct exp) × (number of doubles with same exp) • #doubles = (2^11 – 2) × 2^52 = 2^63 – 2^53 (note: this does not count NaN or +/- infinity as a double; it counts only positive values and ignores 0) • In Math are there more rationals than integers? • Argument for No: the sets have the same cardinality. They are both countably infinite, whereas the Reals are uncountable. • Argument for Yes: every integer is a rational, and ½ is a rational that is not an integer. • So, given the way the question is worded, what is the best answer? • Extra credit for pop quiz 1: what is aleph-0? • http://mathworld.wolfram.com/Aleph-0.html
Pop Quiz – FP multiplication • If x = 1.5, what is the representation of x as a float (in hex)? • And if y = (2-ε) × 2^37 (note 1.111…1 = 2-ε) • Then what is the frac field of the float z = x*y? • And what is the exponent (not the exponent field) of z? • What is the largest gap between consecutive floats? • Note:
      1.11…11
    ×     1.1
    ---------
      111…11    (24 bits)
     111…11     (24 bits)
    ---------
    10.11…101   (26 bits?)
Printf conversion specifications • Examples • Parts of a conversion specification, in order: start of specification (%), flags, minimum field width, precision, size modifier, conversion type [Figure taken from page 368 of “C: A Reference Manual” by Harbison and Steele]
CYGWIN Unix Emulation under Windows • Provides a bash window • .bash_profile Others CH, … Other Direction: Wine Downloading CYGWIN • Google CYGWIN • startxwin – run an X Window server under CYGWIN Virtual Machines: VirtualBox, VMware
Homework 1 problem 2.90 • /usr/include/math.h • # define M_PI 3.14159265358979323846 /* pi */ • Hex rep given in problem: pi = 0x40490fdb • Binary rep: 0100 0000 0100 1001 0000 1111 1101 1011 • sign +, ExpField = 1000 0000 = 128, Exp = 128 - BIAS = 1 • Binary val = 1.100 1001 0000 1111 1101 1011 × 2^1 • Now from the calculator: 22/7 = 3 + 1/7 • = 3 + 0.142857142857… = 3.(142857) • And .(142857) in decimal = .(249) in hex = .(0010 0100 1001) in binary = .(001) in binary • Pi = 1.100 1001 0000 1111 1101 1011 × 2^1 • 22/7 = 1.100 1001 0010 0100 1001 0010 0100 1001 … × 2^1
Setting Variables and aliases in .bash_profile
    PATH=$HOME/bin:${PATH:-/usr/bin:.}
    PATH=$PATH:/usr/local/simplescalar/bin:/usr/local/simplescalar/simplesim-3.0
    # list of directories separated by colons, used to specify where to find commands
    export PATH
    PS1="`hostname`> "
    w5=/class/csce574-001/web/
    w=/class/csce212-501/Code/
    alias h=history
    alias lsl="ls -lrt | grep ^d"
    # later you can use the variables in commands like "ls $w"
Intel Registers figure 3.2 Intel microprocessor evolution • 4004, 8008, 8080, 8086, 80x86 • Backward compatibility Registers of 8086 • AX: AH – AL • CX: CH – CL • DX: DH – DL • BX: BH – BL • SI, DI, SP, BP
Homework page 105 of text • 2.56 • 2.57 • 2.58 give hex representations and the value as an expression of the form 1.x_(-1) x_(-2) … x_(-n) × 2^exp
2.56 Fill in the return value for the following procedure, which tests whether its first argument is greater than or equal to its second. Assume the function f2u returns an unsigned 32-bit number having the same bit representation as its floating-point argument. You can assume that neither argument is NaN. The two flavors of zero, +0 and -0, are considered equal.
    int float_ge(float x, float y) {
        unsigned ux = f2u(x);
        unsigned uy = f2u(y);
        /* Get the sign bits */
        unsigned sx = ux >> 31;
        unsigned sy = uy >> 31;
        /* Give an expression using only ux, uy, sx and sy */
        return /* … */ ;
    }
2.57 • Given a floating point format with a k-bit exponent and an n-bit fraction, write formulas for the exponent E, significand M, the fraction f, and the value V for the quantities that follow. In addition, describe the bit representation. • The number 5.0. • The largest odd integer that can be represented exactly. • The reciprocal of the smallest positive normalized value.
2.58’ - changed table columns • Intel-compatible processors also support an "extended precision" floating-point format with an 80-bit word divided into a sign bit, k = 15 exponent bits, a single integer bit, and n = 63 fraction bits. The integer bit is an explicit copy of the implied bit in the IEEE floating-point representation. That is, it equals 1 for normalized values and 0 for denormalized values. Fill in the following table giving the appropriate values of some "interesting" numbers in this format:
New Species: IA64
    Name        Date    Transistors
    Itanium     2001    10M
      • Extends to IA64, a 64-bit architecture
      • Radically new instruction set designed for high performance
      • Will be able to run existing IA32 programs
        • On-board “x86 engine”
      • Joint project with Hewlett-Packard
    Itanium 2   2002    221M
      • Big performance boost
Assembly Programmer’s View
[Figure: CPU (EIP, Registers, Condition Codes) connected to Memory (Object Code, Program Data, OS Data, Stack) by Addresses, Data, and Instructions]
Programmer-Visible State
• EIP: Program Counter
  • Address of next instruction
• Register File
  • Heavily used program data
• Condition Codes
  • Store status information about most recent arithmetic operation
  • Used for conditional branching
• Memory
  • Byte addressable array
  • Code, user data, (some) OS data
  • Includes stack used to support procedures
Moving Data
[Figure: the eight integer registers %eax, %edx, %ecx, %ebx, %esi, %edi, %esp, %ebp]
Moving Data: movl Source,Dest
• Move 4-byte (“long”) word
• Lots of these in typical code
Operand Types
• Immediate: Constant integer data
  • Like C constant, but prefixed with ‘$’
  • E.g., $0x400, $-533
  • Encoded with 1, 2, or 4 bytes
• Register: One of 8 integer registers
  • But %esp and %ebp reserved for special use
  • Others have special uses for particular instructions
• Memory: 4 consecutive bytes of memory
  • Various “address modes”
Simple Addressing Modes
Normal: (R)   Mem[Reg[R]]
• Register R specifies memory address
    movl (%ecx),%eax
Displacement: D(R)   Mem[Reg[R]+D]
• Register R specifies start of memory region
• Constant displacement D specifies offset
    movl 8(%ebp),%edx
Using Simple Addressing Modes
    void swap(int *xp, int *yp)
    {
        int t0 = *xp;
        int t1 = *yp;
        *xp = t1;
        *yp = t0;
    }

    swap:
        pushl %ebp              # Set Up
        movl %esp,%ebp
        pushl %ebx

        movl 12(%ebp),%ecx      # Body
        movl 8(%ebp),%edx
        movl (%ecx),%eax
        movl (%edx),%ebx
        movl %eax,(%edx)
        movl %ebx,(%ecx)

        movl -4(%ebp),%ebx      # Finish
        movl %ebp,%esp
        popl %ebp
        ret
Understanding Swap
    void swap(int *xp, int *yp)
    {
        int t0 = *xp;
        int t1 = *yp;
        *xp = t1;
        *yp = t0;
    }

    Stack (offsets from %ebp)
    Offset   Contents
             • • •
      12     yp
       8     xp
       4     Rtn adr
       0     Old %ebp    <- %ebp
      -4     Old %ebx

    Register   Variable
    %ecx       yp
    %edx       xp
    %eax       t1
    %ebx       t0

    movl 12(%ebp),%ecx    # ecx = yp
    movl 8(%ebp),%edx     # edx = xp
    movl (%ecx),%eax      # eax = *yp (t1)
    movl (%edx),%ebx      # ebx = *xp (t0)
    movl %eax,(%edx)      # *xp = eax
    movl %ebx,(%ecx)      # *yp = ebx
Understanding Swap (step-by-step trace)
    Initial state: %ebp = 0x104

    Address   Offset   Contents
    0x124              123        (*xp)
    0x120              456        (*yp)
    0x11c
    0x118
    0x114
    0x110       12     0x120      (yp)
    0x10c        8     0x124      (xp)
    0x108        4     Rtn adr
    0x104        0                <- %ebp
    0x100       -4

    State change after each instruction:
    movl 12(%ebp),%ecx    # ecx = yp          %ecx = 0x120
    movl 8(%ebp),%edx     # edx = xp          %edx = 0x124
    movl (%ecx),%eax      # eax = *yp (t1)    %eax = 456
    movl (%edx),%ebx      # ebx = *xp (t0)    %ebx = 123
    movl %eax,(%edx)      # *xp = eax         Mem[0x124] = 456
    movl %ebx,(%ecx)      # *yp = ebx         Mem[0x120] = 123
Indexed Addressing Modes Most General Form D(Rb,Ri,S) Refers to Address Mem[Reg[Rb]+S*Reg[Ri]+ D] • D: Constant “displacement” 1, 2, or 4 bytes • Rb: Base register: Any of 8 integer registers • Ri: Index register: Any, except for %esp • Unlikely you’d use %ebp, either • S: Scale: 1, 2, 4, or 8 Special Cases • (Rb,Ri) Mem[Reg[Rb]+Reg[Ri]] • D(Rb,Ri) Mem[Reg[Rb]+Reg[Ri]+D] • (Rb,Ri,S) Mem[Reg[Rb]+S*Reg[Ri]]
Address Computation Examples %edx 0xf000 %ecx 0x100
Address Computation Instruction leal Src,Dest • Src is address mode expression • Set Dest to address denoted by expression Uses • Computing address without doing memory reference • E.g., translation of p = &x[i]; • Computing arithmetic expressions of the form x + k*y • k = 1, 2, 4, or 8.
Some Arithmetic Operations
    Format              Computation
    Two Operand Instructions
    addl  Src,Dest      Dest = Dest + Src
    subl  Src,Dest      Dest = Dest - Src
    imull Src,Dest      Dest = Dest * Src
    sall  Src,Dest      Dest = Dest << Src     Also called shll
    sarl  Src,Dest      Dest = Dest >> Src     Arithmetic
    shrl  Src,Dest      Dest = Dest >> Src     Logical
    xorl  Src,Dest      Dest = Dest ^ Src
    andl  Src,Dest      Dest = Dest & Src
    orl   Src,Dest      Dest = Dest | Src
Some Arithmetic Operations
    Format              Computation
    One Operand Instructions
    incl Dest           Dest = Dest + 1
    decl Dest           Dest = Dest - 1
    negl Dest           Dest = - Dest
    notl Dest           Dest = ~ Dest
Using leal for Arithmetic Expressions
    int arith (int x, int y, int z)
    {
        int t1 = x+y;
        int t2 = z+t1;
        int t3 = x+4;
        int t4 = y * 48;
        int t5 = t3 + t4;
        int rval = t2 * t5;
        return rval;
    }

    arith:
        pushl %ebp                  # Set Up
        movl %esp,%ebp

        movl 8(%ebp),%eax           # Body
        movl 12(%ebp),%edx
        leal (%edx,%eax),%ecx
        leal (%edx,%edx,2),%edx
        sall $4,%edx
        addl 16(%ebp),%ecx
        leal 4(%edx,%eax),%eax
        imull %ecx,%eax

        movl %ebp,%esp              # Finish
        popl %ebp
        ret
Understanding arith
    int arith (int x, int y, int z)
    {
        int t1 = x+y;
        int t2 = z+t1;
        int t3 = x+4;
        int t4 = y * 48;
        int t5 = t3 + t4;
        int rval = t2 * t5;
        return rval;
    }

    Stack (offsets from %ebp)
    Offset   Contents
             • • •
      16     z
      12     y
       8     x
       4     Rtn adr
       0     Old %ebp    <- %ebp

    movl 8(%ebp),%eax          # eax = x
    movl 12(%ebp),%edx         # edx = y
    leal (%edx,%eax),%ecx      # ecx = x+y (t1)
    leal (%edx,%edx,2),%edx    # edx = 3*y
    sall $4,%edx               # edx = 48*y (t4)
    addl 16(%ebp),%ecx         # ecx = z+t1 (t2)
    leal 4(%edx,%eax),%eax     # eax = 4+t4+x (t5)
    imull %ecx,%eax            # eax = t5*t2 (rval)
Another Example
    int logical(int x, int y)
    {
        int t1 = x^y;
        int t2 = t1 >> 17;
        int mask = (1<<13) - 7;
        int rval = t2 & mask;
        return rval;
    }

    logical:
        pushl %ebp                # Set Up
        movl %esp,%ebp

        movl 8(%ebp),%eax         # Body
        xorl 12(%ebp),%eax
        sarl $17,%eax
        andl $8185,%eax

        movl %ebp,%esp            # Finish
        popl %ebp
        ret

    2^13 = 8192, 2^13 – 7 = 8185

    movl 8(%ebp),%eax      # eax = x
    xorl 12(%ebp),%eax     # eax = x^y (t1)
    sarl $17,%eax          # eax = t1>>17 (t2)
    andl $8185,%eax        # eax = t2 & 8185
CISC Properties Instruction can reference different operand types • Immediate, register, memory Arithmetic operations can read/write memory Memory reference can involve complex computation • Rb + S*Ri + D • Useful for arithmetic expressions, too Instructions can have varying lengths • IA32 instructions can range from 1 to 15 bytes
Summary: Abstract Machines
    Machine Models
    C          [Figure: mem, proc]
      Data:    1) char  2) int, float  3) double  4) struct, array  5) pointer
      Control: 1) loops  2) conditionals  3) switch  4) Proc. call  5) Proc. return
    Assembly   [Figure: mem, regs, alu, Stack, Cond. Codes, processor]
      Data:    1) byte  2) 2-byte word  3) 4-byte long word  4) contiguous byte allocation  5) address of initial byte
      Control: 3) branch/jump  4) call  5) ret
Pentium Pro (P6) History • Announced in Feb. ‘95 • Basis for Pentium II, Pentium III, and Celeron processors • Pentium 4 similar idea, but different details Features • Dynamically translates instructions to more regular format • Very wide, but simple instructions • Executes operations in parallel • Up to 5 at once • Very deep pipeline • 12–18 cycle latency
PentiumPro Block Diagram Microprocessor Report 2/16/95
PentiumPro Operation Translates instructions dynamically into “Uops” • 118 bits wide • Holds operation, two sources, and destination Executes Uops with “Out of Order” engine • Uop executed when • Operands available • Functional unit available • Execution controlled by “Reservation Stations” • Keeps track of data dependencies between uops • Allocates resources Consequences • Indirect relationship between IA32 code & what actually gets executed • Tricky to predict / optimize performance at assembly level
Whose Assembler?
    Intel/Microsoft Format                GAS/Gnu Format
    lea eax,[ecx+ecx*2]                   leal (%ecx,%ecx,2),%eax
    sub esp,8                             subl $8,%esp
    cmp dword ptr [ebp-8],0               cmpl $0,-8(%ebp)
    mov eax,dword ptr [eax*4+100h]        movl 0x100(,%eax,4),%eax
Intel/Microsoft Differs from GAS
• Operands listed in opposite order: mov Dest, Src vs. movl Src, Dest
• Constants not preceded by ‘$’; hex denoted with ‘h’ at end: 100h vs. $0x100
• Operand size indicated by operands rather than operator suffix: sub vs. subl
• Addressing format shows effective address computation: [eax*4+100h] vs. 0x100(,%eax,4)