Real-World Instruction Set Architectures Focus on IA-32

http://www.pds.ewi.tudelft.nl/~iosup/Courses/2012_ti1400_5.ppt
Course website: http://www.pds.ewi.tudelft.nl/~iosup/Courses/2012_ti1400_results.htm

IA family
• IA (Intel Architecture) is a family of processors
• Each processor has the same architecture but a different organization
  • same instruction set
  • different performance
• 32-bit memory addresses and variable-length instructions
• Very large instruction set (not RISC)
[Figure: processor timeline with the years 1982, 1985, 1989, 1993]

Floorplan IA-32

Other Example: PowerPC
[Figure: block diagram in which the instruction unit passes instructions to the integer unit and the floating-point unit; also shown are the cache and main memory]

Floorplan PowerPC

Floorplan PowerPC
[Figure labels: Registers, Load/Store Unit, Data Cache, MMU, Instr. Cache, FPU]

IA-32
• Introduction
• Memory Layout
• Registers
• Instructions
• Examples of Assembler Code for IA-32
• Subroutines

Memory
• Memory is byte addressable
• Doublewords can start at any byte location
• Data operands are 8 or 32 bits wide
• Byte order is little-endian (vs. big-endian on PowerPC)
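To make the little-endian bullet concrete, here is a minimal sketch in Java (the language the deck uses for its high-level examples). The class name LittleEndianDemo and the sample value 0x12345678 are illustrative assumptions, not part of the slides. It lays out one doubleword byte by byte the way a little-endian machine would store it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not from the slides: shows little-endian byte order.
class LittleEndianDemo {
    // Returns the four bytes of a doubleword in increasing address order,
    // as a little-endian machine such as IA-32 would store them.
    static byte[] storeDoubleword(int value) {
        return ByteBuffer.allocate(4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putInt(value)
                .array();
    }

    public static void main(String[] args) {
        byte[] mem = storeDoubleword(0x12345678);
        for (int addr = 0; addr < mem.length; addr++) {
            // Lowest address holds the least significant byte (0x78).
            System.out.printf("address +%d: %02X%n", addr, mem[addr]);
        }
    }
}
```

With ByteOrder.BIG_ENDIAN instead, the same value would be laid out as 12 34 56 78, which is the PowerPC convention mentioned above.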

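Addressable data units
[Figure: a 32-bit doubleword, bits 31 down to 0, with byte 3 at the bit-31 end and byte 0 at the bit-0 end; the addressable units are the byte and the doubleword]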

IA register structure
• FP0 ... FP7: floating-point registers
• R0 ... R7: general-purpose registers

Register Naming
• Data registers: R0 EAX, R1 EBX, R2 ECX, R3 EDX
• Pointer registers: R4 ESP, R5 EBP
• Index registers: R6 ESI, R7 EDI
• Instruction pointer: EIP
• Status register: EFLAGS
AX names the low 16 bits of EAX; AH and AL are the high and low bytes of AX.

Status Register
Bit layout (bits 31 ... 0): IOPL in bits 13-12, OF in bit 11, IF in bit 9, TF in bit 8, SF in bit 7, ZF in bit 6, CF in bit 0
• CF Carry
• ZF Zero
• SF Sign
• IOPL I/O privilege level
• OF Overflow
• IF Interrupt enable
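The bit positions in the status register can be read out with simple masks. The following is a sketch only; the class EflagsDemo and its method names are hypothetical, while the bit positions are the ones shown on the slide.

```java
// Hypothetical helper, not from the slides: tests individual EFLAGS bits.
class EflagsDemo {
    static final int CF = 1 << 0;   // Carry
    static final int ZF = 1 << 6;   // Zero
    static final int SF = 1 << 7;   // Sign
    static final int IF = 1 << 9;   // Interrupt enable
    static final int OF = 1 << 11;  // Overflow

    // True if the flag selected by mask is set in the given EFLAGS value.
    static boolean isSet(int eflags, int mask) {
        return (eflags & mask) != 0;
    }

    // IOPL is a two-bit field in bits 12 and 13.
    static int iopl(int eflags) {
        return (eflags >>> 12) & 0x3;
    }

    public static void main(String[] args) {
        int eflags = ZF | IF;  // example value: zero result, interrupts enabled
        System.out.println("ZF set: " + isSet(eflags, ZF));
        System.out.println("CF set: " + isSet(eflags, CF));
        System.out.println("IOPL:   " + iopl(eflags));
    }
}
```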

Special registers
• Code Segment: CS
• Stack Segment: SS
• Data Segments: DS, ES, FS, GS

Instructions
• Variable-length instructions, 1-12 bytes
• Five types of instructions
  • Copy instructions (MOV)
  • Arithmetic and logic instructions
  • Flow control
  • Processor control instructions
  • I/O instructions
• Format: INSTR Rdst,Rsrc

Instruction Format
Field          Length
Opcode         1 or 2 bytes (variable opcode length)
Addressing     1 or 2 bytes
Displacement   1 or 4 bytes
Immediate      1 or 4 bytes

Addressing modes
• Many addressing modes:
  1. Immediate: value
  2. Direct: M(value)
  3. Register: [reg]
  4. Register indirect: M([reg])
  5. Base with displacement: M([reg] + Disp)
  6. Index with displacement: M([reg]*S + Disp)
  7. Base with index: M([reg1] + [reg2]*S)
  8. Base with index and displacement: M([reg1] + [reg2]*S + Disp)
S = 1, 2, 4 or 8
Disp = 8-bit or 32-bit signed number
Q CISC or RISC?
Q Why both 5 and 6?

Immediate and Direct
• Immediate
    MOV EAX, 25          [EAX] ← #25
    MOV EAX, 3FA00H      [EAX] ← #3FA00H
• Direct
    MOV EAX, loc         [EAX] ← M(loc)
  or
    MOV EAX, [loc]       [EAX] ← M(loc)

Register indirect
• Register
    MOV EBX, OFFSET loc          [EBX] ← #loc
  or
    LEA EBX, loc                 [EBX] ← #loc
• Register indirect
    MOV EAX, [EBX]               [EAX] ← M([EBX])
  and
    MOV [EBX], 10                M([EBX]) ← #10
    MOV DWORD PTR [EBX], 10      M([EBX]) ← #10
Q Why DWORD PTR?

Base with Index and Displacement
• MOV EAX, [EBP+ESI*4+200]       [EAX] ← M([EBP] + [ESI]*4 + #200)
[Figure: with [EBP] = 1000 and [ESI] = 40, the displacement gives 1000 + 200 = 1200, and the operand is at address 1200 + 40*4 = 1360]
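The address arithmetic in this example can be checked with a few lines of Java. This is a sketch under stated assumptions: the class EffectiveAddress and its compute method are made up for illustration, and the register contents 1000 and 40 are the values shown in the figure.

```java
// Hypothetical helper, not from the slides: base + index*scale + displacement.
class EffectiveAddress {
    // Computes the operand address for the "base with index and displacement" mode.
    static int compute(int base, int index, int scale, int disp) {
        // IA-32 only allows scale factors of 1, 2, 4 or 8.
        if (scale != 1 && scale != 2 && scale != 4 && scale != 8) {
            throw new IllegalArgumentException("scale must be 1, 2, 4 or 8");
        }
        return base + index * scale + disp;
    }

    public static void main(String[] args) {
        int ebp = 1000;  // base register contents from the figure
        int esi = 40;    // index register contents from the figure
        // MOV EAX, [EBP+ESI*4+200]
        System.out.println(compute(ebp, esi, 4, 200));  // prints 1360
    }
}
```

The other modes in the list are special cases of the same formula, for example base with displacement is compute(base, 0, 1, disp).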

Arithmetic instructions
• May have one or two operands
    ADD dst, src     meaning [dst] ← [dst] + [src]

Compare
• Used to compare values and leave register contents unchanged
    CMP dst, src     computes [dst] - [src]; only the status flags are updated

Flow control
• Two basic branch instructions:
  • JMP [loc]            Branch unconditionally
  • JG, JZ, JS, etc.     Branch if condition is satisfied

Summation example
Java code
    int[] list = new int[n];
    int sum = 0;
    for (int index = n - 1; index >= 0; index--) {
        sum += list[index];
    }

Summation example
Assembler code, Version 1
      LEA EBX, NUM1            [EBX] ← #NUM1
      MOV ECX, N               [ECX] ← M(N)
      MOV EAX, 0               [EAX] ← #0
      MOV EDI, 0               [EDI] ← #0
L:    ADD EAX, [EBX+EDI*4]     Add next number to EAX
      INC EDI                  [EDI] ← [EDI] + 1
      DEC ECX                  [ECX] ← [ECX] - 1
      JG L                     Branch if [ECX] > 0
      MOV SUM, EAX             M(SUM) ← [EAX]

Summation example
Assembler code, Version 2
      LEA EBX, NUM1            [EBX] ← #NUM1
      SUB EBX, 4
      MOV ECX, N               [ECX] ← M(N)
      MOV EAX, 0               [EAX] ← #0
L:    ADD EAX, [EBX+ECX*4]     Add next number to EAX
      LOOP L                   [ECX] ← [ECX] - 1; branch if [ECX] > 0
      MOV SUM, EAX             M(SUM) ← [EAX]
Q Why SUB EBX,4?

Summation example
Performance, Version 1 vs Version 2
Version 1:
      LEA EBX, NUM1
      MOV ECX, N
      MOV EAX, 0
      MOV EDI, 0
L:    ADD EAX, [EBX+EDI*4]
      INC EDI
      DEC ECX
      JG L
      MOV SUM, EAX
Version 2:
      LEA EBX, NUM1
      SUB EBX, 4
      MOV ECX, N
      MOV EAX, 0
L:    ADD EAX, [EBX+ECX*4]
      LOOP L
      MOV SUM, EAX
• Replaced 1xMOV with 1xSUB
• Replaced 1xINC + 1xDEC + 1xJG with 1xLOOP
Q What is the performance loss/gain?

Summation example
The .asm File
.data
NUM1  DD 0, 1, 2, -1, -2
N     DD 5
SUM   DD 0
.code
MAIN: LEA EBX, NUM1
      SUB EBX, 4
      MOV ECX, N
      MOV EAX, 0
L:    ADD EAX, [EBX+ECX*4]
      LOOP L
      MOV SUM, EAX
      CMP SUM, 0
END MAIN
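One way to trace the Version 2 loop is to mimic its registers in Java. The sketch below is an illustration only: the class LoopSumDemo, the base address 0x1000 chosen for NUM1, and the load helper are assumptions, and the array must hold at least one element because LOOP decrements ECX before testing it.

```java
// Hypothetical trace of the Version 2 loop, not from the slides.
class LoopSumDemo {
    static final int NUM1 = 0x1000;  // assumed byte address of the array

    // Reads the doubleword at a byte address inside the array.
    static int load(int[] data, int address) {
        return data[(address - NUM1) / 4];
    }

    static int sum(int[] data) {
        if (data.length == 0) {
            throw new IllegalArgumentException("N must be at least 1");
        }
        int ebx = NUM1;        // LEA EBX, NUM1
        ebx -= 4;              // SUB EBX, 4
        int ecx = data.length; // MOV ECX, N
        int eax = 0;           // MOV EAX, 0
        do {
            eax += load(data, ebx + ecx * 4);  // ADD EAX, [EBX+ECX*4]
            ecx--;                             // LOOP L decrements ECX ...
        } while (ecx != 0);                    // ... and branches while it is not 0
        return eax;                            // MOV SUM, EAX
    }

    public static void main(String[] args) {
        int[] num1 = {0, 1, 2, -1, -2};  // data from the .asm file
        System.out.println(sum(num1));   // prints 0
    }
}
```

Because ECX runs from N down to 1 rather than from N-1 down to 0, the first address touched is NUM1 - 4 + 4*N, which is the last element of the array.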

Sorting example
Java code
    int[] list = new int[n];
    int temp;
    for (int j = n - 1; j > 0; j--) {
        for (int k = j - 1; k >= 0; k--) {
            if (list[j] > list[k]) {
                temp = list[k];
                list[k] = list[j];
                list[j] = temp;
            }
        }
    }
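To see what order the Java loops produce, the same loop structure can be wrapped in a small runnable class. This is a sketch: the class SortDemo and the sample data (borrowed from the summation example) are assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical wrapper around the sorting loops, not from the slides.
class SortDemo {
    // Returns a sorted copy, using the same nested loops as the Java code above.
    static int[] sort(int[] input) {
        int[] list = Arrays.copyOf(input, input.length);
        int n = list.length;
        for (int j = n - 1; j > 0; j--) {
            for (int k = j - 1; k >= 0; k--) {
                if (list[j] > list[k]) {
                    int temp = list[k];
                    list[k] = list[j];
                    list[j] = temp;
                }
            }
        }
        return list;
    }

    public static void main(String[] args) {
        int[] data = {0, 1, 2, -1, -2};
        System.out.println(Arrays.toString(sort(data)));  // prints [2, 1, 0, -1, -2]
    }
}
```

After each pass of the inner loop, position j holds the smallest of the elements at positions 0 through j, so the result is in descending order.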

Sorting Example
Assembler code
        LEA EAX, list           [EAX] ← #list
        MOV EDI, N              [EDI] ← n
        DEC EDI                 [EDI] ← n-1, init (j)
outer:  MOV ECX, EDI            [ECX] ← j
        DEC ECX                 [ECX] ← j-1, init (k)
        MOV DL, [EAX+EDI]       load list(j) into DL
inner:  CMP [EAX+ECX], DL       compare list(k) to list(j)
        JLE next                if list(j) >= list(k)
        XCHG [EAX+ECX], DL      swap list(j), list(k)
        MOV [EAX+ECX], DL       new list(j) in DL
next:   DEC ECX                 decrement k
        JGE inner               repeat or terminate
        DEC EDI                 decrement j
        JGE outer               repeat or terminate

Q Is this code a correct implementation of the Java code?

IA-32
• Introduction
• Registers
• Memory Layout
• Instructions
• Examples of Assembler Code for IA-32
• Subroutines

Subroutines
• CALL sub                 [EIP] ← #sub
• Return address is saved on stack (ESP register)
• Return is RET            [EIP] ← M([ESP]); [ESP] ← [ESP] + 4

Stack instructions
• ESP register is used as stack pointer
• PUSH src         [ESP] ← [ESP] - #4; M([ESP]) ← [src]
• POP dst          [dst] ← M([ESP]); [ESP] ← [ESP] + #4
• PUSHAD (POPAD)   push (pop) all 8 registers on (from) stack

Stack frames
Note: Sub1 starts at address 2400
        ....
        PUSH N           Parameter n on stack
2000    CALL Sub1        Call subroutine at 2400
        ....
[Figure, four animation steps showing the stack pointer ESP, the program counter EIP, and the stack contents:]
• [1/4] Before PUSH N: ESP = 10056
• [2/4] After PUSH N: ESP = 10052, EIP = 2000; stack holds N at 10052
• [3/4] CALL Sub1 pushes the return address: ESP = 10048, EIP = 2000; stack holds 2004 at 10048 and N at 10052
• [4/4] CALL Sub1 jumps to the subroutine: ESP = 10048, EIP = 2400; stack holds 2004 at 10048 and N at 10052
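The stack-frame steps can be replayed with a toy model of ESP, EIP and memory. This is a sketch under stated assumptions: the class StackFrameDemo, its method names, and the placeholder value 42 for N are hypothetical, while the addresses 10056, 2000, 2004 and 2400 are the ones in the figure.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy machine, not from the slides: models PUSH, POP, CALL, RET.
class StackFrameDemo {
    final Map<Integer, Integer> memory = new HashMap<>();
    int esp;  // stack pointer
    int eip;  // program counter

    StackFrameDemo(int esp, int eip) {
        this.esp = esp;
        this.eip = eip;
    }

    // PUSH src: [ESP] <- [ESP] - 4, then M([ESP]) <- src
    void push(int value) {
        esp -= 4;
        memory.put(esp, value);
    }

    // POP dst: dst <- M([ESP]), then [ESP] <- [ESP] + 4
    int pop() {
        int value = memory.get(esp);
        esp += 4;
        return value;
    }

    // CALL: save the return address on the stack, then jump to the target.
    void call(int target, int returnAddress) {
        push(returnAddress);
        eip = target;
    }

    // RET: pop the return address into EIP.
    void ret() {
        eip = pop();
    }

    public static void main(String[] args) {
        StackFrameDemo cpu = new StackFrameDemo(10056, 2000);
        cpu.push(42);          // PUSH N (42 is a placeholder value for N)
        cpu.call(2400, 2004);  // CALL Sub1 at 2000, return address 2004
        System.out.println("ESP=" + cpu.esp + " EIP=" + cpu.eip);  // ESP=10048 EIP=2400
        cpu.ret();
        System.out.println("ESP=" + cpu.esp + " EIP=" + cpu.eip);  // ESP=10052 EIP=2004
    }
}
```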

Subroutine Sub1
Sub1: PUSH EAX                Save EAX
      PUSH EBX                Save EBX
      MOV EAX, [EDI + 12]     n to EAX
      DEC EAX
      ....
      PUSH EAX                Load n-1 on stack
L:    CALL Sub2               Call subroutine
      POP N                   Put result in M(N)
      POP EBX                 Restore EBX
      POP EAX                 Restore EAX
      RET                     return

Stack frame in Sub1
2400: PUSH EAX
      PUSH EBX
      MOV EAX, [EDI + 12]
      DEC EAX
After PUSH EBX (stack frame at arrow)
[Figure values as extracted: ESP 10036; 10040 [EBX]; 10040 [EAX]; Return Address; 10052 N; EIP ?]
Q What is the value of EIP?

Subroutine Sub1
2400  PUSH EAX                Save EAX
      PUSH EBX                Save EBX
      MOV EAX, [EDI + 12]     n to EAX
      DEC EAX                 (arrow: after DEC EAX)
      ....
      PUSH EAX                Load n-1 on stack
L:    CALL Sub2               Call subroutine
      POP N                   Put result in M(N)
      POP EBX                 Restore EBX
      POP EAX                 Restore EAX
      RET                     return
