Machine-Level Representation of Programs I
Outline • Compiler drivers • History of the Intel IA-32 architecture • Assembly code and object code • Memory and Registers • Addressing Mode • Data Formats • Suggested reading • Chap 1.2, 1.4.1, 1.7.3, 3.1, 3.2, 3.3, 3.4.1
The Hello Program • It begins life as a high-level C program • Can be read and understood by human beings • The individual C statements must be translated by compiler drivers • So that the hello program can run on a computer system
The Hello Program • The C programs are translated into • A sequence of low-level machine-language instructions • These instructions are then packaged in a form • Called an object program • Object programs are stored as binary disk files • Also referred to as executable object files
The Context of a Compiler (gcc) Figure 1.3 P5 • hello.c: Source program (text) • Preprocessor (cpp) produces hello.i: Modified source program (text) • Compiler (cc1) produces hello.s: Assembly program (text) • Assembler (as) produces hello.o: Relocatable object program (binary) • Linker (ld) produces hello: Executable object program (binary)
Characteristics of high-level programming languages • Abstraction • Productive • Reliable • Type checking • Often as efficient as hand-written code • Can be compiled and executed on a number of different machines, whereas assembly code is highly machine specific
Characteristics of the assembly programming languages • Managing memory • Low level instructions to carry out the computation • Highly machine specific
Why should we understand assembly code? • Understand the optimization capabilities of the compiler • Analyze the underlying inefficiencies in the code • Sometimes we need to know run-time behavior that the high-level language hides
From writing assembly code to understanding assembly code • Different set of skills • Transformations • Relation between source code and assembly code • Reverse engineering • Trying to understand the process by which a system was created • By studying the system and • By working backward
A Historical Perspective • Long evolutionary development • Started from rather primitive 16-bit processors • Added more features • Take advantage of technology improvements • Satisfy the demands for higher performance and for supporting more advanced operating systems • Laden with obsolete features kept for backward compatibility
X86 family • 8086 (1978, 29K transistors) • The heart of the IBM PC & DOS • 1 MB addressable, 640 KB for users • 80286 (1982, 134K transistors) • More (now obsolete) addressing modes • Basis of the IBM PC-AT & Windows
X86 family • i386 (1985, 275K transistors) • 32-bit architecture, flat addressing model • Able to support a Unix operating system • i486 (1989, 1.9M transistors) • Integrated the floating-point unit onto the processor chip
X86 family • Pentium (1993, 3.1M transistors) • PentiumPro (1995, 6.5M transistors) • P6 microarchitecture • Conditional move instructions • Pentium/MMX (1997, 4.5M transistors) • New class of instructions for manipulating vectors of integers
X86 family • Pentium II (1997, 7M transistors) • Implements the MMX instructions within the P6 microarchitecture • Pentium III (1999, 8.2M transistors) • New class of instructions for manipulating vectors of floating-point numbers (SSE, Streaming SIMD Extensions)
X86 family • Pentium 4 (2001, 42M transistors) • NetBurst microarchitecture • 144 new SSE2 instructions
X86 family • Advanced Micro Devices (AMD) • Now a close competitor to Intel • Developing its own extension to 64 bits
X86 family • Transmeta • In January of 2000, introduced the Crusoe™ processor • Radically different approach to implementation • Translates x86 code into “Very Long Instruction Word” (VLIW) code • High degree of parallelism • Aimed at the low-power market, such as laptop computers
Hardware Organization Figure 1.4 P7 • CPU: Central Processing Unit • ALU: Arithmetic/Logic Unit • PC: Program Counter • USB: Universal Serial Bus
Virtual spaces • A linear array of bytes • Each with its own unique address (array index), starting at zero • Figure: addresses 0x0, 0x1, 0x2, ... up to 0xfffffffe, 0xffffffff, each paired with its contents
Data layout • Object model in C • Different data types can be declared
Data layout • Object model in assembly • A large, byte-addressable array • No distinction even between signed and unsigned integers • Code, user data, OS data • Run-time stack for managing procedure call and return • Blocks of memory allocated by user
Operations in C constructs • Arithmetic expression evaluation • Loops • Procedure calls and returns • Translated into sequences of instructions
Operations in Assembly Instructions • Each instruction performs only a very elementary operation • Normally executed one by one, in sequence • Operate on data stored in registers • Transfer data between memory and a register • Conditionally branch to a new instruction address
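Assembly Programmer’s View Figure 3.2 P136 • CPU registers: %eax (%ah, %al), %edx (%dh, %dl), %ecx (%ch, %cl), %ebx (%bh, %bl), %esi, %edi, %esp, %ebp, %eip, %eflags • The CPU sends addresses to memory and exchanges data and instructions with it • Memory, top to bottom, with the upper address bytes as labeled in the figure: FF, C0, BF Stack, 80 Heap, 7F, 40 DLLs, 3F Heap, Data, 08 Text, 00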
Programmer-Visible States P129 • Program Counter(%eip) • Address of the next instruction • Register File • Heavily used program data • Integer and floating-point
Programmer-Visible States • Condition code register • Holds status information about the most recently executed arithmetic instruction • Used to implement conditional changes in the control flow
C Code • Add two signed integers • int t = x+y;
Assembly Code • Operands: • x: Register %eax • y: Memory M[%ebp+8] • t: Register %eax • Instruction • addl 8(%ebp),%eax • Adds two 4-byte integers • Similar to the expression x += y • Function value is returned in %eax
Object Code • 3-byte instruction • Stored at address 0x80483b7 • 0x80483b7: 03 45 08
Operands P137 • In high-level languages • Either constants • Or variables • Example • A = A + 4 (A is a variable, 4 is a constant)
Operands • Counterparts in assembly languages • Immediate (constant) • Register (variable) • Memory (variable) • Example • movl 8(%ebp),%eax (memory operand, register operand) • addl $4,%eax (immediate operand, register operand)
Simple Addressing Mode • Immediate • Represents a constant • The format is $imm ($4, $0xffffffff) • Registers • The fastest storage units in computer systems • Typically 32 bits long • Register mode Ea • The value stored in the register • Denoted R[Ea]
Memory References • The name of the array is denoted M • If addr is a memory address • M[addr] is the content of the memory starting at addr • addr is used as an array index • How many bytes are there in M[addr]? • It depends on the context
Memory Addressing Mode • An expression for • a memory address (or an array index) • Most general form • imm (Eb, Ei, s) • s: 1, 2, 4, 8 • The address represented by the above form • imm + R[Eb] + R[Ei] * s • It gives the value • M[imm + R[Eb] + R[Ei] * s]
Practice problem 3.1 P138 • Operand | Value | Comment • %eax | 0x100 | Register • (%eax) | 0xFF | Address 0x100 • $0x108 | 0x108 | Immediate • 0x108 | 0x13 | Absolute address • 260(%ecx,%edx) | 0x13 | Address 0x108 • (%eax,%edx,4) | 0x11 | Address 0x10C
Data Formats • Data movement instructions • mov (general) • movb (move byte, 1 byte) • movw (move word, 2 bytes) • movl (move double word, 4 bytes)