Introduction to Assembly Language IA32 Summer 2014 COMP 2130 Intro Computer Systems Computing Science Thompson Rivers University
Course Objectives
• The better your knowledge of computer systems, the better your programming.
Course Contents
• Introduction to computer systems: B&O 1
• Introduction to C programming: K&R 1 – 4
• Data representations: B&O 2.1 – 2.4
• C: advanced topics: K&R 5.1 – 5.10, 6 – 7
• Introduction to IA32 (Intel Architecture 32): B&O 3.1 – 3.8, 3.13
• Compiling, linking, loading, and executing: B&O 7 (except 7.12)
• Dynamic memory management – Heap: B&O 9.9.1 – 9.9.2, 9.9.4 – 9.9.5, 9.11
• Code optimization: B&O 5.1 – 5.6, 5.13
• Memory hierarchy, locality, caching: B&O 5.12, 6.1 – 6.3, 6.4.1 – 6.4.2, 6.5, 6.6.2 – 6.6.3, 6.7
• Virtual memory (if time permits): B&O 9.4 – 9.5
Unit Learning Objectives
• Introduce assembly language.
• Translate simple C code into assembly code.
• Understand how a computer system runs an application.
• List the registers used in IA32.
• Compute memory addresses.
• Understand how a function call uses the user stack for variables, with %esp and %ebp.
• List two data types.
• List three operation types.
• List three operand types.
• Use different memory addressing modes.
Introduction to Assembly Language
• An intermediate language between machine code and high-level languages
• It sits at the low-level end because it follows the norm of “one language instruction for one machine instruction”
• Advantages over machine code:
  • Easier for humans to understand
  • Easier to write and debug
  • Uses mnemonics for instructions
  • Reserves memory locations for data
• Advantage over high-level languages:
  • Can produce more efficient programs
Disadvantages
• Each machine needs its own specific assembler
• Source code must be written in a well-defined format
Assembly Language Format
• The standard form of each instruction has the following elements:
• Label: a symbolic name for an address, which you can refer to, for example, as the target of a jump. It can have a variable number of characters and is optional.
• Operation or mnemonic: names the operation to execute
  • Pseudo-opcodes are also written here; they are directives for the assembler
    • ORG – specifies the origin of the data area
    • ADDR – defines an address
    • DB – defines data
• Operands: the values on which the operation takes place. Multiple operands must be separated by commas.
• Comment: placed at the end of the statement and starts with ;
Why should we spend our time learning machine code?
• Even though compilers do most of the work in generating assembly code, being able to read and understand it is an important skill for serious programmers.
• By reading assembly code,
  • We can understand the optimization capabilities of the compiler and analyze the underlying inefficiencies in the code.
  • We can understand the function invocation mechanism.
  • We can help ourselves understand how computer systems (HW) and operating systems (SW) run programs.
• IA32
  • Traditional x86
  • 32-bit
  • Linux uses what is referred to as flat addressing, where the entire memory space is viewed by the programmer as a large array of bytes.
Carnegie Mellon
Assembly Programmer’s View
• Programmer-visible state
  • PC: program counter
    • Address of the next instruction in memory
    • Called “EIP” (IA32) or “RIP” (x86-64)
  • Register file
    • Heavily used program data
  • Condition codes
    • Store status information about the most recent arithmetic operation
    • Used for conditional branching, e.g., if
• Memory
  • Byte-addressable array
  • Code, user data, (some) OS data
  • Includes the stack used to support procedures
[Figure: the CPU (PC, registers, condition codes) sends addresses to memory (object code, program data, OS data, stack) and exchanges data and instructions with it.]
Compiling Into Assembly
C code:
    int sum(int x, int y)
    {
        int t = x+y;
        return t;
    }
Generated IA32 assembly:
    sum:
        pushl %ebp
        movl %esp,%ebp
        movl 12(%ebp),%eax
        addl 8(%ebp),%eax
        popl %ebp
        ret
• pushl, movl, addl, popl, and ret are mnemonics for machine instructions. Some compilers use the “leave” instruction.
• Obtain with the command
    $ gcc -O1 -S code.c
• Produces the file code.s
Assembly Characteristics: Data Types
• “Integer” data of 1, 2, or 4 bytes
  • Data values
  • Addresses
• Floating-point data of 4, 8, or 10 bytes
• No aggregate types such as arrays or structures
  • Just contiguously allocated bytes in memory
Instructions (mnemonics)
• Although IA32 defines hundreds of instructions, typical assembly code uses only about 20 to 30 of them
• About 14 are the most commonly used
Assembly Characteristics: Operation Types
• Perform an arithmetic function on register or memory data
• Transfer data between memory and register
  • Load data from memory into a register
  • Store register data into memory
• Transfer control
  • Unconditional jumps to/from procedures
  • Conditional branches
Object Code
Code for sum:
    0x401040 <sum>:
        0x55 0x89 0xe5 0x8b 0x45 0x0c 0x03 0x45 0x08 0x5d 0xc3
• Total of 11 bytes
• Each instruction is 1, 2, or 3 bytes
• Starts at address 0x401040
• Assembler
  • Translates .s into .o
  • Binary encoding of each instruction
  • Nearly complete image of the executable code
  • Missing linkages between code in different files
• Linker
  • Resolves references between files
  • Combines with static run-time libraries
    • E.g., code for malloc, printf
  • Some libraries are dynamically linked
    • Linking occurs when the program begins execution
Registers
• Registers are small, volatile storage areas inside the CPU that are used for data manipulation
• There are 8 “general purpose” registers
• There is 1 “instruction pointer”, which points to the next instruction to execute
• Of the 8, six are commonly used for general data, whereas the other two (ESP and EBP) are rarely used that way because they manage the stack
Registers
• EAX – stores the value returned from a function, or serves as an accumulator for arithmetic
• EBX – base pointer to the data section
• ECX – counter register for loops and string operations
• EDX – I/O pointer
• ESI – source index
• EDI – destination index
• ESP – stack pointer
• EBP – stack frame base pointer (where the stack starts for a specific function)
Instruction Pointer
• EIP – pointer to the next instruction to execute
32-bit to 64-bit assembly
• All registers can be accessed in 16-bit and 32-bit modes. In 16-bit mode, the register is identified by its two-letter abbreviation from the list above, e.g. AX. In 32-bit mode, this two-letter abbreviation is prefixed with an 'E' (extended). For example, 'EAX' is the accumulator register as a 32-bit value.
• In the 64-bit version, the 'E' is replaced with an 'R', so the 64-bit version of 'EAX' is called 'RAX'.
Types of data storage
• Caller-save registers – EAX, ECX & EDX
  • The calling function is responsible for saving any data in these registers that it still needs after the call.
• Callee-save registers – EBP, EBX, ESI & EDI
  • The called function is responsible for saving the values of these registers before using them and restoring the same values before it returns.
Register size
• Register sizes follow the architecture: EAX (Extended AX) is the 32-bit form of the register on a 32-bit architecture, and AX names its low 16 bits.
EFLAGS register
• A special register that holds many single-bit flags.
• ZERO FLAG (ZF) – set if the result of the instruction is zero; cleared otherwise
• SIGN FLAG (SF) – set equal to the most significant bit of the result
General Command Set
Instructions
• PUSH – pushes a word, double word, or quad word onto the stack
  • For a double word (4 bytes), it automatically decrements the stack pointer ESP by 4
• POP – pops data from the stack
  • Adjusts ESP automatically
  • It increments ESP
• EQU – defines a symbol equal to a given value
• HLT – halts the processor
Assembler Types
• There are three main assemblers:
• MASM – the Microsoft Assembler. It outputs OMF files (but Microsoft's linker can convert them to win32 format).
• GAS – the GNU assembler. It uses the rather ugly AT&T-style syntax, so many people do not like it; however, you can configure it to use and understand the Intel style. It was designed to be part of the back end of the GNU Compiler Collection (gcc). It is run as "as", and it is the one followed in the book.
• NASM – the "Netwide Assembler." It is free, small, and best of all it can output zillions of different types of object files. The language is much more sensible than MASM in many respects.
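Assembly file using GAS (AT&T syntax)
    .globl _start
    .text
    _start:
        movl $len, %edx     # length of the message
        movl $msg, %ecx     # address of the message
        movl $1, %ebx       # file descriptor 1 (standard output)
        movl $4, %eax       # system call number 4 (write)
        int $0x80           # call the kernel
        movl $0, %ebx       # exit status 0
        movl $1, %eax       # system call number 1 (exit)
        int $0x80           # call the kernel
    .data
    msg:
        .ascii "Hello, world!\n"
        len = . - msg       # current location minus start of msg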
Compile and link
The AT&T vs Intel Syntax
• The differences between the two assembly syntaxes:
Moving Data: IA32
• Integer registers: %eax, %ecx, %edx, %ebx, %esi, %edi, %esp, %ebp
• Moving data (data transfer operations): movl Source, Dest
• Operand types
  • Immediate: constant integer data
    • Example: $0x400, as in movl $0x400, %eax
    • Like a C constant, but prefixed with ‘$’
    • Encoded with 1, 2, or 4 bytes
  • Register: one of 8 integer registers
    • Example: %eax, %edx
    • But %esp and %ebp are reserved for special use
    • Others have special uses for particular instructions
  • Memory: 4 consecutive bytes of memory at the address given by a register
    • Simplest example: (%eax)
    • Various other “address modes”
movl Operand Combinations
Assumption: %eax holds temp1, %edx holds temp2, %ecx holds p.

    Source   Dest   Src, Dest            C Analog
    Imm      Reg    movl $0x4,%eax       temp1 = 0x4;
    Imm      Mem    movl $-147,(%ecx)    *p = -147;
    Reg      Reg    movl %eax,%edx       temp2 = temp1;
    Reg      Mem    movl %eax,(%ecx)     *p = temp1;
    Mem      Reg    movl (%ecx),%eax     temp1 = *p;

• Cannot do a memory-memory transfer with a single instruction
• How to do a memory-memory transfer?
Simple Memory Addressing Modes
• Normal (R) Mem[Reg[R]]
  • Register R specifies the memory address.
  • movl (%ecx),%eax
• Displacement D(R) Mem[Reg[R]+D]
  • Register R specifies the start of a memory region.
  • Constant displacement D specifies the offset.
  • movl 8(%ebp),%edx, i.e. %edx = Mem[%ebp + 8]
Memory Operands
• Addressing memory
  • The byte (8 bits) is the smallest addressable unit
  • 32-bit addresses (extended to 64 bits in 64-bit assembly)
  • IA32 is little endian
• Examples
  • movb $0x4a, %al // stores 0x4a in one byte
  • movw $5, %ax // stores 5 in two bytes
  • movl $7, %eax // stores 7 in four bytes
Examples
• Given the register and memory contents, fill in the following table showing the operand values:

    Operand           Value   Comment
    %eax              0x100
    0x104             0xAB
    $0x108            0x108
    (%eax)            0xFF
    4(%eax)           0xAB
    9(%eax,%edx)      0x11    data at location %eax + %edx + 9
    260(%ecx,%edx)    0x13    data at location %ecx + %edx + 260
    (%eax,%edx,4)     0x11    data at location %eax + %edx * 4