x86 Programming: Memory Accessing Modes, Characters, and Strings
Computer Architecture
Multi-byte storage
• Multi-byte data types include:
  • word/short (2 bytes)
  • int (4 bytes)
  • long or quad (8 bytes, as in Java's long; note that in GNU assembler syntax on x86 the .long directive reserves only 4 bytes, while .quad reserves 8)
• Conceptual representation
  • Most significant byte (MSB) is the leftmost byte
  • Least significant byte (LSB) is the rightmost byte
  • Example:
    • Number: 0xaabb
    • MSB: 0xaa
    • LSB: 0xbb
• In-memory representation (applicable only to multi-byte storage)
  • Big Endian
    • MSB is stored at the lowest memory address
  • Little Endian
    • MSB is stored at the highest memory address
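Big vs. Little Endian
• Consider the integer: 0x11aa22bb, stored starting at memory address A
• Big Endian Storage
    Memory Address:  A     A+1   A+2   A+3
    Byte stored:     0x11  0xaa  0x22  0xbb
• Little Endian Storage (x86 architecture)
    Memory Address:  A     A+1   A+2   A+3
    Byte stored:     0xbb  0x22  0xaa  0x11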
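The byte order can be checked directly on an x86 machine. The following is a minimal sketch, assuming 32-bit Linux and the GNU assembler; the label value is hypothetical, and the exit system call (eax = 1, status in ebx) is used only to report the result.

```asm
/* Sketch: read the byte stored at the lowest address of a 4-byte integer */
.text
.global _start
_start:
    movl $0, %ebx        /* Clear ebx so that only its low byte is set below */
    movb value, %bl      /* bl = byte at the lowest address of value */
    movl $1, %eax        /* System call number for exit */
    int $0x80            /* Exit status = ebx */
.data
value: .int 0x11aa22bb   /* Little endian layout: bb 22 aa 11 */
```

Assuming the file is saved as endian.s, it can be built with "as --32 endian.s -o endian.o" followed by "ld -m elf_i386 endian.o -o endian". Running ./endian and then "echo $?" should print 187 (0xbb), the LSB, because x86 stores the LSB at the lowest address.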
Characters
• Characters are simply represented as unsigned 8-bit (byte) numbers
  • In memory as well as in instructions.
• The number is interpreted and displayed as a character for Input-Output (I/O) purposes only!
• The mapping from byte values to characters (as displayed on screen) is based on the American Standard Code for Information Interchange (ASCII)
  • It is widely used by I/O devices
    • Like: monitors, keyboards, etc.
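Standard ASCII Codes
• Here is a short table illustrating standard ASCII codes that are frequently used:
    Character        Decimal   Hex
    '\0' (null)      0         0x00
    '\n' (newline)   10        0x0a
    ' ' (space)      32        0x20
    '0' to '9'       48-57     0x30-0x39
    'A' to 'Z'       65-90     0x41-0x5a
    'a' to 'z'       97-122    0x61-0x7a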
Characters in assembly
• Example assembly code with 5 characters
• Note that the characters are stored at consecutive memory addresses! It is guaranteed by the assembler!
    /* Assembly program involving characters */
    .text
    /* Instructions */
    .data
    char1: .byte 72     /* ASCII code for 'H' */
    char2: .byte 101    /* ASCII code for 'e' */
    char3: .byte 108    /* ASCII code for 'l' */
    char4: .byte 108    /* ASCII code for 'l' */
    char5: .byte 111    /* ASCII code for 'o' */
For the Java programmer…
• The assembler permits direct representation of characters
  • It converts characters to ASCII codes
    /* Assembly program involving characters */
    .text
    /* Instructions */
    .data
    char1: .byte 'H'    /* Assembler converts the */
    char2: .byte 'e'    /* characters to ASCII */
    char3: .byte 'l'
    char4: .byte 'l'
    char5: .byte 'o'
Memory organization
• Bytes declared consecutively in the assembly source are stored at consecutive memory locations
• Assume that the assembler places char1 ('H') at address 0x20; then the other characters have the following memory addresses:
    Address:    0x20  0x21  0x22  0x23  0x24
    Character:  H     e     l     l     o
Working with characters
• Every character (like any other symbol) has 2 distinct values associated with it
  • The address in memory
    • Accessed by prefixing the symbol with a $ (dollar) sign
    • The memory address is always 32 bits (4 bytes) wide on 32-bit x86 processors
    • It is 64 bits wide on 64-bit x86 processors.
  • The value contained in the memory location
    • Accessed without any prefix to the symbol.
    • The number of bytes read depends on the operand size of the instruction
      • 1 byte for movb, 4 bytes for movl, etc., which should match how the symbol was declared (.byte, .int)
• This is exactly how we have been doing it so far.
Cross Check
• Given the following symbol table and memory layout, what are the values of:
  • $letter: 0x20
  • Yellow: 'e'
  • $k: 0x22
  • e: 'o'
• Addresses of symbols (expressions with a $ sign) are obtained from the symbol table, while values of symbols (expressions without a $ sign) are obtained from the memory layout.
    Symbol table:
    Symbol    Address
    letter    0x20
    Yellow    0x21
    k         0x22
    e         0x24
    Memory layout:
    Address:  0x20  0x21  0x22  0x23  0x24
    Value:    H     e     l     l     o
Example assembly
    /* Example use of characters */
    .text
    movb char1, %al      /* al = ASCII('H') */
    addb $1, %al         /* al = ASCII('I') */
    movb %al, char1      /* char1 = 'I' */
    movl $char1, %ebx    /* ebx = addressOf(char1) */
    .data
    char1: .byte 'H'
What's the use of addresses?
• Why bother loading addresses into registers?
  • x86 permits indirect memory access and manipulation using addresses stored in registers!
• x86 processors support a variety of mechanisms for generating the final memory address used to retrieve data
  • These mechanisms are collectively called memory Addressing Modes
Addressing Modes
• x86 supports the following addressing modes
  • Register mode
  • Immediate mode
  • Direct mode
  • Register direct mode
  • Base displacement mode
  • Base-index scaled mode
Register mode
• Instructions involving only registers
• This is the simplest and fastest mechanism
  • Data is read from and written to registers.
  • In this mode, the processor does not access RAM to fetch or store data.
    .text
    movb %al, %ah      /* ah = al */
    addl %eax, %ebx    /* ebx += eax */
    mull %ebx          /* edx:eax = eax * ebx (low 32 bits of the product in eax) */
Immediate mode
• Instructions involving registers & constants
• This mode is used to load constant values into registers
• The constant value to be loaded is encoded as a part of the instruction.
  • Consequently, there is no separate memory access to fetch the data
    .text
    movb $5, %ah       /* ah = 5 */
    addl $-35, %ebx    /* ebx += -35 */
Direct Mode
• Standard mode used with symbols
• Address to load/store data is part of the instruction
  • Involves 1 memory access using the address
  • Number of bytes loaded depends on the operand size of the instruction (movb, addl, etc.)
• Symbols are used to represent addresses
• The other operand has to be a register (or an immediate constant); both operands cannot be memory locations!
    .text
    movb char1, %ah    /* ah = 'H' */
    addl %eax, i1      /* i1 += eax */
    .data
    char1: .byte 'H'
    i1:    .int 100
Register direct mode
• The address for a memory reference is obtained from a register (this mode is commonly called register indirect mode).
  • The address needs to be loaded into a register.
  • Addresses can be manipulated as regular numbers!
    .text
    movl $char1, %eax    /* eax = addressOf(char1) */
    movb (%eax), %bl     /* bl = 'H' */
    inc %eax             /* eax++, now addressOf(char2) */
    movb %bl, (%eax)     /* char2 = char1 */
    .data
    char1: .byte 'H'
    char2: .byte 'e'
Register direct mode (Contd.)
• Register direct mode is most frequently used!
  • It is analogous to accessing objects through references in Java
• Note that the other operand in register direct mode has to be a register (or an immediate constant)
• Pay attention to the following syntax
  • $symbol: To obtain the address of symbol
    • The address is always 32 bits wide on 32-bit x86 processors!
  • (%register): Data stored at the memory address contained in register.
    • The number of bytes read from the given memory location depends on the instruction.
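Because an address held in a register is just a number, it can be incremented to step through consecutive bytes. The following is a minimal sketch, assuming 32-bit Linux and the GNU assembler; the labels chars, next, and skip are hypothetical, and the exit system call (eax = 1, status in ebx) is used only to report the count.

```asm
/* Sketch: count occurrences of 'l' in 5 consecutive bytes via (%eax) */
.text
.global _start
_start:
    movl $chars, %eax    /* eax = address of the first character */
    movl $5, %ecx        /* ecx = number of characters left to check */
    movl $0, %ebx        /* ebx = number of 'l' characters found */
next:
    cmpb $'l', (%eax)    /* Compare the byte at address eax with 'l' */
    jne skip             /* Not an 'l': do not count it */
    incl %ebx            /* Found one more 'l' */
skip:
    incl %eax            /* Advance the address to the next byte */
    decl %ecx            /* One fewer character left */
    jnz next             /* Repeat until ecx reaches 0 */
    movl $1, %eax        /* System call number for exit */
    int $0x80            /* Exit status = ebx */
.data
chars: .byte 'H', 'e', 'l', 'l', 'o'
```

Assuming the file is saved as count.s, it can be built with "as --32 count.s -o count.o" and "ld -m elf_i386 count.o -o count". Running ./count followed by "echo $?" should print 2.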
Base Displacement Mode
• Constant offset from a given address stored in a register
• Used to access parameters to a method
  • We will see the use for this mode in the near future.
    .text
    movl $char1, %eax     /* eax = addressOf(char1) */
    movb 1(%eax), %bl     /* bl = char2 */
    inc %eax              /* eax = addressOf(char2) */
    movb %bl, -1(%eax)    /* char1 = char2 */
    .data
    char1: .byte 'H'
    char2: .byte 'e'
• The displacement value is a constant. The base value is contained in a register!
Base-Index scaled Mode
• Most complex form of memory referencing
  • Involves a displacement constant
  • A base register
  • An index register
  • A scale factor (must be 1, 2, 4, or 8)
• Final address for accessing memory is computed as:
    address = base_register + (index_register * scale_factor) + displacement_constant
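Base-Index scaled Mode
• Examples of this complex mode are shown below (the scale factor must be 1, 2, 4, or 8; the byte is held in %cl because %bl is the low byte of the index register %ebx):
    .text
    movl $char1, %eax              /* eax = addressOf(char1) */
    movl $0, %ebx                  /* ebx = 0 */
    movb 1(%eax, %ebx, 4), %cl     /* cl = char2 */
    inc %eax                       /* eax = addressOf(char2) */
    movl $1, %ebx                  /* ebx = 1 */
    movb %cl, -2(%eax, %ebx, 1)    /* char1 = char2 */
    .data
    char1: .byte 'H'
    char2: .byte 'e'
• First access: Address = %eax + (%ebx * 4) + 1 = %eax + (0 * 4) + 1 = %eax + 1
• Second access: Address = %eax + (%ebx * 1) - 2 = %eax + (1 * 1) - 2 = %eax - 1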
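A typical use of the scale factor is indexing into an array whose elements are wider than one byte. The following is a minimal sketch, assuming 32-bit Linux and the GNU assembler; the array nums and its contents are hypothetical, and the exit system call (eax = 1, status in ebx) is used only to report the loaded value.

```asm
/* Sketch: load nums[2] from an array of 4-byte integers */
.text
.global _start
_start:
    movl $nums, %eax              /* eax = base address of the array */
    movl $2, %ecx                 /* ecx = index of the wanted element */
    movl (%eax, %ecx, 4), %ebx    /* ebx = nums[2]; address = eax + (2 * 4) + 0 */
    movl $1, %eax                 /* System call number for exit */
    int $0x80                     /* Exit status = ebx */
.data
nums: .int 10, 20, 30, 40         /* Four consecutive 4-byte integers */
```

Assuming the file is saved as index.s, it can be built with "as --32 index.s -o index.o" and "ld -m elf_i386 index.o -o index". Running ./index followed by "echo $?" should print 30. The scale factor 4 matches the element size, so the index register can hold a plain element number rather than a byte offset.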
LEA Instruction
• The x86 architecture provides a special instruction called LEA (Load Effective Address)
  • This instruction computes the effective address of its memory operand, using any of the memory addressing modes, and loads that address (not the data stored there) into the given register.
• Examples:
  • lea -1(%eax, %ebx, 1), %edi
  • lea (%eax, %ebx), %edi
  • lea -5(%eax), %edi
LEA Example (Contd.)
• Here is an example of the LEA instruction
    .text
    movl $char1, %eax             /* eax = addressOf(char1) */
    movl $0, %ebx                 /* ebx = 0 */
    lea 1(%eax, %ebx, 2), %edi    /* edi = address of char2 */
    movb $'h', (%edi)             /* change 'e' to 'h' */
    .data
    char1: .byte 'H'
    char2: .byte 'e'
Strings
• Strings are simply represented as a sequence (or array) of characters in memory
  • Each character is stored at a consecutive memory address!
• By convention, every string is terminated by a byte with ASCII value 0
  • Represented as '\0' in source code
Declaring Strings in Assembly
• Strings are defined using the .string directive
    .text
    /* Instructions go here */
    .data
    msg1: .string "Hello\n"
    msg2: .string "World!\n"
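The .string directive also stores the terminating 0 byte after the characters. The following is a minimal sketch that makes this visible, assuming 32-bit Linux and the GNU assembler; the names msg and size are hypothetical, and the .equ directive with the special symbol . (the current address) is used to measure how many bytes msg occupies.

```asm
/* Sketch: report the number of bytes reserved by .string "Hi" */
.text
.global _start
_start:
    movl $size, %ebx     /* ebx = number of bytes occupied by msg */
    movl $1, %eax        /* System call number for exit */
    int $0x80            /* Exit status = ebx */
.data
msg: .string "Hi"        /* Stored as 'H', 'i', 0 */
.equ size, . - msg       /* Current address minus address of msg */
```

Assuming the file is saved as size.s, it can be built with "as --32 size.s -o size.o" and "ld -m elf_i386 size.o -o size". Running ./size followed by "echo $?" should print 3: two characters plus the terminating 0.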
Memory representation
• Given the previous example, the strings (msg1 and msg2) are stored in memory as shown below (addresses in hexadecimal):
    .text
    /* Instructions go here */
    .data
    msg1: .string "Hello\n"
    msg2: .string "World!\n"
• msg1 = 20
    Address:  20  21  22  23  24  25  26
    Value:    H   e   l   l   o   \n  \0
• msg2 = 27
    Address:  27  28  29  2A  2B  2C  2D  2E
    Value:    W   o   r   l   d   !   \n  \0
Displaying Strings
• Strings or characters can be displayed on standard output (analogous to System.out) using a system call (32-bit Linux convention):
  • Set eax to 4
    • To write characters to a file (stream)
    • Changing eax to 3 will cause reading characters instead!
  • Set ebx to 1
    • Destination stream is standard output
    • You may set ebx to 2 for standard error
    • If ebx is 0 it indicates standard input (which is meant for reading, not for writing!)
  • Set ecx to the address of the message to display
  • Set edx to the number of characters to display
  • Invoke the system call using int $0x80
Complete Example
    /* Console output example */
    .text
    .global _start
    _start:
    mov $4, %eax      /* System call to write to a file handle */
    mov $1, %ebx      /* File handle=1 implies standard output */
    mov $msg, %ecx    /* Address of message to be displayed */
    mov $14, %edx     /* Number of bytes to be displayed */
    int $0x80         /* Call OS to display the characters. */
    mov $1, %eax      /* The system call for exit (sys_exit) */
    mov $0, %ebx      /* Exit with return code of 0 (no error) */
    int $0x80
    .data             /* The data to be displayed */
    msg: .string "Hello!\nWorld!\n"
• The byte count (14) was calculated by hand! This can be cumbersome for large strings.
Rewritten using Macro!
    /* Console output example */
    .text
    .global _start
    _start:
    mov $4, %eax      /* System call to write to a file handle */
    mov $1, %ebx      /* File handle=1 implies standard output */
    mov $msg, %ecx    /* Address of message to be displayed */
    mov $len, %edx    /* Number of bytes to be displayed */
    int $0x80         /* Call OS to display the characters. */
    mov $1, %eax      /* The system call for exit (sys_exit) */
    mov $0, %ebx      /* Exit with return code of 0 (no error) */
    int $0x80
    .data             /* The data to be displayed */
    msg: .string "Hello!\nWorld!\n"
    .equ len, . - msg - 1    /* Exclude the terminating 0 added by .string */
• Compute an assembler constant len by subtracting the address of msg from the current address, represented by the special symbol . (dot). 1 is subtracted because .string appends a terminating 0 byte that should not be displayed, so len is 14. Every use of $len is replaced with the resulting constant value.
Compute string length
• The previous examples use fixed-length strings
• For strings that change values or change lengths, the string length must be computed using suitable assembly code.
  • The corresponding Java source is shown below:
    public static int length(char[] str) {
        int i;
        for (i = 0; (str[i] != '\0'); i++);
        return i;
    }
Compute string length
    _length:
        /* Let eax correspond to i */
        movl $0, %eax            /* eax = 0 */
        /* Let ebx correspond to str */
        movl $str, %ebx          /* ebx = address(str) */
    loop:
        cmpb $0, (%ebx, %eax)    /* str[i] != '\0' */
        je done                  /* We have hit the '\0' in string */
        inc %eax                 /* i++ */
        jmp loop                 /* Continue the loop */
    done:
        /* eax now holds the length of the string */
• In the operand (%ebx, %eax):
  • Base register = ebx
  • Index register = eax
  • Displacement (implicit) = 0
  • Scale value (implicit) = 1
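The length loop can be combined with the write system call so that the byte count no longer has to be fixed at assembly time. The following is a minimal sketch, assuming 32-bit Linux and the GNU assembler; the labels str, count, and print and the message text are hypothetical.

```asm
/* Sketch: compute the length of a 0-terminated string, then display it */
.text
.global _start
_start:
    movl $0, %eax            /* eax = i = 0 */
    movl $str, %ebx          /* ebx = address of str */
count:
    cmpb $0, (%ebx, %eax)    /* Is str[i] the terminating 0? */
    je print                 /* Yes: eax holds the length */
    incl %eax                /* i++ */
    jmp count                /* Check the next character */
print:
    movl %eax, %edx          /* edx = number of bytes to display */
    movl $str, %ecx          /* ecx = address of the message */
    movl $1, %ebx            /* File handle 1 = standard output */
    movl $4, %eax            /* System call to write to a file handle */
    int $0x80                /* Call OS to display the characters */
    movl $1, %eax            /* System call for exit */
    movl $0, %ebx            /* Exit with return code 0 */
    int $0x80
.data
str: .string "Hello!\nWorld!\n"
```

Assuming the file is saved as strlen.s, it can be built with "as --32 strlen.s -o strlen.o" and "ld -m elf_i386 strlen.o -o strlen". The length is copied into edx before eax is reused for the system call number, since both the loop counter and the system call number live in eax.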