Understanding X86 Assembly Language Registers and Flags

Computer Organization: x86 Assembly Language. Mohammad Sharaf

Handouts + IBM PC Assembly Language & Programming, Peter Abel, Prentice Hall, 5th edition. Chapters 1, 4, 6, 7, 8

Evolution of Microprocessor

Basic Concepts

What Are Registers?
• Think of them as variables inside the CPU chip
• On the 16-bit processors described here, they are all 16 bits wide

General Purpose Registers
• AX, BX, CX, and DX: they can hold any value you want
• AX (Accumulator Register): most arithmetic operations are done with AX
• BX (Base Register): used for array operations. BX usually holds a base address and is combined with an index register such as SI or DI to address memory
• CX (Counter Register): used as a loop or repeat counter
• DX (Data Register): used for storing data values

Index Registers
• SI and DI: usually used to process arrays or strings
• SI (Source Index): points to the source array
• DI (Destination Index): points to the destination array

Segment Registers
• CS, DS, ES, and SS
• CS (Code Segment Register): points to the segment of the running program. We may NOT modify CS directly
• DS (Data Segment Register): points to the segment of the data used by the running program. You can point it anywhere you want as long as that segment contains the desired data
• ES (Extra Segment Register): usually used with DI for pointer work. The pairs DS:SI and ES:DI are commonly used for string operations
• SS (Stack Segment Register): points to the stack segment
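As a minimal sketch of the DS:SI and ES:DI pairing, the program below copies five bytes from one array to another. The names src and dst are made up for illustration, and the instructions cld and rep movsb are not introduced in these slides; the sketch assumes the COM skeleton shown later, where DS and ES start out equal.

```asm
ideal
p286n
model tiny
codeseg
org 100h
        jmp start
src     db "ABCDE"          ; hypothetical source array
dst     db 5 dup (?)        ; hypothetical destination array
start:
        push ds
        pop es              ; make sure ES = DS, so ES:DI reaches dst
        mov si, offset src  ; DS:SI points to the source
        mov di, offset dst  ; ES:DI points to the destination
        mov cx, 5           ; CX counts the bytes to copy
        cld                 ; direction flag = 0: copy forward
        rep movsb           ; copy CX bytes from DS:SI to ES:DI
        mov ax, 4c00h       ; terminate the program
        int 21h
end
```

After rep movsb finishes, dst holds "ABCDE" and CX is 0.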

Pointer Registers
• BP, SP, and IP
• BP (Base Pointer): used to reach parameters and local variables stored on the stack
• SP (Stack Pointer): points to the current top of the stack
• IP (Instruction Pointer): holds the current position in the running program. It is always paired with CS, so the pair CS:IP points to the instruction the running program is about to execute
• You can NOT assign to CS or IP directly; they change only through jumps, calls, returns, and interrupts

16-bit Registers
• The general registers AX, BX, CX, and DX are 16-bit
• However, each is composed of two smaller 8-bit registers. For example, in AX the high 8 bits are called AH and the low 8 bits are called AL
• Both AH and AL can be accessed directly
• Since together they make up AX:
  • Modifying AH modifies the high 8 bits of AX
  • Modifying AL modifies the low 8 bits of AX
• AL occupies bits 0 to 7 of AX; AH occupies bits 8 to 15 of AX
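The relationship between AH, AL, and AX can be sketched as follows. The values 12h and 34h are arbitrary, and the program uses the COM skeleton introduced later in these slides.

```asm
ideal
p286n
model tiny
codeseg
org 100h
        jmp start
start:
        mov ax, 0000h       ; AX = 0000h
        mov ah, 12h         ; only the high byte changes: AX = 1200h
        mov al, 34h         ; only the low byte changes:  AX = 1234h
        mov bx, ax          ; BX = 1234h, so BH = 12h and BL = 34h
        mov ax, 4c00h       ; terminate the program
        int 21h
end
```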

Extended Registers
• The 80386 processor introduced extended registers
• Most registers, except the segment registers, are widened to 32 bits
• So we have the extended registers EAX, EBX, ECX, and so on
• AX is only the low 16 bits (bits 0 to 15) of EAX
• There is NO register name that directly accesses the upper 16 bits (bits 16 to 31) of an extended register

Flag Register
• The flag register is a 16-bit register that holds the CPU status
• It holds values the programmer may need to check, such as whether the last arithmetic operation produced a zero result or overflowed
• Intel doesn't provide direct access to it as a whole; it is read and written via the stack (with PUSHF and POPF)
• You can isolate each flag with a bitwise AND, since most flags are represented by just 1 bit
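A minimal sketch of reading flags through the stack: the program copies the flag register into a general register with pushf and pop, then isolates single flags with and. The bit positions used (bit 0 for carry, bit 6 for zero) are the standard x86 layout; they are not stated in these slides.

```asm
ideal
p286n
model tiny
codeseg
org 100h
        jmp start
start:
        mov ax, 0FFFFh
        add ax, 1           ; AX wraps to 0000h: carry = 1 and zero = 1
        pushf               ; copy the flag register onto the stack
        pop bx              ; BX now holds a copy of the flags
        mov cx, bx
        and cx, 0001h       ; keep bit 0 (carry flag): CX = 0001h
        and bx, 0040h       ; keep bit 6 (zero flag):  BX = 0040h
        mov ax, 4c00h       ; terminate the program
        int 21h
end
```

The and instructions change the flags themselves, which is why the copy is taken first.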

Flag Register cont.
• C carry flag: set to 1 when the last arithmetic operation, such as an addition or subtraction, produced a carry or borrow; otherwise 0
• P parity flag: set to 1 if the low byte of the last result contains an even number of 1 bits
• A auxiliary flag: set on a carry or borrow between bit 3 and bit 4; used in Binary Coded Decimal (BCD) operations
• Z zero flag: set to 1 if the last arithmetic or logical operation produced a zero result
• S sign flag: set to 1 if the last result is negative, that is, if its highest bit (bit 7 in bytes or bit 15 in words) is 1

Flag Register cont.
• T trap flag: used by debuggers to turn on the step-by-step feature
• I interrupt flag: enables or disables interrupts. If the bit is set (= 1), interrupts are enabled; otherwise they are disabled. The default is on
• D direction flag: sets the direction of string operations. If the bit is set, all string operations run backward; otherwise forward. The default is forward (0)
• O overflow flag: set if the last arithmetic operation overflowed the signed range of its operand
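The difference between the carry flag and the overflow flag is easy to miss, so here is a small sketch. The labels overflowed and done are made up for illustration, and the conditional jump jo (jump if overflow) is not covered in these slides.

```asm
ideal
p286n
model tiny
codeseg
org 100h
        jmp start
start:
        mov al, 0FFh
        add al, 1           ; unsigned 255 + 1 wraps: AL = 00h, C = 1, Z = 1, O = 0
        mov al, 7Fh
        add al, 1           ; signed 127 + 1 overflows: AL = 80h, O = 1, S = 1, C = 0
        jo overflowed       ; taken, because the overflow flag is set
        mov bl, 0
        jmp done
overflowed:
        mov bl, 1           ; BL = 1 records that an overflow was detected
done:
        mov ax, 4c00h       ; terminate the program
        int 21h
end
```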

Memory
• The 8086 has only 16-bit registers, so a single register can address at most 2^16 = 65536 bytes (64K)
• However, the processor can address 1 MB of memory, which is 16 times as much
• Segmentation: the memory is divided virtually into several areas called segments
• The segment registers are 16-bit
• The idea of segmentation is NOT to divide the 1 MB into 16 exact parts

Memory cont.
• Interleaved: segment number 0 gives access to memory addresses 0 to 65535. Segment number 1 gives access to addresses 16 to 65551, segment 2 to addresses 32 to 65567, and so on in steps of 16

  Segment   First address   Last address
  0         0               65535
  1         16              65551
  2         32              65567

Memory Interleaved
• Why did they do that? It is for the sake of the operating system's memory management
• As a result, the OS aligns executed code on a 16-byte boundary

Memory cont.
• Memory access must be done through a pair of registers
• The first is a segment register and the second is an offset register, usually BX, BP, SI, or DI
• The register pair is usually written with a colon between the two, like this: ES:DI
• The pair is called the Segment:Offset pair. So ES:DI means that the segment part is given by ES and the offset part is given by DI

Memory cont.
• Logical address (Segment:Offset) versus absolute or physical address
• Example: if ES contains 1 and DI contains 5, that does NOT mean we access memory address 5
• If ES:DI = 0001:0005, then it actually accesses physical address 21 (1 * 16 + 5 = 21, which is 15h)
• So 0000:0015 and 0001:0005 are actually the same address
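To see that two Segment:Offset pairs can name the same byte, the sketch below reads physical address 21 (15h) twice, once as 0000:0015 and once as 0001:0005. It assumes that reading this low-memory byte is harmless and that nothing changes it between the two reads. Note that a segment register is loaded through a general register here.

```asm
ideal
p286n
model tiny
codeseg
org 100h
        jmp start
start:
        push ds             ; save DS before pointing it elsewhere
        mov ax, 0000h
        mov ds, ax          ; DS = 0000h
        mov si, 0015h
        mov bl, [si]        ; read through DS:SI = 0000:0015 (0 * 16 + 15h = 21)
        mov ax, 0001h
        mov es, ax          ; ES = 0001h
        mov di, 0005h
        mov bh, [es:di]     ; read through ES:DI = 0001:0005 (1 * 16 + 5 = 21)
        pop ds              ; restore DS
        ; BL and BH now hold the same byte
        mov ax, 4c00h       ; terminate the program
        int 21h
end
```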

Stacks
• The stack (LIFO: last in, first out) is a temporary area for storing temporary values
• It is mainly used to pass parameter values to procedures or functions
• Sometimes it also provides temporary space for local variables. Therefore, the role of the stack is very important
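The last-in, first-out order can be sketched with the push and pop instructions, which these slides have not introduced yet. The values 1111h and 2222h are arbitrary. Popping in the same order as pushing swaps the two registers.

```asm
ideal
p286n
model tiny
codeseg
org 100h
        jmp start
start:
        mov ax, 1111h
        mov bx, 2222h
        push ax             ; stack now holds 1111h
        push bx             ; 2222h is on top, 1111h below it
        pop ax              ; the last value pushed comes off first: AX = 2222h
        pop bx              ; BX = 1111h
        mov ax, 4c00h       ; terminate the program
        int 21h
end
```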

Interrupts
• Upon an interrupt request, the CPU saves the context of the running program, then jumps to the interrupt routine
• After processing the interrupt, the processor restores the saved state and resumes the program
• There are 3 kinds of interrupts:
  • Hardware interrupts occur when a hardware device in your computer needs immediate processing
  • Software interrupts occur when the running program itself asks to be interrupted to do something else
  • CPU-generated interrupts occur when the processor detects that something is wrong with the running code (for example, dividing a number by 0)

Why Assembly?
• It's difficult
• Error prone
• Hard to debug
• Takes a lot of time to develop

Why Assembly? However:
• Assembly can be fast: carefully hand-written code can run faster than the code a compiler produces
• Assembly is closer to the machine level than any other language because its commands map 1-to-1 to machine instructions
• Assembly code can be a lot smaller than compiler-generated code
• In assembly, we can do many things that we can't do in a higher-level language

Notes
• Assembly language is NOT case-sensitive
• A comment in assembly begins with a semicolon (;). Everything after the semicolon until the end of the line is ignored

COM Structure

    ideal
    p286n
    model tiny
    codeseg
    org 100h
            jmp start
    ; your data and subroutine here
    start:
            mov ax, 4c00h
            int 21h
    end

COM Program Explanation
• ideal says that we're using the ideal syntax of TASM
• p286n or .286 says that we're using 80286 processor instructions
• model tiny or .model tiny says that we're using the COM format
• codeseg or .code says that this is the beginning of our code
• org 100h says that the code begins at offset 100h, where DOS loads a COM program
• COM programs almost always begin with a jump to the beginning of the code. Between the jump and the beginning of your code, you place your variables. The jump is written with the word jmp followed by a label (here we call it start)
• After the label start, the next two lines are just the code that terminates your program
• end or .end entry specifies the end point of your program

Making Labels
• Pick any name and end it with a colon (:)
• A label usually serves as a tag for a place you'd like to jump to, and so on
• You have to pick a unique name for each label; otherwise the assembler will fail
• There is a way to make a label local: prefix the label name with @@ and still end it with a colon

Variables in Assembly

Variables Declaration
• Our ideal syntax (TASM based) looks like this:

    ideal
    p286n
    model tiny
    codeseg
    org 100h
            jmp start
    ; your data and subroutine here (this is a comment)
    start:
            mov ax, 4c00h
            int 21h
    end

• Put variable declarations after the jmp start statement

Variables Declaration

    :
    bits db 101001b
    var2 dw 4567h
    var3 dw 0BABEh
    :

• There are 3 main types of variable declarations in assembly:
  • db declares a 1-byte variable
  • dw declares a word (2 bytes)
  • dd declares a double-word (4 bytes)
• The declaration syntax is as follows: var_name db value

    ideal
    p286n
    model tiny
    codeseg
    org 100h
            jmp start
    score db 100
    year  dw 2001
    money dd 1000000
    start:
            mov ax, 4c00h
            int 21h
    end

Variables Declaration cont.
• Variable Limits and Negative Values
• You can assign negative values to variables, too. However, the assembler converts them to the corresponding 2's complement value. For example, if you assign -1 to a db variable, the assembler stores it as 255 (0FFh)
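A short sketch of the 2's complement conversion, using a made-up variable name minus_one: the byte declared as -1 reads back as 0FFh, which is 255 when treated as an unsigned number.

```asm
ideal
p286n
model tiny
codeseg
org 100h
        jmp start
minus_one db -1             ; hypothetical variable, stored as the byte 0FFh
start:
        mov al, [minus_one] ; AL = 0FFh: -1 if read as signed, 255 if unsigned
        mov ah, 0           ; AX = 00FFh, that is 255
        mov ax, 4c00h       ; terminate the program
        int 21h
end
```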

Moving Around Values
• If you need to do calculations or run commands involving variables, you'll have to load the variable values into registers
• The syntax of the mov command is: mov a, b which means assign b to a
• Values travel between variables in main memory (MM) and registers:

    mov ax, [var2]    ; register <- variable
    mov [var1], ax    ; variable <- register

Moving Around Values: example

    :
            jmp start
    our_var dw 10
    start:
            mov bx, [our_var]
            mov cx, bx
            mov [our_var], cx
            mov ax, 4c00h
            int 21h
    end

• The square brackets [ ] distinguish the variable's value from its address

Moving Around Values cont.
• When we deal with byte variables (i.e. db), we need to use byte registers (e.g. AL, AH, BL, BH, and so on)
• AX, BX, CX, DX, and so on are word registers
• You can use double-word registers, which are available on 80386 processors or better (use p386n instead of p286n to enable them)
• The double-word registers include EAX, EBX, ECX, EDX, and so on

Moving Around Values cont.
• We can assign constants to variables with the mov instruction:

    mov [word ptr our_var], 1

• Notice that a size modifier must be used when you assign a constant to a variable. Since our_var is a word variable, we need the word ptr modifier
• Likewise, a byte variable uses the byte ptr modifier and a double-word variable uses dword ptr
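As a small sketch of the size modifiers, the program below stores constants into a byte variable and a word variable and then reads them back. The variable names score and year are borrowed from the declaration example; the new values are arbitrary.

```asm
ideal
p286n
model tiny
codeseg
org 100h
        jmp start
score   db 100
year    dw 2001
start:
        mov [byte ptr score], 99    ; byte variable needs byte ptr
        mov [word ptr year], 2002   ; word variable needs word ptr
        mov al, [score]             ; byte variable into a byte register: AL = 99
        mov bx, [year]              ; word variable into a word register: BX = 2002
        mov ax, 4c00h               ; terminate the program
        int 21h
end
```

No modifier is needed on the last two mov instructions, because the register already tells the assembler the size.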

Moving Around Values example
• Notice the way Intel processors store a word value: the least significant byte comes first, then the most significant byte

Big-endian & Little-endian
• These terms describe the order in which a sequence of bytes is stored in a computer's memory
• In a big-endian system, the most significant byte in the sequence is stored at the lowest storage address (i.e., first)
• In a little-endian system, the least significant byte in the sequence is stored first

Moving Around Values cont.
• Recall that variables in assembly are treated as addresses
• AX <- 0502h

Moving Around Values cont.
• Double-word variables are stored similarly: my_var dd 1234BABEh is stored as the bytes BEh, BAh, 34h, 12h, from the lowest address upward
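The byte order can be checked by reading a variable one piece at a time. This is a minimal sketch: our_var is given the value 0502h for illustration, my_var is the double-word from the slide, and the + 1 and + 2 address arithmetic is an addition not shown in these slides.

```asm
ideal
p286n
model tiny
codeseg
org 100h
        jmp start
our_var dw 0502h            ; bytes in memory: 02h, then 05h
my_var  dd 1234BABEh        ; bytes in memory: BEh, BAh, 34h, 12h
start:
        mov al, [byte ptr our_var]      ; lowest address:  AL = 02h
        mov ah, [byte ptr our_var + 1]  ; next address:    AH = 05h, so AX = 0502h
        mov bx, [word ptr my_var]       ; low word first:  BX = 0BABEh
        mov dx, [word ptr my_var + 2]   ; high word after: DX = 1234h
        mov ax, 4c00h                   ; terminate the program
        int 21h
end
```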

Impacts on Registers
• Recall that the word register AX consists of AH and AL
• Modifying either AH or AL modifies the contents of AX
• Likewise, modifying AX modifies AH, AL, or both

Question Marks on Variables
• If you don't need to give a variable an initial value, you can write a question mark ("?") instead. For example: another_var dw ?

String Variables
• You can define string variables in assembly as follows: message db "Hello World!$"
• String variables must be declared as db variables. The string is surrounded by quotes, either single or double, as you prefer

String Variables
• Why do we have to end our string with a dollar sign ("$")? The DOS print-string service (int 21h, function 09h) prints characters until it reaches the "$"
• Each character of the string is converted to its corresponding ASCII code: message db "Hello World!$"
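A minimal sketch of why the "$" matters: DOS function 09h of int 21h prints the characters starting at DS:DX and stops when it reaches a "$". The use of this service and of the offset operator is an addition to these slides; the sketch assumes the COM skeleton, where DS already points at the program's own segment.

```asm
ideal
p286n
model tiny
codeseg
org 100h
        jmp start
message db "Hello World!$"
start:
        mov dx, offset message  ; DS:DX points at the first character
        mov ah, 09h             ; DOS service 09h: print up to the "$"
        int 21h                 ; prints Hello World! without the "$"
        mov ax, 4c00h           ; terminate the program
        int 21h
end
```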

Multi-Valued Variables
• Variables defined with db have each value stored as a byte
• However, there is no restriction on how many values we can define for each variable name:

    multivar db 12h, 34h, 56h, 78h, 00h, 11h, 22h, 00h

Multi-Valued Variables
• Multi-valued variables are stored contiguously:

    multivar2 dw 1234h, 5678h, 0011h, 2200h

Using dup
• Another way to declare a multi-valued variable is the dup command:

    my_array db 5 dup (00h)

• The example above is equivalent to:

    my_array db 00h, 00h, 00h, 00h, 00h

• dup is a kind of shortcut for defining variables with repeated values
• Of course you can also define something like this:

    bar_array db 10 dup (?)
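Because the values are stored contiguously, an individual element can be reached by adding its position to the variable name. The sketch below assumes byte elements counted from 0; the constant and SI-based indexing forms are additions not shown in these slides.

```asm
ideal
p286n
model tiny
codeseg
org 100h
        jmp start
multivar db 12h, 34h, 56h, 78h
my_array db 5 dup (00h)
start:
        mov al, [multivar + 2]      ; third byte of multivar: AL = 56h
        mov si, 3
        mov ah, [multivar + si]     ; fourth byte, indexed by SI: AH = 78h
        mov [my_array + si], ah     ; my_array becomes 00h, 00h, 00h, 78h, 00h
        mov ax, 4c00h               ; terminate the program
        int 21h
end
```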

Arithmetic Instructions

Addition & Subtraction

Addition & Subtraction
• You may add constants to variables or subtract constants from them directly. But don't forget to add word ptr or dword ptr as appropriate
• If the result of an addition does not fit in the destination (a carry out of the highest bit), the carry flag is set to 1; otherwise it is 0
• Similarly, if a subtraction requires a borrow, the carry flag is also set to 1; otherwise it is 0
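The carry and borrow behaviour can be sketched with the add and sub instructions. The variable name total and all values are made up for illustration.

```asm
ideal
p286n
model tiny
codeseg
org 100h
        jmp start
total   dw 0FFFFh               ; hypothetical word variable
start:
        mov ax, 5
        add ax, 3               ; AX = 8, no carry: carry flag = 0
        sub ax, 10              ; 8 - 10 needs a borrow: AX = 0FFFEh, carry flag = 1
        add [word ptr total], 1 ; 0FFFFh + 1 wraps to 0000h: carry flag = 1
        mov ax, 4c00h           ; terminate the program
        int 21h
end
```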
