COMP 2003: Assembly Language and Digital Logic
Chapter 6: Becoming the Machine
Notes by Neil Dickson
Machine Code
• The CPU doesn’t understand text
• Need a concise way of representing an instruction such that it is easy (fast) for the CPU to determine what to do
• This representation is called machine code
Example of Machine Code
address   encoding            source code
000001F4  F7 E3                   mul  ebx
000001F6  BB 00000000             mov  ebx,0
000001FB                      NextPixel:
000001FB  3B D8                   cmp  ebx,eax
000001FD  73 0C                   jae  Done
000001FF  C7 04 99 000080FF       mov  dword ptr [ecx+ebx*4],00080FFh
00000206  83 C3 01                add  ebx,1
00000209  EB F0                   jmp  NextPixel
0000020B                      Done:
0000020B  C3                      ret
0000020C
Notice that the line labels take up no space. They are just names for addresses.
Example of Machine Code (instruction sizes)
address   encoding            size  source code
000001F4  F7 E3               +2        mul  ebx
000001F6  BB 00000000         +5        mov  ebx,0
000001FB                            NextPixel:
000001FB  3B D8               +2        cmp  ebx,eax
000001FD  73 0C               +2        jae  Done
000001FF  C7 04 99 000080FF   +7        mov  dword ptr [ecx+ebx*4],00080FFh
00000206  83 C3 01            +3        add  ebx,1
00000209  EB F0               +2        jmp  NextPixel
0000020B                            Done:
0000020B  C3                  +1        ret
0000020C
Notice that the increase in address from one instruction to the next is the size of the instruction in bytes.
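The address column can be reproduced by adding up encoding sizes. The following is a minimal Python sketch, not part of the original notes; `LISTING` and `assign_addresses` are hypothetical names, and the 32-bit constants are spelled out byte by byte (x86 stores them little-endian, while the listing prints them as one dword).

```python
# Instructions from the example listing, paired with their encoded bytes.
# 32-bit constants are written in little-endian byte order here; the
# listing prints them as a single dword (e.g. 000080FF).
LISTING = [
    ("mul ebx", "F7 E3"),
    ("mov ebx,0", "BB 00 00 00 00"),
    ("cmp ebx,eax", "3B D8"),
    ("jae Done", "73 0C"),
    ("mov dword ptr [ecx+ebx*4],00080FFh", "C7 04 99 FF 80 00 00"),
    ("add ebx,1", "83 C3 01"),
    ("jmp NextPixel", "EB F0"),
    ("ret", "C3"),
]


def assign_addresses(start, listing):
    """Return (rows, end): rows are (address, size, source) tuples and
    end is the address just past the last instruction."""
    rows = []
    address = start
    for source, hex_bytes in listing:
        size = len(bytes.fromhex(hex_bytes))  # one byte per hex pair
        rows.append((address, size, source))
        address += size  # the next instruction starts right after this one
    return rows, address


rows, end = assign_addresses(0x1F4, LISTING)
for address, size, source in rows:
    print(f"{address:08X}  +{size}  {source}")
print(f"{end:08X}")
```

Labels such as NextPixel and Done do not appear in `LISTING` because they occupy no bytes; each one simply names the address of the instruction that follows it.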
x86 Instruction Machine Code
Layout: prefix(es) | REX prefix | opcode | mod-reg-r/m | SIB | offset | immediate
• opcode:
  • main indication of what the instruction is; looked up in an opcode map
  • the only part present in all instructions
  • may be multiple bytes, or may use the reg field of mod-reg-r/m to select the operation if there is only one operand or an immediate
• mod-reg-r/m byte:
  • mod (high 2 bits): 0 = r/m is memory & no offset; 1 = memory & 8-bit offset; 2 = memory & 32-bit offset; 3 = r/m is a register
  • reg (middle 3 bits): specifies the register (eax=0 to edi=7) used as the register operand
  • r/m (low 3 bits): if mod=3, specifies the other register used as an operand, else specifies an addressing register
• scale-index-base (SIB) byte: allows 2 addressing registers; present iff mod≠3 and r/m=4 (esp)
  • scale (high 2 bits): power of two by which to multiply the index register (0 = reg; 1 = reg*2; 2 = reg*4; 3 = reg*8)
  • index (middle 3 bits): addressing register to be multiplied by 2^scale
  • base (low 3 bits): addressing register not to be multiplied
  • only esp is used for addressing if index=4 (esp) and base=4 (esp)
• prefixes: the most common prefix is 66h, which changes the operand size from dwords to words
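The bit fields described above can be pulled apart with shifts and masks. This is a small Python sketch under that description only; the function names `split_modrm` and `split_sib` are hypothetical, not taken from the notes.

```python
def split_modrm(byte):
    """Split a mod-reg-r/m byte into (mod, reg, rm)."""
    mod = (byte >> 6) & 0b11   # high 2 bits
    reg = (byte >> 3) & 0b111  # middle 3 bits
    rm = byte & 0b111          # low 3 bits
    return mod, reg, rm


def split_sib(byte):
    """Split a scale-index-base byte into (scale, index, base)."""
    scale = (byte >> 6) & 0b11   # index register is multiplied by 2**scale
    index = (byte >> 3) & 0b111
    base = byte & 0b111
    return scale, index, base


def needs_sib(mod, rm):
    """A SIB byte follows iff r/m is a memory operand (mod != 3) and r/m = 4."""
    return mod != 3 and rm == 4


print(split_modrm(0xD8))  # bytes taken from the example listing
print(split_sib(0x99))
```

Both bytes share the same 2-3-3 split, so the two functions differ only in how the caller interprets the fields.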
Register Numbers
number  32-bit  16-bit  8-bit
0       eax     ax      al
1       ecx     cx      cl
2       edx     dx      dl
3       ebx     bx      bl
4       esp     sp      ah
5       ebp     bp      ch
6       esi     si      dh
7       edi     di      bh
Let’s look back at our example code.
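The register table maps directly onto three lookup lists indexed by register number. A minimal Python sketch follows; the helper name `register_name` and its `size_bits` parameter are assumptions made for illustration.

```python
# Register names indexed by the 3-bit register number from the table.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]

TABLES = {32: REG32, 16: REG16, 8: REG8}


def register_name(number, size_bits=32):
    """Look up a register name by number (0 to 7) and operand size in bits."""
    if size_bits not in TABLES:
        raise ValueError("size_bits must be 8, 16 or 32")
    if not 0 <= number <= 7:
        raise ValueError("register number must be in 0..7")
    return TABLES[size_bits][number]


print(register_name(3))     # 32-bit register number 3
print(register_name(4, 8))  # 8-bit register number 4 is ah, not a low byte of esp
```

Note that numbers 4 to 7 in the 8-bit column name the high bytes ah, ch, dh and bh rather than parts of esp, ebp, esi and edi.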
Decoding Machine Code: cmp ebx,eax
000001FB  3B D8                   cmp  ebx,eax
• The opcode map says that 3B is cmp register, dword ptr register/memory, and that it is followed by a mod-reg-r/m byte
• D8 = 11011000: mod = 11 = 3 = both operands are registers; reg = 011 = 3 = ebx; r/m = 000 = 0 = eax
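The same decoding can be written out as a few lines of Python. This is only a sketch of the register-to-register case shown on the slide (mod = 3); `decode_cmp_3b` is a hypothetical name, and memory operands are deliberately rejected rather than handled.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_cmp_3b(encoding):
    """Decode opcode 3B (cmp register, register/memory), mod = 3 only."""
    if len(encoding) != 2 or encoding[0] != 0x3B:
        raise ValueError("expected opcode 3B followed by one mod-reg-r/m byte")
    modrm = encoding[1]
    mod = modrm >> 6
    reg = (modrm >> 3) & 0b111  # first operand: the register operand
    rm = modrm & 0b111          # second operand: register, because mod = 3
    if mod != 3:
        raise ValueError("this sketch handles only mod = 3 (register operand)")
    return f"cmp {REG32[reg]},{REG32[rm]}"


print(decode_cmp_3b(bytes.fromhex("3B D8")))
```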
Decoding Machine Code: mov ebx,0 and ret
000001F6  BB 00000000             mov  ebx,0
0000020B  C3                      ret
• The opcode map says that BB is mov ebx,constant, and that it is followed by a 32-bit constant
• The opcode map says that C3 is ret
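Instructions like these need no mod-reg-r/m byte, so decoding is a direct opcode lookup. The Python sketch below goes slightly beyond the slide: it relies on the fact that opcodes B8 through BF are all mov register,constant with the register number in the low 3 bits of the opcode (so BB selects register 3, ebx). The name `decode_simple` is hypothetical.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_simple(encoding):
    """Decode ret (C3) and mov register,constant (B8 to BF plus 4 bytes)."""
    if not encoding:
        raise ValueError("empty encoding")
    opcode = encoding[0]
    if opcode == 0xC3 and len(encoding) == 1:
        return "ret"
    if 0xB8 <= opcode <= 0xBF and len(encoding) == 5:
        # The 32-bit constant is stored least significant byte first.
        constant = int.from_bytes(encoding[1:5], "little")
        return f"mov {REG32[opcode & 0b111]},{constant}"
    raise ValueError("opcode not covered by this sketch")


print(decode_simple(bytes.fromhex("BB 00 00 00 00")))
print(decode_simple(bytes.fromhex("C3")))
```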
Decoding Machine Code: mov dword ptr [ecx+ebx*4],00080FFh
000001FF  C7 04 99 000080FF       mov  dword ptr [ecx+ebx*4],00080FFh
• The opcode map says that C7 is mov dword ptr register/memory,constant, and that it is followed by a mod-reg-r/m byte, with a 32-bit constant at the end
• 04 = 00000100: mod = 0 = no offset; reg = 0 (not a register operand for this opcode); r/m = 4 = followed by SIB
• 99 = 10011001: scale = 2 = index*4; index = 3 = ebx; base = 1 = ecx
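Put together, the three steps on this slide (opcode, mod-reg-r/m, SIB) can be sketched in Python as follows. It is an illustration that covers only the form used in the example: mod = 0, r/m = 4, with the special cases index = 4 and base = 5 rejected rather than decoded. `decode_mov_c7_sib` is a hypothetical name.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_c7_sib(encoding):
    """Decode C7 + mod-reg-r/m + SIB + 32-bit constant (7 bytes in total)."""
    if len(encoding) != 7 or encoding[0] != 0xC7:
        raise ValueError("expected 7 bytes starting with opcode C7")
    modrm, sib = encoding[1], encoding[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if (mod, reg, rm) != (0, 0, 4):
        raise ValueError("this sketch handles only mod=0, reg=0, r/m=4")
    scale, index, base = sib >> 6, (sib >> 3) & 0b111, sib & 0b111
    if index == 4 or base == 5:
        raise ValueError("special SIB cases are not handled in this sketch")
    factor = 1 << scale  # the index register is multiplied by 2**scale
    index_text = REG32[index] if factor == 1 else f"{REG32[index]}*{factor}"
    constant = int.from_bytes(encoding[3:7], "little")
    return f"mov dword ptr [{REG32[base]}+{index_text}],{constant:08X}h"


print(decode_mov_c7_sib(bytes.fromhex("C7 04 99 FF 80 00 00")))
```

The constant is printed with eight hex digits, so it appears as 000080FFh, which is the same value the source code writes as 00080FFh.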
Decoding Machine Code: mul ebx
000001F4  F7 E3                   mul  ebx
• The opcode map says that F7 is ??? dword ptr register/memory, followed by a mod-reg-r/m byte in which reg specifies the operation (from not, neg, mul, div, ...)
• E3 = 11100011: mod = 3 = register; reg = 4 = mul in the opcode map; r/m = 3 = ebx
• The add instruction (83 C3 01) works similarly: C3 = 11000011 gives mod = 3 = register, reg = 0 = add, r/m = 3 = ebx, and the byte 01 is an 8-bit constant
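When reg selects the operation, the decoder needs a second small table per opcode. The Python sketch below fills those tables in from the standard x86 opcode extension groups, which is more detail than the slide lists, so treat the table contents as supplementary. It handles register operands only (mod = 3), and `decode_group` is a hypothetical name.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# Operation selected by the reg field for opcode F7 (one-operand forms only;
# reg values 0 and 1 take an extra constant and are left out of this sketch).
F7_OPERATIONS = {2: "not", 3: "neg", 4: "mul", 5: "imul", 6: "div", 7: "idiv"}

# Operation selected by the reg field for opcode 83 (operand, 8-bit constant).
OPERATIONS_83 = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]


def decode_group(encoding):
    """Decode F7 /reg and 83 /reg with a register operand (mod = 3)."""
    if len(encoding) < 2:
        raise ValueError("need an opcode and a mod-reg-r/m byte")
    opcode, modrm = encoding[0], encoding[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if mod != 3:
        raise ValueError("this sketch handles only mod = 3 (register operand)")
    if opcode == 0xF7 and reg in F7_OPERATIONS and len(encoding) == 2:
        return f"{F7_OPERATIONS[reg]} {REG32[rm]}"
    if opcode == 0x83 and len(encoding) == 3:
        raw = encoding[2]
        constant = raw - 0x100 if raw & 0x80 else raw  # sign-extend 8 bits
        return f"{OPERATIONS_83[reg]} {REG32[rm]},{constant}"
    raise ValueError("encoding not covered by this sketch")


print(decode_group(bytes.fromhex("F7 E3")))
print(decode_group(bytes.fromhex("83 C3 01")))
```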
What about jumps and calls?
• The opcode indicates that it is a jump or call, and the condition (if it is a conditional jump)
• The opcode is followed by a signed constant that is the number to add to eip if the condition is met
• i.e. jumps and calls are relative to the following instruction, because eip contains the address of the following instruction
Decoding Machine Code Jumps: jae Done
000001FD  73 0C                   jae  Done
• The opcode map says that 73 is jae LineLabel, and that it is followed by the 8-bit signed relative address of LineLabel
• 000001FF (address of following instruction) + 0C = 0000020B, the address of Done
Decoding Machine Code Jumps: jmp NextPixel
00000209  EB F0                   jmp  NextPixel
• The opcode map says that EB is jmp LineLabel, and that it is followed by the 8-bit signed relative address of LineLabel
• F0 sign-extended to 32 bits is FFFFFFF0, which is -10h (-16 in decimal)
• 0000020B (address of following instruction) + FFFFFFF0 = 0000020B + (-10h) = 000001FB, the address of NextPixel
Note: Jumps beyond -128 bytes or +127 bytes, and all calls, have a 32-bit relative address instead.
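Both jump calculations follow the same rule: sign-extend the 8-bit constant and add it to the address of the following instruction. A minimal Python sketch for the two-byte short forms only; `sign_extend_8` and `short_jump_target` are hypothetical names.

```python
def sign_extend_8(byte):
    """Interpret an unsigned byte (0..255) as a signed value (-128..127)."""
    return byte - 0x100 if byte & 0x80 else byte


def short_jump_target(address, encoding):
    """Target of a 2-byte jump (opcode + 8-bit signed relative address).

    address is where the jump instruction itself starts; the relative
    address is added to the address of the FOLLOWING instruction.
    """
    if len(encoding) != 2:
        raise ValueError("this sketch handles only 2-byte short jumps")
    next_address = address + len(encoding)
    # Wrap to 32 bits, as eip would.
    return (next_address + sign_extend_8(encoding[1])) & 0xFFFFFFFF


print(f"{short_jump_target(0x1FD, bytes.fromhex('73 0C')):08X}")  # jae Done
print(f"{short_jump_target(0x209, bytes.fromhex('EB F0')):08X}")  # jmp NextPixel
```

The range limit in the note falls out of `sign_extend_8`: one byte can express only offsets from -128 to +127, so anything farther needs the 32-bit form.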