C Programming and Assembly Language Janakiraman V – jramaanv@yahoo.com NITK Surathkal 2nd August 2014
Motivation Do you know how all this is implemented in assembly?
Agenda • Brief introduction to the 8086 processor architecture • Describe commonly used assembly instructions • Use of stack and related instructions • Translate high level function calls into low level assembly language • Become familiar with the calling conventions • Explain how variables are passed and accessed
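8086 Architecture • ALU – Arithmetic and Logic Unit. Performs the arithmetic and logical operations; the heart of the processor • Control Unit – Decodes instructions and controls the execution flow • Registers – Small, fast storage locations inside the processor that serve as operands to most instructions • Flags – Status bits (such as Carry and Zero) that most ALU operations update to describe their result • The 32-bit registers used in the rest of these slides (EAX, EBX, ...) are the extended forms introduced with the 80386, a later member of the 8086 family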
Registers • EAX – Stores integer return values • ECX – Stores loop counters; also holds the “this” pointer in C++ member function calls • EIP – Instruction pointer. Stores the address of the next instruction to be executed • ESP – The Stack pointer. Implicitly changed by PUSH, POP, CALL and RET instructions • EBP – Base pointer. Used to access local variables and function parameters
Registers Contd… • EBX – A general purpose register • ESI – The source index register for string instructions • EDI – The destination index register for string instructions • EFLAGS – Flag register. Stores flag bits such as Carry and Zero • Segment registers point to a segment of memory: CS, DS, SS, ES (plus FS and GS). They are 16 bits wide and have no E-prefixed form • EDX – Stores the high 32 bits of 64-bit values; EAX holds the low 32 bits
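As a rough illustration of the EDX/EAX pairing for 64-bit values, the C sketch below splits a 64-bit product into the two 32-bit halves that an unsigned MUL would leave in EDX and EAX. The type RegPair and the function mul_edx_eax are hypothetical names chosen for this example; they do not come from the slides.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the two halves of a 64-bit value. */
typedef struct {
    uint32_t edx; /* high 32 bits */
    uint32_t eax; /* low 32 bits  */
} RegPair;

/* Models unsigned MUL: 32-bit x 32-bit gives a 64-bit result in EDX:EAX. */
RegPair mul_edx_eax(uint32_t a, uint32_t b)
{
    uint64_t product = (uint64_t)a * (uint64_t)b;
    RegPair r;

    r.edx = (uint32_t)(product >> 32);         /* upper half */
    r.eax = (uint32_t)(product & 0xFFFFFFFFu); /* lower half */

    /* Sanity check: gluing the halves back together gives the product. */
    assert((((uint64_t)r.edx << 32) | r.eax) == product);
    return r;
}
```

For example, mul_edx_eax(0xFFFFFFFF, 2) yields edx = 1 and eax = 0xFFFFFFFE, because the full product 0x1FFFFFFFE no longer fits in 32 bits.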
Instruction Set • Data transfer • Arithmetic and logical • Stack Operations • Branching and Looping • Function calls • String Instructions • Prefix to instructions
Data transfer instructions • MOV Destination, Source – Format • Data transfer is always from RIGHT to LEFT • The source operand is unaffected • LEA – Load effective address • Loads the offset address of the specified variable into the destination • Equivalent of int* y = &x;
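The difference between copying a value (MOV) and taking an address (LEA) can be sketched in plain C. This is only an analogy under the assumption that x lives in memory; the function name mov_lea_demo is made up for the example.

```c
#include <assert.h>

/* Contrasts a value copy (like MOV) with taking an address (like LEA).
   Returns x + copy after both have been modified. */
int mov_lea_demo(void)
{
    int x = 4;
    int copy = x;    /* like MOV: copies the value, x is unaffected */
    int *y = &x;     /* like LEA: y receives the address of x       */

    copy = copy + 1; /* changes the copy only */
    assert(x == 4);  /* the source of the copy is still intact */

    *y = 7;          /* writes through the address, so x itself changes */
    return x + copy; /* 7 + 5 */
}
```

Calling mov_lea_demo() returns 12: the copy became 5 on its own, while x became 7 only through the pointer.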
Arithmetic and Logical instructions • Operation destination, source – Format • ADD AX, BX • SUB AX, [BX] • OR AX, [BX+4] • XOR AX, AX – Fastest way to clear registers
Exercise 1 • Write an assembly program to evaluate the following expressions. (All variables are 32 bit integers) • EAX = x*y + a - b • EBX = (x^y) | (a&b)

int x=4, y=6, a=3, b=2;
__asm
{
    MOV EAX, x
    MUL y          ; EDX:EAX = EAX * y (EDX receives the high 32 bits)
    ADD EAX, a
    SUB EAX, b     ; EAX = x*y + a - b
    MOV EBX, x
    XOR EBX, y
    MOV ECX, a
    AND ECX, b
    OR  EBX, ECX   ; EBX = (x^y) | (a&b)
}
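To check the expected register values for Exercise 1 without an assembler, the same steps can be mirrored in portable C. This is a sketch, not the author's solution; expr_eax and expr_ebx are hypothetical names, and each statement is annotated with the instruction it stands for.

```c
#include <assert.h>

/* Mirrors: MOV EAX, x / MUL y / ADD EAX, a / SUB EAX, b */
int expr_eax(int x, int y, int a, int b)
{
    int eax = x;   /* MOV EAX, x */
    eax = eax * y; /* MUL y (only the low 32 bits are kept here) */
    eax = eax + a; /* ADD EAX, a */
    eax = eax - b; /* SUB EAX, b */
    return eax;
}

/* Mirrors: MOV EBX, x / XOR EBX, y / MOV ECX, a / AND ECX, b / OR EBX, ECX */
int expr_ebx(int x, int y, int a, int b)
{
    int ebx = x;   /* MOV EBX, x   */
    int ecx = a;   /* MOV ECX, a   */
    ebx = ebx ^ y; /* XOR EBX, y   */
    ecx = ecx & b; /* AND ECX, b   */
    ebx = ebx | ecx; /* OR EBX, ECX */
    assert(ebx == ((x ^ y) | (a & b))); /* agrees with the C expression */
    return ebx;
}
```

With x=4, y=6, a=3, b=2 the first function gives 25 and the second gives 2.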
Branching and Looping • JMP Addr – Loads EIP with Addr • Conditional Jumps • Transfers control based on a condition • Based on state of one or more flags • ALU operation sets flags
Exercise 2 • Write an assembly program to evaluate the expression “z = x * y” using • Repeated addition • MUL instruction • Write an assembly program to calculate the string length of a constant string

Multiplication by repeated addition (assumes y >= 1).
int x=9, y=10, z=0;
__asm
{
    XOR EAX, EAX
    MOV EBX, y
MULT:
    ADD EAX, x
    DEC EBX
    JNZ MULT
    MOV z, EAX
}

String length of a constant string.
char* pChar = "Test data";
int len = 0;
__asm
{
    MOV EDI, pChar
    XOR ECX, ECX
COMPARE:
    CMP BYTE PTR [EDI], 0
    JNZ INCREASE
    JMP DONE
INCREASE:
    INC ECX
    INC EDI
    JMP COMPARE
DONE:
    MOV len, ECX
}
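The two loops of Exercise 2 map closely onto C control flow. The sketch below mirrors them instruction by instruction; mul_by_addition and string_length are hypothetical names, and the y >= 1 precondition is an assumption made explicit here because the counter is decremented before it is tested.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the MULT loop: add x to the accumulator y times (y >= 1). */
int mul_by_addition(int x, int y)
{
    int eax = 0;        /* XOR EAX, EAX */
    int ebx = y;        /* MOV EBX, y   */

    assert(y >= 1);     /* DEC/JNZ would not stop at once for y == 0 */
    do {
        eax += x;       /* ADD EAX, x   */
        ebx--;          /* DEC EBX      */
    } while (ebx != 0); /* JNZ MULT     */
    return eax;         /* MOV z, EAX   */
}

/* Mirrors the COMPARE loop: count bytes until the terminating zero. */
size_t string_length(const char *p_char)
{
    const char *edi = p_char; /* MOV EDI, pChar */
    size_t ecx = 0;           /* XOR ECX, ECX   */

    while (*edi != 0) {       /* compare the byte at [EDI] with 0 */
        ecx++;                /* INC ECX */
        edi++;                /* INC EDI */
    }
    return ecx;               /* MOV len, ECX */
}
```

mul_by_addition(9, 10) returns 90, and string_length("Test data") returns 9.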
Stack Operations • PUSH: PUSH EAX • ESP decreases by 4 (by 2 for a 16-bit operand) • Data is copied to the new top of stack • Used extensively to pass parameters to functions • POP: POP EAX • ESP increases by 4 (by 2 for a 16-bit operand) • Data is copied from the top of stack to the destination • Complement of PUSH
Exercise 3 • Write an assembly program to swap two integers x and y. • Write a C program to swap two numbers using a function Swap(int* pX, int* pY). Implement the Swap function directly in assembly language

Swap two integers.
int x=4, y=5;
__asm
{
    PUSH x
    PUSH y
    POP x
    POP y
}

Function to swap variables
void swap(int* pX, int* pY)
{
    __asm
    {
        MOV EAX, pX
        MOV EBX, pY
        PUSH DWORD PTR [EAX]
        PUSH DWORD PTR [EBX]
        POP DWORD PTR [EAX]
        POP DWORD PTR [EBX]
    }
}
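The PUSH/POP behaviour and the stack-based swap can be modelled in portable C. The sketch below is a simplified model, assuming 32-bit slots and a small fixed-size stack; Stack, push32, pop32 and swap_via_stack are hypothetical names, and the bounds checks are additions that real hardware does not perform.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 64u

/* Hypothetical model of a downward-growing stack of 32-bit slots. */
typedef struct {
    uint32_t mem[STACK_BYTES / 4u];
    uint32_t esp; /* byte offset of the top of stack within mem */
} Stack;

/* An empty stack has ESP just past the end of the memory. */
void stack_init(Stack *s) { s->esp = STACK_BYTES; }

void push32(Stack *s, uint32_t value)
{
    assert(s->esp >= 4u);        /* model-only overflow check */
    s->esp -= 4u;                /* ESP decreases first ...            */
    s->mem[s->esp / 4u] = value; /* ... then the value goes to [ESP]   */
}

uint32_t pop32(Stack *s)
{
    uint32_t value;

    assert(s->esp <= STACK_BYTES - 4u); /* model-only underflow check */
    value = s->mem[s->esp / 4u]; /* read [ESP] ...       */
    s->esp += 4u;                /* ... then ESP increases */
    return value;
}

/* Swap via the stack, as in PUSH x / PUSH y / POP x / POP y. */
void swap_via_stack(uint32_t *x, uint32_t *y)
{
    Stack s;

    stack_init(&s);
    push32(&s, *x);
    push32(&s, *y);
    *x = pop32(&s); /* receives the old y (last in, first out) */
    *y = pop32(&s); /* receives the old x */
    assert(s.esp == STACK_BYTES); /* the stack is balanced again */
}
```

The swap works because values come off the stack in the reverse of the order in which they were pushed.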
Function calls • CALL – CALL ADDR • Used for function calls • Implicitly pushes the return address (the EIP of the instruction after the CALL) on to the stack • Loads EIP with the specified address (ADDR) • RET – RET n • Used to return to the calling function • Implicitly pops the DWORD on the top of stack (TOS) into EIP • ‘n’ specifies the number of bytes to be added to ESP after returning. Used for stack clean up
Compile the C program!!
int g_iVar = 5;
int Fn(int x, int y);

void main()
{
    int z=0;
    z = Fn(2,4);
    g_iVar = z;
}

int Fn(int x, int y)
{
    int z=0;
    z = x + y;
    return z;
}
C and assembly language - FAQ • How are function calls in ‘C’ translated into assembly? • How are parameters passed to the function? • What does it mean to say local variables are stored on stack? Scope of local variables! • How are global variables accessed?
C and Assembly language Contd…. • Registers are too few to pass many parameters • Scope – Desirable feature • Stack – Ideal to store local variables • ESP changes with every PUSH and POP, so it is awkward to use for addressing local variables • EBP stays fixed for the duration of the function and is used to access them!!!
Parameters, Local and Global variables • Before a function is called parameters are pushed onto stack • Parameters are accessed by [EBP +n] • Local variables are accessed by [EBP –n] • Integers are returned in EAX • Global variables are accessed by direct address values
Compile the C program Contd…
void main()
{
    int z=0;             MOV z, 0
    z = Fn(2,4);         PUSH 0x00000004
                         PUSH 0x00000002
                         CALL Fn
                         MOV z, EAX
    g_iVar = z;          MOV [g_iVar], EAX
}
int Fn(int x, int y)
{
    int z=0;             MOV z, 0
    z = x + y;           MOV EAX, x
                         ADD EAX, y
                         MOV z, EAX
    return z;            RET
}
Compile the C Program Contd….
CODE SEGMENT – Function – main()
        .
    int z = 0;
C100    MOV [EBP-4], 0
    z = Fn(2,4);
C101    PUSH 0x00000004
C102    PUSH 0x00000002
C103    CALL C200
C104    MOV [EBP-4], EAX
    g_iVar = z;
C105    MOV [g_iVar], EAX
        .
        .
STACK SEGMENT
[Stack diagram showing the positions of ESP and EBP]
Compile the C Program Contd….
CODE SEGMENT – Function – Fn()
C200    MOV EBP, ESP
C201    SUB ESP, 0x40
    int z=0;
C202    MOV [EBP-4], 0
    z = x + y;
C203    MOV EAX, [EBP+4]
C204    ADD EAX, [EBP+8]
C205    MOV [EBP-4], EAX
    return z;
C206    ADD ESP, 0x40
C207    RET
STACK SEGMENT
[Stack diagram showing the positions of ESP and EBP, and the local variable space holding z]
CODE SEGMENT – Function – main()
        .
    int z = 0;
C100    MOV [EBP-4], 0
    z = Fn(2,4);
C101    PUSH 0x00000004
C102    PUSH 0x00000002
C103    CALL C200
C104    MOV [EBP-4], EAX
    g_iVar = z;
C105    MOV [g_iVar], EAX
C106    RET
STACK SEGMENT
[Stack diagram showing the positions of ESP and EBP]
Stack corruption!!!!! You have accessed the stack of the function “Fn()”, because Fn() overwrote EBP without saving it. Your computer will now REBOOT!!!!!
Compile the C Program Contd….
CODE SEGMENT – Function – main()
        .
    int z = 0;
C100    MOV [EBP-4], 0
    z = Fn(2,4);
C101    PUSH 0x00000004
C102    PUSH 0x00000002
C103    CALL C200
C104    MOV [EBP-4], EAX
    g_iVar = z;
C105    MOV [g_iVar], EAX
        .
        .
STACK SEGMENT
[Stack diagram showing the positions of ESP and EBP]
Compile the C Program Contd….
CODE SEGMENT – Function – Fn()
C200    PUSH EBP
C202    MOV EBP, ESP
C203    SUB ESP, 0x40
    int z=0;
C204    MOV [EBP-4], 0
    z = x + y;
C205    MOV EAX, [EBP+8]
C206    ADD EAX, [EBP+12]
C207    MOV [EBP-4], EAX
    return z;
C208    ADD ESP, 0x40
C209    POP EBP
C20A    RET 8
STACK SEGMENT
[Stack diagram showing the positions of ESP and EBP, and the local variable space holding z]
CODE SEGMENT – Function – main()
        .
    int z = 0;
C100    MOV [EBP-4], 0
    z = Fn(2,4);
C101    PUSH 0x00000004
C102    PUSH 0x00000002
C103    CALL C200
C104    MOV [EBP-4], EAX
    g_iVar = z;
C105    MOV [g_iVar], EAX
C106    Epilogue
STACK SEGMENT
[Stack diagram showing the positions of ESP and EBP]
Function calls in C - Summary • Function call gets translated to CALL addr • Prologue • Store the current EBP on stack • Set up the stack - Initialize the EBP • Allocate space for local variables. • Execute the function accordingly • Epilogue • Set the ESP to its original value • Set the EBP back to its original value
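The prologue, body and epilogue summarised above can be traced with a small simulated machine in C. This is a teaching sketch under several assumptions: memory is a tiny word array, addresses are byte offsets into it, and the names Cpu, rd, wr, push, pop, fn, call_fn and RETURN_ADDR are all hypothetical. The callee follows the PUSH EBP / MOV EBP, ESP / SUB ESP / ... / POP EBP / RET 8 sequence.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS   64u
#define RETURN_ADDR 0xC104u /* stand-in for the address after the CALL */

typedef struct {
    uint32_t mem[MEM_WORDS]; /* simulated stack memory */
    uint32_t esp, ebp, eax;  /* byte offsets into mem, plus EAX */
} Cpu;

static uint32_t rd(const Cpu *c, uint32_t addr) { return c->mem[addr / 4u]; }
static void wr(Cpu *c, uint32_t addr, uint32_t v) { c->mem[addr / 4u] = v; }
static void push(Cpu *c, uint32_t v) { c->esp -= 4u; wr(c, c->esp, v); }
static uint32_t pop(Cpu *c) { uint32_t v = rd(c, c->esp); c->esp += 4u; return v; }

/* Callee: int Fn(int x, int y) { int z = 0; z = x + y; return z; } */
static void fn(Cpu *c)
{
    uint32_t ret;

    push(c, c->ebp);              /* PUSH EBP           (prologue) */
    c->ebp = c->esp;              /* MOV EBP, ESP                  */
    c->esp -= 0x40u;              /* SUB ESP, 0x40                 */
    wr(c, c->ebp - 4u, 0u);       /* MOV [EBP-4], 0     z = 0      */
    c->eax = rd(c, c->ebp + 8u);  /* MOV EAX, [EBP+8]   x          */
    c->eax += rd(c, c->ebp + 12u);/* ADD EAX, [EBP+12]  + y        */
    wr(c, c->ebp - 4u, c->eax);   /* MOV [EBP-4], EAX   z = x + y  */
    c->esp += 0x40u;              /* ADD ESP, 0x40      (epilogue) */
    c->ebp = pop(c);              /* POP EBP                       */
    ret = pop(c);                 /* RET 8: pop the return address */
    assert(ret == RETURN_ADDR);
    (void)ret;
    c->esp += 8u;                 /* ... and drop 8 bytes of parameters */
}

/* Caller: z = Fn(x, y), with z stored at the caller's [EBP-4]. */
uint32_t call_fn(uint32_t x, uint32_t y)
{
    Cpu c = {{0}, 0, 0, 0};
    uint32_t esp_before, ebp_before;

    c.ebp = MEM_WORDS * 4u;  /* caller's frame base                  */
    c.esp = c.ebp - 0x10u;   /* space already reserved for its locals */
    esp_before = c.esp;
    ebp_before = c.ebp;

    wr(&c, c.ebp - 4u, 0u);  /* MOV [EBP-4], 0 */
    push(&c, y);             /* PUSH y (rightmost parameter first) */
    push(&c, x);             /* PUSH x */
    push(&c, RETURN_ADDR);   /* CALL pushes the return address */
    fn(&c);
    wr(&c, c.ebp - 4u, c.eax); /* MOV [EBP-4], EAX */

    assert(c.esp == esp_before); /* parameters were cleaned up      */
    assert(c.ebp == ebp_before); /* caller's frame pointer survived */
    return rd(&c, c.ebp - 4u);
}
```

call_fn(2, 4) returns 6, and the two assertions at the end confirm that both ESP and EBP are back at their pre-call values, which is exactly what the epilogue and the stack clean up are for.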
Stack clean up • When? • Happens after returning from a function • Why? • Undo the effect of pushing parameters • How? • RET N or ADD ESP, N
Contd…
C Program                        Assembly
void main()
{                                Prologue
    int z = 0;                   MOV [EBP-4], 0
    z = Function(2, 4);          PUSH 0x00000004
                                 PUSH 0x00000002
                                 CALL Function
                                 MOV [EBP-4], EAX
}                                Epilogue
/*Contd……*/                      Contd……
Contd…
C Program                        Assembly
int Function(int a, int b)
{                                PUSH EBP               ; Prologue
                                 MOV EBP, ESP
                                 SUB ESP, N
    int c=0;                     MOV [EBP-4], 0         ; Body
    c = a + b;                   MOV EAX, [EBP+8]
                                 ADD EAX, [EBP+12]
                                 MOV [EBP-4], EAX
    return c;                    ADD ESP, N             ; Epilogue
                                 POP EBP
                                 RET 8
}
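Calling conventions • __cdecl • Default calling convention of C functions • Needed for variable argument lists, because only the caller knows how many arguments it pushed • Caller cleans the stack - ADD ESP, N instruction • __stdcall • Usually gives smaller code than __cdecl, since the clean up is emitted once in the callee instead of after every call • Callee cleans the stack - RET N instruction • Contd……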
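The link between __cdecl and variable argument lists can be seen from a variadic function in standard C. In the sketch below, sum_ints is a hypothetical example function: the callee only learns the argument count at run time from its first parameter, so a fixed RET N could not be correct for every call, and the caller has to remove whatever it pushed.

```c
#include <assert.h>
#include <stdarg.h>

/* Sums `count` int arguments. Each call site may pass a different
   number of arguments, so only the caller knows how much stack to
   clean up afterwards. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;
    int i;

    assert(count >= 0);
    va_start(args, count);
    for (i = 0; i < count; i++) {
        total += va_arg(args, int); /* fetch the next pushed argument */
    }
    va_end(args);
    return total;
}
```

sum_ints(3, 1, 2, 3) returns 6 and sum_ints(2, 10, 20) returns 30; on 32-bit x86 with caller clean up, the two call sites would adjust ESP by different amounts after the call.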
Back to Exercise 3 • Write a C program to swap two numbers using a function Swap(int* pX, int* pY). Implement the Swap function directly in assembly language

Function to swap variables
void swap(int* pX, int* pY)
{
    __asm
    {
        PUSH DWORD PTR [[EBP+8]]
        PUSH DWORD PTR [[EBP+12]]
        POP DWORD PTR [[EBP+8]]
        POP DWORD PTR [[EBP+12]]
    }
}

Function to swap variables
void swap(int* pX, int* pY)
{
    __asm
    {
        PUSH DWORD PTR [pX]
        PUSH DWORD PTR [pY]
        POP DWORD PTR [pX]
        POP DWORD PTR [pY]
    }
}

Double indirection is not a valid instruction, so the pointers are first loaded into registers. With the standard prologue the parameters are at [EBP+8] and [EBP+12].

Function to swap variables
void swap(int* pX, int* pY)
{
    __asm
    {
        MOV EAX, DWORD PTR [EBP+8]
        MOV EBX, DWORD PTR [EBP+12]
        PUSH DWORD PTR [EAX]
        PUSH DWORD PTR [EBX]
        POP DWORD PTR [EAX]
        POP DWORD PTR [EBX]
    }
}
What about C++?
struct stTest
{
    int x;
    int y;
};
void FnTest(stTest* pSt)
{
    pSt->x = 0;
    pSt->y = 1;
}
void main()
{
    stTest obj;
    FnTest(&obj);
}

class clsTest
{
    int x;
    int y;
public:
    void FnTest()
    {
        x = 0;
        y = 1;
    }
};
void main()
{
    clsTest obj;
    obj.FnTest();
}
Calling convention Contd… • __thiscall – The C++ calling convention for member functions • Behaves like the __stdcall call in most ways: the callee cleans the stack • The this pointer is passed in the ECX register • The callee typically stores the this pointer in the [EBP-4] location on stack
String Instructions • Use ESI and EDI as implicit operands • After the operation ESI and EDI are automatically incremented or decremented depending on the direction flag • Usually used with the prefix instructions • Very efficient for common loops such as copying, filling or scanning a block of memory
Prefix to instructions • REP – REP MOVSB • Used to repeat a string instruction unconditionally • Implicitly decrements ECX by 1 after each execution • Stops once ECX = 0 • REPNE/ REPE – REPE SCASB • Used to repeat a string instruction conditionally • Implicitly decrements ECX by 1 after each execution • Stops once ECX = 0, or once the ZERO flag is set (REPNE) or reset (REPE)
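The repeat prefixes can be modelled as ordinary C loops around a single string operation. The sketch below assumes the direction flag is clear (ESI and EDI move forward) and uses a hypothetical StrRegs struct to stand in for the registers involved; it is a model of the behaviour, not of any real instruction encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register file for the string instructions. */
typedef struct {
    const uint8_t *esi; /* source index              */
    uint8_t *edi;       /* destination index         */
    size_t ecx;         /* repeat counter            */
    uint8_t al;         /* byte that SCASB looks for */
    int zf;             /* zero flag                 */
} StrRegs;

/* REP MOVSB: copy ECX bytes from [ESI] to [EDI]. */
void rep_movsb(StrRegs *r)
{
    assert(r->ecx == 0 || (r->esi != NULL && r->edi != NULL));
    while (r->ecx != 0) {      /* REP tests ECX before each iteration */
        *r->edi++ = *r->esi++; /* MOVSB, then ESI and EDI advance     */
        r->ecx--;
    }
}

/* REPNE SCASB: scan [EDI] for AL; stop when ECX reaches 0
   or when a match sets the zero flag. */
void repne_scasb(StrRegs *r)
{
    assert(r->ecx == 0 || r->edi != NULL);
    while (r->ecx != 0) {
        r->zf = (*r->edi++ == r->al); /* SCASB compares, EDI advances */
        r->ecx--;
        if (r->zf) {
            break;                    /* REPNE stops once ZF is set */
        }
    }
}
```

After repne_scasb finds a match, EDI points one byte past the matching byte and ECX holds the number of bytes that were left unscanned.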
Optimized C functions • memcpy • strlen • memset
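One plausible reading of this slide is that these three functions map naturally onto prefixed string instructions: REP MOVSB for memcpy, REP STOSB for memset and REPNE SCASB for strlen. The C sketch below writes each one as the loop the prefix would perform. The sketch_ names are hypothetical, and real library implementations are usually more elaborate, so this is an illustration of the idea rather than how any particular C library does it.

```c
#include <assert.h>
#include <stddef.h>

/* memcpy as REP MOVSB: ECX = n, ESI = src, EDI = dst. */
void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *edi = dst;
    const unsigned char *esi = src;
    size_t ecx = n;

    while (ecx != 0) {   /* REP: repeat until ECX == 0 */
        *edi++ = *esi++; /* MOVSB */
        ecx--;
    }
    return dst;
}

/* memset as REP STOSB: ECX = n, AL = value, EDI = dst. */
void *sketch_memset(void *dst, int value, size_t n)
{
    unsigned char *edi = dst;
    unsigned char al = (unsigned char)value;
    size_t ecx = n;

    while (ecx != 0) {
        *edi++ = al;     /* STOSB */
        ecx--;
    }
    return dst;
}

/* strlen as REPNE SCASB: AL = 0, ECX = largest count, scan for the zero byte. */
size_t sketch_strlen(const char *s)
{
    const char *edi = s;
    size_t ecx = (size_t)-1;    /* like MOV ECX, -1 */

    assert(s != NULL);
    while (ecx != 0) {
        int zf = (*edi++ == 0); /* SCASB with AL = 0 */
        ecx--;
        if (zf) {
            break;              /* REPNE stops when ZF is set */
        }
    }
    /* ECX dropped once per scanned byte, terminator included. */
    return ((size_t)-1 - ecx) - 1;
}
```

For "Test data" the scan touches ten bytes (nine characters plus the terminator), so sketch_strlen returns 9.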