Explore the basics of assembly language programming for Intel Pentium processors using the GNU/Linux environment. Understand the transition from high-level languages to low-level machine code.
What is Assembly Language? Introduction to the GNU/Linux assembler and linker for Intel Pentium processors
High-Level Language • Most programming nowadays is done using so-called “high-level” languages (such as FORTRAN, BASIC, COBOL, PASCAL, C, C++, JAVA, SCHEME, Lisp, ADA, etc.) • These languages deliberately “hide” from a programmer many details concerning HOW the problem actually will be solved by the underlying computing machinery
The BASIC language • Some languages allow programmers to forget about the computer completely • The language can express a computing problem with a few words of English, plus formulas familiar from high-school algebra • EXAMPLE PROBLEM: Compute 4 plus 5
The example in BASIC • LET X = 4 • LET Y = 5 • LET Z = X + Y • PRINT X, "+", Y, "=", Z • END Output: 4 + 5 = 9
The C language • Other high-level languages do require a small amount of awareness by the program author of how a computation is going to be processed • For example, that: - the main program will get “linked” with a “library” of other special-purpose subroutines - instructions and data will get placed into separate sections of the machine’s memory - variables and constants get treated differently - data items have specific space requirements
Same example: written in C
        #include <stdio.h>      // needed for printf()
        int x = 4, y = 5;       // initialized variables
        int z;                  // uninitialized variable
        int main()
        {
                z = x + y;
                printf( "%d + %d = %d \n", x, y, z );
        }
“ends” versus “means” • Key point: high-level languages let programmers focus attention on the problem to be solved, and not spend effort thinking about details of “how” a particular piece of electrical machinery is going to carry out some desired computation • Key benefit: their problem gets solved sooner (because their program can be written faster) • Programmers don’t have to know very much about how a digital computer actually works
computer scientist vs. programmer • But computer scientists DO want to know how computers actually work: -- so we can fix computers if they break -- so we can use the optimum algorithm -- so we can predict computer behavior -- so we can devise faster computers -- so we can build cheaper computers -- so we can pick one suited to a problem
A machine’s own language • For understanding how computers work, we need familiarity with the computer’s own language (called “machine language”) • It’s a LOW-LEVEL language (very detailed) • It is specific to a machine’s “architecture” • It is a language “spoken” using voltages • Humans represent it with zeros and ones
Example of machine-language Here’s what a program-fragment looks like: 10100001 10111100 10010011 00000100 00001000 00000011 00000101 11000000 10010011 00000100 00001000 10100011 11000000 10010100 00000100 00001000 It means: z = x + y;
Incomprehensible? • Though possible, it is extremely difficult, tedious (and error-prone) for humans to read and write “raw” machine-language • When unavoidable, a special notation can help (called hexadecimal representation): A1 BC 93 04 08 03 05 C0 93 04 08 A3 C0 94 04 08 • But still this looks rather meaningless!
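The hexadecimal notation above is the same 16 bytes as the binary listing, two hex digits per group of eight bits. A minimal sketch in C of that conversion follows; the function names `byte_from_binary` and `hex_from_byte` are hypothetical helpers invented for this illustration, not part of any tool mentioned in these slides.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: convert a string of eight '0'/'1' characters
   into a byte value. Returns -1 if the input is not eight binary digits. */
int byte_from_binary(const char *bits)
{
    assert(bits != NULL);
    if (strlen(bits) != 8)
        return -1;
    int value = 0;
    for (int i = 0; i < 8; i++) {
        if (bits[i] != '0' && bits[i] != '1')
            return -1;
        value = (value << 1) | (bits[i] - '0');   /* shift in one bit at a time */
    }
    return value;
}

/* Hypothetical helper: write the two-digit uppercase hexadecimal form
   of a byte, plus a terminating NUL, into out[3]. */
void hex_from_byte(uint8_t byte, char out[3])
{
    assert(out != NULL);
    snprintf(out, 3, "%02X", byte);
}
```

Feeding the first group, "10100001", through `byte_from_binary` and then `hex_from_byte` gives "A1", the first byte of the hexadecimal listing.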
Hence: assembly language • There are two key ideas: -- mnemonic opcodes: we employ abbreviations of English language words to denote operations -- symbolic addresses: we invent “meaningful” names for memory storage locations we need • These make machine-language understandable to humans – if they know their machine’s design • Let’s see our example-program, rewritten using actual “assembly language” for Intel’s Pentium
Simplified Block Diagram [Figure: the Central Processing Unit, Main Memory, and four I/O devices, connected by the system bus]
Pentium’s internal “registers” • Four general-purpose registers: eax, ebx, ecx, edx • Four memory-addressing registers: esp, ebp, esi, edi • Six memory-segment registers: cs, ds, es, fs, gs, ss • The instruction-pointer and flags registers: eip, eflags
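The “Fetch-Execute” Cycle [Figure: main memory holds Temporary Storage (STACK), Program Variables (DATA), and Program Instructions (TEXT); in the central processor, ESP refers to the stack, the registers labeled EAX work with the data, and EIP refers to the instructions; the system bus joins the processor to memory]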
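To make the cycle concrete, here is a toy model in C that fetches, decodes, and executes the 16-byte machine-language fragment shown earlier. It is only a sketch under stated assumptions: `ToyMachine`, `locate`, `fetch_addr`, and `run` are hypothetical names, the three variables live at the addresses that appear in that fragment, and only the three opcodes used there (A1, 03 05, A3) are modeled. A real Pentium does far more than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Addresses taken from the fragment's operand bytes (stored little-endian). */
#define ADDR_X 0x080493BCu
#define ADDR_Y 0x080493C0u
#define ADDR_Z 0x080494C0u

/* Hypothetical toy machine: one register, an instruction pointer,
   and the three program variables standing in for main memory. */
typedef struct {
    int32_t eax;       /* accumulator register */
    size_t  eip;       /* index of the next instruction byte */
    int32_t x, y, z;   /* the program's DATA */
} ToyMachine;

/* Map an address to the variable stored there (NULL if not modeled). */
static int32_t *locate(ToyMachine *m, uint32_t addr)
{
    switch (addr) {
    case ADDR_X: return &m->x;
    case ADDR_Y: return &m->y;
    case ADDR_Z: return &m->z;
    default:     return NULL;
    }
}

/* Fetch a 32-bit little-endian address operand and advance eip past it. */
static uint32_t fetch_addr(ToyMachine *m, const uint8_t *code)
{
    uint32_t a = (uint32_t)code[m->eip]
               | (uint32_t)code[m->eip + 1] << 8
               | (uint32_t)code[m->eip + 2] << 16
               | (uint32_t)code[m->eip + 3] << 24;
    m->eip += 4;
    return a;
}

/* Run until the bytes are used up. Returns 0 on success, or -1 on an
   opcode, operand, or address that this sketch does not model. */
int run(ToyMachine *m, const uint8_t *code, size_t len)
{
    assert(m != NULL && code != NULL);
    m->eip = 0;
    while (m->eip < len) {
        uint8_t opcode = code[m->eip++];              /* fetch the opcode byte */
        if (opcode == 0x03) {                         /* addl carries one extra byte */
            if (m->eip >= len || code[m->eip++] != 0x05)
                return -1;
        } else if (opcode != 0xA1 && opcode != 0xA3) {
            return -1;                                /* unknown opcode */
        }
        if (len - m->eip < 4)
            return -1;                                /* truncated operand */
        int32_t *cell = locate(m, fetch_addr(m, code));
        if (cell == NULL)
            return -1;                                /* address not modeled */
        switch (opcode) {                             /* execute */
        case 0xA1: m->eax = *cell;  break;            /* movl mem, %eax */
        case 0x03: m->eax += *cell; break;            /* addl mem, %eax */
        case 0xA3: *cell = m->eax;  break;            /* movl %eax, mem */
        }
    }
    return 0;
}
```

With x set to 4 and y set to 5, running the fragment through `run` leaves 9 in both `eax` and `z`, and `eip` has advanced past all 16 bytes.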
our program’s ‘data’ section
        .section .data
x:      .int    4
y:      .int    5
        .comm   z, 4
fmt:    .string "%d + %d = %d \n"
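For comparison, the same four data items can be written as C declarations, which makes the space requirements explicit. This is a rough sketch, assuming a 4-byte `int` as on 32-bit x86 Linux; `data_bytes` is a hypothetical helper added only to total the sizes, and any alignment padding the assembler or linker might insert is ignored.

```c
#include <assert.h>
#include <stddef.h>

int x = 4;                              /* x: .int 4    (4 bytes, initialized) */
int y = 5;                              /* y: .int 5    (4 bytes, initialized) */
int z;                                  /* .comm z, 4   (4 bytes, starts out as zero) */
const char fmt[] = "%d + %d = %d \n";   /* fmt: .string (a NUL byte is appended) */

/* Hypothetical helper: total bytes occupied by the four items,
   not counting any alignment padding. */
size_t data_bytes(void)
{
    assert(sizeof(int) == 4);           /* assumption stated in the lead-in */
    return sizeof x + sizeof y + sizeof z + sizeof fmt;
}
```

The format string is 14 characters long, so with its terminating NUL it takes 15 bytes, and the four items together take 27.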
our program’s ‘text’ section
        .section .text
main:
        # comment: assign z = x + y
        movl    x, %eax
        addl    y, %eax
        movl    %eax, z
‘text’ section (continued)
        # comment: print the program results
        pushl   z
        pushl   y
        pushl   x
        pushl   $fmt
        call    printf
        addl    $16, %esp
‘text’ section (concluded)
        # comment: return control to the caller
        ret
        # comment: make label visible to linker
        .global main
program translation steps [Figure: assembly translates the program source module (demo.s) into the program object module (demo.o); linking combines that object module with object module libraries to produce the executable program (demo)]
The GNU Assembler • With Linux you get free software tools for compiling your own computer programs • An assembler (named ‘as’): it translates assembly language (called the ‘source code’) into machine language (called the ‘object code’) $ as demo.s -o demo.o • A linker (named ‘ld’): it combines ‘object’ files with function libraries (if you know which ones) • A C compiler (named ‘gcc’) which invokes both: $ gcc demo.s -o demo
What must the programmer know? • Needed to use CPU register-names (eax) • Needed to know space requirements (int) • Needed to know how the stack works (pushl) • Needed to make a symbol global (for the linker) • Needed to understand how to quit (ret) • And of course how to use system tools: (e.g., text-editor, assembler, and linker)
Summary • High-level programming (offers easy and speedy real-world problem-solving) • Low-level programming (offers knowledge and power in utilizing machine capabilities) • High-level language hides lots of details • Low-level language reveals the workings • High-level programs: easily ‘portable’ • Low-level programs: tied to specific CPU