ECE 371 Microprocessors Chapter 5 x86 Assembly Language 1 Herbert G. Mayer, PSU Status 11/11/2015 For use at CCUT Fall 2015
Syllabus • Motivation • 16-bit, 32-bit, 64-bit Processor • Null Program • Print Character • Print String • INT Function • Assembler Abbreviations • Macros • Procedures • Assembly and Linking • nasm Assembler • Summary • Appendix
Motivation • Almost impossible to communicate with a microprocessor on the binary level • Assembler offers abstraction, relocatability, and program reuse • Symbolic names permit convenient definition and reference of data and code objects • Assembler offers high level data and control constructs, similar to high-level languages • Assembler programming allows high level of control over the target machine • And achieves highest performance for short code sections
Motivation • Intel x86 is the most widely used microprocessor for general computing; made by Intel and AMD • The ARM processor is the most widely used processor for portable devices, e.g. tablets and cell phones • We use Intel x86 here to explain the relation of µP and assembly language; for any one µP, there may be many assemblers, but only a single binary code • The µP architecture defines details of the assembler instructions; yet some assembly language detail is independent of architecture • E.g. the syntactic order in which operands are listed in assembly instructions is arbitrary, but the bits have to be assembled into their specific bit positions of a machine instruction
Motivation • Any machine instruction has its corresponding assembler syntax • Different manufacturers of an assembler may have different syntax rules for the same machine instructions • For example, some define the destination register to be situated in the leftmost position of the various defined operands; e.g. a load instruction for a hypothetical machine could be: ld r1, [foo] -- load word at address foo into reg r1 • Others might reverse the order, use different mnemonics, or name registers differently, such as: load foo, %r1 -- load word at address foo into reg r1
Motivation • Some manufacturers refer to moving bits from memory into a register as a load instruction (IBM); others as a move instruction (Intel) • Assembly language bridges the gap between low level binary machine instructions and the higher level interface with human programmers • Binary instructions execute on a digital computer, while an assembler provides a tool for expressing programs in text form, readable by programmers • Assembly language is by no means high-level in the sense of machine independent, structured, or object-oriented • It is a low level, target machine specific interface; but it shields programmers from the tedium of binary code
Motivation • Users do not deal with the target machine in terms of bits that represent binary machine instructions • An assembler is a piece of system software that maps an assembly source program into binary instructions • Thus assembly language provides an abstraction: • It elevates the user to the level of textual language, up from the level of binary object code • Several different assemblers may do this in syntactically different ways for the same target μP • Yet the generated binary code has to be identical for each assembler, in order to render the object code executable on the targeted μP
Motivation • Common to many architectures is the notion (and separation) of data space, instruction space, and perhaps other areas of program logic • The x86 architecture embodies so called data segments, code segments, and stack segments, and several of each if needed • Each segment is identified at run time by a segment register • Specific data or code elements are identified by their offsets from the start of their respective segment
Motivation • For example, the code label next: will be interpreted by the hardware as seg:offset, where seg is the segment register cs, and offset is the offset of next from the start of the code segment • Let’s say the offset of next is 248x and the value in the 16-bit cs register is 2003x; then the resulting run time (code) address is 20030x + 248x = 20278x • Note the left-shift of the segment value by 4 bits, i.e. a multiplication by 16 • This is possible, and required, since all segments are required to be aligned at modulo-16 addresses on the Intel x86 architecture • Thus a segment’s starting address is always a multiple of 16, and its binary address always has the rightmost (low-order) 4 bits 0
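The seg:offset rule can be checked with a few lines of arithmetic. The following is a minimal Python sketch, assuming 16-bit real mode with a 16-bit segment value and a 16-bit offset; the function name physical_address and the values 2003h and 248h are illustrative assumptions, not taken from a real program, and the sketch does not model address wrap-around.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode x86 address: segment value shifted left by 4 bits, plus offset."""
    if not 0 <= segment <= 0xFFFF:
        raise ValueError("a segment register holds only 16 bits")
    if not 0 <= offset <= 0xFFFF:
        raise ValueError("a real-mode offset holds only 16 bits")
    return (segment << 4) + offset


# Illustrative values (assumptions): cs = 2003h, offset of label next = 248h
cs = 0x2003
next_offset = 0x0248
address = physical_address(cs, next_offset)
print(hex(address))  # 0x20278

# Every segment start (offset 0) is a multiple of 16: low-order 4 bits are 0
segment_start = physical_address(cs, 0)
print(segment_start % 16)  # 0
```

Because the shift always appends four zero bits, the segment start printed last is necessarily a multiple of 16, which is the alignment rule stated on the slide.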
Motivation • This chapter introduces complete programs, written in assembly language • Starting with the smallest possible but complete assembly program, we progress to more sophisticated programs • One example emits a single character, the next prints a complete string onto the standard screen, followed by conventions that allow us to communicate with the assembler in an abbreviated way • We also discuss macros and simple procedures with calls and returns
16-Bit, 32-Bit, 64-bit Architecture • The Intel x86 processor started out as a 16-bit architecture in the late 1970s • The first x86 product was named Intel 8086 µP • Then the x86 architecture grew to become a 32-bit architecture • The initial 32-bit product was named Intel 80386; yes, there were intermediate 16-bit versions, named 80186 and 80286, with very short lives • The 32-bit version was backwards compatible with the 16-bit architecture and could execute old code • Then in the early 2000s AMD produced a 64-bit version of the x86 family, very much to the surprise of Intel; Intel then productized a 64-bit version as well, in addition to the new and different Itanium
16-Bit, 32-Bit, 64-bit Architecture • The AMD product name was AMD64 • Intel’s name: Intel 64 • Old 16-bit and 32-bit x86 code is compatible and executes without issue on the new 64-bit processors • Though not with optimal speed, as legacy object code cannot take advantage of new instructions that may speed up certain applications • [Photo of AMD64 µP]
16-Bit, 32-Bit, 64-bit Architecture • AMD’s 64-bit version of the old x86 architecture must have sent shock waves through Intel, which at the time of AMD’s release had no published plans to release a 64-bit version of the old x86 machine • That quickly changed, as Intel had been smart enough to have its skunk works design the new Intel 64-bit µP in secrecy • All 8 old registers were expanded to 64 bits, and their names modified correspondingly, to differentiate them from their 32-bit or 16-bit siblings • The old names, e.g. “eax” for the 32-bit version of the 16-bit ax register, were modified to “rax” for the 64-bit version • The 64-bit architecture also added 8 more general-purpose registers (GPRs) to the register-starved x86; these are known as rn, with n = 8..15
16-Bit, 32-Bit, 64-bit Architecture • The register map also shows the XMM and MMX registers, directly usable on the new 64-bit architecture • The 8 MMX registers are 64 bits long; they are aliased onto the low 64 bits of the eight 80-bit x87 floating-point registers, which hold extended floating-point values adhering to the IEEE industry standard • The XMM registers were already 128 bits long; that did not have to change in Intel 64, though their number grew from 8 to 16 • The instruction pointer register ip simply became rip, and the flags register became rflags
16-Bit, 32-Bit, 64-bit Usage In assembly code below we use the following names for the ax register, depending on 16-bit, 32-bit, or 64-bit modes: • ax 16 bits; also al is the low-order byte register • eax 32 bits • rax 64 bits Ditto with the other registers, for example, the bx: • bx 16 bits; also bh is the high-order byte register • ebx 32 bits • rbx 64 bits Etc.
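The register names are different views of the same storage: each shorter name refers to the low-order part of the longer one. The following is a minimal Python sketch of that relationship, assuming a single 64-bit value stands in for rax; the function name sub_registers and the sample value are hypothetical and chosen only so that each byte is easy to recognize.

```python
def sub_registers(rax: int) -> dict:
    """Return the named views of one 64-bit register value."""
    rax &= 0xFFFFFFFFFFFFFFFF          # keep 64 bits
    return {
        "rax": rax,                    # all 64 bits
        "eax": rax & 0xFFFFFFFF,       # low-order 32 bits
        "ax": rax & 0xFFFF,            # low-order 16 bits
        "ah": (rax >> 8) & 0xFF,       # high-order byte of ax
        "al": rax & 0xFF,              # low-order byte of ax
    }


# Hypothetical register content, one distinct value per byte
views = sub_registers(0x1122334455667788)
for name, value in views.items():
    print(f"{name:>3} = {value:#x}")
```

The same masks apply to rbx, ebx, bx, bh, and bl.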
A Null Program In x86 Assembly Language
Null Program • Goal here is to craft an x86 assembly language program that assembles, links, loads and executes correctly, and then does nothing • Set up segments: code, data, and stack • Here only the code segment, as the others are empty • Note the 'code' string to identify the code segment • Communicate the implied seg portion of seg:offset in the assume directive • Define start address (actually offset) via label, here label start: • Labels are user-defined identifiers, each followed by a colon, in the code segment
Null Program
; Source: out1.asm
; Purpose: simplest program, no data seg, no stack
code_s  segment 'code'      ; 'code' identifies segment
        assume cs:code_s    ; implied seg register cs
start:  mov al, 0           ; termination code
        mov ah, 4ch         ; to terminate: 4ch in ah
        int 21h             ; call system sw for help
code_s  ends                ; end of code segment
        end start           ; end argument defines start
Null Program • Use manufacturer-provided system services: here 4ch to terminate; the ‘h’ in 4ch stands for ‘hexadecimal’ • Run-time services are requested via INT 21h • Service refinement is specified in register ah and possibly other registers; that ‘h’ in ah stands for ‘high’ byte • Return code is zero, meaning: no errors occurred • Note comments, introduced by ; • Comments end at the end of line • Can be different in different assemblers • Assembler used here could be Microsoft masm or ML
Print Single Character: We Choose ‘$’
Print Character ‘$’ • Goal is to craft an x86 assembly language program that assembles, links, loads and executes a complete program for the purpose of printing a single character • Define also data and stack segments; though they will remain unused; just used for demonstration • Use assembler instruction to define data, here a single machine word, via dw: dw 999 ; reserves 1 word, initialized to 999 • And we define an array of 100 machine words, via dw and the dup pseudo-opcode: dw 100 dup( 0 ) ; defines 100 words, initialized to 0; they remain unused in this simple program
Print Character ‘$’
; Source: out2.asm
; Purpose: simplest program to output a character, here '$'
data_s  segment                 ; unused data segment
        dw 999                  ; define a word, init 999
data_s  ends
stack_s segment                 ; unused stack segment
        dw 100 dup( 0 )         ; reserve 100 words, init 0
stack_s ends
code_s  segment 'code'          ; THE Code Segment
        assume cs:code_s, ds:data_s
start:  mov ax, seg data_s      ; initialize ds
        mov ds, ax              ; cannot load directly into ds
        mov dl, '$'             ; char to print assumed in dl
        mov ah, 2h              ; call 2h emits char in dl
        int 21h                 ; call OS routine, e.g. DOS
        mov ax, 4c00h           ; termination code in ah + al
        int 21h                 ; terminate finally via call
code_s  ends                    ; repeat seg name at ends
        end start               ; say: Where to start
Print Character ‘$’ • Again a system routine is called for help: INT 21h • The specific argument, communicating which help is needed, must be passed in register ah • Value 2 in ah states character output is desired • OS service routine 2 prints a char; it outputs the one found in register dl; that is the ‘$’ character • Moving 4c00h into register ax is the same as moving 4ch into register ah and 00h into al • Note that one of the h qualifiers says “hex”, while the other says “high” • 4c00h is just two byte literals concatenated
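The equivalence between one word-sized move into ax and two byte-sized moves into ah and al is plain bit arithmetic. A minimal Python sketch follows; the helper names combine and split are hypothetical, introduced only for this illustration.

```python
def combine(ah: int, al: int) -> int:
    """Concatenate two byte values into one 16-bit ax value (ah is the high byte)."""
    return ((ah & 0xFF) << 8) | (al & 0xFF)


def split(ax: int) -> tuple:
    """Split a 16-bit ax value into its (ah, al) byte pair."""
    return (ax >> 8) & 0xFF, ax & 0xFF


# mov ah, 4ch  and  mov al, 00h  together load the same bits as  mov ax, 4c00h
ax = combine(0x4C, 0x00)
print(hex(ax))  # 0x4c00

ah, al = split(0x4C00)
print(hex(ah), hex(al))
```

Using the single word move saves one instruction; the two byte moves make the roles of service number and return code more visible.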
Print String • Goal now is to craft an x86 assembly program that assembles, links, loads and executes a program to print a character string • The Data Segment defines a string of bytes, initialized to some string literal, identified by symbol msg • This name msg is a user-defined name for the byte address, where the string starts • Note the $ character to end a string literal • Used as end criterion for system SW routine 9 • Stack segment here is solely a dummy segment: • It holds 10 unused strings, each of length 16, solely for demonstration purposes
Print String
; Source: out3.asm
; Purpose: simplest program to output a character string
data_s  segment
msg     db "Hello CCUT class$"
data_s  ends
stack_s segment                 ; unused
        db 10 dup( "---S t a c k----" )
stack_s ends                    ; repeat the name
code_s  segment 'code'
        assume cs:code_s, ds:data_s
start:  mov ax, seg data_s
        mov ds, ax
        mov dx, offset msg      ; System SW prints
        mov ah, 9h              ; sys call 9h emits string
        int 21h                 ; call OS routine
        mov ax, 4c00h           ; term code in ah + al
        int 21h                 ; term finally via call
code_s  ends                    ; label seg name at ends
        end start               ; start here!
Print String • System SW routine 9 emits a character string to the standard output file; note 9 is the same as 9h • It finds the start address of the string in ds:offset, with the offset communicated in register dx • Note the built-in system-SW function offset applied to a data label, here label msg • System SW also provides the built-in seg pseudo-function to generate the other part of the final address, the segment
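The role of the $ terminator can be shown with a small model. The following Python sketch assumes the data segment is a plain byte string and that the offset in dx is an index into it; the function name dos_print_string is hypothetical, and the sketch models only the termination rule, not the real system routine.

```python
def dos_print_string(data_segment: bytes, dx: int) -> str:
    """Collect characters starting at offset dx, up to but not including '$'."""
    end = data_segment.index(b"$", dx)   # raises ValueError if no '$' follows
    return data_segment[dx:end].decode("ascii")


# Data segment content as defined by: msg db "Hello CCUT class$"
data_segment = b"Hello CCUT class$"
msg_offset = 0                           # msg is the first item in the segment

print(dos_print_string(data_segment, msg_offset))
```

A consequence of this convention is that the string itself cannot contain a $ character: output stops at the first one found.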
INT Function • The x86 INT instruction is not what the computer sciences usually call an interrupt, i.e. an asynchronous hardware event • Instead it is a call to a low-level system SW routine, also known as a software interrupt • Parameterized by the single-byte argument residing in the ah register • The actual system SW being executed as a result of INT is dependent on the actual operating system on which the x86 code executes • Thus it may be different on a Linux system from a Windows environment and from a Unix target machine
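The idea that one INT number leads to many services, selected by ah, can be sketched as a dispatcher. The following Python model covers only the two services used in the character-printing program (2h and 4ch); the function name int21h, the register dictionary, and the screen list are assumptions made for this illustration and do not describe how any operating system implements the call.

```python
def int21h(regs: dict, screen: list) -> bool:
    """Dispatch on ah. Returns False once the program asks to terminate."""
    service = regs["ah"]
    if service == 0x02:                  # print the character held in dl
        screen.append(chr(regs["dl"]))
        return True
    if service == 0x4C:                  # terminate; al carries the return code
        regs["exit_code"] = regs["al"]
        return False
    raise NotImplementedError(f"service {service:#x} is not modelled here")


screen = []
regs = {"ah": 0x02, "al": 0x00, "dl": ord("$")}
running = int21h(regs, screen)           # mov dl, '$' / mov ah, 2h / int 21h

regs["ah"], regs["al"] = 0x4C, 0x00      # mov ax, 4c00h
running = int21h(regs, screen)           # int 21h

print("".join(screen), regs["exit_code"])
```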
Assembler Abbreviations • Assembler directive .model small allows for certain default abbreviations and assumptions • For example data, code, stack, @data are predefined in Microsoft assemblers, as are assume statements • Here another string is printed; that string is “Hello” • Note again the $ terminator • Note the different meaning of $ in a different target system, e.g. $ means “current code address” in Linux • Under Microsoft assembler SW, the macro @data is predefined by ML (or masm), same as seg data • Note again the offset function, to compute the byte distance from the start address of the segment
Assembler Abbreviations
; Source file: out4.asm
; Purpose: simpler program to output string
        .model small            ; assumes stack data code
        .stack 10h              ; assumes name: stack
        .data                   ; assumes name: data
hi      db "Hello$"
        .code                   ; assumes name: code
start:  mov ax, @data           ; @data predefined macro
        mov ds, ax              ; now data segment reg set
        mov dx, offset hi       ; string 2 b output by System SW
        mov ah, 9h              ; System SW 9h emits string
        int 21h                 ; call System SW
        mov ax, 4c00h           ; we want to terminate: ah + al
        int 21h                 ; terminate finally
        end start               ; start here!
Assembler Abbreviations • Note again the System SW routine 9 under Microsoft system SW, to output some string of characters, whose address is found in register dx • A program using the .model small abbreviation is smaller, more compact, easier to read • The .code ends the previous segment, if any (here data) • And starts the code segment • The .data ends the previous segment, if any • And starts the data segment
Macros • Programmers get tired of writing segment … ends • The .model small allows defaults and abbreviations • Macros make program source more readable, easier to maintain; here are the rules: • Macros can be defined anywhere in assembler source • The initial assembler translation process extracts all macro definitions, stores them during assembly time, and uses (expands) them, each time a macro name is found in the asm source • Macros are introduced by user defined name and the macro keyword • Terminated by endm keyword
Macros
; Source file: out5.asm
; Purpose: macro-ized program to output character string
start   macro                   ; no parameters
        mov ax, @data           ; @data predefined macro
        mov ds, ax              ; now data segment reg set
        endm                    ; end of start macro
Put_Str macro Str               ; one formal parameter, "Str"
        mov dx, offset Str      ; string 2 b output by DOS
        mov ah, 9h              ; DOS call 9h emits string
        int 21h                 ; call system SW
        endm                    ; end of Put_Str macro
Done    macro ret_code          ; formal parameter "ret_code"
        mov ah, 4ch             ; we wanna terminate, ah = 4c
        mov al, ret_code        ; communicate: all is o.k.
        int 21h                 ; terminate finally via DOS
        endm                    ; end of macro body of Done
Macros
        .model small            ; allow predefined assumptions
        .stack 10h              ; assumes segment name: stack
        .data                   ; assumes segment name: data
hi      db "Hello$"             ; terminate string with $
        .code                   ; assumes segment name: code
main:   start                   ; use of macro "start"
        Put_Str hi              ; invoke macro "Put_Str" with hi
        Done 0                  ; use of macro "Done"
        end main                ; start here!
Macros • Macros specify 0 or more formal macro parameters, which can be referenced in the macro body • At the place of macro definition, these parameters are named formal parameters • Formal parameters follow the macro keyword at the place of definition • At the place of use (the place where they are expanded) these are substituted by actual parameters • When macro name is used, its body is expanded in-line at that place, with all actual parameters taking the place of the formal ones
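In-line expansion with parameter substitution can be sketched as a text rewrite. The following Python model takes the body of the Put_Str macro from out5.asm and substitutes the actual parameter hi for the formal parameter Str; the function name expand is hypothetical, and the sketch is case-sensitive and ignores comments, so it is only an approximation of what a real macro assembler does.

```python
import re


def expand(body: list, formals: list, actuals: list) -> list:
    """Return the macro body with each formal parameter replaced by its actual."""
    if len(formals) != len(actuals):
        raise ValueError("number of actual and formal parameters differs")
    mapping = dict(zip(formals, actuals))
    if not mapping:
        return list(body)                # macro without parameters: copy the body
    # Match whole identifiers only, so "Str" inside "Put_Str" is left alone
    names = "|".join(re.escape(name) for name in mapping)
    pattern = re.compile(r"\b(" + names + r")\b")
    return [pattern.sub(lambda m: mapping[m.group(1)], line) for line in body]


# Body of the Put_Str macro, formal parameter Str
PUT_STR_BODY = ["mov dx, offset Str", "mov ah, 9h", "int 21h"]

# The use "Put_Str hi" expands in-line to these three instructions
expanded = expand(PUT_STR_BODY, ["Str"], ["hi"])
for line in expanded:
    print(line)
```

Each use of a macro yields a fresh copy of its body, which is why frequent use of large macros increases code size, in contrast to a procedure that is stored once and called.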
Assembler Procedures: Like High-Level Language Procedures
Procedures • Assembler procedure identified by proc and endp • Procedures can be called and provide a syntactic grouping mechanism to form physical modules containing logically connected actions • The Microsoft syntax rule for procedure names does not allow : as used for labels • Return instruction ret ends a procedure body and allows return to the place of call, immediately after the call instruction
Procedures
; Source file: out6.asm
; Purpose: modular macro program to output string
start   macro                   ; no parameters
        mov ax, @data           ; @data predefined macro
        mov ds, ax              ; now data segment reg set
        endm                    ; end of "start" macro body
Put_Str macro Str               ; "Str" must be data label
        mov dx, offset Str      ; same body as in out5.asm
        mov ah, 9h
        int 21h
        endm
Done    macro ret_code          ; same body as in out5.asm
        mov ah, 4ch
        mov al, ret_code
        int 21h
        endm
        .model small            ; as in out5.asm
        .stack 10h
        .data                   ; assumes name: data
hi      db "Hello$"             ; terminate string with $
        .code                   ; assumes name: code
main    proc                    ; begin of procedure body
        start                   ; invoke "start" macro
        Put_Str hi              ; invoke "Put_Str" with actual
        Done 0                  ; invoke "Done" with actual 0
        ret                     ; return
main    endp
        end main                ; entry point is "main"
Procedures • As in high-level language programs, procedures are a key syntax tool to modularize • Logical modules encapsulate data and actions that belong together • Physical modules (delineated by the proc and endp keywords) are the language tool to define such logical modules • Net result: programs that are easier to write, and above all, easier to read • A procedure example is provided in a separate handout
Assembly and Linking Of Full Programs
Assembly • Linking is the process of binding 2 or more pieces of software together in a way that they constitute one running program • Clearly the start address, where execution begins, must be defined, by convention • Typical tools to assemble and link include: • Microsoft Macro Assembler masm • Borland Macro Assembler tasm • Microsoft Macro Assembler ml • Microsoft Linker link • Borland Linker tlink
Assembly With MASM • The older version of the Microsoft macro assembler is named masm • A newer assembler from Microsoft is named ml • This section explains the masm command briefly • The masm command in version 5.10 and older has 4 arguments, separated from one another by commas. These arguments are file names • An argument is considered omitted if no comma (and thus no file name) is given for it • The assembler prompts for each omitted one, so it is generally better to provide them, at least the commas; otherwise there will be repeated interaction with the assembler asking for file names, answered by hitting carriage return
Assembly With MASM • It is a nuisance in masm 5.10 that the last comma (the third one to separate 4 arguments) must be followed by another comma (or semicolon, indicating the end of a command line) • Else the assembler does not recognize that the default should be used for the fourth argument • If commas without file names are given, then default file names are assumed • The four file names, which are the arguments of the masm command, are left to right:
Assembly With MASM • assembly source program, e.g. source.asm • object program generated by assembler, e.g. source.obj • the listing, generated by the assembler, say source.lst; yes, in days of old, people actually created paper listings of programs being processed • the cross-reference file, named source.crf
Assembly With MASM • Suffixes obj, lst, and crf are automatically generated by the assembler, if no other names are provided • Some complete masm commands for the assembler file src1.asm would be:
masm src1.asm, src.obj, src.lst, src.crf; no prompting
masm src1,src1,src1,src1 ; no prompting
masm src1,src1.obj,src1,src1.crf ; no prompting
masm src1,,,; ; no prompting
• In the above cases the assembler will not prompt you, because you provided all file names, or at least their commas • It is smart enough to derive the suffixes (like .lst and .obj) from the respective positions
Assembly With MASM • Some incomplete masm commands for source file src2.asm are shown next • The assembler will prompt the user for the missing ones:
masm src2.asm, src2.obj; asks for: list, cross ref file (xref)
masm src2,foo,src2 ; creates foo.obj, src2.lst, asks xref
masm src2,,bar.lst ; creates src2.obj, bar.lst, asks xref
masm src2 ; asks for object, list, cross ref file
• Borland Macro Assembler tasm 5.10 • Similar to masm, but command is tasm
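The positional file-name rules for masm described on these slides can be summarized in a short model. The following Python sketch encodes only those stated rules: four comma-separated positions, a default name derived from the source name when a position is empty, a positional suffix when none is given, and a prompt for every position that has no comma. The function name resolve_masm_files is hypothetical, and the behaviour of a real masm installation, in particular around the semicolon, may differ from this model.

```python
import os

# Positional suffixes: source, object, listing, cross-reference
SUFFIXES = (".asm", ".obj", ".lst", ".crf")


def resolve_masm_files(arg_text: str):
    """Return (names, prompted) for the argument text that follows 'masm'."""
    text = arg_text.strip().rstrip(";").strip()
    fields = [field.strip() for field in text.split(",")]
    if not fields[0]:
        raise ValueError("a source file name is required")
    base = os.path.splitext(fields[0])[0]     # default name for empty positions
    names, prompted = [], []
    for position, suffix in enumerate(SUFFIXES):
        if position >= len(fields):           # no comma given: assembler prompts
            names.append(None)
            prompted.append(suffix)
            continue
        field = fields[position] or base      # comma without name: default
        has_suffix = bool(os.path.splitext(field)[1])
        names.append(field if has_suffix else field + suffix)
    return names, prompted


for command in ("src1,,,;", "src2,foo,src2", "src2,,bar.lst", "src2"):
    print(command, "->", resolve_masm_files(command))
```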