Introduction • Chapter 1 • What is Assembly Language? • Data Representation
What is Assembly Language? • A low-level, processor-specific programming language designed to match the processor’s machine instruction set • each assembly language instruction matches exactly one machine language instruction • we study here Intel’s 80x86 (and Pentiums)
Why learn Assembly Language? • To learn how high-level language (HLL) code gets translated into machine language • i.e., to learn the details hidden in HLL code • To learn the computer’s hardware • by direct access to memory, video controller, sound card, keyboard… • To speed up applications • direct access to hardware (ex: writing directly to I/O ports instead of doing a system call) • good ASM code can be faster and smaller: rewrite the critical areas of code in ASM
Assembly Language Applications • Application programs are rarely written completely in assembly language • only time-critical parts are written in ASM • Ex: an interface subroutine (called from HLL programs) is written in ASM for direct hardware access • Ex2: device drivers (called from the OS) • ASM is often used for embedded systems programs stored in PROM chips • computer cartridge games, microcontrollers (automobiles, industrial plants...), telecommunication equipment… • Very fast and compact but processor-specific
Machine Language • An assembler is a program that converts ASM code into machine language code: • mov al,5 (Assembly Language) • 1011000000000101 (Machine Language) • the most significant (first) byte, 1011 0000b = B0h, is the opcode for “move into register AL” • the least significant (second) byte, 0000 0101b = 05h, is the operand “5” • Directly programming in machine language offers no advantage (over Assembly)...
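The translation of `mov al,5` on this slide can be reproduced by hand. The following is a minimal sketch in Python, not a real assembler: the helper name `assemble_mov_al` is hypothetical, and it handles only this one instruction form (opcode B0h followed by an 8-bit immediate operand).

```python
# Sketch only: hand-assemble the single instruction "mov al, imm8".
# assemble_mov_al is a hypothetical helper, not part of any assembler.

def assemble_mov_al(imm8: int) -> bytes:
    """Return the two machine-code bytes for 'mov al, imm8'."""
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("operand must fit in one byte")
    # B0h is the opcode for "move an immediate byte into AL";
    # the operand follows as the second byte.
    return bytes([0xB0, imm8])

code = assemble_mov_al(5)
bits = "".join(f"{b:08b}" for b in code)
print(code.hex(), bits)  # b005 1011000000000101
```

The printed bit string matches the machine-language form shown on the slide.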
Binary Numbers • are used to store both code and data • On Intel’s x86: • byte = 8 bits (smallest addressable unit) • word = 2 bytes • doubleword = 2 words • quadword = 2 doublewords
Number Systems • A written number is meaningful only with respect to a base • To tell the assembler which base we use: • Hexadecimal 25 is written as 25h • Octal 25 is written as 25o or 25q • Binary 1010 is written as 1010b • Decimal 1010 is written as 1010 or 1010d • You are supposed to know how to convert from one base to another (see appendix A)
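The suffix convention above can be illustrated with a short Python sketch. The function name `parse_asm_literal` is an assumption made for illustration; a real assembler applies further rules (for example, a hexadecimal literal usually has to start with a digit) that this sketch ignores.

```python
# Sketch only: interpret a number written with an assembler-style base suffix.
# parse_asm_literal is a hypothetical helper, not an assembler API.

SUFFIX_BASES = {"h": 16, "o": 8, "q": 8, "b": 2, "d": 10}

def parse_asm_literal(text: str) -> int:
    text = text.strip().lower()
    if not text:
        raise ValueError("empty literal")
    suffix = text[-1]
    if suffix in SUFFIX_BASES:
        # Strip the suffix and read the digits in the indicated base.
        return int(text[:-1], SUFFIX_BASES[suffix])
    # No suffix: decimal by default.
    return int(text, 10)

print(parse_asm_literal("25h"))    # 37
print(parse_asm_literal("25o"))    # 21
print(parse_asm_literal("1010b"))  # 10
print(parse_asm_literal("1010"))   # 1010
```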
Data Representation • Even if we know that a block of memory contains data, to obtain its value we need to choose an interpretation • Ex: memory content 0100 0001 can either represent: • the number 2^{6} + 1 = 65 • or the ASCII code of character “A”
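As a small illustration of this slide, the same bit pattern can be read as a number or as a character. This is a sketch in Python; the variable names are chosen only for this example.

```python
# The same 8 bits, read two ways.
byte = 0b0100_0001

as_number = byte       # numeric interpretation: 2**6 + 1
as_char = chr(byte)    # ASCII interpretation

print(as_number, as_char)  # 65 A
```

Nothing in the stored bits says which reading is intended; the program that uses the byte decides.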
Signed and Unsigned Interpretation • When a memory block contains a number, to obtain its value we must choose either: • the signed interpretation: in that case the most significant bit (msb) represents the sign • Positive number (or zero) if msb = 0 • Negative number if msb = 1 • the unsigned interpretation: in that case all the bits are used to represent a magnitude (i.e., a positive number or zero)
Two's Complement Notation • Used to represent negative numbers • The two's complement of a positive number X, denoted by NEG(X), is obtained by complementing all its bits and adding 1 • NEG(X) = NOT(X) + 1 • Ex: NEG(10) = NOT(10) + 1 • = NOT(0000 1010b) + 1 • = (1111 0101b) + 1 = 1111 0110b = -10 • It follows that X + NEG(X) = 0 (the carry out of the msb is discarded) • To perform the difference X - Y: • the machine executes the addition X + NEG(Y)
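The rule NEG(X) = NOT(X) + 1 can be checked with a few lines of Python. This is a sketch that assumes 8-bit values; the names `neg8` and `sub8` are hypothetical, and the mask stands in for the fixed register width that hardware provides automatically.

```python
# Sketch only: 8-bit two's complement arithmetic.
BITS = 8
MASK = (1 << BITS) - 1  # 1111 1111b, keeps results within one byte

def neg8(x: int) -> int:
    """NEG(X) = NOT(X) + 1, truncated to 8 bits."""
    return (~x + 1) & MASK

def sub8(x: int, y: int) -> int:
    """X - Y computed as X + NEG(Y), discarding the carry out of the msb."""
    return (x + neg8(y)) & MASK

print(f"{neg8(10):08b}")       # 11110110
print((10 + neg8(10)) & MASK)  # 0
print(sub8(15, 10))            # 5
```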
Maximum and Minimum Values • The msb of a signed number is used for its sign • fewer bits are left for its magnitude • Ex: for a signed byte • smallest non-negative = 0000 0000b = 0 • largest positive = 0111 1111b = 127 • largest negative = -1 = 1111 1111b • smallest negative = 1000 0000b = -128
Signed/Unsigned Interpretation (again) • To obtain the value of a number we need to choose an interpretation • Ex: memory content 1111 1111 can either represent: • -1 if a signed interpretation is used • 255 if an unsigned interpretation is used • Only the programmer can provide an interpretation of the content of memory
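The two readings of 1111 1111 can be demonstrated in Python with the standard `int.from_bytes` method. This is only an illustrative sketch; the variable names are chosen for this example.

```python
# One stored byte, two interpretations.
raw = bytes([0b1111_1111])

signed_value = int.from_bytes(raw, "little", signed=True)
unsigned_value = int.from_bytes(raw, "little", signed=False)

print(signed_value, unsigned_value)  # -1 255

# The same holds for the other extreme of a signed byte, 1000 0000b.
low = bytes([0b1000_0000])
print(int.from_bytes(low, "little", signed=True),
      int.from_bytes(low, "little", signed=False))  # -128 128
```

The stored byte is identical in both calls; only the `signed` argument, that is, the programmer's choice, changes the value obtained.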
Character Representation • Each character is represented by a 7-bit code: the ASCII code (from 00h to 7Fh) • Only codes from 20h to 7Eh represent printable characters. The rest are control codes (used for printing, transmission…). • An extended character set is obtained by setting the msb to 1 (codes 80h to FFh) so that each character is stored in 1 byte • Varies from one system to another • MS-DOS usage: for accented characters, Greek symbols and some graphic characters
The ASCII character set • CR = “carriage return” (MSDOS: move to beginning of line) • LF = “line feed” (MSDOS: move directly one line below) • SPC = “blank space”
Text Files • These are files containing only ASCII characters • But different conventions are used for indicating an “end-of-line” • MS-DOS: <CR>+<LF> • UNIX: <LF> • Classic Mac OS: <CR> • This is the origin of many problems encountered when transferring text files from one system to another
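A common way to cope with the three end-of-line conventions is to convert everything to a single one. The sketch below does this in Python on raw bytes; the function name `normalize_newlines` is hypothetical, and choosing <LF> as the target is an assumption made for this example.

```python
# Sketch only: convert MS-DOS (<CR><LF>) and <CR>-only line endings to <LF>.

def normalize_newlines(data: bytes) -> bytes:
    # Replace <CR><LF> pairs first so that the <CR> of a pair
    # is not converted a second time by the next step.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

mixed = b"dos line\r\nold mac line\runix line\n"
print(normalize_newlines(mixed))
# b'dos line\nold mac line\nunix line\n'
```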
Strings and numbers • A string is stored as an array of characters • A 1-byte ASCII code is stored for each char • Hence, we can either store the number 123 in numerical form or as the string “123” • The string form is best for display • The numerical form is best for computations
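The difference between the two forms of 123 can be seen by looking at the stored bytes. This is a sketch in Python; the conversion loop shows one simple way to go from the string form to the numerical form and is not meant as the only method.

```python
# The string "123" is three ASCII codes; the number 123 fits in one byte.
text = "123"
ascii_bytes = text.encode("ascii")
print(ascii_bytes.hex())  # 313233  (31h, 32h, 33h)

# Convert the string form to the numerical form digit by digit.
number = 0
for code in ascii_bytes:
    number = number * 10 + (code - 0x30)  # 30h is the ASCII code of "0"

print(number, f"{number:02X}h")  # 123 7Bh
```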