A comprehensive introduction to assembly language programming, covering topics such as data representation, software hierarchy levels, machine language, and more. Learn the fundamentals of assembly language and how it translates high-level language code into machine language.
Introduction Chapter 1 What is Assembly Language? Data Representation
What is Assembly Language? • A low-level, processor-specific programming language designed to match the processor’s machine instruction set • each assembly language instruction matches exactly one machine language instruction • we study here Intel’s 80x86 (and Pentiums)
Why learn Assembly Language? • To learn how high-level language code gets translated into machine language • i.e.: to learn the details hidden in HLL code • To learn the computer’s hardware • by direct access to memory, video controller, sound card, keyboard… • To speed up applications • direct access to hardware (ex: writing directly to I/O ports instead of doing a system call) • well-written ASM code can be faster and smaller: rewrite in ASM the critical areas of code
Assembly Language Applications • Application programs are rarely written completely in assembly language • only time-critical parts are written in ASM • Ex: an interface subroutine (called from HLL programs) is written in ASM for direct hardware access • Ex2: device drivers (called from the OS) • ASM often used for embedded systems (programs stored in PROM chips) • computer cartridge games, microcontrollers (automobiles, industrial plants...), telecommunication equipment… • Very fast and compact but processor-specific
Table 2. Comparison of Assembly Language and High-Level Languages
Machine Language • An assembler is a program that converts ASM code into machine language code: • mov al,5 (Assembly Language) • 1011000000000101 (Machine Language) • most significant byte is the opcode for “move into register AL” • the least significant byte is for the operand “5” • Directly programming in machine language offers no advantage (over Assembly)...
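The encoding on this slide can be reproduced with a small sketch. The helper name `encode_mov_al_imm8` is hypothetical and covers only this one instruction form (opcode B0h followed by a one-byte operand); a real assembler handles far more.

```python
def encode_mov_al_imm8(value: int) -> bytes:
    """Encode `mov al, imm8`: opcode byte B0h, then the 8-bit operand."""
    if not 0 <= value <= 0xFF:
        raise ValueError("operand must fit in one byte")
    return bytes([0xB0, value])


code = encode_mov_al_imm8(5)
# Show each byte as 8 binary digits, opcode first.
bits = " ".join(f"{byte:08b}" for byte in code)
print(code.hex(), bits)  # b005 10110000 00000101
```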
Binary Numbers/Storage Size • are used to store both code and data • On Intel’s x86: • byte = 8 bits (smallest addressable unit) • word = 2 bytes • doubleword = 2 words • quadword = 2 doublewords
Data Representation • Even if we know that a block of memory contains data, to obtain its value we need to choose an interpretation • Ex: memory content “0100 0001” can either represent: • the number 2^{6} + 1 = 65 • or the ASCII code of character “A”
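As a small illustration of the point above, the same byte can be read back in two ways. This is only a sketch in Python; the variable names are chosen for this example.

```python
raw = bytes([0b0100_0001])  # the memory content "0100 0001"

# Interpretation 1: an unsigned number.
as_number = int.from_bytes(raw, byteorder="big")

# Interpretation 2: an ASCII character code.
as_character = raw.decode("ascii")

print(as_number, as_character)  # 65 A
```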
Data Representation • Number Systems • Binary/Octal/Decimal/Hexadecimal • Converting between various number systems • Signed/Unsigned Interpretation • Two’s Complement • Addition/Subtraction • Character Storage
Number Systems • A written number is meaningful only with respect to a base • To tell the assembler which base we use: • Hexadecimal 25 is written as 25h • Octal 25 is written as 25o or 25q • Binary 1010 is written as 1010b • Decimal 1010 is written as 1010 or 1010d • You are supposed to know how to convert from one base to another (see appendix A)
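The suffix convention above can be sketched as a tiny parser. The function `parse_literal` is a hypothetical helper written for this example, not part of any assembler; it assumes a non-empty, well-formed literal.

```python
# Radix selected by the last character of the literal.
RADIX_BY_SUFFIX = {"h": 16, "o": 8, "q": 8, "b": 2, "d": 10}


def parse_literal(text: str) -> int:
    """Return the value of a literal such as 25h, 25o, 1010b or 1010."""
    text = text.strip().lower()
    radix = RADIX_BY_SUFFIX.get(text[-1])
    if radix is None:
        # No suffix: the literal is decimal.
        return int(text, 10)
    return int(text[:-1], radix)


for literal in ("25h", "25o", "25q", "1010b", "1010", "1010d"):
    print(literal, "=", parse_literal(literal))
```

Because a hexadecimal literal always ends in `h`, a trailing `b` or `d` can only be a suffix, so looking at the last character is enough.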
Binary Numbers • Digits are 1 and 0 • 1 = true • 0 = false • MSB – most significant bit • LSB – least significant bit • Bit numbering:
[Figure: bit numbering]
Converting between various number systems • Converting Binary to Decimal • Converting Decimal to Binary • Converting Binary to Hexadecimal • Converting Hexadecimal to Decimal
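The four conversions listed above can be sketched by hand, without relying on built-in base conversion. The function names are chosen for this example, and inputs are assumed to be valid, non-negative values.

```python
HEX_DIGITS = "0123456789ABCDEF"


def binary_to_decimal(bits: str) -> int:
    """Accumulate left to right: each step doubles and adds the next bit."""
    value = 0
    for bit in bits:
        value = value * 2 + int(bit)
    return value


def decimal_to_binary(number: int) -> str:
    """Repeated division by 2; remainders are the bits, LSB first."""
    if number == 0:
        return "0"
    digits = []
    while number > 0:
        digits.append(str(number % 2))
        number //= 2
    return "".join(reversed(digits))


def binary_to_hex(bits: str) -> str:
    """Pad to a multiple of 4 bits, then map each group to one hex digit."""
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(HEX_DIGITS[binary_to_decimal(group)] for group in groups)


def hex_to_decimal(digits: str) -> int:
    """Accumulate left to right with weight 16."""
    value = 0
    for digit in digits.upper():
        value = value * 16 + HEX_DIGITS.index(digit)
    return value


print(binary_to_decimal("01000001"), decimal_to_binary(65),
      binary_to_hex("1000001"), hex_to_decimal("41"))
```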
Signed and Unsigned Interpretation • When a memory block contains a number, to obtain its value we must choose either: • the signed interpretation: in that case the most significant bit (msb) represents the sign • Positive number (or zero) if msb = 0 • Negative number if msb = 1 • the unsigned interpretation: in that case all the bits are used to represent a magnitude (ie: positive number, or zero)
Signed Integers • The highest bit indicates the sign: 1 = negative, 0 = positive • If the highest digit of a hexadecimal integer is > 7, the value is negative • Examples: 8A, C5, A2, 9D
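The hexadecimal rule can be checked with a short sketch. It assumes the hexadecimal string is written at full storage width (two digits for a byte, four for a word), so that its first digit really contains the sign bit. The function names are hypothetical.

```python
def is_negative_hex(hex_digits: str) -> bool:
    """True when the highest hex digit is 8..F, i.e. the sign bit is 1."""
    return int(hex_digits[0], 16) > 7


def signed_value(hex_digits: str) -> int:
    """Signed interpretation of a full-width hexadecimal integer."""
    bits = 4 * len(hex_digits)
    value = int(hex_digits, 16)
    # A set sign bit means the value lies 2**bits below its unsigned reading.
    return value - (1 << bits) if is_negative_hex(hex_digits) else value


for example in ("8A", "C5", "A2", "9D", "7F"):
    print(example, is_negative_hex(example), signed_value(example))
```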
Two’s Complement Notation • Used to represent negative numbers • The two’s complement of a number X, denoted by NEG(X), is obtained by complementing all its bits and adding 1 • NEG(X) = NOT(X) + 1 • Ex: NEG(10) = NOT(10) + 1 • = NOT(0000 1010b) + 1 • = (1111 0101b) + 1 = 1111 0110b = -10 • It follows that X + NEG(X) = 0 (the carry out of the most significant bit is discarded)
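The rule NEG(X) = NOT(X) + 1 can be tried out in a few lines. Python integers are unbounded, so this sketch masks the result to a fixed width; the `bits` parameter and the function name are assumptions made for the example.

```python
def neg(x: int, bits: int = 8) -> int:
    """Two's complement of x within a fixed width: NOT(x) + 1."""
    mask = (1 << bits) - 1
    return (~x + 1) & mask


result = neg(0b0000_1010)
print(f"{result:08b}")  # 11110110

# X + NEG(X) is zero once the carry out of the top bit is dropped.
print((0b0000_1010 + result) & 0xFF)  # 0
```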
Forming the Two's Complement • Negative numbers are stored in two's complement notation • It represents the additive inverse • Note that 00000001 + 11111111 = 00000000 (the carry out of the highest bit is discarded)
Binary Subtraction • To perform the difference X - Y: • the machine executes the addition X + NEG(Y) • Ex: 00001100 – 00000011 is computed as 00001100 + 11111101 = 00001001 • Practice: Subtract 0101 from 1001.
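Subtraction by adding the two's complement can be sketched as follows. The fixed width is simulated with a mask, and the function name and `bits` parameter are assumptions of this example.

```python
def subtract(x: int, y: int, bits: int = 8) -> int:
    """Compute x - y as x + NEG(y), keeping only the low `bits` bits."""
    mask = (1 << bits) - 1
    neg_y = (~y + 1) & mask   # two's complement of y
    return (x + neg_y) & mask  # the mask drops the carry out


difference = subtract(0b0000_1100, 0b0000_0011)
print(f"{difference:08b}")  # 00001001
```

Calling `subtract(0b1001, 0b0101, bits=4)` is one way to check an answer to the practice question.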
Maximum and Minimum Values • The msb of a signed number is used for its sign • fewer bits are left for its magnitude • Ex: for a signed byte • smallest non-negative = 0000 0000b = 0 • largest positive = 0111 1111b = 127 • largest negative = -1 = 1111 1111b • smallest negative = 1000 0000b = -128
Ranges of Unsigned Integers • Standard sizes: an n-bit unsigned integer ranges from 0 to 2^n - 1 • Practice: What is the largest unsigned integer that may be stored in 20 bits?
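The unsigned ranges for the standard sizes can be computed rather than memorized, since n bits hold values from 0 to 2^n - 1. This is a sketch; the names `unsigned_range` and `STANDARD_SIZES` are chosen for this example.

```python
def unsigned_range(bits: int) -> tuple[int, int]:
    """Smallest and largest unsigned values that fit in `bits` bits."""
    return 0, (1 << bits) - 1


STANDARD_SIZES = {"byte": 8, "word": 16, "doubleword": 32, "quadword": 64}

for name, bits in STANDARD_SIZES.items():
    low, high = unsigned_range(bits)
    print(f"{name:<10} {bits:>2} bits  {low} to {high:,}")
```

The same function works for any width, so it can be used to check the 20-bit question.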
Ranges of Signed Integers • The highest bit is reserved for the sign • This limits the range of an n-bit signed integer to -2^(n-1) through 2^(n-1) - 1 • Practice: What is the largest positive value that may be stored in 20 bits?
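With one bit reserved for the sign, an n-bit two's complement integer runs from -2^(n-1) to 2^(n-1) - 1. A minimal sketch, with a function name chosen for this example:

```python
def signed_range(bits: int) -> tuple[int, int]:
    """Smallest and largest two's complement values in `bits` bits."""
    half = 1 << (bits - 1)  # weight of the sign bit
    return -half, half - 1


for name, bits in (("byte", 8), ("word", 16), ("doubleword", 32)):
    low, high = signed_range(bits)
    print(f"{name:<10} {low:,} to {high:,}")
```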
Signed/Unsigned Interpretation (again) • To obtain the value of a number we need to choose an interpretation • Ex: memory content 1111 1111 can either represent: • -1 if a signed interpretation is used • 255 if an unsigned interpretation is used • Only the programmer can provide an interpretation of the content of memory
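Python's built-in `int.from_bytes` makes the choice of interpretation explicit through its `signed` argument, which gives a convenient way to see both readings of the same byte. The variable names are chosen for this sketch.

```python
raw = bytes([0b1111_1111])  # memory content 1111 1111

# The bits are identical; only the requested interpretation differs.
signed_reading = int.from_bytes(raw, byteorder="little", signed=True)
unsigned_reading = int.from_bytes(raw, byteorder="little", signed=False)

print(signed_reading, unsigned_reading)  # -1 255
```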
Character Storage Systems • Character sets • Standard ASCII (0 – 127) • Extended ASCII (0 – 255) • ANSI (0 – 255) • Unicode, 16-bit form (0 – 65,535) • Null-terminated String • Array of characters followed by a null byte
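A null-terminated string can be modeled as a byte array whose content ends at the first zero byte. The helpers below are hypothetical names for this sketch and assume ASCII text.

```python
def to_null_terminated(text: str) -> bytes:
    """Store one ASCII code per character, then a terminating null byte."""
    return text.encode("ascii") + b"\x00"


def from_null_terminated(buffer: bytes) -> str:
    """Read characters up to, but not including, the first null byte."""
    end = buffer.index(0)  # raises ValueError if no terminator is present
    return buffer[:end].decode("ascii")


stored = to_null_terminated("ABC")
print(stored.hex(" "))               # 41 42 43 00
print(from_null_terminated(stored))  # ABC
```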
ASCII vs Extended ASCII • The ASCII code (from 00h to 7Fh) • Only codes from 20h to 7Eh represent printable characters. The rest are control codes (used for printing, transmission…). • Extended ASCII character set (codes 80h to FFh) • Varies from one system to another • MS-DOS usage: for accented characters, Greek symbols and some graphic characters
The ASCII character set • CR = “carriage return” (MSDOS: move to beginning of line) • LF = “line feed” (MSDOS: move directly one line below) • SPC = “blank space”
Text Files • These are files containing only ASCII characters • But different conventions are used for indicating an “end-of-line” • MS-DOS: <CR>+<LF> • UNIX: <LF> • Mac (classic Mac OS): <CR> • This is at the origin of many problems encountered during transfers of text files from one system to another
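One common way to handle the three conventions is to normalize every end-of-line to a single form before processing. The sketch below converts to the UNIX convention; the function name is an assumption of this example.

```python
CR = b"\r"  # carriage return, 0Dh
LF = b"\n"  # line feed, 0Ah


def to_unix_line_endings(data: bytes) -> bytes:
    """Rewrite <CR>+<LF> and lone <CR> as <LF>."""
    # Replace the two-byte sequence first so its CR is not converted twice.
    return data.replace(CR + LF, LF).replace(CR, LF)


mixed = b"dos line\r\nold mac line\runix line\n"
print(to_unix_line_endings(mixed))
```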
Strings and numbers • A string is stored as an array of characters • A 1-byte ASCII code is stored for each char • Hence, we can either store the number 123 in numerical form or as the string “123” • The string form is best for display • The numerical form is best for computations
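The difference between the two forms shows up in the stored bytes: the string “123” occupies three bytes (31h 32h 33h), while the number 123 fits in one (7Bh). The conversion helpers below are a sketch with hypothetical names, assuming non-negative decimal values.

```python
def ascii_to_int(digits: bytes) -> int:
    """Convert ASCII digit codes to a number; '0' is code 30h."""
    value = 0
    for code in digits:
        value = value * 10 + (code - 0x30)
    return value


def int_to_ascii(number: int) -> bytes:
    """Convert a non-negative number to its ASCII digit codes."""
    if number == 0:
        return b"0"
    codes = []
    while number > 0:
        codes.append(0x30 + number % 10)  # least significant digit first
        number //= 10
    return bytes(reversed(codes))


print(b"123".hex(" "), bytes([123]).hex())  # 31 32 33 7b
print(ascii_to_int(b"123") + 1)             # 124
print(int_to_ascii(124))                    # b'124'
```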