Understanding Instruction Set and Programming Languages for Microprocessors

Microprocessors and microcontrollers Digital technics review and programming last modified: 2016. Feb 25.

II.2. Instruction set (processor) • Instruction set: the set of instructions a processor knows in hardware • Machine code: a program containing instructions from the instruction set, in binary or hexadecimal format. It can generally be natively run by the processor (without the need for other software).

Instruction set, programming • Assembly: lowest level programming language, processor dependent. • It is made by assigning easy-to-remember words (mnemonics) to machine code instructions; it also makes data and number formatting easier, makes labels and constants available, and provides some simple functions

Instruction set, programming • Software written in other programming languages has to be converted to machine code using a compiler (in older times, by hand). For very high level languages this can involve several intermediate steps. • Question: in what language and on what machine are compilers written?

Programming languages

Low level languages • Nearest to hardware, • thus easiest to compile to machine code, • best for optimizing hardware resource usage. • Generally assembly languages are in this category.

Assembly • Practically a readable version of machine code • Takes a lot of time and effort to implement complex functions (of course code parts can be reused) • Hard to overview and to find bugs in; needs lots of comments • Can directly access hardware, can utilize all special functions of the hardware and special CPU/GPU instructions • Code can be made small in size, small in memory footprint, fast to run, optimized for the hardware

Assembly • More suited for small programs, for low-end hardware, for microcontrollers and embedded systems with special needs • E.g. military or space uses: small memory, low speed, have to be very reliable • Compilers (esp. in older times) were often written in asm • Some special functions of operating systems are written in asm • But it is possible to write a whole modern graphical operating system in asm (see e.g. Menuet OS)

Medium level languages • e.g. C (though often put in other categories) • developed in the early 1970s, around the time of the first microprocessors • developed for programming the Unix operating system • integral part of Unix and Linux systems • often used to write operating systems and compilers • "mother" of many modern programming languages • fewer special data types; pointers are used for strings and arrays; can use pointers to pointers, linked lists etc. • can treat an existing variable as a different data type (e.g. through a pointer cast, without converting the stored data) • allows assignment inside a condition (e.g. if(a=b) and if(a==b) are both allowed!)

Higher level languages • In high level languages simple instructions can realize complex jobs • closer to human languages and logic • easier, more special data types • easier, faster to develop and debug, easier to overview • though sometimes finding a bug needs looking at the asm/machine code, if the bug is hardware-dependent

Higher level languages • High level languages: • Fortran, Algol, Cobol, Basic, Pascal, Python, Perl, PHP, Ruby, C#, Java,... • Easy to learn, to program, to overview • Modern versions often help to create a graphical user interface (GUI) (Visual Basic, Visual C, etc.) • Compiled code is potentially larger and/or slower than with C or asm • Often need a "runtime engine" and have larger hardware requirements (e.g. dotnet, Java) • Less chance of syntax errors or algorithm errors, but errors related to the compiler, operating system or hardware are harder to find.

Special high level languages • Windowed system development • Visual Basic / C / whatever • program's visual elements (windows, buttons etc.) are placed in a graphical UI; traditional program code is written for each element

Visual C#

Special high level languages • Dataflow programming • LabView, VEE, Simulink • program's visual elements created in GUI, the corresponding code is also created in a GUI in the form of a flowchart • Labwindows CVI • form of Labview, in which you can put traditional "typed" code

Labview

Special high level languages • Mathematical, logical • MatLab, Maple, Mathematica • R, S (statistics) • Coq (theorem proving), SML (?),Haskell (?)

Coq

Special high level languages • Scripting languages • for small, quick tasks • often built into op system (esp.Unix/Linux) • or built into / support markup language (PHP, Javascript, Flash, Java) • often started as simple script languages, but in the end, can do more than that (most of the above languages) • Markup and supporting languages (often not „real” programming, more like data formats) • HTML / CSS

Portability of source code • Portable code • can be compiled (without modification or with little modification) on other hardware or another operating system • this needs an existing compiler for that hardware (the code need not be compiled on the target HW itself: cross compilation) • different HW might not need exactly the same SW... (Win8 on PC and tablet ??)

Readability „Read-only programming language” (easy to read, hard to write) „Write only programming language” (easy to write, hard to read) And then this...

Programming languages • interpreted • compiled • to machine code • to bytecode • mixed (intermediate) runtime compiled

Operating system • Technically not necessary: programs can be run without it, but it is not convenient to do so • An operating system: • provides a user interface (UI) to load stored programs easily (it need not be graphical (GUI); even a command line is a huge help compared to the original computers) • provides an application programming interface (API), which makes programming easier and more portable

Operating system • provides a layer between HW and user SW • when the HW changes, only the driver software changes, and end-user SW sees the same API • an API can be a library of a programming language or a function call the operating system provides • BIOS is the "first part" of the operating system (but is independent of it); it also provides some simple function calls • the operating system can allow multitasking: it has to keep track of running processes and allocate their CPU time; it also helps to distribute jobs across multiple processors

Instruction set example • Intel 8085A • 8 bit data bus, 16 bit address bus • few MHz clock • its instruction set was the basis for the 8086 and then x86 series processors • it is also similar to many current 8b microcontrollers

8085 • Instruction types • arithmetics: add, subtract, increment • logics: and, or, xor, complement (negate) • bitwise: rotate (as in a shift register) • data moving: move, exchange, push,pop • branch: jump, call, return • conditional: jump on condition (jz,jnz,jc, etc) • IO: in, out

8085 addressing modes example • mov r1,r2 • 1 byte, 1 machine cycle, r2 -> r1 • mov r,M • 1 byte, 2 m.cycle, (HL) -> r • HL register pair contains memory address • mov M,r • 1 byte, 2 m.cycle, r -> (HL) • HL register pair contains memory address • mvi r,data • 2 byte (code+data), 2 m.cycle, data -> r • mvi M,data • 2 byte (code+data), 3 m.cycle, data -> (HL) • lxi rp,data • 3 byte (code+2 byte data), 3 m.cycle, data -> rp (register pair)

8085 addressing modes example • lda addr • 3 byte (code+2 byte address), 4 m.cycle, (addr) -> A • memory cell's content into accumulator (A) • sta addr • 3 byte, 4 m.cycle, A -> (addr) • lhld addr • 3 byte, 5 m.cycle, (addr) -> HL • memory word (2 bytes) at address addr into HL register pair • shld addr • 3 byte, 5 m.cycle, HL -> (addr) • ldax rp • 1 byte, 2 m.cycle, (rp) -> A • memory cell addressed by BC or DE into A • stax rp • 1 byte, 2 m.cycle, A -> (rp) • xchg • 1 byte, 1 m.cycle, HL <=> DE • exchange contents of register pairs
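
The register and memory transfers listed above can be tried out in a few lines of Python. This is only a minimal sketch, not an 8085 emulator: the names `mem`, `regs`, `hl`, `lxi_h`, `mov_r_m` and the other helpers are hypothetical, flags and timing are ignored, and only the HL-based variants are modelled.

```python
# Hypothetical helper names; a sketch of a few 8085 data-transfer instructions.
mem = bytearray(65536)              # 16-bit address bus -> 64 KiB of memory
regs = {r: 0 for r in "ABCDEHL"}    # 8-bit registers

def hl():
    """Address held in the HL register pair (H = high byte, L = low byte)."""
    return (regs["H"] << 8) | regs["L"]

def lxi_h(data16):
    """lxi H,data: 16-bit immediate data -> HL."""
    regs["H"], regs["L"] = (data16 >> 8) & 0xFF, data16 & 0xFF

def mvi(r, data8):
    """mvi r,data: 8-bit immediate data -> r."""
    regs[r] = data8 & 0xFF

def mov_r_m(r):
    """mov r,M: (HL) -> r, HL acts as a pointer into memory."""
    regs[r] = mem[hl()]

def mov_m_r(r):
    """mov M,r: r -> (HL)."""
    mem[hl()] = regs[r]

def lda(addr):
    """lda addr: (addr) -> A."""
    regs["A"] = mem[addr]

def sta(addr):
    """sta addr: A -> (addr)."""
    mem[addr] = regs["A"]

def xchg():
    """xchg: HL <=> DE."""
    regs["H"], regs["D"] = regs["D"], regs["H"]
    regs["L"], regs["E"] = regs["E"], regs["L"]

mvi("A", 0x3A)      # A = 3Ah
sta(0x2000)         # store A at address 2000h
lxi_h(0x2000)       # HL points to 2000h
mov_r_m("B")        # B = memory[HL]
print(hex(regs["B"]))  # 0x3a
```

Note how `mov r,M` needs no address bytes in the instruction itself, which is why it fits in one byte: the address comes from HL.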

Data formats

Number systems • Most computers store and process numbers in a binary form • Users mostly see decimal form, programmers use decimal, hexadecimal (base 16), or less frequently octal and binary forms

Number systems • Some notation examples: • Decimal: • D’223’, 223D, 223₁₀ • Hexadecimal: • H’2F’, h2F, 2Fh, 0x2F, 2F₁₆ • Binary: • B’10101100’, b10101100, 10101100₂

Number systems • Decimal to binary conversion: • Divide by two, repeat with int(result), read the remainders from bottom up (NB last remainder / first digit is always 1) • Binary to decimal conversion: • Sum up the powers of two where the binary digit is 1
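
Both conversion recipes can be written down directly. The sketch below assumes non-negative integers; the function names `to_binary` and `to_decimal` are made up for illustration.

```python
def to_binary(n):
    """Decimal to binary: divide by two repeatedly and collect the remainders."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        remainders.append(str(n % 2))   # remainder of the division by two
        n //= 2                         # repeat with int(result)
    return "".join(reversed(remainders))  # read the remainders from bottom up

def to_decimal(bits):
    """Binary to decimal: sum the powers of two where the digit is 1."""
    return sum(2 ** i for i, b in enumerate(reversed(bits)) if b == "1")

print(to_binary(223))          # 11011111
print(to_decimal("10101100"))  # 172
```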

Number systems Hexadecimal (base 16): 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F digits One digit can be translated to 4 bits, so an 8bit number (byte) will always be two hexa digits: 3Ah=0011 1010b
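
The digit-by-digit translation between hexadecimal and binary can be sketched as follows (`hex_to_bits` is a hypothetical helper name):

```python
def hex_to_bits(hex_digits):
    """Translate each hexadecimal digit into its own group of 4 bits."""
    return " ".join(format(int(d, 16), "04b") for d in hex_digits)

print(hex_to_bits("3A"))  # 0011 1010
print(hex_to_bits("2F"))  # 0010 1111
```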

Integer numbers • Unsigned integer (uint) • Signed integer (int) • possible storing formats: • sign bit • two’s complement – this one is most popular • offset binary, excess-K • range in abs.value is halved

Sign bit format • first bit is sign • 01010011b=83d • 11010011b=-83d • range: -127 ... +127 • 11111111 ... 01111111 • there are two zeros (00000000, 10000000) • problem when counting • harder to add numbers
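
A minimal sketch of the 8-bit sign bit format, assuming the first (most significant) bit is the sign and the remaining 7 bits hold the absolute value. The names `sm_encode` and `sm_decode` are made up for this example.

```python
def sm_encode(value):
    """Encode an integer in -127..+127 as an 8-bit sign-and-magnitude byte."""
    if not -127 <= value <= 127:
        raise ValueError("out of range for the 8-bit sign bit format")
    sign = 0x80 if value < 0 else 0x00
    return sign | abs(value)

def sm_decode(byte):
    """Decode an 8-bit sign-and-magnitude byte back into an integer."""
    magnitude = byte & 0x7F
    return -magnitude if byte & 0x80 else magnitude

print(format(sm_encode(83), "08b"))    # 01010011
print(format(sm_encode(-83), "08b"))   # 11010011
# Both zero patterns decode to the same value:
print(sm_decode(0b00000000), sm_decode(0b10000000))  # 0 0
```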

Two’s complement • Take the binary number, invert it and add 1 • easier to do addition • simply add like unsigned integers • try adding a number to its two’s complement! • comparison problem: when compared as unsigned values, negative numbers appear larger than positive ones • range in 8 bit: -128 .. +127 • first bit 1: negative, 0: zero or positive • only one zero, good for counting
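
The "invert and add 1" rule, and the effect of adding a number to its two's complement, can be checked with a short sketch. The 8-bit width and the names `twos_complement` and `as_signed` are assumptions made for the example.

```python
def twos_complement(value, bits=8):
    """Invert all bits and add 1, staying within the given width."""
    mask = (1 << bits) - 1
    return ((value ^ mask) + 1) & mask

def as_signed(pattern, bits=8):
    """Interpret a bit pattern as a two's complement signed number."""
    return pattern - (1 << bits) if pattern & (1 << (bits - 1)) else pattern

x = 83
neg = twos_complement(x)
print(format(neg, "08b"))   # 10101101
print(as_signed(neg))       # -83
print((x + neg) & 0xFF)     # 0, the carry out of the 8th bit is dropped
print(neg > x)              # True: compared as unsigned, -83 looks larger than 83
```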

Offset binary • add 2^(n-1) to binary number • first part of range negative, middle is zero, upper part positive • practically same as two’s compl. but first bit inverted • easier comparison
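
The relation to two's complement (same pattern, first bit inverted) and the easier comparison can be seen in a small sketch; `offset_encode` and `twos_encode` are hypothetical names and 8 bits are assumed.

```python
def offset_encode(value, bits=8):
    """Offset binary: add 2^(n-1) to the signed value."""
    bias = 1 << (bits - 1)
    if not -bias <= value < bias:
        raise ValueError("out of range")
    return value + bias

def twos_encode(value, bits=8):
    """Two's complement bit pattern of a signed value."""
    return value & ((1 << bits) - 1)

for v in (-128, -1, 0, 127):
    print(v, format(offset_encode(v), "08b"), format(twos_encode(v), "08b"))
```

In the printed table the two encodings differ only in the first bit, and the offset binary patterns grow in the same order as the values they represent, so an unsigned comparison gives the right result.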

Numbers larger than data bus width • E.g. an 8 bit system on which we want to use 16 bit long numbers • Of course it is possible, it just takes more time • E.g. addition: • first add the least significant bytes (LSB); the result is the LSB of the end result, and the addition sets the Carry flag bit • add the next two bytes and add the carry bit to the sum (special add-with-carry instructions may be available) • and so on
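
The byte-by-byte addition can be sketched as below, assuming 16-bit operands on an 8-bit machine. The function name `add16_on_8bit` is made up, and the carry is modelled as a plain integer rather than a CPU flag.

```python
def add16_on_8bit(a, b):
    """Add two 16-bit numbers using only 8-bit additions and a carry bit."""
    a_lo, a_hi = a & 0xFF, (a >> 8) & 0xFF
    b_lo, b_hi = b & 0xFF, (b >> 8) & 0xFF

    lo = a_lo + b_lo          # first add the least significant bytes
    carry = lo >> 8           # carry out of the 8-bit addition
    lo &= 0xFF

    hi = a_hi + b_hi + carry  # add the next bytes plus the carry
    carry = hi >> 8
    hi &= 0xFF

    return (hi << 8) | lo, carry

print(add16_on_8bit(0x12FF, 0x0001))  # (4864, 0), i.e. 0x1300 with no final carry
```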

Fractional numbers • Fixed point • Floating point • Traditional fraction

Fixed point • practically stored as an integer (e.g. when there are no floating point capabilities) • n: total number of bits • m: original number • f: number of binary fractional digits • stored integer = m × 2^f
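
A minimal fixed point sketch, under the assumption that the number m is stored as the integer m × 2^f, where f is the number of binary fractional digits. The names `to_fixed` and `from_fixed` and the choice f = 4 are only for illustration.

```python
def to_fixed(m, f):
    """Store m as an integer with f binary fractional digits."""
    return round(m * (1 << f))

def from_fixed(stored, f):
    """Convert the stored integer back to the number it represents."""
    return stored / (1 << f)

F = 4                      # assumed number of fractional bits
a = to_fixed(3.25, F)
b = to_fixed(1.5, F)
print(a, b)                          # 52 24
print(from_fixed(a + b, F))          # 4.75, addition is plain integer addition
print(from_fixed((a * b) >> F, F))   # 4.875, a product must be shifted back by f bits
```

Only integer operations are needed on the stored values, which is why this format suits processors without floating point hardware.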

Floating point • value = ±c × b^q • s: sign • b: base (2) • c: significant digits (significand) • q: exponent (two’s complement or offset)

IEEE 754 (floating point standard) • first significant digit is always 1 -> no need to store it, thus e.g. out of 24 bits only 23 need to be stored (the remaining bit can hold the sign) • special values: +0, -0, +inf, -inf, NaN, subnormal • NaN (not a number): exponent all 1s, rest not 0 • inf (infinity): exponent all 1s, rest 0
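
The special values can be recognised from the exponent and fraction fields alone. The sketch below assumes the single precision layout (1 sign bit, 8 exponent bits, 23 fraction bits) and uses Python's standard `struct` module to obtain the bit pattern; `classify` and `float_bits` are hypothetical helper names.

```python
import struct

def float_bits(x):
    """Return the 32-bit pattern of x stored as a single precision float."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def classify(bits32):
    """Classify a single precision pattern by its exponent and fraction fields."""
    exponent = (bits32 >> 23) & 0xFF
    fraction = bits32 & 0x7FFFFF
    if exponent == 0xFF:                 # exponent all 1s
        return "inf" if fraction == 0 else "NaN"
    if exponent == 0:                    # exponent all 0s
        return "zero" if fraction == 0 else "subnormal"
    return "normal"

print(classify(float_bits(float("inf"))))   # inf
print(classify(float_bits(float("nan"))))   # NaN
print(classify(float_bits(-0.0)))           # zero
print(classify(0x00000001))                 # subnormal
print(classify(float_bits(1.0)))            # normal
```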

IEEE 754 (floating point) • Single precision: 32b (24+8) • approx. 6..7 decimal digits precision • exponent format: add 127 to the exponent, giving a range of -126..+127 (offset binary) • max: (2 − 2^−23) × 2^127 ≈ 3.402823 × 10^38 • Double precision: 64b (53+11) • approx. 16 decimal digits precision
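
To see the fields of a single precision number, the bit pattern can be taken apart and rebuilt. This is a sketch for normal numbers only (no zero, subnormal, inf or NaN handling); `decode_single` and `rebuild` are made-up names, and the bit pattern is obtained with Python's standard `struct` module.

```python
import struct

def decode_single(x):
    """Split a single precision float into sign, unbiased exponent, fraction."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    stored_exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, stored_exponent - 127, fraction   # remove the offset of 127

def rebuild(sign, exponent, fraction):
    """Rebuild a normal number: implicit leading 1 plus the 23 stored bits."""
    significand = 1 + fraction / 2**23
    return (-1) ** sign * significand * 2.0 ** exponent

print(decode_single(-6.5))        # (1, 2, 5242880), i.e. -1.625 x 2^2
print(rebuild(1, 2, 5242880))     # -6.5

# Largest finite single precision value: all fraction bits set, exponent +127
max_single = (2 - 2**-23) * 2.0**127
print(max_single)
```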

Storing multi-byte numbers • word: a number (or piece of data) made up of multiple bytes • generally two types of storage/transmission method: • little endian • LSB (least significant byte) first, it is stored at lowest address • easier hardware for addition (starts with LSB) • big endian • MSB (most significant byte) stored at lowest address

Storing multi-byte numbers • Little endian: • Intel-AMD x86, x86-64 series • Big endian: • Motorola 68000 descendants • AVR32 • IBM System/360, z/Architecture • Internet Protocol (IP, TCP, UDP) • Bi-endian (configurable): • ARM (from version 3), PowerPC, Alpha, MIPS, Itanium etc. • This problem may come up in communications (e.g. when connecting a PC to a data acquisition device)
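
The two byte orders, and what goes wrong when both ends of a connection disagree, can be shown with Python's built-in `int.to_bytes` / `int.from_bytes` and the standard `struct` module. The value 0x12345678 is an arbitrary example.

```python
import struct

value = 0x12345678

little = value.to_bytes(4, "little")   # LSB at the lowest address
big = value.to_bytes(4, "big")         # MSB at the lowest address
print(little.hex())   # 78563412
print(big.hex())      # 12345678

# struct format prefixes: '<' little endian, '>' big endian,
# '!' network byte order (big endian, as used by IP, TCP, UDP)
assert struct.pack("<I", value) == little
assert struct.pack("!I", value) == big

# Reading little endian data as if it were big endian gives a wrong number
print(hex(int.from_bytes(little, "big")))   # 0x78563412
```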

Arrays, strings • Array: realization of a matrix • Various support in hw. and sw. • String: a variable containing several characters; a character chain, a 1D array (vector) containing characters

Arrays • Hardware support: • scalar processors: access one element at a time; access by pointers • pointer hw. support: certain machine code instructions can use certain cpu registers to access RAM (register contains memory address, register is the pointer); pointer can also be in RAM • vector processor: SIMD: single instruction, multiple data; one instruction processes an array

Characters • Alphanumerical characters are stored by using a table to turn them into numbers • e.g. BCDIC, ASCII, Unicode

ASCII • American Standard Code for Information Interchange • originally a 7 bit code; the 8th bit could be used for parity if needed • thus 128 characters are available • 95 printable characters (lowercase and uppercase alphabet, digits, punctuation, space) • 33 control characters (for teletypes, printers) (incl. new line, carriage return, end of file, tab, del, etc.) • lowercase and uppercase letters differ in one bit only (easier conversion) • the codes of the digits contain the digit's value in binary (last four bits)
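
The last two properties are easy to verify. In the sketch below `toggle_case` and `digit_value` are hypothetical helpers; `toggle_case` is only meaningful for the letters A-Z and a-z, and `digit_value` only for the characters 0-9.

```python
def toggle_case(ch):
    """Flip bit 5 (0x20), the only bit that differs between 'A' and 'a'."""
    return chr(ord(ch) ^ 0x20)

def digit_value(ch):
    """The last four bits of an ASCII digit hold its numeric value."""
    return ord(ch) & 0x0F

print(format(ord("A"), "07b"), format(ord("a"), "07b"))  # 1000001 1100001
print(toggle_case("q"), toggle_case("Q"))                # Q q
print(digit_value("7"))                                  # 7
```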

ASCII
