F28PL1 Programming Languages

F28PL1 Programming Languages Lecture 1: Introduction

Computer • memory machine • Input/Output communicates information between memory and outside world • CPU manipulates information from memory • CPU interprets information from memory as data or instructions [Diagram: CPU <-> information <-> memory <-> information <-> Input/Output]

Memory • physical/electronic • sequences of 8-bit bytes (bits numbered 7 down to 0) • each byte has a unique address • specify address on address bus to get/put byte value to/from CPU on data bus • address is itself a sequence of bytes [Diagram: address on address bus selects a byte in memory; byte value travels on data bus]

Memory • byte = 8 bits can only represent 0 – 2^8-1 = 255 • must construct more substantial information representations • from sequences of bytes • treated as if a single bit sequence • e.g. 2 bytes = 16 bits = 0 – 2^16-1 = 65,535, enough to address 64kB • e.g. 4 bytes = 32 bits = 0 – 2^32-1, enough to address 4GB • e.g. 8 bytes = 64 bits = 0 – 2^64-1, enough to address 16EB
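
The ranges above follow from the number of bits available. A minimal sketch in C, assuming the fixed-width types from <stdint.h>; the function names max_unsigned and combine_bytes are invented for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Largest unsigned value that fits in nbytes bytes (1 to 8): 2^(8*nbytes) - 1. */
uint64_t max_unsigned(unsigned nbytes)
{
    assert(nbytes >= 1 && nbytes <= 8);
    if (nbytes == 8)
        return UINT64_MAX;                 /* shifting a 64-bit value by 64 is undefined */
    return ((uint64_t)1 << (8 * nbytes)) - 1;
}

/* Treat two separate bytes as a single 16-bit sequence, high byte first. */
uint16_t combine_bytes(uint8_t high, uint8_t low)
{
    return (uint16_t)(((unsigned)high << 8) | low);
}
```

For example, max_unsigned(2) gives 65535 and combine_bytes(0x12, 0x34) gives 0x1234.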

CPU • physical/electronic • runs machine code instructions to change information in memory • instructions manipulate information in registers • registers are high speed and small in number • arithmetic & logic unit (ALU) carries out operations [Diagram: CPU containing registers and ALU, connected to memory by address bus and data bus]

CPU • processes information in 1, 2, 4 or 8 byte chunks • also 16 bytes = 128 bits in SIMD mode • interprets byte sequences as: • data in continuous bit sequences • instructions with distinct fields

Instruction • operation field: what is to be done • operand fields: what is to be manipulated • absolute value • register number • address of value in memory • fields may be multi-byte • number of operand fields may vary operation operand 1 operand N
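
To make the idea of fields concrete, here is a sketch in C that packs and unpacks a made-up 16-bit instruction. The layout (4-bit operation, 4-bit register number, 8-bit address) is an assumption chosen for illustration, not the format of any real CPU; encode and decode are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* The separate fields of one instruction, in an invented 16-bit format. */
struct instruction {
    unsigned operation;   /* bits 15-12: what is to be done      */
    unsigned reg;         /* bits 11-8:  register number operand */
    unsigned address;     /* bits 7-0:   memory address operand  */
};

/* Pack the three fields into a single 16-bit word. */
uint16_t encode(unsigned operation, unsigned reg, unsigned address)
{
    assert(operation <= 0xF && reg <= 0xF && address <= 0xFF);
    return (uint16_t)((operation << 12) | (reg << 8) | address);
}

/* Split a 16-bit word back into its fields by shifting and masking. */
struct instruction decode(uint16_t word)
{
    struct instruction in;
    in.operation = (word >> 12) & 0xF;
    in.reg       = (word >> 8) & 0xF;
    in.address   = word & 0xFF;
    return in;
}
```

Real instruction sets differ in field widths and may use several formats, which is why the number of operand fields may vary.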

Instructions • byte sequences in memory • CPU fetches instructions from memory • program counter • register • holds memory address for next instruction • automatically incremented after each instruction • branch instruction changes instruction sequence

Instructions • instructions manipulate registers • e.g. load register with value from memory address • e.g. add value to register • e.g. store value from register at memory address • instructions can also change program counter • to change order of execution • e.g. branch to new address • e.g. branch to new address depending on values in other registers
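
The fetch/execute behaviour described above can be sketched as a small C function that plays the role of the CPU. Everything about this machine is invented for illustration: the operation codes, the 3-byte instruction format, the four 8-bit registers and the 256-byte memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operation codes, invented for this sketch. */
enum { OP_HALT, OP_LOAD, OP_STORE, OP_ADD, OP_DECR, OP_JNZ };

/*
 * Run a program held in a 256-byte memory, starting at address 0.
 * Assumed format: every instruction is 3 bytes (operation, register
 * number, then an address or a second register number).
 */
void run(uint8_t *memory)
{
    uint8_t reg[4] = {0};   /* four general registers */
    uint8_t pc = 0;         /* program counter */

    assert(memory != NULL);
    for (;;) {
        /* fetch the instruction the program counter points at */
        uint8_t op = memory[pc];
        uint8_t r  = memory[(uint8_t)(pc + 1)] & 3;
        uint8_t x  = memory[(uint8_t)(pc + 2)];
        pc = (uint8_t)(pc + 3);   /* incremented after each fetch */

        /* execute it */
        switch (op) {
        case OP_LOAD:  reg[r] = memory[x];                      break;
        case OP_STORE: memory[x] = reg[r];                      break;
        case OP_ADD:   reg[r] = (uint8_t)(reg[r] + reg[x & 3]); break;
        case OP_DECR:  reg[r] = (uint8_t)(reg[r] - 1);          break;
        case OP_JNZ:   if (reg[r] != 0) pc = x;                 break;
        default:       return;    /* OP_HALT or unknown operation */
        }
    }
}
```

Note that the branch (OP_JNZ) does nothing special: it simply overwrites the program counter, so the next fetch comes from the new address.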

Real software is bytes! • ultimately, all software boils down to machine code • physical byte sequences in memory • being interpreted by the physical CPU as instructions • to change other physical byte sequences in memory

Machine code • as fast as it gets! • very hard to write, read and debug • CPU specific • not portable between different CPUs • now very rarely constructed by hand

Programming languages • abstract away from machine code • introduce ways of describing : • memory as variables • i.e. name/value associations • byte sequences as typed data • e.g. int/float/char, array, string, struct, object • instruction sequences as control constructs • e.g. assignment, arithmetic, logic, if, while, block, method/function/subroutine/procedure
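
A type tells the language how to interpret a byte sequence. The sketch below, in C, looks at the same 4 bytes first as a float and then as an unsigned integer. The name float_bits is invented for illustration, and the example values assume the platform uses IEEE 754 single precision for float, which is the case on most current systems but is not guaranteed by C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the 4 bytes of a float into a 32-bit unsigned integer, unchanged. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);   /* both must be 4 bytes */
    memcpy(&bits, &f, sizeof bits);    /* same bytes, different type */
    return bits;
}
```

Under the IEEE 754 assumption, float_bits(1.0f) is 0x3F800000: the bytes are identical, only the typed interpretation differs.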

Programming languages • greater expressivity • easier to write/read/debug programs • loss of direct manipulation of CPU/memory • must translate abstractions to machine code • language processors • e.g. assemblers & compilers • code bloat • compared with hand written machine code, generated machine code may be: • bigger • less efficient

Assembly language • textual memory description • address → label • bit sequence → constant • textual instruction fields • operations → mnemonics • operands • value → constants • address → label • register → mnemonic

Assembly language • e.g. to add values from 1 to 10
    COUNT:  10              -- memory[COUNT] = 10
    SUM:    0               -- memory[SUM] = 0
            LOAD R1,COUNT   -- R1 <- memory[COUNT]
            LOAD R2,SUM     -- R2 <- memory[SUM]
    LOOP:   ADD R2,R1       -- R2 <- R2+R1
            DECR R1         -- R1 <- R1-1
            JNZ R1,LOOP     -- if R1!=0 then goto LOOP
            STORE R2,SUM    -- memory[SUM] <- R2
NB invented assembly language...
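
For comparison, the same computation can be written in C with variables standing in for the registers. This is a sketch only; sum_down_from, r1 and r2 are names chosen here to mirror the invented assembly language, and the comments show the corresponding instruction.

```c
#include <assert.h>

/* Add the values from count down to 1, one register-style step at a time. */
unsigned sum_down_from(unsigned count)
{
    unsigned r1 = count;    /* LOAD R1,COUNT */
    unsigned r2 = 0;        /* LOAD R2,SUM   */

    assert(count > 0);      /* as in the assembly version, 0 is not handled */
    do {
        r2 += r1;           /* ADD R2,R1     */
        r1--;               /* DECR R1       */
    } while (r1 != 0);      /* JNZ R1,LOOP   */
    return r2;              /* STORE R2,SUM  */
}
```

Calling sum_down_from(10) returns 55. A C compiler is free to translate the loop into quite different machine code, which is part of the abstraction the language provides.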

Assembly language • one to one correspondence with machine code • no loss of efficiency • assembly language is CPU specific • no universal assembly language • not portable between CPUs • translated to machine code by assembler program • how to write very first assembler...? • improvement but very limited abstractions • hard to read/write/debug • specialist skill • e.g. device drivers, BIOS, small embedded systems

Autocodes • early high level languages - 1950s • introduced: • variables • typed values • arithmetic/logic/conditions/iteration/subroutine • arrays • CPU specific • not portable • e.g. Ferranti Atlas Autocode

General purpose high level languages • late 1950s/early 1960s • CPU independent • underlying platform not exposed • big variations in standardisation especially for I/O • manufacturers might introduce system specific features/libraries • very influential

FORTRAN • FORmula TRANslation • IBM 1957 • scientific computation • integers & reals • poor character handling • F/High Performance Fortran still widely used on multi-processor systems

COBOL • COmmon Business Oriented Language • US Navy/CODASYL 1959 • commercial data processing • “English-like” • text oriented • most widely used language in late 20th century • lots of legacy code in financial sector • e.g. Y2K bug required COBOL programmers to come out of retirement

ALGOL 60 • ALGOrithmic Language • IFIP/academic 1958-60 • general purpose • first language with defining document • ALGOL 60 Report • introduced Backus-Naur Form (BNF) for syntax • long out of use • highly influential parent of C & Java

PL/1 • IBM late 1960s • general purpose • first language with full formal specification • Vienna Definition Language • only available on IBM systems • no longer used

Language paradigms • many programming languages • more than natural languages...? • paradigm is a family of languages with common semantic features

System Level Languages • high-level + CPU level features • e.g. registers, addresses • CPU specific • e.g. PL360 – IBM 360 • e.g. PL/M + PL/Z – Intel 8080/Zilog Z80 • operating systems • long out of use

System Level Languages • CPU independent • e.g. BCPL – University of Cambridge • portable compilers • network controllers • no types – everything is bytes • now little used • e.g. C – AT&T • Unix systems language • widely used in Linux, Apple OS X, Microsoft Windows • C++ – object oriented C • C# – Microsoft

Object oriented • high level + objects • e.g. C++ – AT&T • widely used for industrial/control/real-time software • e.g. Java – Sun • widely used for commercial software • especially open source • e.g. Open Office, Mozilla, Android

Declarative • based on formal/mathematical theory • “no assignment” • functional languages • λ-calculus/recursive function theory • e.g. Standard ML/Haskell • logic languages • predicate calculus • e.g. Prolog

Scripting • configure system services • high level + system API • e.g. sh, Perl, Python, PHP • “glueware” to bolt together system components • e.g. GUI + database • e.g. Internet services

Language processors • CPUs execute machine code directly • any other languages must be processed to be executed • compiler • translates to another language • interpreter • executes programs as if language is machine code

Compiler • translates source language to target language • target language • originally assembly language • increasingly a system language (e.g. C) which is then compiled to machine code via assembler • CPU specific machine code known as native code • very fast • compilation may lose source level program information so hard to trace/debug • compiler is a program written in some language • how to write very first compiler...?

Interpreter • abstract machine for a language • behaves as if the language is its machine code • e.g. Java Virtual Machine (JVM) in Java Development Kit • javac generates JVM code from Java • java interprets JVM code • e.g. many scripting languages

Interpreter • easier to retain source level information • provides good debugging facilities • interpreted code can be orders of magnitude slower than machine code on CPU • useful for small systems or prototyping • interpreter is a program written in some language • how to write the very first interpreter...?

Language definition • defines what legal programs look like and how they should behave • provide common standard for: • programmers • ensure correct use of language constructs • support program debugging • enable reasoning about programs • implementors • ensure all implementations are equivalent

Lexicon • symbols of the language • like words in a natural language • constructed from character sequences • treated as unitary entities • e.g. constants, identifiers, punctuation, reserved words
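
Grouping characters into symbols can be sketched in C as follows. This is a deliberately tiny lexical analyser for illustration: it recognises only numbers, identifiers and single-character punctuation, and the names symbol_kind, symbol and next_symbol are invented here.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum symbol_kind { SYM_END, SYM_NUMBER, SYM_IDENTIFIER, SYM_PUNCTUATION };

/* One symbol: its kind plus the character sequence it was built from. */
struct symbol {
    enum symbol_kind kind;
    const char *start;
    size_t length;
};

/* Read one symbol starting at *src and advance *src past it. */
struct symbol next_symbol(const char **src)
{
    assert(src != NULL && *src != NULL);
    const char *p = *src;

    while (isspace((unsigned char)*p))      /* skip white space between symbols */
        p++;

    struct symbol s = { SYM_END, p, 0 };
    if (*p == '\0') {                        /* end of the source text */
        *src = p;
        return s;
    }
    if (isdigit((unsigned char)*p)) {        /* constant: digit sequence */
        s.kind = SYM_NUMBER;
        while (isdigit((unsigned char)*p))
            p++;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        s.kind = SYM_IDENTIFIER;             /* identifier or reserved word */
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
    } else {
        s.kind = SYM_PUNCTUATION;            /* any other single character */
        p++;
    }
    s.length = (size_t)(p - s.start);
    *src = p;
    return s;
}
```

Calling next_symbol repeatedly on "sum = sum + 10;" yields an identifier, punctuation, an identifier, punctuation, a number, punctuation and then the end marker. Telling reserved words apart from identifiers would need a further table lookup, which is omitted here.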

Syntax • well formed symbol sequences • grammar rules • e.g. expressions, statements/commands, blocks

Static semantics • relationships amongst constructs in well formed symbol sequences • for static checking before execution • e.g. expression types consistent • e.g. variable declared before use • e.g. variable assigned before value used
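
Checking that expression types are consistent can be sketched in C as a traversal that computes a type for every construct without running the program. The two types, the five node kinds and the name type_of are assumptions made for this illustration, not part of any particular language definition.

```c
#include <assert.h>
#include <stddef.h>

enum type { TYPE_INT, TYPE_BOOL, TYPE_ERROR };

enum node_kind {
    NODE_INT_CONSTANT, NODE_BOOL_CONSTANT,
    NODE_ADD, NODE_LESS_THAN, NODE_AND
};

/* An expression: constants have no children, operators have two. */
struct node {
    enum node_kind kind;
    const struct node *left, *right;
};

/* Work out the type of an expression, or TYPE_ERROR if it is inconsistent. */
enum type type_of(const struct node *n)
{
    assert(n != NULL);
    switch (n->kind) {
    case NODE_INT_CONSTANT:
        return TYPE_INT;
    case NODE_BOOL_CONSTANT:
        return TYPE_BOOL;
    case NODE_ADD:          /* int + int gives int */
        return (type_of(n->left) == TYPE_INT && type_of(n->right) == TYPE_INT)
                   ? TYPE_INT : TYPE_ERROR;
    case NODE_LESS_THAN:    /* int < int gives bool */
        return (type_of(n->left) == TYPE_INT && type_of(n->right) == TYPE_INT)
                   ? TYPE_BOOL : TYPE_ERROR;
    case NODE_AND:          /* bool and bool gives bool */
        return (type_of(n->left) == TYPE_BOOL && type_of(n->right) == TYPE_BOOL)
                   ? TYPE_BOOL : TYPE_ERROR;
    }
    return TYPE_ERROR;      /* unknown construct */
}
```

With these rules, 1 + 2 has type int, (1 < 2) and true has type bool, and 1 + true is rejected before execution.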

Dynamic semantics • run-time behaviour of well formed symbol sequences • informs how to: • translate well formed symbol sequences into target language or • interpret well formed symbol sequences • supports reasoning about programs • e.g. specification & proof of correctness

Compiler stages • lexical analysis • checks source language character sequences and produces symbols • syntax analysis • checks symbol sequences and builds internal representation • abstract syntax tree (AST) • semantic analysis • traverses internal representation checking dependencies between program constructs • usually distinguish type checking as separate phase

Compiler stages • code generation • traverses internal representation and produces target language code • optimisation • rearranges/simplifies target code to remove redundancies
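
Code generation by traversing the internal representation can be sketched in C as below. The target is an invented stack machine with three instructions (PUSH, ADD, MUL), and the tree type, the instr type and the function generate are all hypothetical names for this illustration. Optimisation is not shown.

```c
#include <assert.h>
#include <stddef.h>

enum node_kind { NODE_CONSTANT, NODE_ADD, NODE_MULTIPLY };

/* Abstract syntax tree for arithmetic expressions. */
struct node {
    enum node_kind kind;
    int value;                         /* used by NODE_CONSTANT only */
    const struct node *left, *right;   /* used by the operators only */
};

/* Target language: instructions for an invented stack machine. */
enum opcode { PUSH, ADD, MUL };
struct instr { enum opcode op; int operand; };

/*
 * Append the code for tree n to out[], which already holds count
 * instructions and has room for capacity. Returns the new count.
 */
size_t generate(const struct node *n, struct instr *out,
                size_t count, size_t capacity)
{
    assert(n != NULL && out != NULL);
    if (n->kind == NODE_CONSTANT) {
        assert(count < capacity);
        out[count].op = PUSH;
        out[count].operand = n->value;
        return count + 1;
    }
    /* operands first, then the operation that combines them */
    count = generate(n->left, out, count, capacity);
    count = generate(n->right, out, count, capacity);
    assert(count < capacity);
    out[count].op = (n->kind == NODE_ADD) ? ADD : MUL;
    out[count].operand = 0;
    return count + 1;
}
```

For the tree of 2 + 3 * 4 this produces PUSH 2, PUSH 3, PUSH 4, MUL, ADD.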

Interpreter stages • lexical analysis • syntax analysis • (optional) semantic analysis • interpretation • traverse internal representation carrying out operations specified by program constructs • repeated dependency checking if no semantic analysis
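
The interpretation stage can be sketched in C as a recursive traversal that carries out each construct's operation directly instead of generating code for it. The tree type and the function evaluate are invented for illustration and cover only integer constants, addition and multiplication.

```c
#include <assert.h>
#include <stddef.h>

enum node_kind { NODE_CONSTANT, NODE_ADD, NODE_MULTIPLY };

/* Abstract syntax tree for arithmetic expressions. */
struct node {
    enum node_kind kind;
    int value;                         /* used by NODE_CONSTANT only */
    const struct node *left, *right;   /* used by the operators only */
};

/* Traverse the tree, performing the operation each node specifies. */
int evaluate(const struct node *n)
{
    assert(n != NULL);
    switch (n->kind) {
    case NODE_CONSTANT:
        return n->value;
    case NODE_ADD:
        return evaluate(n->left) + evaluate(n->right);
    case NODE_MULTIPLY:
        return evaluate(n->left) * evaluate(n->right);
    }
    return 0;   /* not reached for a well formed tree */
}
```

For the tree of 2 + 3 * 4, evaluate returns 14. Every call re-examines the node kind, which is one reason interpretation is slower than running compiled code.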

And so... • in this course we will study: • programming language concepts • assembly language programming for the ARM processor • system level programming in C • functional programming in Standard ML • logic programming in Prolog
