Explore the evolution of programming abstractions, from Von Neumann architecture to higher levels of abstraction. Learn about different programming paradigms and the syntax and lexical structure of programming languages.
CSE 425 Fall 2015 Final Exam • 120 minutes, covering material from throughout the semester • 10:30am to 12:30pm on Monday December 14, 2015 • Arrive early if you can, exam will begin promptly at 10:30am • Exam will be held in Lab Sciences 250 • You may want to locate the exam room in advance • Exam is open book, open notes, hard copy only • I will bring a copy each of the required and optional text for people to come up to the front and take a look at as needed • Please feel free to print and bring in slides, your notes, etc. • ALL ELECTRONICS MUST BE OFF DURING THE EXAM (including phones, iPads, laptops, tablets, etc.)
Abstraction in Programming • Von Neumann Architecture • Program instructions and data are stored in a memory area • CPU executes a sequence of instructions • Machine instruction sets: lowest level of abstraction • Binary representation that the CPU can process, e.g., 0010 001 000000100 (opcode) (register) (location) • Or that a virtual machine can process (e.g., byte code) • Assembly language is only slightly more abstract • “Readable” labels for operations, registers, location addresses, e.g., LD R1 FIRST
Evolving to Higher Levels of Abstraction • Algebraic notation and floating point numbers • E.g., Fortran (John Backus) • Structured abstractions and machine independence • E.g., ALGOL (a committee), Pascal (Niklaus Wirth) • Architecture independence (going beyond Von Neumann) • E.g., based on Lambda Calculus (Alonzo Church) • E.g., Lisp (John McCarthy)
Some Programming Paradigms • Imperative/procedural (E.g., C) • Variables, assignment, other operators • Functional (E.g., Lisp, Scheme, ML, Haskell) • Abstract notion of a function, based on lambda calculus • Logic (E.g., Prolog) • Based on symbolic logic (e.g., predicate calculus) • Object-oriented (E.g., Java, Python, C++) • Based on encapsulation of data and control together • Generic (E.g., C++ and especially its standard library) • Based on type abstraction and enforcement mechanisms • We’ll cover informally via examples throughout the semester
Syntax and Lexical Structure • Syntax gives the structure of statements in a language • E.g., the format of tokens and how they can be arranged • Lexical structure also describes how to recognize them • Scanning obtains tokens from a stream of characters • E.g., whitespace delimited vs. regular-expression based • Tokens include keywords, constants, symbols, identifiers • Usually based on assumption of taking the longest substring • Parsing recognizes more complex expressions • E.g., well-formed statements in logic, arithmetic, etc. • Free-format languages ignore indentation, etc., while fixed-format languages have specific restrictions/requirements
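For illustration only, here is a minimal C++ sketch (not from the course materials) of a scanner that applies the longest-substring rule; the names Token, TokenKind, and scan are hypothetical:

    // Hypothetical sketch of a scanner that applies the longest-substring rule:
    // keep consuming characters while they extend the current token.
    #include <cctype>
    #include <cstddef>
    #include <iostream>
    #include <string>
    #include <vector>

    enum class TokenKind { Identifier, Number, Symbol };

    struct Token {
        TokenKind kind;
        std::string text;
    };

    std::vector<Token> scan(const std::string& input) {
        std::vector<Token> tokens;
        std::size_t i = 0;
        while (i < input.size()) {
            if (std::isspace(static_cast<unsigned char>(input[i]))) {
                ++i;                                  // whitespace delimits tokens
            } else if (std::isdigit(static_cast<unsigned char>(input[i]))) {
                std::size_t start = i;
                while (i < input.size() && std::isdigit(static_cast<unsigned char>(input[i]))) ++i;
                tokens.push_back({TokenKind::Number, input.substr(start, i - start)});
            } else if (std::isalpha(static_cast<unsigned char>(input[i]))) {
                std::size_t start = i;
                while (i < input.size() && std::isalnum(static_cast<unsigned char>(input[i]))) ++i;
                tokens.push_back({TokenKind::Identifier, input.substr(start, i - start)});
            } else {
                tokens.push_back({TokenKind::Symbol, std::string(1, input[i])});
                ++i;                                  // single-character symbol
            }
        }
        return tokens;
    }

    int main() {
        for (const Token& t : scan("count = count + 425"))
            std::cout << t.text << '\n';              // count, =, count, +, 425
    }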
Context Free Grammars and BNF • In context free grammars (CFGs), structures are independent of the other structures surrounding them • Backus-Naur form (BNF) notation describes CFGs • Symbols are either tokens or nonterminal symbols • Productions are of the form nonterminal → definition where definition defines the structure of a nonterminal • Rules may be recursive, with nonterminal symbol appearing both on left side of a production and in its own definition • Metasymbols are used to identify the parts of the production (arrow), alternative definitions of a nonterminal (vertical bar) • Next time we’ll extend metasymbols for repeated (braces) or optional (square brackets) structure in a definition (EBNF)
Ambiguity, Associativity, Precedence • If any statement in the language has more than one distinct parse tree, the language is ambiguous • Ambiguity can be removed implicitly, as in always replacing the leftmost remaining nonterminal (an implementation hack) • Recursive production structure also can disambiguate • E.g., adding another production to the grammar to establish precedence (lower in parse tree gives higher precedence) • E.g., replacing exp → exp + exp with alternative productions exp → exp + term or exp → term + exp • Recursive productions also define associativity • I.e., left-recursive form exp → exp + term is left-associative, right-recursive form exp → term + exp is right-associative
Extended Backus-Naur Form (EBNF) • Optional/repeated structure is common in programs • E.g., whether or not there are any arguments to a function • E.g., if there are arguments, how many there are • We can extend BNF with metasymbols • E.g., square brackets indicate optional elements, as in the production function → name ‘(’ [args] ‘)’ • E.g., curly braces to indicate zero or more repetitions of elements, as in the production args → arg {‘,’ arg} • Doesn’t change the expressive power of the grammar • A limitation of EBNF is that it obscures associativity • Better to use standard BNF to generate parse/syntax trees
Recursive-Descent Parsing • Shift-reduce (bottom-up) parsing techniques are powerful, but complex to design/implement manually • Further details about them are in another course (CSE 431) • Still will want to understand how they work, use techniques • Recursive-descent (top-down) parsing is often more straightforward, and can be used in many cases • We’ll focus on these techniques somewhat in this course • Key idea is to design (potentially recursive) parsing functions based on the productions’ right-hand sides • Then, work through a grammar from more general rules to more specific ones, consuming input tokens upon a match • EBNF helps with left recursion removal (making a loop) and left factoring (making remainder of parse function optional)
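As a sketch only (assuming single-digit terms; the Parser struct and its function names are hypothetical), here is how the left-recursive production exp → exp + term can be rewritten in EBNF as exp → term { ‘+’ term } and realized as a loop in a C++ recursive-descent parser, keeping addition left-associative:

    // Hypothetical recursive-descent parser for exp -> exp '+' term | term,
    // rewritten in EBNF as exp -> term { '+' term } so the left recursion
    // becomes a loop; accumulating into 'result' keeps '+' left-associative.
    #include <cstddef>
    #include <iostream>
    #include <string>

    struct Parser {
        std::string input;
        std::size_t pos;

        char peek() const { return pos < input.size() ? input[pos] : '\0'; }
        void consume() { ++pos; }

        int parseTerm() {                    // term -> digit (single digit, for brevity)
            int value = peek() - '0';
            consume();
            return value;
        }

        int parseExp() {                     // exp -> term { '+' term }
            int result = parseTerm();
            while (peek() == '+') {          // loop replaces the left-recursive call
                consume();
                result = result + parseTerm();
            }
            return result;
        }
    };

    int main() {
        Parser p{"1+2+3", 0};
        std::cout << p.parseExp() << '\n';   // prints 6
    }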
Lookahead with First and Follow Sets • Recursive descent parsing functions are easiest to write if they only have to consider the current token • I.e., the head of a stream or list of input tokens • Optional and repeated elements complicate this a bit • E.g., function → name ‘(’ [args] ‘)’ and arg → 0 |…| 9 and args → arg {‘,’ arg}, with ‘(’ ‘)’ 0 |…| 9 ‘,’ as terminal symbols • But, EBNF structure helps in handling these two cases • The set of tokens that can be first in a valid sequence, e.g., each digit in 0 |…| 9 is in the first set for arg (and for args) • The set of tokens that can follow a valid sequence of tokens, e.g., ‘)’ is in the follow set for args • A token from the first set gives a parse function permission to start, while one from the follow set directs it to end
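A hypothetical C++ fragment (not from the course) showing how the first set of args (the digits) gives a parse function permission to start and how ‘)’ in the follow set directs it to stop, for the grammar above:

    // Hypothetical parse function for args -> arg { ',' arg } with arg -> 0..9.
    // A digit (the first set of args) gives permission to start parsing args;
    // ')' (in the follow set of args) tells the function to stop.
    #include <cctype>
    #include <cstddef>
    #include <iostream>
    #include <string>
    #include <vector>

    std::vector<int> parseArgs(const std::string& s, std::size_t& pos) {
        std::vector<int> args;
        if (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {  // first set
            args.push_back(s[pos++] - '0');
            while (pos < s.size() && s[pos] == ',') {   // repeat while more args follow
                ++pos;
                args.push_back(s[pos++] - '0');
            }
        }
        return args;                                     // empty if ')' came first (follow set)
    }

    int main() {
        std::string call = "f(4,2,5)";
        std::size_t pos = 2;                             // just past "f("
        for (int a : parseArgs(call, pos)) std::cout << a << ' ';   // 4 2 5
        std::cout << '\n';
    }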
Bindings • A binding associates a set of attributes with a name • E.g., int &i = j; // i is a reference to int j • Bindings can occur at many different times • Language design time: control flow constructs, constructors • Language implementation time: 32 vs 64 bit int, etc. • Programming time: names given to algorithms, objects, etc. • Compile time: templates are instantiated (types are bound), machine code produced for functions and methods • Link time: calls between compilation units are linked up • Load time: virtual addresses mapped to physical ones • Run time: scopes are entered and left, dynamic allocation of objects, updates to variables, pointers, and references, etc.
Symbol Tables for Nested Scopes • Scope analysis allows declarations and bindings to be processed in a stack-like manner at run-time • A symbol table is used to keep track of that information • E.g., each identifier in a scope has a set of bindings • Structure/management may range from a single static symbol table to a dynamic hierarchy of per-scope tables • (Figure: a global scope table binding pi to double 3.141 and main to (int, char**), with a nested function scope binding argc to int 2 and argv to char**)
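One possible way (a sketch, not the course implementation) to realize a dynamic hierarchy of per-scope tables in C++ is a stack of maps, pushed and popped as scopes are entered and left; SymbolTable and its method names are hypothetical:

    // Hypothetical per-scope symbol tables managed as a stack: entering a scope
    // pushes a table, leaving pops it, and lookup searches inner scopes first.
    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    class SymbolTable {
        std::vector<std::map<std::string, std::string>> scopes;  // name -> attributes
    public:
        void enterScope()               { scopes.emplace_back(); }
        void leaveScope()               { scopes.pop_back(); }
        void declare(const std::string& name, const std::string& attrs) {
            scopes.back()[name] = attrs;
        }
        const std::string* lookup(const std::string& name) const {
            for (auto it = scopes.rbegin(); it != scopes.rend(); ++it) {  // innermost first
                auto found = it->find(name);
                if (found != it->end()) return &found->second;
            }
            return nullptr;                 // not bound in any enclosing scope
        }
    };

    int main() {
        SymbolTable table;
        table.enterScope();                         // global scope
        table.declare("pi", "double = 3.141");
        table.enterScope();                         // function scope for main
        table.declare("argc", "int = 2");
        std::cout << *table.lookup("pi") << '\n';   // found in the enclosing scope
        table.leaveScope();
    }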
Operational Semantics • Augment syntax with rules for an abstract machine • Specifically, a reduction machine that reduces parts of a program (building up to the program itself) to its values • Reduction rules repeatedly infer conclusions from premises • For example, reduction of string “425” to value 425 • Parse off digit ‘4’, convert to value 4 • Parse off digit ‘2’, add value 2 to 10*4 (gives 42) • Parse off digit ‘5’, add value 5 to 10*42 (gives 425) • Axioms (rules without a premise) give basic reductions • E.g., digit and number strings to their corresponding values • Richer inference rules allow reductions to be nested • E.g., convert operands of an addition expression, then add
Environments and Assignment • If a program handles only (e.g., arithmetic) expressions, then it can be reduced to a single value • E.g., ((4+5)*(12-7) – (4*4 + 3*3)) reduces to 20 • However, if assignment statements are added to a language, an environment is needed to store state • E.g., the current values of variables at any point in the code • Environment must be added to reduction rules • E.g., if variables are used in an expression like (a+2)*c • Reductions now are made with respect to an environment • Need additional rules for effects of assignment, etc. • An identifier can reduce to its current value in environment • Assignment updates identifier’s value in the environment • Program reduces to environment after aggregate changes
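A minimal C++ sketch, with hypothetical names (Environment, assign, valueOf), of how an environment lets an identifier reduce to its current value and lets assignment update state:

    // Hypothetical sketch: an environment maps identifiers to values, assignment
    // updates it, and an identifier "reduces" to its current value when used.
    #include <iostream>
    #include <map>
    #include <string>

    using Environment = std::map<std::string, int>;

    int valueOf(const Environment& env, const std::string& name) {
        return env.at(name);                       // identifier reduces to its value
    }

    void assign(Environment& env, const std::string& name, int value) {
        env[name] = value;                         // assignment updates the environment
    }

    int main() {
        Environment env;
        assign(env, "a", 3);
        assign(env, "c", 5);
        int result = (valueOf(env, "a") + 2) * valueOf(env, "c");   // (a+2)*c
        std::cout << result << '\n';               // 25, with respect to this environment
    }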
Expressions vs. Statements • It is useful to differentiate expressions vs. statements • Statements (e.g., assignment) have side effects but usually do not return a value (C++ doesn’t follow this strictly) • Expressions (e.g., right hand side of a statement) provide a value but usually don’t have side effects (again except C++) • Expression syntax may be prefix, infix, postfix • Prefix and postfix don’t require parentheses for precedence • Comparison to procedures and functions • Operands are viewed as arguments or actual parameters • Referential transparency assumes no side effects • Side effects may change control flow as well as values • Some statements mimic complete control constructs • E.g., conditional operator in C++ acts like if/else construct
Structured vs. Unstructured Control Flow • Goto considered too low level • Still available in some languages • Rarely necessary, often better to use other features instead • E.g., break and continue statements in C++ • Many languages offer structured alternatives • E.g., break to exit a loop or a selected branch in C++ • E.g., continue to skip the rest of an iteration in C++ • Return and multi-level return • Return may set a value and also transfer control to caller • Multi-level return (or exception) may unwind farther • Continuations capture a context for further execution • E.g., to defer part of the execution until later
Selection • If statements (e.g., in C++) • If statement evaluates expression (not necessarily Boolean) • If true (non-zero) executes statements in its first block • Otherwise executes else block if one was provided • Can nest if statements, so else associates with the most recent if that does not already have an else part • Switch (case) statements • Condense if/else logic into cases of an ordinal expression • Default blocks (no case matches) • Breaks, fall through can be used to emulate ranges of cases
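For example, in C++ deliberate fall-through between case labels can emulate a range of cases (the classify function below is hypothetical, for illustration only):

    // Hypothetical example: deliberate fall-through lets several case labels
    // share one block, emulating a range of cases; default handles the rest.
    #include <iostream>

    const char* classify(int digit) {
        switch (digit) {
            case 0: case 1: case 2: case 3: case 4:   // fall through: 0..4
                return "low";
            case 5: case 6: case 7: case 8: case 9:   // fall through: 5..9
                return "high";
            default:
                return "not a digit";
        }
    }

    int main() {
        std::cout << classify(4) << ' ' << classify(7) << '\n';   // low high
    }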
Iteration • Implemented through loop constructs (e.g., in C++) • E.g., in C++, while, for, do loops • A continue statement skips rest of that iteration of the loop • A break statement exits the loop • Iterators also provide helpful abstractions for iteration • While loop • Most basic construct: test guards each iteration of a block • For loop • Encodes special case of a while loop (can emulate an enumeration controlled loop using logical control) • Do loop • Ensures execution of the block at least once
Recursion • Functional languages severely limit side effects • Iteration relies on side effects to make progress, terminate • Recursion is a natural alternative in those cases • Or where avoiding side effects simplifies flow control logic • Recursive functions can support lazy evaluation • E.g., packaging up remaining work as a continuation and then only performing that work if it’s needed • Normal order vs. applicative order evaluation • Operands may not be evaluated until needed (normal order), vs. evaluated before the function is applied (applicative order)
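As a rough C++ illustration (sumTo and the thunk below are hypothetical), recursion can replace an explicit loop, and a lambda can package up remaining work so it is only evaluated if needed:

    // Hypothetical sketch: recursion replaces an explicit loop, and a lambda
    // "thunk" packages up remaining work so it is only performed if needed.
    #include <functional>
    #include <iostream>

    int sumTo(int n) {                    // recursion instead of iteration
        return n == 0 ? 0 : n + sumTo(n - 1);
    }

    int main() {
        std::function<int()> thunk = [] { return sumTo(1000); };  // work deferred
        bool needed = true;
        if (needed)
            std::cout << thunk() << '\n'; // only evaluated here, when required
    }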
Programs and Type Systems • A language’s type system has two main parts • Its type constructors • Its rules (and algorithms) for evaluating type equivalence, type compatibility, and type inference • Type systems must balance design trade-offs • Static checking can improve efficiency, catch errors earlier • Dynamic checking increases flexibility by delaying type evaluation until run-time (common in functional languages), may consider a larger subset of safe programs to be legal • Definition of data types (based on sets) • A data type has a set of values and a type name • A data type also defines operations on those values, along with properties of those operations (e.g., integer division)
Type Equivalence • Structural equivalence • Basic structural equivalence: all types defined as A × B are the same but are all different from those defined as B × A • More complex forms arise if elements are named (so they are not interface polymorphic to member selection operator) • Name equivalence • Two types are the same only if they have the same name • Easy to check, but restrictive • Declaration equivalence • Improves on name equivalence by allowing new names to be given for a declared type, all of which are equivalent (e.g., in C++ typedefs behave this way)
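A small C++ illustration (hypothetical types Count, Meters, Seconds) of declaration equivalence via typedef versus name equivalence between structurally identical structs:

    // Hypothetical illustration: a C++ typedef introduces a new name for the same
    // declared type (declaration equivalence), while two structs with identical
    // members are still distinct types under name equivalence.
    #include <iostream>

    typedef int Count;            // Count and int are equivalent types
    struct Meters  { double v; };
    struct Seconds { double v; }; // same structure as Meters, but a different type

    int main() {
        Count n = 425;
        int   m = n;              // fine: declaration equivalence
        Meters d{10.0};
        // Seconds t = d;         // error if uncommented: name equivalence rejects it
        std::cout << m << ' ' << d.v << '\n';
    }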
Coercion and Polymorphism • Types often are allowed to be converted to other types • E.g., while (ifs >> token) // ifstream& to bool • When the compiler forces this to happen, it’s coercion • I.e., the type conversion happens implicitly • Polymorphism lets multiple unconverted types be used • Inheritance polymorphism (E.g., C++ classes) • Interface polymorphism (E.g., C++ templates) • Both support subtype polymorphism (Liskov substitution) • Explicit parametric polymorphism is called “generic” • E.g., C++ templates with specialization
Functions vs. Procedures • It is useful to differentiate functions vs. procedures • Procedures have side effects but usually do not return a value (C++ doesn’t follow this strictly) • Functions provide a value and usually don’t have side effects (again C++ doesn’t enforce this) • Procedures (or functions) abstract commonly used combinations of other procedures (or functions) • Especially if the side effect (e.g., printing out a list) or computation (e.g., factorial) is usefully expressed recursively • Whenever a procedure (or function) calls another one, data must be remembered during each such call • Stored in an activation record visible during the interval during which the procedure (or function) is executing
Exception Handling • Raising/handling exceptions similar to procedure calls • But, stack unwinds, so can’t put activation record there • Need to find/call handler dynamically • One approach is to keep a separate handler stack • Nicely general but may be expensive to maintain at runtime • May be necessary to avoid restricting handler semantics • C++ pre-computes address-indexed dispatch table • Avoids any cost to code that doesn’t use exceptions • Still somewhat expensive since it needs (e.g., binary) search • Search also motivates first-matching-catch-block semantics
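A brief C++ example of first-matching-catch-block semantics (mightFail is a hypothetical function): the most specific handler is listed first so it is selected before the more general one:

    // Hypothetical example of first-matching-catch-block semantics: the handler
    // chosen is the first catch clause whose type matches the thrown exception.
    #include <iostream>
    #include <stdexcept>

    void mightFail(bool fail) {
        if (fail)
            throw std::out_of_range("index past end");   // stack unwinds from here
    }

    int main() {
        try {
            mightFail(true);
        } catch (const std::out_of_range& e) {   // matches first: most specific handler
            std::cout << "out_of_range: " << e.what() << '\n';
        } catch (const std::exception& e) {      // would also match, but is listed later
            std::cout << "exception: " << e.what() << '\n';
        }
    }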
Coroutines and Events • Coroutines offer an alternative to nested procedures • Take turns executing (cooperatively alternating) • Detach operation establishes ability to transfer control • Transfer operation saves program counter in a routine, transfers control to current point of execution in another • Coroutines offer a natural approach to event handling • Originated in Simula (discrete event simulation language) • Iterators can use coroutines (but often done more simply) • In general, event handling involves independent code • Handling key press vs. mouse move vs. network packet … • The idea is to abstract handlers for each distinct event and then coordinate their operations (e.g., clicking on a window brings it to the foreground and directs subsequent input to it)
Object-Oriented Programming • A design method as well as a programming paradigm • For example, CRC cards, noun-verb parsing of requirements • Hinges on inheritance-based polymorphism • Classes define behaviors and structures of objects • Subclasses may refine or extend base classes • Extensions external to classes may be supported (e.g., C#) • Distinct objects (class instances) interact • The sets of objects (and behaviors) may vary dynamically • OO paradigm introduces several key ideas • Independent objects with separate state • Encapsulate all but the most necessary details (e.g., public/private/protected access in C++) • Abstraction, polymorphism, substitution, extensibility
Implementation of OO Languages • Efficient use of instructions and program storage • E.g., a C++ object is stored as a struct of member variables and inherited variables are augmented with additional ones • Methods as functions with an extra this argument for object • Inheritance, dynamic binding raise additional issues • E.g., use of C++ virtual function table (v-tbl) to dispatch calls • Language features influence object lifetimes • E.g., C++ stack objects’ lifetimes are automatically scoped to the duration of a function call (created/destroyed with it) • Can exploit this to manage heap objects as in the common “resource acquisition is initialization” (RAII) coding idiom: tie the lifetime of a heap object to that of a stack object (e.g., a smart pointer) which is in turn tied to the lifetime of a function
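A minimal RAII sketch in C++ (Resource and useResource are hypothetical): the smart pointer on the stack owns the heap object, so the heap object's lifetime follows the function's scope:

    // Hypothetical RAII sketch: the heap object's lifetime is tied to the stack
    // object (a smart pointer), which is in turn tied to the function's scope.
    #include <iostream>
    #include <memory>

    struct Resource {
        Resource()  { std::cout << "acquired\n"; }
        ~Resource() { std::cout << "released\n"; }
    };

    void useResource() {
        std::unique_ptr<Resource> r(new Resource());   // stack object owns heap object
        // ... use *r ...
    }                                                  // r destroyed here, heap freed

    int main() {
        useResource();                                 // prints acquired, then released
    }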
Programs as Functions • Some programs act like mathematical functions • Associate a set of input values from the function’s domain with a set of output values from the function’s range • E.g., written as y = f(x) or f : X → Y, where X is the domain, Y is the range, x ∈ X is the independent variable, and y ∈ Y is the dependent variable • No assignment, so no loops in purely functional code • Instead, rely on recursion to perform iteration • Referential transparency • Function’s value depends only on arguments (+ global state) • Value semantics • No local state
Expressions in Scheme • All are either special forms or function applications • Special forms begin with a Scheme keyword (e.g., cond, define, if, lambda, let, letrec, quote, etc.) • Function application (call) is prefix: name then arguments • Predefined procedures for many basic functions • Such as + (addition), * (multiplication), / (division), car, cdr, cons, display, etc. • Selection expressions for if, if-else, and if-elseif logic • Use if form for single selection, vs. cond form for multiple • Binding lists (using the let or letrec keywords) • Associate values with variables before applying a function • Lambda expressions (using the lambda keyword) • Define formal parameter lists for (anonymous) functions
Data Structures in Scheme • Basic construct is a list node (box and arrow notation) • Lists are concatenations of list (or list of list …) nodes • Functions car and cdr select the head vs. the rest of a list • The cons function constructs a list from two others • Concatenates them with the first in front of the second • The null? primitive tests whether or not a list is empty • Use for recursive operations on lists (“cdr down, cons up”) • (Figure: box-and-arrow diagram of the list L = (“hello, ” “world!”), showing (car L), (cdr L), (car (cdr L)), and (cdr (cdr L)))
Scheme Input and Output Functions • Numerous pre-defined input functions • E.g., (read) returns token from current input port: to use the value that was read, wrap the read in a let expression • E.g., (read P) returns token from input port P • E.g., (read-char P) returns character from input port P • Similar for output (display won’t mark up output) • E.g., (write x Q) writes value x to output port Q, in quotes if a string, prefixed with #\ if a character, so read can process • E.g., (newline Q) starts next output to port Q on a new line • Functions to test state, open streams, format, etc. • E.g., (eof-object? x) tests whether value x is end-of-file • E.g., (open-input-file “in.txt”) opens “in.txt”, returns a port • E.g., call (close-output-port Q) when done with port Q
Horn Clauses • A restricted form of first-order logic • Statements are of the form (a ∧ b ∧ c) → d • Equivalent to ~a ∨ ~b ∨ ~c ∨ d • d is the head of the clause • (a ∧ b ∧ c) is the body of the clause • a and b and c and d are predicates of arbitrary arity (they each may take 0 or more arguments and return a Boolean result), e.g., true, isEnrolled(alice), parent(X, Y), etc. • Axioms or facts • Written in the form → d (or just d) • Procedural interpretation of Horn Clauses • If you have the body of a clause, produce its head • Motivates use of the resolution inference rule
Unification • Given two clauses that potentially could be matched but in which variable naming/binding isn’t the same • Predicates, e.g., in425(alice) and in425(X), can be unified (matched) under substitution, e.g., alice/X (alice for X) where alice is a constant and X is a variable • Each such substitution then must be remembered for all occurrences of that variable within the Horn clauses being considered (e.g., inputs and outputs to a resolution step) • Rules for transforming/matching statements • Can rename variables (variable substitution) as in X/Y • Can bind values (constant substitution) as in alice/X • Cannot modify constants like alice
Resolution • Straightforward inference rule for Horn clauses • Match existing facts to the terms in the body of a clause • If all are matched (unified), assert the head of that clause • E.g., in425(alice) and in425(X) → isEnrolled(X) allow assertion of isEnrolled(alice) • Can also use resolution to evaluate queries • Written as the body of a clause • If all of the terms in it can be matched, it is proven true (a good example of the idea of a deductive database since it then can be added to the set of available statements)
Parallelism and Concurrency • Two different terms for potential vs. actual parallelism • Actual parallelism (called parallelism) is when code can execute in physically parallel hardware • E.g., on multiple hosts, or on multiple cores of the same host • Can achieve significant speedup in program execution times • Must communicate to aggregate results, which slows things down • Logical parallelism (called concurrency) is when code appears parallel but may be interleaving on the same core • E.g., part of one code sequence runs, then part of another, etc. • Parallelism and concurrency share some key issues • E.g., asynchrony and interleaving of what happens when • May need to represent sequence and/or timing semantics • May need special handling to avoid semantic hazards
Concurrency and Synchronization Issues • Two concurrent or parallel activities may “race” to reach a common section involving a shared resource • A race condition if which activity gets there first matters • E.g., thread 1 writes AB and thread 2 writes CD (both are valid) but the writes interleave to produce CB (invalid) • Race conditions can be avoided via synchronization • E.g., each thread waits for a lock on the critical section before writing so only AB or CD can result • However, synchronization can lead to deadlock • E.g., thread 1 takes lock 1 and needs lock 2, thread 2 takes lock 2 and needs lock 1 (called a deadly embrace) • Protocols must be followed to avoid or break deadlock • E.g., each thread acquires lock 1 before attempting lock 2
Semaphores and Mutexes • Threads can avoid race conditions by acquiring locks • Guard access to a shared resource so they take turns with it • Dijkstra’s semaphore mechanism is one example: Sem s(n); Delay(s); [critical region of code] Signal(s); • Where semaphore s gives up to n threads access at a time • Implement via a test-and-set instruction, spin-locks, etc. • A binary semaphore (a.k.a. a mutex) if n == 1 • Encodes basic common semantics for mutual exclusion • Can allow optimized implementation (e.g., Linux futexes avoid system calls unless there is contention for the lock) • Can implement either one using the other • Update a counter within mutex-guarded method • Initialize a semaphore with a count of 1
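One way (a sketch, not a definitive implementation) to build a counting semaphore from a mutex, as the last bullets suggest, is a counter guarded by a std::mutex with a std::condition_variable for waiting; the Semaphore class and its delay/signal names are hypothetical, echoing the slide's notation:

    // Hypothetical sketch of "implement a semaphore using a mutex":
    // a counter guarded by a std::mutex, with a condition variable for waiting.
    #include <condition_variable>
    #include <iostream>
    #include <mutex>
    #include <thread>

    class Semaphore {
        std::mutex m;
        std::condition_variable cv;
        int count;
    public:
        explicit Semaphore(int n) : count(n) {}
        void delay() {                                   // a.k.a. P / wait / acquire
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [this] { return count > 0; });
            --count;
        }
        void signal() {                                  // a.k.a. V / post / release
            std::lock_guard<std::mutex> lock(m);
            ++count;
            cv.notify_one();
        }
    };

    int main() {
        Semaphore s(1);                                  // n == 1: behaves like a mutex
        int shared = 0;
        auto worker = [&] { s.delay(); ++shared; s.signal(); };   // critical region
        std::thread t1(worker), t2(worker);
        t1.join(); t2.join();
        std::cout << shared << '\n';                     // always 2, no race
    }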
Deadlocks and other Issues • Synchronization may cause deadlocks • Cyclic dependence, mutual exclusion lead to deadlock • Even if a deadlock has not occurred yet, code may reach a path on which deadlock becomes unavoidable • Protocols/mechanisms to avoid/detect/break deadlock • E.g., via Dijkstra’s Banker’s algorithm, timed locking, etc. • Fairness/liveness of lock access scheduling matters • Order in which threads are given access to a lock may vary • Accidental complexity also matters • E.g., user’s ability to mis-configure locking and concurrency • Motivates alternate uses of mutexes and/or semaphores • Encapsulating locks within type-safe object model may help
(Passive) Monitor Objects • “Monitor Object” Pattern Approach • Methods run in callers’ threads • Condition variables arbitrate use of a common shared lock • E.g., using a std::mutex, a std::unique_lock (must be able to unlock and re-lock it) and a std::condition_variable • Ensures incremental progress while avoiding race conditions • Threads wait on condition • Condition variable performs thread-safe lock/wait and wake/unlock operations • Thread released when it can proceed • E.g., when queue isn’t empty/full • Blocks caller until request can be handled, coordinates callers • (Figure: a client proxy calling add() and lookup() on a List monitor object that encapsulates a Lock and a Condition)
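A simplified C++11 sketch of a passive monitor object (MonitorQueue and its add/remove methods are hypothetical): the mutex guards the shared state and the condition variable blocks callers until the queue isn't empty:

    // Hypothetical sketch of a passive monitor object: a std::mutex guards the
    // state, and a std::condition_variable blocks callers until they can proceed
    // (here, until the queue is non-empty), as in the Monitor Object pattern.
    #include <condition_variable>
    #include <iostream>
    #include <mutex>
    #include <queue>
    #include <thread>

    class MonitorQueue {
        std::queue<int> items;
        std::mutex m;
        std::condition_variable notEmpty;
    public:
        void add(int value) {                        // runs in the caller's thread
            std::lock_guard<std::mutex> lock(m);
            items.push(value);
            notEmpty.notify_one();                   // wake a waiting consumer
        }
        int remove() {
            std::unique_lock<std::mutex> lock(m);    // must be able to unlock/re-lock
            notEmpty.wait(lock, [this] { return !items.empty(); });
            int value = items.front();
            items.pop();
            return value;
        }
    };

    int main() {
        MonitorQueue q;
        std::thread consumer([&] { std::cout << q.remove() << '\n'; });  // blocks until add
        q.add(425);
        consumer.join();
    }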