CS 3304 Comparative Languages

CS 3304 Comparative Languages • Lecture 9: Control Flow • 14 February 2012

Introduction • Control flow (or ordering): the order in which operations are performed during program execution. • Eight principal categories of language mechanisms are used to specify ordering: • Sequencing. • Selection. • Iteration. • Procedural abstraction. • Recursion. • Concurrency. • Exception handling and speculation. • Nondeterminacy. • The relative importance of the different categories of control flow varies significantly among the different classes of programming languages.

Expression Evaluation • An expression is either: • A simple object: e.g., a literal constant, or a named variable or constant. • An operator or function applied to a collection of operands or arguments, each of which is in turn an expression. • Function call: a function name followed by a parenthesized, comma-separated list of arguments. • Operator: a built-in function that uses special, simple syntax, with one or two arguments and no parentheses or commas. • Operators are sometimes “syntactic sugar” for more “normal”-looking functions (in C++, a + b can be short for a.operator+(b)). • Operand: an argument of an operator.
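Python offers syntactic sugar similar to the C++ `a.operator+(b)` example: the expression `a + b` is dispatched to the method call `a.__add__(b)`. The following is a minimal sketch. The `Vec2` class is hypothetical and exists only for illustration.

```python
class Vec2:
    """Hypothetical 2-D vector used only to illustrate operator syntax."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # Called for `self + other`: the infix operator is sugar for this method call.
        return Vec2(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return f"Vec2({self.x}, {self.y})"


a = Vec2(1, 2)
b = Vec2(3, 4)
print(a + b)          # operator syntax: Vec2(4, 6)
print(a.__add__(b))   # the equivalent explicit method call: Vec2(4, 6)
```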

Function Call Notation • Where does the function name appear relative to its arguments? • Prefix: before its arguments: op a b, op(a, b), or (op a b). • Infix: among its arguments: a op b. • Postfix: after its arguments: a b op. • Most imperative languages use infix notation for binary operators and prefix notation for unary operators. • Lisp uses prefix notation (Cambridge Polish): (op a b). • ML and the R scripting language allow the user to create new infix operators. • Smalltalk uses infix notation for all functions (multiword infix): myBox displayOn: myScreen at: 100@50 • Postscript and Forth use postfix notation for most of their functions; other examples include the C post-increment and post-decrement operators.
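Postfix notation needs neither parentheses nor precedence rules, because an evaluator can process the tokens left to right using a stack. The following is a minimal sketch. It assumes space-separated tokens and supports only the four binary arithmetic operators. The name `eval_postfix` is illustrative.

```python
import operator

# Binary operators supported by this sketch.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def eval_postfix(expr):
    """Evaluate a space-separated postfix expression such as '3 4 + 2 *'."""
    stack = []
    for token in expr.split():
        if token in OPS:
            right = stack.pop()   # the operator follows both operands,
            left = stack.pop()    # so the right operand is on top of the stack
            stack.append(OPS[token](left, right))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


print(eval_postfix("3 4 + 2 *"))  # infix (3 + 4) * 2 -> 14.0
print(eval_postfix("3 4 2 * +"))  # infix 3 + 4 * 2   -> 11.0
```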

Precedence and Associativity • Infix notation requires parentheses, or rules for grouping, to avoid ambiguity. • The choice among alternative evaluation orders depends on the precedence and associativity of the operators: • C has a very rich precedence structure (15 levels), which makes all the levels hard to remember. • Pascal has a relatively flat precedence hierarchy (3 levels). • APL and Smalltalk: all operators are of equal precedence. • Associativity rules specify whether sequences of operators of equal precedence group to the right or to the left: • Usually the operators associate left-to-right. • Fortran: the exponentiation operator ** associates right-to-left. • C: the assignment operator associates right-to-left.
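Python follows the same convention as Fortran for exponentiation: `**` groups right-to-left, while `-` groups left-to-right. The following quick check shows the difference explicit parentheses make:

```python
# Exponentiation associates right-to-left: 2 ** (3 ** 2)
print(2 ** 3 ** 2)      # 512
print((2 ** 3) ** 2)    # 64

# Subtraction associates left-to-right: (10 - 4) - 3
print(10 - 4 - 3)       # 3
print(10 - (4 - 3))     # 9

# Precedence: ** binds more tightly than unary minus, i.e. -(2 ** 2)
print(-2 ** 2)          # -4
```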

Precedence in Fortran, C, Ada, Pascal

Assignments • Functional languages: expressions are the building blocks: • Computation is expression evaluation that depends only on the referencing environment for that evaluation. • Expressions in a purely functional language are referentially transparent: the value depends only on the referencing environment. • Imperative languages: computation is usually an ordered series of changes to the values of variables in memory. Assignments provide the principal means for these changes. • Side effect: occurs when a programming construct influences subsequent computation in any way other than by returning a value for use in the surrounding context. • Expressions: always produce a value and may have side effects. • Statements: executed solely for their side effects. • Imperative programming: computing by means of side effects.

References and Values • There are subtle but important differences in the semantics of assignment in different imperative languages. • Depending on the context, a variable may refer to the value of the variable (r-value) or to its location (l-value), a named container for a value. • Value model of variables: an expression can be either an l-value or an r-value, depending on the context in which it appears. • Java uses the value model for built-in types and the reference model for class types, so built-in types can't be passed uniformly to methods expecting class-type parameters; hence wrapper classes and automatic boxing/unboxing. • Reference model of variables: a variable is a named reference to a value, so every variable is an l-value. • E.g., integer values (like 2) are immutable. • A variable has to be dereferenced to obtain its value.
(Figure: variables a = 4, b = 2, c = 2 shown under the value model and under the reference model.)
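Python uses the reference model for all variables, so assignment copies references rather than values. The following sketch shows aliasing with a mutable list, and how immutable integers behave under the same model:

```python
# Assignment makes b refer to the same list object as a.
a = [1, 2]
b = a
b.append(3)
print(a)          # [1, 2, 3]: a and b name the same object
print(a is b)     # True

# Integers are immutable: `y += 1` rebinds y to a new object
# rather than changing the object that x refers to.
x = 2
y = x
y += 1
print(x, y)       # 2 3
```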

Orthogonality • Orthogonality: features can be used in any combination, the combinations all make sense, and the meaning of a given feature is consistent regardless of the other features it is combined with. • Algol 68: orthogonality was a principal design goal. • Expression-oriented, with no separate notion of statement:
    begin
        a := if b < c then d else e;
        a := begin f(b); g(c) end;
        g(d);
        2 + 3
    end
• Pascal: statement-oriented; expressions and statements are kept strictly separate. • C distinguishes between statements and expressions but has expression statements (an expression followed by a semicolon). • Allowing assignment within expressions causes problems in C because it uses = for assignment and == for equality: if (a = b) … compiles but assigns b to a instead of comparing them.
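Python is closer to C than to Algol 68. It separates statements from expressions, but it provides a conditional expression and, since Python 3.8, an assignment expression written `:=`. Because assignment inside an expression uses a different token from `=`, the C mistake `if (a = b)` becomes a syntax error rather than a silent bug. The following is a minimal sketch. The functions `choose` and `first_long` are hypothetical.

```python
def choose(b, c, d, e):
    # Conditional expression, analogous to Algol 68's `if b < c then d else e`.
    return d if b < c else e


def first_long(words, limit):
    # The assignment expression `n := len(w)` binds n and also yields its value.
    for w in words:
        if (n := len(w)) > limit:
            return w, n
    return None


print(choose(1, 2, "d", "e"))                 # d
print(first_long(["a", "abcd", "xy"], 2))     # ('abcd', 4)
```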

Combination Assignment Operators • Imperative languages frequently update variables and can use statements like a = a + 1; that result in redundant address calculations. • If the address calculation has a side effect, the update has to be rewritten using additional statement(s). • Starting with Algol 68, many languages provide assignment operators to update variables, e.g., a += 1;. • C provides 10 different assignment operators, one for each of its binary arithmetic and bit-wise operators. • Additionally, C has prefix and postfix increment and decrement operators. • Multiway assignment (tuples): • a, b = c, d means a = c; b = d; • a, b = b, a; swaps the values of the variables. • a, b, c = foo(d, e, f); functions can return tuples as well as single values.
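The redundant address calculation matters when computing the address has a side effect. In Python, an augmented assignment such as `a[i()] += 1` evaluates the target expression once, whereas the long form evaluates it twice. The following sketch uses a hypothetical `next_index` function that counts its calls. It then shows multiway assignment.

```python
calls = 0


def next_index():
    # Hypothetical index function with a side effect: it counts its calls.
    global calls
    calls += 1
    return 0


a = [10, 20]
a[next_index()] += 1                    # target expression evaluated once
print(a, calls)                         # [11, 20] 1

a[next_index()] = a[next_index()] + 1   # index expression evaluated twice
print(a, calls)                         # [12, 20] 3

# Multiway (tuple) assignment: the right-hand side is evaluated first,
# so this swaps x and y without a temporary variable.
x, y = 1, 2
x, y = y, x
print(x, y)                             # 2 1

# A function returning a tuple, unpacked by the caller.
q, r = divmod(17, 5)
print(q, r)                             # 3 2
```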

Initialization • Imperative languages do not always initialize the values of variables in declarations, but there are three reasons why they should: • A static variable local to a subroutine needs an initial value to be useful. • A statically allocated variable can be initialized at compile time, at no run-time cost. • Initialization prevents accidental use of uninitialized variables. • To treat user-defined composite types orthogonally with built-in types, a language needs aggregates: built-up structured values of user-defined composite types (C, Ada, ML). • A language can provide a default value. • A language can treat the use of an uninitialized variable as a dynamic semantic error. • Run-time detection could be expensive. • Definite assignment: a compile-time rule that forbids the use of uninitialized variables. • Every possible control path to a use must assign a value. • Constructors: initialization versus assignment.
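Python is one language that treats the use of an uninitialized variable as a dynamic semantic error. It detects the problem at run time, and only on the control path that actually reaches the use. A sketch:

```python
def maybe_assigned(flag):
    if flag:
        x = 1
    # On the path where flag is false, x has no value here.
    return x


print(maybe_assigned(True))   # 1
try:
    maybe_assigned(False)
except UnboundLocalError as err:
    # Raised at run time. A language with definite-assignment rules
    # (such as Java) would reject this function at compile time instead.
    print("error:", err)
```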

Ordering within Expressions • Precedence and associativity are not sufficient to determine evaluation order: • Operand evaluation order. • Subroutine argument evaluation order. • Why is the evaluation order important? • Side effects: an operand that is a function call can modify other operands. • Code improvement: evaluation order affects register allocation and instruction scheduling. Most languages leave the evaluation order undefined. • Java represents a shift away from performance as the overriding design goal: it specifies left-to-right evaluation. • In some implementations, the compiler can rearrange expressions with commutative/associative/distributive operators to generate faster code. • Problem: the limited precision of computer arithmetic and arithmetic overflow can make such rearrangements change the result.
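Like Java, Python fixes the order of operand evaluation: operands are evaluated left to right. The following sketch records the order in which two side-effecting operands run, using a hypothetical helper `traced`. It then shows why reordering floating-point additions is unsafe.

```python
order = []


def traced(name, value):
    # Hypothetical helper: records when each operand is evaluated.
    order.append(name)
    return value


total = traced("left", 1) + traced("right", 2)
print(order)    # ['left', 'right']

# Limited precision: floating-point addition is not associative,
# so a compiler may not regroup these operations freely.
print((1e20 + -1e20) + 1.0)    # 1.0
print(1e20 + (-1e20 + 1.0))    # 0.0
```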

Short-Circuit Evaluation • Short-circuit evaluation of Boolean expressions: skipping the rest of the computation once the value can be determined: • (a > b) or (b > c): if a is greater than b, the value of the Boolean expression is true regardless of the values of b and c. • Can save a significant amount of time in some situations. • It changes the semantics of Boolean expressions: the skipped operands are never evaluated. • Possible problems with side effects in the skipped operands. • Some languages provide both regular and short-circuit Boolean operators (Ada: and/or versus and then/or else). • Can be considered an example of lazy evaluation.
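In Python, `and` and `or` short-circuit, while `&` and `|` applied to two `bool` values evaluate both operands. This gives a pair of operators loosely comparable to Ada's `and then` versus `and`. The following sketch uses a hypothetical `check` function that logs each of its calls.

```python
log = []


def check(name, result):
    # Hypothetical predicate with a side effect: it logs each evaluation.
    log.append(name)
    return result


# Short-circuit: the right operand is skipped once the left one decides the result.
print(check("a > b", True) or check("b > c", False))    # True
print(log)                                              # ['a > b']

log.clear()
# Non-short-circuit: & on two bools evaluates both operands.
print(check("a > b", False) & check("b > c", True))     # False
print(log)                                              # ['a > b', 'b > c']

# A common idiom that relies on short-circuiting to avoid an error.
items = []
print(len(items) > 0 and items[0] == 42)                # False, no IndexError
```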

Structured and Unstructured Flow • Assembly language: conditional and unconditional branches. • Early Fortran: relied heavily on goto statements (and labels):
        IF (A .LT. B) GOTO 10
        …
    10
• Late 1960s: programmers began abandoning GOTO statements. • Move to structured programming in the 1970s: • Top-down design (progressive refinement). • Modularization of code. • Descriptive variable names. • Within a subroutine, a well-designed imperative algorithm can be expressed with only sequencing, selection, and iteration. • Most of the structured control-flow constructs were introduced by Algol 60.

Structured Alternatives to goto • Once structured constructs were available, goto remained useful in only a small number of special cases, and these were replaced by special constructs: return, break, continue. • Multilevel returns: returning from a surrounding subroutine, i.e., branching out of several nested subroutine calls at once. • Unwinding: the repair operation that restores the run-time stack of subroutine information, including the restoration of register contents. • Handling errors and other exceptions within nested subroutines: • Auxiliary Boolean variables. • Nonlocal gotos. • Multilevel returns. • Exception handling. • Continuations: a generalization of nonlocal gotos that unwind the stack, fundamental to denotational semantics.
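Exception handling can serve as a multilevel return. The following sketch searches a nested list recursively. On success, it raises an exception that unwinds every pending recursive call in one step. The names `Found`, `_search`, and `contains` are illustrative.

```python
class Found(Exception):
    """Carries the result out of arbitrarily deep recursion."""

    def __init__(self, path):
        super().__init__(path)
        self.path = path


def _search(tree, target, path):
    for i, item in enumerate(tree):
        if isinstance(item, list):
            _search(item, target, path + [i])
        elif item == target:
            # Multilevel return: skip all the pending frames in one step.
            raise Found(path + [i])


def contains(tree, target):
    try:
        _search(tree, target, [])
    except Found as f:
        return f.path    # the stack has been unwound to this frame
    return None


print(contains([1, [2, [3, 4]], 5], 4))    # [1, 1, 1]
```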

Sequencing • The principal means of controlling the order in which side effects occur. • Compound statement: a delimited list of statements. • Block: a compound statement optionally preceded by a set of declarations. • The value of a list of statements: • The value of its final element (Algol 68). • The programmer's choice (Common Lisp, which is not purely functional). • Functions can have side effects; this is very imperative, von Neumann style. • There are situations where side effects in functions are desirable, e.g., random number generators. • Euclid and Turing: functions are not permitted to have side effects.
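A pseudo-random number generator is the standard example of a function that needs a side effect: each call has to update hidden state so that the next call returns a different value. The following is a minimal linear congruential generator sketch. The constants are those of a common textbook LCG and are an assumption here, not a recommendation for real use.

```python
class LCG:
    """Minimal linear congruential generator (illustrative only)."""

    def __init__(self, seed):
        self.state = seed

    def next_value(self):
        # Side effect: update the hidden state before returning it.
        self.state = (1103515245 * self.state + 12345) % 2**31
        return self.state


rng = LCG(seed=0)
first = rng.next_value()
second = rng.next_value()
print(first)              # 12345
print(first != second)    # True: same call, different result
```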

Selection • Selection statement: mostly some variant of if…then…else. • Languages differ in the details of the syntax. • Short-circuited conditions: • The Boolean expression is not used to compute a value but to cause control to branch to various locations. • Provides a way to generate efficient (jump) code. • Parse tree: inherited attributes of the root inform it of the address to which control should branch:
    if ((A > B) and (C > D)) or (E ≠ F) then
        then_clause
    else
        else_clause
Generated jump code:
        r1 := A
        r2 := B
        if r1 <= r2 goto L4
        r1 := C
        r2 := D
        if r1 > r2 goto L1
    L4: r1 := E
        r2 := F
        if r1 = r2 goto L2
    L1: then_clause
        goto L3
    L2: else_clause
    L3:

Case/Switch Statements • Alternative syntax for a special case of nested if…then…else:
    CASE … (* expression *)
      1:     clause_A
    | 2, 7:  clause_B
    | 3..5:  clause_C
    | 10:    clause_D
    ELSE     clause_E
    END
• Code fragments (clauses): the arms of the CASE statement. • The lists of constants are the CASE statement labels: • The constants must be disjoint. • The constants must be of a type compatible with the tested expression. • The principal motivation is to facilitate the generation of efficient target code: the goal is to compute the address to which to jump in a single instruction. • A jump table: a table of addresses, indexed by the value of the tested expression.
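A jump table can be simulated by indexing a list with the value of the tested expression, offset by the smallest label. The following sketch models the CASE statement above. It assumes the labels are integers in the range 1 to 10, and it uses strings in place of the clause code.

```python
LOW, HIGH = 1, 10

# Arms of the CASE statement and their labels (3..5 is a range).
ARMS = {"clause_A": [1], "clause_B": [2, 7], "clause_C": range(3, 6), "clause_D": [10]}

# One table entry per value in LOW..HIGH; values with no label go to the ELSE arm.
table = ["clause_E"] * (HIGH - LOW + 1)
for clause, labels in ARMS.items():
    for label in labels:
        table[label - LOW] = clause


def case(value):
    # A range check plus a single index operation replaces a chain of tests.
    if LOW <= value <= HIGH:
        return table[value - LOW]
    return "clause_E"


print(case(4), case(7), case(6))    # clause_C clause_B clause_E
```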

Alternative Implementations • A non-dense set of labels results in a very large jump table. • Other approaches include: • Sequential testing: when the number n of case statement labels is small; O(n). • Hashing: when the range of label values is large but has many missing values and no large ranges; O(1). • Binary search: when there are large value ranges; O(log n). • Compilers need to use a variety of strategies. • Syntactic details vary from language to language. • Pascal: no default clause. • Modula: optional else clause. • C: ranges are not allowed; control falls through from one arm to the next unless a break intervenes. • Case statements are one of the clearest examples of language design driven by implementation: generation of jump tables.
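When label ranges are large, a compiler can instead sort the ranges and binary-search them. The following sketch uses Python's standard `bisect` module. The label ranges and arm names are hypothetical example data.

```python
import bisect

# Sorted, disjoint label ranges (low, high, arm): hypothetical example data.
RANGES = [(1, 1, "A"), (2, 2, "B"), (3, 5, "C"), (7, 7, "B"), (10, 1000, "D")]
LOWS = [low for low, _, _ in RANGES]


def case(value, default="E"):
    # Find the last range whose low bound is <= value: O(log n) comparisons.
    i = bisect.bisect_right(LOWS, value) - 1
    if i >= 0:
        low, high, arm = RANGES[i]
        if value <= high:
            return arm
    return default


print(case(4), case(6), case(500))    # C E D
```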

Iteration • Iteration: a mechanism that allows a computer to perform similar operations repeatedly. • Favored in imperative languages. • Mostly some form of loop executed for its side effects: • Enumeration-controlled loops: executed once for every value in a given finite set. • Logically controlled loops: executed until some Boolean condition changes value. • Combination loops: combine the properties of enumeration-controlled and logically controlled loops (Algol 60). • Iterators: executed over the elements of a well-defined set (often called containers or collections in object-oriented code).

Enumeration-Controlled Loops • Originated with the DO loop in Fortran I. • Adopted in almost every language, but with varying syntax and semantics. • Many modern languages allow iteration over much more general finite sets. • Semantic complications: • Can control enter or leave the loop in any way other than through the enumeration mechanism? • What happens if the loop body modifies variables that were used to compute the end-of-loop bound? • What happens if the loop body modifies the index variable itself? • Can the program read the index variable after the loop has completed, and if so, what will its value be? • A common solution to the last question: the loop header contains a declaration of the index, making the index local to the loop.
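Python's `for` statement answers these questions in one particular way, and a small experiment makes its choices visible. The bound is evaluated once. Assigning to the index inside the body does not change the next value. The index remains visible after the loop, holding whatever value it last had. A sketch:

```python
def loop_demo():
    n = 3
    seen = []
    for i in range(n):      # range(n) is evaluated once, before the first iteration
        seen.append(i)
        n = 100             # changing the bound variable has no effect
        i = 50              # rebinding the index does not affect the next value
    return seen, i          # i is still visible after the loop


print(loop_demo())          # ([0, 1, 2], 50)
```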

Combination Loops • Algol 60: can specify an arbitrary number of “enumerators”: a single value, a range of values, or an expression. • Common Lisp: four separate sets of clauses, which initialize index variables, test for loop termination, evaluate body expressions, and clean up at loop termination. • C: semantically, the for loop is logically controlled but makes enumeration easy; it is the programmer's responsibility to test the terminating condition. • The index and any variables in the terminating condition can be modified within the loop. • All the code affecting the flow of control is localized within the header. • The index can be made local by declaring it within the loop header, so that it is not visible outside the loop.
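C's `for` is really a logically controlled loop. Ignoring `continue`, `for (init; cond; step) body` behaves like `init; while (cond) { body; step; }`. The following Python sketch shows that translation. The helper `c_for` and its callback parameters are hypothetical.

```python
def c_for(init, cond, step, body):
    """Emulate C's `for (init; cond; step) body` with a while loop.

    The loop state is a dict so that body, cond, and step can all read
    and modify it, just as C code may modify the index or the bound.
    """
    state = init()
    while cond(state):
        body(state)
        step(state)
    return state


out = []
final = c_for(
    init=lambda: {"i": 0, "n": 5},
    cond=lambda s: s["i"] < s["n"],
    step=lambda s: s.update(i=s["i"] + 1),
    body=lambda s: out.append(s["i"]),
)
print(out)           # [0, 1, 2, 3, 4]
print(final["i"])    # 5: the index survives the loop
```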

Iterators • True iterators: a container abstraction provides an iterator that enumerates its items (Clu, Python, Ruby, C#). • An iterator is a separate thread of control, with its own program counter, whose execution is interleaved with that of the loop:
    for i in range(first, last, step):
• Iterator objects: iteration involves both a special form of for loop and a mechanism to enumerate the values for the loop: • Java: an object that implements the Iterable interface has an iterator() method that returns an Iterator object:
    for (Iterator<Integer> it = myTree.iterator(); it.hasNext();) {
        Integer i = it.next();
        System.out.println(i);
    }
• C++: overloading operators so that iterating over the elements is like using pointer arithmetic.
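Python generators behave like true iterators in the Clu style. The `yield` statement suspends the iterator and resumes the loop body, and the next iteration resumes the iterator where it left off. The following sketch performs an in-order binary tree traversal, the same job as the Java `myTree` loop. The `Node` class is hypothetical.

```python
class Node:
    """Hypothetical binary tree node."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

    def __iter__(self):
        # In-order traversal as a generator: each `yield` hands one value
        # to the enclosing for loop, then resumes here on the next iteration.
        if self.left:
            yield from self.left
        yield self.value
        if self.right:
            yield from self.right


my_tree = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5)))
for i in my_tree:
    print(i)        # 1, 2, 3, 4, 5, 6 on separate lines
```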

Iterating • First-class functions: functional languages support “in-line” function specification. • The body of a loop is written as a function, with the loop index as an argument; the function is then passed to an iterator. • Scheme: a lambda expression. • Smalltalk: a square-bracketed block. • Iterating without iterators: languages without true iterators or iterator objects use programming conventions, e.g., defining a type and associated functions (C). • The syntax of the loop is not elegant and is probably more prone to accidental errors. • The code for the iterator is simply a type and some associated functions.
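Passing the loop body as a function to an iterator works the same way in Python, using a `lambda` expression. The following is a minimal sketch. The iterator function `for_range` is hypothetical.

```python
def for_range(first, last, step, body):
    """Hypothetical iterator function: calls body once per index value."""
    i = first
    while (step > 0 and i <= last) or (step < 0 and i >= last):
        body(i)          # the loop body, passed in as a first-class function
        i += step


squares = []
for_range(1, 5, 1, lambda i: squares.append(i * i))
print(squares)    # [1, 4, 9, 16, 25]
```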

Logically Controlled Loops • The only issue: where within the body of the loop the termination condition is tested. • Pre-test loops (test before each iteration): the familiar while loop syntax, which either uses an explicit concluding keyword or brackets the body with delimiters. • Post-test loops: test the terminating condition at the bottom of a loop, so the body is always executed at least once. • Midtest loops: often accomplished with a special statement nested inside a conditional: break (C), exit (Ada), or last (Perl).
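Python provides only a pre-test `while` loop. Post-test and midtest loops are usually written with `while True` and `break`. The following sketch shows both forms. The function names are illustrative.

```python
def read_until_blank(lines):
    """Midtest loop: the exit test sits between two parts of the body."""
    it = iter(lines)
    result = []
    while True:
        line = next(it, "")      # first half of the body
        if line == "":           # test in the middle
            break
        result.append(line)      # second half of the body
    return result


def digits(n):
    """Post-test loop: the body runs at least once, even for n == 0."""
    out = []
    while True:
        out.append(n % 10)
        n //= 10
        if n == 0:               # test at the bottom
            break
    return out[::-1]


print(read_until_blank(["a", "b", "", "c"]))    # ['a', 'b']
print(digits(0), digits(305))                   # [0] [3, 0, 5]
```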

Recursion • Recursion requires no special syntax: why? • Recursion and iteration are equally powerful. • Most languages provide both iteration (more “imperative”) and recursion (more “functional”). • Tail-recursive function: no additional computation ever follows a recursive call. The compiler can reuse the current stack frame for the recursive call, i.e., no additional stack space needs to be allocated dynamically:
    int gcd(int a, int b) {
        if (a == b) return a;
        else if (a > b) return gcd(a - b, b);
        else return gcd(a, b - a);
    }
• Sometimes simple transformations are sufficient to produce tail-recursive code (e.g., continuation-passing style).
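CPython does not perform tail-call elimination, so the transformation a compiler would apply to the tail-recursive `gcd` above has to be done by hand. Because nothing follows the recursive call, that call can be replaced by reassigning the parameters and jumping back to the top. The accumulator version of factorial shows a simple transformation that makes a non-tail-recursive function tail-recursive. A sketch:

```python
def gcd(a, b):
    # Assumes positive integers, as the C version does. The tail calls
    # gcd(a - b, b) and gcd(a, b - a) become parameter reassignments
    # followed by another trip around the loop.
    while a != b:
        if a > b:
            a = a - b
        else:
            b = b - a
    return a


def fact(n):
    # Not tail-recursive: the multiplication happens after the call returns.
    return 1 if n == 0 else n * fact(n - 1)


def fact_acc(n, acc=1):
    # Tail-recursive: the pending multiplication is carried in `acc`.
    return acc if n == 0 else fact_acc(n - 1, acc * n)


print(gcd(48, 18))             # 6
print(fact(5), fact_acc(5))    # 120 120
```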

Evaluation Order • It is possible to pass unevaluated arguments to subroutines and evaluate them only when needed. • Applicative-order evaluation: arguments are evaluated before the subroutine call; this is clearer and more efficient. • Normal-order evaluation: arguments are evaluated only when their values are needed; this occurs in macros, short-circuit Boolean evaluation, call-by-name parameters, and some functional languages. • Example: Algol 60 uses normal-order evaluation (call by name) by default for user-defined functions, to mimic the behavior of macros.
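Normal-order evaluation can be simulated in an applicative-order language by passing a thunk, a parameterless function that computes the argument when called. As with Algol 60's call by name, such an argument is re-evaluated each time it is used. The following sketch illustrates this. The names `noisy`, `my_if`, and `twice` are hypothetical.

```python
evaluations = []


def noisy(name, value):
    # Hypothetical argument expression that records each evaluation.
    evaluations.append(name)
    return value


def my_if(cond, then_thunk, else_thunk):
    # Only the selected branch is ever evaluated.
    return then_thunk() if cond else else_thunk()


def twice(thunk):
    # Call-by-name style: the argument expression runs on every use.
    return thunk() + thunk()


print(my_if(True, lambda: noisy("then", 1), lambda: noisy("else", 1 / 0)))  # 1
print(evaluations)    # ['then']: 1 / 0 was never evaluated

evaluations.clear()
print(twice(lambda: noisy("x", 21)))    # 42
print(evaluations)    # ['x', 'x']
```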

Lazy Evaluation • In the absence of side effects, lazy evaluation has the same semantics as normal-order evaluation. • Scheme provides delay and force for optional normal-order evaluation. The implementation keeps track of evaluated expressions and reuses their values if they are needed again. • Promise: a delayed expression. • Memoization: the mechanism used to keep track of which promises have already been evaluated. • Often used to create infinite or lazy data structures that are “fleshed out” on demand:
    (define naturals
      (letrec ((next (lambda (n) (cons n (delay (next (+ n 1)))))))
        (next 1)))
    (define head car)
    (define tail (lambda (stream) (force (cdr stream))))
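Scheme's `delay` and `force` can be imitated with a small promise class that memoizes its value. The following sketch rebuilds the `naturals` stream above. `Promise`, `delay`, `force`, `head`, and `tail` are Python stand-ins for the Scheme forms, not a standard library API.

```python
class Promise:
    """A delayed expression that is evaluated at most once (memoized)."""

    def __init__(self, thunk):
        self.thunk = thunk
        self.done = False
        self.value = None

    def force(self):
        if not self.done:
            self.value = self.thunk()
            self.done = True
            self.thunk = None    # drop the closure once the value is cached
        return self.value


def delay(thunk):
    return Promise(thunk)


def force(promise):
    return promise.force()


def naturals_from(n):
    # A stream is a pair: (value, promise of the rest of the stream).
    return (n, delay(lambda: naturals_from(n + 1)))


naturals = naturals_from(1)


def head(stream):
    return stream[0]


def tail(stream):
    return force(stream[1])


s = naturals
for _ in range(4):
    s = tail(s)
print(head(s))    # 5
```

Because the promise is memoized, forcing the same tail twice returns the same pair rather than building a new one.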

Nondeterminacy • Dijkstra suggested the use of nondeterminacy for selection and logically controlled loops (guarded commands):
    if condition -> stmt list
    [] condition -> stmt list
    [] condition -> stmt list
    …
    fi

    do condition -> stmt list
    [] condition -> stmt list
    [] condition -> stmt list
    …
    od
• Guard: each of the conditions in these constructs. • A nondeterministic choice is made among the guards that evaluate to true, and the statement list following the chosen guard is executed. • Nondeterminacy in concurrent programs can affect correctness. • Ideally, a nondeterministic construct should guarantee fairness.
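Python has no guarded commands. Their semantics can be sketched by evaluating every guard and then choosing among the guards that are true. This sketch uses `random.choice` for the nondeterministic choice; random selection is only one possible policy. The helper names are hypothetical. The examples are the classic maximum (`if`) and Euclid's algorithm (`do`).

```python
import random


def guarded_if(*arms):
    """if g1 -> s1 [] g2 -> s2 ... fi: run one arm whose guard is true."""
    ready = [stmt for guard, stmt in arms if guard()]
    if not ready:
        raise RuntimeError("no guard is true")   # Dijkstra's `if` aborts here
    return random.choice(ready)()


def guarded_do(state, *arms):
    """do g1 -> s1 [] g2 -> s2 ... od: repeat until no guard is true."""
    while True:
        ready = [stmt for guard, stmt in arms if guard(state)]
        if not ready:
            return state
        random.choice(ready)(state)


def maximum(a, b):
    # When a == b both guards are true, and either arm may be chosen.
    return guarded_if((lambda: a >= b, lambda: a), (lambda: b >= a, lambda: b))


def gcd(a, b):
    state = {"a": a, "b": b}
    guarded_do(
        state,
        (lambda s: s["a"] > s["b"], lambda s: s.update(a=s["a"] - s["b"])),
        (lambda s: s["b"] > s["a"], lambda s: s.update(b=s["b"] - s["a"])),
    )
    return state["a"]


print(maximum(3, 7), gcd(48, 18))    # 7 6
```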

Summary • The distinction between l-values and r-values, as well as between the value model and the reference model of variables. • Sequencing and iteration are fundamental to imperative programming. • Recursion is fundamental to functional programming. • The evolution of constructs is driven by ease of programming, semantic elegance, ease of implementation, and run-time efficiency. • Improvements in language semantics are worth a small cost in run-time efficiency (e.g., iterators). • Programming conventions can help in older, comparatively primitive languages.
