420 likes | 431 Views
Explore the evolution of programming abstractions, from Von Neumann architecture to higher levels. Learn about different programming paradigms and language design goals. Understand syntax and lexical structure, regular expressions, context-free grammars, and ambiguity in programming languages.
E N D
CSE 425 Fall 2015 Midterm Exam • 80 minutes, during the usual lecture/studio time: 10:10am to 11:30am on Monday October 12, 2015 • Arrive early if you can, exam will begin promptly at 10:10am • Held in Simon Hall, Room 023 (NOT URBAUER 218) • You may want to locate the exam room in advance (downstairs in Simon Hall, near room 022) • Exam is open book, open notes, hard copy only • I will bring a copy each of the required and optional text for people to come up to the front and take a look at as needed • Please feel free to print and bring in slides, your notes, etc. • ALL ELECTRONICS MUST BE OFF DURING THE EXEM (including phones, iPads, laptops, tablets, etc.)
Abstraction in Programming LD R1 FIRST • Von Neumann Architecture • Program instructions and data are stored in a memory area • CPU executes a sequence of instructions • Machine instruction sets: lowest level of abstraction • Binary representation that the CPU can process • Or that a virtual machine can process (e.g., byte code) • Assembly language is only slightly more abstract • “Readable” labels: operations, registers, location addresses 0010 001 000000100 (opcode) (register) (location)
Evolving to Higher Levels of Abstraction • Algebraic notation and floating point numbers • E.g., Fortran (John Backus) • Structured abstractions and machine independence • E.g., ALGOL (a committee), Pascal (Niklaus Wirth) • Architecture independence (on beyond Von Neumann) • E.g., based on Lambda Calculus (Alonzo Church) • E.g., Lisp (John McCarthy)
Some Programming Paradigms • Imperative/procedural (E.g., C) • Variables, assignment, other operators • Functional (E.g., Lisp, Scheme, ML, Haskell) • Abstract notion of a function, based on lambda calculus • Logic (E.g., Prolog) • Based on symbolic logic (e.g., predicate calculus) • Object-oriented (E.g., Java, Python, C++) • Based on encapsulation of data and control together • Generic (E.g., C++ and especially its standard library) • Based on type abstraction and enforcement mechanisms • We’ll cover informally via examples throughout the semester
Language Design • Goals (potentially conflicting) • Efficiency of coding or execution, writability, expressiveness • Specific design criteria • Regularity (how well language features are integrated) • Generality (how few cases have to be handled specially) • Orthogonality (how widely features still behave the same) • Uniformity (consistent appearance/behavior across features) • Safety or “security” (how difficult it is to produce errors) • Extensibility (how easy and effective is feature addition)
Syntax and Lexical Structure • Syntax gives the structure of statements in a language • E.g., the format of tokens and how they can be arranged • Lexical structure also describes how to recognize them • Scanning obtains tokens from a stream characters • E.g., whitespace delimited vs. regular-expression based • Tokens include keywords, constants, symbols, identifiers • Usually based on assumption of taking longest substring • Parsing recognizes more complex expressions • E.g., well-formed statements in logic, arithmetic, etc. • Free-format languages ignore indentation, etc. while fixed format languages have specific restrictions/requirements
Regular Expressions, DFAs, NDFAs • Regular expressions capture lexical structure of symbols that can be built using 3 composition rules • Concatenation (ab) , selection (a | b), repetition (b*) • Finite automata can recognize regular expressions • Deterministic finite automata (DFAs) associate a unique state with each sequence generated by a regular expression • Non-deterministic finite automata (NDFAs) let multiple states to be reached by the same input sequence (adding “choice”) • Can generate a unique (minimal) DFA in 3 steps • Generate NDFA from the regular expression (Scott pp. 56) • Convert NDFA to (possibly larger) DFA (Scott pp. 56-58) • Minimize the DFA (Scott pp. 59) to get a unique automaton • C++11 <regex> library automates all this for you
Context Free Grammars and BNF • In context free grammars (CFGs), structures are independent of the other structures surrounding them • Backus-Naur form (BNF) notation describes CFGs • Symbols are either tokens or nonterminal symbols • Productions are of the form nonterminal → definition where definition defines the structure of a nonterminal • Rules may be recursive, with nonterminal symbol appearing both on left side of a production and in its own definition • Metasymbols are used to identify the parts of the production (arrow), alternative definitions of a nonterminal (vertical bar) • Next time we’ll extend metasymbols for repeated (braces) or optional (square brackets) structure in a definition (EBNF)
Ambiguity, Associativity, Precedence • If any statement in the language has more than one distinct parse tree, the language is ambiguous • Ambiguity can be removed implicitly, as inalways replacing the leftmost remaining nonterminal (an implementation hack) • Recursive production structure also can disambiguate • E.g., adding another production to the grammar to establish precedence (lower in parse tree gives higher precedence) • E.g., replacing exp → exp + exp with alternative productions exp → exp + term or exp → term + exp • Recursive productions also define associativity • I.e., left-recursive form exp → exp + term is left-associative, right-recursive form exp → term + exp is right-associative
Extended Backus-Naur Form (EBNF) • Optional/repeated structure is common in programs • E.g., whether or not there are any arguments to a function • E.g., if there are arguments, how many there are • We can extend BNF with metasymbols • E.g., square brackets indicate optional elements, as in the production function → name ‘(‘ [args] ‘)’ • E.g., curly braces to indicate zero or more repetitions of elements, as in the production args → arg {‘,’ arg} • Doesn’t change the expressive power of the grammar • A limitation of EBNF is that it obscures associativity • Better to use standard BNF to generate parse/syntax trees
Recursive-Descent Parsing • Shift-reduce (bottom-up) parsing techniques are powerful, but complex to design/implement manually • Further details about them are in another course (CSE 431) • Still will want to understand how they work, use techniques • Recursive-descent (top-down) parsing is often more straightforward, and can be used in many cases • We’ll focus on these techniques somewhat in this course • Key idea is to design (potentially recursive) parsing functions based on the productions’ right-hand sides • Then, work through a grammar from more general rules to more specific ones, consuming input tokens upon a match • EBNF helps with left recursion removal (making a loop) and left factoring (making remainder of parse function optional)
Lookahead with First and Follow Sets • Recursive descent parsing functions are easiest to write if they only have to consider the current token • I.e., the head of a stream or list of input tokens • Optional and repeated elements complicate this a bit • E.g., function → name ( [args] ) and arg → 0 |…| 9 and args → arg {, arg} with ( )0 |…| 9 , as terminal symbols • But, EBNF structure helps in handling these two cases • The set of tokens that can be first in a valid sequence, e.g., each digit in 0 |…| 9 is in the first set for arg (and for args) • The set of tokens that can follow a valid sequence of tokens, e.g., ‘)’ is in the follow set for args • A token from the first set gives a parse function permission to start, while one from the follow set directs it to end
Bindings • A binding associates a set of attributes with a name • E.g., int &i = j; // i is a reference to int j • Bindings can occur at many different times • Language design time: control flow constructs, constructors • Language implementation time: 32 vs 64 bit int, etc. • Programming time: names given to algorithms, objects, etc. • Compile time: templates are instantiated (types are bound), machine code produced for functions and methods • Link time: calls between compilation units are linked up • Load time: virtual addresses mapped to physical ones • Run time: scopes are entered and left, dynamic allocation of objects, updates to variables, pointers, and references, etc.
Static, Stack, and Dynamic Allocation • In static allocation, unchanging names are introduced • E.g., program code, global or static objects, etc. • Manifest constants can be defined at compile time as well • In stack based allocation, scoped names are added • Recursion requires a stack to maintain frames for calls • Elaboration time constants, local variables, parameters, etc. • In heap based allocation, names appear any time • I.e., whenever the program calls a dynamic allocation operator, constructor, etc. • Deallocation may be either explicit or implicit • If implicit, garbage collection is often needed
Declarations, Blocks, and Scope • A declaration explicitly binds some information to an identifier, and may bind other information implicitly • E.g, the statement int x; explicitly binds integer type attribute int with identifier x, implicitly binds a location for x, and may or may not implicitly bind a value (i.e., 0) for x • Declarations that bind all attributes are called definitions • Declarations that leave some attributes unbound are called simply declarations, or sometimes prototypes • A block is a sequence of declarations and definitions • Declarations that are specific to a block are locally scoped • Program-wide declarations are globally scoped • Block-structured languages allow both nesting of blocks and scoped re-declaration of identifiers (via lexical addressing) • Classes, structs, namespaces, packages also create scopes
Symbol Tables for Nested Scopes • Scope analysis allows declarations and bindings to be processed in a stack-like manner at run-time • A symbol table is used to keep track of that information • E.g., each identifier in a scope has a set of bindings • Structure/management may range from a single static symbol table to a dynamic hierarchy of per-scope tables pi double 3.141 main (int, char**) argc int 2 argv char ** function scope global scope
Names, Aliases, and Overloading • Multiple names may refer to same object or variable • E.g., pointers or references (and pass-by-reference) in C++ • The ability to re-use a name is also commonplace • E.g., using the + operator for addition or string concatenation • Since a symbol table usually stores parameter type attributes for a function identifier, can use to resolve • I.e., look up the declaration/definition of a function or operator based on its parameter types as well as its name • Many languages allow function name overloading and a smaller number of them allow operator overloading • Syntactically introduced constraints such as precedence and associativity of operators must be respected (unless waived) • Though with tweaks like C++ prefix operator+ call syntax
Coercion and Polymorphism • Types often are allowed to be converted to other types • E.g., while (ifs >> token) // ifstream& to bool • When the complier forces this to happen its coercion • I.e., the type conversion happens implicitly • Polymorphism lets multiple unconverted types be used • Inheritance polymorphism (E.g., C++ classes) • Interface polymorphism (E.g., C++ templates) • Both support subtype polymorphism (Liskov substitution) • Explicit parametric polymorphism is called “generic” • E.g., C++ templates with specialization
Binding of Reference Environments • Referencing environment depends on the order in which declarations are encountered • When is the context in which a function is executed fixed? • This matters because of nesting of scopes, hiding of names, and the ability to pass a function (pointer) as a parameter • Shallow binding is done just before function is called • Deep bindingis done when the parameter is first passed • Function/subroutine closures • Bundle referencing environment with reference to subroutine • E.g., local variable capture list in C++11 lambda expression • Object closures • Similar idea, bundle member variables and operator()
Operational Semantics • Augment syntax with rules for an abstract machine • Specifically, a reduction machine that reduces parts of a program (building up to the program itself) to its values • Reduction rules repeatedly infer conclusions from premises • For example, reduction of string “425” to value 425 • Parse off digit ‘4’, convert to value 4 • Parse off digit ‘2’, add value 2 to 10*4 (gives 42) • Parse off digit ‘5’, add value 5 to 10*42 (gives 425) • Axioms (rules without a premise) give basic reductions • E.g., digit and number strings to their corresponding values • Richer inference rules allow reductions to be nested • E.g., convert operands of an addition expression, then add
Environments and Assignment • If a program handles only (e.g., arithmetic) expressions, then it can be reduced to a single value • E.g., ((4+5)*(12-7) – (4*4 + 3*3)) reduces to 425 • However, if assignment statements are added to a language, an environment is needed to store state • E.g., the current values of variables at any point in the code • Environment must be added to reduction rules • E.g., if variables are used in an expression like (a+2)*c • Reductions now are made with respect to an environment • Need additional rules for effects of assignment, etc. • An identifier can reduce to its current value in environment • Assignment updates identifier’s value in the environment • Program reduces to environment after aggregate changes
Attribute Grammars and Parse Trees • Attribute grammars describe relationship between parsing and semantic attributes and their analysis • Can perform one (inline) or two passes (explicit parse tree) • Two kinds of augmentations to the grammar • Copy rules move values from terminals to non-terminals or between non-terminals, by copying the values over • Semantic functions perform richer operations like addition, subtraction, multiplication division, inverse operations, etc. • Attribute flow may matter if coupled to parse order • E.g., S-attributed grammars most general for an LR parse • E.g., L-attributed grammars most general for an LL parse • Action routines can be interleaved with parsing as well • Set up nodes of a parse tree, perform evaluation, or both
Target Machine Details • Many architectures can be similar in overall structure • E.g., Von Neumann with CISC instruction sets is common • RISC architectures also have been used and still are • For programming languages, architecture of the target machine also imposes a number of details to consider • Especially to retarget a compiler to another machine • Especially to write software that’s portable across machines • E.g., sizes, other details of memory locations and addresses • Programming language details often based on them • Sizes and layouts of data types, aliases, objects, etc. • Addresses, how incremented/decremented pointers move, relative to other variables address space (local, array, heap) • Directions stack and heap grow in memory address space
Example Data Size Rules (from C++) • Rules governing data sizes [from Stroustrup 4th Ed] • R1: 1 == sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long) • R2: 1 <= sizeof(bool) <= sizeof(long) • R3: sizeof(char) <= sizeof(wchar_t) <= sizeof(long) • R4: sizeof(float) <= sizeof(double) <= sizeof(long double) • R5: sizeof(N) == sizeof(signed N) == sizeof(unsigned N) where N is char, short, int, long, or long long (integral types) • Minimum data sizes for some types [from LLM 5th Ed] • R6: char must be at least 1 byte (8 bits) • R7: wchar_t, char16_t, short, int must be at least 2 bytes • R8: char32_t and long must be at least 4 bytes • R9: long long must be at least 8 bytes
Expressions vs. Statements • It is useful to differentiate expressions vs. statements • Statements (e.g., assignment) have side effects but usually do not return a value (C++ doesn’t follow this strictly) • Expressions (e.g., right hand side of a statement) provide a value but usually don’t have side effects (again except C++) • Expression syntax may be prefix, infix, postfix • Prefix and postfix don’t require parentheses for precedence • Comparison to procedures and functions • Operands are viewed as arguments or actual parameters • Referential transparency assumes no side effects • Side effects may change control flow as well as values • Some statements mimic complete control constructs • E.g., conditional operator in C++ acts like if/else construct
Structured vs. Unstructured Control Flow • Goto considered too low level • Still available in some languages • Rarely necessary, often better to use other features instead • E.g., break and continue statements in C++ • Many languages offer structured alternatives • E.g., break to exit a loop or a selected branch in C++ • E.g., continue to skip the rest of an iteration in C++ • Return and multi-level return • Return may set a value and also transfer control to caller • Multi-level return (or exception) may unwind farther • Continuations capture a context for further execution • E.g., to defer part of the execution until later
Sequencing • Ordering of statements • Happens-before relationship in control flow • Particularly important when thinking about concurrency • Data dependences • Previous write (store) to a variable (location) seen by subsequent read (load) instruction(s) • Compound statements • A block of statements is used in place of a single statement
Selection • If statements (e.g., in C++) • If statement evaluates expression (not necessarily Boolean) • If true (non-zero) executes statements in its first block • Otherwise executes else block if one was provided • Can next if statements, so else associates with most recent If that does not already have an else part • Switch (case) statements • Condense if/else logic into cases of an ordinal expression • Default blocks (no case matches) • Breaks, fall through can be used to emulate ranges of cases
Iteration • Implemented through loop constructs (e.g., in C++) • E.g., in C++, while, for, do loops • A continue statement skips rest of that iteration of the loop • A break statement exits the loop • Iterators also provide helpful abstractions for iteration • While loop • Most basic construct: test guards each iteration of a block • For loop • Encodes special case of a while loop (can emulate an enumeration controlled loop using logical control) • Do loop • Ensures execution of the block at least once
Recursion • Functional languages severely limit side effects • Iteration relies on side effects to make progress, terminate • Recursion is a natural alternative in those cases • Or where avoiding side effects simplifies flow control logic • Recursive functions can support lazy evaluation • E.g., packaging up remaining work as a continuation and then only performing that work if it’s needed • Normal order vs. applicative order evaluation • Operands may not be evaluated until needed (applicative)
Data and Data Types • Data may be more abstract than their representation • E.g., integer (unbounded) vs. 64-bit int (bounded) • A language or a standard may define representations • E.g., IEEE floating point standard, sizeof(char) == 1 in C++ • Otherwise representations may be platform-dependent • E.g., different int sizes on 32-bit vs. 64-bit platform in C++ • Languages also define operations on data • E.g., strong requirements for arithmetic operators in Java • E.g., weaker requirements for them in C++ • Languages also differ in when types are checked • E.g., Scheme is dynamically type checked (at run-time) • E.g., C++ is (mostly) statically checked at compile-time
Programs and Type Systems • A language’s type system has two main parts • Its type constructors • Its rules (and algorithms) for evaluating type equivalence, type compatibility, and type inference • Type systems must balance design trade-offs • Static checking can improve efficiency, catch errors earlier • Dynamic checking increases flexibility by delaying type evaluation until run-time (common in functional languages), may consider a larger subset of safe programs to be legal • Definition of data types (based on sets) • A data type has a set of values and a type name • A data type also defines operations on those values, along with properties of those operations (e.g., integer division)
Predefined Types, Simple Types • Languages often predefine types • Simple types (e.g., int and double in C++ or Java) have only a basic mathematic (arithmetic or sequential) structure • Other types with additional structure may be predefined in the language (e.g., the String type in Java) • Languages also often allow types to be introduced • Simple types can be specified within a program (e.g., an enum in C++ or a sub-range type in Ada) • Other types (e.g., unions, arrays, structs, classes, pointers) can be declared explicitly but involve type constructors
Type Equivalence • Structural equivalence • Basic structural equivalence: all types defined as AXB are the same but are all different than those defined as BXA • More complex forms arise if elements are named (so they are not interface polymorphic to member selection operator) • Name equivalence • Two types are the same only if they have the same name • Easy to check, but restrictive • Declaration equivalence • Improves on name equivalence by allowing new names to be given for a declared type, all of which are equivalent (e.g., in C++ typedefs behave this way)
Type Checking • Some languages check statically but weakly • E.g., C++ warnings rather than errors for C compatibility • Type compatibility in construction and assignment • Variable constructed, or l-value/ref assigned from an r-value • May involve conversion of r-value into a compatible type • Overloading introduces inference in type checking • Also may involve type conversion of operands • E.g., passing an int to a long int parameter • Casting overrides type checking • E.g., cannot assign long int result back into a short int in Java unless you tell the compiler explicitly to recast r-value • E.g., C++ reinterpret cast tells compiler not to check
Type Conversion • Often need to use one type in place of another type • Safe implicit conversions often are made automatically • Widening conversions usually don’t lose information • Narrowing conversions may truncate, overflow, etc. • Explicit conversions • Explicit construction from a type provides safe conversion • Casting offers alternative when safety can only be determined at run-time (by the program doing the cast) • Static casts are completed at compile-time • Dynamic casts check additional information at run-time (e.g., if polymorphic) may throw exception, return 0 pointer, etc.
Survey of Common Types I • Records • E.g., structs in C++ • If elements are named, a record is projected into its fields (e.g., via struct member selection operator as in P.x in C++) • Variant records (e.g., unions in C++) let different types overlap in memory • Arrays • Usually contiguous in memory and of a homogeneous type • Layout and access patterns also may affect cache effects, etc. • Provide an indexing operator (e.g., []) to access elements • May be allocated on stack or heap (language specific) • May be compatible with other types (e.g., pointers in C++)
Survey of Common Types II • “Utility” types (e.g., strings, sets, lists, files) • Widely used but not necessarily inherent to a language itself • Sometimes built into the language (e.g., Pascal) • E.g., set operators (+, *, -) for union, intersection, and difference • Sometimes provided by a library (e.g., the C++ STL) • Strings • Often just an array of characters (e.g., C style char *) • Often an object with operators (e.g., C++ style string) • Sets (similarly, lists) • Contain unique values (or in the case of lists, ordered sequences) of a common type • Files • Data “streams” coming from and/or going to mass storage
Survey of Common Types III • Recursive types • May contain (an alias for) another instance of the same type • E.g., a list node contains an alias for the next node in the list • May require a pointer or other similar abstraction • To halt the nesting (and/or construction) of recursive type instances • Pointers and references • Similar in some languages, different in others • E.g., pointer/references semantics in Java vs. C++ • Also may be similar to other types • E.g., (const) pointer and array semantics in C++ • E.g., address arithmetic vs. index arithmetic in C++
More about Type Constructors • Used to construct sets from other sets • E.g., as Cartesian products (tuples) • If elements are named, a record is projected into its fields (e.g., via struct member selection operator as in P.x in C++) • Specialized forms of type constructors are common • Unions often are at best unnecessary in object-oriented languages (or worse: in C++ certain examples can exhibit type problems similar to those seen with reinterpret casting) • Subset type constructors narrow a set (e.g., in Ada to construct a sub-range type from a range type) • Array constructors project an index type into another type • Pointer/reference constructors create aliases for other types (pointers have their own operators but references do not)
Type Checking with Polymorphism • Hindley-Milner type checking • Strongly typed functional languages (ML and Haskell) do this • Unambiguous (deductive) type inference, by propagating constraints and info up and down an abstract parse tree • Relies on specific forms of instantiation and unification • Different types of polymorphism • Parametric (interface) polymorphism • Explicit subtype (inheritance) polymorphism • Liskov substitution principle: whenever S is a subtype of T, wherever you need an instance of T you can use one of S • Occurs-check in unification • Makes sure variable isn’t in an expression substituted for it • Difficult to do efficiently, only use if necessary (it is for H-M)
CSE 425 Fall 2015 Midterm Exam • 80 minutes, during the usual lecture/studio time: 10:10am to 11:30am on Monday October 12, 2015 • Arrive early if you can, exam will begin promptly at 10:10am • Held in Simon Hall, Room 023 (NOT URBAUER 218) • You may want to locate the exam room in advance (downstairs in Simon Hall, near room 022) • Exam is open book, open notes, hard copy only • I will bring a copy each of the required and optional text for people to come up to the front and take a look at as needed • Please feel free to print and bring in slides, your notes, etc. • ALL ELECTRONICS MUST BE OFF DURING THE EXEM (including phones, iPads, laptops, tablets, etc.)