530 likes | 634 Views
Syntax of All Sorts. COS 441 Princeton University Fall 2004. Acknowledgements. Many slides from this lecture have been adapted from John Mitchell’s CS-242 course at Stanford Good source of supplemental information http://theory.stanford.edu/people/jcm/books/cpl-teaching.html
E N D
Syntax of All Sorts COS 441 Princeton University Fall 2004
Acknowledgements Many slides from this lecture have been adapted from John Mitchell’s CS-242 course at Stanford • Good source of supplemental information http://theory.stanford.edu/people/jcm/books/cpl-teaching.html • Differs in foundational approach when compared to Harper’s approach will discuss this difference in later lectures
Interpreter vs Compiler Source Program Input Interpreter Output Source Program Compiler Target Program Input Output
Interpreter vs Compiler • The difference is actually a bit more fuzzy • Some “interpreters” compile to native code • SML/NJ runs native machine code! • Does fancy optimizations too • Some “compilers” compile to byte-code which is interpreted • javac and a JVM which may also compile
Interactive vs Batch • SML/NJ is a native code compiler with an interactive interface • javac is a batch compiler for bytecode which may be interpreted or compiled to native code by a JVM • Python compiles to bytecode then interprets the bytecode it has both batch and interactive interfaces • Terms are historical and misleading • Best to be precise and verbose
Typical Compiler Source Program Lexical Analyzer Syntax Analyzer Semantic Analyzer Intermediate Code Generator Code Optimizer Code Generator Target Program
Brief Look at Syntax • Grammar e ::= n | e + e | e £ e n ::= d | nd d ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 • Expressions in language e e £ e e £ e + e n £ n + n nd £ d + d dd £ d + d … 27 £ 4 + 3 Grammar defines a language Expressions in language derived by sequence of productions
Parse Tree • Derivation represented by tree e e £ e e £ e + e n £ n + n nd £ d + d dd £ d + d … 27 £4 + 3 e e e £ e e + 27 4 3 Tree shows parenthesization of expression
Dealing with Parsing Ambiguity • Ambiguity • Expression 27 £ 4 + 3 can be parsed two ways • Problem: 27 £ (4 + 3) (27 £ 4) + 3 • Ways to resolve ambiguity • Precedence • Group £ before + • Parse 3 £ 4 + 2 as (3 £ 4) + 2 • Associativity • Parenthesize operators of equal precedence to left (or right) • Parse 3 + 4 + 5 as (3 + 4) + 5
Rewriting the Grammar • Harper describe a way to rewrite the grammar to avoid ambiguity • Eliminating ambiguity in a parsing grammar is a dark art • Not always even possible • Depends on the parsing technique you use • Learn all you want to know and more in a compiler book or course • Ambiguity is bad when trying to think formally about programming languages
Abstract Syntax Trees • We want to reason about expressions inductively • So we need an inductive description of expressions • We could use the parse tree • But it has all this silly term and factor stuff to parsing unambiguous • Introduce nicer tree after the parsing phase
Syntax Comparisons Ambiguous Concrete Abstract Unambiguous Concrete
times-e plus-e num-e E1exp E2exp E1exp E2exp N 2N times(E1,E2)exp plus(E1,E2)exp num[N] exp Induction Over Syntax • We can think of the CFG notation as defining a predicate exp • Thinking about things this way lets us use rule induction on abstract syntax
T1termTnterm(OP) = n OP(T1,…,Tn) term A More General Pattern • One might ask what is the general pattern of predicates defined by CFG that represent abstract syntax • represents a signature that assigns arities to operators such a num[N], plus, and times • Note the T1 … Tn are all unique
S Z Xnat succ(X)nat zeronat An Example
S Z Xnat succ(X)nat zeronat An Example
S Z Xnat succ(X)nat zeronat An Example
S-O Z-E S-E Xeven Xodd succ(X)odd zeroeven succ(X)even Another Example
S-O Z-E S-E Xeven Xodd succ(X)odd zeroeven succ(X)even Another Example
S-O Z-E S-E Xeven Xodd succ(X)odd zeroeven succ(X)even Another Example
A(X,Y,Z) eq(X,X)equal A(X,succ(Y),succ(Z)) F(succ(N),X) F(N,Y) A(X,Y,Z) F(succ(succ(N)),Z) NOT Abstract Syntax
Abstract Syntax To ML datatype nat = zero | succ of nat
Abstract Syntax To ML datatype nat = zero | succ of nat datatype exp = num of nat | plus of (exp * exp) | times of (exp * exp)
Abstract Syntax To ML datatype nat = Zero | Succ of nat datatype exp = Num of int | Plus of (exp * exp) | Times of (exp * exp) A small cheat for performance Capitalize by convention
Abstract Syntax To ML • Converting things is not always straightforward • Constructor names need to be distinct • ML lacks some features that would make some things easier • Operator overloading • Interacts badly with type-inference • Subtyping would improve things too • We can get by with predicates sometimes instead
Abstract Syntax To ML (cont.) datatype nat = Zero | Succ of nat
Abstract Syntax To ML (cont.) datatype nat = Zero | Succ of nat datatype even = EvenZero | EvenSucc of (odd) and odd = OddSucc of even
Abstract Syntax To ML (cont.) datatype nat = Zero | Succ of nat fun even(Zero) =true |even(Succ(n))=odd(n) and odd(Zero) =false |odd(Succ(n)) =even(n)
Syntax Trees Are Not Enough • Programming languages include binding constructs • variables function arguments, variable function declarations, type declarations • When reasoning about binding constructs we want to ignore irrelevant details • e.g. the actual “spelling” of the variable • we only that variables are the same or different • Still we must be clear about the scope of a variable definition
Abstract Binding Trees • Harper uses abstract binding trees (Chapter 5) • This solution is based on very recent work M. J. Gabbay and A.M. Pitts, A New Approach to Abstract Syntax with Variable Binding, Formal Aspects of Computing 13(2002)341-363 • The solution is new but the problems are old! Alonzo Church. A set of postulates for the foundation of logic. The Annals of Mathematics, Second Series, Volume 33, Issue 2 (Apr. 1932), 346-366. • We will talk about the problem with binding today and deal with the solutions in the next lecture
Lambda Calculus • Formal system with three parts • Notation for function expressions • Proof system for equations • Calculation rules called reduction • Basic syntactic notions • Free and bound variables • Illustrates some ideas about scope of binding • Symbolic evaluation useful for discussing programs
History • Original intention • Formal theory of substitution • More successful for computable functions • Substitution ! symbolic computation • Church/Turing thesis • Influenced design of Lisp, ML, other languages • Important part of CS history and theory
Expressions and Functions • Expressions x + y x + 2 £ y + z • Functions x. (x + y) z. (x + 2 £ y + z) • Application (x. (x + y)) 3 = 3 + y (z. (x + 2 £ y + z)) 5 = x + 2 £ y + 5 Parsing: x. f (f x) = x.( f (f (x)) )
Free and Bound Variables • Bound variable is “placeholder” • Variable x is bound in x. (x+y) • Function x. (x+y) is same function as z. (z+y) • Compare x+y dx = z+y dz x P(x) = z P(z) • Name of free (=unbound) variable does matter • Variable y is free in x. (x+y) • Function x. (x+y) is not same as x. (x+z) • Occurrences • y is free and bound in x. ((y. y+2) x) + y
Reduction • Basic computation rule is -reduction (x. e1) e2 [xÃe2]e1 where substitution involves renaming as needed • Example: (f. x. f (f x)) (y. y+x)
Rename Bound Variables • Rename bound variables to avoid conflicts (f. z. f (f z)) (y. y+x) … z. (z+x)+x • Substitute “blindly” (f. x. f (f x)) (y. y+x) … x. (x+x)+x
Reduction (f. x. f (f x)) (y. y+x)
Reduction (f. z. f (f z)) (y. y+x) Rename bound “x” to “z” to avoid conflict with free “x”
Reduction (f. z. f (f z)) (y. y+x) [fÃ(y. y+x)](z. f (f z))
Reduction (f. z. f (f z)) (y. y+x) z. (y. y+x) ((y. y+x) z))
Reduction (f. z. f (f z)) (y. y+x) z. (y. y+x) ((y. y+x) z)) z. (y. y+x) ([yÃz](y+x))
Reduction (f. z. f (f z)) (y. y+x) z. (y. y+x) ((y. y+x) z)) z. (y. y+x) (z+x)
Reduction (f. z. f (f z)) (y. y+x) z. (y. y+x) ((y. y+x) z)) z. (y. y+x) (z+x) z. [yÃ(z+x)](y+x)
Reduction (f. z. f (f z)) (y. y+x) z. (y. y+x) ((y. y+x) z)) z. (y. y+x) (z+x) z. ((z+x)+x)
Incorrect Reduction (f. x. f (f x)) (y. y+x)
Incorrect Reduction (f. x. f (f x)) (y. y+x) [fÃ(y. y+x)](x. f (f x))
Incorrect Reduction (f. x. f (f x)) (y. y+x) x. (y. y+x) ((y. y+x) x))
Incorrect Reduction (f. x. f (f x)) (y. y+x) x. (y. y+x) ((y. y+x) x)) x. (y. y+x) ([yÃx](y+x))
Incorrect Reduction (f. x. f (f x)) (y. y+x) x. (y. y+x) ((y. y+x) x)) x. (y. y+x) (x+x)