1 / 53

Syntax of All Sorts

Syntax of All Sorts. COS 441 Princeton University Fall 2004. Acknowledgements. Many slides from this lecture have been adapted from John Mitchell’s CS-242 course at Stanford Good source of supplemental information http://theory.stanford.edu/people/jcm/books/cpl-teaching.html

abbott
Download Presentation

Syntax of All Sorts

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Syntax of All Sorts COS 441 Princeton University Fall 2004

  2. Acknowledgements Many slides from this lecture have been adapted from John Mitchell’s CS-242 course at Stanford • Good source of supplemental information http://theory.stanford.edu/people/jcm/books/cpl-teaching.html • Differs in foundational approach when compared to Harper’s approach will discuss this difference in later lectures

  3. Interpreter vs Compiler Source Program Input Interpreter Output Source Program Compiler Target Program Input Output

  4. Interpreter vs Compiler • The difference is actually a bit more fuzzy • Some “interpreters” compile to native code • SML/NJ runs native machine code! • Does fancy optimizations too • Some “compilers” compile to byte-code which is interpreted • javac and a JVM which may also compile

  5. Interactive vs Batch • SML/NJ is a native code compiler with an interactive interface • javac is a batch compiler for bytecode which may be interpreted or compiled to native code by a JVM • Python compiles to bytecode then interprets the bytecode it has both batch and interactive interfaces • Terms are historical and misleading • Best to be precise and verbose

  6. Typical Compiler Source Program Lexical Analyzer Syntax Analyzer Semantic Analyzer Intermediate Code Generator Code Optimizer Code Generator Target Program

  7. Brief Look at Syntax • Grammar e ::= n | e + e | e £ e n ::= d | nd d ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 • Expressions in language e  e £ e  e £ e + e  n £ n + n  nd £ d + d  dd £ d + d  …  27 £ 4 + 3 Grammar defines a language Expressions in language derived by sequence of productions

  8. Parse Tree • Derivation represented by tree e  e £ e  e £ e + e  n £ n + n  nd £ d + d  dd £ d + d  …  27 £4 + 3 e e e £ e e + 27 4 3 Tree shows parenthesization of expression

  9. Dealing with Parsing Ambiguity • Ambiguity • Expression 27 £ 4 + 3 can be parsed two ways • Problem: 27 £ (4 + 3)  (27 £ 4) + 3 • Ways to resolve ambiguity • Precedence • Group £ before + • Parse 3 £ 4 + 2 as (3 £ 4) + 2 • Associativity • Parenthesize operators of equal precedence to left (or right) • Parse 3 + 4 + 5 as (3 + 4) + 5

  10. Rewriting the Grammar • Harper describe a way to rewrite the grammar to avoid ambiguity • Eliminating ambiguity in a parsing grammar is a dark art • Not always even possible • Depends on the parsing technique you use • Learn all you want to know and more in a compiler book or course • Ambiguity is bad when trying to think formally about programming languages

  11. Abstract Syntax Trees • We want to reason about expressions inductively • So we need an inductive description of expressions • We could use the parse tree • But it has all this silly term and factor stuff to parsing unambiguous • Introduce nicer tree after the parsing phase

  12. Syntax Comparisons Ambiguous Concrete Abstract Unambiguous Concrete

  13. times-e plus-e num-e E1exp E2exp E1exp E2exp N 2N times(E1,E2)exp plus(E1,E2)exp num[N] exp Induction Over Syntax • We can think of the CFG notation as defining a predicate exp • Thinking about things this way lets us use rule induction on abstract syntax

  14. T1termTnterm(OP) = n OP(T1,…,Tn) term A More General Pattern • One might ask what is the general pattern of predicates defined by CFG that represent abstract syntax •  represents a signature that assigns arities to operators such a num[N], plus, and times • Note the T1 … Tn are all unique

  15. S Z Xnat succ(X)nat zeronat An Example

  16. S Z Xnat succ(X)nat zeronat An Example

  17. S Z Xnat succ(X)nat zeronat An Example

  18. S-O Z-E S-E Xeven Xodd succ(X)odd zeroeven succ(X)even Another Example

  19. S-O Z-E S-E Xeven Xodd succ(X)odd zeroeven succ(X)even Another Example

  20. S-O Z-E S-E Xeven Xodd succ(X)odd zeroeven succ(X)even Another Example

  21. A(X,Y,Z) eq(X,X)equal A(X,succ(Y),succ(Z)) F(succ(N),X) F(N,Y) A(X,Y,Z) F(succ(succ(N)),Z) NOT Abstract Syntax

  22. Abstract Syntax To ML

  23. Abstract Syntax To ML datatype nat = zero | succ of nat

  24. Abstract Syntax To ML datatype nat = zero | succ of nat datatype exp = num of nat | plus of (exp * exp) | times of (exp * exp)

  25. Abstract Syntax To ML datatype nat = Zero | Succ of nat datatype exp = Num of int | Plus of (exp * exp) | Times of (exp * exp) A small cheat for performance Capitalize by convention

  26. Abstract Syntax To ML • Converting things is not always straightforward • Constructor names need to be distinct • ML lacks some features that would make some things easier • Operator overloading • Interacts badly with type-inference • Subtyping would improve things too • We can get by with predicates sometimes instead

  27. Abstract Syntax To ML (cont.) datatype nat = Zero | Succ of nat

  28. Abstract Syntax To ML (cont.) datatype nat = Zero | Succ of nat datatype even = EvenZero | EvenSucc of (odd) and odd = OddSucc of even

  29. Abstract Syntax To ML (cont.) datatype nat = Zero | Succ of nat fun even(Zero) =true |even(Succ(n))=odd(n) and odd(Zero) =false |odd(Succ(n)) =even(n)

  30. Syntax Trees Are Not Enough • Programming languages include binding constructs • variables function arguments, variable function declarations, type declarations • When reasoning about binding constructs we want to ignore irrelevant details • e.g. the actual “spelling” of the variable • we only that variables are the same or different • Still we must be clear about the scope of a variable definition

  31. Abstract Binding Trees • Harper uses abstract binding trees (Chapter 5) • This solution is based on very recent work M. J. Gabbay and A.M. Pitts, A New Approach to Abstract Syntax with Variable Binding, Formal Aspects of Computing 13(2002)341-363 • The solution is new but the problems are old! Alonzo Church. A set of postulates for the foundation of logic. The Annals of Mathematics, Second Series, Volume 33, Issue 2 (Apr. 1932), 346-366. • We will talk about the problem with binding today and deal with the solutions in the next lecture

  32. Lambda Calculus • Formal system with three parts • Notation for function expressions • Proof system for equations • Calculation rules called reduction • Basic syntactic notions • Free and bound variables • Illustrates some ideas about scope of binding • Symbolic evaluation useful for discussing programs

  33. History • Original intention • Formal theory of substitution • More successful for computable functions • Substitution ! symbolic computation • Church/Turing thesis • Influenced design of Lisp, ML, other languages • Important part of CS history and theory

  34. Expressions and Functions • Expressions x + y x + 2 £ y + z • Functions x. (x + y) z. (x + 2 £ y + z) • Application (x. (x + y)) 3 = 3 + y (z. (x + 2 £ y + z)) 5 = x + 2 £ y + 5 Parsing: x. f (f x) = x.( f (f (x)) )

  35. Free and Bound Variables • Bound variable is “placeholder” • Variable x is bound in x. (x+y) • Function x. (x+y) is same function as z. (z+y) • Compare  x+y dx =  z+y dz x P(x) = z P(z) • Name of free (=unbound) variable does matter • Variable y is free in x. (x+y) • Function x. (x+y) is not same as x. (x+z) • Occurrences • y is free and bound in x. ((y. y+2) x) + y

  36. Reduction • Basic computation rule is -reduction (x. e1) e2  [xÃe2]e1 where substitution involves renaming as needed • Example: (f. x. f (f x)) (y. y+x)

  37. Rename Bound Variables • Rename bound variables to avoid conflicts (f. z. f (f z)) (y. y+x)  …  z. (z+x)+x • Substitute “blindly” (f. x. f (f x)) (y. y+x)  …  x. (x+x)+x

  38. Reduction (f. x. f (f x)) (y. y+x)

  39. Reduction (f. z. f (f z)) (y. y+x) Rename bound “x” to “z” to avoid conflict with free “x”

  40. Reduction (f. z. f (f z)) (y. y+x)  [fÃ(y. y+x)](z. f (f z))

  41. Reduction (f. z. f (f z)) (y. y+x)  z. (y. y+x) ((y. y+x) z))

  42. Reduction (f. z. f (f z)) (y. y+x)  z. (y. y+x) ((y. y+x) z))  z. (y. y+x) ([yÃz](y+x))

  43. Reduction (f. z. f (f z)) (y. y+x)  z. (y. y+x) ((y. y+x) z))  z. (y. y+x) (z+x)

  44. Reduction (f. z. f (f z)) (y. y+x)  z. (y. y+x) ((y. y+x) z))  z. (y. y+x) (z+x)  z. [yÃ(z+x)](y+x)

  45. Reduction (f. z. f (f z)) (y. y+x)  z. (y. y+x) ((y. y+x) z))  z. (y. y+x) (z+x)  z. ((z+x)+x)

  46. Incorrect Reduction (f. x. f (f x)) (y. y+x)

  47. Incorrect Reduction (f. x. f (f x)) (y. y+x)  [fÃ(y. y+x)](x. f (f x))

  48. Incorrect Reduction (f. x. f (f x)) (y. y+x)  x. (y. y+x) ((y. y+x) x))

  49. Incorrect Reduction (f. x. f (f x)) (y. y+x)  x. (y. y+x) ((y. y+x) x))  x. (y. y+x) ([yÃx](y+x))

  50. Incorrect Reduction (f. x. f (f x)) (y. y+x)  x. (y. y+x) ((y. y+x) x))  x. (y. y+x) (x+x)

More Related