200 likes | 219 Views
This chapter provides an overview of syntax and semantics in programming languages, covering expressions, grammars, tree representations, and abstract syntax trees. It explores different approaches to describing programming languages and compares syntax and semantics. It also delves into context-free grammars, BNF notation, and formal grammars. Additionally, it discusses the representation of expressions using prefix, infix, and postfix notations, as well as the concept of ambiguity in grammars.
E N D
Overview • What’s the difference between syntax and semantics? • Expressions • Grammars • Tree representations • abstract syntax trees • parse trees
How to describe a PL • Tutorials - SNOBOL is still the best example • Reference manuals - ADA • Formal definitions - to describe both syntax and semantics, which is hard • Pascal, ADA, PL/I
Syntax vs. Semantics • Syntax - what is a legal program? • Semantics - what does a (legal) program mean? Three major approaches: • axiomatic, i.e. a set of proof rules • denotational, i.e. mathematical description • operational, i.e. operations on a real or abstract machine
How to describe syntax? • By example? • Possibly ambiguous or incomplete • Used to describe shells, e.g. man pages • By use of a meta-language • Also possibly ambiguous or incomplete • But probably more precise • Possible to give some semantics in the same notation
Context-free Grammars • There are lots of varieties of grammars • regular, context-free, context-sensitive, and unrestricted • CFGs are constrained so that exactly one non-terminal can appear on the left-side of a production • but a non-terminal may appear on the left-side of more than one production
CFG Notation • CFG productions have exactly one non-terminal on the left side, and zero or more non-terminals or terminals on the right side • Usually, nonterminals are enclosed in <anglebrackets> • Terminals (aka tokens) may be quoted for clarity
Backus-Naur Form (BNF) • BNF is a popular notation for CFGs • from a simple subset of Pascal <program> ::= <block> . <block> ::= <statement> <block> ::= begin <statements> end <statements> ::= <statement> <statements> ::= <statement>;<statements> <statement> ::= <if> | <while> | <repeat> | ... <if> ::= if <expr> then <block> <if> ::= if <expr> then <block> else <block>
BNF Operators • Sequence <A> ::= <B> c • Alternation <A> ::= <B> | <C> • Optional <A> ::= <B> [<C>] • Zero or more <A> ::= <B>* • One or more <A> ::= <B>+ • note that <B>* is a shorthand for [<B>+]
Formal Grammars • Set of terminal symbols (or tokens) • Set of non-terminal symbols • A designated start symbol • A set of productions (or rules) that specify how symbols are to be combined to form legal strings • G=<T, N, S, R>
Expressions • Prefix, postfix, or infix • Issues related to operators • arity (unary, binary, ternary, or whatever) • associativity • exponentiation is right-associative, usually • other ops are usually left-associative • precedence • follows rules from arithmetic
- 1 - 2 0 Abstract Syntax Trees • Useful for indicating how an expression is evaluated • The expression 2-0-1 is represented • Or is it?
Examples of Prefix and Postfix • Prefix • LISP operators use prefix • Postfix • Postscript operators use postfix • The simple expression 8-(7*3) is represented as: 8 7 3 mul sub • Old HP calculators did, too - no parens keys
To run LISP in emacs • Invoke emacs • M-x lisp-interaction-mode • type control-j at the end of each line • Or using an inferior emacs lisp process, • M-x ielm
(+ 2 2) 4 (sqrt 9) 3.0 (setq b 6) 6 (setq a 2) 2 (setq c 5) 5 a 2 (- b) -3 (+ (- b) (sqrt( - (* b b) (* (* 4 a) c)))) -2.5358983848622456
Prefix, Infix, Postfix • Given an abstract syntax tree, an expression can be represented in any of the three ways • Consider for example a+b*c/d • What does the abstract syntax tree look like? • What are the prefix and postfix expressions equivalent to the infix form given above?
Ambiguity • There may be many (equivalent) grammars for a language. • There may be more than one way to evaluate a string with respect to a grammar • A grammar is ambiguous if, for any string in the language, that string can be parsed in more than one way.
Dangling-Else • Suppose a grammar has the production • How should we parse this statement? if E then if E2 then S1 else S2 <stmt> ::= if <expr> then <stmt> | if <expr> then <stmt> else <stmt>
A different ambiguity header ::= <header> title (link? | script?) </header> title ::= <title> text </title> link ::= <link> text </link> script ::= <script> text </script> This grammar allows the <link> and <script> constructs to appear in either order. The grammar above is then ambiguous!
header ::= <header> title (link? | script?) </header> title ::= <title> text </title> link ::= <link> text </link> script ::= <script> text </script> How do we parse this string? <header> <title> Some Title </title> </header>