Understand the difference between syntax and semantics in programming languages, including the concepts of expressions, grammars, tree representations, and abstract syntax trees. Learn how to describe a programming language using various approaches, such as tutorials, reference manuals, and formal definitions.
Overview • What’s the difference between syntax and semantics? • Expressions • Grammars • Tree representations • abstract syntax trees • parse trees
How to describe a PL • Tutorials - SNOBOL is still the best example • Reference manuals - Ada • Formal definitions - to describe both syntax and semantics, which is hard • Pascal, Ada, PL/I
Syntax vs. Semantics • Syntax - what is a legal program? • Semantics - what does a (legal) program mean? Three major approaches: • axiomatic, i.e. a set of proof rules • denotational, i.e. mathematical description • operational, i.e. operations on a real or abstract machine
How to describe syntax? • By example? • Possibly ambiguous or incomplete • Used to describe shells, e.g. man pages • By use of a meta-language • Also possibly ambiguous or incomplete • But probably more precise • Possible to give some semantics in the same notation
Expressions • Prefix, postfix, or infix • Issues related to operators • arity (unary, binary, ternary, or whatever) • associativity • exponentiation is right-associative, usually • other ops are usually left-associative • precedence • follows rules from arithmetic
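These conventions can be checked directly in a language that follows them. The sketch below uses Python, chosen here only for illustration (the slides do not prescribe a language), and the variable names are hypothetical.

```python
# Exponentiation is right-associative: 2 ** 3 ** 2 groups as 2 ** (3 ** 2).
right_assoc = 2 ** 3 ** 2
assert right_assoc == 2 ** (3 ** 2) == 512
assert (2 ** 3) ** 2 == 64  # the left-associative reading gives a different value

# Subtraction is left-associative: 10 - 4 - 3 groups as (10 - 4) - 3.
left_assoc = 10 - 4 - 3
assert left_assoc == (10 - 4) - 3 == 3
assert 10 - (4 - 3) == 9  # the right-associative reading

# Precedence follows arithmetic: * binds tighter than +.
precedence = 2 + 3 * 4
assert precedence == 14
```

Other languages may choose differently, so a language's reference manual is the authority on its own operator table.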
Abstract Syntax Trees • Useful for indicating how an expression is evaluated • The expression 2-0-1 is represented by a tree [figure: syntax tree with nodes -, -, 2, 0, 1] • Or is it?
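The "Or is it?" question can be made concrete with a small sketch. It assumes a hypothetical encoding in which an AST node is either a number or a tuple (operator, left, right); the two trees below are the two possible groupings of 2-0-1, and they evaluate to different values.

```python
# An AST node is either a number or a tuple (operator, left, right).
def evaluate(node):
    if isinstance(node, tuple):
        op, left, right = node
        assert op == "-"  # only subtraction is needed for this sketch
        return evaluate(left) - evaluate(right)
    return node

left_tree = ("-", ("-", 2, 0), 1)   # (2 - 0) - 1, the left-associative reading
right_tree = ("-", 2, ("-", 0, 1))  # 2 - (0 - 1), the right-associative reading

print(evaluate(left_tree))   # 1
print(evaluate(right_tree))  # 3
```

Because the tree fixes the grouping, the tree rather than the flat string determines the result.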
Examples of Prefix and Postfix • Prefix • LISP operators use prefix • Postfix • PostScript operators use postfix • The simple expression 8-(7*3) is represented as: 8 7 3 mul sub • Old HP calculators used postfix, too - they had no parenthesis keys
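Postfix needs no parentheses because a stack is enough to evaluate it. The following is a minimal sketch in Python, not a PostScript interpreter: the function name eval_postfix is hypothetical, only integer operands are handled, and only the add, sub, and mul operator names are recognized.

```python
def eval_postfix(source):
    """Evaluate a whitespace-separated postfix expression with a stack."""
    ops = {
        "add": lambda x, y: x + y,
        "sub": lambda x, y: x - y,
        "mul": lambda x, y: x * y,
    }
    stack = []
    for token in source.split():
        if token in ops:
            right = stack.pop()  # the most recently pushed operand is the right one
            left = stack.pop()
            stack.append(ops[token](left, right))
        else:
            stack.append(int(token))  # assumption: every other token is an integer
    return stack.pop()

print(eval_postfix("8 7 3 mul sub"))  # -13
```

Here mul first replaces 7 and 3 with 21, and sub then computes 8 - 21.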
To run LISP in Emacs • Invoke Emacs • M-x lisp-interaction-mode • Type Control-j (C-j) at the end of each line • Or, to use an inferior Emacs Lisp process, • M-x ielm
(+ 2 2)
4
(sqrt 9)
3.0
(setq b 6)
6
(setq a 2)
2
(setq c 3)
3
a
2
(- b)
-6
(+ (- b) (sqrt (- (* b b) (* (* 4 a) c))))
-2.5358983848622456
Prefix, Infix, Postfix • Given an abstract syntax tree, an expression can be represented in any of the three ways • Consider for example a+b*c/d • What does the abstract syntax tree look like? • What are the prefix and postfix expressions equivalent to the infix form given above?
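One way to explore these questions is to write the tree down and traverse it. The sketch below assumes the usual rules (* and / bind tighter than + and associate left to right) and uses a hypothetical tuple encoding (operator, left, right) with variable names as leaves; the three notations differ only in where the operator is emitted during the traversal.

```python
# AST for a+b*c/d under the usual rules: the tree is a + ((b * c) / d).
tree = ("+", "a", ("/", ("*", "b", "c"), "d"))

def prefix(node):
    if isinstance(node, str):
        return node
    op, left, right = node
    return f"{op} {prefix(left)} {prefix(right)}"   # operator first

def postfix(node):
    if isinstance(node, str):
        return node
    op, left, right = node
    return f"{postfix(left)} {postfix(right)} {op}"  # operator last

def infix(node):
    if isinstance(node, str):
        return node
    op, left, right = node
    return f"({infix(left)} {op} {infix(right)})"    # operator in the middle

print(prefix(tree))   # + a / * b c d
print(postfix(tree))  # a b c * d / +
print(infix(tree))    # (a + ((b * c) / d))
```

Only the infix form needs parentheses (or precedence rules) to recover the tree.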
Formal Grammars • Set of terminal symbols (or tokens) • Set of non-terminal symbols • A designated start symbol • A set of productions (or rules) that specify how symbols are to be combined to form legal strings • G=<T, N, S, R>
Context-free Grammars • There are lots of varieties of grammars • regular, context-free, context-sensitive, and unrestricted • CFGs are constrained so that exactly one non-terminal can appear on the left side of a production • but a non-terminal may appear on the left side of more than one production
CFG Notation • CFG productions have exactly one non-terminal on the left side, and zero or more non-terminals or terminals on the right side • Usually, non-terminals are enclosed in <angle brackets> • Terminals (aka tokens) may be quoted for clarity
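The tuple G=<T, N, S, R> and the context-free restriction map naturally onto a small data structure. The sketch below is one possible encoding, not a standard API: the Grammar type, the is_well_formed helper, and the toy subtraction grammar are all hypothetical.

```python
from typing import NamedTuple

class Grammar(NamedTuple):
    terminals: frozenset      # T
    nonterminals: frozenset   # N
    start: str                # S
    rules: tuple              # R, as (left side, right side) pairs

# Hypothetical toy grammar:
#   <expr> ::= <expr> - <num> | <num>
#   <num>  ::= 0 | 1 | 2
G = Grammar(
    terminals=frozenset({"-", "0", "1", "2"}),
    nonterminals=frozenset({"<expr>", "<num>"}),
    start="<expr>",
    rules=(
        ("<expr>", ("<expr>", "-", "<num>")),
        ("<expr>", ("<num>",)),
        ("<num>", ("0",)),
        ("<num>", ("1",)),
        ("<num>", ("2",)),
    ),
)

def is_well_formed(g):
    """Check S is in N, T and N are disjoint, and every rule is context-free."""
    symbols = g.terminals | g.nonterminals
    return (
        g.start in g.nonterminals
        and not (g.terminals & g.nonterminals)
        and all(
            # exactly one non-terminal on the left, known symbols on the right
            left in g.nonterminals and all(s in symbols for s in right)
            for left, right in g.rules
        )
    )

print(is_well_formed(G))  # True
```

Note that <expr> appears on the left side of two rules, which the context-free restriction permits.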
Backus-Naur Form (BNF) • BNF is a popular notation for CFGs • Example from a simple subset of Pascal:
<program> ::= <block> .
<block> ::= <statement>
<block> ::= begin <statements> end
<statements> ::= <statement>
<statements> ::= <statement> ; <statements>
<statement> ::= <if> | <while> | <repeat> | ...
<if> ::= if <expr> then <block>
<if> ::= if <expr> then <block> else <block>
BNF Operators
• Sequence: <A> ::= <B> c
• Alternation: <A> ::= <B> | <C>
• Optional: <A> ::= <B> [<C>]
• Zero or more: <A> ::= <B>*
• One or more: <A> ::= <B>+
• Note that <B>* is a shorthand for [<B>+]
Ambiguity • There may be many (equivalent) grammars for a language. • There may be more than one way to parse a string with respect to a grammar • A grammar is ambiguous if at least one string in the language can be parsed in more than one way, i.e. has more than one parse tree.
Dangling-Else • Suppose a grammar has the production
<stmt> ::= if <expr> then <stmt> | if <expr> then <stmt> else <stmt>
• How should we parse this statement?
if E then if E2 then S1 else S2
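The two readings of this statement can be enumerated mechanically. The sketch below is a hypothetical exhaustive parser for just this production, under two assumptions that are not in the slides: a condition is a single token, and any token other than a keyword is an atomic statement.

```python
def parse_stmt(tokens, i):
    """Yield (tree, next_index) for every way to parse a <stmt> at tokens[i]."""
    if i >= len(tokens):
        return
    if tokens[i] == "if":
        # tokens[i + 1] is the condition, tokens[i + 2] must be "then"
        if i + 2 < len(tokens) and tokens[i + 2] == "then":
            cond = tokens[i + 1]
            for then_part, j in parse_stmt(tokens, i + 3):
                # Alternative 1: if <expr> then <stmt>
                yield ("if", cond, then_part), j
                # Alternative 2: if <expr> then <stmt> else <stmt>
                if j < len(tokens) and tokens[j] == "else":
                    for else_part, k in parse_stmt(tokens, j + 1):
                        yield ("if", cond, then_part, else_part), k
    elif tokens[i] not in ("then", "else"):
        # Assumption: any non-keyword token is an atomic statement such as S1.
        yield tokens[i], i + 1

tokens = "if E then if E2 then S1 else S2".split()
# Keep only the parses that consume the whole input.
trees = [tree for tree, end in parse_stmt(tokens, 0) if end == len(tokens)]
for tree in trees:
    print(tree)
# ('if', 'E', ('if', 'E2', 'S1'), 'S2')    else attached to the outer if
# ('if', 'E', ('if', 'E2', 'S1', 'S2'))    else attached to the inner if
```

Two complete parse trees for one string is exactly what makes the production ambiguous. Most languages resolve it by attaching the else to the nearest unmatched if, which is the second tree.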
A different ambiguity
header ::= <header> title (link? | script?) </header>
title ::= <title> text </title>
link ::= <link> text </link>
script ::= <script> text </script>
• This grammar allows an optional <link> or an optional <script> construct to follow the title. • The grammar above is then ambiguous!
header ::= <header> title (link? | script?) </header>
title ::= <title> text </title>
link ::= <link> text </link>
script ::= <script> text </script>
• How do we parse this string?
<header> <title> Some Title </title> </header>
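To see where the trouble lies, it helps to expand each optional item into its two cases and ask which alternative of (link? | script?) can produce what follows the title. The sketch below does only that; the alternatives table and the derivations helper are hypothetical names, and this is not a full parser for the grammar.

```python
# The optional part of the header rule, (link? | script?), written out:
# each X? expands to either X or the empty sequence.
alternatives = {
    "link?": [["link"], []],
    "script?": [["script"], []],
}

def derivations(rest):
    """List which alternatives of (link? | script?) can produce `rest`."""
    return [
        name
        for name, expansions in alternatives.items()
        if rest in expansions
    ]

# In <header> <title> Some Title </title> </header>, nothing follows the title.
print(derivations([]))         # ['link?', 'script?']
print(derivations(["link"]))   # ['link?']
```

The empty sequence can come from either alternative, so the string has two parse trees, while a header that actually contains a link has only one.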