620 likes | 811 Views
Chapter 3. Describing Syntax and Semantics. Chapter 3 Topics. Introduction The General Problem of Describing Syntax Formal Methods of Describing Syntax Attribute Grammars Describing the Meanings of Programs: Dynamic Semantics. Introduction.
E N D
Chapter 3 Describing Syntax and Semantics
Chapter 3 Topics Introduction The General Problem of Describing Syntax Formal Methods of Describing Syntax Attribute Grammars Describing the Meanings of Programs: Dynamic Semantics
Introduction • Language description includes two main components • Syntax • The form of expressions, statements, and program units • Semantics • The meaning of expressions, statements, and program units • Syntax and semantics provide a language’s definition
Introduction • Describing syntax is easier than describing semantics • Universally-accepted notations can be used to describe syntax. • No universally-accepted systems have been created for describing semantics.
The General Problem of Describing Syntax: Terminology Definition: A sentence is an ordered string of characters over some alphabet Definition: A language is a set of sentences Definition: A lexeme is the lowest level syntactic unit of a language (e.g., *, sum,begin) Definition: A token is a category of lexemes (e.g., identifier)
Formal Grammars andFormal Languages A formal grammar G = (N,Σ,P,S) is a quad-tuple such that N is a finite set of nonterminal symbols Σis a finite set of terminal symbols, disjoint from N P is a finite set of production rules of the formαNβ → γ S ЄN the start symbol The language of a formal grammar G, denoted as L(G), is the set of all strings over Σ that can be generated by starting with the start symbol S and then applying the production rules in P until no nonterminal symbols are present.
Formal Methodsof Describing Syntax • The most widely known methods for describing programming language syntax: • Backus-Naur Form (BNF) • Context-Free Grammars • Extended BNF (EBNF) • Improves readability and writability • Grammars and Recognizers
Context-Free Grammars • Developed by Noam Chomsky • mid-1950s • Language generators • meant to describe the syntax of natural languages • Defines a class of languages called context-free languages
Formal Definition of Languages Recognizers A device that reads input strings of the language and decides whether the input strings belong to the language Generators A device that generates sentences of a language which are used to compare with the syntax of a particular sentence
Backus-Naur Form (BNF) • Invented by John Backus and Peter Naur • Equivalent to context-free grammars • A metalanguage used to describe another language • Abstractions are used to represent classes of syntactic structures • act like syntactic variables • called nonterminal symbols
BNF Fundamentals • Non-terminals: abstractions • Terminals: lexemes and tokens • Grammar: a collection of rules • Examples of BNF rules: <id_list> -> ident | ident, <id_list> <if_stmt> -> if <logic_expr> then <stmt>
BNF Rules • A rule has • a left-hand side (LHS) • a right-hand side (RHS) • consists of terminal and nonterminal symbols • A grammar is a finite nonempty set of rules • An abstraction (or nonterminal symbol) can have more than one RHS <stmt> -> <single_stmt> | begin <stmt_list> end
Describing Lists • Syntactic lists are described using recursion <id_list> -> ident | ident, <id_list> • A derivation is • a repeated application of rules, • starting with the start symbol, and • ending with a sentence • all terminal symbols
An Example Grammar <program> -> <stmts> <stmts> -> <stmt> | <stmt> ; <stmts> <stmt> -> <var> = <expr> <var> -> a | b | c | d <expr> -><term> + <term> |<term> - <term> <term> -> <var> | const
An Example Derivation <program> => <stmts> => <stmt> => <var> = <expr> => a = <expr> => a = <term> + <term> => a = <var> + <term> => a = b + <term> => a = b + const
Derivation • Every string of symbols in the derivation is a sentential form • A sentence is a sentential form that has only terminal symbols • A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded • A derivation may be neither leftmost nor rightmost
Parse Tree A hierarchical representation of a derivation <program> <stmts> <stmt> <var> = <expr> a <term> + <term> <var> const b
Ambiguity in Grammars A grammar is ambiguous if and only if it generates a sentential form that has two or more distinct parse trees
An Ambiguous Expression Grammar <expr> <expr> <op> <expr> | const <op> / | - <expr> <expr> <expr> <op> <op> <expr> <expr> <op> <expr> <expr> <op> <expr> <expr> <op> <expr> const - const / const const - const / const
An Unambiguous Expression Grammar If we use the parse tree to indicate precedence levels of the operators, we cannot have ambiguity <expr> <expr> - <term> | <term> <term> <term> / const | const <expr> <expr> - <term> <term> <term> / const const const
Associativity of Operators Operator associativity can also be indicated by a grammar. ambiguous: <expr> -> <expr> + <expr> | const unambiguous: <expr> -> <expr> + const | const <expr> <expr> <expr> + const <expr> + const const
Extended BNF Optional parts are placed in brackets [] <proc_call> -> ident [(<expr_list>)] Alternative parts of RHSs are placed inside parentheses and separated via vertical bars <term> → <term>(+|-) const Repetitions (0 or more) are placed inside braces {} <ident> → letter {letter|digit}
BNF and EBNF BNF <expr> -><expr> + <term> | <expr> - <term> | <term> <term> -><term> * <factor> | <term> / <factor> | <factor> EBNF <expr> -><term> {(+|-) <term> } <term> -><factor> {(*|/) <factor> }
Semantics • The meaning, not the form • Need a language to describe the semantics of languages • Assorted mathematical formalisms • Static Semantics • Attribute Grammars • Dynamic Semantics • Operational Semantics • Axiomatic Semantics • Denotational Semantics
Static Semantics • Static semantics of a language are those non-syntactic rules which • 1) define legality of a program • 2) can be analyzed at compile time • Consider a rule which states “a variable must be declared before it is referenced” • Cannot be specified in a context-free grammar • Can be tested at compile time • Some rules can be specified in a grammar, but unnecessarily complicate the grammar
Attribute Grammars • Describes more than can be described with a context-free grammar (Knuth, 1968) • Uses three additional elements to CFGs to carry some semantic information along parse trees • Attributes • value holders which describe language symbols • Attribute computation functions • describe how attribute values are determined • also called semantic functions • Predicate functions • describe language rules, associated with grammar rules
Attribute Grammars • Most attribute grammar work uses the parse trees: • Inherited attributes pass information down a parse tree • Synthesized attributes pass information up a parse tree • Intrinsic attributes are synthesized attributes for parse tree leaves
Attribute Grammars Definition: An attribute grammar is a context-free grammar G = (N,Σ,P,S) with the following additions: • For each grammar symbol x there is a set A(x) of attribute values • Each rule has a set of functions that define certain attributes of the nonterminals in the rule • Each rule has a (possibly empty) set of predicates to check for attribute consistency
Attribute GrammarsDefinitions • Let X0 X1 ... Xn be a rule • Functions of the form S(X0) = f(A(X1), ... , A(Xn)) define synthesized attributes • Functions of the form I(Xj) = f(A(X0), ... , A(Xn)), for i <= j <= n, define inherited attributes • Initially, there are intrinsic attributeson the leaves
Attribute GrammarExample • Syntax <expr> -> <var> + <var> | <var> <var> -> A|B|C • Attributes • actual_type synthesized for <var>and<expr> • expected_type inherited for <expr>
Attribute GrammarExample 1. Syntax rule: <expr> <var>[1] + <var>[2] Semantic rules: <expr>.actual_type <var>[1].actual_type Predicate: <var>[1].actual_type == <var>[2].actual_type <expr>.expected_type == <expr>.actual_type 2. Syntax rule: <var> id Semantic rule: <var>.actual_type lookup (<var>.string)
Attribute Grammars How are attribute values computed? If all attributes were inherited, the tree could be decorated in top-down order. If all attributes were synthesized, the tree could be decorated in bottom-up order. In many cases, both kinds of attributes are used, and it is some combination of top-down and bottom-up that must be used.
Attribute Grammars <expr>.expected_type inherited from parent <var>[1].actual_type lookup (A) <var>[2].actual_type lookup (B) <var>[1].actual_type =? <var>[2].actual_type <expr>.actual_type <var>[1].actual_type <expr>.actual_type =? <expr>.expected_type
Attribute Grammars Primary value of AGs: 1. Static semantics specification 2. Static semantics checking
Dynamic Semantics • There is no single widely acceptable notation or formalism for describing semantics • Operational Semantics • Describe the meaning of a program by executing its statements on a machine, either simulated or actual. • The change in the state of the machine (memory, registers, etc.) defines the meaning of each statement.
Dynamic Semantics • The dynamic semantics of a program is the meaning of its expressions, statements, and program units • Accurately describing semantics is essential • Semantics are often described in English: thus, ambiguity! • Three different methods are commonly used to describe semantics formally
Dynamic Semantics Operational Semantics Axiomatic Semantics Denotational Semantics
Operational Semantics Describe the meaning of a program by executing its statements on a machine, either simulated or actual. The change in the state of the machine (memory, registers, etc.) defines the meaning of the statement
Operational Semantics • To use operational semantics for a high-level language, a virtual machine is needed • A hardwarepure interpreter would be too expensive • A software pure interpreter also has problems
Operational Semantics • A better alternative: A complete computer simulation • The process: • Build a translator (translates source code to the machine code of an idealized computer) • Build a simulator for the idealized computer
Uses of Operational Semantics • compiler writing • rigorously proving properties of the program • program modeling for model checking • the “gold standard”for validating other semantics
Axiomatic Semantics • Based on formal logic (predicate calculus) • Original purpose: formal program verification • Define axioms or inference rules for each statement type in the language • allowa transformations of expressions to other expressions • The expressions are called assertions
Axiomatic Semantics • An assertion before a statement is a precondition • states the relationships and constraints among variables that are true at that point in execution • An assertion following a statement is a postcondition • The weakest precondition • least restrictive precondition that guarantees the postcondition • Pre-post form: {P} statement {Q}
Axiomatic Semantics • Program proof process • The postcondition for the whole program is the desired results • Work back through the program to the first statement • If the precondition on the first statement is the same as the program spec, then the program is correct
Axiomatic Semantics An axiom for assignment statements (x = E): {Qx->E} x = E {Q} The Rule of Consequence: {P} S {Q}, P' => P, Q => Q' {P’} S {Q’}
Axiomatic Semantics An inference rule for statement sequences: For a sequence S1;S2: {P1} S1 {P2} {P2} S2 {P3} the inference rule is: {P1} S1 {P2}, {P2} S2 {P3} {P1} S1; S2 {P3}
Axiomatic Semantics An inference rule for logical pretest loops: For the loop construct {P} while B do S end {Q} the inference rule is {I and B} S {I} {I} while B do S {I and (not B)} where I is the loop invariant
Characteristics of the Loop Invariant It must meet the following conditions: 1. P => I 2. {I} B {I} 3. {I and B} S {I} 4. (I and (not B)) => Q 5. The loop terminates
Loop Invariant The loop invariant I is a weakened version of the loop postcondition, and it is also a precondition. I must be weak enough to be satisfied prior to the beginning of the loop, I must be strong enough (combined with the loop exit condition) to force the truth of the postcondition