Syntax and Semantics • The Purpose of Syntax • Problem of Describing Syntax • Formal Methods of Describing Syntax • Derivations and Parse Trees • Sebesta Chapter 3
What is Syntax and Semantics • Syntax and semantics define a PL • Syntax • form or structure of program units • expressions, statements, declarations, etc. • Semantics • meaning of program units • expressions, statements, declarations, etc. • Why do we need language definitions? • to design a language • to implement a compiler/interpreter • to write a program (use the language)
Syntax Elements • A sentence is • a string of characters over some alphabet • A language is • a set of sentences • A lexeme is • the lowest-level syntactic unit of a language • e.g., *, public, totalCount • A token is • a category of lexemes • e.g., identifier
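To make the lexeme/token distinction concrete, here is a minimal Python sketch of a lexer that groups lexemes into tokens. The token names (KEYWORD, IDENT, INT_LIT, OPERATOR), the keyword set, and the regular expressions are assumptions made for illustration, not part of any language definition in this chapter.

```python
import re

# Hypothetical token categories; a real language defines its own.
KEYWORDS = {"public", "begin", "end", "while"}
TOKEN_SPEC = [
    ("INT_LIT", r"\d+"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OPERATOR", r"[*+\-/=;()]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token, lexeme) pairs."""
    pairs = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()  # token category and the lexeme itself
        pos = m.end()
        if kind == "SKIP":
            continue  # whitespace separates lexemes but is not one
        if kind == "IDENT" and lexeme in KEYWORDS:
            kind = "KEYWORD"  # reserved words are lexed like identifiers, then reclassified
        pairs.append((kind, lexeme))
    return pairs

print(tokenize("public totalCount = totalCount * 2;"))
```

Note that totalCount appears twice as a lexeme but both occurrences belong to the same token, IDENT.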
Describing Syntax • Recognizers • read an input string in the alphabet of the language (a sentence) and decide whether it belongs to the language • used in compilers • see Chapter 4 for details • Generators • produce sentences in a language • a sentence is syntactically correct if it can be generated by the generator
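A small Python sketch can contrast the two approaches. It assumes a toy grammar, the list rule <ident_list> → ident | ident , <ident_list>; `generate` enumerates sentences by expanding the leftmost nonterminal breadth-first, while `recognize` only decides membership. Both function names are hypothetical.

```python
from collections import deque

# Toy grammar (assumed for illustration): <ident_list> -> ident | ident , <ident_list>
GRAMMAR = {"<ident_list>": [["ident"], ["ident", ",", "<ident_list>"]]}
START = "<ident_list>"

def generate(max_len):
    """Generator: list every sentence with at most max_len symbols."""
    sentences = []
    queue = deque([[START]])
    while queue:
        form = queue.popleft()
        if len(form) > max_len:
            continue  # no rule shortens a form here, so longer forms can be dropped
        nonterminals = [i for i, sym in enumerate(form) if sym in GRAMMAR]
        if not nonterminals:
            sentences.append(" ".join(form))  # only terminals left: a sentence
            continue
        i = nonterminals[0]  # expand the leftmost nonterminal
        for rhs in GRAMMAR[form[i]]:
            queue.append(form[:i] + rhs + form[i + 1:])
    return sentences

def recognize(sentence):
    """Recognizer: decide whether a sentence belongs to the language."""
    tokens = sentence.split()
    # The language is: ident, then zero or more ", ident" pairs.
    return (len(tokens) % 2 == 1
            and all(t == "ident" for t in tokens[0::2])
            and all(t == "," for t in tokens[1::2]))

print(generate(5))
```

Every sentence the generator produces should be accepted by the recognizer, which is a cheap consistency check between the two.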
Backus-Naur Form (BNF) • BNF is a meta-language • i.e., a language used to describe another language • invented by John Backus to describe ALGOL 58 • used by Peter Naur to describe ALGOL 60 • BNF is equivalent to context-free grammars • a BNF grammar is defined by • a set of terminal symbols • a set of nonterminal symbols • a set of rules • a start symbol (one of the nonterminal symbols)
BNF Elements • terminal symbols • are the lexemes of the target PL • e.g., while, ( , ) • nonterminal symbols • represent classes of syntactic structures • they act like syntactic variables • e.g., <statement> • rules • define how a nonterminal symbol can be expanded into a sequence of nonterminal and terminal symbols • e.g., <while_stmt> → while ( <logic_expr> ) <stmt>
BNF Rules • A rule has • a left-hand side (LHS), a single nonterminal • followed by the arrow → • then a right-hand side (RHS), a string of terminal and nonterminal symbols • There can be several rules for one LHS • <stmt> → <assignment> • <stmt> → begin <stmt_list> end • Syntactic lists are described using recursion • <ident_list> → ident • <ident_list> → ident , <ident_list> • A grammar is • a finite nonempty set of rules
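The recursive list rule maps directly onto a recursive function, one call per <ident_list>. This is a minimal sketch, assuming the input is already split into whitespace-separated tokens; `parse_ident_list` and `is_ident_list` are hypothetical helper names.

```python
# Rules:
#   <ident_list> -> ident
#   <ident_list> -> ident , <ident_list>

def parse_ident_list(tokens, pos=0):
    """Match <ident_list> starting at tokens[pos]; return the position just after it."""
    if pos >= len(tokens) or tokens[pos] != "ident":
        raise SyntaxError(f"expected ident at position {pos}")
    pos += 1
    if pos < len(tokens) and tokens[pos] == ",":
        # Recursive rule: after the comma comes another <ident_list>.
        return parse_ident_list(tokens, pos + 1)
    return pos  # non-recursive rule: a single ident

def is_ident_list(text):
    tokens = text.split()
    try:
        return parse_ident_list(tokens) == len(tokens)
    except SyntaxError:
        return False
```

For example, is_ident_list("ident , ident , ident") recurses twice before matching the final single ident.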
EBNF • Extended BNF (EBNF) • is most often used • avoids having numerous rules for the same LHS • Extra meta-symbols (in addition to →) • [ … ] • enclosed symbols are optional (0 or 1 times) • e.g., <if_stmt> → if ( <exp> ) <stmt> [ else <stmt> ] • { … } • enclosed symbols can be repeated (0 to n times) • e.g., <ident_list> → ident { , ident } • … | … • choice of one of the symbol sequences separated by | • e.g., <stmt> → <assignment> | begin <stmt_list> end • ( … ) • groups enclosed symbols
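In a hand-written recursive-descent parser, the EBNF meta-symbols have natural counterparts: { … } becomes a loop and [ … ] becomes an if. The sketch below assumes, for brevity, that <exp> and <stmt> are the single tokens exp and stmt; the `Parser` class and `accepts` helper are hypothetical.

```python
class Parser:
    """Recursive-descent sketch: {...} becomes a loop, [...] becomes an if."""

    def __init__(self, text):
        self.tokens = text.split()
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def ident_list(self):
        # <ident_list> -> ident { , ident }
        self.expect("ident")
        while self.peek() == ",":  # { ... }: zero or more repetitions
            self.expect(",")
            self.expect("ident")

    def if_stmt(self):
        # <if_stmt> -> if ( <exp> ) <stmt> [ else <stmt> ]
        # Simplification (assumed): <exp> and <stmt> are single tokens.
        for tok in ("if", "(", "exp", ")", "stmt"):
            self.expect(tok)
        if self.peek() == "else":  # [ ... ]: optional, zero or one time
            self.expect("else")
            self.expect("stmt")

def accepts(rule, text):
    """True if the whole of `text` matches the named rule."""
    parser = Parser(text)
    try:
        getattr(parser, rule)()
    except SyntaxError:
        return False
    return parser.pos == len(parser.tokens)
```

Because the loop replaces recursion, the EBNF version of <ident_list> needs no self-call at all.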
BNF vs. EBNF • BNF • <expr> → <expr> + <term> • <expr> → <expr> - <term> • <expr> → <term> • <term> → <term> * <factor> • <term> → <term> / <factor> • <term> → <factor> • <factor> → <exp> ** <factor> • <factor> → <exp> • <exp> → ( <expr> ) • <exp> → id • EBNF • <expr> → <term> { (+ | -) <term> } • <term> → <factor> { (* | /) <factor> } • <factor> → <exp> [ ** <factor> ] • <exp> → ( <expr> ) | id
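The EBNF version translates almost one-to-one into a recursive-descent evaluator. In this sketch (an illustration, not a definitive implementation) the braces become loops, which make + - * / left-associative, while the recursive [ ** <factor> ] makes ** right-associative. It assumes each id is looked up in a dictionary of numbers; `ExprEvaluator` and `evaluate` are hypothetical names.

```python
import re

# "**" must be tried before "*". Characters that match no alternative
# (whitespace, but also numeric literals) are silently skipped in this sketch.
TOKEN_RE = re.compile(r"\*\*|[-+*/()]|[A-Za-z_]\w*")

class ExprEvaluator:
    def __init__(self, text, env):
        self.tokens = TOKEN_RE.findall(text)
        self.pos = 0
        self.env = env  # assumed: maps each id to a number

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        # <expr> -> <term> { (+ | -) <term> }   loop => left-associative
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        # <term> -> <factor> { (* | /) <factor> }
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        # <factor> -> <exp> [ ** <factor> ]   right recursion => right-associative
        base = self.exp()
        if self.peek() == "**":
            self.advance()
            return base ** self.factor()
        return base

    def exp(self):
        # <exp> -> ( <expr> ) | id
        tok = self.advance()
        if tok == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok not in self.env:
            raise SyntaxError(f"unexpected token {tok!r}")
        return self.env[tok]

def evaluate(text, env):
    evaluator = ExprEvaluator(text, env)
    value = evaluator.expr()
    if evaluator.peek() is not None:
        raise SyntaxError(f"trailing token {evaluator.peek()!r}")
    return value
```

With a = 2, b = 3, c = 2, the expression a ** b ** c evaluates as 2 ** (3 ** 2) = 512, whereas a - b - c evaluates as (2 - 3) - 2 = -3.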
Augmented EBNF • another meta-symbol: = (equals) instead of → • meta-symbols for repetition • + means one or more times • * means zero or more times • e.g., <ident> = <letter>+ ( <letter> | <digit> )* • rules can use iteration instead of recursion • e.g., • <stmt_list> → <stmt> | <stmt> ; <stmt_list> • can be formulated as • <stmt_list> = <stmt> ( ; <stmt> )*
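The repetition meta-symbols correspond closely to regular-expression operators, and the iterative <stmt_list> rule becomes a loop. A Python sketch, assuming <letter> means an ASCII letter, <digit> means 0-9, and <stmt> is the single token stmt; `is_ident` and `is_stmt_list` are hypothetical names.

```python
import re

# <ident> = <letter>+ ( <letter> | <digit> )*
# Assumption: <letter> is an ASCII letter and <digit> is 0-9.
IDENT = re.compile(r"[A-Za-z]+(?:[A-Za-z]|[0-9])*")

def is_ident(s):
    return IDENT.fullmatch(s) is not None

def is_stmt_list(tokens):
    # <stmt_list> = <stmt> ( ; <stmt> )*   iteration instead of recursion
    if not tokens or tokens[0] != "stmt":
        return False
    i = 1
    while i < len(tokens):
        if tokens[i:i + 2] != [";", "stmt"]:  # each repetition is "; stmt"
            return False
        i += 2
    return True
```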
Context-Free Grammar • Context-Free Grammars (CFG) • defined by Noam Chomsky • meant to describe the syntax of natural languages • Context-Free Grammar G = (S, T, N, P) • S = start symbol • T = set of terminal symbols – lexemes and tokens • N = set of non-terminal symbols - abstractions • P = production rules – definition of a LHS abstraction using RHS • A sentence • a sequence of terminal symbols
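One way to hold the four components in code is a small record type. This Python sketch (the `CFG` class and its `problems` check are hypothetical) encodes the small language from the next slide as a 4-tuple and checks a few basic consistency conditions, such as S being a member of N.

```python
from dataclasses import dataclass

@dataclass
class CFG:
    """A context-free grammar G = (S, T, N, P)."""
    S: str    # start symbol
    T: set    # terminal symbols
    N: set    # nonterminal symbols
    P: dict   # productions: nonterminal -> list of right-hand sides (lists of symbols)

    def problems(self):
        """Return a list of basic well-formedness problems (empty if none found)."""
        found = []
        if self.S not in self.N:
            found.append(f"start symbol {self.S!r} is not a nonterminal")
        if self.T & self.N:
            found.append(f"symbols in both T and N: {sorted(self.T & self.N)}")
        for lhs, alternatives in self.P.items():
            if lhs not in self.N:
                found.append(f"LHS {lhs!r} is not a nonterminal")
            for rhs in alternatives:
                for sym in rhs:
                    if sym not in self.T and sym not in self.N:
                        found.append(f"unknown symbol {sym!r} in a rule for {lhs!r}")
        return found

# The small language written as a 4-tuple.
SMALL_LANGUAGE = CFG(
    S="<program>",
    T={"begin", "end", ";", "=", "+", "-", "const", "a", "b", "c"},
    N={"<program>", "<stmt_list>", "<stmt>", "<expr>", "<term>", "<var>"},
    P={
        "<program>": [["begin", "<stmt_list>", "end"]],
        "<stmt_list>": [["<stmt>"], ["<stmt>", ";", "<stmt_list>"]],
        "<stmt>": [["<var>", "=", "<expr>"]],
        "<expr>": [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
        "<term>": [["<var>"], ["const"]],
        "<var>": [["a"], ["b"], ["c"]],
    },
)

print(SMALL_LANGUAGE.problems())
```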
A Small Language in EBNF • <program> → begin <stmt_list> end • <stmt_list> → <stmt> | <stmt> ; <stmt_list> • <stmt> → <var> = <expr> • <expr> → <term> + <term> | <term> - <term> • <term> → <var> | const • <var> → a | b | c
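The grammar can be turned into a recognizer by writing one function per nonterminal (recursive descent). A minimal Python sketch, assuming lexemes are separated by whitespace and that the terminal const stands for any constant; the class and function names are hypothetical.

```python
class SmallLanguageParser:
    """Recursive-descent recognizer: one method per nonterminal."""

    def __init__(self, text):
        self.tokens = text.split()
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, *allowed):
        tok = self.peek()
        if tok not in allowed:
            raise SyntaxError(f"expected one of {allowed}, got {tok!r}")
        self.pos += 1
        return tok

    def program(self):
        # <program> -> begin <stmt_list> end
        self.expect("begin")
        self.stmt_list()
        self.expect("end")

    def stmt_list(self):
        # <stmt_list> -> <stmt> | <stmt> ; <stmt_list>
        self.stmt()
        if self.peek() == ";":
            self.expect(";")
            self.stmt_list()

    def stmt(self):
        # <stmt> -> <var> = <expr>
        self.var()
        self.expect("=")
        self.expr()

    def expr(self):
        # <expr> -> <term> + <term> | <term> - <term>
        self.term()
        self.expect("+", "-")
        self.term()

    def term(self):
        # <term> -> <var> | const
        if self.peek() == "const":
            self.expect("const")
        else:
            self.var()

    def var(self):
        # <var> -> a | b | c
        self.expect("a", "b", "c")

def is_program(text):
    parser = SmallLanguageParser(text)
    try:
        parser.program()
    except SyntaxError:
        return False
    return parser.pos == len(parser.tokens)
```

Note that this grammar requires every <expr> to contain exactly one + or -, so begin a = b end is rejected.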
Derivation • A derivation is • a repeated application of rules • starting with the start symbol • each step replaces a nonterminal (the LHS of a rule) by the RHS of that rule • ending with a sentence (all terminal symbols) • Every string of symbols in the derivation is • a sentential form • A sentence is • a sentential form with only terminal symbols
Derivation Types • A leftmost derivation • the leftmost nonterminal in each sentential form is expanded first • A rightmost derivation • the rightmost nonterminal is expanded first • A mixed derivation • an arbitrary nonterminal is expanded
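Leftmost and rightmost derivations differ only in which nonterminal each step rewrites. This hypothetical `derive` helper applies a given sequence of right-hand sides to the small language's grammar, rewriting either the leftmost or the rightmost nonterminal at each step. It derives the sentence begin c = a - b end, chosen for illustration, both ways.

```python
# The small language's rules: nonterminal -> list of alternative right-hand sides.
GRAMMAR = {
    "<program>": [["begin", "<stmt_list>", "end"]],
    "<stmt_list>": [["<stmt>"], ["<stmt>", ";", "<stmt_list>"]],
    "<stmt>": [["<var>", "=", "<expr>"]],
    "<expr>": [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>": [["<var>"], ["const"]],
    "<var>": [["a"], ["b"], ["c"]],
}

def derive(rhs_sequence, leftmost=True):
    """Apply each RHS to the leftmost (or rightmost) nonterminal of the current
    sentential form; return every sentential form, starting with <program>."""
    form = ["<program>"]
    forms = [form]
    for rhs in rhs_sequence:
        positions = [i for i, sym in enumerate(form) if sym in GRAMMAR]
        if not positions:
            raise ValueError("the sentential form is already a sentence")
        i = positions[0] if leftmost else positions[-1]
        if rhs not in GRAMMAR[form[i]]:
            raise ValueError(f"{rhs} is not a right-hand side of {form[i]}")
        form = form[:i] + rhs + form[i + 1:]
        forms.append(form)
    return forms

LEFTMOST = [["begin", "<stmt_list>", "end"], ["<stmt>"], ["<var>", "=", "<expr>"], ["c"],
            ["<term>", "-", "<term>"], ["<var>"], ["a"], ["<var>"], ["b"]]
RIGHTMOST = [["begin", "<stmt_list>", "end"], ["<stmt>"], ["<var>", "=", "<expr>"],
             ["<term>", "-", "<term>"], ["<var>"], ["b"], ["<var>"], ["a"], ["c"]]

for form in derive(LEFTMOST):
    print(" => " + " ".join(form))
```

Both sequences use the same nine rules and reach the same sentence; only the order of the rewrites differs.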
Derivation Example • <program> → begin <stmt_list> end • <stmt_list> → <stmt> | <stmt> ; <stmt_list> • <stmt> → <var> = <expr> • <expr> → <term> + <term> | <term> - <term> • <term> → <var> | const • <var> → a | b | c • <program> => begin <stmt_list> end => begin <stmt> end => begin <var> = <expr> end => begin a = <expr> end => begin a = <term> + <term> end => begin a = <var> + <term> end => begin a = b + <term> end => begin a = b + const end
Questions • In the preceding slide: • Is the derivation a leftmost or a rightmost derivation? • State the "opposite" derivation • i.e., if it is a leftmost derivation, give the rightmost one, or vice versa • What are the terminal symbols of the language, what are the nonterminal symbols, and what is the start symbol? • Change a rule so that begin a = - b + const end is a legal sentence
Parse Tree • A parse tree is • a hierarchical representation of a derivation • e.g., the parse tree of begin a = b + const end:
<program>
  begin
  <stmt_list>
    <stmt>
      <var>
        a
      =
      <expr>
        <term>
          <var>
            b
        +
        <term>
          const
  end
Simple Assignment Language • EBNF grammar • <assign> → <id> = <expr> • <expr> → <id> + <expr> | <id> * <expr> | ( <expr> ) | <id> • <id> → a | b | c • Parse tree of the sentence a = b * (a + c):
<assign>
  <id>
    a
  =
  <expr>
    <id>
      b
    *
    <expr>
      (
      <expr>
        <id>
          a
        +
        <expr>
          <id>
            c
      )
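A recursive-descent parser can build this tree explicitly. In the Python sketch below (hypothetical names; lexemes are matched with a simple regular expression), each nonterminal becomes a tuple whose first element is its name and whose remaining elements are its children, while terminals are plain strings.

```python
import re

def parse_assign(text):
    """Build a parse tree for <assign> as nested tuples: (nonterminal, child, ...)."""
    tokens = re.findall(r"[A-Za-z]+|[=+*()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def ident():
        # <id> -> a | b | c
        if peek() not in ("a", "b", "c"):
            raise SyntaxError(f"expected a, b or c, got {peek()!r}")
        return ("<id>", take())

    def expr():
        # <expr> -> <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>
        if peek() == "(":
            return ("<expr>", take("("), expr(), take(")"))  # elements evaluate left to right
        left = ident()
        if peek() in ("+", "*"):
            return ("<expr>", left, take(), expr())
        return ("<expr>", left)

    tree = ("<assign>", ident(), take("="), expr())
    if peek() is not None:
        raise SyntaxError(f"trailing token {peek()!r}")
    return tree

print(parse_assign("a = b * (a + c)"))
```

The nesting of the returned tuples mirrors the indentation of the tree above.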
Ambiguous Grammars • A grammar is ambiguous • if and only if it generates a sentential form that has two or more distinct parse trees • e.g., • <assign> → <id> = <expr> • <expr> → <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <id> • <id> → a | b | c
Two Distinct Parse Trees for a = b + c * d
add-first parse tree:
<assign>
  <id>
    a
  =
  <expr>
    <expr>
      <expr>
        <id>
          b
      +
      <expr>
        <id>
          c
    *
    <expr>
      <id>
        d
multiply-first parse tree:
<assign>
  <id>
    a
  =
  <expr>
    <expr>
      <id>
        b
    +
    <expr>
      <expr>
        <id>
          c
      *
      <expr>
        <id>
          d
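For short sentences, ambiguity can be checked mechanically by counting parse trees. The sketch below uses dynamic programming over token spans for the ambiguous <expr> rules; since each <expr> tree yields exactly one <assign> tree, counting at the <expr> level suffices. It assumes whitespace-separated lexemes and adds d to <id>, because the example sentence uses d. `count_parse_trees` is a hypothetical name.

```python
from functools import lru_cache

# <id> -> a | b | c, plus d because the example sentence uses d (an assumption).
IDS = {"a", "b", "c", "d"}

def count_parse_trees(text):
    """Count distinct parse trees deriving `text` from <expr> in the ambiguous grammar
    <expr> -> <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <id>."""
    tokens = text.split()

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees for tokens[i:j] rooted at <expr>.
        if j - i <= 0:
            return 0
        if j - i == 1:
            return 1 if tokens[i] in IDS else 0   # <expr> -> <id>
        total = 0
        if tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)          # <expr> -> ( <expr> )
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):           # <expr> -> <expr> op <expr>, split at k
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees("b + c * d"))
```

For b + c * d the count is 2 (the two trees above); parenthesizing, as in ( b + c ) * d, brings it back to 1.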
An Unambiguous Expression Grammar • The same language can be defined with an unambiguous grammar! • <assign> → <id> = <expr> • <expr> → <expr> + <term> | <term> • <term> → <term> * <factor> | <factor> • <factor> → ( <expr> ) | <id> • <id> → a | b | c
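Because the grammar is unambiguous, a parser for it produces exactly one tree per sentence. Left-recursive rules such as <expr> → <expr> + <term> cannot be coded directly as recursive-descent functions (the function would call itself forever without consuming input), so this sketch writes them as loops, which keeps + and * left-associative. It returns a condensed tree (op, left, right) rather than the full parse tree, assumes whitespace-separated lexemes, and adds d to <id> as in the preceding example; `parse_expr` is a hypothetical name.

```python
# <id> -> a | b | c, plus d (an assumption, matching the example sentence).
IDS = {"a", "b", "c", "d"}

def parse_expr(text):
    """Parse <expr> into a condensed tree: an <id> is a string, an operation is (op, left, right)."""
    tokens = text.split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():
        # <expr> -> <expr> + <term> | <term>, left recursion rewritten as a loop
        tree = term()
        while peek() == "+":
            take()
            tree = ("+", tree, term())
        return tree

    def term():
        # <term> -> <term> * <factor> | <factor>
        tree = factor()
        while peek() == "*":
            take()
            tree = ("*", tree, factor())
        return tree

    def factor():
        # <factor> -> ( <expr> ) | <id>
        tok = peek()
        if tok == "(":
            take()
            tree = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            take()
            return tree
        if tok not in IDS:
            raise SyntaxError(f"unexpected token {tok!r}")
        return take()

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"trailing token {peek()!r}")
    return tree

print(parse_expr("b + c * d"))
```

For b + c * d only the multiply-first structure can be produced, because * is handled one level deeper (in <term>) than +.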
Precedence Through Grammar • A grammar can enforce the precedence of operators • The parse tree shows how • operators lower in the tree are evaluated first, so * binds more tightly than + • e.g., • <expr> → <expr> + <term> | <term> • <term> → <term> * const | const • Parse tree of const + const * const:
<expr>
  <expr>
    <term>
      const
  +
  <term>
    <term>
      const
    *
    const