Syntax
Outline • Programming Language Specification • Lexical Structure of PLs • Syntactic Structure of PLs • Context-Free Grammar / BNF • Parse Trees • Abstract Syntax Trees • Ambiguous Grammar • Associativity and Precedence • EBNFs and Syntax Diagrams
Programming Language Specification • PLs require precise definitions (i.e. no ambiguity) • Language form (Syntax) • Language meaning (Semantics) • Consequently, PLs are specified using formal notation: • Formal syntax • Tokens • Grammar • Formal semantics • Operational • Denotational • Axiomatic
Lexical Structure of PLs (cont.) • Main task of scanner: identify tokens • Tokens are the basic building blocks of programs • E.g. keywords, identifiers, numbers, punctuation marks • Lexeme – an instance of a token • One can think of programs as strings of lexemes rather than of characters • A token of a language is a category of its lexemes (or instances) • Some tokens have many possible lexemes • E.g. keyword, identifier, number • In some cases, a token has only a single possible lexeme • E.g. equal_sign, plus_op, mult_op
Lexical Structure of PLs (cont.) • Consider the following Java statement: index = 2 * count + 17 ; • The lexemes and tokens of this statement are:
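One way to obtain such lexeme/token pairs is a small regex-based scanner. The sketch below is in Python and is only illustrative: the token names identifier, number, equal_sign, plus_op and mult_op come from the earlier slide, while semicolon, the skip rule and the function name scan are assumptions made for this example.

```python
import re

# Token names follow the slides where possible; "semicolon" and "skip"
# are assumed names introduced for this sketch.
TOKEN_SPEC = [
    ("number",     r"[0-9]+"),
    ("identifier", r"[a-zA-Z][a-zA-Z0-9_]*"),
    ("equal_sign", r"="),
    ("mult_op",    r"\*"),
    ("plus_op",    r"\+"),
    ("semicolon",  r";"),
    ("skip",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def scan(source):
    """Return a list of (lexeme, token) pairs for the source string."""
    pairs = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "skip":      # whitespace separates lexemes but is not a token
            pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs

for lexeme, token in scan("index = 2 * count + 17 ;"):
    print(f"{lexeme:<6} {token}")
```

Each alternative in the combined pattern is a named group, so match.lastgroup reports which token category the lexeme belongs to.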
Lexical Structure of PLs (cont.) • Tokens in a programming language are described formally by regular expressions. • Regular expressions – descriptions of patterns of characters • Regular expression operations • Basic operations • Concatenation (item sequencing) • Choice or selection: | • Repetition: * • Grouping: ( ) • Additional operations • One or more repetitions: + • Range of characters: [ - ] • Optional: ? • Any character: .
Lexical Structure of PLs (cont.) • Regular expression examples • (a|b)*c • Strings that match include ababaac, aac, bbc, c, and babc • [0-9]+ • Integer constants with one or more digits • [0-9]+(\.[0-9]+)? • Floating-point literals (the fractional part is optional, so plain integers also match) • [a-zA-Z][a-zA-Z0-9_]* • Identifiers
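These patterns can be checked directly with a regex library. The following is a minimal sketch using Python's re module; the dictionary keys and the helper name matches are made up for this example. re.fullmatch is used so that the whole string, not just a prefix, has to match.

```python
import re

# The four patterns from the slide; the keys are illustrative names.
patterns = {
    "abc_string": r"(a|b)*c",
    "integer":    r"[0-9]+",
    "float":      r"[0-9]+(\.[0-9]+)?",
    "identifier": r"[a-zA-Z][a-zA-Z0-9_]*",
}

def matches(name, text):
    """True if the whole of text matches the named pattern."""
    return re.fullmatch(patterns[name], text) is not None

for s in ["ababaac", "aac", "bbc", "c", "babc"]:
    print(s, matches("abc_string", s))

print(matches("abc_string", "abca"))      # the string must end in c
print(matches("float", "3.14"), matches("float", "3."))
print(matches("identifier", "count_2"), matches("identifier", "2count"))
```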
Lexical Structure of PLs (cont.) • Scanner generators: • lex, flex • ANTLR – Another Tool for Language Recognition • These programs can be used to generate a program (i.e., a scanner) that extracts tokens from a stream of characters. • Many PLs provide good support for regular expressions – Java, C#, Perl, Ruby, … • Support for regular expressions in Java • java.util.regex package • split() method of String class
Syntactic Structure of PLs • Specifying the form of a programming language • Tokens • Regular expressions • Syntax – organization of tokens • Context-Free Grammars (CFGs)
Context-Free Grammar • Context-free grammars (CFGs) are used to describe the syntax of PLs. • Proposed by Noam Chomsky – a noted linguist • BNF (Backus-Naur Form) is a notation for describing syntax. • Proposed by John Backus and Peter Naur • CFG and BNF are nearly identical and are used interchangeably. • BNF is a metalanguage for programming languages. • A metalanguage is a language that is used to describe another language.
Context-Free Grammar (cont.) • CFG or BNF consists of a series of rules or productions. • Productions are made up of: • Nonterminals – structures that are broken down into further structures • Terminals – things that cannot be broken down • Metasymbols • Symbols that are part of CFG/BNF • These are not actual symbols in the language being described • Sometimes, a metasymbol is also an actual symbol in a language • One of the nonterminals is designated as the start symbol. • The start symbol stands for the entire structure being defined.
Context-Free Grammar (cont.) • CFG/BNF Example (Figure 4.2, page 83)
(1) sentence → noun-phrase verb-phrase .
(2) noun-phrase → article noun
(3) article → a | the
(4) noun → girl | dog
(5) verb-phrase → verb noun-phrase
(6) verb → sees | pets
Context-Free Grammar (cont.) • The language of a CFG is the set of strings of terminals that can be generated from the start symbol by a derivation:
sentence ⇒ noun-phrase verb-phrase . (rule 1)
⇒ article noun verb-phrase . (rule 2)
⇒ the noun verb-phrase . (rule 3)
⇒ the girl verb-phrase . (rule 4)
⇒ the girl verb noun-phrase . (rule 5)
⇒ the girl sees noun-phrase . (rule 6)
⇒ the girl sees article noun . (rule 2)
⇒ the girl sees a noun . (rule 3)
⇒ the girl sees a dog . (rule 4)
Context-Free Grammar (cont.) • Derivation – Generating sentences of the language through a sequence of applications of rules (or productions), beginning with a special nonterminal called the start symbol. • Leftmost derivation – The replaced nonterminal is always the leftmost nonterminal. • Rightmost derivation – The replaced nonterminal is always the rightmost nonterminal. • A derivation may be neither leftmost nor rightmost. Derivation order has no effect on the language generated by a grammar.
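A leftmost derivation can be produced mechanically. The sketch below encodes the six-rule sentence grammar as a Python dictionary and searches for a leftmost derivation of a given string; the names GRAMMAR and leftmost_derivation, and the brute-force search strategy, are choices made for this illustration rather than a standard algorithm from the slides.

```python
# The sentence grammar: each nonterminal maps to its alternative right-hand sides.
GRAMMAR = {
    "sentence":    [["noun-phrase", "verb-phrase", "."]],
    "noun-phrase": [["article", "noun"]],
    "article":     [["a"], ["the"]],
    "noun":        [["girl"], ["dog"]],
    "verb-phrase": [["verb", "noun-phrase"]],
    "verb":        [["sees"], ["pets"]],
}

def leftmost_derivation(text, start="sentence"):
    """Return the sentential forms of a leftmost derivation of text, or None."""
    target = text.split()

    def derive(form):
        # Locate the leftmost nonterminal in the current sentential form.
        idx = next((i for i, sym in enumerate(form) if sym in GRAMMAR), None)
        if idx is None:
            return [form] if form == target else None
        # Terminals already produced must agree with the target; since no rule
        # derives the empty string, a form longer than the target cannot succeed.
        if form[:idx] != target[:idx] or len(form) > len(target):
            return None
        for rhs in GRAMMAR[form[idx]]:
            rest = derive(form[:idx] + rhs + form[idx + 1:])
            if rest is not None:
                return [form] + rest
        return None

    return derive([start])

for step in leftmost_derivation("the girl sees a dog ."):
    print("=>", " ".join(step))
```

Replacing the leftmost-nonterminal search with a rightmost one would give a rightmost derivation of the same sentence; the set of derivable strings is unchanged.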
Context-Free Grammar (cont.) • A grammar for a small language
<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> := <expr>
<expr> → <var> + <var> | <var> - <var> | <var>
<var> → A | B | C
• Derive the following program: begin A := B + C ; B := C end • Is the language defined by this grammar finite or infinite?
Context-Free Grammar (cont.) • Left recursive rule – A BNF rule is left recursive if its left-hand side (LHS) appears at the beginning of its right-hand side (RHS). • Right recursive rule – A BNF rule is right recursive if its LHS appears at the right end of its RHS. • Examples:
number → number digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
expr → expr + expr | expr * expr | ( expr ) | number
• Uses of recursion in BNF: • to show repetition • to describe complex structures
Parse Trees • A parse tree is a graphical representation of the hierarchical syntactic structure of a sentence. It shows graphically the replacement process in a derivation. • A parse tree is labeled by nonterminals at interior nodes and terminals at leaves. • A parse tree better expresses the structure inherent in a derivation.
Parse Trees (cont.) Problem 1:
<assign> → <id> := <expr>
<expr> → <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>
<id> → A | B | C
Show a leftmost derivation and a parse tree for each of the following statements:
A := A + ( B * C )
A := B + C + A
A := A * ( B + C )
A := B * ( C * ( A + B ) )
Parse Trees (cont.) Problem 2: Describe, in English, the language defined by the following grammar:
<S> → <A> <B> <C>
<A> → a <A> | a
<B> → b <B> | b
<C> → c <C> | c
Problem 3: Consider the following grammar:
<S> → <A> a <B> b
<A> → <A> b | b
<B> → a <B> | a
Which of the following sentences are in the language generated by this grammar?
baab
bbbab
bbaaaaa
bbaab
Parse Trees (cont.) Problem 4: Consider the following grammar:
<S> → a <S> c <B>
<S> → <A> | b
<A> → c <A> | c
<B> → d | <A>
Which of the following sentences are in the language generated by the grammar?
abcd
acccbd
acccbcc
acd
accc
Abstract Syntax Trees • Parse trees are still too detailed in their structure, since every step in a derivation is expressed as a node • An abstract syntax tree (AST), or just syntax tree, condenses a parse tree to its essential structure • An AST is more compact than the corresponding parse tree • Language designers and translator writers are most interested in abstract syntax • A programmer is most interested in concrete syntax • Examples on the next two slides…
Abstract Syntax Trees (cont.) [Figure: parse tree and corresponding AST]
Abstract Syntax Trees (cont.) [Figure: parse tree and corresponding AST]
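The condensing step can be illustrated in code. The sketch below assumes a tiny grammar, expr → expr + expr | ( expr ) | number, and an ad hoc nested-tuple representation; both are assumptions for this example. It turns a parse tree for 3 + 4 into an AST by dropping parentheses and collapsing interior nodes that have a single child.

```python
# Parse tree node: (nonterminal, [children]); a leaf is a terminal string.
# This is the parse tree for "3 + 4" under the assumed grammar
#   expr -> expr + expr | ( expr ) | number
parse_tree = ("expr", [
    ("expr", [("number", ["3"])]),
    "+",
    ("expr", [("number", ["4"])]),
])

def to_ast(node):
    """Condense a parse tree into an AST of the form (operator, left, right)."""
    if isinstance(node, str):                      # terminal leaf
        return int(node) if node.isdigit() else node
    _label, children = node
    # Parentheses only guide parsing; they carry no structure of their own.
    kids = [to_ast(child) for child in children if child not in ("(", ")")]
    if len(kids) == 1:                             # chain node: pass the child up
        return kids[0]
    left, op, right = kids                         # the operator becomes the node
    return (op, left, right)

print(to_ast(parse_tree))
```

The parse tree above has eight nodes, while the resulting AST has three: the operator and its two operands.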
Ambiguous Grammars • A grammar is ambiguous if it is possible to construct two or more distinct parse trees for the same string • Example: • Grammar: expr → expr + expr | expr * expr | ( expr ) | NUMBER • Expression: 2 + 3 * 4 • Parse trees – ambiguity in operator precedence
Ambiguous Grammars (cont.) • Another Example: • Grammar: expr → expr + expr | expr - expr | ( expr ) | NUMBER • Expression: 2 - 3 - 4 • Parse trees – ambiguity in operator associativity
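Ambiguity can also be checked by counting parse trees. The sketch below assumes a grammar of the shape expr → expr op expr | ( expr ) | NUMBER and counts how many distinct parse trees derive a token sequence; the function name count_parse_trees and the memoized interval recursion are illustrative choices. A count above 1 means the string is parsed ambiguously.

```python
from functools import lru_cache

def count_parse_trees(tokens, operators=("+", "-", "*")):
    """Count parse trees under expr -> expr op expr | ( expr ) | NUMBER."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways to derive tokens[i:j] from expr.
        if j - i == 1:
            return 1 if tokens[i].isdigit() else 0
        total = 0
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)           # expr -> ( expr )
        for k in range(i + 1, j - 1):              # choose the operator at the root
            if tokens[k] in operators:
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees("2 + 3 * 4".split()))
print(count_parse_trees("2 - 3 - 4".split()))
print(count_parse_trees("( 2 + 3 ) * 4".split()))
```

For the two example expressions the count is 2, one tree per choice of root operator, while the parenthesized expression has a single tree.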
Ambiguous Grammars (cont.) • Ways to resolve ambiguities in a grammar • Revise grammar – preferred approach • Provide disambiguating rule (semantic help) • Revising grammar to address precedence and associativity ambiguities • Do not write rules that allow a parse tree to grow on both left and right sides • Use left recursive rules for left-associative operators • Use right recursive rules for right-associative operators • Add new rules that establish a “precedence cascade” between rules to specify precedence • Make sure operators with higher precedence appear lower in the cascade of rules • Revised grammar
expr → expr + term | term
term → term * factor | factor
factor → ( expr ) | NUMBER
Ambiguous Grammars (cont.) Problem 1:
<expr> → <expr> + <expr> | <expr> - <expr> | <expr> * <expr> | <expr> / <expr> | ( <expr> ) | NUMBER
NUMBER = [0-9]+
Show that this grammar is ambiguous by constructing two distinct parse trees for each of the following expressions:
30 + 5 + 2
30 - 5 - 2
30 * 5 * 2
30 / 5 / 2
30 + 5 * 2
Ambiguous Grammars (cont.) • Revised unambiguous grammar
<expr> → <expr> + <term> | <expr> - <term> | <term>
<term> → <term> * <factor> | <term> / <factor> | <factor>
<factor> → ( <expr> ) | NUMBER
NUMBER = [0-9]+
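The effect of the revised grammar on tree shape can be seen with a small parser. The sketch below is one possible hand-written parser, not a prescribed implementation: it follows the three rules, realizes each left-recursive rule as a loop that keeps extending the tree to the left, and returns nested tuples of the form (operator, left, right). All function names are made up for this example.

```python
import re

def parse(text):
    """Build a nested-tuple tree for text under the revised grammar."""
    tokens = re.findall(r"[0-9]+|\S", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():
        # <expr> -> <expr> + <term> | <expr> - <term> | <term>
        node = term()
        while peek() in ("+", "-"):
            op = take()
            node = (op, node, term())      # earlier operands end up deeper on the left
        return node

    def term():
        # <term> -> <term> * <factor> | <term> / <factor> | <factor>
        node = factor()
        while peek() in ("*", "/"):
            op = take()
            node = (op, node, factor())
        return node

    def factor():
        # <factor> -> ( <expr> ) | NUMBER
        tok = peek()
        if tok == "(":
            take()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            take()
            return node
        if tok is not None and tok.isdigit():
            return int(take())
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse("30 - 5 - 2"))
print(parse("30 + 5 * 2"))
```

Because * and / are handled one level lower in the cascade than + and -, a product always becomes a subtree of a sum, and repeated operators of the same level group to the left.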
Ambiguous Grammars (cont.) Problem 2: Show that the following grammar is ambiguous:
<S> → <A>
<A> → <A> + <A> | <id>
<id> → a | b | c
Ambiguous Grammars (cont.) • Are there other alternatives to resolving ambiguities? • Yes, but they change the language! • Fully-parenthesized expressions: expr → ( expr + expr ) | ( expr - expr ) | NUMBER • Prefix expressions: expr → + expr expr | - expr expr | NUMBER
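The prefix grammar needs neither parentheses nor precedence rules, because each operator is followed by exactly two operands. A minimal evaluator sketch in Python follows; eval_prefix is a made-up name, and the input is assumed to be a list of already-separated tokens.

```python
def eval_prefix(tokens):
    """Evaluate a prefix expression: expr -> + expr expr | - expr expr | NUMBER."""
    stream = iter(tokens)

    def expr():
        tok = next(stream)
        if tok == "+":
            left = expr()
            return left + expr()
        if tok == "-":
            left = expr()
            return left - expr()
        return int(tok)                    # NUMBER

    value = expr()
    if next(stream, None) is not None:     # leftover tokens are not part of one expr
        raise SyntaxError("unexpected extra tokens")
    return value

# The two groupings of 2 - 3 - 4 are written as two different strings.
print(eval_prefix("- - 2 3 4".split()))   # (2 - 3) - 4
print(eval_prefix("- 2 - 3 4".split()))   # 2 - (3 - 4)
```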
Extended BNF • Adds new metasymbols (or operations) to BNF to enhance readability and writability. • These extensions do not enhance the descriptive power of BNF. • EBNF facilitates the development of parsing tools based on an approach called Recursive-Descent Parsing. • New metasymbols added to EBNF: • { } zero or more repetitions • [ ] optional parts • ( | ) multiple-choice
Extended BNF (cont.) • Examples:
BNF: <number> → <number> <digit> | <digit>
EBNF: <number> → <digit> {<digit>}
BNF: <expr> → <expr> + <term> | <term>
EBNF: <expr> → <term> {+ <term>}
BNF: <expr> → <term> ^ <expr> | <term>
EBNF: <expr> → <term> [^ <expr>]
BNF: <selection> → if <logic-expr> then <statement> | if <logic-expr> then <statement> else <statement>
EBNF: <selection> → if <logic-expr> then <statement> [else <statement>]
BNF: <for-stmt> → for <var> := <expr> to <expr> do <statement> | for <var> := <expr> downto <expr> do <statement>
EBNF: <for-stmt> → for <var> := <expr> (to | downto) <expr> do <statement>
Extended BNF (cont.) • More examples:
BNF:
<expr> → <expr> + <term> | <term>
<term> → <term> * <power> | <term> / <power> | <term> % <power> | <power>
<power> → <factor> ^ <power> | <factor>
<factor> → (<expr>) | NUMBER
NUMBER = [0-9]+
EBNF:
<expr> → <term> {+ <term>}
<term> → <power> { * <power> | / <power> | % <power> }
<power> → <factor> [^ <power>]
<factor> → (<expr>) | NUMBER
NUMBER = [0-9]+
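The EBNF form maps almost directly onto a recursive-descent parser: one function per rule, a while loop for each { }, and an if for each [ ]. The sketch below evaluates expressions of this grammar in Python. The function names are illustrative, and treating / as integer division is an assumption, since the grammar itself says nothing about meaning.

```python
import re

def evaluate(text):
    """Evaluate an expression with one function per EBNF rule."""
    tokens = re.findall(r"[0-9]+|\S", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():      # <expr> -> <term> {+ <term>}
        value = term()
        while peek() == "+":               # { } becomes a loop
            advance()
            value += term()
        return value

    def term():      # <term> -> <power> { * <power> | / <power> | % <power> }
        value = power()
        while peek() in ("*", "/", "%"):
            op = advance()
            right = power()
            if op == "*":
                value *= right
            elif op == "/":
                value //= right            # assumption: integer division
            else:
                value %= right
        return value

    def power():     # <power> -> <factor> [^ <power>]
        base = factor()
        if peek() == "^":                  # [ ] becomes an optional branch
            advance()
            return base ** power()         # recursion on the right: right-associative
        return base

    def factor():    # <factor> -> (<expr>) | NUMBER
        tok = peek()
        if tok == "(":
            advance()
            value = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        if tok is not None and tok.isdigit():
            return int(advance())
        raise SyntaxError(f"unexpected token {tok!r}")

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result

print(evaluate("2 + 3 * 4"))
print(evaluate("2 ^ 3 ^ 2"))
```

The loops accumulate results left to right, which gives left associativity for +, *, / and %, while the recursive call in power groups ^ to the right.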
Syntax Diagrams • A graphical representation of a grammar rule • An alternative to EBNF • Circles or ovals for terminals • Squares or rectangles for nonterminals • Terminals and nonterminals are connected with lines and arrows • Visually appealing but takes up space • Rarely seen anymore: EBNF is much more compact