Programming Languages, Third Edition
Chapter 6: Syntax

Objectives
• Understand the lexical structure of programming languages
• Understand context-free grammars and BNFs
• Become familiar with parse trees
• Understand ambiguity, associativity, and precedence
• Read Sections 6.1-6.4, pp. 204-220
Programming Languages, Third Edition

Introduction
• Syntax is the structure of a language
• Syntax rules are analogous to the grammar rules of a natural language
• John Backus and Peter Naur developed a notational system for describing these grammars, now called Backus-Naur forms, or BNFs
• First used to describe the syntax of Algol60
• Every modern computer scientist needs to know how to read, interpret, and apply BNF descriptions of language syntax

Flowchart for Compilation
Source Code (your program) → Compiler → Object Code (machine language)

Flowchart for Compilation: Details
Source Code (your program = char stream)
→ Scanner (lexical analysis)
→ Lexical items / Tokens
→ Parser (syntactic analysis)
→ Parse tree
→ Semantic analysis (analyses meaning)
→ Intermediate Code
→ Optimization
→ Object Code (machine language)

Lexical Structure of Programming Languages
• Lexical structure: the structure of the tokens, or words, of a language
• Related to, but different from, the syntactic structure
• Scanning phase: the phase in which a translator collects sequences of characters from the input program and forms them into tokens
• Parsing phase: the phase in which the translator processes the tokens, determining the program’s syntactic structure

Lexical Structure of Programming Languages (cont’d.)
• Tokens generally fall into several categories:
  • Reserved words (or keywords)
  • Literals or constants
  • Special symbols, such as ";", "<=", and "+"
  • Identifiers

Lexical Structure of Programming Languages (cont’d.)
• Token delimiters (or white space): formatting that affects the way tokens are recognized
• Indentation can be used to determine structure (as in Python)
• Free-format language: one in which format has no effect on program structure other than satisfying the principle of longest substring (the scanner always forms the longest possible token, so ifx is read as one identifier rather than the keyword if followed by x)
• Fixed-format language: one in which all tokens must occur in prespecified locations on the page
• Tokens can be formally described by regular expressions

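The ideas on the last few slides (token categories, white space as a delimiter, the principle of longest substring, tokens described by regular expressions) can be sketched as a tiny scanner in Python. This is a minimal illustration, not the textbook's scanner: the keyword set and token patterns are hypothetical. Python's `re` alternation takes the first alternative that matches rather than the longest, so the sketch enforces longest substring by listing `<=` before `<` and by matching identifiers greedily before checking whether they are keywords.

```python
import re

# Hypothetical toy token set, chosen only for illustration.
KEYWORDS = {"if", "while", "return"}

TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("NAME", r"[A-Za-z_][A-Za-z0-9_]*"),       # greedy: takes the longest name
    ("SYMBOL", r"<=|>=|==|[;+\-*/<>=()]"),      # two-character symbols tried first
    ("SKIP", r"[ \t\n]+"),                      # white space only delimits tokens
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(text):
    """Return a list of (category, lexeme) pairs for text."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        if kind == "NAME":
            # A name is a keyword only if the whole longest match is reserved.
            kind = "KEYWORD" if lexeme in KEYWORDS else "IDENTIFIER"
        if kind != "SKIP":
            tokens.append((kind, lexeme))
        pos = m.end()
    return tokens
```

For example, `scan("if ifx <= 10;")` returns `[('KEYWORD', 'if'), ('IDENTIFIER', 'ifx'), ('SYMBOL', '<='), ('NUMBER', '10'), ('SYMBOL', ';')]`: `ifx` is a single identifier and `<=` is a single token.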
Scanning: Regular Expressions
• A metalanguage for describing patterns of character strings
  – Metasymbols:
    | means choice
    * means zero or more occurrences
    + means one or more occurrences
    ? means zero or one occurrence (the item is optional)
    [ ] matches any one of the characters listed in the brackets; a range such as a-z can be used
    . (period) matches any single character
    ( ) can be used for grouping
    \ placed before a metasymbol makes it match that character literally

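Python's `re` module writes each of these metasymbols the same way, so they can be tried out directly. A small sketch; the patterns and test strings are made up for illustration. `re.fullmatch` is used so that a pattern must describe the entire string, which is how a token description is read.

```python
import re

def matches(pattern, text):
    """True if the entire text matches pattern (not just a prefix or substring)."""
    return re.fullmatch(pattern, text) is not None

# (pattern, text, expected) triples, at least one per metasymbol
examples = [
    (r"a|b", "b", True),          # | : choice
    (r"ab*", "a", True),          # * : zero or more b's
    (r"ab+", "a", False),         # + : at least one b is required
    (r"colou?r", "color", True),  # ? : the u is optional
    (r"[a-c]x", "dx", False),     # [ ] : d is outside the range a-c
    (r"a.c", "a#c", True),        # . : any single character
    (r"(ab)+", "abab", True),     # ( ) : + applies to the whole group ab
    (r"3\+4", "3+4", True),       # \ : + is a literal plus sign here
]

for pattern, text, expected in examples:
    assert matches(pattern, text) == expected, (pattern, text)
print("all metasymbol examples behave as described")
```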
Regular Expressions (cont’d.)
• Most modern text editors use regular expressions in text searches
• Utilities such as lex can automatically turn a regular expression description of a language’s tokens into a scanner

Regular Expressions (cont’d.)
• Examples:
  (a|b)*c
  [ab]*c
  [aeiou]
  [aeiouAEIOU]
  [aeiouAEIOU]+
  [aeiouAEIOU]*
  [A-Z][a-z]*
  [A-Z]+[a-z]
  [A-Za-z]*

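One way to see what each example describes is to test it against a handful of strings. The sample strings below are arbitrary choices made for illustration; `re.fullmatch` again requires the whole string to match.

```python
import re

patterns = [
    r"(a|b)*c", r"[ab]*c", r"[aeiou]", r"[aeiouAEIOU]",
    r"[aeiouAEIOU]+", r"[aeiouAEIOU]*", r"[A-Z][a-z]*",
    r"[A-Z]+[a-z]", r"[A-Za-z]*",
]
samples = ["c", "abbac", "e", "E", "aEiou", "", "Hello", "ABc", "Syntax"]

def matching_samples(pattern):
    """Return the sample strings that pattern matches in their entirety."""
    return [s for s in samples if re.fullmatch(pattern, s)]

for p in patterns:
    print(f"{p:16} {matching_samples(p)}")
```

For instance, `matching_samples(r"[A-Z]+[a-z]")` returns `['ABc']`: the pattern requires one or more capitals followed by exactly one lowercase letter, so `Hello` does not qualify, while `[A-Z][a-z]*` accepts `E`, `Hello`, and `Syntax`.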
Regular Expressions (cont’d.)
• Examples:
  [0-9]+
  [0-9]+(\.[0-9]+)
• Can test by making a text file on Unix and using egrep -x "pattern" filename (equivalently, grep -E -x); the -x option reports only lines that the pattern matches in their entirety

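Where egrep is not at hand, `egrep -x` can be imitated in Python. A sketch, assuming that Python's `re` agrees with egrep's extended regular expressions for simple patterns like these (it does for the metasymbols listed earlier, although the two dialects differ in more advanced features). The function name `grep_x` is hypothetical.

```python
import re

def grep_x(pattern, lines):
    """Yield the lines that match pattern in their entirety, like egrep -x."""
    regex = re.compile(pattern)
    for line in lines:
        line = line.rstrip("\n")       # compare the line without its newline
        if regex.fullmatch(line):
            yield line

numbers = ["3.14\n", "42\n", "x1\n", "10.\n", "0.5\n"]
print(list(grep_x(r"[0-9]+", numbers)))
print(list(grep_x(r"[0-9]+(\.[0-9]+)", numbers)))
```

An open file object can be passed as `lines`, since iterating over a file yields its lines. Note that the second pattern requires a fractional part, so `42` and `10.` are rejected.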
Regular Expressions (cont’d.)
• Let’s try writing some:
  • Signed integers, sign not optional
  • Signed integers, sign optional
  • Signed integers, sign optional, no signed zero
  • Signed integers, allow leading zeros, but no signed zero

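Here is one possible set of answers, written with the slide's metasymbols and checked with Python's `re`. Other correct expressions exist. Two readings are assumptions: "no signed zero" is taken to reject `+0` and `-0`, and because the last exercise explicitly allows leading zeros, the third pattern is assumed to forbid them.

```python
import re

# One possible answer per exercise (assumed interpretations noted above).
SIGNED_INT_PATTERNS = {
    "sign required": r"[+-][0-9]+",
    "sign optional": r"[+-]?[0-9]+",
    # a bare 0, or an optionally signed number with no leading zero
    "sign optional, no signed zero": r"0|[+-]?[1-9][0-9]*",
    # any unsigned digit string, or a sign followed by a nonzero value
    "leading zeros, no signed zero": r"[0-9]+|[+-]0*[1-9][0-9]*",
}

def accepts(name, text):
    """True if the pattern for exercise `name` matches all of text."""
    return re.fullmatch(SIGNED_INT_PATTERNS[name], text) is not None

for text in ["+42", "42", "-0", "0", "007", "-007", "+00"]:
    verdicts = {name: accepts(name, text) for name in SIGNED_INT_PATTERNS}
    print(text, verdicts)
```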
Regular Expressions (cont’d.)
• Let’s try writing some for license plates:
  • Start with VA, followed by zero or more digits
  • Start with VA, followed by one or more digits
  • Start with VA, followed by 2 digits, followed by zero or more lowercase letters
  • Start with V or A, followed by -, followed by 2-4 digits
  • Start with VA, any case, followed by 2-3 digits or 2-3 letters (tried in class)
  • Start with VA, any case, followed by 2-3 digits, followed by 2-3 letters (meant to try)

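A possible answer for each license-plate exercise, again using only the slide's metasymbols (so "2-4 digits" is spelled out with `?` instead of a repetition count). This is a sketch: where the exercise does not say what case the trailing letters take, they are assumed to be either case, and the helper name `plate_ok` is hypothetical.

```python
import re

PLATE_PATTERNS = [
    r"VA[0-9]*",                     # VA, then zero or more digits
    r"VA[0-9]+",                     # VA, then one or more digits
    r"VA[0-9][0-9][a-z]*",           # VA, exactly 2 digits, zero or more lowercase letters
    r"[VA]-[0-9][0-9][0-9]?[0-9]?",  # V or A, a hyphen, then 2 to 4 digits
    # VA in any case, then 2-3 digits OR 2-3 letters (letters assumed either case)
    r"[Vv][Aa]([0-9][0-9][0-9]?|[A-Za-z][A-Za-z][A-Za-z]?)",
    # VA in any case, then 2-3 digits followed by 2-3 letters (letters assumed either case)
    r"[Vv][Aa][0-9][0-9][0-9]?[A-Za-z][A-Za-z][A-Za-z]?",
]

def plate_ok(exercise, text):
    """exercise is a 1-based index into PLATE_PATTERNS, in the slide's order."""
    return re.fullmatch(PLATE_PATTERNS[exercise - 1], text) is not None

print(plate_ok(6, "vA123ab"), plate_ok(6, "VA1234AB"))
```

The last call is rejected because after at most three digits the pattern needs a letter, and `4` is not one.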
Parsing: Context-Free Grammars and BNFs
• Context-free grammar: consists of a series of grammar rules (productions)
  • Each rule has a single phrase structure name on the left, then a metasymbol (such as ::=), followed by a sequence of symbols or other phrase structure names on the right
• Nonterminals: names for phrase structures, so called because they are broken down into further phrase structures
• Start symbol: the nonterminal from which every derivation begins
• Terminals: words or token symbols that cannot be broken down further

Example 1: Unsigned Integers
<num> ::= <digit> | <num> <digit>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• Terminals: 0, 1, … , 9
• Nonterminals: <num>, <digit>
• Start symbol: <num>
• Productions: there are 12 (counting each alternative separately: 2 for <num> and 10 for <digit>)
• Metasymbols: "::=" and "|"

Example 1 (cont’d)
• Derivation: the process of building a string in the language by beginning with the start symbol and repeatedly replacing left-hand sides by choices of right-hand sides in the rules
• Let’s derive the number 123 (on board)
• Parse tree: graphical depiction of the replacement process in a derivation
• Let’s draw the parse tree for 123 (on board)

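The derivation of 123 can be generated mechanically. This sketch always replaces the leftmost nonterminal (a leftmost derivation) and handles only the Example 1 grammar; the function name is hypothetical.

```python
def leftmost_derivation(digits):
    """Return the sentential forms of a leftmost derivation of digits using
    <num> ::= <digit> | <num> <digit> and <digit> ::= 0 | 1 | ... | 9."""
    if not digits or any(c not in "0123456789" for c in digits):
        raise ValueError("expected a non-empty string of decimal digits")
    form = ["<num>"]
    steps = [" ".join(form)]
    # Apply <num> ::= <num> <digit> until there is one <digit> per input digit.
    for _ in range(len(digits) - 1):
        form = ["<num>", "<digit>"] + form[1:]
        steps.append(" ".join(form))
    # Apply <num> ::= <digit> to the remaining <num>.
    form[0] = "<digit>"
    steps.append(" ".join(form))
    # Replace the leftmost <digit> by a terminal, one step at a time.
    for i, d in enumerate(digits):
        form[i] = d
        steps.append(" ".join(form))
    return steps

for step in leftmost_derivation("123"):
    print(step)
```

This prints the seven sentential forms `<num>`, `<num> <digit>`, `<num> <digit> <digit>`, `<digit> <digit> <digit>`, `1 <digit> <digit>`, `1 2 <digit>`, and `1 2 3`.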
Example 1 (cont’d)
• Notice the recursion in one of the rules: <num> ::= <num> <digit>
• Notice that the recursive nonterminal <num> is on the left of that right-hand side
• This is a left-recursive grammar
• This is a left-associative grammar
• Notice how the parse tree cascades to the left

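A sketch of the parse tree for 123 under Example 1, represented as nested Python tuples (a representation chosen only for illustration). Each new digit makes the tree built so far the left child of a new `<num>` node, which is the leftward cascade. The `value` function attaches a meaning to each production; on this left-leaning tree it computes (1 * 10 + 2) * 10 + 3, grouping the digits from the left.

```python
def parse_num_left(digits):
    """Parse tree for digits under <num> ::= <digit> | <num> <digit>.
    Nodes are (label, child, ...) tuples; leaves are digit characters."""
    if not digits or any(c not in "0123456789" for c in digits):
        raise ValueError("expected a non-empty string of decimal digits")
    tree = ("<num>", ("<digit>", digits[0]))
    for d in digits[1:]:
        # The tree built so far becomes the LEFT child of a new <num> node.
        tree = ("<num>", tree, ("<digit>", d))
    return tree

def value(tree):
    """Numeric value of a <num> or <digit> subtree."""
    label, *children = tree
    if label == "<digit>":
        return int(children[0])
    if len(children) == 1:        # <num> ::= <digit>
        return value(children[0])
    left, digit = children        # <num> ::= <num> <digit>
    return 10 * value(left) + value(digit)

def show(tree, indent=0):
    """Print the tree sideways, one node per line, children indented."""
    if isinstance(tree, str):
        print("  " * indent + tree)
        return
    print("  " * indent + tree[0])
    for child in tree[1:]:
        show(child, indent + 1)

show(parse_num_left("123"))
```

In the printed outline the chain of `<num>` nodes runs down the left side, with the digit `1` deepest in the tree.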
Example 2: Unsigned Integers
<num> ::= <digit> | <digit> <num>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• Only one change was made (<num> <digit> became <digit> <num>), so now the grammar is
  • Right-recursive
  • Right-associative
• Let’s draw the parse tree for 123

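The same sketch adapted to Example 2 (tuple representation and function names again hypothetical). Now each earlier digit wraps the tree built so far as its right child, so the tree leans right and groups the digits as 1(2(3)). To compute the value, each left digit must be scaled by the number of digits to its right, which is one way to see the different grouping.

```python
def parse_num_right(digits):
    """Parse tree for digits under <num> ::= <digit> | <digit> <num>."""
    if not digits or any(c not in "0123456789" for c in digits):
        raise ValueError("expected a non-empty string of decimal digits")
    tree = ("<num>", ("<digit>", digits[-1]))
    for d in reversed(digits[:-1]):
        # The tree built so far becomes the RIGHT child of a new <num> node.
        tree = ("<num>", ("<digit>", d), tree)
    return tree

def value_and_length(tree):
    """Return (value, number of digits) for a <num> or <digit> subtree."""
    label, *children = tree
    if label == "<digit>":
        return int(children[0]), 1
    if len(children) == 1:                 # <num> ::= <digit>
        return value_and_length(children[0])
    d, _ = value_and_length(children[0])   # <num> ::= <digit> <num>
    rest, n = value_and_length(children[1])
    return d * 10 ** n + rest, n + 1

print(parse_num_right("123"))
print(value_and_length(parse_num_right("123")))
```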
Ex 3: Simple Expression Grammar
<expr> ::= <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <num>
<num> ::= <digit> | <num> <digit>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• Let’s derive the parse tree for: 3 + 4 + 5

Ex 3 (cont’d)
<expr> ::= <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <num>
<num> ::= <digit> | <num> <digit>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• Is there another parse tree for 3 + 4 + 5?

Ex 3 (cont’d)
<expr> ::= <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <num>
<num> ::= <digit> | <num> <digit>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• A grammar is ambiguous if some string has two (or more) distinct parse trees

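Ambiguity can be checked by brute force: enumerate every parse tree for a string and count them. A sketch under stated assumptions: the input is pre-split into tokens, each `<num>` is a single token (the `<num>`/`<digit>` subtrees are omitted), and trees are nested tuples in which `("()", t)` marks a parenthesized subexpression. The representation and the function name are hypothetical.

```python
from functools import lru_cache

def parse_trees(tokens):
    """Return every parse tree of tokens under the ambiguous grammar
    <expr> ::= <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <num>."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(i, j):
        """All trees for the token span tokens[i:j]."""
        result = []
        if j - i == 1 and tokens[i].isdigit():            # <num>
            result.append(tokens[i])
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            result.extend(("()", t) for t in trees(i + 1, j - 1))
        for k in range(i + 1, j - 1):                     # each operator as root
            if tokens[k] in ("+", "*"):
                for left in trees(i, k):
                    for right in trees(k + 1, j):
                        result.append((tokens[k], left, right))
        return tuple(result)

    return list(trees(0, len(tokens)))

for tree in parse_trees("3 + 4 + 5".split()):
    print(tree)
```

This finds two trees for `3 + 4 + 5`, `('+', '3', ('+', '4', '5'))` and `('+', ('+', '3', '4'), '5')`, so the grammar is ambiguous. With explicit parentheses, as in `( 3 + 4 ) + 5`, only one tree remains.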
Ex 3 (cont’d)
<expr> ::= <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <num>
<num> ::= <digit> | <num> <digit>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• Ambiguity is undesirable (but sometimes unavoidable)
• Let’s see why it’s undesirable: derive parse trees for 3 + 4 * 5

Ex 3 (cont’d)
<expr> ::= <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <num>
<num> ::= <digit> | <num> <digit>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• So what was the problem? Which tree provides the correct arithmetic interpretation?

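The two trees for 3 + 4 * 5 give different values when evaluated, which is why the ambiguity matters. A sketch with the trees written by hand as nested tuples (a hypothetical representation). Under the usual convention that * binds more tightly than +, the tree with + at the root is the correct one.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def evaluate(tree):
    """Evaluate a tree of the form '3' or (op, left, right)."""
    if isinstance(tree, str):
        return int(tree)
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))

# The two parse trees the ambiguous grammar allows for 3 + 4 * 5
plus_at_root = ("+", "3", ("*", "4", "5"))   # 3 + (4 * 5)
times_at_root = ("*", ("+", "3", "4"), "5")  # (3 + 4) * 5

print(evaluate(plus_at_root))   # 23
print(evaluate(times_at_root))  # 35
```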
Ex 3 (cont’d)
• Can we modify the grammar to “fix” the problem? YES! Add more levels of productions:
<expr> ::= <expr> + <term> | <term>
<term> ::= <term> * <factor> | <factor>
<factor> ::= ( <expr> ) | <num>
<num> ::= <digit> | <num> <digit>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Ex 3 (cont’d)
<expr> ::= <expr> + <term> | <term>
<term> ::= <term> * <factor> | <factor>
<factor> ::= ( <expr> ) | <num>
<num> ::= <digit> | <num> <digit>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• Redraw the parse trees for 3 + 4 + 5 and 3 + 4 * 5

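The revised grammar can be turned into a recursive-descent parser. This is a sketch, not the textbook's code: a recursive-descent procedure for `<expr> ::= <expr> + <term>` would call itself first and never stop, so each left-recursive rule is rewritten as a loop that folds results to the left, which keeps + and * left-associative. Tokens are assumed to be whole numbers, `+`, `*`, and parentheses; trees are nested tuples (parentheses are not kept in the tree).

```python
import re

def parse(text):
    """Parse text with the grammar
        <expr>   ::= <expr> + <term> | <term>
        <term>   ::= <term> * <factor> | <factor>
        <factor> ::= ( <expr> ) | <num>
    and return a tree: a digit string or (op, left, right)."""
    tokens = re.findall(r"[0-9]+|[+*()]", text)
    if "".join(tokens) != "".join(text.split()):
        raise SyntaxError("unexpected character in input")
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        pos += 1
        return tok

    def expr():
        # <expr> ::= <expr> + <term> | <term>, as a left-folding loop
        tree = term()
        while peek() == "+":
            advance()
            tree = ("+", tree, term())
        return tree

    def term():
        # <term> ::= <term> * <factor> | <factor>, as a left-folding loop
        tree = factor()
        while peek() == "*":
            advance()
            tree = ("*", tree, factor())
        return tree

    def factor():
        # <factor> ::= ( <expr> ) | <num>
        if peek() == "(":
            advance("(")
            tree = expr()
            advance(")")
            return tree
        tok = advance()
        if not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        return tok

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse("3 + 4 + 5"))
print(parse("3 + 4 * 5"))
```

Here `parse("3 + 4 + 5")` returns `('+', ('+', '3', '4'), '5')` and `parse("3 + 4 * 5")` returns `('+', '3', ('*', '4', '5'))`: the extra grammar levels make + left-associative and give * higher precedence, leaving one tree per string.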
Chapter 6: Final Thoughts
• A grammar is context-free when nonterminals appear singly on the left sides of productions
  • There is no context under which only certain replacements can occur
• Anything not expressible using context-free grammars is a semantic, not a syntactic, issue
• BNF form of language syntax makes it easier to write translators
  • The parsing stage can be automated (for example, by parser generators such as yacc)

Chapter 6: Final Thoughts (cont’d)
• Syntax establishes structure, not meaning
  • But meaning is related to syntax
• Syntax-directed semantics: the process of associating the semantics of a construct with its syntactic structure
  • The syntax must therefore be constructed so that it reflects the semantics to be attached later