Chapter 2 Syntax
Syntax • The syntax of a programming language specifies the structure of the language • The lexical structure specifies how words can be constituted from characters • The syntactic structure specifies how sentences can be constituted from words
Lexical Structure • The tokens of a programming language consist of the set of all basic grammatical categories that are the building blocks of syntax • A program is viewed as a stream of tokens
Standard Token Categories • Keywords, such as if and while • Literals or constants, such as 42 (a numeric literal) or "hello" (a string literal) • Special symbols, such as “;”, “<=”, or “+” • Identifiers, such as x24, putchar, or monthly_balance
White Spaces and Comments • White spaces and comments are ignored, except that they function as delimiters between tokens • Typical white spaces: newlines, tabs, spaces • Comments: • /* … */, // … \n (C, C++, Java) • -- … \n (Ada, Haskell) • (* … *) (Pascal, ML) • ; … \n (Scheme)
C tokens: There are six classes of tokens: identifiers, keywords, constants, string literals, operators, and other separators. Blanks, horizontal and vertical tabs, newlines, formfeeds, and comments (collectively, "white space") are ignored except as they separate tokens. Some white space is required to separate otherwise adjacent identifiers, keywords, and constants. If the input stream has been separated into tokens up to a given character, the next token is the longest string of characters that could constitute a token.
An Example
/* This program counts from 1 to 10. */
#include <stdio.h>

int main(void)
{
    int i;
    for (i = 1; i <= 10; i++) {
        printf("%d\n", i);
    }
    return 0;
}
Backus-Naur Form (BNF) • BNF is a notation widely used in the formal definition of syntactic structure • A BNF grammar consists of a set of rewriting rules $P$, a set of terminal symbols $T$, a set of nonterminal symbols $N$, and a start symbol $S \in N$ • Each rule in $P$ has the form $A \to \omega$, where $A \in N$ and $\omega \in (N \cup T)^*$
Backus-Naur Form • The terminals in $T$ form the basic alphabet (tokens) from which programs are constructed • The nonterminals in $N$ identify grammatical categories like Identifier, Integer, Expression, Statement, Function, Program • The start symbol $S$ identifies the principal grammatical category being defined by the grammar
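Examples
1. binaryDigit → 0
   binaryDigit → 1
   or, equivalently, binaryDigit → 0 | 1
2. Integer → Digit | Integer Digit
   Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Here | is the metasymbol for "or", and writing symbols side by side (as in Integer Digit) is the metasymbol for concatenation.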
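Derivation • Integer ⇒ Integer Digit ⇒ Integer Digit Digit ⇒ Digit Digit Digit ⇒ 3 Digit Digit ⇒ 3 5 Digit ⇒ 3 5 2 • Each intermediate string is a sentential form; the final string 3 5 2, which contains only terminals, is a sentence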
Parse Tree [Figure: parse tree, annotated "Sentential form"]
Example: Expression
Assignment → Identifier = Expression
Expression → Term | Expression + Term | Expression - Term
Term → Factor | Term * Factor | Term / Factor
Factor → Identifier | Literal | ( Expression )
Example: Expression [Figure: parse tree for x + 2 * y]
Syntax for a Subset of C (ε denotes the empty string)
Program → void main ( ) { Declarations Statements }
Declarations → ε | Declarations Declaration
Declaration → Type Identifiers ;
Type → int | boolean
Identifiers → Identifier | Identifiers , Identifier
Statements → ε | Statements Statement
Statement → ; | Block | Assignment | IfStatement | WhileStatement
Block → { Statements }
Assignment → Identifier = Expression ;
IfStatement → if ( Expression ) Statement | if ( Expression ) Statement else Statement
WhileStatement → while ( Expression ) Statement
Syntax for a Subset of C
Expression → Conjunction | Expression || Conjunction
Conjunction → Relation | Conjunction && Relation
Relation → Addition | Relation < Addition | Relation <= Addition | Relation > Addition | Relation >= Addition | Relation == Addition | Relation != Addition
Addition → Term | Addition + Term | Addition - Term
Term → Negation | Term * Negation | Term / Negation
Negation → Factor | ! Factor
Factor → Identifier | Literal | ( Expression )
Example: Program [Figure: parse tree for the program void main ( ) { int x; x = 1; }]
Ambiguity • A grammar is ambiguous if it permits a string to be parsed into two or more different parse trees • Example: with the grammar AmbExp → Integer | AmbExp - AmbExp, the string 2 - 3 - 4 has two parse trees
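An Example • The two parse trees of 2 - 3 - 4 group it as (2 - 3) - 4, which evaluates to -5, and as 2 - (3 - 4), which evaluates to 3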
The Dangling Else Problem • The statement
if (x < 0) if (y < 0) y = y - 1; else y = 0;
has two parse trees, because the else can be attached to either if
The Dangling Else Problem • Solution I: use a special keyword to explicitly close every if statement (Ada, for example, closes each one with end if). With a closing keyword fi, the rule becomes IfStatement → if (E) S fi | if (E) S else S fi • Solution II: use an explicit rule outside the BNF syntax. For example, in C, every else clause is associated with the closest preceding if that does not already have an else
Extended BNF (EBNF) • EBNF introduces three kinds of brackets: • { } denotes zero or more repetitions, which replaces many recursive rules • [ ] denotes an optional part • ( ) groups alternatives
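An Example
BNF:
Expression → Term | Expression + Term | Expression - Term
Term → Factor | Term * Factor | Term / Factor
Factor → + number | - number | number
EBNF:
Expression → Term { ( + | - ) Term }
Term → Factor { ( * | / ) Factor }
Factor → [ + | - ] number
Here ( ) groups the alternatives, { } means zero or more occurrences, and [ ] marks an optional part.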
Abstract Syntax • The abstract syntax of a language identifies the essential syntactic elements in a program without describing how they are concretely constructed • For example, these Pascal and C loops have the same abstract syntax:
Pascal: while i < n do begin i := i + 1 end
C: while (i < n) { i = i + 1; }
Example: Loop • Thinking of a loop abstractly, the essential elements are a test expression that decides whether the loop continues and a body, which is the statement to be repeated • All other elements constitute nonessential “syntactic sugar” • The complete syntax is usually called concrete syntax
Example: Loop
Pascal: while i < n do begin i := i + 1 end
C: while (i < n) { i = i + 1; }
Both loops have the same abstract syntax tree:
loop
  <
    i
    n
  =
    i
    +
      i
      1
Example: Expression
Abstract syntax tree for x + 2 * y:
+
  x
  *
    2
    y
Parser • A parser of a language accepts or rejects strings based on whether they are legal strings in the language • In a recursive-descent parser, each nonterminal is implemented as a function, and each terminal is implemented by matching it against the current token
Example: Calculator
command → expr '\n'
expr → term { '+' term }
term → factor { '*' factor }
factor → number | '(' expr ')'
number → digit { digit }
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
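Example: Calculator
#include <ctype.h>
#include <stdlib.h>
#include <stdio.h>

int token;    /* current token: a single character */
int pos = 0;  /* position of the current token in the input */

void command(void);
void expr(void);
void term(void);
void factor(void);
void number(void);
void digit(void);
void parse(void);
void getToken(void);
void match(char c);
void error(void);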
Example: Calculator
int main(void)
{
    parse();
    return 0;
}

void getToken(void)
{
    /* read the next character, skipping blanks */
    token = getchar();
    pos++;
    while (token == ' ') {
        token = getchar();
        pos++;
    }
}

void parse(void)
{
    getToken();   /* load the first token */
    command();
}
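Example: Calculator
command → expr '\n'
void command(void)
{
    expr();
    /* check for the final newline without calling match('\n'), which
       would read past it and wait for another line of input */
    if (token != '\n')
        error();
}

void match(char c)
{
    if (token == c)
        getToken();
    else
        error();
}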
Example: Calculator
expr → term { '+' term }
term → factor { '*' factor }
void term(void)
{
    factor();
    while (token == '*') {
        match('*');
        factor();   /* the grammar repeats factor, not term */
    }
}

void expr(void)
{
    term();
    while (token == '+') {
        match('+');
        term();
    }
}
Example: Calculator
factor → number | '(' expr ')'
number → digit { digit }
void factor(void)
{
    if (token == '(') {
        match('(');
        expr();
        match(')');
    } else {
        number();
    }
}

void number(void)
{
    digit();
    while (isdigit(token))
        digit();
}
Example: Calculator
void digit(void)
{
    if (isdigit(token))
        match(token);
    else
        error();
}

void error(void)
{
    printf("parse error: position %d: character %c\n", pos, token);
    exit(1);
}