CPS 506 Comparative Programming Languages
Syntax Specification
Compiling Process Steps
• Program → Lexical Analysis
  • Convert characters into a stream of tokens
• Lexical Analysis → Syntactic Analysis
  • Send tokens to develop an abstract representation or parse tree
Compiling Process Steps (cont'd)
• Syntactic Analysis → Semantic Analysis
  • Analyze the parse tree for semantic consistency and transform it to run efficiently on the target architecture (optimization)
• Semantic Analysis → Machine Code
  • Convert the abstract representation to executable machine code through code generation
Formal Methods and Language Processing
• Meta-language
  • A language used to define other languages
• BNF (Backus-Naur Form)
  • A set of rewriting rules ρ
  • A set of terminal symbols Σ
  • A set of non-terminal symbols N
  • A start symbol S ∈ N
  • Each rule in ρ has the form A → ω
    • A ∈ N and ω ∈ (N ∪ Σ)*
    • Left-hand side: a non-terminal symbol
    • Right-hand side: a sequence of terminal and non-terminal symbols
BNF (cont'd)
• The words in N: grammatical categories
  • Identifier, Expression, Loop, Program, …
• S: the principal grammatical category
• Symbols in Σ: the basic alphabet
• Example 1:
    binaryDigit → 0
    binaryDigit → 1
  or
    binaryDigit → 0 | 1
• Example 2:
    Integer → Digit | Integer Digit
    Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
BNF (cont'd)
• Parse tree for 281
    Integer
      Integer
        Integer
          Digit
            2
        Digit
          8
      Digit
        1
• Derivation of 281
    Integer ⇒ Integer Digit
            ⇒ Integer Digit Digit
            ⇒ Digit Digit Digit
            ⇒ 2 Digit Digit
            ⇒ 28 Digit
            ⇒ 281
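The derivation above can be reproduced mechanically. The following is a minimal sketch, assuming the two rules Integer → Digit | Integer Digit and Digit → 0 | … | 9; the function name `leftmost_derivation` is hypothetical and chosen only for illustration.

```python
def leftmost_derivation(number: str) -> list[str]:
    """Return the sentential forms of a leftmost derivation of `number`
    under Integer -> Digit | Integer Digit and Digit -> 0 | ... | 9."""
    if not number or any(ch not in "0123456789" for ch in number):
        raise ValueError("not a string of decimal digits")

    form = ["Integer"]
    steps = [" ".join(form)]

    # Apply Integer -> Integer Digit once for every digit after the first.
    for _ in range(len(number) - 1):
        form = ["Integer", "Digit"] + form[1:]
        steps.append(" ".join(form))

    # Apply Integer -> Digit to the remaining leftmost non-terminal.
    form = ["Digit"] + form[1:]
    steps.append(" ".join(form))

    # Replace each Digit by its terminal, leftmost first.
    for i, ch in enumerate(number):
        form[i] = ch
        steps.append(" ".join(form))
    return steps


for step in leftmost_derivation("281"):
    print(step)
```

Each printed line is one sentential form; the last one consists only of terminal symbols, which is what makes the derivation complete.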
BNF (cont'd)
• Lexemes: the lowest-level syntactic units
• Tokens: the grammatical categories that classify strings of non-blank characters (lexical syntax)
  • Identifier (variable names, function names, …)
  • Literal (integer and decimal numbers, …)
  • Operator (+, -, *, /, …)
  • Separator (;  .  (  )  {  }  …)
  • Keyword (int, if, for, where, …)
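BNF (cont'd)
• Example: token categories in a small program
    // comments …
    void main ( ) { float p; p = 3.14 ; }
  • Comment: // comments …
  • Keyword: void, float
  • Identifier: main, p
  • Separator: ( ) { } ;
  • Operator: =
  • Literal: 3.14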
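The token categories on this slide can be illustrated with a small scanner. This is only a sketch: the patterns, the short keyword list, and the names `TOKEN_SPEC` and `tokenize` are assumptions made for the example, not part of the course material.

```python
import re

# Order matters: earlier alternatives win, so "//" is tried before "/",
# and keywords are tried before general identifiers.
TOKEN_SPEC = [
    ("Comment",    r"//[^\n]*"),
    ("Literal",    r"\d+\.\d+|\d+"),
    ("Keyword",    r"\b(?:void|float|int|if|for)\b"),
    ("Identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("Operator",   r"[=+\-*/]"),
    ("Separator",  r"[;,.(){}]"),
    ("Skip",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> list[tuple[str, str]]:
    """Convert a character string into a list of (category, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "Skip":  # whitespace separates tokens but is not one
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for category, lexeme in tokenize("// comments\nvoid main ( ) { float p; p = 3.14 ; }"):
    print(f"{category:<10} {lexeme}")
```

Listing the alternatives in priority order is one simple way to resolve overlaps between categories; production scanners usually apply a longest-match rule instead.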
Regular Expressions
• An alternative to BNF for defining the lexical rules of a language
  • x : a character
  • "abc" : a literal string
  • A | B : A or B
  • A B : concatenation of A and B
  • A* : zero or more occurrences of A
  • A+ : one or more occurrences of A
  • A? : zero or one occurrence of A
  • [a-zA-Z] : any alphabetic character
  • [0-9] : any digit
  • . : any single character
• Example
    Integer : [0-9]+
    Identifier : [a-zA-Z][a-zA-Z0-9]*
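The two example patterns can be tried directly with Python's standard `re` module. A minimal sketch; the helper names `is_integer` and `is_identifier` are hypothetical.

```python
import re

INTEGER = re.compile(r"[0-9]+")
IDENTIFIER = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")


def is_integer(text: str) -> bool:
    """True if the whole string matches Integer : [0-9]+."""
    return INTEGER.fullmatch(text) is not None


def is_identifier(text: str) -> bool:
    """True if the whole string matches Identifier : [a-zA-Z][a-zA-Z0-9]*."""
    return IDENTIFIER.fullmatch(text) is not None


print(is_integer("281"), is_integer("28a"))       # True False
print(is_identifier("x1"), is_identifier("1x"))   # True False
```

`fullmatch` is used so that the entire string must belong to the category, which matches how a lexical rule defines a token.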
Syntactic Analysis
• Primary tool: BNF
• Input: tokens from lexical analysis
• Output: parse tree
• Syntactic categories
  • Program
  • Declaration
  • Assignment
  • Expression
  • Loop
  • Function definition
Syntactic Analysis (cont'd)
• Example
    Arithmetic Expression → Term | Arithmetic Expression + Term | Arithmetic Expression - Term
    Term → Factor | Term * Factor | Term / Factor
    Factor → Identifier | Literal | ( Arithmetic Expression )
Syntactic Analysis (cont'd)
• Example: parse tree for 2 * a - 3
    Arithmetic Expression
      Arithmetic Expression
        Term
          Term
            Factor
              Literal
                Integer
                  2
          *
          Factor
            Identifier
              Letter
                a
      -
      Term
        Factor
          Literal
            Integer
              3
Syntactic Analysis (cont'd)
• BNF limitations: BNF alone cannot express
  • Whether an identifier has been declared
  • Whether an identifier has an initial value
• In statically typed languages
  • The type system is used for the first problem
  • Detected at compile time or at run time
Ambiguous Grammar
• A grammar is ambiguous if some string can be parsed into two or more different parse trees
• Example
    Exp → Identifier | Literal | Exp - Exp
  Input: A - B - C
  Output:
    1. A - (B - C)
    2. (A - B) - C
• Another example is the "dangling else"
• Ambiguity can be resolved
  • Using BNF rules
  • Using extra-grammatical rules
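The two readings of A - B - C can be enumerated by brute force. The sketch below assumes only the rules Exp → Identifier | Exp - Exp; the function names and the sample values for A, B, and C are made up for illustration.

```python
def parse_trees(operands):
    """All parse trees for the operands joined by '-' under
    Exp -> Identifier | Exp - Exp. A tree is a name or a (left, right) pair."""
    if len(operands) == 1:
        return [operands[0]]
    trees = []
    # Every position of the top-level '-' yields a different family of trees.
    for split in range(1, len(operands)):
        for left in parse_trees(operands[:split]):
            for right in parse_trees(operands[split:]):
                trees.append((left, right))
    return trees


def show(tree):
    """Render a tree with explicit parentheses."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    return f"({show(left)} - {show(right)})"


def evaluate(tree, env):
    """Evaluate a tree, looking identifiers up in env."""
    if isinstance(tree, str):
        return env[tree]
    left, right = tree
    return evaluate(left, env) - evaluate(right, env)


env = {"A": 10, "B": 4, "C": 3}  # hypothetical values
for tree in parse_trees(["A", "B", "C"]):
    print(show(tree), "=", evaluate(tree, env))
# (A - (B - C)) = 9
# ((A - B) - C) = 3
```

The two trees give different results for the same input, which is why an ambiguous grammar is a problem for a language definition and not just for the parser.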
Operator Precedence
• A grammar that does not encode precedence
    <expr> → <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>
    A = B + C * A  is parsed as  A = B + (C * A)
    A = B * C + A  is parsed as  A = B * (C + A)
• Solution
    <expr> → <expr> + <term> | <term>
    <term> → <term> * <factor> | <factor>
    <factor> → ( <expr> ) | <id>
    A = B + C * A  is parsed as  A = B + (C * A)
    A = B * C + A  is parsed as  A = (B * C) + A
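The effect of the stratified grammar can be seen by making the grouping explicit. This is a sketch, not a full parser: it assumes single-letter identifiers, handles only the right-hand side of the assignment, and replaces the left-recursive rules with loops. The name `parenthesize` is hypothetical.

```python
def parenthesize(text: str) -> str:
    """Fully parenthesize an expression according to
    <expr> -> <expr> + <term> | <term>
    <term> -> <term> * <factor> | <factor>
    <factor> -> ( <expr> ) | <id>"""
    tokens = text.replace(" ", "")  # assumption: every token is one character
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def expr():
        node = term()
        while peek() == "+":      # left recursion written as a loop
            advance()
            right = term()
            node = f"({node} + {right})"
        return node

    def term():
        node = factor()
        while peek() == "*":
            advance()
            right = factor()
            node = f"({node} * {right})"
        return node

    def factor():
        tok = peek()
        if tok == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("')' expected")
            advance()
            return node
        if tok is not None and tok.isalpha():
            advance()
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result


print(parenthesize("B + C * A"))  # (B + (C * A))
print(parenthesize("B * C + A"))  # ((B * C) + A)
```

Because `*` is handled one level deeper than `+`, it always binds first, regardless of the order in which the operators appear.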
Associativity of Operators
• How should A + B + C, A * B * C, A / B / C, … be grouped?
• Left associativity
  • Left recursive: in a grammar rule, the LHS also appears at the beginning of its RHS
      <expr> → <expr> + <term> | <term>
      A + B + C  is parsed as  (A + B) + C
• Right associativity
  • Right recursive: in a grammar rule, the LHS also appears at the end of its RHS
      <factor> → <exp> ** <factor> | <exp>
      <exp> → ( <expr> ) | <id>
      A + B ** C  is parsed as  A + (B ** C)
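Associativity changes the result whenever the operator is not associative in the mathematical sense. The sketch below groups the same operands both ways; `fold_left` and `fold_right` are hypothetical helper names, and the operand values are arbitrary.

```python
import operator
from functools import reduce


def fold_left(op, operands):
    """Group as ((a op b) op c) ..., as a left-recursive rule does."""
    return reduce(op, operands)


def fold_right(op, operands):
    """Group as a op (b op (c ...)), as a right-recursive rule does."""
    *init, result = operands
    for value in reversed(init):
        result = op(value, result)
    return result


print(fold_left(operator.truediv, [8, 4, 2]))   # (8 / 4) / 2 = 1.0
print(fold_right(operator.truediv, [8, 4, 2]))  # 8 / (4 / 2) = 4.0
print(fold_left(operator.pow, [2, 3, 2]))       # (2 ** 3) ** 2 = 64
print(fold_right(operator.pow, [2, 3, 2]))      # 2 ** (3 ** 2) = 512
```

Python itself treats `/` as left associative and `**` as right associative, so `8 / 4 / 2` evaluates to `1.0` and `2 ** 3 ** 2` to `512`.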
Extended BNF (EBNF)
• Optional part of an RHS
    <if_stmt> → if ( <expr> ) <statement> [ else <statement> ]
• Repetition (in place of recursion) in an RHS
    <id_list> → <id> { , <id> }
• Multiple-choice option in an RHS
    <term> → <term> ( * | / | % ) <factor>
• Optional use of * and + on braces
    <id_list> → <id> { , <id> }*
    <integer> → { 0 | … | 9 }+
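EBNF braces map naturally onto a loop in a hand-written parser. The sketch below implements the non-recursive rule `<id_list> → <id> { , <id> }`, which describes the same comma-separated lists as a recursive BNF rule would; the function name `parse_id_list` and the use of Python's `str.isidentifier` as the definition of `<id>` are assumptions for the example.

```python
def parse_id_list(text: str) -> list[str]:
    """Parse <id_list> -> <id> { , <id> } and return the identifiers."""
    tokens = text.replace(",", " , ").split()
    pos = 0

    def expect_id() -> str:
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isidentifier():
            raise SyntaxError(f"identifier expected at token {pos}")
        pos += 1
        return tokens[pos - 1]

    ids = [expect_id()]                     # the mandatory first <id>
    while pos < len(tokens) and tokens[pos] == ",":
        pos += 1                            # consume ','
        ids.append(expect_id())             # one pass per { , <id> } repetition
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return ids


print(parse_id_list("a, b, c"))  # ['a', 'b', 'c']
```

An optional part written with `[ … ]` would become an `if` in the same style, and a choice written with `( … | … )` a test on the next token.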
Extended BNF (EBNF) (cont'd)
• "opt" subscript
    Conditional Statement → if ( Expr ) Statement { else Statement }opt
• Syntax diagram
    [Figure: syntax diagram for Term, built from Factor and the operators * and /]
Case Study
• A BNF or EBNF for one grammar, such as Expression, different Literals, or the if Statement in Java, C, C++, or Pascal
• BNF or EBNF for floating-point numbers in Java, C, C++
• BNF or EBNF for loop statements in one language
Abstract Syntax
• Consider the following code fragments:
  • Pascal
      while i < 10 do begin
        i := i + 1;
      end;
  • C or Java
      while (i < 10) {
        i = i + 1;
      }
• Although the syntax differs, the two fragments are essentially equivalent
• Abstract syntax is a way to show only the essential elements of a language
Abstract Syntax (cont'd)
• General form
    Abstract Syntax Class = list of essential components
• Example
    Loop = Expression test; Statement body
• A Java class for the abstract syntax of Loop
    class Loop extends Statement {
        Expression test;
        Statement body;
    }
Abstract Syntax (cont'd)
• More examples
    Assignment = Variable target; Expression source
• A Java class for the abstract syntax of Assignment
    class Assignment extends Statement {
        Variable target;
        Expression source;
    }
Abstract Syntax Tree
• A tree that shows the abstract syntax of a program fragment
• Example: x = 2; and x := 2;
    Assignment = Variable target; Expression source
    Statement
      Assignment
        Variable
          x
        Expression
          Value
            2
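That both spellings of the assignment share one abstract syntax tree can be checked with a small sketch. It is written in Python rather than Java, handles only the form "name, assignment operator, integer literal, semicolon", and uses hypothetical names (`Variable`, `Value`, `Assignment`, `parse_assignment`) that mirror the slide's abstract syntax.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str


@dataclass(frozen=True)
class Value:
    number: int


@dataclass(frozen=True)
class Assignment:
    target: Variable   # Assignment = Variable target; Expression source
    source: Value


def parse_assignment(text: str, assign_op: str) -> Assignment:
    """Parse '<name> <assign_op> <integer> ;' into an Assignment node.
    assign_op is the concrete operator: '=' for C/Java, ':=' for Pascal."""
    pattern = rf"\s*([A-Za-z]\w*)\s*{re.escape(assign_op)}\s*(\d+)\s*;\s*"
    match = re.fullmatch(pattern, text)
    if match is None:
        raise SyntaxError(f"not an assignment: {text!r}")
    return Assignment(Variable(match.group(1)), Value(int(match.group(2))))


c_tree = parse_assignment("x = 2;", "=")
pascal_tree = parse_assignment("x := 2;", ":=")
print(c_tree)
print(c_tree == pascal_tree)  # True
```

The concrete operator and punctuation disappear after parsing; only the target and the source remain, which is the point of an abstract syntax.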
Recursive Descent Parser
• A top-down parser that verifies the syntax of a stream of text, reading it from left to right
• It contains several recursive methods, each of which implements one rule of the grammar
• More details and parsing algorithms are covered in the Compiler course
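A minimal sketch of such a parser for an arithmetic-expression grammar with the rules Expression → Term { (+ | -) Term }, Term → Factor { (* | /) Factor }, and Factor → Identifier | Literal | ( Expression ) follows. It only accepts or rejects its input, the left-recursive rules are written as loops (plain recursive descent cannot handle left recursion directly), and literals are assumed to be unsigned integers. The names `Recognizer` and `is_valid` are hypothetical.

```python
import re


class Recognizer:
    """One method per grammar rule; each consumes the tokens of its rule."""

    def __init__(self, text: str):
        # Identifiers, integer literals, or any other single non-blank character.
        self.tokens = re.findall(r"[A-Za-z]\w*|\d+|\S", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expression(self):
        self.term()
        while self.peek() in ("+", "-"):
            self.pos += 1
            self.term()

    def term(self):
        self.factor()
        while self.peek() in ("*", "/"):
            self.pos += 1
            self.factor()

    def factor(self):
        tok = self.peek()
        if tok == "(":
            self.pos += 1
            self.expression()          # recursion back to the top rule
            if self.peek() != ")":
                raise SyntaxError("')' expected")
            self.pos += 1
        elif tok is not None and (tok[0].isalpha() or tok.isdigit()):
            self.pos += 1              # Identifier or Literal
        else:
            raise SyntaxError(f"unexpected token {tok!r}")


def is_valid(text: str) -> bool:
    """True if the whole text is a syntactically valid expression."""
    recognizer = Recognizer(text)
    try:
        recognizer.expression()
    except SyntaxError:
        return False
    return recognizer.pos == len(recognizer.tokens)


print(is_valid("2 * a - 3"))   # True
print(is_valid("2 * - 3"))     # False
```

Extending each method to return a node instead of nothing would turn this recognizer into a parser that builds a tree.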
Exercises
• Modify the following grammar to add a unary minus operator that has higher precedence than either + or *.
    <assign> → <id> = <expr>
    <id> → A | B | C
    <expr> → <expr> + <term> | <term>
    <term> → <term> * <factor> | <factor>
    <factor> → ( <expr> ) | <id>
Exercises
• Consider the following grammar:
    <S> → <A> a <B> b
    <A> → <A> b | b
    <B> → a <B> | a
  Which of the following sentences are in the language generated by this grammar?
  • baab
  • bbbab
  • bbaaaaa
  • bbaab
Exercises
• Convert the following EBNF to BNF:
    S → A { b A }
    A → a [ b ] A
• Using the grammar in question 1, add the ++ and -- unary operators of Java.
• Using the grammar in question 1, show a parse tree and a leftmost derivation for each of the following statements:
  • A = (A + B) * C
  • A = B * (C * (A + B))
Exercises
• Rewrite the BNF in question 1 to give + precedence over *, and force + to be right associative.
• Using BNF, write a grammar for the language consisting of strings aⁿbⁿ, where n > 0, such as ab, aabb, … . Can you write this using regular expressions?