Translating High Level Languages D Goforth COSC 3127
Stages of translation
Before execution:
• Lexical analysis
• Syntactic analysis
• Code generation
• Linking
Then:
• Execution

Lexical analysis
• Translates a stream of characters into lexemes
• Lexemes belong to categories called tokens
• The token identity of each lexeme is used at the next stage, syntactic analysis

Examples: tokens and lexemes
• Some token categories contain only one lexeme: semicolon ;
• Some tokens categorize many lexemes: identifier count, maxCost, …

Tokens and Lexemes
yVal=x +450 - min(100, 4xVal ));
Labels shown on the example: identifier, equal_sign (=), left_paren ((), illegal lexeme (4xVal)
Lexical analysis:
• identifies lexemes and their token type
• recognizes illegal lexemes (4xVal)
• does NOT identify the syntax error: ) )

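A minimal sketch of a lexical analyzer for the example line, written in Python with regular expressions. The token names beyond those on the slide (number, plus, minus, comma, unknown) and the rule that "digits followed directly by letters" is an illegal lexeme are assumptions for illustration; the minus is typed as an ASCII hyphen.

```python
import re

# Token categories (names are illustrative). Order matters: "illegal" is
# tried before "number" so that 4xVal is not split into 4 and xVal.
TOKEN_SPEC = [
    ("illegal",     r"\d+[A-Za-z_]\w*"),  # digits running straight into letters
    ("number",      r"\d+"),
    ("identifier",  r"[A-Za-z_]\w*"),
    ("equal_sign",  r"="),
    ("plus",        r"\+"),
    ("minus",       r"-"),
    ("left_paren",  r"\("),
    ("right_paren", r"\)"),
    ("comma",       r","),
    ("semicolon",   r";"),
    ("skip",        r"\s+"),              # whitespace separates lexemes
    ("unknown",     r"."),                # any other single character
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token, lexeme) pairs; whitespace is dropped."""
    pairs = []
    for match in MASTER.finditer(text):
        if match.lastgroup != "skip":
            pairs.append((match.lastgroup, match.group()))
    return pairs


for token, lexeme in tokenize("yVal=x +450 - min(100, 4xVal ));"):
    print(f"{token:12} {lexeme}")
```

The analyzer reports 4xVal as illegal but happily emits two right_paren tokens in a row: whether that sequence is allowed is a question for syntactic analysis.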
Syntax or Grammar of Language
Rules for
• generating (used by the programmer) or
• recognizing (used by the syntactic analyzer in translation)
a valid sequence of lexemes

Grammars
• 4 categories of grammars (Chomsky)
• Two categories are important in computing:
  • Regular grammars, equivalent to regular expressions (pattern matching)
  • Context-free grammars (programming languages)

Context-free grammar
• Meta-language for describing languages
• States rules, or productions, for which lexeme sequences are correct in the language
• Written in Backus-Naur Form (BNF)

Example of BNF rule
• PROBLEM: how to recognize all these as correct?
  • y = x
  • f = rVec.length + 1
  • button[4].label = "Exit"
• RULE for defining an assignment statement:
  <assign> -> <variable> = <expression>
• Assumes other rules for <variable>, <expression>

BNF rules
• Non-terminal and terminal symbols:
  • Non-terminals are defined by at least one rule
  • Terminals are tokens or lexemes
<assignment> -> <var> = <expression>

Simple sample grammar (p. 113)
<assign> -> <id> = <expr>
<id>     -> A | B | C          // lexical
<expr>   -> <id> + <expr>
          | <id> * <expr>
          | ( <expr> )
          | <id>

Simple sample production
Apply one rule at each step, to the leftmost non-terminal:
<assign> => <id> = <expr>
         => B = <expr>
         => B = <id> * <expr>
         => B = A * <expr>
         => B = A * ( <expr> )
         => B = A * ( <id> + <expr> )
         => B = A * ( C + <expr> )
         => B = A * ( C + <id> )
         => B = A * ( C + C )

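The derivation can be replayed mechanically. This is a minimal Python sketch, assuming the sample grammar is stored as a dictionary and each step is given as the index of the alternative to apply; the names GRAMMAR, CHOICES, leftmost_step and derive are hypothetical.

```python
# The sample grammar: each non-terminal maps to its list of alternatives.
GRAMMAR = {
    "<assign>": [["<id>", "=", "<expr>"]],
    "<id>":     [["A"], ["B"], ["C"]],
    "<expr>":   [["<id>", "+", "<expr>"],
                 ["<id>", "*", "<expr>"],
                 ["(", "<expr>", ")"],
                 ["<id>"]],
}


def leftmost_step(form, choice):
    """Expand the leftmost non-terminal in `form` with alternative `choice`."""
    for i, symbol in enumerate(form):
        if symbol in GRAMMAR:
            return form[:i] + GRAMMAR[symbol][choice] + form[i + 1:]
    raise ValueError("no non-terminal left to expand")


def derive(start, choices):
    """Return every sentential form of the derivation, starting at `start`."""
    forms = [[start]]
    for choice in choices:
        forms.append(leftmost_step(forms[-1], choice))
    return forms


# Alternative chosen at each of the nine steps of the derivation.
CHOICES = [0, 1, 1, 0, 2, 0, 2, 3, 2]
for form in derive("<assign>", CHOICES):
    print(" ".join(form))
```

The last line printed is B = A * ( C + C ), a sentence containing only terminals.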
Sample parse tree
<assign>
├─ <id>
│  └─ B
├─ =
└─ <expr>
   ├─ <id>
   │  └─ A
   ├─ *
   └─ <expr>
      ├─ (
      ├─ <expr>
      │  ├─ <id>
      │  │  └─ C
      │  ├─ +
      │  └─ <expr>
      │     └─ <id>
      │        └─ C
      └─ )
Leaves, read left to right, give the sentence of lexemes: B = A * ( C + C )

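One way a syntactic analyzer might build such a tree is recursive descent: one function per non-terminal. The sketch below is an illustration for the simple sample grammar only, assuming the input is already split into lexemes and representing each node as a tuple (label, child, child, ...); the function names are hypothetical.

```python
def parse_assign(tokens):
    """Parse <assign> -> <id> = <expr>; return a nested-tuple parse tree."""
    pos = 0

    def expect(lexeme):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != lexeme:
            raise SyntaxError(f"expected {lexeme!r} at position {pos}")
        pos += 1
        return lexeme

    def parse_id():
        nonlocal pos
        if pos < len(tokens) and tokens[pos] in ("A", "B", "C"):
            pos += 1
            return ("<id>", tokens[pos - 1])
        raise SyntaxError(f"expected A, B or C at position {pos}")

    def parse_expr():
        # <expr> -> ( <expr> )
        if pos < len(tokens) and tokens[pos] == "(":
            return ("<expr>", expect("("), parse_expr(), expect(")"))
        # <expr> -> <id> + <expr> | <id> * <expr> | <id>
        left = parse_id()
        if pos < len(tokens) and tokens[pos] in ("+", "*"):
            op = expect(tokens[pos])
            return ("<expr>", left, op, parse_expr())
        return ("<expr>", left)

    tree = ("<assign>", parse_id(), expect("="), parse_expr())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]!r} at position {pos}")
    return tree


def leaves(tree):
    """Collect the terminal lexemes of a tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    result = []
    for child in tree[1:]:  # tree[0] is the non-terminal label
        result.extend(leaves(child))
    return result


tree = parse_assign("B = A * ( C + C )".split())
print(" ".join(leaves(tree)))
```

A sentence with an extra closing parenthesis, such as B = A * ( C + C ) ), raises SyntaxError here, which is the kind of error lexical analysis does not detect.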
Ambiguous grammar
• Different parse trees for the same sentence
• Different translations for the same sentence
• Different machine code for the same source code!

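To see why this matters, consider a grammar that is not from the slides but is a standard textbook illustration: <expr> -> <expr> + <expr> | <expr> * <expr> | <id>. The Python sketch below, under that assumption and for well-formed input only (identifiers and operators alternating), enumerates every parse tree of A + B * C and evaluates each one with made-up values for A, B and C.

```python
def all_trees(tokens):
    """Every parse tree of `tokens` under <expr> -> <expr> op <expr> | <id>."""
    if len(tokens) == 1:
        return [tokens[0]]
    trees = []
    for i in range(1, len(tokens), 2):  # operators sit at odd positions
        for left in all_trees(tokens[:i]):
            for right in all_trees(tokens[i + 1:]):
                trees.append((left, tokens[i], right))
    return trees


def evaluate(tree, env):
    """Compute the value a translation based on this tree would produce."""
    if isinstance(tree, str):
        return env[tree]
    left, op, right = tree
    a, b = evaluate(left, env), evaluate(right, env)
    return a + b if op == "+" else a * b


env = {"A": 2, "B": 3, "C": 4}  # illustrative values
for tree in all_trees("A + B * C".split()):
    print(tree, "=", evaluate(tree, env))
```

The same sentence has two trees, and they compute 14 and 20: the translation depends on which tree the analyzer happens to build.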
Grammars for ‘human’ conventions
• Putting features of languages into grammars:
  • expressions of any length
  • precedence - an extra non-terminal
  • associativity - order in recursive rules
  • nested if statements - "dangling else" problem: p. 119

Forms for grammars
• Backus-Naur form (BNF)
• Extended Backus-Naur form (EBNF) - shortens the set of rules
• Syntax graphs - easier to read when learning a language

EBNF
• optional: zero or one occurrence, written [ ]
  <expr> -> [ <expr> + ] <term>
• repetition: zero or more occurrences, written { }
  <expr> -> <term> { + <term> }
• 'or': choice of alternative symbols, written ( | )
  <term> -> <term> [ (*|/) <term> ]

Syntax Graph - basic structures
[Figure: syntax graphs for expr and term, showing the basic structures: terms joined by + or -, and factors joined by * or /]

BNF (p. 121)
<expr> -> <expr> + <term> | <expr> - <term> | <term>
<term> -> <term> * <factor> | <term> / <factor> | <factor>

EBNF
Using [ ]:
<expr> -> [ <expr> (+|-) ] <term>
<term> -> [ <term> (*|/) ] <factor>
Using { }:
<expr> -> <term> { (+|-) <term> }
<term> -> <factor> { (*|/) <factor> }

Syntax Graph
[Figure: syntax graphs for expr (terms joined by + or -) and term (factors joined by * or /)]

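The repetition form of EBNF maps directly onto loops in a hand-written parser. The Python sketch below evaluates expressions using the { } rules for <expr> and <term>; since no rule for <factor> appears here, it assumes <factor> -> number | ( <expr> ), and it assumes the input is already split into lexemes. The helper names are hypothetical.

```python
def evaluate(tokens):
    """Evaluate a list of lexemes using the EBNF rules for expr and term."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def factor():
        # Assumed rule: <factor> -> number | ( <expr> )
        tok = peek()
        if tok == "(":
            advance()
            value = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        if tok is not None and tok.isdigit():
            return int(advance())
        raise SyntaxError(f"unexpected {tok!r}")

    def term():
        # <term> -> <factor> { (*|/) <factor> }
        value = factor()
        while peek() in ("*", "/"):  # the { } becomes a loop
            if advance() == "*":
                value *= factor()
            else:
                value /= factor()
        return value

    def expr():
        # <expr> -> <term> { (+|-) <term> }
        value = term()
        while peek() in ("+", "-"):
            if advance() == "+":
                value += term()
            else:
                value -= term()
        return value

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return result


print(evaluate("8 - 3 - 2".split()))
print(evaluate("2 + 3 * 4".split()))
```

Because each loop folds operands in as it meets them, operators of equal precedence group from the left (8 - 3 - 2 is 3), and the separate <term> level gives * and / higher precedence than + and - (2 + 3 * 4 is 14).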
Attribute grammars
• Problem: context-free grammars cannot describe some features needed in programming
• e.g.: rules for using data types
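A rough sketch of the idea, not a full attribute grammar: each expression node gets a type attribute computed from its children, and an assignment is accepted only if that type fits the declared type of the target. The symbol table DECLARED, the type names, and the rule that an int may be assigned to a float are all assumptions made for illustration.

```python
# Hypothetical declared types; a grammar alone cannot record these.
DECLARED = {"A": "int", "B": "float", "C": "int"}


def expr_type(tree):
    """Type attribute of an expression, computed bottom-up from its parts."""
    if isinstance(tree, str):
        return DECLARED[tree]
    left, _op, right = tree
    left_type, right_type = expr_type(left), expr_type(right)
    # Assumed rule: int with int gives int, anything else gives float.
    return "int" if left_type == right_type == "int" else "float"


def check_assign(target, tree):
    """Accept target = tree only if the expression type fits the target."""
    expected = DECLARED[target]
    actual = expr_type(tree)
    # Assumed rule: an int value may be stored in a float variable.
    return expected == actual or (expected == "float" and actual == "int")


print(check_assign("A", ("A", "+", "C")))  # int = int + int
print(check_assign("A", ("A", "+", "B")))  # int = int + float
```

Both assignments are valid sentences of the context-free grammar; only the type attributes tell them apart.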