SYNTAX
Syntax: form, structure
The syntax of a PL:
• The set of its well-formed programs
• The rules that define these programs
Two views:
• Concrete syntax: program as text
• Abstract syntax: program as a composite structure, a tree
Pls – syntax – Catriel Beeri

Concrete syntax
The common view: a program is text (a string).
Common practice in compilers is to divide the syntax into two levels:
• Lexical structure: the words
  • Lexical specification / lexical analysis
• Phrase structure: the sentences
  • Phrase structure specification / parsing

Lexical structure
• A word: a lexeme
• A class of words: a token, for example int, ident, real, leftpar, if, …
Lexical analysis converts text to a stream of (token, lexeme) pairs:
  2.3    →  (real, 2.3)
  (4+5)  →  leftpar (int, 4) plus (int, 5) rightpar

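The conversion from text to a (token, lexeme) stream can be sketched in OCaml as a small hand-written scanner. This is only an illustration under assumptions: the `token` variant, the helper `scan_while`, and the choice to attach a lexeme to every token (the slide omits it for leftpar, plus, rightpar) are not from the slides.

```ocaml
(* Token classes named after the slide's examples; the variant is an assumption. *)
type token = Int | Real | Ident | If | LeftPar | RightPar | Plus

let is_digit c = c >= '0' && c <= '9'
let is_letter c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')

(* Index just past the longest run of characters satisfying [p], starting at [i]. *)
let rec scan_while p s i =
  if i < String.length s && p s.[i] then scan_while p s (i + 1) else i

(* Convert text to a list of (token, lexeme) pairs. *)
let tokenize (s : string) : (token * string) list =
  let n = String.length s in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      match s.[i] with
      | ' ' | '\t' | '\n' -> go (i + 1) acc
      | '(' -> go (i + 1) ((LeftPar, "(") :: acc)
      | ')' -> go (i + 1) ((RightPar, ")") :: acc)
      | '+' -> go (i + 1) ((Plus, "+") :: acc)
      | c when is_digit c ->
          let j = scan_while is_digit s i in
          (* A '.' followed by a digit continues the lexeme as a real. *)
          if j + 1 < n && s.[j] = '.' && is_digit s.[j + 1] then begin
            let k = scan_while is_digit s (j + 1) in
            go k ((Real, String.sub s i (k - i)) :: acc)
          end
          else go j ((Int, String.sub s i (j - i)) :: acc)
      | c when is_letter c ->
          let j = scan_while is_letter s i in
          let lexeme = String.sub s i (j - i) in
          (* Keywords are recognized after scanning a whole word. *)
          go j (((if lexeme = "if" then If else Ident), lexeme) :: acc)
      | c -> failwith (Printf.sprintf "unexpected character %c" c)
  in
  go 0 []
```

Scanning the longest possible run before deciding on the token is one simple answer to the "where to stop" question; a generated analyzer would derive the same behaviour from regular expressions.
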
Lexical analysis: implementation
• A token is specified by a regular expression
• Regular expression → (nondeterministic) finite automaton → (deterministic) finite automaton → a program: a lexical analyzer
Issues:
• Many tokens
• Where to stop
• …

Phrase structure / analysis
Specified by a context-free grammar (CFG), written in BNF (Backus-Naur form):
• T: terminals (here, tokens, i.e. sets of lexemes)
• N: non-terminals = names of syntactic categories
• P: production rules. A rule has the form A ::= w, where w is a string over N ∪ T
• S: the start non-terminal
A CFG as a generative device:
• Start from S
• Replace non-terminals by strings, using the rules

Example: a CFG for simple arithmetic expressions
T = {int, op}, N = {E}, S = E
Rules: E ::= int | E op E    (2 rules; | means "or")
Generation by a derivation:
E => E op E => int op E => int op E op E => int op int op E => int op int op int
This could represent the expression 2 - 3 - 4.

Example (cont'd)
Here are two derivations:
E => E op E => int op E => int op E op E
E => E op E => E op E op E => int op E op E
Both then continue: int op E op E => int op int op E => int op int op int
Are they really different?

A derivation corresponds to a derivation tree:
E => E op E => int op E => int op E op E =>* int op int op int
[Figure: the derivation tree for this derivation, with root E and leaves int op int op int]

Derivations vs. derivation trees
• A derivation tree represents many derivations
• If there is a word with several derivation trees, the CFG is ambiguous
Example: [Figure: two different derivation trees for int op int op int, one grouping to the left and one grouping to the right]

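Why ambiguity matters can be seen by taking op to be subtraction (an assumption; the grammar E ::= int | E op E leaves op unspecified). The OCaml sketch below uses simplified expression trees rather than full derivation trees; `tree`, `eval` and `to_string` are illustrative names, not part of the slides.

```ocaml
(* Trees for E ::= int | E op E, with op fixed to subtraction (assumption). *)
type tree = Leaf of int | Node of tree * tree

(* The value a tree denotes. *)
let rec eval = function
  | Leaf n -> n
  | Node (l, r) -> eval l - eval r

(* The string of terminals a tree derives (its leaves, left to right). *)
let rec to_string = function
  | Leaf n -> string_of_int n
  | Node (l, r) -> to_string l ^ " - " ^ to_string r

(* Two different trees for the same text "2 - 3 - 4". *)
let left_tree = Node (Node (Leaf 2, Leaf 3), Leaf 4)   (* groups as (2 - 3) - 4 *)
let right_tree = Node (Leaf 2, Node (Leaf 3, Leaf 4))  (* groups as 2 - (3 - 4) *)
```

Both trees flatten to the same string, yet `eval` gives different results for them, so the text alone does not determine the meaning.
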
The problem is addressed by:
• Adopting left associativity
• Allowing parentheses in expressions
• Changing the CFG:
  • New non-terminal T (for term)
  • New rules:
    E ::= E op T | T
    T ::= int | (E)

This CFG is unambiguous, and reflects left associativity:
E => E op T =>* int op int op int

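A parser for the grammar E ::= E op T | T, T ::= int | (E) can be sketched in OCaml as follows. The left-recursive rule is read here as "one T followed by zero or more op T" and handled by a loop, which is a common hand-written technique and not necessarily the one used in the course. The types `token` and `expr` and the function names are assumptions made for the example.

```ocaml
(* Tokens and result trees; both types are illustrative assumptions. *)
type token = INT of int | OP of char | LPAR | RPAR

type expr = Num of int | BinOp of expr * char * expr

(* T ::= int | ( E ) *)
let rec parse_t = function
  | INT n :: rest -> (Num n, rest)
  | LPAR :: rest ->
      (match parse_e rest with
       | e, RPAR :: rest' -> (e, rest')
       | _ -> failwith "expected )")
  | _ -> failwith "expected int or ("

(* E ::= E op T | T, read as one T followed by zero or more "op T".
   Each new operand is attached on the right of the tree built so far,
   which yields the left-associative grouping. *)
and parse_e tokens =
  let first, rest = parse_t tokens in
  let rec loop left = function
    | OP o :: rest ->
        let right, rest' = parse_t rest in
        loop (BinOp (left, o, right)) rest'
    | rest -> (left, rest)
  in
  loop first rest

(* Parse a complete token list; leftover tokens are an error. *)
let parse tokens =
  match parse_e tokens with
  | e, [] -> e
  | _ -> failwith "trailing tokens"
```

With this sketch, the tokens of 2 - 3 - 4 produce a tree grouped as (2 - 3) - 4, while parentheses in the input can force the other grouping.
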
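A derivation tree
A derivation tree is more complex than the corresponding expression tree.
[Figure: derivation tree of a parenthesized expression of the form int op (int op int), built with the rules E ::= E op T | T and T ::= int | (E)]
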
Phrase structure: summary
• A language can be specified by many CFGs
• A CFG needs to address:
  • Ambiguity (avoid)
  • Associativity (express)
  • Precedence (express)
  • Efficient parsing (ensure)
Methodologies for transforming CFGs to account for the above are known.
The resulting CFGs are complex; so are the derivation trees.

Abstract syntax
Consider:
• (if (< x 3) 4 7)   (Scheme)        x < 3 ? 4 : 7   (C)
• (let ((x 5)) (+ x 3))   (Scheme)   let x = 5 in x + 3   (OCaml)
Each pair is the "same" expression, with the same components.
The meaning is explained in the same way. E.g., for the conditional: evaluate the test; if its value is true, evaluate the 1st branch, else evaluate the 2nd.

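The explanation of the conditional can be written once, against a single tree that stands for both the Scheme and the C text. The OCaml sketch below is illustrative only: the types `expr` and `value`, the constructor names, and the association-list environment are assumptions.

```ocaml
(* Values and a tiny expression type; all names are illustrative assumptions. *)
type value = IntV of int | BoolV of bool

type expr =
  | Const of int
  | Var of string
  | Less of expr * expr
  | If of expr * expr * expr

(* [env] maps variable names to integers. *)
let rec eval (env : (string * int) list) (e : expr) : value =
  match e with
  | Const n -> IntV n
  | Var x -> IntV (List.assoc x env)
  | Less (a, b) ->
      (match eval env a, eval env b with
       | IntV m, IntV n -> BoolV (m < n)
       | _ -> failwith "< expects integers")
  | If (test, branch1, branch2) ->
      (* Evaluate the test; then evaluate exactly one of the branches. *)
      (match eval env test with
       | BoolV true -> eval env branch1
       | BoolV false -> eval env branch2
       | IntV _ -> failwith "test must be a boolean")

(* One tree for both (if (< x 3) 4 7) and x < 3 ? 4 : 7. *)
let conditional = If (Less (Var "x", Const 3), Const 4, Const 7)
```

Nothing in `eval` depends on whether the program was originally written with parentheses and prefix operators or with `?` and `:`.
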
In abstract syntax, a program/expression is viewed as a labeled tree, a compound structure:
• A labeled leaf represents an atomic phrase; the label represents the category. Example: int (3)
• A larger tree represents a compound phrase:
  • The root label is its category
  • The children are its components

Typical building blocks:
• Record: IfExpr : {test = E1, branch1 = E2, branch2 = E3}
  [Figure: a tree with root IfExpr and edges labeled test, branch1, branch2 leading to E1, E2, E3]
  The type can be expressed as an OCaml datatype:
  type ifexpr = IfExpr of {test : expr; branch1 : expr; branch2 : expr}

• Tuple: IfExpr : (E1, E2, E3)
  [Figure: a tree with root IfExpr and ordered, unlabeled children E1, E2, E3]
  type ifexpr = IfExpr of expr * expr * expr
Tuple vs. record: components are identified by ordering vs. by field name.

• Sequence: CmpdStmt : (S1, S2, … , Sn)
  type cmpd_stmt = CmpdStmt of stmt list
Tuple vs. sequence: in a tuple type, the number of fields is known and fixed; a sequence may have any number of elements.

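The three building blocks (record, tuple, sequence) can be combined in one self-contained OCaml sketch. Folding IfExpr and CmpdStmt into single `expr` and `stmt` types, and the extra constructors `Int`, `Var`, `Less` and `Assign`, are assumptions made so that the example compiles on its own.

```ocaml
type expr =
  | Int of int
  | Var of string
  | Less of expr * expr                                        (* tuple: fixed arity, by position *)
  | IfExpr of { test : expr; branch1 : expr; branch2 : expr }  (* record: by field name *)

type stmt =
  | Assign of string * expr   (* tuple *)
  | CmpdStmt of stmt list     (* sequence: any number of components *)

(* Record components are selected by name, regardless of the order they are written in. *)
let example =
  IfExpr { branch1 = Int 4; branch2 = Int 7; test = Less (Var "x", Int 3) }

(* A sequence is processed with list functions, since its length is not fixed. *)
let rec count_assignments = function
  | Assign _ -> 1
  | CmpdStmt ss -> List.fold_left (fun n s -> n + count_assignments s) 0 ss
```

The record form lets the fields of `example` be listed in any order, whereas for `Less` and `Assign` the position alone says which component is which.
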
Summary of abstract syntax
• Abstract syntax is the structure of the program
• Keywords, separators, conventions: not included
• Associativity, precedence, unambiguity: non-issues
• Parsing: converts from concrete to abstract syntax
• Type-checking, semantics, and compiler translation use abstract syntax
In the rest of the course: abstract syntax.

Q: Can a CFG derivation tree serve as an abstract syntax tree?

Syntax (concrete/abstract) is an inductive definition
Example: E ::= int | id | E op E
As rules: [Figure: the three alternatives written as inductive rules]
What will the rules look like for
type expr = Int of int | Id of string | Expr of expr * expr ?

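One practical consequence of an inductive definition is that functions over syntax can be defined by cases, one case per alternative, with recursion on the components. The OCaml sketch below uses the datatype with constructors Int, Id and Expr; the functions `size` and `ids` are illustrative and not part of the slides.

```ocaml
type expr =
  | Int of int
  | Id of string
  | Expr of expr * expr

(* Number of nodes: one case per constructor, recursion on the sub-expressions. *)
let rec size (e : expr) : int =
  match e with
  | Int _ -> 1
  | Id _ -> 1
  | Expr (l, r) -> 1 + size l + size r

(* Identifiers occurring in an expression, left to right. *)
let rec ids (e : expr) : string list =
  match e with
  | Int _ -> []
  | Id x -> [ x ]
  | Expr (l, r) -> ids l @ ids r
```

The two base cases correspond to the alternatives without sub-expressions, and the recursive case to the alternative built from two smaller expressions.
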
Common informal approach to abstract syntax specification
Use a string CFG, interpreted as a tree grammar:
• Ignore keywords
• Labels and structures are left to the reader to decide
This shows the category and the components, which is sufficient for semantics.
Example: If-Expr ::= if Expr then Expr else Expr
This is the approach in the course.

A convention for abstract syntax
Use variables, declare them before the rules, and omit indices.
Example: [Figure: example of the convention]
A similar convention is often used for inductive definitions.