Chapter 3: Describing a Programming Language
Fall 2014, CS 1621

Introduction
• Language and sentence
  • Let Σ be a set of characters (the alphabet). A language over Σ is a set of strings of characters drawn from Σ.
  • Alphabet = English characters; Language = English sentences
  • Alphabet = ASCII; Language = C programs
• Each string in a language over Σ is a sentence of that language
• A language is a set of sentences
  • For the Java programming language, a Java program is a sentence
  • The Java programming language consists of all legal Java programs
  • Is this set infinite?
PITT CS 1621

The Compilation Process
• Source Program: IF (a<b) THEN c=1*d;
• Lexical Analyzer, producing the Token Sequence: IF ( ID “a” < ID “b” ) THEN ID “c” = CONST “1” * ID “d”
• Syntax Analyzer, producing the Syntax Tree: IF_stmt with children cond_expr (< over a and b) and list, which holds assign_stmt (lhs: c; rhs: * over 1 and d)
• Semantic Analyzer, producing 3-Address Code:
    GE a, b, L1
    MULT 1, d, c
    L1:
• Code Optimizer, producing Optimized 3-Addr. Code:
    GE a, b, L1
    MOV d, c
    L1:
• Code Generation, producing Assembly Code:
    loadi R1,a
    cmpi R1,b
    jge L1
    loadi R1,d
    storei R1,c
    L1:

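The lexical analysis step of this pipeline can be sketched in a few lines of Python. This is only an illustration: the category names (`KEYWORD`, `ID`, `CONST`, `OP`) and the function `tokenize` are assumptions made for the example, not part of the course material.

```python
import re

# Hypothetical token categories, loosely mirroring the slide's token sequence.
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:IF|THEN)\b"),   # reserved words
    ("ID",      r"[A-Za-z_]\w*"),      # identifiers such as a, b, c, d
    ("CONST",   r"\d+"),               # integer constants
    ("OP",      r"[<=*();]"),          # single-character operators and punctuation
    ("SKIP",    r"\s+"),               # whitespace, discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Turn a source string into a list of (category, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"illegal character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for token in tokenize("IF (a<b) THEN c=1*d;"):
    print(token)
```

Listing the keyword pattern before the identifier pattern is a design choice: it lets `IF` win over `ID` while the word-boundary anchors keep a name such as `IFx` an identifier.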
Describing a Programming Language
• Describing programming languages
  • It is not easy to give a description that is both concise and precise
• Three phases
  • Describing tokens
    • A token is a category of the lowest-level syntactic units over the alphabet, e.g. IF, ID “a”, THEN
  • Describing syntax
    • The structure of programs, expressions, statements, etc., e.g. an IF_stmt built from a cond_expr and a stmt
  • Describing semantics
    • The meaning of the program, expression, statement, etc., e.g. only the then-branch is executed if cond_expr evaluates to true

Regular Expression
• Definition:
  • The regular expressions over Σ are the smallest set of expressions including
    • ε
    • `c` where c ∈ Σ
    • A+B where A, B are REs over Σ
    • AB where A, B are REs over Σ
    • A* where A is an RE over Σ
• Each RE corresponds to a regular language
  • We use the two terms interchangeably

Examples
• Keywords: “else” or “if” or “while” or …
  • `else` + `if` + `while` + …
  • `else` abbreviates `e` `l` `s` `e`
  • keywords = { `else`, `if`, `then`, `while`, … }
• Integer:
  • digit = `0`+`1`+`2`+`3`+`4`+`5`+`6`+`7`+`8`+`9`
  • integer = digit digit*
  • Is `000` an integer?

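A minimal sketch of these two definitions with Python's `re` module. Note that the slides write union as `+`, whereas Python writes it as `|`; the helper names `is_integer` and `is_keyword` are made up for the example, and only the four keywords listed above are included.

```python
import re

DIGIT = r"[0-9]"                                  # digit = `0`+`1`+...+`9`
INTEGER = re.compile(rf"{DIGIT}{DIGIT}*")         # integer = digit digit*
KEYWORD = re.compile(r"else|if|then|while")       # union of the listed keywords


def is_integer(text):
    """True if the whole string is in the language of `integer`."""
    return INTEGER.fullmatch(text) is not None


def is_keyword(text):
    """True if the whole string is one of the listed keywords."""
    return KEYWORD.fullmatch(text) is not None


print(is_integer("000"))   # True: the definition allows leading zeros
print(is_integer(""))      # False: at least one digit is required
print(is_keyword("while")) # True
```

Under the definition digit digit*, `000` is an integer; excluding leading zeros would need a different RE.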
Examples Around
• Phone number: consider (412) 624-0000
  • Σ = digit ∪ { -, (, ) }
  • area = digit^3
  • exchange = digit^3
  • phone = digit^4
  • phone_number = `(` area `)` exchange `-` phone
• Email address: student@cs.pitt.edu
  • Σ = letter ∪ { ., @ }
  • name = letter+
  • address = name `@` name `.` name

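The two definitions above translate almost directly into Python regular expressions. This sketch makes two assumptions that go beyond the slide: it allows an optional space after the area code (the sample number contains one, the slide's RE does not), and it allows one or more `.name` groups after the `@` so that student@cs.pitt.edu is accepted.

```python
import re

# phone_number = '(' area ')' exchange '-' phone, with an optional space (assumption)
PHONE = re.compile(r"\([0-9]{3}\) ?[0-9]{3}-[0-9]{4}")

# address = name '@' name ('.' name)+, where name = letter+ (repetition is an assumption)
EMAIL = re.compile(r"[a-z]+@[a-z]+(?:\.[a-z]+)+")


def is_phone(text):
    return PHONE.fullmatch(text) is not None


def is_email(text):
    return EMAIL.fullmatch(text) is not None


print(is_phone("(412) 624-0000"))       # True
print(is_phone("412-624-0000"))         # False: parentheses are required
print(is_email("student@cs.pitt.edu"))  # True
```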
More Examples and Practice
• REs used in programming languages
  • The pattern is itself a string, which the language interprets as an RE
  • RE in Perl: if ($str =~ /(\d+)/ …
  • RE in C#: Match m = Regex.Match("abracadabra", "(a|b|r)+");
• Defined similarly to the formal definition
  • Regular expressions are used in C# and many other PLs

Practice:
• “^abc\d+\w{3}5?\S$” Meaning:
• Give an RE to describe email addresses
  • Legal ones: abc.abc@ab.c.ab.d.edu
• Try to write an RE that has a matching number of { and }
  • {}, {{}}, {{{}}}, …
  • Mission impossible!

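One way to read the first practice pattern is to spell it out piece by piece. The sketch below uses Python's verbose regex mode for that; the sample strings and the helper name `matches` are invented for illustration.

```python
import re

PRACTICE = re.compile(
    r"""
    ^abc     # the literal prefix abc at the start of the string
    \d+      # one or more digits
    \w{3}    # exactly three word characters (letter, digit, or underscore)
    5?       # an optional literal 5
    \S       # exactly one non-whitespace character
    $        # end of the string
    """,
    re.VERBOSE,
)


def matches(text):
    return PRACTICE.search(text) is not None


print(matches("abc1xyz5!"))  # True
print(matches("abc12"))      # False: too short for \d+ \w{3} \S
```

Because digits are also word characters, a string such as `abc12345` still matches: the engine backtracks until each piece has enough characters.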
Describing Language Syntax
• Context-free languages are used to describe language syntax
  • REs are not powerful enough
• Using BNF (Backus-Naur Form, 1959)
  • Invented to describe ALGOL 58
• BNF fundamentals
  • Two types of symbols
    • Non-terminal symbols: BNF abstractions
    • Terminal symbols: tokens
  • A set of grammar rules
    • LHS → RHS

BNF Rules
• A grammar is a finite nonempty set of rules
• A rule has a single symbol on its left-hand side (LHS) and may have more than one symbol on its right-hand side (RHS)
    <program> → <stmts>
    <stmts> → <stmt> | <stmt> ; <stmts>
    <stmt> → <var> = <expr>
    <var> → a | b | c | d
    <expr> → <term> + <term> | <term> - <term>
    <term> → <var> | const

From Rules to Sentences
• Derivation
  • A sentence can be generated by repeated application of rules, starting from the start symbol
  • The start symbol is always a non-terminal symbol
    <program> => <stmts>
              => <stmt>
              => <var> = <expr>
              => a = <expr>
              => a = <term> + <term>
              => a = <var> + <term>
              => a = b + <term>
              => a = b + const

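The derivation above can be replayed mechanically. The following sketch encodes the example grammar as a Python dictionary and always expands the leftmost nonterminal; the dictionary layout, the function name `leftmost_derivation`, and the idea of driving it with a list of alternative indices are assumptions made for illustration.

```python
# Each nonterminal maps to its list of alternatives (right-hand sides).
GRAMMAR = {
    "<program>": [["<stmts>"]],
    "<stmts>":   [["<stmt>"], ["<stmt>", ";", "<stmts>"]],
    "<stmt>":    [["<var>", "=", "<expr>"]],
    "<var>":     [["a"], ["b"], ["c"], ["d"]],
    "<expr>":    [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>":    [["<var>"], ["const"]],
}


def leftmost_derivation(choices, start="<program>"):
    """Apply one rule per entry in `choices`, always to the leftmost nonterminal."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Index of the leftmost nonterminal in the current sentential form.
        i = next(k for k, symbol in enumerate(form) if symbol in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        steps.append(" ".join(form))
    return steps


# Alternative indices that reproduce the derivation of: a = b + const
for step in leftmost_derivation([0, 0, 0, 0, 0, 0, 1, 1]):
    print("=>", step)
```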
Derivation
• Every string of symbols in the derivation is a sentential form
• A sentence is a sentential form that has only terminal symbols
• A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded
• A derivation may be neither leftmost nor rightmost

Parse Tree
• A hierarchical representation of a derivation
    <program>
      <stmts>
        <stmt>
          <var>
            a
          =
          <expr>
            <term>
              <var>
                b
            +
            <term>
              const

Examples
• Describing
  • {}, {{}}, {{{}}}, …
  • A programming language with infinitely many sentences: b, ab, aab, aaab, aaaab, …
• Use a recursive definition

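Both languages have short recursive grammars, and each rule maps onto a recursive function. The grammars in the comments below are one possible choice, not taken from the slides, and the function names are invented for the sketch.

```python
def nested(text):
    """Recognizer for  <N> -> '{' <N> '}' | '{' '}'   i.e. {}, {{}}, {{{}}}, ..."""
    if text == "{}":
        return True
    return (len(text) > 2 and text[0] == "{" and text[-1] == "}"
            and nested(text[1:-1]))


def a_star_b(text):
    """Recognizer for  <A> -> 'a' <A> | 'b'   i.e. b, ab, aab, aaab, ..."""
    if text == "b":
        return True
    return text[:1] == "a" and a_star_b(text[1:])


print(nested("{{{}}}"))   # True
print(nested("{{}"))      # False
print(a_star_b("aaab"))   # True
```

The second language is regular (it is `a*b`), while the first needs the recursion: the grammar counts braces in a way a regular expression cannot.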
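Ambiguous Grammar
    <expr> → <expr> <op> <expr> | const
    <op> → / | -
• The sentence const - const / const has two parse trees
  • Tree 1: the root <expr> <op> <expr> uses / as <op>; its left <expr> expands to const - const, i.e. (const - const) / const
  • Tree 2: the root <expr> <op> <expr> uses - as <op>; its right <expr> expands to const / const, i.e. const - (const / const)
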
Ambiguity
• A grammar is ambiguous if and only if it generates a sentential form that has two or more distinct parse trees
• We can rewrite the grammar to remove the ambiguity
    <expr> → <expr> - <term> | <term>
    <term> → <term> / const | const
• Ambiguity is a property of the grammar, not of the language
• The only parse tree for const - const / const:
    <expr>
      <expr>
        <term>
          const
      -
      <term>
        <term>
          const
        /
        const

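Why ambiguity matters becomes concrete once values are attached to the constants. In this sketch the two parse trees of const - const / const are written as nested tuples and evaluated; the tuple encoding and the sample values 8, 4, 2 are assumptions chosen for illustration.

```python
def evaluate(tree):
    """Evaluate a parse tree given as a number or a (left, op, right) tuple."""
    if isinstance(tree, (int, float)):
        return tree
    left, op, right = tree
    lhs, rhs = evaluate(left), evaluate(right)
    return lhs - rhs if op == "-" else lhs / rhs


tree_1 = ((8, "-", 4), "/", 2)   # (const - const) / const
tree_2 = (8, "-", (4, "/", 2))   # const - (const / const)

print(evaluate(tree_1))  # 2.0
print(evaluate(tree_2))  # 6.0
```

The same sentence yields two different results, so a compiler needs the grammar to select exactly one tree.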
Removing Ambiguity
• Introduce associativity and precedence
  • Ambiguous:
      <expr> → <expr> + <expr> | const
  • Unambiguous:
      <expr> → <expr> + <term> | <term>
      <term> → <term> / const | const

Extended BNF
• Optional parts are placed in brackets [ ]
    <proc_call> → ident [(<expr_list>)]
• Alternative parts of RHSs are placed inside parentheses and separated by vertical bars
    <term> → <term> (+|-) const
• Repetitions (0 or more) are placed inside braces { }
    <ident> → letter {letter|digit}

BNF and EBNF
• BNF
    <expr> → <expr> + <term> | <expr> - <term> | <term>
    <term> → <term> * <factor> | <term> / <factor> | <factor>
• EBNF
    <expr> → <term> {(+ | -) <term>}
    <term> → <factor> {(* | /) <factor>}

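The EBNF form maps naturally onto a recursive-descent parser: each `{ ... }` repetition becomes a `while` loop. The sketch below evaluates as it parses. Since the slides do not define `<factor>`, it is assumed here to be an unsigned integer constant; the class and function names are likewise made up for the example.

```python
import re


class Parser:
    """Recursive-descent evaluator for the EBNF rules <expr> and <term>."""

    def __init__(self, text):
        self.tokens = re.findall(r"\d+|[-+*/]", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def expr(self):
        # <expr> -> <term> {(+ | -) <term>}
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        # <term> -> <factor> {(* | /) <factor>}
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        # Assumption: <factor> is an unsigned integer constant.
        token = self.peek()
        if token is None or not token.isdigit():
            raise SyntaxError(f"expected a constant, got {token!r}")
        return int(self.advance())


def parse(text):
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


print(parse("2 + 3 * 4"))   # 14: * binds tighter than +
print(parse("10 - 3 - 2"))  # 5: the loop makes - left-associative
```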
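From Syntax to Semantics
• Context-free grammars are powerful, but are they powerful enough?
• Why not? e.g. a variable must be defined before it is used
    <program> → <def> <use>
    <def> → int x; | int y;
    <use> → x = 1; | y = 1;
  This grammar also accepts int x; y = 1;, which uses an undefined variable.
  or
    <program> → <def> <use>
    <def> → int x; | int y;
    int x; <use> → int x; x = 1;
    int y; <use> → int y; y = 1;
  These last two rules depend on the context to the left of <use>, so the grammar is no longer context free.
• Context-free grammars are not powerful enough!
• The C programming language is not context free!
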
Compilation
• Any input:
    int $g=0; void main ())) { printf(“hello, $$$!\n”); }
• Passes lexical analysis:
    g=0; void main ())) { printf(“hello, world!\n”); }}
• Passes syntax analysis (C programming language):
    g = 0; void main () { printf(“hello, world!\n”); }
• Passes semantic analysis (legitimate C program):
    int g=0; void main () { printf(“hello, world!\n”); }

An Incorrect Language X is Bigger
• Language X1 = { all strings } = {
    ` int 变量=0; void main() {f=0; printf “hello, world,\n”;} `,
    ` int g =0; void main()))) {f=0; printf “hello, world,\n”;} `,
    `int g =0; void main() { f=0; printf “hello, world,\n”;} `,
    `int f =0; void main() { f=0; printf “hello, world,\n”;} `,
    … }
• Language X2 = { token+ } = {
    ` int g =0; void main()))) { f=0; printf “hello, world,\n”;} `,
    `int g =0; void main() { f=0; printf “hello, world,\n”;} `,
    `int f =0; void main() { f=0; printf “hello, world,\n”;} `,
    … }

An Incorrect Language X is Bigger
• Language X3 = { generated from C-BNF } = {
    ` int g =0; void main() { f=0; printf “hello, world,\n”;} `,
    `int f =0; void main() { f=0; printf “hello, world,\n”;} `,
    … }
• Language C = { C programs } = {
    `int f =0; void main() { f=0; printf “hello, world,\n”;} `,
    … }

Describing Language Semantics
• Describing static semantics
  • Attribute grammar
• Describing dynamic semantics
  • Operational semantics
  • Axiomatic semantics
  • Denotational semantics

Attribute Grammars
• Additions to CFGs that carry some semantic information along parse trees
  • CFG: context-free grammar
    • The one we used to describe the syntax structure
    • A set of rules and two types of symbols
• Values:
  • Static semantics specification
  • Compiler design (static semantics checking)

Example
    Context-free grammar      Semantic actions (over attributes)
    E → T + E1                { E.val = T.val + E1.val }
    E → T                     { E.val = T.val }
    T → int * T1              { T.val = int.val * T1.val }
    T → int                   { T.val = int.val }

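A minimal sketch of how the synthesized attribute `val` can be computed while parsing this grammar. Each function returns the `val` of its nonterminal together with the next input position. The whitespace-separated input format and the names `parse_E`, `parse_T`, and `val` are assumptions made for the example.

```python
def parse_T(tokens, i):
    """T -> int * T1 | int ; returns (T.val, next position)."""
    if i >= len(tokens) or not tokens[i].isdigit():
        raise SyntaxError("expected an integer")
    int_val = int(tokens[i])                  # int.val comes from the lexer
    i += 1
    if i < len(tokens) and tokens[i] == "*":
        t1_val, i = parse_T(tokens, i + 1)
        return int_val * t1_val, i            # T.val = int.val * T1.val
    return int_val, i                         # T.val = int.val


def parse_E(tokens, i):
    """E -> T + E1 | T ; returns (E.val, next position)."""
    t_val, i = parse_T(tokens, i)
    if i < len(tokens) and tokens[i] == "+":
        e1_val, i = parse_E(tokens, i + 1)
        return t_val + e1_val, i              # E.val = T.val + E1.val
    return t_val, i                           # E.val = T.val


def val(text):
    tokens = text.split()
    result, i = parse_E(tokens, 0)
    if i != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[i]!r}")
    return result


print(val("2 * 3 + 4"))      # 10
print(val("1 + 2 * 3 * 4"))  # 25
```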
Example:
• T has the synthesized attribute “type”
• L has the inherited attribute “in”
    D → T L         { L.in = T.type }
    T → int         { T.type = integer }
    T → real        { T.type = real }
    L → L1 , id     { L1.in = L.in; addtype(id.entry, L.in) }
    L → id          { addtype(id.entry, L.in) }
• Figure: parse tree for a declaration of type int with three identifiers; dashed edges show dependencies
  • D has children T and L1, with T.type = int and L1.in = T.type
  • L1 has children L2 , id, with L2.in = L1.in and addtype(id.entry, L1.in)
  • L2 has children L3 , id, with L3.in = L2.in and addtype(id.entry, L2.in)
  • L3 has child id, with addtype(id.entry, L3.in)

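The flow of the inherited attribute can be sketched by passing it down as a function parameter. In this illustration the symbol table is a plain dictionary standing in for `addtype`, and the function names and the input format (`"real x, y, z"`) are assumptions.

```python
def T(type_token):
    """T -> int | real ; returns the synthesized attribute T.type."""
    return {"int": "integer", "real": "real"}[type_token]


def L(identifiers, inherited_type, table):
    """L -> L1 , id | id ; `inherited_type` plays the role of L.in."""
    if len(identifiers) > 1:
        L(identifiers[:-1], inherited_type, table)   # L1.in = L.in
    table[identifiers[-1]] = inherited_type          # addtype(id.entry, L.in)


def D(declaration):
    """D -> T L ; returns the symbol table built by the semantic actions."""
    type_token, rest = declaration.split(maxsplit=1)
    identifiers = [name.strip() for name in rest.split(",")]
    table = {}
    L(identifiers, T(type_token), table)             # L.in = T.type
    return table


print(D("real x, y, z"))
```

The type is computed once at `T` and then travels down the `L` chain, which is the top-down flow that distinguishes inherited from synthesized attributes.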
Definition
• Attribute grammar
  • Attributes:
    • For each grammar symbol x there is a set A(x) of attribute values
  • Semantic actions:
    • Each rule has a set of functions that define certain attributes of the nonterminals in the rule
• Two types of attributes
  • Synthesized attributes
  • Inherited attributes
• How are they computed on the parse tree?

Two Types of Attributes
• Figure: a node P with children S1, S2, S3, S4 carrying attributes c1, c2, c3, c4
• Synthesized attributes: values are computed from those of the child nodes
    Synthesized attribute of P = f(c1, c2, c3, c4)
• Inherited attributes: values are computed from attributes of the siblings and the parent of the node
    Inherited attribute of S4 = f(P, S1, S2, S3)
• Semantic rules create dependencies between attributes
  • Represent the dependencies as a graph
  • From the graph, derive an evaluation order for the semantic rules

Example
• Terminal symbols have synthesized attributes only
• The start symbol is assumed not to have any inherited attributes
• Synthesized and inherited attributes are naturally computed bottom-up and top-down, respectively
• In practice, we usually use only one evaluation order
  • Attributes are converted accordingly
• For the rule A → D E F:
  • b is a synthesized attribute of A if it is computed from the ci, which are attributes of D, E, F
  • b is an inherited attribute of D if it is computed from the ci, which are attributes of A, E, F

Dynamic Semantics
• I: Operational semantics
  • Describe the meaning of a program by executing its statements on a machine, either simulated or actual. The change in the state of the machine (memory, registers, etc.) defines the meaning of the statement
• Example
  • C statement:
      for (expr1; expr2; expr3) S
  • Operational semantics:
      expr1;
      Label_1: if expr2 == 0 goto Label_2
      S
      expr3;
      goto Label_1
      Label_2:

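The lowering of the `for` statement can be mimicked in Python, with the machine state modeled as a dictionary. This is a sketch under assumptions: the four parts of the loop are passed in as functions on the state, and `run_for` is a name invented here; the comments tie each line back to the labeled code above.

```python
def run_for(expr1, expr2, body, expr3, state):
    """Execute for(expr1; expr2; expr3) body by following the lowered code."""
    expr1(state)                 # expr1;
    while True:                  # Label_1:
        if not expr2(state):     #   if expr2 == 0 goto Label_2
            break
        body(state)              #   S
        expr3(state)             #   expr3;
                                 #   goto Label_1
    return state                 # Label_2:


# Corresponds to: for (i = 0; i < 3; i = i + 1) total = total + i;
final = run_for(
    lambda s: s.update(i=0),
    lambda s: s["i"] < 3,
    lambda s: s.update(total=s["total"] + s["i"]),
    lambda s: s.update(i=s["i"] + 1),
    {"total": 0},
)
print(final)
```

The meaning of the statement is exactly the change it makes to `state`, which is the operational view described above.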
Discussion
• Describing operational semantics requires a real or virtual machine
  • Interpreting the operation using instructions from a real computer
    • Machine-dependent
    • May not be easy to understand
  • Interpreting using instructions from an idealized computer
    • A definition of the lower-level computer is required
    • A translator is required
• Evaluation
  • Good if used informally
  • Extremely complex if used formally (e.g., VDL, which was used to describe the semantics of PL/I)

II: Axiomatic Semantics
• Origin
  • Based on formal logic (predicate calculus)
  • Devised for formal program verification
• Form
  • Assertions
    • Pre- and post-conditions: {P} statement {Q}
        {b > 0} a = b + 1 {a > 1}
  • Axioms
  • Inference rules: how to interpret …

Pre- and Post-Condition
• Let us look at a simple axiom for the assignment “X = E”
  • Work backward from the post-condition Q to find the weakest pre-condition P
      P = Q[X→E], i.e. Q with every occurrence of X replaced by E
  • Example: {?} a = b/2-1 {a<10}
• A more complicated example
    {?}
    a=3;
    if (a>2) a=b+1;
    else a=a+1;
    {a>10}

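Both exercises can be worked out by substitution. The derivation below is a sketch that assumes `/` is exact division (not C integer division) and uses the standard weakest-precondition rule for conditionals, which is not stated on the slides; `wp` is shorthand introduced here for "weakest pre-condition".

```latex
\begin{align*}
wp(a = b/2 - 1,\; a < 10)
  &= (a < 10)[a \to b/2 - 1] \\
  &= b/2 - 1 < 10 \\
  &\iff b < 22 \\[1ex]
wp(\texttt{if (a>2) a=b+1; else a=a+1;},\; a > 10)
  &= (a > 2 \land b + 1 > 10) \lor (a \le 2 \land a + 1 > 10) \\
  &= (a > 2 \land b > 9) \lor (a \le 2 \land a > 9) \\
  &= a > 2 \land b > 9 \\[1ex]
wp(a = 3,\; a > 2 \land b > 9)
  &= 3 > 2 \land b > 9 \\
  &= b > 9
\end{align*}
```

So the first pre-condition is {b < 22} and the second is {b > 9}: the else-branch can never establish a > 10 when a ≤ 2, and after a = 3 only the then-branch is reachable.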
More axioms
• An axiom for assignment statements (x = E):
    {Q[x→E]} x = E {Q}
• The Rule of Consequence:
    from {P} S {Q}, P' ⇒ P, and Q ⇒ Q', infer {P'} S {Q'}
• An inference rule for sequences:
    from {P1} S1 {P2} and {P2} S2 {P3}, infer {P1} S1; S2 {P3}
• Loop?
• Function call?
• Input?

Evaluation of Axiomatic Semantics
• Developing axioms or inference rules for all of the statements in a language is difficult
• Good for correctness proofs and for reasoning about programs
• Of limited use to language designers and compiler writers

III: Denotational Semantics
• Based on recursive function theory
• The most abstract semantics description method
• Originally developed by Scott and Strachey (1970)

Building Denotational Specification
• The state of a program is the set of current values of all its variables
    s = {<i1, v1>, <i2, v2>, …, <in, vn>}
• Let VARMAP be a function that, when given a variable name and a state, returns the current value of the variable
    VARMAP(ij, s) = vj

Building Denotational Specification
• The process of building a denotational specification for a language
  • Define a mathematical object for each language entity
  • Define a function that maps instances of the language entities onto instances of the corresponding mathematical objects
• The meaning of a language construct is defined only by the values of the program's variables
  • In denotational semantics, the state changes are defined by rigorous mathematical functions
  • In operational semantics, the state changes are defined by coded algorithms

Example
    Me(<expr>, s) = case <expr> of
      <dec_num> => Mdec(<dec_num>, s)
      <var> =>
        if VARMAP(<var>, s) == undef
          then error
          else VARMAP(<var>, s)
      <binary_expr> =>
        if (Me(<binary_expr>.<left_expr>, s) == undef OR
            Me(<binary_expr>.<right_expr>, s) == undef)
          then error
        else if (<binary_expr>.<operator> == ‘+’)
          then Me(<binary_expr>.<left_expr>, s) + Me(<binary_expr>.<right_expr>, s)
          else Me(<binary_expr>.<left_expr>, s) * Me(<binary_expr>.<right_expr>, s)
      ...

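The mapping function above can be transcribed into Python almost line by line. This sketch makes several assumptions: expressions are encoded as tagged tuples, the state is a dictionary, `error` is a sentinel object, and an error in either operand is propagated (the slide tests for `undef` at that point). The names `m_e` and `varmap` are hypothetical stand-ins for Me and VARMAP.

```python
UNDEF = None          # value of a variable that has no entry in the state
ERROR = object()      # sentinel standing in for the slide's `error`


def varmap(name, state):
    """VARMAP(i, s): current value of variable `name` in state `state`."""
    return state.get(name, UNDEF)


def m_e(expr, state):
    """Me(<expr>, s) for expressions encoded as tagged tuples.

    ("num", "42")               decimal literal
    ("var", "x")                variable reference
    ("bin", op, left, right)    binary expression, op is "+" or "*"
    """
    kind = expr[0]
    if kind == "num":
        return int(expr[1])                      # Mdec(<dec_num>, s)
    if kind == "var":
        value = varmap(expr[1], state)
        return ERROR if value is UNDEF else value
    _, op, left, right = expr
    lhs, rhs = m_e(left, state), m_e(right, state)
    if lhs is ERROR or rhs is ERROR:             # propagate errors from operands
        return ERROR
    return lhs + rhs if op == "+" else lhs * rhs


state = {"x": 3}
print(m_e(("bin", "+", ("var", "x"), ("num", "2")), state))    # 5
print(m_e(("bin", "*", ("var", "y"), ("num", "2")), state) is ERROR)  # True
```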
Evaluation
• Can be used to prove the correctness of programs
• Provides a rigorous way to think about programs
• Can be an aid to language design
• Has been used in compiler generation systems
• Because of its complexity, it is of little use to language users

Summary
• Describing programming languages
  • Describing tokens
    • Using REs
  • Describing syntax
    • Using BNF/CFG
  • Describing semantics
    • Static semantics
      • Using attribute grammars
    • Dynamic semantics
      • No widely accepted approach
      • Three possible ways: operational, axiomatic, denotational