
Chapter 3 Describing a Programming Language


odette-bird

Presentation Transcript


  1. Chapter 3: Describing a Programming Language (Fall 2014, CS 1621)

  2. Introduction
     • Language and sentence: let Σ be a set of characters (an alphabet). A language over Σ is a set of strings of characters drawn from Σ.
        • Alphabet = English characters; language = English sentences
        • Alphabet = ASCII; language = C programs
     • Each string in a language is a sentence, so a language is a set of sentences
     • For the Java programming language, a Java program is a sentence, and the language consists of all legal Java programs
     • Is this set infinite?
     PITT CS 1621

  3. The Compilation Process
     • Source program: IF (a<b) THEN c=1*d;
     • Lexical analyzer → token sequence: IF ( ID "a" < ID "b" ) THEN ID "c" = CONST "1" * ID "d"
     • Syntax analyzer → syntax tree: an IF_stmt whose children are a cond_expr (a < b) and a list holding an assign_stmt with lhs c and rhs 1 * d
     • Semantic analyzer → 3-address code: GE a, b, L1; MULT 1, d, c; L1:
     • Code optimizer → optimized 3-address code: GE a, b, L1; MOV d, c; L1:
     • Code generation → assembly code: loadi R1,a; cmpi R1,b; jge L1; loadi R1,d; storei R1,c; L1:
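The lexical-analysis step above can be sketched in a few lines of Python. This is a minimal sketch, not the course's analyzer: the token classes (identifiers with IF and THEN reserved as keywords, integer constants, and the single-character operators `( ) < = * ;`) are assumptions chosen to cover this one statement, and `tokenize` and `show` are hypothetical names. It emits every token, including the closing `)` and the trailing `;`.

```python
import re

# Hypothetical token classes for the toy statement on the slide.
KEYWORDS = {"IF", "THEN"}
WHITESPACE = re.compile(r"\s*")
TOKEN_RE = re.compile(r"(?P<ID>[A-Za-z]\w*)|(?P<CONST>[0-9]+)|(?P<OP>[()<=*;])")


def tokenize(src):
    """Lexical analysis: turn a character string into (kind, value) tokens."""
    tokens = []
    pos = 0
    while True:
        pos = WHITESPACE.match(src, pos).end()   # skip blanks between tokens
        if pos == len(src):
            return tokens
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
        pos = m.end()
        if m.lastgroup == "ID":
            word = m.group("ID")
            # A keyword is its own token kind; any other word becomes an ID token.
            tokens.append((word, None) if word in KEYWORDS else ("ID", word))
        elif m.lastgroup == "CONST":
            tokens.append(("CONST", m.group("CONST")))
        else:
            tokens.append((m.group("OP"), None))


def show(tokens):
    """Render tokens in the slide's notation, e.g. ID "a"."""
    return " ".join(kind if value is None else f'{kind} "{value}"' for kind, value in tokens)


print(show(tokenize("IF (a<b) THEN c=1*d;")))
```

Matching a whole word first and only then checking the keyword set is a common design choice: it keeps `IFx` a single ID token instead of splitting it into IF and x.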

  4. Describing a Programming Language
     • A concise yet precise description of a programming language is hard to give, so it is split into three phases:
     • Describing tokens: a token is a category of lowest-level syntactic units over the alphabet, e.g. IF, ID "a", THEN
     • Describing syntax: the structure of programs, expressions, statements, etc., e.g. an IF_stmt built from IF, a cond_expr, THEN and a stmt
     • Describing semantics: the meaning of programs, expressions, statements, etc., e.g. the then-branch is executed only if cond_expr evaluates to true

  5. Regular Expression
     • Definition: the regular expressions (REs) over Σ are the smallest set of expressions including
        • ε (the empty string)
        • `c` where c ∈ Σ
        • A + B where A, B are REs over Σ (union)
        • AB where A, B are REs over Σ (concatenation)
        • A* where A is an RE over Σ (zero or more repetitions)
     • Each RE corresponds to a regular language; we use the two interchangeably
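The inductive definition translates almost directly into code. The sketch below (hypothetical class names `Eps`, `Chr`, `Alt`, `Cat`, `Star`) gives one constructor per clause and decides membership by computing, for each sub-expression, the set of positions where a match can end. It illustrates the definition only; it is not how production regex engines work.

```python
from dataclasses import dataclass


# One class per clause of the inductive definition of REs.
@dataclass(frozen=True)
class Eps:            # ε: matches only the empty string
    pass


@dataclass(frozen=True)
class Chr:            # `c` for a single character c in the alphabet
    c: str


@dataclass(frozen=True)
class Alt:            # A + B: union
    a: object
    b: object


@dataclass(frozen=True)
class Cat:            # AB: concatenation
    a: object
    b: object


@dataclass(frozen=True)
class Star:           # A*: zero or more repetitions
    a: object


def ends(r, s, i):
    """Set of positions j such that r matches exactly s[i:j]."""
    if isinstance(r, Eps):
        return {i}
    if isinstance(r, Chr):
        return {i + 1} if i < len(s) and s[i] == r.c else set()
    if isinstance(r, Alt):
        return ends(r.a, s, i) | ends(r.b, s, i)
    if isinstance(r, Cat):
        return {k for j in ends(r.a, s, i) for k in ends(r.b, s, j)}
    if isinstance(r, Star):
        reached, frontier = {i}, {i}
        while frontier:   # keep applying A until no new end position appears
            frontier = {k for j in frontier for k in ends(r.a, s, j)} - reached
            reached |= frontier
        return reached
    raise TypeError(f"not a regular expression: {r!r}")


def matches(r, s):
    """True if the whole string s is in the language of r."""
    return len(s) in ends(r, s, 0)


# a(a+b)*: strings over {a, b} that start with a
example = Cat(Chr("a"), Star(Alt(Chr("a"), Chr("b"))))
print([w for w in ["a", "abba", "ba", ""] if matches(example, w)])  # ['a', 'abba']
```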

  6. Examples
     • Keywords: "else" or "if" or "while" or …
        • `else` + `if` + `while` + …
        • `else` abbreviates `e` `l` `s` `e`
        • keywords = { `else`, `if`, `then`, `while`, … }
     • Integer:
        • digit = `0`+`1`+`2`+`3`+`4`+`5`+`6`+`7`+`8`+`9`
        • integer = digit digit*
        • Is `000` an integer?
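The same two definitions in Python's `re` module, which writes the union `+` as `|`. Matching uses `fullmatch`, because a string belongs to the language only if the whole string matches. Under integer = digit digit*, `000` is accepted, since nothing restricts leading zeros; the third pattern is a hypothetical stricter alternative, not part of the slide.

```python
import re

# keywords = `else` + `if` + `then` + `while`  (the slide's + is written | here)
KEYWORD = re.compile(r"else|if|then|while")
# digit = `0`+`1`+...+`9`;  integer = digit digit*
INTEGER = re.compile(r"[0-9][0-9]*")
# Hypothetical stricter variant that rejects leading zeros: `0` + (`1`+...+`9`) digit*
INTEGER_NO_LEADING_ZERO = re.compile(r"0|[1-9][0-9]*")


def in_language(pattern, s):
    """The whole string must match, as in the RE definition."""
    return pattern.fullmatch(s) is not None


print(in_language(INTEGER, "000"))                  # True
print(in_language(INTEGER_NO_LEADING_ZERO, "000"))  # False
```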

  7. Examples Around
     • Phone number: consider (412) 624-0000
        • Σ = digit ∪ { -, (, ), space }
        • area = digit³
        • exchange = digit³
        • phone = digit⁴
        • phone_number = `(` area `)` ` ` exchange `-` phone
     • Email address: student@cs.pitt.edu
        • Σ = letter ∪ { ., @ }
        • name = letter+ (one or more letters)
        • address = name `@` name (`.` name)+
     Pitt CS 2210
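A sketch of both definitions with Python's `re`, assuming `letter` means the ASCII letters A-Z and a-z. It includes the blank that appears after `)` in the sample number, and it allows one or more `.name` parts after `@` so that student@cs.pitt.edu, which has two dots, is accepted.

```python
import re

# phone_number = `(` area `)` ` ` exchange `-` phone
#   area = digit^3, exchange = digit^3, phone = digit^4
PHONE = re.compile(r"\([0-9]{3}\) [0-9]{3}-[0-9]{4}")

# address = name `@` name (`.` name)+,  name = letter+  (ASCII letters assumed)
EMAIL = re.compile(r"[A-Za-z]+@[A-Za-z]+(?:\.[A-Za-z]+)+")


def accepts(pattern, s):
    return pattern.fullmatch(s) is not None


print(accepts(PHONE, "(412) 624-0000"))         # True
print(accepts(EMAIL, "student@cs.pitt.edu"))    # True
print(accepts(EMAIL, "abc.abc@ab.c.ab.d.edu"))  # False: no '.' allowed before '@'
```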

  8. More Examples and Practice
     • REs used in programming languages
        • The RE is written as an ordinary string, which the language interprets as an RE
        • RE in Perl: if ($str =~ /(\d+)/) { … }
        • RE in C#: Match m = Regex.Match("abracadabra", "(a|b|r)+");
     • These REs are defined similarly to the formal ones; C# and many other PLs use them


  10. Practice
     • What does "^abc\d+\w{3}5?\S$" mean?
     • Give an RE that describes email addresses; legal ones include abc.abc@ab.c.ab.d.edu
     • Try to write an RE for strings with matching numbers of nested { and }: {}, {{}}, {{{}}}, …
        • Mission impossible!!! (an RE cannot count to an arbitrary depth)
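One way to check the reading of the practice RE, using Python's `re` (where `\w` is a word character and `\S` a non-whitespace character). The function `nested_braces` is a hypothetical recognizer for {}, {{}}, {{{}}}, …: it works only because it keeps an unbounded counter. A finite automaton, and hence a true RE, has finitely many states and cannot remember an arbitrary nesting depth (the standard pumping-lemma argument), which is why the exercise is impossible.

```python
import re

# ^ start, "abc", \d+ one or more digits, \w{3} exactly three word characters
# ([A-Za-z0-9_] for ASCII input), 5? an optional '5', \S one non-whitespace character, $ end
PRACTICE = re.compile(r"^abc\d+\w{3}5?\S$")


def nested_braces(s):
    """Recognize n '{' followed by n '}', for n >= 1.

    The unbounded counter `depth` is exactly what a finite automaton lacks.
    """
    depth = 0
    while depth < len(s) and s[depth] == "{":
        depth += 1
    return depth >= 1 and s[depth:] == "}" * depth


for w in ["abc12xyz5!", "abc1xyz!", "abcxyz1!"]:
    print(w, bool(PRACTICE.match(w)))   # True, True, False
```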

  11. Describing Language Syntax
     • Context-free languages are used to describe language syntax, because REs are not powerful enough
     • Notation: BNF (Backus-Naur Form, 1959)
        • Invented to describe ALGOL 58
     • BNF fundamentals
        • Two types of symbols
           • Non-terminal symbols: BNF abstractions
           • Terminal symbols: tokens
        • A set of grammar rules of the form LHS → RHS

  12. BNF Rules
     • A grammar is a finite nonempty set of rules
     • A rule has one left-hand side (LHS) symbol and a right-hand side (RHS) that is a sequence of symbols; alternative RHSs are separated by |
       <program> → <stmts>
       <stmts> → <stmt> | <stmt> ; <stmts>
       <stmt> → <var> = <expr>
       <var> → a | b | c | d
       <expr> → <term> + <term> | <term> - <term>
       <term> → <var> | const

  13. From Rules to Sentences
     • Derivation: a sentence is generated by repeated application of rules, starting from the start symbol
     • The start symbol is always a non-terminal symbol
       <program> => <stmts>
                 => <stmt>
                 => <var> = <expr>
                 => a = <expr>
                 => a = <term> + <term>
                 => a = <var> + <term>
                 => a = b + <term>
                 => a = b + const
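The derivation can be replayed mechanically. In this sketch the grammar from the BNF Rules slide is stored as a dictionary, and `leftmost_derivation` (a hypothetical helper) always rewrites the leftmost nonterminal using a list of alternative indices chosen by the caller; the choices below reproduce the derivation of a = b + const step by step.

```python
# Grammar from the BNF Rules slide: nonterminal -> list of alternatives.
GRAMMAR = {
    "<program>": [["<stmts>"]],
    "<stmts>": [["<stmt>"], ["<stmt>", ";", "<stmts>"]],
    "<stmt>": [["<var>", "=", "<expr>"]],
    "<var>": [["a"], ["b"], ["c"], ["d"]],
    "<expr>": [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>": [["<var>"], ["const"]],
}


def leftmost_derivation(choices, start="<program>"):
    """Apply one rule per step to the leftmost nonterminal.

    choices[k] is the index of the alternative used at step k.
    Returns every sentential form, starting with the start symbol.
    """
    form = [start]
    forms = [" ".join(form)]
    for choice in choices:
        i = next((k for k, sym in enumerate(form) if sym in GRAMMAR), None)
        if i is None:
            raise ValueError("no nonterminal left: the form is already a sentence")
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        forms.append(" ".join(form))
    return forms


steps = leftmost_derivation([0, 0, 0, 0, 0, 0, 1, 1])
print("\n=> ".join(steps))
```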

  14. Derivation
     • Every string of symbols in a derivation is a sentential form
     • A sentence is a sentential form that has only terminal symbols
     • A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded
     • A rightmost derivation likewise always expands the rightmost nonterminal
     • A derivation may be neither leftmost nor rightmost

  15. Parse Tree
     • A hierarchical representation of a derivation; for a = b + const:
       <program>
         <stmts>
           <stmt>
             <var>
               a
             =
             <expr>
               <term>
                 <var>
                   b
               +
               <term>
                 const
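A recursive-descent parser for the grammar from the BNF Rules slide builds this tree directly, with one function per nonterminal. This is a minimal sketch, assuming the input is already split into space-separated tokens and representing each tree node as a tuple `(label, child, ...)`; `parse_program` and `render` are hypothetical names.

```python
def parse_program(tokens):
    """Recursive descent for:
    <program> -> <stmts>
    <stmts>   -> <stmt> | <stmt> ; <stmts>
    <stmt>    -> <var> = <expr>
    <var>     -> a | b | c | d
    <expr>    -> <term> + <term> | <term> - <term>
    <term>    -> <var> | const
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected):
        nonlocal pos
        if peek() not in expected:
            raise SyntaxError(f"expected one of {expected}, got {peek()!r}")
        pos += 1
        return tokens[pos - 1]

    def var():
        return ("<var>", take(("a", "b", "c", "d")))

    def term():
        if peek() == "const":
            return ("<term>", take(("const",)))
        return ("<term>", var())

    def expr():
        left = term()
        op = take(("+", "-"))
        return ("<expr>", left, op, term())

    def stmt():
        lhs = var()
        eq = take(("=",))
        return ("<stmt>", lhs, eq, expr())

    def stmts():
        first = stmt()
        if peek() == ";":
            return ("<stmts>", first, take((";",)), stmts())
        return ("<stmts>", first)

    tree = ("<program>", stmts())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


def render(node, depth=0):
    """Indented text rendering: children appear below their parent, one level deeper."""
    if isinstance(node, str):
        return ["  " * depth + node]
    label, *children = node
    lines = ["  " * depth + label]
    for child in children:
        lines += render(child, depth + 1)
    return lines


tree = parse_program("a = b + const".split())
print("\n".join(render(tree)))
```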

  16. Examples
     • Describe with a grammar:
        • {}, {{}}, {{{}}}, …
        • A language with infinitely many sentences: b, ab, aab, aaab, aaaab, …
     • Use a recursive definition
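One possible pair of recursive grammars (hypothetical names `<B>` and `<S>`): <B> → { } | { <B> } for the braces and <S> → b | a <S> for b, ab, aab, …. Each recursive rule turns directly into a recursive function. Note that the second language is also described by the RE a*b, while the first is not regular at all.

```python
def braces(n):
    """Sentence with nesting depth n >= 1 from  <B> -> { } | { <B> }."""
    return "{}" if n == 1 else "{" + braces(n - 1) + "}"


def a_n_b(n):
    """Sentence with n a's from  <S> -> b | a <S>."""
    return "b" if n == 0 else "a" + a_n_b(n - 1)


def is_braces(s):
    """Membership test that follows the same recursive rule for <B>."""
    if s == "{}":
        return True
    return len(s) >= 4 and s[0] == "{" and s[-1] == "}" and is_braces(s[1:-1])


print([braces(n) for n in range(1, 4)])  # ['{}', '{{}}', '{{{}}}']
print([a_n_b(n) for n in range(5)])      # ['b', 'ab', 'aab', 'aaab', 'aaaab']
```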

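  17. Ambiguous Grammar
       <expr> → <expr> <op> <expr> | const
       <op> → / | -
     • The sentence const - const / const has two distinct parse trees:
        • the root applies <expr> → <expr> <op> <expr> with <op> = /, and the left <expr> derives const - const, i.e. (const - const) / const
        • the root applies <expr> → <expr> <op> <expr> with <op> = -, and the right <expr> derives const / const, i.e. const - (const / const)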
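The two parse trees can be found by brute force: try every operator as the root and recursively parse both sides. In this sketch the constants are given the hypothetical values 8, 4 and 2 so that the two trees can be evaluated; the different results show why ambiguity matters for meaning.

```python
from fractions import Fraction


def parses(tokens):
    """All parse trees of `tokens` under  <expr> -> <expr> <op> <expr> | const,  <op> -> / | -.

    A const is an int; a tree is an int or a (left, op, right) tuple.
    """
    if len(tokens) == 1:
        return [tokens[0]] if isinstance(tokens[0], int) else []
    trees = []
    for i, op in enumerate(tokens):
        if op in ("-", "/"):   # try every operator as the root of the tree
            for left in parses(tokens[:i]):
                for right in parses(tokens[i + 1:]):
                    trees.append((left, op, right))
    return trees


def evaluate(tree):
    """Exact arithmetic, so that / does not introduce rounding."""
    if isinstance(tree, int):
        return Fraction(tree)
    left, op, right = tree
    a, b = evaluate(left), evaluate(right)
    return a - b if op == "-" else a / b


sentence = [8, "-", 4, "/", 2]    # const - const / const with hypothetical values
for tree in parses(sentence):
    print(tree, "=", evaluate(tree))
```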

  18. Ambiguity
     • A grammar is ambiguous if and only if it generates a sentential form that has two or more distinct parse trees
     • We can rewrite the grammar to remove the ambiguity:
       <expr> → <expr> - <term> | <term>
       <term> → <term> / const | const
     • Ambiguity is a property of the grammar, not of the language: the same language can have both ambiguous and unambiguous grammars
     • With the rewritten grammar, const - const / const has exactly one parse tree:
       <expr>
         <expr>
           <term>
             const
         -
         <term>
           <term>
             const
           /
           const

  19. Removing Ambiguity
     • Introduce associativity and precedence
     • Ambiguous: <expr> → <expr> + <expr> | const
     • Unambiguous:
       <expr> → <expr> + <term> | <term>
       <term> → <term> / const | const

  20. Extended BNF
     • Optional parts are placed in brackets [ ]
       <proc_call> → ident [(<expr_list>)]
     • Alternative parts of RHSs are placed inside parentheses and separated by vertical bars
       <term> → <term> (+ | -) const
     • Repetitions (0 or more) are placed inside braces { }
       <ident> → letter {letter | digit}

  21. BNF and EBNF
     • BNF
       <expr> → <expr> + <term> | <expr> - <term> | <term>
       <term> → <term> * <factor> | <term> / <factor> | <factor>
     • EBNF
       <expr> → <term> {(+ | -) <term>}
       <term> → <factor> {(* | /) <factor>}
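The EBNF form maps naturally onto recursive descent: each { ... } repetition becomes a `while` loop, and accumulating the value inside the loop makes `-` and `/` left-associative, matching the left-recursive BNF. The slide does not define `<factor>`; this sketch assumes <factor> → const | ( <expr> ) with integer constants, and evaluates with exact fractions.

```python
import re
from fractions import Fraction


def evaluate(text):
    """Recursive descent for the EBNF on the slide (with an assumed <factor> rule)."""
    if re.search(r"[^0-9+\-*/()\s]", text):
        raise SyntaxError("unexpected character")
    tokens = re.findall(r"[0-9]+|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():  # <expr> -> <term> {(+ | -) <term>}
        value = term()
        while peek() in ("+", "-"):
            op = advance()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs  # left to right
        return value

    def term():  # <term> -> <factor> {(* | /) <factor>}
        value = factor()
        while peek() in ("*", "/"):
            op = advance()
            rhs = factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor():  # assumed: <factor> -> const | ( <expr> )
        tok = peek()
        if tok == "(":
            advance()
            value = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        if tok is not None and tok.isdigit():
            return Fraction(advance())
        raise SyntaxError(f"unexpected token {tok!r}")

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result


print(evaluate("8 - 4 - 2"), evaluate("8 - 4 / 2"), evaluate("(8 - 4) / 2"))  # 2 6 2
```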

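  22. From Syntax to Semantics
     • Context-free grammars are powerful, but are they powerful enough? Why not? e.g. a variable must be defined before use
     • Attempt with a context-free grammar:
       <program> → <def> <use>
       <def> → int x; | int y;
       <use> → x = 1; | y = 1;
       This grammar also generates int x; y = 1;, which uses y without defining it
     • Or, with rules whose LHS carries context:
       <program> → <def> <use>
       <def> → int x; | int y;
       int x; <use> → int x; x = 1;
       int y; <use> → int y; y = 1;
       These rules are no longer context free
     • Context-free grammars are not powerful enough!!!
     • The C programming language is not context free!!!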
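Compilers usually leave define-before-use out of the grammar and check it during semantic analysis with a symbol table. A minimal sketch for the two statement forms in the toy grammar (`int x;` and `x = 1;`); the regular expressions and the function name are assumptions for illustration.

```python
import re

DECL = re.compile(r"int\s+([A-Za-z_]\w*)\s*;")            # int x;
ASSIGN = re.compile(r"([A-Za-z_]\w*)\s*=\s*[0-9]+\s*;")   # x = 1;


def check_def_before_use(stmts):
    """Return one error message per variable used before its definition."""
    declared = set()          # the symbol table
    errors = []
    for stmt in stmts:
        decl = DECL.fullmatch(stmt)
        use = ASSIGN.fullmatch(stmt)
        if decl:
            declared.add(decl.group(1))
        elif use:
            if use.group(1) not in declared:
                errors.append(f"{use.group(1)!r} used before definition")
        else:
            raise SyntaxError(f"not in the toy language: {stmt!r}")
    return errors


# Both programs are generated by the context-free attempt; only the first is legal.
print(check_def_before_use(["int x;", "x = 1;"]))   # []
print(check_def_before_use(["int x;", "y = 1;"]))   # ["'y' used before definition"]
```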

  23. Compilation
     • Any input: int $g=0; void main ())) { printf("hello, $$$!\n"); }
     • Passes lexical analysis: g=0; void main ())) { printf("hello, world!\n"); }}
     • Passes syntax analysis: g = 0; void main () { printf("hello, world!\n"); }
     • Passes semantic analysis, i.e. belongs to the C programming language (a legitimate C program): int g=0; void main () { printf("hello, world!\n"); }

  24. An Incorrect Language X Is Bigger
     • Language X1 = { all strings } = {
          `int 变量=0; void main() {f=0; printf("hello, world,\n");}`,
          `int g =0; void main()))) {f=0; printf("hello, world,\n");}`,
          `int g =0; void main() { f=0; printf("hello, world,\n");}`,
          `int f =0; void main() { f=0; printf("hello, world,\n");}`,
          … }
     • Language X2 = { token+ } = {
          `int g =0; void main()))) { f=0; printf("hello, world,\n");}`,
          `int g =0; void main() { f=0; printf("hello, world,\n");}`,
          `int f =0; void main() { f=0; printf("hello, world,\n");}`,
          … }

  25. An Incorrect Language X Is Bigger
     • Language X3 = { generated from C BNF } = {
          `int g =0; void main() { f=0; printf("hello, world,\n");}`,
          `int f =0; void main() { f=0; printf("hello, world,\n");}`,
          … }
     • Language C = { C programs } = {
          `int f =0; void main() { f=0; printf("hello, world,\n");}`,
          … }

  26. Describing Language Semantics
     • Describing static semantics
        • Attribute grammars
     • Describing dynamic semantics
        • Operational semantics
        • Axiomatic semantics
        • Denotational semantics

  27. Attribute Grammars
     • Additions to CFGs that carry semantic information along parse trees
        • CFG: context-free grammar, the kind used above to describe syntactic structure: a set of rules over two types of symbols
     • Uses:
        • Static semantics specification
        • Compiler design (static semantics checking)

  28. Example E  T + E1 {E.val = T.val + E1.val} E  T {E.val = T.val} T  int * T1 {T.val = int.val * T1.val} T  int {T.val = int.val} Context Free Grammar Semantic actions attributes PITT CS 1621

  29. Example
     • T has the synthesized attribute type; L has the inherited attribute in
       D → T L         { L.in = T.type }
       T → int         { T.type = integer }
       T → real        { T.type = real }
       L → L1 , id     { L1.in = L.in; addtype(id.entry, L.in) }
       L → id          { addtype(id.entry, L.in) }
     • [Figure: annotated parse tree of a declaration int id, id, id; dashed edges show the dependencies L1.in = T.type, L2.in = L1.in, L3.in = L2.in, with addtype(id.entry, Lk.in) at each id]
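The inherited attribute in flows down the left-recursive L chain. In this sketch `addtype` is modelled as storing the type in a dictionary used as the symbol table, which is an assumption; `declare` and `visit_L` are hypothetical names. T.type is computed first and then passed down as L.in.

```python
def declare(decl, symtab):
    """D -> T L, for a declaration such as 'int a, b, c'."""
    type_word, _, rest = decl.partition(" ")
    t_type = {"int": "integer", "real": "real"}[type_word]   # T.type (synthesized)
    ids = [name.strip() for name in rest.split(",")]
    # Build the left-recursive tree for L -> L1 , id | id
    tree = ("L", ids[0])
    for name in ids[1:]:
        tree = ("L", tree, name)
    visit_L(tree, t_type, symtab)                            # L.in = T.type


def visit_L(node, l_in, symtab):
    """l_in is the inherited attribute L.in of this L node."""
    if len(node) == 3:              # L -> L1 , id
        _, l1, name = node
        visit_L(l1, l_in, symtab)   # L1.in = L.in
        symtab[name] = l_in         # addtype(id.entry, L.in)
    else:                           # L -> id
        symtab[node[1]] = l_in      # addtype(id.entry, L.in)


symtab = {}
declare("int a, b, c", symtab)
declare("real x", symtab)
print(symtab)  # {'a': 'integer', 'b': 'integer', 'c': 'integer', 'x': 'real'}
```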

  30. Definition
     • Attribute grammar
        • Attributes: for each grammar symbol x there is a set A(x) of attribute values
        • Semantic actions: each rule has a set of functions that define certain attributes of the nonterminals in the rule
     • Two types of attributes
        • Synthesized attributes
        • Inherited attributes
     • How are attributes computed on a parse tree?

  31. Two Types of Attributes
     • Synthesized attributes: values are computed from the attributes of the children nodes, e.g. for a node P with children c1, c2, c3, c4: synthesized attribute of P = f(c1, c2, c3, c4)
     • Inherited attributes: values are computed from the attributes of the siblings and parent of the node, e.g. for a node P with children S1, S2, S3, S4: inherited attribute of S4 = f(P, S1, S2, S3)
     • Semantic rules create dependencies between attributes
        • Represent the dependencies as a graph
        • From the graph, derive an evaluation order for the semantic rules

  32. Example
     • For a rule A → D E F with semantic action b = f(c1, …, ck):
        • if b is an attribute of A and the ci's are attributes of D, E, F, then b is a synthesized attribute of A
        • if b is an attribute of D and the ci's are attributes of A, E, F, then b is an inherited attribute of D
     • Terminal symbols have synthesized attributes only
     • The start symbol is assumed not to have any inherited attributes
     • Synthesized/inherited attributes are naturally computed bottom-up/top-down, respectively
     • In practice, we usually use only one evaluation order, converting attributes accordingly

  33. Dynamic Semantics
     • I: Operational semantics
        • Describes the meaning of a program by executing its statements on a machine, either simulated or actual; the change in the state of the machine (memory, registers, etc.) defines the meaning of the statement
     • Example C statement: for (expr1; expr2; expr3) S
       Operational semantics:
                expr1;
       Label_1: if expr2 == 0 goto Label_2
                S
                expr3;
                goto Label_1
       Label_2:
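The translation can be executed on a tiny abstract machine: `lower_for` produces the label/goto form and `run` steps through it, and the final contents of `state` give the meaning of the loop. The instruction names (`exec`, `ifzero`, `goto`, `label`) and the use of Python callables for expr1, expr2, expr3 and S are assumptions of this sketch; its fixed label names mean it handles one loop per program.

```python
def lower_for(expr1, expr2, expr3, body):
    """Translate for (expr1; expr2; expr3) S into the label/goto form on the slide."""
    return [
        ("exec", expr1),
        ("label", "Label_1"),
        ("ifzero", expr2, "Label_2"),   # if expr2 == 0 goto Label_2
        ("exec", body),                 # S
        ("exec", expr3),
        ("goto", "Label_1"),
        ("label", "Label_2"),
    ]


def run(program, state):
    """A tiny abstract machine: the meaning of the program is how it changes `state`."""
    labels = {ins[1]: k for k, ins in enumerate(program) if ins[0] == "label"}
    pc = 0
    while pc < len(program):
        ins = program[pc]
        if ins[0] == "exec":
            ins[1](state)
        elif ins[0] == "ifzero" and ins[1](state) == 0:
            pc = labels[ins[2]]
            continue
        elif ins[0] == "goto":
            pc = labels[ins[1]]
            continue
        pc += 1                         # labels and a nonzero test fall through
    return state


# for (i = 0; i < 3; i = i + 1) total = total + i;
loop = lower_for(
    lambda s: s.update(i=0),
    lambda s: int(s["i"] < 3),          # C comparisons yield 1 or 0
    lambda s: s.update(i=s["i"] + 1),
    lambda s: s.update(total=s["total"] + s["i"]),
)
print(run(loop, {"total": 0}))  # {'total': 3, 'i': 3}
```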

  34. Discussion
     • Describing operational semantics requires a real or virtual machine
        • Interpreting the operations with the instructions of a real computer
           • Machine-dependent
           • May not be easy to understand
        • Interpreting the operations with the instructions of an idealized computer
           • Requires a definition of the lower-level computer
           • Requires a translator
     • Evaluation
        • Good if used informally
        • Extremely complex if used formally (e.g. VDL, which was used to describe the semantics of PL/I)

  35. II: Axiomatic Semantics
     • Origin
        • Based on formal logic (predicate calculus)
        • Developed for formal program verification
     • Form
        • Assertions
           • Pre- and postconditions: {P} statement {Q}, e.g. {b > 0} a = b + 1 {a > 1}
        • Axioms
        • Inference rules: how to interpret …

  36. Pre- and Postconditions
     • Consider a simple axiom for the assignment "X = E"
        • Work backwards from the postcondition Q to find the weakest precondition P = Q_{X→E}, i.e. Q with every occurrence of X replaced by E
        • Example: {?} a = b/2 - 1 {a < 10}
     • A more complicated example:
       {?}
       a = 3;
       if (a > 2) a = b + 1;
       else a = a + 1;
       {a > 10}
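Working both questions with the assignment axiom. The conditional is handled with the standard rule wp(if B S1 else S2, Q) = (B ⇒ wp(S1, Q)) ∧ (¬B ⇒ wp(S2, Q)), which the slide does not state. For the first example, b/2 < 11 is equivalent to b < 22 whether / is real division or C integer division on an integer b.

```latex
\begin{aligned}
&\text{Assignment axiom: } wp(x = E,\ Q) = Q_{x \to E} \\[4pt]
&\{?\}\ a = b/2 - 1\ \{a < 10\}: \\
&\qquad P = (a < 10)_{a \to b/2 - 1} = (b/2 - 1 < 10) \iff b/2 < 11 \iff b < 22 \\[4pt]
&\{?\}\ a = 3;\ \mathbf{if}\ (a > 2)\ a = b + 1;\ \mathbf{else}\ a = a + 1;\ \{a > 10\}: \\
&\qquad wp(a = b + 1,\ a > 10) = (b + 1 > 10) \iff b > 9 \\
&\qquad wp(a = a + 1,\ a > 10) = (a + 1 > 10) \iff a > 9 \\
&\qquad wp(\mathbf{if} \ldots,\ a > 10) = (a > 2 \Rightarrow b > 9) \land (a \le 2 \Rightarrow a > 9) \\
&\qquad wp(a = 3;\ \mathbf{if} \ldots,\ a > 10) = (3 > 2 \Rightarrow b > 9) \land (3 \le 2 \Rightarrow 3 > 9) \iff b > 9
\end{aligned}
```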

  37. More Axioms
     • An axiom for assignment statements (x = E): {Q_{x→E}} x = E {Q}
     • The rule of consequence: from {P'} S {Q'}, P ⇒ P' and Q' ⇒ Q, infer {P} S {Q}
     • An inference rule for sequences: from {P1} S1 {P2} and {P2} S2 {P3}, infer {P1} S1; S2 {P3}
     • Loops? Function calls? Input?

  38. Evaluation of Axiomatic Semantics
     • Developing axioms or inference rules for all of the statements in a language is difficult
     • Good for correctness proofs and for reasoning about programs
     • Of limited use to language designers and compiler writers

  39. III: Denotational Semantics
     • Based on recursive function theory
     • The most abstract semantics description method
     • Originally developed by Scott and Strachey (1970)

  40. Building a Denotational Specification
     • The state of a program is the values of all its current variables:
       s = {<i1, v1>, <i2, v2>, …, <in, vn>}
     • Let VARMAP be a function that, given a variable name and a state, returns the current value of the variable:
       VARMAP(ij, s) = vj

  41. Building a Denotational Specification
     • The process of building a denotational specification for a language:
        • Define a mathematical object for each language entity
        • Define a function that maps instances of the language entities onto instances of the corresponding mathematical objects
     • The meaning of a language construct is defined only by the values of the program's variables
     • In denotational semantics, state changes are defined by rigorous mathematical functions
     • In operational semantics, state changes are defined by coded algorithms

  42. Example
       Me(<expr>, s) = case <expr> of
         <dec_num> => Mdec(<dec_num>, s)
         <var> => if VARMAP(<var>, s) == undef
                    then error
                    else VARMAP(<var>, s)
         <binary_expr> =>
           if (Me(<binary_expr>.<left_expr>, s) == undef OR Me(<binary_expr>.<right_expr>, s) == undef)
             then error
           else if (<binary_expr>.<operator> == '+')
             then Me(<binary_expr>.<left_expr>, s) + Me(<binary_expr>.<right_expr>, s)
           else Me(<binary_expr>.<left_expr>, s) * Me(<binary_expr>.<right_expr>, s)
         ...
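The equation translates into a short Python function. In this sketch expressions are hypothetical tuples, a state is a dictionary from variable names to values, `None` plays the role of undef, and error is modelled as raising an exception; the names `varmap`, `m_dec` and `m_e` mirror VARMAP, Mdec and Me but are otherwise assumptions.

```python
UNDEF = None   # plays the role of the slide's undef


def varmap(name, state):
    """VARMAP(i_j, s) = v_j; undef if the variable has no value in state s."""
    return state.get(name, UNDEF)


def m_dec(numeral):
    """Mdec: map a decimal numeral (a string of digits) to its integer value."""
    return int(numeral)


def m_e(expr, state):
    """Me(<expr>, s) for the three expression forms on the slide.

    Expressions are hypothetical tuples:
      ("num", "42"), ("var", "x"), ("binary", op, left, right).
    error is an exception, so an undefined operand never yields a value:
    the error propagates outward instead.
    """
    kind = expr[0]
    if kind == "num":
        return m_dec(expr[1])
    if kind == "var":
        value = varmap(expr[1], state)
        if value is UNDEF:
            raise ValueError(f"error: {expr[1]} is undef")
        return value
    if kind == "binary":
        _, op, left, right = expr
        lhs, rhs = m_e(left, state), m_e(right, state)
        return lhs + rhs if op == "+" else lhs * rhs
    raise ValueError(f"unknown expression {expr!r}")


s = {"x": 3, "y": 4}
e = ("binary", "+", ("var", "x"), ("binary", "*", ("num", "2"), ("var", "y")))
print(m_e(e, s))  # 11
```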

  43. Evaluation
     • Can be used to prove the correctness of programs
     • Provides a rigorous way to think about programs
     • Can be an aid to language design
     • Has been used in compiler generation systems
     • Because of its complexity, it is of little use to language users

  44. Summary
     • Describing programming languages
        • Describing tokens: using REs
        • Describing syntax: using BNF/CFGs
        • Describing semantics
           • Static semantics: using attribute grammars
           • Dynamic semantics: no single widely accepted approach; three possible ways: operational, axiomatic, denotational
