
Syntax

Syntax. Juan Carlos Guzmán, CS 3123 Programming Languages Concepts, Southern Polytechnic State University.


Presentation Transcript


  1. Syntax. Juan Carlos Guzmán, CS 3123 Programming Languages Concepts, Southern Polytechnic State University

  2. What does your DOS computer do when …?
     • > copy a.txt b.txt
     • > copy a.txt a.txt
     • > del *.*
     • > del *01.*
     • > type a.txt > null:
     • > type a.txt > nul:

  3. How do we know the meaning of our commands?

  4. Semiotic
     • Synthesized from Merriam-Webster (m-w.com)
     • A general philosophical theory of signs and symbols that deals especially with their function in both artificially constructed and natural languages and comprises:
        • syntactics: the formal relations between signs or expressions in abstraction from their signification and their interpreters
        • semantics: the relations between signs and what they refer to
        • pragmatics: the relation between signs or linguistic expressions and their users

  5. Syntax
     • Two levels:
        • the language level, properly known as parsing
        • the lexeme level, known as lexing
     • More information about this topic can be found in:
        • Aho, Sethi, Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1988. (On reserve; the Dragon Book.)

  6. Lexing
     • Specification of the lexemes of the language
     • A class of lexemes is known as a token
     • Tokens are specified with regular expressions, built from:
        • a letter, the empty string
        • concatenation
        • choice
        • closure
     • Many convenient extensions
     • Recognized by finite automata
     • Limited in power: cannot count, e.g., cannot recognize aⁿbⁿ
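A minimal Python sketch of a lexer built this way: each token class is a regular expression, and Python's `re` module does the matching. The token classes (`NUMBER`, `IDENT`, `OP`) and the `tokenize` helper are hypothetical, chosen only for illustration.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str    # the token class, e.g. "NUMBER"
    lexeme: str  # the text that was matched


# Hypothetical token classes for a tiny language, each given as a regular expression.
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("IDENT", r"[A-Za-z][A-Za-z0-9]*"),
    ("OP", r"[-+*/=()]"),
    ("SKIP", r"\s+"),      # whitespace separates lexemes but is not a token
    ("ERROR", r"."),       # any other character is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC))


def tokenize(text: str) -> Iterator[Token]:
    """Split text into tokens, scanning left to right."""
    for m in MASTER.finditer(text):
        kind = m.lastgroup
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {m.group()!r} at position {m.start()}")
        yield Token(kind, m.group())
```

For example, `list(tokenize("x1 = 42+y"))` yields IDENT `x1`, OP `=`, NUMBER `42`, OP `+`, IDENT `y`. One design caveat: `re` takes the first alternative that matches, whereas lexer generators such as lex prefer the longest match, so the order of `TOKEN_SPEC` matters here.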

  7. Sample Regular Expressions
     • digit ::= (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9)
     • ldigit ::= (1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9)
     • natural ::= ldigit digit*
     • integer ::= (+ | - | ε) (natural | 0)
     • How about floating points?
        • without exponents
        • adding the exponents
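The regular definitions above can be written almost verbatim as Python regular expressions, as in the sketch below. The floating-point patterns `FLOAT_NO_EXP` and `FLOAT` are only one possible answer to the question on the slide (an assumption: they require at least one digit after the decimal point and reuse integer for the exponent).

```python
import re

# Regular definitions from the slide, in Python's re syntax.
DIGIT = r"[0-9]"
LDIGIT = r"[1-9]"                        # a nonzero leading digit
NATURAL = rf"{LDIGIT}{DIGIT}*"
INTEGER = rf"[+-]?(?:{NATURAL}|0)"       # "?" plays the role of the ε alternative

# One possible answer to "How about floating points?" (an assumption, not from the slides):
FLOAT_NO_EXP = rf"{INTEGER}\.{DIGIT}+"           # e.g. -3.25
FLOAT = rf"{FLOAT_NO_EXP}(?:[eE]{INTEGER})?"     # e.g. 6.02e23


def matches(pattern: str, text: str) -> bool:
    """True if the whole of text belongs to the language of pattern."""
    return re.fullmatch(pattern, text) is not None
```

With these definitions, `matches(INTEGER, "-42")` holds but `matches(INTEGER, "007")` does not, since natural forbids leading zeros.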

  8. Parsing
     • Specification of the language structure
     • The parser:
        • recognizes the phrase, and
        • reconstructs its structure (parse tree)

  9. Context-Free Grammars
     • Generate context-free languages
     • Allow recursion
     • Are specified as G = (N, T, P, S), where:
        • N is the set of “non-terminals”, or variables
        • T is the alphabet (the set of terminals)
        • P is the “production set”
        • S is the starting symbol for every phrase

  10. CFG (Example)
     • G1 = ({S, A, B}, {a, b}, P, S), where P = {S → ASB, S → BSA, S → ε, A → a, B → b}
     • G2 = ({E}, {a, +, *, (, )}, P, E), where P = {E → E+E, E → E*E, E → a, E → (E)}

  11. Grammars (conventions)
     • The empty string: ε
     • First uppercase letters of the alphabet (A, B, C, …): non-terminals
     • First lowercase letters of the alphabet (a, b, c, …), or numbers (1, 2, …): terminals
     • First lowercase Greek letters (α, β, γ, …): strings of terminals and non-terminals
     • Last lowercase letters of the alphabet (t, u, v, …): strings of terminals

  12. Derivation
     • How do we generate phrases in the language?
     • By using a derivation: βAγ ⇒ βαγ iff A → α ∈ P
     • E ⇒ E+E ⇒ E+E*E ⇒ a+E*E ⇒ a+E*a ⇒ a+a*a

  13. The Language Generated
     • The language generated by the grammar, L(G), is composed of all strings of terminals that can be derived from S by applying production rules one or more times
     • Anything derived from S is called a sentential form
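To make L(G) concrete, the sketch below enumerates the strings of L(G1) (the example grammar from slide 10) up to a given length by exploring leftmost derivations from S. The `language` helper and the `NULLABLE` set are illustrative assumptions; in particular, the length-based pruning relies on G1's property that only S derives ε.

```python
G1 = {
    "S": [("A", "S", "B"), ("B", "S", "A"), ()],   # () is the empty production S -> ε
    "A": [("a",)],
    "B": [("b",)],
}
NULLABLE = {"S"}  # assumption specific to G1: only S can derive the empty string


def language(grammar: dict, start: str, max_len: int) -> set:
    """All terminal strings of length <= max_len derivable from start (leftmost derivations)."""
    sentences, seen, todo = set(), set(), [(start,)]
    while todo:
        form = todo.pop()
        if form in seen:
            continue
        seen.add(form)
        # Position of the leftmost nonterminal, or None if form is all terminals.
        i = next((k for k, sym in enumerate(form) if sym in grammar), None)
        if i is None:
            sentences.add("".join(form))
            continue
        for body in grammar[form[i]]:
            new = form[:i] + body + form[i + 1:]
            # Every non-nullable symbol yields at least one terminal, so prune long forms.
            if sum(sym not in NULLABLE for sym in new) <= max_len:
                todo.append(new)
    return sentences
```

For G1, `language(G1, "S", 4)` gives the seven strings ε, ab, ba, aabb, abab, baba, and bbaa.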

  14. Derivations
     • Leftmost derivation: the leftmost non-terminal is always rewritten: E ⇒ E*E ⇒ E+E*E ⇒ a+E*E ⇒ a+a*E ⇒ a+a*a
     • Rightmost derivation: the rightmost non-terminal is always rewritten: E ⇒ E+E ⇒ E+E*E ⇒ E+E*a ⇒ E+a*a ⇒ a+a*a
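The two derivations above can be replayed mechanically. The hypothetical helper `derive` below applies one production body of G2 per step, always to the leftmost (or rightmost) E; it assumes, as holds for G2, that E is the only nonterminal and that every symbol is a single character.

```python
def derive(bodies: list[str], leftmost: bool = True) -> str:
    """Replay a G2 derivation from E, applying one production body per step.

    Assumes G2's shape: E is the only nonterminal and every symbol is one character.
    """
    form = "E"
    steps = [form]
    for body in bodies:
        # Position of the E to rewrite: the leftmost one or the rightmost one.
        i = form.index("E") if leftmost else form.rindex("E")
        form = form[:i] + body + form[i + 1:]
        steps.append(form)
    return " => ".join(steps)


print(derive(["E*E", "E+E", "a", "a", "a"]))
# E => E*E => E+E*E => a+E*E => a+a*E => a+a*a
print(derive(["E+E", "E*E", "a", "a", "a"], leftmost=False))
# E => E+E => E+E*E => E+E*a => E+a*a => a+a*a
```

Both derivations end in a+a*a, but the first rewrites the start symbol with E → E*E and the second with E → E+E, so they correspond to two different parse trees.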

  15. Parse Tree
     • A structured sequence of derivations
     • Visually appealing
     • From the previous example:
     [Figure: parse trees for a+a*a from the derivations of the previous slide]

  16. Ambiguous Grammar
     • Two different parse trees for a single phrase
     • Just one phrase with two trees is proof of ambiguity
     • Not ambiguous? Every phrase must have exactly one parse tree!
     • An ambiguous grammar is quite different from an inherently ambiguous language
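One way to see the ambiguity of G2 is to enumerate every parse tree of a phrase. The sketch below (a hypothetical `parse_trees` helper, not an algorithm from the slides) tries every way of splitting the phrase at a `+` or `*`, memoizing results per substring; trees are nested tuples, and tokens are assumed to be single characters.

```python
from functools import lru_cache


def parse_trees(s: str) -> list:
    """All parse trees of s under G2: E -> E+E | E*E | a | (E).

    A tree is the leaf "a", a tuple (op, left, right) for E -> E op E,
    or ("()", inner) for E -> (E). Tokens are single characters.
    """

    @lru_cache(maxsize=None)
    def trees(i: int, j: int) -> tuple:
        """Parse trees for the substring s[i:j], derived from E."""
        found = []
        if s[i:j] == "a":                                       # E -> a
            found.append("a")
        if j - i >= 3 and s[i] == "(" and s[j - 1] == ")":      # E -> ( E )
            found += [("()", inner) for inner in trees(i + 1, j - 1)]
        for k in range(i + 1, j - 1):                           # E -> E + E | E * E
            if s[k] in "+*":
                for left in trees(i, k):
                    for right in trees(k + 1, j):
                        found.append((s[k], left, right))
        return tuple(found)

    return list(trees(0, len(s)))
```

It finds two trees for a+a*a, `('+', 'a', ('*', 'a', 'a'))` and `('*', ('+', 'a', 'a'), 'a')`, which by the argument above proves that G2 is ambiguous; a+a+a+a has five.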

  17. Grammars vs. Languages
     • A language is a set
     • A grammar is a medium by which the set can be formally specified
     • Many grammars specify the same set

  18. An Expression Grammar
     • The grammar for expressions presented before was ambiguous
     • Non-ambiguous, with correct precedence (relative priority given to + and *):
        E → E + T | T
        T → T * F | F
        F → a | ( E )
     [Figure: parse tree of a+a*a under this grammar]

  19. Parsing Styles
     • Top-down: to derive w from S, start from S and derive until w is obtained
     • Bottom-up: to derive w from S, try doing ‘reverse derivations’ from w until S is obtained

  20. Parsing Styles
     • Top-down: LL(k)
        • Easy to implement and understand
        • hand-coded
        • table-driven
        • Limited use, many problems
     • Bottom-up: LR(k)
        • More difficult to understand
        • table-driven
        • A nice trade-off between complexity and generality

  21. An Expression Grammar
     G = ({E, T, F}, {a, +, *, (, )}, P, E), where P = {E → T+E | T, T → F*T | F, F → a | (E)}
     Is a+a*a in L(G)?
     [Figure: parse tree of a+a*a under G]

  22. A Grammar for a Small Language
        program → begin stmt_list end
        stmt_list → stmt | stmt ; stmt_list
        stmt → var = expression
        var → A | B | C
        expression → var + var | var - var | var
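A hand-coded recursive-descent parser for this grammar needs one function per nonterminal. In the sketch below (names such as `parse_program` are hypothetical), stmt_list and expression are handled by looking at the next token after the common prefix, which amounts to left-factoring stmt_list → stmt | stmt ; stmt_list and expression → var + var | var - var | var. Tokens are assumed to be words or single symbols.

```python
import re


def parse_program(text: str) -> list:
    """Recursive-descent parser for the small language; returns the list of statements.

    Each statement becomes ("=", var, expr), where expr is a var name or (op, var, var).
    """
    tokens = re.findall(r"[A-Za-z]+|\S", text) + ["$"]   # words or single symbols; "$" ends input
    pos = 0

    def peek() -> str:
        return tokens[pos]

    def expect(tok: str) -> None:
        nonlocal pos
        if tokens[pos] != tok:
            raise SyntaxError(f"expected {tok!r} but found {tokens[pos]!r}")
        pos += 1

    def var() -> str:                  # var -> A | B | C
        name = peek()
        if name not in ("A", "B", "C"):
            raise SyntaxError(f"expected a variable but found {name!r}")
        expect(name)
        return name

    def expression():                  # expression -> var + var | var - var | var
        left = var()
        if peek() in ("+", "-"):       # decide only after the common prefix var
            op = peek()
            expect(op)
            return (op, left, var())
        return left

    def stmt() -> tuple:               # stmt -> var = expression
        target = var()
        expect("=")
        return ("=", target, expression())

    def stmt_list() -> list:           # stmt_list -> stmt | stmt ; stmt_list
        stmts = [stmt()]
        while peek() == ";":           # each ";" announces another stmt
            expect(";")
            stmts.append(stmt())
        return stmts

    expect("begin")                    # program -> begin stmt_list end
    body = stmt_list()
    expect("end")
    expect("$")
    return body
```

For example, `parse_program("begin A = B + C ; B = C end")` returns `[("=", "A", ("+", "B", "C")), ("=", "B", "C")]`.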

  23. Predictive Parsing
     • How many characters of look-ahead are needed to predict the next production to take?
     • Is this a finite number?
     • Is it 1?

  24. Another Expression Grammar
     G’ = ({E, E’, T, T’, F}, {a, +, *, (, )}, P, E), where P = {E → TE’, E’ → +TE’ | ε, T → FT’, T’ → *FT’ | ε, F → a | (E)}
     Is a+a*a in L(G’)?
     [Figure: parse tree of a+a*a under G’]
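Because G’ has no left recursion and each choice can be decided by the next token, it can be parsed by hand-coded recursive descent with one token of look-ahead. The recognizer below is a sketch under the assumption that tokens are single characters; method names mirror the nonterminals (`E_prime` stands for E’, `T_prime` for T’).

```python
class PredictiveParser:
    """Hand-coded recursive-descent recognizer for G' with one token of look-ahead."""

    def __init__(self, text: str) -> None:
        self.tokens = list(text) + ["$"]   # single-character tokens plus an end marker
        self.pos = 0

    def peek(self) -> str:
        return self.tokens[self.pos]

    def expect(self, tok: str) -> None:
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r} but found {self.peek()!r}")
        self.pos += 1

    def E(self) -> None:                   # E -> T E'
        self.T()
        self.E_prime()

    def E_prime(self) -> None:             # E' -> + T E' | ε
        if self.peek() == "+":
            self.expect("+")
            self.T()
            self.E_prime()
        # otherwise take E' -> ε and consume nothing

    def T(self) -> None:                   # T -> F T'
        self.F()
        self.T_prime()

    def T_prime(self) -> None:             # T' -> * F T' | ε
        if self.peek() == "*":
            self.expect("*")
            self.F()
            self.T_prime()

    def F(self) -> None:                   # F -> a | ( E )
        if self.peek() == "a":
            self.expect("a")
        elif self.peek() == "(":
            self.expect("(")
            self.E()
            self.expect(")")
        else:
            raise SyntaxError(f"unexpected {self.peek()!r}")


def accepts(text: str) -> bool:
    parser = PredictiveParser(text)
    try:
        parser.E()
        parser.expect("$")                 # the whole input must be consumed
        return True
    except SyntaxError:
        return False
```

`accepts("a+a*a")` and `accepts("(a+a)*a")` are true, while `accepts("a+")` is false.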

  25. LL(1) Parsing Table

  26. LL(1) Algorithm
        Parse(a1 … an, X1 … Xm) {            // a1 … an: input; X1 … Xm: stack, X1 on top
          if (a1 = $) and (X1 = $) accept
          else if X1 is a terminal and (X1 = a1)
            Parse(a2 … an, X2 … Xm)           // match
          else if Table[X1, a1] = X1 → Y1 … Yk
            Parse(a1 … an, Y1 … Yk X2 … Xm)   // derive
          else fail
        }
     • Call initially with Parse(w$, S$), where w is the phrase to parse and S is the starting symbol of the grammar
     • ai is a terminal; Xj and Yk are terminals or nonterminals
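The algorithm above can be sketched in Python for G’ as a loop instead of tail recursion. The `TABLE` entries were filled in by hand (an assumption of this sketch; they agree with the first/follow construction on the following slides), and the Python list keeps the top of the stack at its end.

```python
# LL(1) table for G', filled in by hand; an empty list is an ε body.
TABLE = {
    ("E", "a"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "a"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"], ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "a"): ["a"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens: list[str], start: str = "E") -> list[str]:
    """Run the LL(1) algorithm; return the list of operations performed."""
    inp = tokens + ["$"]
    stack = ["$", start]               # top of the stack is the END of the list
    i = 0
    ops = []
    while True:
        a, X = inp[i], stack[-1]
        if a == "$" and X == "$":
            ops.append("accept")
            return ops
        if X not in NONTERMINALS:      # X is a terminal (or $): it must match the input
            if X != a:
                raise SyntaxError(f"expected {X!r} but found {a!r}")
            stack.pop()
            i += 1
            ops.append("match")
        elif (X, a) in TABLE:          # derive: replace X by the body Y1 ... Yk
            stack.pop()
            stack.extend(reversed(TABLE[(X, a)]))   # Y1 ends up on top
            ops.append("derive")
        else:
            raise SyntaxError(f"no table entry for [{X}, {a!r}]")
```

Calling `ll1_parse(list("a+a*a"))` returns 17 operations (derive, derive, derive, match, derive, derive, match, derive, derive, match, derive, match, derive, match, derive, derive, accept), which can be compared with the trace on the next slide.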

  27. Parser Operation on a+a*a
       #   INPUT          STACK            OPERATION   Sentential Form
       1   a + a * a $    E $              derive      E $
       2   a + a * a $    T E’ $           derive      T E’ $
       3   a + a * a $    F T’ E’ $        derive      F T’ E’ $
       4   a + a * a $    a T’ E’ $        match       a T’ E’ $
       5   + a * a $      T’ E’ $          derive      a T’ E’ $
       6   + a * a $      E’ $             derive      a E’ $
       7   + a * a $      + T E’ $         match       a + T E’ $
       8   a * a $        T E’ $           derive      a + T E’ $
       9   a * a $        F T’ E’ $        derive      a + F T’ E’ $
      10   a * a $        a T’ E’ $        match       a + a T’ E’ $
      11   * a $          T’ E’ $          derive      a + a T’ E’ $
      12   * a $          * F T’ E’ $      match       a + a * F T’ E’ $
      13   a $            F T’ E’ $        derive      a + a * F T’ E’ $
      14   a $            a T’ E’ $        match       a + a * a T’ E’ $
      15   $              T’ E’ $          derive      a + a * a T’ E’ $
      16   $              E’ $             derive      a + a * a E’ $
      17   $              $                accept      a + a * a $

  28. Note how the leftmost derivation of a+a*a is done
     Reading the Sentential Form column of the previous slide from top to bottom (skipping the repeats left by match steps) gives a leftmost derivation:
     E $ ⇒ T E’ $ ⇒ F T’ E’ $ ⇒ a T’ E’ $ ⇒ a E’ $ ⇒ a + T E’ $ ⇒ a + F T’ E’ $ ⇒ a + a T’ E’ $ ⇒ a + a * F T’ E’ $ ⇒ a + a * a T’ E’ $ ⇒ a + a * a E’ $ ⇒ a + a * a $
     [Figure: parse tree of a+a*a under G’]

  29. What’s the Table Lookup?
     • Note that the predictive nature of the parser guarantees the uniqueness of the entry for Table[A, b] (or no entry at all)
     • When attempting to derive nonterminal A, the look-ahead b must give the correct rule to apply
     • This b can be:
        • the initial character of the derivation of A, i.e., A ⇒* bα,
        • or the initial character of the derivation of what follows A! (A ⇒* ε)

  30. First Sets
     • first(α) is the set of one-character prefixes of strings of terminals that can be derived from α
     • If the empty string can be derived from α, then ε is also in the set:
        • if α ⇒* aw, then a ∈ first(α)
        • if α ⇒* ε, then ε ∈ first(α)

  31. First Sets (II)
     • first(ε) = {ε}
     • first(a) = {a}
     • first(A) = first(α1) ∪ … ∪ first(αn), where A → α1, …, A → αn are all the productions for A in P
     • first(Xα) = first(X) ⊕ first(α), where X is either a terminal or a nonterminal

  32. Bounded Concatenation
     • In computing first(Xα), our interest is to obtain one-character prefixes (or ε)
     • Consider the operation ⊕ at the character level:
        • ε ⊕ β = β, where β is either ε or a terminal
        • a ⊕ β = a
     • Generalize it to work on sets:
        • A ⊕ B = {v ⊕ w | v ∈ A, w ∈ B}, where A and B are sets

  33. Computation of First Sets

  34. Computation of First Sets
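The first-set equations can be solved by iterating until no set changes. The sketch below does this for G’; it implements bounded concatenation ⊕ literally (`bconcat`) and represents ε as the empty string. The names `GRAMMAR`, `bconcat`, `first_of`, and `compute_first` are illustrative.

```python
EPS = ""  # represents ε

# G' with each production body as a tuple of symbols; () is the empty body ε.
GRAMMAR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("a",), ("(", "E", ")")],
}


def bconcat(xs: set, ys: set) -> set:
    """Bounded concatenation on sets: ε ⊕ β = β and a ⊕ β = a."""
    return {y if x == EPS else x for x in xs for y in ys}


def first_of(symbols, first: dict) -> set:
    """first(X1 ... Xn) = first(X1) ⊕ ... ⊕ first(Xn), with first(ε) = {ε}."""
    result = {EPS}
    for sym in reversed(symbols):
        result = bconcat(first.get(sym, {sym}), result)   # a terminal's first set is itself
    return result


def compute_first(grammar: dict) -> dict:
    """Iterate first(A) = union of first(α) over all A -> α until nothing changes."""
    first = {A: set() for A in grammar}
    changed = True
    while changed:
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                new = first_of(body, first) - first[A]
                if new:
                    first[A] |= new
                    changed = True
    return first
```

For G’ it yields first(E) = first(T) = first(F) = {a, (}, first(E’) = {+, ε}, and first(T’) = {*, ε}.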

  35. Follow Sets
     • follow(A) is the set of one-character prefixes of strings of terminals that can follow any derivation of A in G
     • $ ∈ follow(S)
     • if (B → αAβ) ∈ P, then first(β) ⊕ follow(B) ⊆ follow(A)
     • The definition of follow usually results in recursive set equations. To solve them, iterate over the equations until no set changes
     • ε never appears in any follow set
     • Note: I had promised a closed definition of follow, but it would be unnecessarily complex. JCG.

  36. Computation of Follow Sets

  37. Computation of Follow Sets
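The follow equations can be solved the same way, by repeated passes until nothing changes. To stay short, the sketch below hard-codes the first sets of G’ (as they come out of the first-set equations); ε is again the empty string, and all names are illustrative.

```python
EPS = ""  # represents ε

GRAMMAR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("a",), ("(", "E", ")")],
}
START = "E"
# first sets of G', hard-coded for this sketch
FIRST = {"E": {"a", "("}, "E'": {"+", EPS}, "T": {"a", "("}, "T'": {"*", EPS}, "F": {"a", "("}}


def bconcat(xs: set, ys: set) -> set:
    """Bounded concatenation on sets: ε ⊕ β = β and a ⊕ β = a."""
    return {y if x == EPS else x for x in xs for y in ys}


def first_of(symbols, first: dict) -> set:
    """first of a string of symbols; a terminal's first set is itself."""
    result = {EPS}
    for sym in reversed(symbols):
        result = bconcat(first.get(sym, {sym}), result)
    return result


def compute_follow(grammar: dict, first: dict, start: str) -> dict:
    """Solve $ ∈ follow(S) and first(β) ⊕ follow(B) ⊆ follow(A) for each B -> αAβ."""
    follow = {A: set() for A in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for B, bodies in grammar.items():
            for body in bodies:
                for i, A in enumerate(body):
                    if A not in grammar:          # only nonterminals have follow sets
                        continue
                    new = bconcat(first_of(body[i + 1:], first), follow[B]) - follow[A]
                    if new:
                        follow[A] |= new
                        changed = True
    return follow
```

For G’ it gives follow(E) = follow(E’) = {), $}, follow(T) = follow(T’) = {+, ), $}, and follow(F) = {*, +, ), $}.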

  38. How to Fill In the Table
     • For each production (A → α) ∈ P, let X = first(α) ⊕ follow(A); then, for all x ∈ X, put A → α in Table[A, x]
     • After processing all productions, each cell of the table must have at most one production
     • If not, your grammar is not LL(1) (nice try!)
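The table-filling rule translates directly into code. This sketch uses hard-coded first and follow sets for G’ and raises an error as soon as a cell would receive two different productions, i.e., when the grammar is not LL(1). The names are illustrative.

```python
EPS = ""  # represents ε

GRAMMAR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("a",), ("(", "E", ")")],
}
# first and follow sets of G', hard-coded for this sketch
FIRST = {"E": {"a", "("}, "E'": {"+", EPS}, "T": {"a", "("}, "T'": {"*", EPS}, "F": {"a", "("}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"*", "+", ")", "$"}}


def bconcat(xs: set, ys: set) -> set:
    """Bounded concatenation on sets: ε ⊕ β = β and a ⊕ β = a."""
    return {y if x == EPS else x for x in xs for y in ys}


def first_of(symbols, first: dict) -> set:
    """first of a string of symbols; a terminal's first set is itself."""
    result = {EPS}
    for sym in reversed(symbols):
        result = bconcat(first.get(sym, {sym}), result)
    return result


def build_table(grammar: dict, first: dict, follow: dict) -> dict:
    """For each A -> α, put α in Table[A, x] for every x in first(α) ⊕ follow(A)."""
    table = {}
    for A, bodies in grammar.items():
        for body in bodies:
            for x in bconcat(first_of(body, first), follow[A]):
                if table.get((A, x), body) != body:
                    raise ValueError(f"not LL(1): two productions for Table[{A}, {x}]")
                table[(A, x)] = body
    return table
```

For G’ it fills 13 cells with no conflicts. Running it on the ambiguous G2 instead would raise the error, because E → E+E and E → a both belong in Table[E, a].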

  39. First & Follow Sets

  40. Yet Another Expression Grammar (it’s in the book!)
     G = ({E, T, F}, {a, +, *, (, )}, P, E), where P = {(1) E → E+T, (2) E → T, (3) T → T*F, (4) T → F, (5) F → (E), (6) F → a}
     Is a+a*a in L(G)?
     [Figure: parse tree of a+a*a under G]

  41. LR(1) Parsing Table
     • Sn: shift to state n
     • Rn: reduce according to production n

  42. LR(1) Algorithm
        Parse(S0 X1 S1 X2 S2 … Xr Sr … Xm Sm, a1 … an) {     // stack, input
          if Action[Sm, a1] == Shift S
            Parse(S0 X1 S1 X2 S2 … Xm Sm a1 S, a2 … an)
          else if Action[Sm, a1] == Reduce A → Xr+1 … Xm and GOTO[Sr, A] == S
            Parse(S0 X1 S1 X2 S2 … Xr Sr A S, a1 … an)
          else if Action[Sm, a1] == Accept
            accept
          else if Action[Sm, a1] == Error
            error
        }
     • Call initially with Parse(S0, w$), where w is the phrase to parse and S0 is the initial state of the table
     • ai is a terminal; Xj are terminals or nonterminals; Si is a “state”
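A Python sketch of this driver for the grammar of slide 40 follows, written as a loop instead of recursion. The ACTION and GOTO entries are an assumption of this sketch: they are taken to be the classic SLR table for this grammar (as given in the Dragon Book), whose state numbers agree with the shifts, reductions, and gotos in the trace on the next slide. The stack alternates states and grammar symbols, as in S0 X1 S1 … Xm Sm.

```python
# Productions of slide 40, by number: number -> (head, length of the body).
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

# ACTION[state][terminal]: ("S", n) shift to state n, ("R", n) reduce by production n,
# ("A",) accept; a missing entry means error.
ACTION = {
    0: {"a": ("S", 5), "(": ("S", 4)},
    1: {"+": ("S", 6), "$": ("A",)},
    2: {"+": ("R", 2), "*": ("S", 7), ")": ("R", 2), "$": ("R", 2)},
    3: {"+": ("R", 4), "*": ("R", 4), ")": ("R", 4), "$": ("R", 4)},
    4: {"a": ("S", 5), "(": ("S", 4)},
    5: {"+": ("R", 6), "*": ("R", 6), ")": ("R", 6), "$": ("R", 6)},
    6: {"a": ("S", 5), "(": ("S", 4)},
    7: {"a": ("S", 5), "(": ("S", 4)},
    8: {"+": ("S", 6), ")": ("S", 11)},
    9: {"+": ("R", 1), "*": ("S", 7), ")": ("R", 1), "$": ("R", 1)},
    10: {"+": ("R", 3), "*": ("R", 3), ")": ("R", 3), "$": ("R", 3)},
    11: {"+": ("R", 5), "*": ("R", 5), ")": ("R", 5), "$": ("R", 5)},
}
GOTO = {0: {"E": 1, "T": 2, "F": 3}, 4: {"E": 8, "T": 2, "F": 3}, 6: {"T": 9, "F": 3}, 7: {"F": 10}}


def lr_parse(tokens: list[str]) -> list[str]:
    """Run the LR driver; return the operations in the slide's notation."""
    stack: list = [0]                  # S0 X1 S1 ... Xm Sm, top at the end
    inp = tokens + ["$"]
    i = 0
    ops = []
    while True:
        state, a = stack[-1], inp[i]
        action = ACTION[state].get(a)
        if action is None:
            raise SyntaxError(f"error: no action for state {state} on {a!r}")
        if action[0] == "S":           # shift a together with the new state
            stack += [a, action[1]]
            i += 1
            ops.append(f"S {action[1]}")
        elif action[0] == "R":         # pop |body| symbol/state pairs, push A and GOTO
            head, length = PRODUCTIONS[action[1]]
            del stack[len(stack) - 2 * length:]
            exposed = stack[-1]
            stack += [head, GOTO[exposed][head]]
            ops.append(f"R {action[1]}, G[{exposed},{head}]")
        else:                          # ("A",): accept
            ops.append("accept")
            return ops
```

`lr_parse(list("a+a*a"))` returns 14 operations ending in accept, which can be compared step by step with the trace on the next slide; `lr_parse(list("(a+a)*a"))` is accepted as well.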

  43. Parser Operation on a+a*a
       #   STACK                     INPUT          OPERATION     Sentential Form
       1   0                         a + a * a $    S 5           a + a * a $
       2   0 a 5                     + a * a $      R 6, G[0,F]   a + a * a $
       3   0 F 3                     + a * a $      R 4, G[0,T]   F + a * a $
       4   0 T 2                     + a * a $      R 2, G[0,E]   T + a * a $
       5   0 E 1                     + a * a $      S 6           E + a * a $
       6   0 E 1 + 6                 a * a $        S 5           E + a * a $
       7   0 E 1 + 6 a 5             * a $          R 6, G[6,F]   E + a * a $
       8   0 E 1 + 6 F 3             * a $          R 4, G[6,T]   E + F * a $
       9   0 E 1 + 6 T 9             * a $          S 7           E + T * a $
      10   0 E 1 + 6 T 9 * 7         a $            S 5           E + T * a $
      11   0 E 1 + 6 T 9 * 7 a 5     $              R 6, G[7,F]   E + T * a $
      12   0 E 1 + 6 T 9 * 7 F 10    $              R 3, G[6,T]   E + T * F $
      13   0 E 1 + 6 T 9             $              R 1, G[0,E]   E + T $
      14   0 E 1                     $              accept        E $

  44. Note how the rightmost derivation of a+a*a is done
     Reading the Sentential Form column of the previous slide from bottom to top (skipping repeats) gives a rightmost derivation:
     E $ ⇒ E + T $ ⇒ E + T * F $ ⇒ E + T * a $ ⇒ E + F * a $ ⇒ E + a * a $ ⇒ T + a * a $ ⇒ F + a * a $ ⇒ a + a * a $
     [Figure: parse tree of a+a*a under the grammar of slide 40]
