
Grammars, constituency and order

  1. Grammars, constituency and order • A grammar describes the legal strings of a language in terms of constituency and order. • For example, a grammar for a fragment of English might say that a legal sentence consists of • a noun phrase (subject), • followed by a verb phrase (predicate). • This rule is commonly written as • S → NP VP

  2. Constituents of constituents • The constituents of constituents may be described by other rules. • They would refine, for example, the initial decomposition • [the dog] [chased a cat] • into a complete decomposition • [[the] [dog]] [[chased] [[a] [cat]]] • according to the following rules:

  3. Grammar rules for a fragment of English • S → NP VP • NP → Det N • VP → V NP • N → dog • N → cat • Det → the • Det → a • V → chased
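
As a concrete illustration of how such a rule set can be put to work, here is a minimal Python sketch (not part of the original slides) that stores the eight rules above in a dictionary and generates random sentences from them; the names RULES and generate are illustrative choices.

    import random

    # The CFG above: each nonterminal maps to a list of right-hand sides,
    # and each right-hand side is a list of symbols.
    RULES = {
        "S":   [["NP", "VP"]],
        "NP":  [["Det", "N"]],
        "VP":  [["V", "NP"]],
        "N":   [["dog"], ["cat"]],
        "Det": [["the"], ["a"]],
        "V":   [["chased"]],
    }

    def generate(symbol="S"):
        """Expand a symbol by applying rules until only words (terminals) remain."""
        if symbol not in RULES:                   # a terminal: an English word
            return [symbol]
        rhs = random.choice(RULES[symbol])        # pick one rule for this nonterminal
        words = []
        for sym in rhs:
            words.extend(generate(sym))
        return words

    print(" ".join(generate()))                   # e.g. "the dog chased a cat"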

  4. Parse trees (derivation trees) • Hierarchical decompositions of sentences are more commonly expressed by special trees, known as parse trees or derivation trees. • For our sample sentence, we would have the parse tree below

  5. Parse tree for an English sentence • For the dog chased a cat, the parse tree in labeled-bracket form is • [S [NP [Det the] [N dog]] [VP [V chased] [NP [Det a] [N cat]]]]

  6. Sentences generated by grammars • The grammar with the rules above would also allow, or generate, sentences like • a dog chased a cat • the dog chased a dog • a cat chased the dog • since parse trees could be constructed for these sentences.

  7. Context-free grammars (CFGs) • In the example above, the alphabet Σ consisted of the set of English words. • A grammar also needs to specify symbols other than those in Σ, as well as rules, so more precisely …

  8. CFGs defined • … a context-free grammar (CFG) consists of • a set T of terminal symbols (analogous to Σ) • a set V of other symbols, the variables (or nonterminals) • a start symbol S, which is a member of V • a collection P of rules (or productions), each with • a left-hand side (LHS) from V, and • a right-hand side (RHS) from (V ∪ T)*
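
If the four components are to be manipulated in code, they can be bundled into one value. The following Python sketch is an illustrative addition (it assumes Python 3.9+ for the built-in generic type hints); it mirrors the definition and packages the English fragment from slide 3 as an example.

    from dataclasses import dataclass

    @dataclass
    class Grammar:
        terminals: set[str]                        # T
        variables: set[str]                        # V (the nonterminals)
        start: str                                 # S, a member of V
        productions: list[tuple[str, list[str]]]   # P: pairs (LHS in V, RHS in (V ∪ T)*)

    # The English fragment above, packaged as a Grammar value
    english = Grammar(
        terminals={"the", "a", "dog", "cat", "chased"},
        variables={"S", "NP", "VP", "Det", "N", "V"},
        start="S",
        productions=[("S", ["NP", "VP"]), ("NP", ["Det", "N"]), ("VP", ["V", "NP"]),
                     ("N", ["dog"]), ("N", ["cat"]),
                     ("Det", ["the"]), ("Det", ["a"]), ("V", ["chased"])],
    )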

  9. Context freedom • The notion of context freedom means that any category may be expanded in accordance with the rules no matter where it appears. • So, for example, the noun phrases that are legal subjects are the same as those that are legal objects (that is, the NPs in the VP → V NP rule).

  10. Rules for a CFG for L(0(1+2)*) • S → 0X • X → λ • X → YX • Y → 1 • Y → 2

  11. CFGs for palindromes • A CFG for even-length palindromes over {0,1}: • S → λ | 0S0 | 1S1 • A CFG for odd-length palindromes over {0,1}: • S → 0 | 1 | 0S0 | 1S1 • Here we use the common convention allowing several rules with the same LHS to be combined into one, with vertical bars separating the RHSs.
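
Because each rule of the even-length palindrome grammar either ends the string (S → λ) or wraps it in matching symbols, membership can be tested by undoing one rule per step. The sketch below is an illustrative addition; is_even_palindrome is a made-up name, not from the slides.

    def is_even_palindrome(w: str) -> bool:
        """Recognize L(S -> λ | 0S0 | 1S1) by undoing one rule application per call."""
        if w == "":                                        # S -> λ
            return True
        if len(w) >= 2 and w[0] == w[-1] and w[0] in "01":
            return is_even_palindrome(w[1:-1])             # S -> 0S0 or S -> 1S1
        return False

    assert is_even_palindrome("0110") and is_even_palindrome("")
    assert not is_even_palindrome("011") and not is_even_palindrome("010")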

  12. Rules for other 1-variable CFGs • for all palindromes over {0,1}: • S → λ | 0 | 1 | 0S0 | 1S1 • for nonempty sequences of balanced parentheses: • S → ( ) | ( S ) | SS • for {0^n 1^n | n ≥ 0}: • S → λ | 0S1 • for { x ∈ {0,1}* | x has as many 0's as 1's }: • S → λ | 0S1 | 1S0 | SS
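
Grammars like these can also be explored by brute force: starting from S, repeatedly expand the leftmost nonterminal in every possible way and keep the terminal strings that appear. The sketch below is an illustrative addition; it uses the balanced-parentheses grammar and an arbitrary length bound to list all nonempty balanced strings up to that length.

    from collections import deque

    # The balanced-parentheses grammar above, with RHSs as lists of symbols
    RULES = {"S": [["(", ")"], ["(", "S", ")"], ["S", "S"]]}

    def short_members(max_len=6):
        """Breadth-first search over sentential forms, collecting terminal strings."""
        found, seen = set(), {("S",)}
        queue = deque(seen)
        while queue:
            form = queue.popleft()
            if all(sym != "S" for sym in form):          # no nonterminals left
                found.add("".join(form))
                continue
            i = form.index("S")                          # expand the leftmost S
            for rhs in RULES["S"]:
                new = form[:i] + tuple(rhs) + form[i + 1:]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
        return sorted(found)

    print(short_members())   # the nonempty balanced strings of length up to 6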

  13. Parse trees and grammars • A parse tree is legal for a CFG iff it satisfies each correspondence: • root ↔ start symbol • parent node ↔ LHS of a grammar rule • child node ↔ symbol from the RHS of a rule whose LHS is the parent node • leaf ↔ terminal symbol (or λ) • Also, the ordering of children of a node must match the ordering of the RHS symbols in the corresponding rule.

  14. Partial parse (derivation) trees • It's convenient to represent the progress of a parse by allowing leaves to be labeled by a nonterminal symbol (and perhaps ignoring the constraint on roots) • In any case, the left-to-right sequence of leaf labels (ignoring those labeled by λ) is called the yield of the parse tree • so for a complete parse tree the yield is a string of terminals
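
One way to make the notion of yield concrete is to represent a parse tree as nested (label, children) pairs and collect the leaf labels left to right, skipping λ. This is an illustrative Python sketch; the tuple encoding and the name tree_yield are assumptions, not from the slides.

    def tree_yield(node):
        """Left-to-right sequence of leaf labels, ignoring leaves labeled λ."""
        label, children = node
        if not children:                                   # a leaf
            return [] if label == "λ" else [label]
        result = []
        for child in children:
            result.extend(tree_yield(child))
        return result

    # The complete parse tree for the sample sentence, as nested (label, children) pairs
    tree = ("S", [("NP", [("Det", [("the", [])]), ("N", [("dog", [])])]),
                  ("VP", [("V", [("chased", [])]),
                          ("NP", [("Det", [("a", [])]), ("N", [("cat", [])])])])])
    print(" ".join(tree_yield(tree)))                      # the dog chased a cat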

  15. Notational conventions • Lower case letters are interpreted as for DFAs • those near the beginning of the alphabet represent terminals; those near the end of the alphabet represent strings • Capital letters represent nonterminals (variables) • Greek letters represent strings of variables and terminals • So a generic rule looks like A → γ

  16. Derivations and rewrite rules • CFG rules are also rewrite rules. • Here the rule S → NP VP would allow rewriting of S as NP VP • We may define generation in terms of derivation from S by repeated rewriting • e.g., we get the legal derivation S => NP VP => Det N VP => the N VP => the dog VP => the dog V NP => the dog chased NP => the dog chased Det N => the dog chased a N => the dog chased a cat
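
The derivation above can be replayed mechanically: each step rewrites the leftmost occurrence of one nonterminal by the right-hand side of a rule. The sketch below is an illustrative addition; the function name and the list-of-symbols representation are assumptions.

    def rewrite_leftmost(form, lhs, rhs):
        """Rewrite the leftmost occurrence of nonterminal `lhs` in `form` by `rhs`."""
        i = form.index(lhs)
        return form[:i] + rhs + form[i + 1:]

    # Replay the derivation from this slide, one rule application per line
    form = ["S"]
    steps = [("S", ["NP", "VP"]), ("NP", ["Det", "N"]), ("Det", ["the"]),
             ("N", ["dog"]), ("VP", ["V", "NP"]), ("V", ["chased"]),
             ("NP", ["Det", "N"]), ("Det", ["a"]), ("N", ["cat"])]
    for lhs, rhs in steps:
        form = rewrite_leftmost(form, lhs, rhs)
        print(" ".join(form))                    # ends with: the dog chased a cat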

  17. Leftmost and rightmost derivations • For every parse tree there are unique leftmost and rightmost derivations • The rightmost derivation corresponding to the parse tree above is • S => NP VP => NP V NP => NP V Det N => NP V Det cat => NP V a cat => NP chased a cat => Det N chased a cat => Det dog chased a cat => the dog chased a cat

  18. Leftmost and rightmost derivations • All but the simplest parse trees will have other associated derivations besides the leftmost and rightmost. • For every derivation there is a unique associated parse tree. • Def) Using the symbol =>* for the reflexive, transitive closure of the => relation, • a sentential form for G is a string α from (V ∪ T)* such that S =>* α

  19. Context-free languages (CFLs) • Fact: A CFG G with start symbol S licenses a parse tree for w iff S =>* w • Def) L(G) (the language generated by G) is {x | G generates x}, or equivalently {x | G’s start symbol derives x}, or {x ∈ T* | x is a sentential form for G} • A language generated by a context-free grammar is called a context-free language

  20. Ambiguous grammars • Here’s a 1-variable CFG for a subset of algebraic expressions: • E → x | y | E+E | E*E | (E) • Note that this grammar allows multiple parse trees for some strings, like x+y*y. • A grammar with this property is said to be ambiguous.
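
The two parse trees for x+y*y correspond to the groupings x+(y*y) and (x+y)*y, and the difference is observable if the trees are evaluated. The sketch below is an illustrative addition; the tuple encoding, the evaluate function, and the sample values for x and y are assumptions.

    # The two parse trees for x+y*y, written as (operator, left, right) tuples
    plus_first  = ("+", "x", ("*", "y", "y"))    # x + (y * y)
    times_first = ("*", ("+", "x", "y"), "y")    # (x + y) * y

    def evaluate(tree, env):
        """Evaluate a tuple-encoded expression tree with variable values from env."""
        if isinstance(tree, str):
            return env[tree]
        op, left, right = tree
        a, b = evaluate(left, env), evaluate(right, env)
        return a + b if op == "+" else a * b

    env = {"x": 2, "y": 3}
    print(evaluate(plus_first, env), evaluate(times_first, env))   # 11 15: the parses differ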

  21. An unambiguous grammar for algebraic expressions • Rules for an unambiguous grammar for the above language are given below: • E → E + T | T • T → T * F | F • F → x | y | ( E )
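
A recursive-descent parser makes the unambiguity tangible. Since E → E + T is left-recursive, the sketch below (an illustrative addition) works from the equivalent repetition form E → T { + T }, T → F { * F }, F → x | y | ( E ); it assumes well-formed input and returns the single tree the unambiguous grammar assigns.

    def parse_expression(tokens):
        """E -> T { + T }: return (tree, remaining tokens); + groups to the left."""
        tree, tokens = parse_term(tokens)
        while tokens and tokens[0] == "+":
            right, tokens = parse_term(tokens[1:])
            tree = ("+", tree, right)
        return tree, tokens

    def parse_term(tokens):
        """T -> F { * F }"""
        tree, tokens = parse_factor(tokens)
        while tokens and tokens[0] == "*":
            right, tokens = parse_factor(tokens[1:])
            tree = ("*", tree, right)
        return tree, tokens

    def parse_factor(tokens):
        """F -> x | y | ( E )"""
        if tokens[0] == "(":
            tree, tokens = parse_expression(tokens[1:])
            return tree, tokens[1:]              # drop the matching ")"
        return tokens[0], tokens[1:]             # a variable: x or y

    # Single-character tokens suffice for this grammar
    tree, rest = parse_expression(list("x+y*y"))
    print(tree)    # ('+', 'x', ('*', 'y', 'y')): * binds tighter than +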

  22. Inherent ambiguity • Ambiguity is common in natural languages. • But we don't want it in programming languages! • Often ambiguity can be removed. • i.e., a grammar can be replaced by an unambiguous one, as seen above • But there are languages for which all grammars are ambiguous. • These languages are said to be inherently ambiguous.

  23. Regular languages and CFLs • We’ve already seen examples of CFLs that aren’t regular languages • But it's fairly easy to show that all regular languages are context-free. • The languages {a}, {λ}, and ∅ have grammars with respective productions • S → a • S → λ • [no productions]

  24. All regular languages are CFLs • Suppose grammars for L1 and L2 have respective start symbols S1 and S2. • Then we may get grammars with start symbol S for their union, for their concatenation, and for L1* by adding the respective productions • S → S1 | S2 • S → S1S2 • S → λ | S1S • So all regular languages are CFLs
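
These three constructions can be written down directly if a grammar is represented as a start symbol plus a dictionary of productions. The sketch below is an illustrative addition; it assumes the two grammars use disjoint nonterminal names and writes λ as the empty right-hand side [].

    # A grammar is (start symbol, {nonterminal: list of right-hand sides});
    # nonterminal names are assumed disjoint, and λ is the empty RHS [].
    def union(g1, g2, new_start="S"):
        (s1, p1), (s2, p2) = g1, g2
        return new_start, {**p1, **p2, new_start: [[s1], [s2]]}        # S -> S1 | S2

    def concatenation(g1, g2, new_start="S"):
        (s1, p1), (s2, p2) = g1, g2
        return new_start, {**p1, **p2, new_start: [[s1, s2]]}          # S -> S1 S2

    def star(g, new_start="S"):
        s1, p1 = g
        return new_start, {**p1, new_start: [[], [s1, new_start]]}     # S -> λ | S1 S

    # {a} and {b} as one-rule grammars, then a grammar for (a ∪ b)*
    g_a = ("A", {"A": [["a"]]})
    g_b = ("B", {"B": [["b"]]})
    print(star(union(g_a, g_b, new_start="U")))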

  25. Grammars for regular languages • Any regular language can be generated by a special type of CFG. • Def) A right-linear grammar is a CFG where the RHS of each rule has the form xB or x, • for x ∈ T* and B ∈ V • Fact: Right-linear grammars generate all and only regular languages

  26. Finding a grammar for a regular language • For a DFA M, consider the grammar G with • T = Σ, V = Q, and S = q0 • a rule qi → ajqk for each aj move from qi to qk • a rule qi → aj for each aj move from qi to qk where qk ∈ F • (and, if q0 ∈ F, also the rule q0 → λ, so that λ ∈ L(G)) • An easy induction shows that δ*(q,x) = p iff q =>* xp • and that, for nonempty x, δ*(q,x) = p and p ∈ F iff q =>* x • So L(G) = L(M)
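
This construction is easy to mechanize once the DFA's transition function is written as a table. The sketch below is an illustrative addition (the function name and the example DFA for strings ending in 1 are assumptions); it also adds the q0 → λ production mentioned above when q0 is accepting.

    def dfa_to_right_linear(delta, q0, final):
        """Right-linear productions for a DFA given as a table delta[(q, a)] = p."""
        productions = []
        for (q, a), p in delta.items():
            productions.append((q, [a, p]))      # q -> a p   for every move
            if p in final:
                productions.append((q, [a]))     # q -> a     if the move enters F
        if q0 in final:
            productions.append((q0, []))         # q0 -> λ    so λ is generated too
        return q0, productions

    # Example DFA over {0,1} accepting strings that end in 1
    delta = {("q0", "0"): "q0", ("q0", "1"): "q1",
             ("q1", "0"): "q0", ("q1", "1"): "q1"}
    print(dfa_to_right_linear(delta, "q0", {"q1"}))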

  27. DFAs for right-linear grammars • Conversely, let G be a right-linear grammar • If all strings x on RHSs have length 1, then the construction above can be reversed to give an NFA • and the proof above still holds • If not, then the construction can be modified by adding extra states as in Linz, pp. 91-2 • In either case an NFA, and hence a DFA, can be obtained for L(G)

  28. Regular grammars • Left-linear grammars may be defined by analogy with right-linear grammars • every rule must have a RHS of the form Bx or x • Fact: Left-linear grammars generate all and only regular languages • A CFG is a regular grammar iff it is right-linear or left-linear • so a language has a regular grammar iff it is regular

  29. Backus-Naur form (BNF) • Grammars for programming languages generally use a variant of our CFG notation called BNF. • In BNF the symbol ::= is used instead of the rightward-pointing arrow. • In BNF, terminal symbols may be given in bold face, and nonterminals may be delimited by angle brackets, e.g. • <identifier> ::= <letter> <digits>

  30. Common BNF conventions • The vertical bar convention • [ ] brackets • for optionality (0 or 1 times) • { } braces • for indefinite repetition (0 or more times) • ( ) parentheses • for removing ambiguity, e.g., (a|b)c vs. a | bc

  31. A sample grammar in BNF • <conditional> ::= • if <test> then <block> [ else <block> ] endif • <block> ::= begin [<statements>] end • <statements> ::= { <statement> } • <test> ::= <var> <op> <var> • <statement> ::= <var> = <var> • <var> ::= x | y • <op> ::= = | /=
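
Grammars in this style translate almost line by line into a hand-written recognizer: each nonterminal becomes a function, the optional [ ... ] becomes an if, and the repeated { ... } becomes a while. The Python sketch below is an illustrative addition; it assumes the input has already been split into tokens and only reports success or failure.

    def expect(tokens, word):
        if not tokens or tokens[0] != word:
            raise SyntaxError(f"expected {word!r}, got {tokens[:1]}")
        return tokens[1:]

    def conditional(tokens):
        """<conditional> ::= if <test> then <block> [ else <block> ] endif"""
        tokens = expect(tokens, "if")
        tokens = test(tokens)
        tokens = expect(tokens, "then")
        tokens = block(tokens)
        if tokens and tokens[0] == "else":           # the optional [ else <block> ]
            tokens = block(tokens[1:])
        return expect(tokens, "endif")

    def block(tokens):
        """<block> ::= begin [<statements>] end, with <statements> ::= { <statement> }"""
        tokens = expect(tokens, "begin")
        while tokens and tokens[0] in ("x", "y"):    # zero or more statements
            tokens = statement(tokens)
        return expect(tokens, "end")

    def test(tokens):
        """<test> ::= <var> <op> <var>"""
        tokens = var(tokens)
        if not tokens or tokens[0] not in ("=", "/="):
            raise SyntaxError("expected = or /=")
        return var(tokens[1:])

    def statement(tokens):
        """<statement> ::= <var> = <var>"""
        tokens = var(tokens)
        tokens = expect(tokens, "=")
        return var(tokens)

    def var(tokens):
        """<var> ::= x | y"""
        if not tokens or tokens[0] not in ("x", "y"):
            raise SyntaxError("expected x or y")
        return tokens[1:]

    # An empty result means every token was consumed, i.e. the input is accepted
    print(conditional("if x /= y then begin x = y end else begin end endif".split()))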
