Languages and Grammars MSU CSE 260
Outline • Introduction: Example • Phrase-Structure Grammars: Terminology, Definition, Derivation, Language of a Grammar, Examples • Exercise 10.1 (1) • Types of Phrase-Structure Grammars • Derivation Trees: Example, Parsing • Exercise 10.1 (2, 3) • Backus-Naur Form
Introduction • In the English language, the grammar determines whether a combination of words is a valid sentence. • Are the following valid sentences? • The large rabbit hops quickly. Yes • The frog writes neatly. Yes • Swims quickly mathematician. No • Grammars are concerned with the syntax (form) of a sentence, and NOT its semantics (meaning).
English Grammar • Sentence: noun phrase followed by verb phrase; • Noun phrase: article adjective noun, or article noun; • Verb phrase: verb adverb, or verb; • Article: a, or the; • Adjective: large, or hungry; • Noun: rabbit, or mathematician, or frog; • Verb: eats, or hops, or writes, or swims; • Adverb: quickly, or wildly, or neatly.
Example • sentence • noun phrase verb phrase • article adjective noun verb phrase • article adjective noun verb adverb • the adjective noun verb adverb • the large noun verb adverb • the large rabbit verb adverb • the large rabbit hops adverb • the large rabbit hops quickly
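The derivation above can be replayed mechanically. The following Python sketch encodes the English grammar rules as a dictionary and enumerates every sentence they allow. The names GRAMMAR, expand, and is_sentence are illustrative choices, not part of the slides.

```python
from itertools import product

# Each category maps to its alternatives; an alternative is a list of
# categories or plain words. Anything that is not a key is a plain word.
GRAMMAR = {
    "sentence": [["noun phrase", "verb phrase"]],
    "noun phrase": [["article", "adjective", "noun"], ["article", "noun"]],
    "verb phrase": [["verb", "adverb"], ["verb"]],
    "article": [["a"], ["the"]],
    "adjective": [["large"], ["hungry"]],
    "noun": [["rabbit"], ["mathematician"], ["frog"]],
    "verb": [["eats"], ["hops"], ["writes"], ["swims"]],
    "adverb": [["quickly"], ["wildly"], ["neatly"]],
}

def expand(symbol):
    """Yield every word sequence (as a tuple) that can replace symbol."""
    if symbol not in GRAMMAR:  # a plain word: nothing left to replace
        yield (symbol,)
        return
    for alternative in GRAMMAR[symbol]:
        # Combine every expansion of each part of this alternative.
        for parts in product(*(list(expand(s)) for s in alternative)):
            yield tuple(word for part in parts for word in part)

ALL_SENTENCES = set(expand("sentence"))

def is_sentence(text):
    """Check a sentence, ignoring case and a trailing period."""
    return tuple(text.lower().rstrip(".").split()) in ALL_SENTENCES

print(is_sentence("The large rabbit hops quickly."))  # True
print(is_sentence("Swims quickly mathematician."))    # False
```

Enumerating the whole language only works here because this grammar is finite; grammars with recursive rules need a different strategy.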
Grammars and Computation • Grammars are used as a model of computation. • Grammars are used to: • generate the words of a language, and • determine whether a word is in a language.
Phrase-Structure Grammars Terminology • Definitions. A vocabulary (or alphabet) V is a finite, nonempty set of elements called symbols. • A word (or sentence) over V is a string of finite length of elements of V. • The empty string (or null string), denoted by λ, is the string containing no symbols. • The set of all words over V is denoted by V*. • A language over V is a subset of V*.
Phrase-Structure Grammars • A language can be specified by: • listing all the words in the language, or • giving a set of criteria satisfied by its words, or • using a grammar. • A grammar provides: • a set of symbols, and • a set of rules, called productions, for producing words by replacing strings by other strings: w0 → w1.
Phrase-Structure Grammar Definition • A phrase-structure grammar G = (V, T, S, P) consists of: • a vocabulary V, • a subset T of V consisting of terminal elements, • a start symbol S from V, and • a set P of productions. • The set N = V − T consists of the nonterminal symbols. • Every production in P must contain at least one nonterminal on its left side.
Phrase-Structure Grammar Example • G = (V, T, S, P), where • V = {a, b, A, B, S}, • T = {a, b}, • S is the start symbol, and • P = {S → Aba, A → BB, B → ab, AB → b}.
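As a minimal sketch, the four components of a phrase-structure grammar can be held in a small Python class that checks the conditions of the definition when the grammar is built. The class name Grammar and its field names are hypothetical, and symbols are assumed to be single characters so that strings can be written as ordinary Python strings.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    vocabulary: frozenset   # V
    terminals: frozenset    # T, a subset of V
    start: str              # S, an element of V
    productions: tuple      # P, pairs (left side, right side)

    @property
    def nonterminals(self):
        """N = V - T."""
        return self.vocabulary - self.terminals

    def __post_init__(self):
        if not self.terminals <= self.vocabulary:
            raise ValueError("T must be a subset of V")
        if self.start not in self.vocabulary:
            raise ValueError("S must be an element of V")
        for left, right in self.productions:
            if not all(s in self.vocabulary for s in left + right):
                raise ValueError(f"unknown symbol in {left!r} -> {right!r}")
            # Every left side must contain at least one nonterminal.
            if not any(s in self.nonterminals for s in left):
                raise ValueError(f"no nonterminal on the left of {left!r}")

# The example grammar, with productions as listed on the slide.
G = Grammar(
    vocabulary=frozenset("abABS"),
    terminals=frozenset("ab"),
    start="S",
    productions=(("S", "Aba"), ("A", "BB"), ("B", "ab"), ("AB", "b")),
)
print(sorted(G.nonterminals))  # ['A', 'B', 'S']
```

The empty string is represented by "" in this sketch, so a production with an empty right side is written ("S", "").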
Phrase-Structure Grammars Derivation • Definition. Let G = (V, T, S, P) be a phrase-structure grammar. Let w0 = lz0r and w1 = lz1r be strings over V. • If z0 → z1 is a production of G, we say that w1 is directly derivable from w0 (denoted: w0 ⇒ w1). • If w0, w1, …, wn are strings over V such that w0 ⇒ w1, w1 ⇒ w2, …, wn-1 ⇒ wn, we say that wn is derivable from w0 (denoted: w0 ⇒* wn). Note. The * is conventionally written above the ⇒. • The sequence of all steps used to obtain wn from w0 is called a derivation.
Example • In the previous example grammar, the production B → ab makes the string Aaba directly derivable from the string ABa: • ABa ⇒ Aaba • Also, Aaba ⇒ BBaba ⇒ Bababa ⇒ abababa • using A → BB, B → ab, and B → ab. • So ABa ⇒* abababa: • abababa is derivable from ABa.
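The two notions used in this example, direct derivability (one production applied once) and derivability (any number of steps), can be sketched in Python as follows. The function names and the max_len bound are assumptions made for illustration; the bound simply keeps the search finite and would have to be raised for derivations that pass through longer intermediate strings.

```python
# Productions of the example grammar as (left side, right side) pairs.
PRODUCTIONS = [("S", "Aba"), ("A", "BB"), ("B", "ab"), ("AB", "b")]

def directly_derivable(w, productions=PRODUCTIONS):
    """Return every string obtained from w by applying one production once."""
    results = set()
    for left, right in productions:
        start = w.find(left)
        while start != -1:  # try every position where the left side occurs
            results.add(w[:start] + right + w[start + len(left):])
            start = w.find(left, start + 1)
    return results

def derivable(w0, target, max_len=10):
    """Breadth-first search: is target derivable from w0 in zero or more steps?

    Strings longer than max_len are discarded (an illustrative bound).
    """
    seen, frontier = {w0}, [w0]
    while frontier:
        next_frontier = []
        for w in frontier:
            for v in directly_derivable(w):
                if v not in seen and len(v) <= max_len:
                    seen.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return target in seen

print("Aaba" in directly_derivable("ABa"))  # True
print(derivable("ABa", "abababa"))          # True
```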
Language of a Grammar • Definition. Let G = (V, T, S, P) be a phrase-structure grammar. The language generated by G (or the language of G), denoted by L(G), is the set of all strings of terminals that are derivable from the start symbol S. • L(G) = {w ∈ T* | S ⇒* w}.
Example • Let G = (V, T, S, P) be the grammar where: • V = {S, 0, 1}, • T = {0, 1}, • P = {S → 11S, S → 0}. • What is L(G)? • At any stage of the derivation we can either: • add two 1s to the string (S → 11S), or • terminate the derivation by replacing S with 0 (S → 0). • L(G) = {0, 110, 11110, 1111110, …} = the set of all strings consisting of an even number of 1s followed by a single 0.
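A short Python sketch can list the first few words of this language by applying the two productions directly, and can test membership with an equivalent regular expression. The function names are illustrative, and the pattern (11)*0 is this sketch's restatement of the language, not something given on the slide.

```python
import re

def language(max_len):
    """Words of L(G) of length at most max_len, for S -> 11S and S -> 0."""
    words, prefix = [], ""
    while len(prefix) + 1 <= max_len:
        words.append(prefix + "0")  # finish the derivation with S -> 0
        prefix += "11"              # or continue with S -> 11S
    return words

def in_language(w):
    """Membership test: an even number of 1s followed by a single 0."""
    return re.fullmatch(r"(11)*0", w) is not None

print(language(7))          # ['0', '110', '11110', '1111110']
print(in_language("1110"))  # False: odd number of 1s
```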
Types of Grammars • A type 0 (phrase-structure) grammar has no restrictions on its productions. • A type 1 (or context-sensitive) grammar has productions only of the forms: • w1 → w2 with length of w2 ≥ length of w1, or • w1 → λ. • A type 2 (or context-free) grammar has productions only of the form A → w2, where A is a single nonterminal symbol.
Types of Grammars – cont. • A type 3 (or regular) grammar has productions only of the form: • A → aB, or A → a, where • A and B are nonterminal symbols, and • a is a terminal symbol, or • S → λ. • Note. • Every type 3 grammar is a type 2 grammar. • Every type 2 grammar is a type 1 grammar. • Every type 1 grammar is a type 0 grammar.
Types of Grammars - Summary
Type   Restrictions on productions w1 → w2
0      No restrictions
1      l(w1) ≤ l(w2), or w2 = λ
2      w1 = A, where A ∈ N
3      w1 = A and w2 = aB or w2 = a, where A ∈ N, B ∈ N, a ∈ T; or w1 = S and w2 = λ
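The summary can be turned into a small classifier. The Python sketch below returns the most restrictive type whose conditions hold for every production, following the restrictions exactly as stated on these slides. The function name grammar_type is hypothetical, symbols are assumed to be single characters, and the empty string λ is written as "".

```python
def grammar_type(productions, nonterminals, start="S"):
    """Return 3, 2, 1, or 0 for a list of (w1, w2) productions."""

    def is_type3(left, right):
        if left == start and right == "":      # S -> lambda
            return True
        if left not in nonterminals:           # w1 must be a single nonterminal
            return False
        if len(right) == 1:                    # A -> a
            return right not in nonterminals
        return (len(right) == 2                # A -> aB
                and right[0] not in nonterminals
                and right[1] in nonterminals)

    def is_type2(left, right):
        return left in nonterminals            # w1 is a single nonterminal

    def is_type1(left, right):
        return right == "" or len(left) <= len(right)

    for level, check in ((3, is_type3), (2, is_type2), (1, is_type1)):
        if all(check(left, right) for left, right in productions):
            return level
    return 0

# S -> 11S, S -> 0 is context-free but not regular in the form given.
print(grammar_type([("S", "11S"), ("S", "0")], {"S"}))  # 2
```

A result of 2 here describes the grammar as written, not the language: the same language may also have a type 3 grammar.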
Derivation Trees • For type 2 (context-free) grammars: a derivation (or parse) tree is an ordered rooted tree that represents a derivation in the language generated by a context-free grammar, where: • the root represents the start symbol; • the internal vertices represent nonterminal symbols; • the leaves represent terminal symbols; • for a production A → w, the vertex representing A has children that represent each symbol in w, in order.
Example • Derivation tree for: the hungry rabbit eats quickly
sentence
    noun phrase
        article
            the
        adjective
            hungry
        noun
            rabbit
    verb phrase
        verb
            eats
        adverb
            quickly
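One simple way to hold such a tree in a program is as nested tuples, where the first entry is a vertex label and the remaining entries are its children. The Python sketch below uses that representation (an assumption of this example, not something the slides prescribe) to read the sentence off the leaves and to list the productions the tree encodes.

```python
# (label, child, child, ...); a bare string is a leaf (terminal).
TREE = ("sentence",
        ("noun phrase",
         ("article", "the"),
         ("adjective", "hungry"),
         ("noun", "rabbit")),
        ("verb phrase",
         ("verb", "eats"),
         ("adverb", "quickly")))

def leaves(tree):
    """Return the leaves from left to right: the derived sentence."""
    if isinstance(tree, str):
        return [tree]
    _label, *children = tree
    return [leaf for child in children for leaf in leaves(child)]

def productions_used(tree):
    """Return one (parent, children) pair per internal vertex, root first."""
    if isinstance(tree, str):
        return []
    label, *children = tree
    rule = (label, tuple(c if isinstance(c, str) else c[0] for c in children))
    return [rule] + [p for c in children for p in productions_used(c)]

print(" ".join(leaves(TREE)))     # the hungry rabbit eats quickly
print(productions_used(TREE)[0])  # ('sentence', ('noun phrase', 'verb phrase'))
```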
Parsing • To determine whether a string is in the language generated by a grammar, use: • Top-down parsing: • Begin with S and attempt to derive the word by successively applying productions, or • Bottom-up parsing: • Work backward: Begin by inspecting the word and apply productions backward.
Example • Let G = (V, T, S, P) be the grammar where: • V = {a, b, c, A, B, C, S}, T = {a, b, c}, • Productions: S → AB, A → Ca, B → Ba, B → Cb, B → b, C → cb, C → b. • Determine whether cbab is in L(G).
Top-down parsing:
S ⇒ AB
S ⇒ AB ⇒ CaB
S ⇒ AB ⇒ CaB ⇒ cbaB
S ⇒ AB ⇒ CaB ⇒ cbaB ⇒ cbab
Bottom-up parsing:
Cab ⇒ cbab
Ab ⇒ Cab ⇒ cbab
AB ⇒ Ab ⇒ Cab ⇒ cbab
S ⇒ AB ⇒ Ab ⇒ Cab ⇒ cbab
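The top-down strategy can be sketched in Python as a breadth-first search over leftmost derivations for the grammar of this example (productions S → AB, A → Ca, B → Ba | Cb | b, C → cb | b). This is one possible search, not the method prescribed by the slides; the function name top_down_parse is hypothetical. Two pruning rules keep the search finite: no production here shortens a string, so any form longer than the target is dropped, and any form whose leading terminals disagree with the target is abandoned.

```python
from collections import deque

# Nonterminal -> list of right sides.
PRODUCTIONS = {
    "S": ["AB"],
    "A": ["Ca"],
    "B": ["Ba", "Cb", "b"],
    "C": ["cb", "b"],
}

def top_down_parse(target, start="S"):
    """Return a leftmost derivation of target as a list of strings, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        derivation = queue.popleft()
        form = derivation[-1]
        if form == target:
            return derivation
        # Position of the leftmost nonterminal, if any.
        i = next((k for k, s in enumerate(form) if s in PRODUCTIONS), None)
        if i is None:
            continue  # all terminals, but not the target
        if not target.startswith(form[:i]):
            continue  # leading terminals already disagree with the target
        for right in PRODUCTIONS[form[i]]:
            new_form = form[:i] + right + form[i + 1:]
            if len(new_form) <= len(target) and new_form not in seen:
                seen.add(new_form)
                queue.append(derivation + [new_form])
    return None

print(top_down_parse("cbab"))  # ['S', 'AB', 'CaB', 'cbaB', 'cbab']
print(top_down_parse("abc"))   # None
```

The length-based pruning relies on every right side having at least one symbol; a grammar with a production to the empty string would need a different bound.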
Backus-Naur Form • Used with type 2 (context-free) grammars, for example to specify the syntax of programming languages: • Use ::= instead of → • Enclose nonterminal symbols within < > • Group productions with the same left side using the symbol | • Example. • <signed integer> ::= <sign><integer> • <sign> ::= + | - • <integer> ::= <digit> | <digit><integer> • <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
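Each BNF rule in the example translates naturally into one small function. The Python sketch below is a hand-written recognizer for <signed integer>; the function names are hypothetical. Each parse function takes a string and a position and returns the position after the matched text, or None if the rule does not match there.

```python
def parse_sign(s, i):
    # <sign> ::= + | -
    return i + 1 if i < len(s) and s[i] in "+-" else None

def parse_digit(s, i):
    # <digit> ::= 0 | 1 | ... | 9
    return i + 1 if i < len(s) and s[i] in "0123456789" else None

def parse_integer(s, i):
    # <integer> ::= <digit> | <digit><integer>
    j = parse_digit(s, i)
    if j is None:
        return None
    k = parse_integer(s, j)  # try the longer alternative first
    return k if k is not None else j

def is_signed_integer(s):
    # <signed integer> ::= <sign><integer>, and the whole string must be used
    i = parse_sign(s, 0)
    if i is None:
        return False
    return parse_integer(s, i) == len(s)

print(is_signed_integer("-109"))  # True
print(is_signed_integer("109"))   # False: the sign is required by the grammar
```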