The Chomsky Hierarchy
Sentences
The sentence as a string of words, e.g. I saw the lady with the binoculars
string = a b c d e b f
The relations of parts of a string to each other may be different. "I saw the lady with the binoculars" is structurally ambiguous: who has the binoculars?
[I] saw the lady [with the binoculars] = [a] b c d [e b f]
I saw [the lady with the binoculars] = a b [c d e b f]
How can we represent the difference? By assigning them different structures. We can represent structures with 'trees'. I read the book
a. I saw [the lady with the binoculars]
[S [NP I] [VP [V saw] [NP [NP the lady] [PP with the binoculars]]]]
b. I [saw the lady] with the binoculars
[S [NP I] [VP [VP saw the lady] [PP with the binoculars]]]
birds fly
[S [NP [N birds]] [VP [V fly]]]
Syntactic rules: S → NP VP; NP → N; VP → V
[S [NP birds] [VP fly]], with birds = a and fly = b: ab = string
[S [A a] [B b]] = ab
S → A B; A → a; B → b
Rules
Assumption: natural language grammars are rule-based systems.
What kind of grammars describe natural language phenomena?
What are the formal properties of grammatical rules?
Chomsky, N. (1957) Syntactic Structures. The Hague: Mouton.
Chomsky, N. and G.A. Miller (1958) Finite state languages. Information and Control 1, 91-112.
Chomsky, N. (1959) On certain formal properties of grammars. Information and Control 2, 137-167.
Rules in Linguistics
1. PHONOLOGY
/s/ → [θ] / V ___ V
Rewrite /s/ as [θ] when /s/ occurs in the context V ___ V
With: V = auxiliary nodes, θ = terminal nodes
Rules in Linguistics
2. SYNTAX
S → NP VP
VP → V
NP → N
Rewrite S as NP VP in any context
With: S, NP, VP = auxiliary nodes; V, N = terminal nodes
PHONOLOGY (sound system)
Maltese: word-final devoicing
Orthography (spelling)    Pronunciation (sound)
sabet, sab                [sa-bet], [sap]
ħobża, ħobż               [hob-za], [hops]
vjaġġi, vjaġġ             [vjağ-ği], [vjačč]
voiced [+vd]: [b, z, ğ]; voiceless [-vd]: [p, s, č]
[+vd] → [-vd] / ____ # (for # = end of word)
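The devoicing rule can be sketched in a few lines of Python. This is only an illustration: the `DEVOICE` table covers just the three voiced/voiceless pairs listed on the slide, the function name `devoice_final` is made up, and devoicing the whole word-final run of voiced obstruents (rather than only the last segment) is an assumption read off the transcriptions [hops] and [vjačč].

```python
# Voiced -> voiceless pairs taken from the slide: [b, z, ğ] -> [p, s, č].
DEVOICE = {"b": "p", "z": "s", "ğ": "č"}


def devoice_final(word: str) -> str:
    """Apply [+vd] -> [-vd] / ___ # to a broadly transcribed word.

    Assumption: every voiced obstruent in the word-final run is devoiced,
    as the transcriptions [hops] and [vjačč] suggest.
    """
    segments = list(word)
    i = len(segments) - 1
    # Walk leftwards from the end of the word while segments are voiced obstruents.
    while i >= 0 and segments[i] in DEVOICE:
        segments[i] = DEVOICE[segments[i]]
        i -= 1
    return "".join(segments)


for form in ["sab", "hobz", "vjağğ", "sabet"]:
    print(form, "->", devoice_final(form))
```

The form "sabet" comes out unchanged because its final segment is already voiceless, so the rule has nothing to apply to.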
MORPHOLOGY (word formation)
Maltese: progressive assimilation in the 3fsg imperfective (present)
Marker for a verb in the 3rd person feminine singular imperfective: t- (3fsg impf = she)
e.g. she breaks = t-kisser; I break = n-kisser
t-kisser (3fsg-break, she breaks)    t-ressaq (3fsg-move, she moves)
s-sakkar (3fsg-lock, she locks)      d-dur (3fsg-turn, she turns)
*t-sakkar    *t-dur
t → s, d, etc. / ____ [s, d, etc.], where t is the 3fsg morpheme (μ) and the following consonant (C) is [+cor]
(with μ = morpheme, C = consonant, cor = coronal)
SYNTAX (phrase/sentence formation)
sentence: The boy kissed the girl
Subject = noun phrase = art + noun
predicate = verb phrase = verb + noun phrase
S → NP VP
VP → V NP
NP → ART N
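As a rough illustration of how these three rewrite rules license the example sentence, the sketch below expands S down to word categories and compares the result with the categories of the words. The `LEXICON` is hypothetical (the slide gives no lexical rules), and the simple `expand` function only works because each auxiliary node here has exactly one rule.

```python
# Phrase-structure rules from the slide; each auxiliary node has one expansion.
RULES = {
    "S": ["NP", "VP"],
    "VP": ["V", "NP"],
    "NP": ["ART", "N"],
}

# Hypothetical lexicon (not on the slide): word -> category.
LEXICON = {"the": "ART", "boy": "N", "girl": "N", "kissed": "V"}


def expand(symbol):
    """Rewrite a symbol until only categories without a rule remain."""
    if symbol not in RULES:
        return [symbol]
    result = []
    for child in RULES[symbol]:
        result.extend(expand(child))
    return result


def is_sentence(text):
    """Check whether the word categories match the expansion of S."""
    categories = [LEXICON.get(word) for word in text.lower().split()]
    return categories == expand("S")


print(expand("S"))
print(is_sentence("The boy kissed the girl"))
```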
SEMANTICS (meaning)
The lion attacks the hunter
attack(a, b) is built from a and λy [attack(y, b)]
λy [attack(y, b)] is built from λzλy [attack(y, z)] and b
(with a = the lion, b = the hunter)
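The step-by-step composition can be mimicked with curried Python functions. This is a sketch only: representing the resulting formula as a tuple is an assumption made for the example, not something the slide specifies.

```python
# Curried predicate: λz λy [attack(y, z)]; the object z is taken first.
attack = lambda z: lambda y: ("attack", y, z)

a = "the lion"    # subject
b = "the hunter"  # object

verb_phrase = attack(b)    # λy [attack(y, b)]
sentence = verb_phrase(a)  # attack(a, b)
print(sentence)
```

Applying the verb to its object first mirrors the syntactic structure, in which the verb and the object noun phrase form the verb phrase before the subject is added.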
Chomsky Hierarchy
0. Type 0 (recursively enumerable) languages - only restriction on rules: the left-hand side cannot be the empty string (*Ø → …)
1. Context-Sensitive languages - Context-Sensitive (CS) rules
2. Context-Free languages - Context-Free (CF) rules
3. Regular languages - regular (right-linear or left-linear) rules
0 ⊃ 1 ⊃ 2 ⊃ 3
a ⊃ b means that a properly includes b (a is a superset of b), i.e. b is a proper subset of a
Generative power
0. Type 0 (recursively enumerable) languages - only restriction on rules: the left-hand side cannot be the empty string (*Ø → …) - is the most powerful system
3. Type 3 (regular languages) - is the least powerful
Superset/subset relation
S1 = {a, b}
S2 = {a, b, c, d, f, g}
S1 is a subset of S2; S2 is a superset of S1
Rule Type – 3
Name: Regular
Example: Finite State Automata (Markov-process Grammar)
Rule type:
a) right-linear: A → x B or A → x, with A, B = auxiliary nodes and x = terminal node
b) or left-linear: A → B x or A → x
Generates: a^m b^n with m, n ≥ 1
Cannot guarantee that there are as many a’s as b’s; no embedding
A regular grammar for natural language sentences
S → the A
A → cat B; A → mouse B; A → duck B
B → bites C; B → sees C; B → eats C
C → the D
D → boy; D → girl; D → monkey
Generated sentences include:
the cat bites the boy
the mouse eats the monkey
the duck sees the girl
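To see what this grammar generates, the right-linear rules can be enumerated mechanically. The sketch below is a minimal illustration; the dictionary encoding, with `None` marking a rule that has no following auxiliary node, is a choice made for the example.

```python
# Right-linear rules from the slide: each option is (terminal, next auxiliary node or None).
GRAMMAR = {
    "S": [("the", "A")],
    "A": [("cat", "B"), ("mouse", "B"), ("duck", "B")],
    "B": [("bites", "C"), ("sees", "C"), ("eats", "C")],
    "C": [("the", "D")],
    "D": [("boy", None), ("girl", None), ("monkey", None)],
}


def generate(symbol="S"):
    """Yield every word sequence derivable from `symbol`."""
    for word, next_symbol in GRAMMAR[symbol]:
        if next_symbol is None:
            yield [word]
        else:
            for rest in generate(next_symbol):
                yield [word] + rest


sentences = [" ".join(words) for words in generate()]
print(len(sentences))
print(sentences[0])
```

Because every rule consumes one word and moves to at most one further auxiliary node, the grammar generates a finite set of 3 × 3 × 3 = 27 sentences, all with the same flat shape.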
Regular grammars
Grammar 1: A → a; A → a B; B → b A
Grammar 2: A → a; A → B a; B → A b
Grammar 3: A → a; A → a B; B → b; B → b A
Grammar 4: A → a; A → B a; B → b; B → A b
Grammar 5: S → a A; S → b B; A → a S; B → b b S; S → ε
Grammar 6: A → A a; A → B a; B → b; B → A b; A → a
Grammars: non-regular
Grammar 6: S → A B; S → b B; A → a S; B → b b S; S → ε
Grammar 7: A → a; A → B a; B → b; B → b A
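The difference between the regular and non-regular grammars is purely a matter of rule form, which a short check can make explicit. The sketch below assumes that auxiliary nodes are upper-case and terminals lower-case, allows a string of terminals before or after the auxiliary node (as in B → b b S), and writes an empty right-hand side as an empty tuple. The function name `classify` is made up for the example. It inspects only the form of the rules, so a grammar it rejects may still generate a regular language.

```python
def is_aux(symbol):
    """Auxiliary nodes are written in upper case, terminal nodes in lower case."""
    return symbol.isupper()


def classify(rules):
    """Classify a list of (lhs, rhs) rules by form only.

    rhs is a tuple of symbols; the empty tuple stands for an empty right-hand side.
    """
    # Right-linear: an auxiliary node may appear only as the last symbol.
    right = all(not any(is_aux(s) for s in rhs[:-1]) for _, rhs in rules)
    # Left-linear: an auxiliary node may appear only as the first symbol.
    left = all(not any(is_aux(s) for s in rhs[1:]) for _, rhs in rules)
    if right:
        return "right-linear"
    if left:
        return "left-linear"
    return "not regular in form"


grammar_1 = [("A", ("a",)), ("A", ("a", "B")), ("B", ("b", "A"))]
grammar_2 = [("A", ("a",)), ("A", ("B", "a")), ("B", ("A", "b"))]
grammar_7 = [("A", ("a",)), ("A", ("B", "a")), ("B", ("b",)), ("B", ("b", "A"))]

for name, grammar in [("1", grammar_1), ("2", grammar_2), ("7", grammar_7)]:
    print("Grammar", name, "is", classify(grammar))
```

Grammar 7 fails the check because it mixes a left-linear rule (A → B a) with a right-linear one (B → b A). A rule such as S → A B fails for a different reason: it has two auxiliary nodes on its right-hand side.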
Finite-State Automaton
States: NP, NP1, NP2
Transitions: NP --article--> NP1; NP1 --adjective--> NP1; NP1 --noun--> NP2
Equivalent rules:
NP → article NP1
NP1 → adjective NP1
NP1 → noun NP2
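The noun-phrase automaton can be run directly as a transition table. In this minimal sketch the input is a sequence of word categories rather than words, and treating NP2 as the accepting state is an assumption, since the original diagram does not survive in the text.

```python
# Transitions of the automaton: (state, category) -> next state.
TRANSITIONS = {
    ("NP", "article"): "NP1",
    ("NP1", "adjective"): "NP1",
    ("NP1", "noun"): "NP2",
}
START, FINAL = "NP", "NP2"  # assumption: NP2 is the accepting state


def accepts(categories):
    """Return True if the category sequence leads from NP to NP2."""
    state = START
    for category in categories:
        state = TRANSITIONS.get((state, category))
        if state is None:  # no transition defined: reject
            return False
    return state == FINAL


print(accepts(["article", "adjective", "adjective", "noun"]))
print(accepts(["adjective", "noun"]))
```

The loop on NP1 allows any number of adjectives, but the automaton keeps no count of them. That is the sense in which a regular grammar has no memory beyond its current state.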
A parse tree
[S [NP N] [VP V [NP DET N]]]
S = root node
NP, VP = non-terminal nodes
N, V, DET = terminal nodes
Rule Type – 2
Name: Context Free
Example: Phrase Structure Grammars / Push-Down Automata
Rule type: A → γ, with A = auxiliary node and γ = any number of terminal or auxiliary nodes
Recursiveness (centre embedding) allowed: A → α A β
CF Grammar
A Context Free grammar consists of:
a) a finite terminal vocabulary VT
b) a finite auxiliary vocabulary VA
c) an axiom S ∈ VA
d) a finite number of context free rules of the form A → γ, where A ∈ VA and γ ∈ (VA ∪ VT)*
In natural language syntax S is interpreted as the start symbol for sentence, as in S → NP VP
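CF Grammars
The following languages cannot be generated by a regular grammar:
Language 1: a^n b^n, e.g. ab, aabb
Language 2: mirror image, e.g. abaaba, abbaabba
Context-Free rules:
Language 1: A → a A b; A → a b
Language 2: A → a A a; A → b A b (plus rules that end the recursion, e.g. A → a a and A → b b)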
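A short sketch shows how two context-free rules produce a^n b^n. It assumes the rules A → a A b and A → a b for Language 1; the function names `derive` and `is_anbn` are made up for the example.

```python
def derive(n):
    """Derive a^n b^n (n >= 1): apply A -> a A b (n - 1) times, then A -> a b."""
    if n < 1:
        raise ValueError("n must be at least 1")
    form = "A"
    steps = [form]
    for _ in range(n - 1):
        form = form.replace("A", "aAb")  # A -> a A b
        steps.append(form)
    form = form.replace("A", "ab")       # A -> a b
    steps.append(form)
    return steps


def is_anbn(string):
    """Recognise a^n b^n by undoing the rules from the outside in."""
    if string == "ab":                   # A -> a b
        return True
    if len(string) > 2 and string[0] == "a" and string[-1] == "b":
        return is_anbn(string[1:-1])     # A -> a A b
    return False


print(" => ".join(derive(3)))
print(is_anbn("aabb"), is_anbn("aab"))
```

Each application of A → a A b adds one a and one b at the same time, on either side of the auxiliary node. This pairing is what a right-linear or left-linear rule cannot express.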
Natural language
Is English regular or CF? If centre embedding is required, then it cannot be regular.
Centre Embedding:
1. [The cat] [likes tuna fish] = a b
2. The cat the dog chased likes tuna fish = a a b b
3. The cat the dog the rat bit chased likes tuna fish = a a a b b b
4. The cat the dog the rat the elephant admired bit chased likes tuna fish = a a a a b b b b
ab, aabb, aaabbb, aaaabbbb
Centre embedding S NP VP the likes cat tuna a b = ab
S NP VP likes NP S tuna the b cat NP VP a thechased dogb a = aabb
S NP VP likes NP Stuna the b cat NPVP a chased NPSb the dog NPVP athebit ratb a = aaabbb
Centre Embedding
1. [The cat] [likes tuna fish] = a b = ab
2. [The cat] [the dog] [chased] [likes tuna fish] = a a b b = aabb
3. [The cat] [the dog] [the rat] [bit] [chased] [likes ...] = a a a b b b = aaabbb
4. [The cat] [the dog] [the rat] [the elephant] [admired] [bit] [chased] [likes ...] = a a a a b b b b = aaaabbbb
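The centre-embedded sentences follow a regular recipe, which the sketch below makes explicit. It uses only the noun phrases and predicates from the examples; pairing them by list index and the function names `embed` and `pattern` are assumptions made for illustration.

```python
# Noun phrases (a) and predicates (b) from the examples; index i of each list belongs together.
SUBJECTS = ["the cat", "the dog", "the rat", "the elephant"]
PREDICATES = ["likes tuna fish", "chased", "bit", "admired"]


def embed(depth):
    """Build the sentence with `depth` subjects followed by `depth` predicates."""
    if not 1 <= depth <= len(SUBJECTS):
        raise ValueError("depth out of range for this toy vocabulary")
    subjects = SUBJECTS[:depth]
    # The innermost clause closes first, so predicates come out in reverse order.
    predicates = list(reversed(PREDICATES[:depth]))
    return " ".join(subjects + predicates)


def pattern(depth):
    """Abstract a/b pattern of the same sentence."""
    return "a" * depth + "b" * depth


for depth in range(1, 5):
    print(pattern(depth), "=", embed(depth))
```

Every subject must eventually be matched by exactly one predicate, so the number of b's depends on the number of a's. This is the a^n b^n dependency that a regular grammar cannot guarantee.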
Natural language 2
More Centre Embedding:
1. If S1, then S2 = a a
2. Either S3, or S4 = b b
3. The man who said S5 is arriving today
4. The man who said S6 is arriving the day after
Sentence with embedding:
If either the man who said S5 is arriving today or the man who said S5 is arriving tomorrow, then the man who said S6 is arriving the day after
= a b b a = abba
Natural language 2
More Centre Embedding:
1. If S1, then S2 = a a
2. Either S3, or S4 = b b
Sentence with embedding:
If either the man is arriving today or the woman is arriving tomorrow, then the child is arriving the day after.
a = [if
b = [either the man is arriving today]
b = [or the woman is arriving tomorrow]]
a = [then the child is arriving the day after]
= abba
CS languages
The following language cannot be generated by a CF grammar (by the pumping lemma): a^n b^m c^n d^m
Swiss German: a string of dative nouns (e.g. aa), followed by a string of accusative nouns (e.g. bbb), followed by a string of dative-taking verbs (cc), followed by a string of accusative-taking verbs (ddd) = aabbbccddd = a^n b^m c^n d^m
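The crossing dependency in a^n b^m c^n d^m is easy to test by counting, even though no context-free grammar generates the language. The sketch below is a plain membership check, not a grammar. Requiring n, m ≥ 1 is an assumption, and `is_cross_serial` is a name made up for the example.

```python
import re


def is_cross_serial(string):
    """Return True for strings of the form a^n b^m c^n d^m with n, m >= 1."""
    match = re.fullmatch(r"(a+)(b+)(c+)(d+)", string)
    if match is None:
        return False
    a, b, c, d = (len(group) for group in match.groups())
    # The counts must agree crosswise: a's with c's, b's with d's.
    return a == c and b == d


print(is_cross_serial("aabbbccddd"))
print(is_cross_serial("aabbbcddd"))
```

The regular expression only checks the order of the four blocks. The two equalities carry the real constraint, and because the a/c dependency crosses the b/d dependency, the pattern cannot be captured by nested rules of the form A → α A β.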
Swiss German:
Jan sait das (Jan says that) …
mer    em Hans     es Huus          hälfed    aastriiche
we     Hans/DAT    the house/ACC    helped    paint
"we helped Hans paint the house" = a b c d
NPdat NPdat NPacc NPacc Vdat Vdat Vacc Vacc = a a b b c c d d