320 likes | 458 Views
LING/C SC/PSYC 438/538 Computational Linguistics. Sandiway Fong Lecture 10 : 9/27. Today’s Topics. Homework 2 review Recap: DCG system New topic: Finite State Automata (FSA). Question 1
E N D
LING/C SC/PSYC 438/538Computational Linguistics Sandiway Fong Lecture 10: 9/27
Today’s Topics • Homework 2 review • Recap: DCG system • New topic: Finite State Automata (FSA)
Question 1 Consider a language L3or5 = {111, 11111, 111111, 111111111, 1111111111, 111111111111, 111111111111111,...} each member of L3or5 is a string containing only 1s the number of 1s in each string is divisible by 3 or 5 Give a regular grammar that generates language L3or5 Homework 2 Review
Regular grammar: divisible by 3 side s --> [1], a. a --> [1], b. b --> [1]. b --> [1], c. c --> [1], d. d --> [1], b. Regular grammar: divisible by 5 side s --> [1], one. one --> [1], two. two --> [1], three. three --> [1], four. four --> [1]. four --> [1], five. five --> [1], six. six --> [1], seven. seven --> [1], eight. eight --> [1], four. Homework 2 Review take the disjunction of the two sub-grammars, i.e. merge them
Question 2: Language L = {a2nbn+1 | n ≥1} is also non-regular but can be generated by a regular grammar extended to allow left and right recursive rules Given a Prolog grammar satisfying these rules for L Legit rules: X -> aY X -> a X -> Ya where X, Y are non-terminals, a is some arbitirary terminal Homework 2 Review
L = {a2nbn+1 | n≥ 1} DCG: s --> [a], b. b --> [a], c. c --> [b], d. c --> s, [b]. d --> [b]. s a b a c b d b s a b a c s b Homework 2 Review generated by rules 1 + 2 + 3 + 5 generated by rules 1 + 2 + 4 (center-embedded recursive tree)
Question 3 (Optional 438) A right recursive regular grammar that generates a (rightmost) parse for the language {an| n>=2}: s(s(a,B)) --> [a], b(B). b(b(a,B)) --> [a], b(B). b(b(a))--> [a]. Example ?- s(Tree,[a,a,a],[]). Tree = s(a,b(a,b(a)))? A corresponding left recursive regular grammar will not halt in all cases when an input string is supplied Modify the right recursive grammar to produce a left recursive parse, e.g. ?- s(Tree,[a,a,a],[]). Tree = s(b(b(a),a),a) Can you do this for any right recursive regular grammar? e.g. the one for sheeptalk Homework 2 Review
Original: s(s(a,B)) --> [a], b(B). b(b(a,B)) --> [a], b(B). b(b(a))--> [a]. New: s(s(B,a)) --> [a], b(B). b(b(B,a)) --> [a], b(B). b(b(a))--> [a]. Homework 2 Review taking advantage of the fact that we know we’re generating only a’s example: in rule 1, we match a terminal a on the left side but place an a at the end for the tree
Extra Arguments: Recap • some uses for extra arguments on non-terminals in the grammar: • to generate a parse tree • recording the derivation history • feature agreement • implement counting
Implement Determiner-Noun Number Agreement Data: the man/men a man/*a men Modified grammar: np(np(D,N)) --> det(D,Number), common_noun(N,Number). det(det(the),_) --> [the]. det(det(a),sg) --> [a]. common_noun(n(ball),sg) --> [ball]. common_noun(n(man),sg) --> [man]. common_noun(n(men),pl) --> [men]. Extra Arguments: Recap np(np(D,N)) --> detsg(D), common_nounsg(N). np(np(D,N)) --> detpl(D), common_nounpl(N). detsg(det(a)) --> [a]. detsg(det(the)) --> [the]. detpl(det(the)) --> [the]. common_nounsg(n(man)) -->[man]. common_nounpl(n(men)) --> [men]. syntactic sugar could just encode feature values in nonterminal name
Class Exercise: Implement Case = {nom,acc} agreement system for the grammar Examples: the man hit me vs. *the man hit I Modified grammar: s(Y,Z)) --> np(Y,Case), vp(Z), { Case = nom }. np(np(Y),Case) --> pronoun(Y,Case). pronoun(i,nom) --> [i]. pronoun(we,nom) --> [we]. pronoun(me,acc) --> [me]. np(np(D,N),_) --> det(D,Num), common_noun(N,Num). vp(vp(Y,Z)) --> transitive(Y), np(Z,Case), { Case = acc }. Extra Arguments: Recap { ... Prolog code ... }
Class Exercise: Implement Case = {nom,acc} agreement system for the grammar Examples: the man hit me vs. *the man hit I Modified grammar: s(Y,Z)) --> np(Y,nom), vp(Z). np(np(Y),Case) --> pronoun(Y,Case). pronoun(i,nom) --> [i]. pronoun(we,nom) --> [we]. pronoun(me,acc) --> [me]. np(np(D,N),_) --> det(D,Num), common_noun(N,Num). vp(vp(Y,Z)) --> transitive(Y), np(Z,acc). Extra Arguments: Recap without { ... Prolog code ... }
Extra Arguments: Recap s --> [a],b. b --> [a],b. b --> [b],c. b --> [b]. c --> [b],c. c --> [b]. s(X) --> [a],b(a(X)). b(X) --> [a],b(a(X)). b(a(X)) --> [b],c(X). b(a(0)) --> [b]. c(a(X)) --> [b],c(X). c(a(0)) --> [b]. Lab = { anbn | n ≥ 1 } regular grammar + extra argument L = a+b+ a regular grammar Query:?- s(0,L,[]).
s(X) --> [a],b(a(X)). b(X) --> [a],b(a(X)). b(a(X)) --> [b],c(X). b(a(0)) --> [b]. c(a(X)) --> [b],c(X). c(a(0)) --> [b]. derivation tree: Extra Arguments: Recap s(0) rule 1 b(a(0)) rule 2 a b(a(a(0))) rule 3 a logic: ?- s(0,L,[]). start with 0 wrap an a(_) for each “a” unwrap a(_) for each “b” match #a’s with #b’s = counting c(a(0)) rule 6 b b
FSA So far • Equivalent formalisms regexp regular grammars
FSA • example (sheeptalk) • baa! • baaa! … • regexp • baa+! basic idea: “just follow the arrows” a s w x y a b a > state transition: in state s, if current input symbol is b go to state w ! state z accepting computation: in a final state, if there is no more input, accept string points to start state end state marked in red
FSA: Construction • step-by-step • regexp • baa+! = • baaa*! s >
FSA: Construction • step-by-step • regexp • baaa*! • b • from state s, • see a ‘b’, • move to state w s w b >
FSA: Construction • step-by-step • regexp • baaa*! • ba • from state w, • see an ‘a’, • move to state x s w x b a >
FSA: Construction • step-by-step • regexp • baaa*! • baa • from state x, • see an ‘a’, • move to state y s w x b a > a y
FSA: Construction • step-by-step • regexp • baaa*! • baaa* • baa • baaa • baaaa... • from y, • see an ‘a’, • move to ? s w x b a > a y but machine must have a finite number of states! a y’ a ...
Regular Expressions: Example • step-by-step • regexp • baaa*! • baa* • baa • baaa • baaaa... • from state y, • see an ‘a’, • “loop”, i.e. move back to, state y s w x b a > a y a
s w x b a > a z ! y a Regular Expressions: Example • step-by-step • regexp • baaa*! • baaa*! • from state y, • see an ‘!’, • move to final state z (indicated in red) Note: machine cannot finish (i.e. reach the end of the input string) in states s, w, x or y
Finite State Automata (FSA) • construction • the step-by-step FSA construction method we just used • works for any regular expression • conclusion • anything we can encode with a regular expression, we can build a FSA for it • an important step in showing that FSA and regexps are formally equivalent
a b c d x y e ... ... z etc. a etc. one loop for each character y regexp operators and FSA • basic wildcards • . and * • . any single character • e.g. p.t • put, pit, pat, pet • * zero or more characters over the whole alphabet
a x y a a e i o x y u regexp operators and FSA • basic wildcards • + • one or more of the preceding character • e.g. a+ • [ ] • range of characters • e.g. [aeiou]
x y > a regexp operators and FSA λ denotes the empty transition • basic wildcards • ? • zero or one of the preceding character • e.g. a? • Non-determinism • any FSA with an empty transition is non-deterministic • see example: could be in state x or y simultaneous • any FSA with an empty transition can be rewritten without the empty transition λ x y a
ya a ye a a a e e e e yi i i i o o o y x y z x o yo z u u u u yu regexp operators and FSA • backreferences: • by state splitting • e.g. ([aeiou])\1
s w x b a > a z ! y a Finite State Automata (FSA) • more formally • (Q,s,f,Σ,) • set of states (Q): {s,w,x,y,z} 4 states must be a finite set • start state (s): s • end state(s) (f): z • alphabet (Σ): {a, b, !} • transition function : signature: character × state → state • (b,s)=w • (a,w)=x • (a,x)=y • (a,y)=y • (!,y)=z
FSA • Finite State Automata (FSA) have a limited amount of expressive power • Let’s look at a modification to FSA and its effect on its power
b a b b abb String Transitions • so far... • all machines have had just a single character label on the arc • so if we allow strings to label arcs • do they endow the FSA with any more power? • Answer: No • because we can always convert a machine with string-transitions into one without
a s w x b a > s y baa > a z ! y ! a z Finite State Automata (FSA) • equivalent 3 state machine using string transitions 5 state machine