130 likes | 243 Views
CMSC 723 / LING 645: Intro to Computational Linguistics. September 22, 2004: Dorr Supplement: PC-Kimmo Tutorial Prof. Bonnie J. Dorr Dr. Christof Monz TA: Adam Lee. S. +:0, =. S. +:e. s. Building Automata in Kimmo: English Epenthesis.
E N D
CMSC 723 / LING 645: Intro to Computational Linguistics September 22, 2004: Dorr Supplement: PC-Kimmo Tutorial Prof. Bonnie J. DorrDr. Christof MonzTA: Adam Lee
S +:0, = S +:e s Building Automata in Kimmo: English Epenthesis Chomsky & Halle: +s → es / X__, else s; where X = {s, ch, s, sh, x} How do we implement this? X+s ==> Xes; else Y+s ==> Ys What does this look like? Let S = {x, s, z} Note 1: S = S:S, s = s:s, etc. Note 2 : No way to get out of last state except for `s’
Building Automata in Kimmo: English Epenthesis Chomsky & Halle: +s → es / X__, else s; where X = {s, ch, s, sh, x} How do we implement this? X+s ==> Xes; else Y+s ==> Ys What does this look like? Let S = {x, s, z} S +:0, = S +:e +:e +:0 Note 1: S = S:S, s = s:s, etc. Note 2 : No way to get out of last state except for `s’ Note 3: No way to get out of intermed state on `s’ =, S s
Building Automata in Kimmo: English Epenthesis This takes care of x, s, z, but what about ch, sh? +:0, = S S +:e +:e +:0 =, S s
Building Automata in Kimmo: English Epenthesis This takes care of x, s, z, but what about ch, sh, ss? Add two new states. (And now we need to add numbers!) +:e s s,h,S +:0, = s S S +:e c s,h,S +:e +:0 c =, S s
Building Automata in Kimmo: English Epenthesis Problem: Now that we have introduced c, s, h, we need to worry about these in other states! Add numbers to states! +:e 4 s s,h,S +:0, = s S 5 S +:e 1 3 c s,h,S +:e 2 +:0 6 c =, S s
Building Automata in Kimmo: English Epenthesis Problem: Now that we have introduced c, s, h, we need to worry about these in other states! Add numbers to states! +:e 4 s s,h,S +:0, =, h s s S c 5 S +:e 1 3 c c s,h,S +:e 2 h +:0 6 c =, S, c, h s
Building Automata in Kimmo: English Epenthesis Problem: In fact, we have to worry about all feasible pairs in every state. Add numbers to states! +:e 4 s s,h,S +:0, =, h s = s = S c 5 S +:e 1 3 c c Note: If a feasible pair is missing between 2 states, it is assumed that pair is not possible, e.g., we cannot go from 6 to 1 on s:s = s,h,S +:e 2 h +:0 6 c =, S, c, h s
Building Automata in Kimmo: English Epenthesis Unfortunately, life can get even more complicated (because of restriction below): there can be interaction with other automata. For example, Epenthesis interacts with Y-replacement, so we need a feasible pair for y:i and +:0 in Epenthesis! +:e 4 s s,h,S +:0, =, h s = s = S c 5 S +:e 1 3 c c Note: If a feasible pair is missing between 2 states, it is assumed that pair is not possible, e.g., we cannot go from 6 to 1 on s:s = s,h,S +:e 2 h +:0 6 c =, S, c, h s
4 +:e s s,h,S +:0, =, h s = s S = 5 1 3 c S +:e c c 2 = s,h,S +:e 6 +:0 h c =, S, c, h s Building Automata in Kimmo: English Epenthesis Here is the actual automaton matrix used in simple-english.aut RULE "Epenthesis" 6 9 c h s S y + + = _ c h s S i e 0 = _ 1: 2 1 4 3 3 0 1 1 1 2: 2 3 3 3 3 0 1 1 1 3: 2 1 3 3 3 5 6 1 1 4: 2 3 3 3 3 5 6 1 1 5. 0 0 1 0 0 0 0 0 0 6: 1 1 0 1 1 5 6 1 1 Hint: This is the sort of thing you’ll need for German find+t
What about long-distance dependencies? • How do we handle Buch and Bücher? • Trick: Use maybe-umlaut, coupled with special marker in the suffix • Root form in lexicon: b|ch Continuation class (or alternation) is called /NOUN-NEUT-ADD-ER • In lexicon for NOUN-NEUT-ADD-ER, put ending +&er, where & indicates that the root may have a character that should be umlauted. • In the Maybe-Umlaut automaton, search for sequence “| .. +&e” and change to “* .. +&e”
Building a Lexicon in Kimmo: Simple English Example • ALTERNATION /Root RootALTERNATION /N N • ALTERNATION /End End • LEXICON INITIAL 0 /Root "(“ • LEXICON N 0 /C1 "(cat n) (person p3) (number sg)“ +s /C2 "(cat n) (person p3) (number pl)“ +y /A "(cat a)“ • LEXICON Root spy /N spy /V • LEXICON End 0 # “)” • END
Demo of PC-Kimmo • cd kimmo • pckimmo • load rules simple-english.aut • load lexicon simple-english.dic • generate try+s • recognize tries • set tracing on • recognize tries