Introduction to FSM Toolkit
Examples: Part I
NLP Course 07
Example 1
• Acceptor for “sheeptalk”: /baa+!/
Text representation (sheep.txt):
    0 1 b
    1 2 a
    2 3 a
    3 3 a
    3 4 !
    4
Symbols file (S.syms):
    eps 0
    a 1
    b 2
    ! 3
    w 4
    o 5
    u 6
    f 7
- Symbols w, o, u and f are needed for the 2nd example.
- The eps symbol stands for possible future epsilon transitions.
Example 1
• fsmcompile -i S.syms sheep.txt > sheep.fsa
• fsmdraw -i S.syms sheep.fsa | dot -Tps > sheep.ps
• Image format: PostScript. For JPEG output write: fsmdraw -i S.syms sheep.fsa | dot -Tjpg > sheep.jpg
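To check which strings the compiled acceptor should accept, the text representation can be simulated directly. The following is a minimal Python sketch and not part of the FSM Toolkit: parse_acceptor and accepts are hypothetical helper names, and the sketch assumes that the source state of the first line is the start state.

```python
SHEEP_TXT = """\
0 1 b
1 2 a
2 3 a
3 3 a
3 4 !
4
"""

def parse_acceptor(text):
    """Parse an unweighted acceptor from the text representation.

    Lines with three fields are arcs (source, destination, label);
    lines with a single field mark a final state. The source state of
    the first line is assumed to be the start state.
    """
    arcs, finals, start = {}, set(), None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if start is None:
            start = fields[0]
        if len(fields) == 1:
            finals.add(fields[0])
        else:
            src, dst, label = fields
            arcs.setdefault((src, label), set()).add(dst)
    return start, arcs, finals

def accepts(machine, symbols):
    """Return True if the sequence of symbols ends in a final state."""
    start, arcs, finals = machine
    current = {start}
    for sym in symbols:
        current = {d for s in current for d in arcs.get((s, sym), ())}
    return bool(current & finals)

sheep = parse_acceptor(SHEEP_TXT)
print(accepts(sheep, "baa!"))    # True
print(accepts(sheep, "baaaa!"))  # True
print(accepts(sheep, "ba!"))     # False: /baa+!/ needs at least two a's
```

Here every character of the input string is treated as one symbol, which matches the single-character symbols in S.syms.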
Example 2
• Acceptor for “dogtalk”: /wouf!/
Text representation (dog.txt):
    0 1 w
    1 2 o
    2 3 u
    3 4 f
    4 5 !
    5
Symbols file (S.syms): same as in Example 1 (sheep and dog share the same symbols file):
    eps 0
    a 1
    b 2
    ! 3
    w 4
    o 5
    u 6
    f 7
Example 2
• fsmcompile -i S.syms dog.txt > dog.fsa
• fsmdraw -i S.syms dog.fsa | dot -Tps > dog.ps
Having the 2 fsa for “sheeptalk” and “dogtalk”, use the appropriate function to generate an acceptor that accepts a “sheeptalk” OR a “dogtalk”.
Example 3
• fsmunion sheep.fsa dog.fsa > shORdg.fsa
• fsmdraw -iS.syms < shORdg.fsa | dot -Tps > shORdg.ps
Having the 2 fsa for “sheeptalk” and “dogtalk”, use the appropriate function to generate an acceptor that accepts a “sheeptalk” AND a “dogtalk”, using the constraint that sheep talks first!
Example 4
• fsmconcat sheep.fsa dog.fsa > shANDdg.fsa
• fsmdraw -iS.syms < shANDdg.fsa | dot -Tps > shANDdg.ps
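The effect of union and concatenation on the accepted language can be illustrated without the toolkit. The sketch below is a Python illustration under stated assumptions: make_acceptor, retag, union, concat and accepts are hypothetical names, and the constructions are simple epsilon-free ones chosen for clarity, so the resulting machines need not look like the ones that fsmunion and fsmconcat draw.

```python
def make_acceptor(arcs, start, finals):
    """Build an acceptor from (source, destination, label) triples."""
    table = {}
    for src, dst, label in arcs:
        table.setdefault((src, label), set()).add(dst)
    return {"starts": {start}, "arcs": table, "finals": set(finals)}

def retag(machine, tag):
    """Prefix every state with a tag so that two machines cannot clash."""
    return {
        "starts": {(tag, s) for s in machine["starts"]},
        "arcs": {((tag, s), lab): {(tag, d) for d in dsts}
                 for (s, lab), dsts in machine["arcs"].items()},
        "finals": {(tag, s) for s in machine["finals"]},
    }

def union(a, b):
    """Accept a string if either machine accepts it."""
    a, b = retag(a, "A"), retag(b, "B")
    return {"starts": a["starts"] | b["starts"],
            "arcs": {**a["arcs"], **b["arcs"]},
            "finals": a["finals"] | b["finals"]}

def concat(a, b):
    """Accept a string of machine a followed by a string of machine b."""
    a, b = retag(a, "A"), retag(b, "B")
    arcs = {k: set(v) for k, v in {**a["arcs"], **b["arcs"]}.items()}
    # Every arc leaving a start state of b is copied to each final state of a.
    for (src, lab), dsts in b["arcs"].items():
        if src in b["starts"]:
            for f in a["finals"]:
                arcs.setdefault((f, lab), set()).update(dsts)
    finals = set(b["finals"])
    if b["starts"] & b["finals"]:      # b accepts the empty string
        finals |= a["finals"]
    starts = set(a["starts"])
    if a["starts"] & a["finals"]:      # a accepts the empty string
        starts |= b["starts"]
    return {"starts": starts, "arcs": arcs, "finals": finals}

def accepts(machine, symbols):
    current = set(machine["starts"])
    for sym in symbols:
        current = {d for s in current
                   for d in machine["arcs"].get((s, sym), ())}
    return bool(current & machine["finals"])

sheep = make_acceptor(
    [(0, 1, "b"), (1, 2, "a"), (2, 3, "a"), (3, 3, "a"), (3, 4, "!")], 0, [4])
dog = make_acceptor(
    [(0, 1, "w"), (1, 2, "o"), (2, 3, "u"), (3, 4, "f"), (4, 5, "!")], 0, [5])

print(accepts(union(sheep, dog), "wouf!"))        # True
print(accepts(concat(sheep, dog), "baaa!wouf!"))  # True
print(accepts(concat(sheep, dog), "wouf!baa!"))   # False: the sheep must talk first
```

Swapping the arguments, as in concat(dog, sheep), gives the machine in which the dog talks first.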
But the Society of Animals is always fair! This time, let the dog speak first!
Example 5 • Generate the following weighted FSM:
Example 5
Text representation (A.txt):
    0 1 red 0.3
    1 3 blue 0.7
    0 2 green 0.4
    2 3 yellow 0.8
    3 0.3
    4 0.4
Symbols file (S2.syms):
    eps 0
    red 1
    blue 2
    green 3
    yellow 4
As before: fsmcompile, fsmdraw
Example 5
• fsmbestpath A.fsa > B.fsa
• fsmdraw -iS2.syms < B.fsa | dot -Tps > B.ps
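The result of the best-path search can be checked by hand or with a short script. The following Python sketch assumes that weights are costs that add up along a path and that the cheapest path wins (the tropical semiring); best_path is a hypothetical helper, not a toolkit function, and it also assumes that all weights are non-negative.

```python
import heapq

A_TXT = """\
0 1 red 0.3
1 3 blue 0.7
0 2 green 0.4
2 3 yellow 0.8
3 0.3
4 0.4
"""

def best_path(text):
    """Return (cost, labels) of the cheapest accepting path.

    Arc lines are: source, destination, label, weight.
    Final-state lines are: state, weight.
    The source state of the first line is assumed to be the start state.
    """
    arcs, finals, start = {}, {}, None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if start is None:
            start = fields[0]
        if len(fields) <= 2:
            finals[fields[0]] = float(fields[1]) if len(fields) == 2 else 0.0
        else:
            src, dst, label = fields[:3]
            weight = float(fields[3]) if len(fields) > 3 else 0.0
            arcs.setdefault(src, []).append((dst, label, weight))

    # Dijkstra's algorithm over the states of the automaton.
    heap = [(0.0, start, [])]
    done = set()
    best = None
    while heap:
        cost, state, labels = heapq.heappop(heap)
        if state in done:
            continue
        done.add(state)
        if state in finals:
            total = cost + finals[state]
            if best is None or total < best[0]:
                best = (total, labels)
        for dst, label, weight in arcs.get(state, []):
            if dst not in done:
                heapq.heappush(heap, (cost + weight, dst, labels + [label]))
    return best

cost, labels = best_path(A_TXT)
print(labels, round(cost, 2))
```

Under these assumptions the path red, blue costs 0.3 + 0.7 + 0.3 = 1.3, and the path green, yellow costs 0.4 + 0.8 + 0.3 = 1.5, so the first one is selected. State 4 is listed as final in A.txt but no arc reaches it, so it plays no role in the search.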
Perl & FSM Toolkit
• Problem definition: the input is a file containing a single sentence of lower-case words: “hi nlp world”
• Goal: transform these words into upper case using FSMs: “HI NLP WORLD”
Perl & FSM Toolkit
• A Perl script (composition.pl) that:
  • Extracts the lower-case words from the input file
  • Generates the corresponding transducer, which maps each word to itself
  • Generates a second transducer that maps each word to its upper-case form
  • Composes the two transducers
  • Projects the output side of the resulting transducer
  • Reads the textual description of the projected acceptor and prints the upper-case sentence to the screen
#!/usr/bin/perl
# read the single input sentence and split it into words
open (IN, $ARGV[0]) || die "error";
$rdln = <IN>;
@in_wrds = split(' ', $rdln);   # ' ' also skips leading whitespace
close(IN);

# write the files for the transducers
open (OUT_T11, ">T11") || die "error";
open (OUT_T12, ">T12") || die "error";
@low_up_words = @in_wrds;
$c = 0;
foreach $tmp (@in_wrds) {
    print OUT_T11 ($c,"\t",$c+1,"\t",$tmp,"\t",$tmp,"\n");       # word -> word
    print OUT_T12 ($c,"\t",$c+1,"\t",$tmp,"\t",uc($tmp),"\n");   # word -> WORD
    push (@low_up_words, uc($tmp));   # gather lower and upper case words
    $c++;
}
print OUT_T11 ($c,"\n");   # final state
print OUT_T12 ($c,"\n");
close(OUT_T11);   # close (and flush) both files before the FSM tools read them
close(OUT_T12);

# write symbols file
$i = 1;
open (OUT_S12, ">S12") || die "error";
foreach $tmp (@low_up_words) {
    print OUT_S12 ($tmp,"\t",$i,"\n");
    $i++;
}
close(OUT_S12);

# call the FSM Library
system ("fsmcompile -iS12 -oS12 -t < T11 > T11.fst");
system ("fsmdraw -iS12 -oS12 < T11.fst | dot -Tps > T11.ps");
system ("fsmcompile -iS12 -oS12 -t < T12 > T12.fst");
system ("fsmdraw -iS12 -oS12 < T12.fst | dot -Tps > T12.ps");
system ("fsmcompose T11.fst T12.fst > T12comp.fst");
system ("fsmdraw -iS12 -oS12 < T12comp.fst | dot -Tps > T12comp.ps");
system ("fsmproject -2 T12comp.fst > final_out.fsa");
system ("fsmdraw -iS12 < final_out.fsa | dot -Tps > final_out.ps");
system ("fsmprint -iS12 < final_out.fsa > final_out");

# finally, read the resulting file and extract the field of interest
open (IN2, "final_out") || die "can not open the input file...\n";
$rdln2 = <IN2>;
while (defined $rdln2) {
    @out_wrds = split(' ', $rdln2);
    # arc lines have three fields; the final-state line has only one
    push (@up_wrds, $out_wrds[2]) if defined $out_wrds[2];
    $rdln2 = <IN2>;
}
close(IN2);

# print the upper case content of the initial input file
print (join(" ",@up_wrds),"\n");
Perl & FSM Toolkit
First fst (T11.fst):
    0 1 hi hi
    1 2 nlp nlp
    2 3 world world
    3
Second fst (T12.fst):
    0 1 hi HI
    1 2 nlp NLP
    2 3 world WORLD
    3
Symbols file (S12):
    hi 1
    nlp 2
    world 3
    HI 4
    NLP 5
    WORLD 6
Perl & FSM Toolkit
• Compose T11.fst and T12.fst:
    system ("fsmcompose T11.fst T12.fst > T12comp.fst");
    system ("fsmdraw -iS12 -oS12 < T12comp.fst | dot -Tps > T12comp.ps");
Perl & FSM Toolkit
Project the output side of the resulting transducer:
    system ("fsmproject -2 T12comp.fst > final_out.fsa");
Draw final_out.fsa:
    system ("fsmdraw -iS12 < final_out.fsa | dot -Tps > final_out.ps");
Print a textual description of this fsa:
    system ("fsmprint -iS12 < final_out.fsa > final_out");
Read this textual description using Perl:
    open (IN2, "final_out") || die "can not open the input file...\n";
    $rdln2 = <IN2>;
    . . .
Perl & FSM Toolkit
Textual description of final_out.fsa:
    0 1 HI
    1 2 NLP
    2 3 WORLD
    3
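As a cross-check of the pipeline, composition and output projection can be imitated on the two text representations. The Python sketch below is an illustration under stated assumptions, not the toolkit's algorithm: parse_fst, compose, project_output and read_path are hypothetical helpers, the composition handles only epsilon-free transducers, and read_path assumes a linear acceptor with a single path.

```python
T11_TXT = """\
0 1 hi hi
1 2 nlp nlp
2 3 world world
3
"""

T12_TXT = """\
0 1 hi HI
1 2 nlp NLP
2 3 world WORLD
3
"""

def parse_fst(text):
    """Parse arcs (source, destination, input, output) and final states."""
    arcs, finals = [], set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4:
            arcs.append(tuple(fields))
        elif len(fields) == 1:
            finals.add(fields[0])
    start = arcs[0][0]  # assumption: the first arc leaves the start state
    return start, arcs, finals

def compose(t1, t2):
    """Pair up states; an arc of t1 matches an arc of t2 when the output
    label of the first equals the input label of the second."""
    s1, arcs1, fin1 = t1
    s2, arcs2, fin2 = t2
    start = (s1, s2)
    arcs, finals = [], set()
    seen, stack = {start}, [start]
    while stack:
        q1, q2 = stack.pop()
        if q1 in fin1 and q2 in fin2:
            finals.add((q1, q2))
        for a_src, a_dst, a_in, a_out in arcs1:
            if a_src != q1:
                continue
            for b_src, b_dst, b_in, b_out in arcs2:
                if b_src == q2 and b_in == a_out:
                    nxt = (a_dst, b_dst)
                    arcs.append(((q1, q2), nxt, a_in, b_out))
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
    return start, arcs, finals

def project_output(t):
    """Keep only the output label of every arc, giving an acceptor."""
    start, arcs, finals = t
    return start, [(src, dst, out) for src, dst, _inp, out in arcs], finals

def read_path(acceptor):
    """Follow the single path of a linear acceptor and collect its labels."""
    start, arcs, finals = acceptor
    step = {src: (dst, label) for src, dst, label in arcs}
    state, labels = start, []
    while state not in finals:
        state, label = step[state]
        labels.append(label)
    return labels

composed = compose(parse_fst(T11_TXT), parse_fst(T12_TXT))
print(" ".join(read_path(project_output(composed))))  # HI NLP WORLD
```

The printed sentence matches the labels in the textual description of final_out.fsa.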
Extras 1 • Generate the following acceptor, determinize and minimize it
Extras 2 • Generate the following transducers and find their composition