Markov Chain Algorithm in Perl
Michael Conway
CS 265
May 4, 2011
Markov Chain Algorithm

Goal: Mimic proper English composition.

1. Populate a prefix hash table with suffix lists.
2. Start at the beginning and jump from prefix to prefix, printing a randomly chosen suffix at each step.
Perl Implementation

    # markov.pl: markov chain algorithm for 2-word prefixes

    $MAXGEN = 10000;
    $NONWORD = "\n";
    $w1 = $w2 = $NONWORD;                    # initial state
    while (<>) {                             # read each line of input
        foreach (split) {
            push(@{$statetab{$w1}{$w2}}, $_);
            ($w1, $w2) = ($w2, $_);          # multiple assignment
        }
    }
    push(@{$statetab{$w1}{$w2}}, $NONWORD);  # add tail

    $w1 = $w2 = $NONWORD;
    for ($i = 0; $i < $MAXGEN; $i++) {
        $suf = $statetab{$w1}{$w2};          # array reference
        $r = int(rand @$suf);                # @$suf is number of elems
        exit if (($t = $suf->[$r]) eq $NONWORD);
        print "$t\n";
        ($w1, $w2) = ($w2, $t);              # advance chain
    }
Hash Generation

    $w1 = $w2 = $NONWORD;                    # initial state
    while (<>) {                             # read each line of input
        foreach (split) {
            push(@{$statetab{$w1}{$w2}}, $_);
            ($w1, $w2) = ($w2, $_);          # multiple assignment
        }
    }
    push(@{$statetab{$w1}{$w2}}, $NONWORD);  # add tail

• Iterate over the words of the input (files named on the command line, or stdin), storing each word as a suffix of the two words before it
• IMPORTANT code segment: @{$statetab{$w1}{$w2}}
  -> %statetab is an implicitly declared hash
  -> $statetab{$w1} is an implicitly declared (autovivified) reference to a hash
  -> @{ } gets the array referenced by $statetab{$w1}{$w2}
• Note: <>, foreach, push(), multiple assignment
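The implicit creation of the nested hash and array can be seen on a tiny input. The following is a minimal sketch, not part of the original slides: the sample sentence is a made-up stand-in for the words read via <>, and `use strict` is added so that %statetab has to be declared explicitly.

```perl
use strict;
use warnings;

my $NONWORD = "\n";
my %statetab;                                # starts out completely empty
my ($w1, $w2) = ($NONWORD, $NONWORD);        # initial state

# Hypothetical sample input, standing in for the words read via <>.
for my $word (split ' ', "the cat sat the cat ran") {
    # Autovivification: the inner hash and the suffix array are
    # created on first use, without any explicit initialization.
    push @{ $statetab{$w1}{$w2} }, $word;
    ($w1, $w2) = ($w2, $word);
}
push @{ $statetab{$w1}{$w2} }, $NONWORD;     # add tail

# The prefix "the cat" occurred twice, so it has two suffixes.
print join(" ", @{ $statetab{"the"}{"cat"} }), "\n";    # prints "sat ran"
```

Each prefix pair maps to the list of every word that followed it, so a prefix that occurs several times collects several candidate suffixes.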
Output Generation

    $w1 = $w2 = $NONWORD;
    for ($i = 0; $i < $MAXGEN; $i++) {
        $suf = $statetab{$w1}{$w2};          # array reference
        $r = int(rand @$suf);                # @$suf is number of elems
        exit if (($t = $suf->[$r]) eq $NONWORD);
        print "$t\n";
        ($w1, $w2) = ($w2, $t);              # advance chain
    }

• Same $statetab{$w1}{$w2} construction, used here to fetch the array reference
• Note: rand, exit line, ->, interpolated string in print, multiple assignment
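One way to check the generation loop is to feed it an input in which every prefix has exactly one suffix, since the random choice then has only one option and the chain reproduces the input. The sketch below assumes that setup; `build_table` and `generate` are hypothetical helper names that do not appear in the slides, and the loop collects words into a list instead of printing and exiting.

```perl
use strict;
use warnings;

my $NONWORD = "\n";

# Hypothetical helper: build the prefix table from a list of words.
sub build_table {
    my @input = @_;
    my %tab;
    my ($w1, $w2) = ($NONWORD, $NONWORD);
    for my $word (@input) {
        push @{ $tab{$w1}{$w2} }, $word;
        ($w1, $w2) = ($w2, $word);
    }
    push @{ $tab{$w1}{$w2} }, $NONWORD;      # add tail
    return \%tab;
}

# Hypothetical helper: walk the chain, returning at most $maxgen words.
sub generate {
    my ($tab, $maxgen) = @_;
    my @out;
    my ($w1, $w2) = ($NONWORD, $NONWORD);
    for (1 .. $maxgen) {
        my $suf = $tab->{$w1}{$w2};          # array reference
        my $r   = int(rand(@$suf));          # scalar context: @$suf is the element count
        my $t   = $suf->[$r];
        last if $t eq $NONWORD;              # reached the tail marker
        push @out, $t;
        ($w1, $w2) = ($w2, $t);              # advance chain
    }
    return @out;
}

my $tab = build_table(qw(a b c d));
print join(" ", generate($tab, 10000)), "\n";   # prints "a b c d"
```

With real text, prefixes repeat, so the walk branches and the output diverges from the input.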
Pros and Cons

• Pros:
  • Very short source code
  • Necessary structures (array, hash) are built-in
  • Decent performance
• Cons:
  • Can be confusing, especially to new users
  • Outperformed by some languages (like C)
  • Difficult to extend to different prefix sizes
Extension: Different Prefix Sizes

    # markov_n.pl: markov chain algorithm for n-word prefixes

    $PREFLEN = 5;                            # or whatever
    $MAXGEN = 80;
    $NONWORD = "\n";
    foreach $i (0..$PREFLEN-1) {
        $words[$i] = $NONWORD;               # initial state
    }
    while (<>) {                             # read each line of input
        foreach (split) {
            push(@{hash_lookup(\@words)}, $_);
            @words = (@words[1..$#words],$_);
        }
    }
    push(@{hash_lookup(\@words)}, $NONWORD); # add tail
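Extension: Different Prefix Sizes

    @words = ();
    foreach $i (0..$PREFLEN-1) {
        $words[$i] = $NONWORD;
    }
    for ($i = 0; $i < $MAXGEN; $i++) {
        $suf = hash_lookup(\@words);         # array reference
        $r = int(rand @$suf);                # @$suf is number of elems
        exit if (($t = $suf->[$r]) eq $NONWORD);
        print "$t\n";
        @words = (@words[1..$#words],($t));  # advance chain
    }

    # Walk one hash level per prefix word; return a reference to the suffix array.
    sub hash_lookup {
        my $ref = \%statetab;
        my @wds = @{$_[0]};                  # $_[0], not @_[0], for a single element
        # Lexical $i: a global $i here would reset the caller's loop
        # counter on every call, so $MAXGEN would never be reached.
        for (my $i = 0; $i < $#wds; $i++) {
            $ref = \%{${$ref}{$wds[$i]}};
        }
        $ref = \@{${$ref}{$wds[$#wds]}};     # last word selects the suffix array
        return $ref;
    }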
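A different way to support n-word prefixes, not the approach taken in the slides, is to join the prefix words into a single string and use it as the key of one flat hash, which avoids the nested lookup routine. The following is only a sketch under stated assumptions: the separator `$SEP` is assumed never to occur inside a word, and the prefix length, the generation limit of 80, and the sample input are made up for the demonstration.

```perl
use strict;
use warnings;

my $PREFLEN = 3;                  # hypothetical prefix length for the demo
my $NONWORD = "\n";
my $SEP     = "\0";               # assumption: never appears inside a word

my %statetab;                     # flat hash: joined prefix => suffix list
my @words = ($NONWORD) x $PREFLEN;            # initial state

# Hypothetical sample input, standing in for the words read via <>.
for my $word (qw(one two three four five)) {
    push @{ $statetab{ join($SEP, @words) } }, $word;
    @words = (@words[1 .. $#words], $word);   # slide the prefix window
}
push @{ $statetab{ join($SEP, @words) } }, $NONWORD;   # add tail

my @out;
@words = ($NONWORD) x $PREFLEN;
for (1 .. 80) {
    my $suf = $statetab{ join($SEP, @words) };  # array reference
    my $t   = $suf->[ int(rand(@$suf)) ];
    last if $t eq $NONWORD;
    push @out, $t;
    @words = (@words[1 .. $#words], $t);        # advance chain
}
print "@out\n";    # prints "one two three four five"
```

The trade-off is that every lookup builds a new key string, whereas the nested-hash version descends one hash level per prefix word; changing the prefix length only requires changing `$PREFLEN` in either case.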