
Project Presentation

Lin572 Advanced Statistical Methods in NLP. Project Presentation. Team Members: Anna Tinnemore, Gabriel Neer, Yow-Ren Chiang. Part 3: MaxEnt (yipee!). The Good Stuff: simple feature templates and extraction; elegant data structures for storage and easy access; pretty good results!


Presentation Transcript


  1. Lin572 Advanced Statistical Methods in NLP • Project Presentation • Team Members: Anna Tinnemore, Gabriel Neer, Yow-Ren Chiang

  2. PART 3 MaxEnt (yipee!)

  3. The Good Stuff: • Simple feature templates and extraction • Elegant data structures for storage and easy access • Pretty good results!

  4. The Bad Stuff: • Hmmm. . . .

  5. Features • A few short loops collected the most relevant context features • No long-winded feature templates • Easy-access hashes
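The slides do not list the actual feature templates, so the following is only a minimal sketch of what "a few short loops" filling "easy-access hashes" might look like. The helper name `extract_features`, the feature names (`curW`, `prevW`, `nextW`, `prevT`) and the `BOS`/`EOS` boundary markers are all assumptions made for illustration, not the team's templates.

```perl
use strict;
use warnings;

# Hypothetical helper: collect context features for the token at position $i.
# Feature names and boundary markers are illustrative assumptions.
sub extract_features {
    my ($words, $tags, $i) = @_;
    my %feat;
    $feat{"curW=$words->[$i]"} = 1;

    # One short loop covers the neighbouring word on each side.
    for my $offset (-1, 1) {
        my $j    = $i + $offset;
        my $name = $offset < 0 ? 'prevW' : 'nextW';
        my $pad  = $offset < 0 ? 'BOS'   : 'EOS';
        my $word = ($j >= 0 && $j < @$words) ? $words->[$j] : $pad;
        $feat{"$name=$word"} = 1;
    }

    # Tag of the previous token, or a boundary marker at sentence start.
    $feat{'prevT=' . ($i > 0 ? $tags->[$i - 1] : 'BOS')} = 1;
    return \%feat;
}

my @words = qw(the dog barks);
my @tags  = qw(DT NN VBZ);
my $f     = extract_features(\@words, \@tags, 1);
print join(' ', sort keys %$f), "\n";
```

Storing each feature as a hash key makes the later lookup ("does this token have feature X?") a single `exists` test.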

  6. Decent Results • Accuracy in the mid-nineties (percent), increasing with the size of the training data • Result

  7. PART 4 Task 2 Bagging

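  8. Tie Function
        use Tie::File;
        use Fcntl;
        # Treat the file as an array of lines (Tie::File strips the newlines).
        tie my @lines, 'Tie::File', $file_name, mode => O_RDONLY
            or die "Can't tie $file_name: $!";
        for my $bag_num (1 .. $B) {
            # The Nth bag from file foo.txt becomes foo.txt-bagN, etc.
            my $bag_name = "$file_name-bag$bag_num";
            open(my $bag, '>', $bag_name)
                or die "Can't open $bag_name for writing: $!";
            for (@lines) {
                # Pick a random line of the file (sampling with replacement).
                my $line = $lines[ rand @lines ];
                print $bag "$line\n";    # Output to the bag.
            }
            close $bag;
        }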

  9. Combination • VOTING!!

  10. Step 1:
        # Loop through file and remember words. Keep them grouped by sentence.
        while (<FILE>) {
            my @word_tags = split;    # whitespace-separated word/tag tokens
            my @words;                # fresh array for every sentence
            foreach (@word_tags) {
                my @wordtag = split /\//;
                push @words, $wordtag[0];
            }
            push @sentences, \@words;
        }

  11. Step 2:
        # Go through each file and, for each word, increase the count of its tag.
        for my $file (@ARGV) {
            open(FILE, '<', $file) or die "Can't open $file: $!";
            my $tag_index = 0;
            while (<FILE>) {
                my @word_tags = split;
                foreach (@word_tags) {
                    my @wordtag = split /\//;
                    my $tag = $wordtag[1];
                    $tags[$tag_index]->{$tag}++;
                    $tag_index++;
                }
            }
            close FILE;
        }

  12. Step 3:
        # Go through the sentences and print out each word/tag pair.
        my $tag_index = 0;
        foreach my $sent (@sentences) {
            foreach my $word (@$sent) {
                my $tag = max_tag($tags[$tag_index]);
                $tag_index++;
                print "$word/$tag ";
            }
            print "\n";
        }

  13. Finding the “Best Tag”
        # Find the tag with the highest count.
        sub max_tag {
            my $tag_hash = shift;
            (my $tag) = keys %$tag_hash;
            my $tag_count = $tag_hash->{$tag};
            foreach (keys %$tag_hash) {
                if ($tag_hash->{$_} > $tag_count) {
                    $tag = $_;
                    $tag_count = $tag_hash->{$tag};
                }
            }
            return $tag;
        }
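One property of `max_tag` worth noting: when two tags have the same count, it keeps whichever one `keys` happens to return first, and Perl's hash order is not stable between runs. The sketch below shows one possible deterministic variant that breaks ties alphabetically. The name `max_tag_stable` and the alphabetical tie-break are assumptions for illustration, not part of the original tool.

```perl
use strict;
use warnings;

# Hypothetical variant of max_tag: highest count wins, and ties are broken
# alphabetically so repeated runs give the same answer.
sub max_tag_stable {
    my $tag_hash = shift;
    my ($best) = sort {
        $tag_hash->{$b} <=> $tag_hash->{$a}    # higher count first
            or $a cmp $b                       # then alphabetical order
    } keys %$tag_hash;
    return $best;
}

print max_tag_stable({ NN => 3, VB => 1, JJ => 1 }), "\n";    # prints NN
print max_tag_stable({ NN => 2, VB => 2 }), "\n";             # prints NN (tie)
```

Sorting is slower than the single pass in `max_tag`, but each hash holds only a handful of tags, so the difference should be negligible here.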

  14. Procedure
      • Creating bootstrap samples
          • The file is treated as an array of lines.
          • N random array indices are selected, and each corresponding line is output to a file.
      • Combine_tool.pl
          • Opens the file corresponding to its first argument.
          • Reads in all words, aggregated by sentence.
          • An array of tag hashes is created.
          • For each file in its argument list, opens that file and reads the tags sequentially.
          • The hash item corresponding to the tag at the appropriate index of the tag array is incremented.
          • For each index, the hash label with the highest count is chosen as the correct tag.
          • Re-associates the tags with their words.
          • Prints out the word/tag pairs.
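The procedure above can be sketched end to end on in-memory data, which makes it easy to try without any files. This is a minimal illustration, not Combine_tool.pl itself: the sub names `bootstrap_sample` and `combine_by_vote` are hypothetical, ties are broken alphabetically (an assumption), and each token is split at its last `/` rather than with a plain `split /\//`.

```perl
use strict;
use warnings;

# Draw a bootstrap sample: N lines picked with replacement from N lines.
sub bootstrap_sample {
    my @lines = @_;
    return map { $lines[ rand @lines ] } @lines;
}

# Combine several taggings of the same text by per-token majority vote.
# Each argument is a reference to an array of "word/tag word/tag ..." lines.
sub combine_by_vote {
    my @outputs = @_;

    # $tags[$i]{$tag} = number of taggers that chose $tag for token $i.
    my @tags;
    for my $output (@outputs) {
        my $tag_index = 0;
        for my $line (@$output) {
            for my $pair (split ' ', $line) {
                my ($tag) = $pair =~ m{/([^/]+)$};
                $tags[ $tag_index++ ]{$tag}++;
            }
        }
    }

    # Take the words from the first output and attach the winning tags.
    my @combined;
    my $tag_index = 0;
    for my $line (@{ $outputs[0] }) {
        my @pairs;
        for my $pair (split ' ', $line) {
            my ($word) = $pair =~ m{^(.*)/[^/]+$};
            my $votes  = $tags[ $tag_index++ ];
            my ($best) = sort { $votes->{$b} <=> $votes->{$a} or $a cmp $b }
                         keys %$votes;
            push @pairs, "$word/$best";
        }
        push @combined, join ' ', @pairs;
    }
    return @combined;
}

# Three hypothetical tagger outputs for the same sentence.
my @run_a = ('the/DT dog/NN barks/VBZ');
my @run_b = ('the/DT dog/NN barks/NNS');
my @run_c = ('the/DT dog/VB barks/VBZ');
print "$_\n" for combine_by_vote(\@run_a, \@run_b, \@run_c);
# prints: the/DT dog/NN barks/VBZ
```

In the full pipeline, each bag produced by `bootstrap_sample` would train one tagger, and the taggers' outputs on the test file would be the arguments to `combine_by_vote`.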

  15. Result
