1 / 47

LING 581: Advanced Computational Linguistics

LING 581: Advanced Computational Linguistics. Lecture Notes February 13th. Bikel Collins and EVALB. How did you guys do with Bikel - Collins on section 23?. Steps. Section 23 of the WSJ Penn Treebank is used for evaluation. We need to get it into the one-tree-per-line format needed.

inari
Download Presentation

LING 581: Advanced Computational Linguistics

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. LING 581: Advanced Computational Linguistics Lecture Notes February 13th

  2. Bikel Collins and EVALB • How did you guys do with Bikel-Collins on section 23?

  3. Steps • Section 23 of the WSJ Penn Treebank is used for evaluation We need to get it into the one-tree-per-line format needed

  4. Steps • Assemblesection 23 intoa single file • cd Desktop • cat ~/research/TREEBANK_3/parsed/mrg/wsj/23/*.mrg > wsj_23.mrg • more wsj_23.mrg (on my Desktop folder)

  5. Steps • Use tsurgeon (that comes with tregex) to transform wsj-23.mrg into a file with one tree per line (wsj-23.gold) • File: README-tsurgeon.txt Program Command Line Options: -treeFile <filename> specify the name of the file that has the trees you want to transform. -po <matchPattern> <operation> Apply a single operation to every tree using the specified match pattern and the specified operation. -s Prints the output trees one per line, instead of pretty-printed.

  6. Steps • Use tsurgeon (that comes with tregex) to transform wsj-23.mrg into a file with one tree per line (wsj-23.gold) • Note: (assume my current directory is my Desktop and I’ve installed tregex elsewhere) ../Downloads/stanford-tregex-2014-01-04/tsurgeon.sh -treeFilewsj_23.mrg -s > wsj_23.gold Error: Could not find or load main class edu.stanford.nlp.trees.tregex.tsurgeon.Tsurgeon • File: tsurgeon.sh (Bourne-again shell script) #!/bin/sh export CLASSPATH=/Users/sandiway/Downloads/stanford-tregex-2014-01-04/stanford-tregex.jar:$CLASSPATH java -mx100m edu.stanford.nlp.trees.tregex.tsurgeon.Tsurgeon "$@” ../Downloads/stanford-tregex-2014-01-04/tsurgeon.sh -treeFile wsj_23.mrg -s > wsj_23.gold

  7. Steps • File: wsj_23.gold (on my Desktop)

  8. Steps • Extracting the sentences for parsing • (Hmm, I forgot how I did it before..) • One way: load in sec 23, search /.*/, save matched sentences …

  9. Steps • Extracting the sentences for parsing • (Hmm, I forgot how I did it before..) • One way: load in sec 23, search /.*/, save matched sentences …

  10. Steps • Extracting the sentences for parsing • (Hmm, I forgot how I did it before..) • One way: load in sec 23, search /.*/, save matched sentences … • Run a Perl script to take out the file name, 0 and anything starting with a *, e.g.

  11. Steps • Extracting the sentences for parsing • (Hmm, I forgot how I did it before..) • Another way: use the official Treebank pos files

  12. Steps • Before you can run the parser, you need to train it • First, assemble the training data cd research/TREEBANK_3/parsed/mrg/wsj ls 00 06 12 18 24 01 07 13 19 MERGE.LOG 02 08 14 20 03 09 15 21 04 10 16 22 05 11 17 23 cat 0[2-9]/*.mrg 1[0-9]/*.mrg 2[01]/*.mrg > ~/Desktop/wsj-02-21.mrg

  13. Steps • Before you can run the parser, you need to train it • ../research/dbp/bin/train 800 ../research/dbp/settings/collins.properties wsj-02-21.mrg • Executing command • \tjava -server –Xms800m –Xmx800m -cp /Users/sandiway/research/dbp/dbparser.jar -Dparser.settingsFile=../research/dbp/settings/collins.propertiesdanbikel.parser.Trainer -i wsj-02-21.mrg -o wsj-02-21.observed.gz -od wsj-02-21.obj.gz

  14. Steps • Running the parser • (assume my data is on my Desktop – the current working directory) • Need to locate: • settings file: collins.properties • derived data file: wsj-02-21.obj.gz • Sentence file: test.txt • Run: • ../research/dbp/bin/parse 300 ../research/dbp/settings/collins.properties ../research/dbparser/bin/wsj-02-21.obj.gz test.txt • Output trees (in current working directory): • test.txt.parsed

  15. Steps • Running the parser • (assume my data is on my Desktop – the current working directory) • If you have multiple processors, you can first split the input file into parts manually or using the split command: • split -l 500 wsj-23.txt wsj-23 Then merge the parses produced into one file…

  16. Steps • Running evalb • Need: • parameter file: COLLINS.prm • Parser output: test.txt.parsed • Gold file: test.gold

  17. Steps • (Assume my data files are on my Desktop and EVALB is installed in my research subdirectory) • ../research/EVALB/evalb -p ../research/EVALB/COLLINS.prmtest.goldtest.txt.parsed Recall 83.8, Precision 89.2

  18. Summary • WSJ corpus: sections 00 through 24 • Training: normally 02-21 (20 sections) • concatenate mrg files as input to Bikel Collins trainer • training is fast • Evaluation: on section 23 • use tsurgeon to get data ready for EVALB • Use either the .pos files or tregex to grab the sentences for parsing • 2400+ sentences in section 23, parsing is slow… • Can split the sentences up and run on multiple processors

  19. Summary • Question: • How much training data do you need? • Homework: • How does Bikel Collins vary in precision and recallon test data if you randomly pick 1..24 out of 25 sections to do the training with? • Test section: • I want you to pick a test section that’s not section 23 • Use EVALB • plot precision and recall graphs

  20. Perl Code • Generate training data function shuffle permutes @sections

  21. Training Data

  22. Training Data

  23. Training Data • What does the graph for the F-score (LR/LP) look like? F-score report your results next time… # of sentences used for training

  24. Robustness and Sensitivity it’s often assumed that statistical models are less brittle than symbolic models • can get parses for ungrammatical data • are they sensitive to noise or small perturbations?

  25. Robustness and Sensitivity Examples • Herman mixed the water with the milk • Herman mixed the milk with the water • Herman drank the water with the milk • Herman drank the milk with the water (mix) (drink) f(water)=117, f(milk)=21

  26. Robustness and Sensitivity Examples • Herman mixed the water with the milk • Herman mixed the milk with the water • Herman drank the water with the milk • Herman drank the milk with the water (high) logprob = -50.4 (low) logprob = -47.2 different PP attachment choices

  27. Robustness and Sensitivity First thoughts... • does milk forces low attachment? (high attachment for other nouns like water, toys, etc.) Is there something special about the lexical item milk? • 24 sentences in the WSJ Penn Treebank with milk in it, 21 as a noun

  28. Robustness and Sensitivity First thoughts... Is there something special about the lexical item milk? • 24 sentences in the WSJ Penn Treebank with milk in it, 21 as a noun • but just one sentence (#5212) with PP attachment for milk Could just one sentence out of 39,832 training examples affect the attachment options?

  29. sentences derived counts wsj-02-21.obj.gz parser parses Robustness and Sensitivity • Simple perturbation experiment • alter that one sentence and retrain

  30. Robustness and Sensitivity • Simple perturbation experiment • alter that one sentence and retrain ✕ delete the PP with 4% butterfat altogether

  31. Treebank sentences wsj-02-21.mrg training Derived counts wsj-02-21.obj.gz Robustness and Sensitivity • Simple perturbation experiment • alter that one sentence and retrain the Bikel/Collins parser can be retrained quicky or bump it up to the VP level

  32. Robustness and Sensitivity • Result: • high attachment for previous PP adjunct to milk Could just one sentence out of 39,832 training examples affect the attachment options? YES • Why such extreme sensitivity to perturbation? • logprobs are conditioned on many things; hence, lots of probabilities to estimate • smoothing • need every piece of data, even low frequency ones

  33. Details… • Two sets of files:

  34. Details…

  35. Robustness and Sensitivity • (Bikel 2004): • “it may come as a surprise that the [parser] needs to access more than 219 million probabilities during the course of parsing the 1,917 sentences of Section 00 [of the PTB].''

  36. Robustness and Sensitivity • Trainer has a memory like a phone book:

  37. Robustness and Sensitivity • (mod ((with IN) (milk NN) PP (+START+) ((+START+ +START+)) NP-A NPB () false right) 1.0) • modHeadWord(with IN) • headWord(milk NN) • modifier PP • previousMods(+START+) • previousWords((+START+ +START+)) • parent NP-A • head NPB • subcat() • verbIntervening false • side right • (mod ((+STOP+ +STOP+) (milk NN) +STOP+ (PP) ((with IN)) NP-A NPB () false right) 1.0) • modHeadWord (+STOP+ +STOP+) • headWord (milk NN) • modifier +STOP+ • previousMods (PP) • previousWords ((with IN)) • parent NP-A • head NPB • subcat () • verbIntervening false • side right • Frequency 1 observed data for: • (NP (NP (DT a)(NNmilk))(PP(IN with)(NP (ADJP (CD 4)(NN %))(NN butterfat))))

  38. Robustness and Sensitivity 76.8% singular events 94.2% 5 or fewer occurrences

  39. Robustness and Sensitivity • Full story more complicated than described here... • by picking different combinations of verbs and nouns, you can get a range of behaviors f(drank)=0 might as well have picked flubbed

  40. Another experiment it’s often assumed that statistical models are less brittle than symbolic models…

  41. Parsing Data • All 5! (=120) permutations of • colorless green ideas sleep furiously .

  42. Parsing Data • The winning sentence was: • Furiously ideas sleep colorless green . • after training on sections 02-21 (approx. 40,000 sentences) sleep takes ADJP object with two heads adverb (RB) furiously modifies noun

  43. Parsing Data • The next two highest scoring permutations were: • Furiously green ideas sleep colorless . • Green ideas sleep furiously colorless . sleep takes NP object sleep takes ADJP object

  44. Parsing Data • (Pereira 2002) compared Chomsky’s original minimal pair: • colorless green ideas sleep furiously • furiously sleep ideas green colorless Ranked #23 and #36 respectively out of 120

  45. Parsing Data But … • graph (next slide) shows how arbitrary these rankings are: • furiously sleep ideas green colorless Vastly outranks • colorless green ideas sleep furiously (and the top 3 sentences) when trained on randomly chosen sections covering 14,188 to 31,616 WSJ PTB sentences

  46. Rank of Best Parse for Sentence Permutation Sentence

  47. Rank of Best Parse for Sentence Permutation • Note also that: • colorless green ideas sleep furiously totally outranks the top 3 (and #36) just briefly on one data point (when trained on 33605 sentences)

More Related