LING 581: Advanced Computational Linguistics Lecture Notes February 13th
Bikel-Collins and EVALB • How did you guys do with Bikel-Collins on section 23?
Steps • Section 23 of the WSJ Penn Treebank is used for evaluation. • We need to get it into the one-tree-per-line format that EVALB expects.
Steps • Assemble section 23 into a single file • cd Desktop • cat ~/research/TREEBANK_3/parsed/mrg/wsj/23/*.mrg > wsj_23.mrg • more wsj_23.mrg (in my Desktop folder)
Steps • Use tsurgeon (which comes with tregex) to transform wsj_23.mrg into a file with one tree per line (wsj_23.gold) • File: README-tsurgeon.txt
Program Command Line Options:
-treeFile <filename> specify the name of the file that has the trees you want to transform.
-po <matchPattern> <operation> apply a single operation to every tree using the specified match pattern and the specified operation.
-s print the output trees one per line, instead of pretty-printed.
Steps • Use tsurgeon (which comes with tregex) to transform wsj_23.mrg into a file with one tree per line (wsj_23.gold) • Note: (assume my current directory is my Desktop and I’ve installed tregex elsewhere)
../Downloads/stanford-tregex-2014-01-04/tsurgeon.sh -treeFile wsj_23.mrg -s > wsj_23.gold
Error: Could not find or load main class edu.stanford.nlp.trees.tregex.tsurgeon.Tsurgeon
• File: tsurgeon.sh (Bourne-again shell script)
#!/bin/sh
export CLASSPATH=/Users/sandiway/Downloads/stanford-tregex-2014-01-04/stanford-tregex.jar:$CLASSPATH
java -mx100m edu.stanford.nlp.trees.tregex.tsurgeon.Tsurgeon "$@"
• With the jar on the CLASSPATH, run it again:
../Downloads/stanford-tregex-2014-01-04/tsurgeon.sh -treeFile wsj_23.mrg -s > wsj_23.gold
Steps • File: wsj_23.gold (on my Desktop)
Steps • Extracting the sentences for parsing • (Hmm, I forgot how I did it before…) • One way: load section 23 into tregex, search /.*/, and save the matched sentences … • Then run a Perl script to take out the file name, the sentence index 0, and anything starting with a * (empty categories), e.g. the sketch below.
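A minimal sketch of such a cleanup script (the script name and the exact layout of the saved matched-sentence file are assumptions; adjust the two shifts to whatever prefix fields tregex actually emits):
File: clean_sentences.pl (hypothetical)
#!/usr/bin/perl
# Strip the leading file name, the sentence index "0", and any
# empty-category token beginning with "*" (e.g. *T*-1) from
# tregex matched-sentence output, leaving one plain sentence per line.
use strict;
use warnings;

while (my $line = <>) {
    chomp $line;
    my @tokens = split ' ', $line;
    shift @tokens;                        # drop the file name field
    shift @tokens;                        # drop the sentence index (the "0")
    @tokens = grep { !/^\*/ } @tokens;    # drop traces such as *T*-1
    print "@tokens\n";
}
Usage: perl clean_sentences.pl raw-matches.txt > test.txt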
Steps • Extracting the sentences for parsing • (Hmm, I forgot how I did it before…) • Another way: use the official Treebank .pos files (a sketch follows)
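A minimal sketch of that route, assuming the usual .pos layout of word/TAG pairs with [ ] chunk brackets and ==== separator lines (the script name and these format details are assumptions, not the original script):
File: pos2sent.pl (hypothetical)
#!/usr/bin/perl
# Read Treebank .pos files and emit one sentence per line, dropping
# chunk brackets and empty categories (tag -NONE-); a sentence ends
# at a token tagged "." (sentence-final punctuation).
use strict;
use warnings;

my @words;
while (my $line = <>) {
    next if $line =~ /^=+/;                   # skip separator lines
    for my $tok (split ' ', $line) {
        next if $tok eq '[' or $tok eq ']';   # chunk brackets
        next unless $tok =~ m{^(.+)/([^/]+)$};# keep word/TAG pairs only
        my ($word, $tag) = ($1, $2);
        next if $tag eq '-NONE-';             # skip traces like *T*-1
        push @words, $word;
        if ($tag eq '.') {                    # end of sentence
            print "@words\n";
            @words = ();
        }
    }
}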
Steps • Before you can run the parser, you need to train it • First, assemble the training data:
cd research/TREEBANK_3/parsed/mrg/wsj
ls
00 01 02 03 04 05 06 07 08 09 10 11 12
13 14 15 16 17 18 19 20 21 22 23 24 MERGE.LOG
cat 0[2-9]/*.mrg 1[0-9]/*.mrg 2[01]/*.mrg > ~/Desktop/wsj-02-21.mrg
Steps • Before you can run the parser, you need to train it
../research/dbp/bin/train 800 ../research/dbp/settings/collins.properties wsj-02-21.mrg
Executing command:
java -server -Xms800m -Xmx800m -cp /Users/sandiway/research/dbp/dbparser.jar -Dparser.settingsFile=../research/dbp/settings/collins.properties danbikel.parser.Trainer -i wsj-02-21.mrg -o wsj-02-21.observed.gz -od wsj-02-21.obj.gz
Steps • Running the parser • (assume my data is on my Desktop – the current working directory) • Need to locate: • settings file: collins.properties • derived data file: wsj-02-21.obj.gz • sentence file: test.txt • Run: • ../research/dbp/bin/parse 300 ../research/dbp/settings/collins.properties ../research/dbparser/bin/wsj-02-21.obj.gz test.txt • Output trees (in current working directory): • test.txt.parsed
Steps • Running the parser • (assume my data is on my Desktop – the current working directory) • If you have multiple processors, you can first split the input file into parts, either manually or using the split command: • split -l 500 wsj-23.txt wsj-23 • Then merge the parses produced back into one file (see the example below)…
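For example, assuming split's default aa, ab, … suffixes and that the parser appends .parsed to each part's name, something like: cat wsj-23a*.parsed > wsj-23.txt.parsed (the exact file names depend on how split and the parser were invoked).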
Steps • Running evalb • Need: • parameter file: COLLINS.prm • parser output: test.txt.parsed • gold file: test.gold
Steps • (Assume my data files are on my Desktop and EVALB is installed in my research subdirectory) • ../research/EVALB/evalb -p ../research/EVALB/COLLINS.prm test.gold test.txt.parsed • Recall 83.8, Precision 89.2
Summary • WSJ corpus: sections 00 through 24 • Training: normally 02-21 (20 sections) • concatenate the .mrg files as input to the Bikel-Collins trainer • training is fast • Evaluation: on section 23 • use tsurgeon to get the data ready for EVALB • use either the .pos files or tregex to grab the sentences for parsing • 2,400+ sentences in section 23, parsing is slow… • can split the sentences up and run on multiple processors
Summary • Question: • How much training data do you need? • Homework: • How do Bikel-Collins precision and recall on test data vary if you randomly pick 1..24 out of the 25 sections to do the training with? • Test section: • I want you to pick a test section that’s not section 23 • Use EVALB • plot precision and recall graphs
Perl Code • Generate the training data • the function shuffle permutes @sections (a sketch of such a generator follows)
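A minimal sketch of such a generator, reusing the TREEBANK_3 layout from the earlier slides (the script name, paths, and argument handling are assumptions, not the original script):
File: make_training.pl (hypothetical)
#!/usr/bin/perl
# Pick N random WSJ sections and concatenate their .mrg files into a
# single training file for the Bikel-Collins trainer.
use strict;
use warnings;
use List::Util qw(shuffle);

my $n   = shift @ARGV || 5;   # how many sections to train on
my $dir = "$ENV{HOME}/research/TREEBANK_3/parsed/mrg/wsj";

my @sections = map { sprintf '%02d', $_ } 0 .. 24;
@sections = shuffle(@sections);              # random permutation
my @train = @sections[0 .. $n - 1];          # take the first N sections

my $globs = join ' ', map { "$dir/$_/*.mrg" } @train;
system("cat $globs > wsj-train.mrg") == 0 or die "cat failed: $?";
print "training sections: @train\n";
In practice you would remove your chosen test section from @sections before shuffling, so it never leaks into training.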
Training Data • What does the graph for the F-score (LR/LP) look like? • [graph: F-score vs. number of sentences used for training] • report your results next time…
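For reference, the F-score combines labeled precision (LP) and labeled recall (LR) as their harmonic mean: F = (2 × LP × LR) / (LP + LR).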
Robustness and Sensitivity • It’s often assumed that statistical models are less brittle than symbolic models • they can get parses for ungrammatical data • but are they sensitive to noise or small perturbations?
Robustness and Sensitivity • Examples: • Herman mixed the water with the milk / Herman mixed the milk with the water (mix) • Herman drank the water with the milk / Herman drank the milk with the water (drink) • f(water)=117, f(milk)=21 (frequencies in the training data)
Robustness and Sensitivity • Examples: • Herman mixed the water with the milk • Herman mixed the milk with the water • Herman drank the water with the milk • Herman drank the milk with the water • (high) logprob = -50.4, (low) logprob = -47.2: the parses make different PP attachment choices
Robustness and Sensitivity • First thoughts... Is there something special about the lexical item milk? • does milk force low attachment? (high attachment for other nouns like water, toys, etc.) • 24 sentences in the WSJ Penn Treebank contain milk, 21 of them as a noun
Robustness and Sensitivity • First thoughts... Is there something special about the lexical item milk? • 24 sentences in the WSJ Penn Treebank contain milk, 21 of them as a noun • but just one sentence (#5212) with PP attachment to milk • Could just one sentence out of 39,832 training examples affect the attachment options?
Robustness and Sensitivity • Simple perturbation experiment • alter that one sentence and retrain • [diagram: sentences + derived counts (wsj-02-21.obj.gz) → parser → parses]
Robustness and Sensitivity • Simple perturbation experiment • alter that one sentence and retrain • one perturbation: delete the PP with 4% butterfat altogether
Robustness and Sensitivity • Simple perturbation experiment • alter that one sentence and retrain (the Bikel-Collins parser can be retrained quickly) • the other perturbation: bump the PP up to the VP level • [diagram: Treebank sentences (wsj-02-21.mrg) → training → derived counts (wsj-02-21.obj.gz)]
Robustness and Sensitivity • Result: • high attachment is now chosen for the PP that was previously an adjunct to milk • Could just one sentence out of 39,832 training examples affect the attachment options? YES • Why such extreme sensitivity to perturbation? • logprobs are conditioned on many things; hence, lots of probabilities to estimate • smoothing means the model needs every piece of data, even low-frequency events
Details… • Two sets of files:
Robustness and Sensitivity • (Bikel 2004): • “it may come as a surprise that the [parser] needs to access more than 219 million probabilities during the course of parsing the 1,917 sentences of Section 00 [of the PTB].''
Robustness and Sensitivity • Trainer has a memory like a phone book:
Robustness and Sensitivity • Frequency 1 observed data for:
(NP (NP (DT a) (NN milk)) (PP (IN with) (NP (ADJP (CD 4) (NN %)) (NN butterfat))))
• First modifier event:
(mod ((with IN) (milk NN) PP (+START+) ((+START+ +START+)) NP-A NPB () false right) 1.0)
modHeadWord (with IN), headWord (milk NN), modifier PP, previousMods (+START+), previousWords ((+START+ +START+)), parent NP-A, head NPB, subcat (), verbIntervening false, side right
• Second modifier event:
(mod ((+STOP+ +STOP+) (milk NN) +STOP+ (PP) ((with IN)) NP-A NPB () false right) 1.0)
modHeadWord (+STOP+ +STOP+), headWord (milk NN), modifier +STOP+, previousMods (PP), previousWords ((with IN)), parent NP-A, head NPB, subcat (), verbIntervening false, side right
Robustness and Sensitivity • 76.8% of the observed events are singletons (occur only once) • 94.2% occur 5 or fewer times
Robustness and Sensitivity • The full story is more complicated than described here... • by picking different combinations of verbs and nouns, you can get a range of behaviors • f(drank)=0 in the training data: might as well have picked a nonce verb like flubbed
Another experiment • it’s often assumed that statistical models are less brittle than symbolic models…
Parsing Data • All 5! (= 120) permutations of • colorless green ideas sleep furiously . • (a sketch for generating them follows)
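A minimal sketch for generating the parser input (the script name is an assumption; any permutation generator would do):
File: permute.pl (hypothetical)
#!/usr/bin/perl
# Print all 5! = 120 permutations of the five words, one sentence per
# line with the final period, as input for the parser.
use strict;
use warnings;

my @words = qw(colorless green ideas sleep furiously);

sub permute {
    my ($done, @rest) = @_;
    if (!@rest) { print "@$done .\n"; return; }
    for my $i (0 .. $#rest) {
        my @next = @rest;
        my ($pick) = splice @next, $i, 1;   # remove the i-th word
        permute([@$done, $pick], @next);    # recurse on the remainder
    }
}

permute([], @words);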
Parsing Data • The winning sentence was: • Furiously ideas sleep colorless green . • after training on sections 02-21 (approx. 40,000 sentences) • in its best parse, sleep takes an ADJP object with two heads, and the adverb (RB) furiously modifies the noun
Parsing Data • The next two highest-scoring permutations were: • Furiously green ideas sleep colorless . (sleep takes an NP object) • Green ideas sleep furiously colorless . (sleep takes an ADJP object)
Parsing Data • (Pereira 2002) compared Chomsky’s original minimal pair: • colorless green ideas sleep furiously • furiously sleep ideas green colorless Ranked #23 and #36 respectively out of 120
Parsing Data • But … the graph (next slide) shows how arbitrary these rankings are: • furiously sleep ideas green colorless • vastly outranks • colorless green ideas sleep furiously • (and the top 3 sentences) when trained on randomly chosen sections covering 14,188 to 31,616 WSJ PTB sentences
Rank of Best Parse for Sentence Permutation • Note also that: • colorless green ideas sleep furiously • totally outranks the top 3 (and #36) just briefly at one data point (when trained on 33,605 sentences)