
Bagging

Bagging. LING 572 Advanced Statistical Methods in NLP. Presentation, March 9, 2006. David Bullock, Lap Cheung. System overview: from each training sample (1K, 5K, 10K) we created 10 bootstrap samples and trained 3 taggers (Trigram, TBL, MaxEnt) on each bootstrap sample.


Presentation Transcript


  1. Bagging. LING 572 Advanced Statistical Methods in NLP. Presentation, March 9, 2006. David Bullock, Lap Cheung

  2. Bagging – System Overview
     • From each training sample (1K, 5K, 10K) we created 10 bootstrap samples
     • Trained 3 taggers (Trigram, TBL, MaxEnt) on each bootstrap sample
     LING 572 Bullock & Cheung

  3. Bagging - Example
     • Input:
        • Rolls-Royce/NNP Motor/NNP Cars/NNPS...
        • The/DT luxury/NN auto/NN maker/NN...
        • Investors/NNS are/VBP appealing/VBG to/TO...
        • According/VBG to/TO some/DT estimates/NNS
     • Output:
        • Rolls-Royce/NNP Motor/NNP Cars/NNPS...
        • Rolls-Royce/NNP Motor/NNP Cars/NNPS...
        • The/DT luxury/NN auto/NN maker/NN...
        • According/VBG to/TO some/DT estimates/NNS
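The example above (one sentence drawn twice, one never drawn) is what sampling with replacement produces. A minimal sketch of that step, assuming each bag has the same number of sentences as the training sample; the function name `bootstrap_samples` and the fixed seed are illustrative choices, not taken from the slides:

```python
import random

# Toy training sample: the four (truncated) sentences from the slide, kept as-is.
training = [
    "Rolls-Royce/NNP Motor/NNP Cars/NNPS...",
    "The/DT luxury/NN auto/NN maker/NN...",
    "Investors/NNS are/VBP appealing/VBG to/TO...",
    "According/VBG to/TO some/DT estimates/NNS",
]


def bootstrap_samples(sentences, n_bags=10, seed=0):
    """Draw n_bags bootstrap samples (hypothetical helper).

    Each bag has len(sentences) items drawn uniformly WITH replacement,
    so a sentence may appear several times in a bag or not at all.
    """
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    n = len(sentences)
    return [rng.choices(sentences, k=n) for _ in range(n_bags)]


bags = bootstrap_samples(training)
for i, bag in enumerate(bags[:2]):
    print(f"bag {i}:")
    for sentence in bag:
        print("  ", sentence)
```

Each bag would then be used to train one copy of each tagger.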

  4. Bagging – System Overview (2)
     • Tagged the test data
     • Combined the results by voting

  5. Every vote counts
     • Majority takes all (for each word/tag pair)
     • If draw, consider previous word/tag pairs
     • Consider
        • only the last word/tag pair
        • all previous pairs (the whole history)
        • nothing (just randomly pick a tag if draw)

  6. Voting - Example
     • Voting input from three files:
        • John/NN will/MD join/VB the/DT board/NNP
        • John/NNP will/MD join/VB the/DT board/NN
        • John/NNP will/NN join/VB the/DT board/VB
     • Voting output:
        • John/NNP will/MD join/VB the/DT board/??

  7. How to count votes?
     • Use ‘count wins for ALL previous word/tag pairs’
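The slides do not spell out how the "count wins" rule is computed, so the following is only one plausible reading: each voter (input file) earns a win whenever its tag matched the voted tag at an earlier position, and a draw is settled in favour of the tied voter with the most wins so far. The names `parse` and `vote_with_history` are hypothetical, and remaining ties fall to the lowest-numbered voter, which is also an assumption.

```python
from collections import Counter


def parse(line):
    """Split 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    return [tuple(token.rsplit("/", 1)) for token in line.split()]


def vote_with_history(tagged):
    """Combine several taggings of the same sentence by voting.

    tagged: one list of (word, tag) pairs per voter, all over the same words.
    Majority wins; on a draw, take the tag of the tied voter whose earlier
    tags agreed most often with the voted output (assumed reading of the
    'count wins for ALL previous word/tag pairs' rule).
    """
    wins = [0] * len(tagged)  # per-voter count of past agreements
    result = []
    for column in zip(*tagged):
        word = column[0][0]
        tags = [tag for _, tag in column]
        counts = Counter(tags)
        top = max(counts.values())
        tied = [tag for tag, n in counts.items() if n == top]
        if len(tied) == 1:
            winner = tied[0]
        else:
            # Draw: only voters backing one of the tied tags are candidates.
            candidates = [i for i, tag in enumerate(tags) if tag in tied]
            best_voter = max(candidates, key=lambda i: wins[i])
            winner = tags[best_voter]
        for i, tag in enumerate(tags):
            if tag == winner:
                wins[i] += 1
        result.append((word, winner))
    return result


files = [
    "John/NN will/MD join/VB the/DT board/NNP",
    "John/NNP will/MD join/VB the/DT board/NN",
    "John/NNP will/NN join/VB the/DT board/VB",
]
combined = vote_with_history([parse(line) for line in files])
print(" ".join(f"{word}/{tag}" for word, tag in combined))
```

On the three files from the voting example, the second file agrees with the voted output on all four earlier words, so under this reading the draw on "board" resolves to its tag, NN. That is the outcome of this sketch, not a result reported in the slides.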

  8. Bagging Effect
     • Varied # of bags
     • Observed the effects on
        • Taggers
        • Training Data Size

  9. Trigram Tagger

  10. TBL Tagger

  11. MaxEnt Tagger

  12. Result Combination

  13. Result Combination (1K)

  14. Result Combination (5K)

  15. Result Combination (10K)

  16. Bagging – Further Improvement
     • Do not use “bad” bags
     • A “bad” bag: using such a bag decreases the marginal accuracy
     • Example: Bag 4 and Bag 7 in 5K/10K data
     • Improve the accuracy by 0.015% - 0.02%
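The slides do not say how "bad" bags were detected. One way to read "decreases the marginal accuracy" is a single greedy pass: add bags one at a time and drop any bag whose inclusion lowers the accuracy of the voted output. The sketch below assumes gold tags are available for scoring, uses plain majority voting (ties go to the first tag seen), and runs on made-up toy data; `drop_bad_bags` is a hypothetical name.

```python
from collections import Counter


def majority_vote(predictions):
    """predictions: one tag sequence per bag, all of equal length."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]


def accuracy(predicted, gold):
    """Fraction of positions where the predicted tag equals the gold tag."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def drop_bad_bags(bag_predictions, gold):
    """Greedy pass: keep a bag unless adding it lowers voted accuracy."""
    kept, best = [], 0.0
    for index in range(len(bag_predictions)):
        candidate = kept + [index]
        voted = majority_vote([bag_predictions[i] for i in candidate])
        score = accuracy(voted, gold)
        if not kept or score >= best:
            kept, best = candidate, score
    return kept, best


# Toy data (invented for illustration): tags for "John will join the board".
gold = ["NNP", "MD", "VB", "DT", "NN"]
bag_predictions = [
    ["NNP", "MD", "VB", "DT", "NN"],  # bag 0: all correct
    ["NN", "MD", "VB", "DT", "NN"],   # bag 1: one error
    ["NN", "NN", "VB", "DT", "VB"],   # bag 2: tips the vote on "John"
]
kept, best = drop_bad_bags(bag_predictions, gold)
print(kept, best)
```

If the bags are selected on the same data used to report accuracy, the measured gain will be optimistic; a separate held-out set would give a fairer estimate.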

  17. Conclusion
     • Combining the results of multiple taggers by voting always gave better results than using a single tagger
     • When using a single tagger and 10 bags, bagging gave better results for the TBL and MaxEnt taggers, but not for the Trigram tagger
     • Tagging accuracy can be further improved by removing “bad” bags

  18. Q & A
