
Course Summary

Presentation Transcript


  1. Course Summary LING 572 Fei Xia 03/06/07

  2. Outline • Problem description • General approach • ML algorithms • Important concepts • Assignments • What’s next?

  3. Problem descriptions

  4. Two types of problems • Classification problem • Sequence labeling problem • In both cases: • A predefined set of labels: C = {c1, c2, …, cn} • Training data: { (xi, yi) }, where yi ∈ C, and yi is known or unknown • Test data

  5. NLP tasks • Classification problems: • Document classification • Spam detection • Sentiment analysis • … • Sequence labeling problems: • POS tagging • Word segmentation • Sentence segmentation • NE detection • Parsing • IGT detection • …

  6. General approach

  7. Step 1: Preprocessing • Converting the NLP task to a classification or sequence labeling problem • Creating the attribute-value table: • Define feature templates • Instantiate feature templates and select features • Decide what kind of feature values to use (e.g., binarizing features or not) • Converting a multi-class problem to a binary problem (optional)
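To make the attribute-value table concrete, here is a minimal sketch (the templates, documents, and labels are made-up illustrations) of instantiating feature templates into rows of (feature, value) pairs plus a label:

```python
# A toy sketch of instantiating feature templates into an attribute-value table.
# The templates, documents, and labels below are hypothetical illustrations.

def instantiate_features(tokens):
    """Instantiate two simple templates for one instance: contains_<word> and length."""
    feats = {}
    for tok in tokens:
        feats["contains_" + tok.lower()] = 1   # binarized feature value
    feats["length"] = len(tokens)              # a non-binary feature value
    return feats

docs = [("buy cheap meds now", "spam"),
        ("meeting at noon tomorrow", "not-spam")]

# Each row of the attribute-value table: (feature dict, label)
table = [(instantiate_features(text.split()), label) for text, label in docs]
for row in table:
    print(row)
```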

  8. Feature selection • Dimensionality reduction • Feature selection • Wrapper methods • Filtering methods: • Mutual information, χ², information gain, … • Feature extraction • Term clustering: • Latent semantic indexing (LSI)
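As a rough illustration of filter-style selection, the sketch below scores one binary feature by its mutual information with the class labels; the data are made up, and in practice every candidate feature is scored and only the top-ranked ones are kept:

```python
import math
from collections import Counter

def mutual_information(feature_fired, labels):
    """Plug-in estimate of I(F;C) for one binary feature, from co-occurrence counts."""
    n = len(labels)
    joint = Counter(zip(feature_fired, labels))
    f_marg = Counter(feature_fired)
    c_marg = Counter(labels)
    mi = 0.0
    for (f, c), count in joint.items():
        p_fc = count / n
        mi += p_fc * math.log2(p_fc / ((f_marg[f] / n) * (c_marg[c] / n)))
    return mi

# Hypothetical data: whether the feature fired on each instance, and the labels.
fired  = [1, 1, 0, 0, 1, 0]
labels = ["spam", "spam", "ham", "ham", "spam", "ham"]
print(mutual_information(fired, labels))   # high score -> keep the feature
```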

  9. Multiclass → Binary • One-vs-all • All-pairs • Error-correcting output codes (ECOC)
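A minimal sketch of the one-vs-all reduction, assuming a generic binary learner; the toy learner and data are hypothetical:

```python
def one_vs_all(train_binary, data, classes):
    """One-vs-all: train one binary scorer per class (class c vs. the rest)
    and classify by the highest score. train_binary(data) -> scoring function."""
    scorers = {}
    for c in classes:
        relabeled = [(x, 1 if y == c else 0) for x, y in data]
        scorers[c] = train_binary(relabeled)
    return lambda x: max(classes, key=lambda c: scorers[c](x))

# Hypothetical toy binary learner: score = number of words shared with positive items.
def toy_binary_learner(data):
    pos_words = {w for x, y in data if y == 1 for w in x.split()}
    return lambda x: len(pos_words & set(x.split()))

data = [("win money now", "spam"), ("lunch at noon", "ham")]
classify = one_vs_all(toy_binary_learner, data, ["spam", "ham"])
print(classify("free money"))   # -> spam
```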

  10. Step 2: Training and decoding • Choose a ML learner • Train and test on the development set with different settings of the non-model parameters • Choose the setting that works best on the development set • Run the learner on the test data with that setting

  11. Step 3: Post-processing • Label sequence → the output we want • System combination • Voting: majority voting, weighted voting • More sophisticated models
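For the voting-based system combination, a small sketch of majority and weighted voting over the outputs of several systems (the labels and weights here are made up):

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: the label each system assigned to one instance."""
    return Counter(predictions).most_common(1)[0][0]

def weighted_vote(predictions, weights):
    """Weight each system's vote, e.g. by its accuracy on the dev set."""
    scores = Counter()
    for label, w in zip(predictions, weights):
        scores[label] += w
    return scores.most_common(1)[0][0]

print(majority_vote(["NP", "VP", "NP"]))          # -> NP
print(weighted_vote(["NP", "VP"], [0.4, 0.6]))    # -> VP
```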

  12. Supervised algorithms

  13. Main ideas • kNN and Rocchio: finding the nearest neighbors / prototypes • DT and DL: finding the right group • NB, MaxEnt: calculating P(y | x) • Bagging: reducing instability • Boosting: forming a committee • TBL: improving the current guess

  14. ML learners • Modeling • Training • Testing (a.k.a. decoding)

  15. Modeling • NB: P(y | x) ∝ P(y) ∏j P(fj | y), assuming the features are conditionally independent given the class. • MaxEnt: P(y | x) = exp(∑j λj fj(x, y)) / Z(x), where Z(x) is the normalization factor.
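To contrast the two models side by side, a small sketch (with hypothetical toy parameters) of how each scores a label for an instance:

```python
import math

def nb_log_score(label, feats, prior, cond):
    """Naive Bayes: log P(c) + sum_j log P(f_j | c), using the conditional
    independence assumption. prior and cond are toy parameter tables."""
    return math.log(prior[label]) + sum(math.log(cond[(f, label)]) for f in feats)

def maxent_score(label, feats, weights):
    """MaxEnt: the unnormalized log-linear score sum_j lambda_j * f_j(x, y).
    Normalizing over all labels by Z(x) would give P(y | x)."""
    return sum(weights.get((f, label), 0.0) for f in feats)

# Hypothetical toy parameters
prior = {"spam": 0.5, "ham": 0.5}
cond = {("money", "spam"): 0.4, ("money", "ham"): 0.05}
weights = {("money", "spam"): 1.2, ("money", "ham"): -0.7}
print(nb_log_score("spam", ["money"], prior, cond))
print(maxent_score("spam", ["money"], weights))
```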

  16. Training • kNN: no training • Rocchio: calculate prototypes • DT: build a decision tree: • Choose a feature and then split the data • DL: build a decision list: • Choose a decision rule and then split the data • TBL: build a transformation list: • Choose a transformation and then update the current label field
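As one concrete instance of the training step, a sketch of Rocchio's prototype computation (toy vectors, basic centroid form without the negative-example term):

```python
from collections import defaultdict

def rocchio_train(data):
    """Rocchio training sketch: the prototype of each class is the centroid
    (average vector) of that class's training instances.
    data: list of (feature_dict, label)."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for feats, label in data:
        counts[label] += 1
        for f, v in feats.items():
            sums[label][f] += v
    return {c: {f: v / counts[c] for f, v in fs.items()} for c, fs in sums.items()}

# Hypothetical toy vectors
protos = rocchio_train([({"money": 2.0}, "spam"),
                        ({"money": 1.0, "meeting": 1.0}, "ham")])
print(protos)   # one prototype vector per class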

  17. Training (cont) • NB: calculate P(ci) and P(fj | ci) by simple counting. • MaxEnt: calculate the weights of feature functions by iteration. • Bagging: create bootstrap samples and learn base classifiers. • Boosting: learn base classifiers and their weights.
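A small sketch of the "simple counting" in NB training, using multinomial-style counts with add-one smoothing; the exact variant used in class (e.g., with binary features) may differ:

```python
from collections import Counter

def nb_train(data, smoothing=1.0):
    """Estimate P(c) and P(f | c) by counting, with add-one style smoothing.
    data: list of (feature_list, label)."""
    n = len(data)
    class_counts = Counter(label for _, label in data)
    feat_counts = Counter((f, label) for feats, label in data for f in feats)
    tokens_per_class = Counter()
    for feats, label in data:
        tokens_per_class[label] += len(feats)
    vocab = {f for feats, _ in data for f in feats}
    prior = {c: class_counts[c] / n for c in class_counts}
    cond = {(f, c): (feat_counts[(f, c)] + smoothing) /
                    (tokens_per_class[c] + smoothing * len(vocab))
            for f in vocab for c in class_counts}
    return prior, cond

prior, cond = nb_train([(["money", "free"], "spam"), (["meeting"], "ham")])
print(prior["spam"], cond[("money", "spam")])   # 0.5 0.4
```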

  18. Testing • kNN: calculate distances between x and each training instance xi; find the k closest neighbors. • Rocchio: calculate distances between x and the prototypes. • DT: traverse the tree • DL: find the first matching decision rule. • TBL: apply the transformations one by one.
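A minimal kNN testing sketch (Euclidean distance over sparse vectors and a majority vote among the k nearest; the toy data are made up, and cosine distance is another common choice):

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two sparse feature vectors (dicts)."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))

def knn_classify(x, train, k=3):
    """Find the k training instances closest to x and take a majority vote."""
    nearest = sorted(train, key=lambda item: euclidean(x, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical toy training data
train = [({"money": 2.0}, "spam"),
         ({"meeting": 1.0}, "ham"),
         ({"money": 1.0}, "spam")]
print(knn_classify({"money": 1.5}, train, k=3))   # -> spam
```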

  19. Testing (cont) • NB: calculate argmaxc P(c) ∏j P(fj | c). • MaxEnt: calculate argmaxy ∑j λj fj(x, y). • Bagging: run the base classifiers and choose the class with the most votes. • Boosting: run the base classifiers and calculate the weighted sum.

  20. Sequence labeling problems • With classification algorithms: • Having features that refer to previous tags • Using beam search to find good sequences • With sequence labeling algorithms: • HMM • TBL • MEMM • CRF • …
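A sketch of how beam search keeps sequence tagging tractable when each tag decision comes from a classifier that uses previous tags as features; score_tag and the toy scorer below are hypothetical stand-ins:

```python
def beam_search(tokens, score_tag, tagset, beam_size=3):
    """Tag a sequence left to right, keeping only the beam_size best partial
    tag sequences at each position. score_tag(prev_tags, position, tag) returns
    a log-probability-like score; its signature is a stand-in for a classifier
    whose features include the previous tags."""
    beam = [([], 0.0)]                               # (partial tag sequence, score)
    for i in range(len(tokens)):
        candidates = []
        for tags, score in beam:
            for tag in tagset:
                candidates.append((tags + [tag], score + score_tag(tags, i, tag)))
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beam[0][0]                                # best full tag sequence

# Hypothetical toy scorer: penalize "I" when it does not follow "B" or "I".
def toy_scorer(prev_tags, i, tag):
    if tag == "I" and (not prev_tags or prev_tags[-1] == "O"):
        return -2.0
    return 0.0 if tag == "O" else -0.5

print(beam_search(["w1", "w2", "w3"], toy_scorer, ["B", "I", "O"]))
```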

  21. Semi-supervised algorithms • Self-training • Co-training • … → Adding some unlabeled data to the labeled data
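A rough sketch of the self-training loop (the learner interface here is a hypothetical stand-in); co-training is similar but uses two learners trained on different feature views that label data for each other:

```python
def self_train(train_model, classify, labeled, unlabeled, rounds=3, threshold=0.9):
    """Self-training sketch: train on the labeled data, label the unlabeled data,
    move the confident predictions into the labeled set, and retrain.
    train_model(labeled) -> model; classify(model, x) -> (label, confidence).
    Both interfaces are hypothetical stand-ins for whatever learner is used."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    model = train_model(labeled)
    for _ in range(rounds):
        confident, rest = [], []
        for x in unlabeled:
            label, conf = classify(model, x)
            (confident if conf >= threshold else rest).append((x, label))
        if not confident:
            break                      # nothing confident enough to add; stop early
        labeled += confident
        unlabeled = [x for x, _ in rest]
        model = train_model(labeled)
    return model
```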

  22. Unsupervised algorithms • MLE • EM: • General algorithm: E-step, M-step • EM for PM models • Forward-backward for HMM • Inside-outside for PCFG • IBM models for MT

  23. Important concepts

  24. Concepts • Attribute-value table • Feature templates vs. features • Weights: • Feature weights • Classifier weights • Instance weights • Feature values

  25. Concepts (cont) • Maximum entropy vs. Maximum likelihood • Maximize likelihood vs. minimize training error • Training time vs. test time • Training error vs. test error • Greedy algorithm vs. iterative approach

  26. Concepts (cont) • Local optima vs. global optima • Beam search vs. Viterbi algorithm • Sample vs. resample • Model parameters vs. non-model parameters

  27. Assignments

  28. Assignments • Read code: • NB: binary features? • DT: difference between DT and C4.5 • Boosting: AdaBoost and AdaBoostM2 • MaxEnt: binary features? • Write code: • Info2Vectors • BinVectors • χ² • Complete two projects

  29. Projects • Steps: • Preprocessing • Training and testing • Postprocessing • Two projects: • Project 1: Document classification • Project 2: IGT detection

  30. Project 1: Document classification • A typical classification problem • Data are already prepared • Feature template: words appearing in the document • Feature value: word frequency
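For Project 1's feature setup, a minimal sketch of turning a document into a word-frequency feature vector (only whitespace tokenization and lowercasing are assumed here):

```python
from collections import Counter

def doc_to_vector(text):
    """One feature per word appearing in the document; the value is its frequency."""
    return Counter(text.lower().split())

print(doc_to_vector("the cat sat on the mat"))
# Counter({'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1})
```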

  31. Project 2: IGT detection • Can be framed as a sequence labeling problem • Preprocessing: define the label set • Postprocessing: tag sequence → spans • Sequence labeling problem → using a classification algorithm with beam search • To use classification algorithms: • Preprocessing: • Define features • Choose feature values • …

  32. Project 2 (cont) • Preprocessing: • Define label set • Define feature templates • Decide on feature values • Training and decoding • Write beam search • Postprocessing • Convert label sequence → spans

  33. Project 2 (cont) • Presentation • Final report • A typical conference paper: • Introduction • Previous work • Methodology • Experiments • Discussion • Conclusion

  34. Using Mallet • Difficulties: • Java • A large package • Benefits: • Java • A large package • Many learning algorithms: comparing the implementation with “standard” algorithms

  35. Bugs in Mallet? • In Hw9, include a new section: • Bugs • Complaints • Things you like about Mallet

  36. Course summary • 9 weeks: 18 sessions • 2 kinds of problems • 9 supervised algorithms • 1 semi-supervised algorithm • 1 unsupervised algorithm • 4 related issues: feature selection, multiclass → binary, system combination, beam search • 2 projects • 1 well-known package • 9 assignments, including 1 presentation and 1 final report • N papers

  37. What’s next? • Learn more about the algorithms covered in class. • Learn new algorithms: • SVM, CRF, regression algorithms, graphical models, … • Try new tasks: • Parsing, spam filtering, reference resolution, …

  38. Misc • Hw7: due tomorrow 11pm • Hw8: due Thursday 11pm • Hw9: due 3/13 11pm • Presentation: No more than 15+5 minutes

  39. What must be included in the presentation? • Label set • Feature templates • Effect of beam search • 3+ ways to improve the system and results on dev data (test_data/) • Best system: results on dev data and the setting • Results on test data (more_test_data/)

  40. Grades, etc. • 9 assignments + class participation • Hw1-Hw6: • Total: 740 • Max: 696.56 • Min: 346.52 • Ave: 548.74 • Median: 559.08
