Course Summary LING 572 Fei Xia 03/06/07
Outline • Problem description • General approach • ML algorithms • Important concepts • Assignments • What’s next?
Two types of problems • Classification problem • Sequence labeling problem • In both cases: • A predefined set of labels: C = {c1, c2, …, cn} • Training data: {(xi, yi)}, where yi ∈ C, and yi is known or unknown • Test data
NLP tasks • Classification problems: • Document classification • Spam detection • Sentiment analysis • … • Sequence labeling problems: • POS tagging • Word segmentation • Sentence segmentation • NE detection • Parsing • IGT detection • …
Step 1: Preprocessing • Converting the NLP task to a classification or sequence labeling problem • Creating the attribute-value table: • Define feature templates • Instantiate feature templates and select features • Decide what kind of feature values to use (e.g., binarizing features or not) • Converting a multi-class problem to a binary problem (optional)
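A minimal sketch of instantiating feature templates into rows of an attribute-value table; the template names and the tiny POS-style example are illustrative, not from the course:

```python
def instantiate_templates(words, i):
    """Expand feature templates for position i into concrete features."""
    return {
        f"curWord={words[i]}": 1,
        f"prevWord={words[i-1] if i > 0 else '<s>'}": 1,
        f"suffix3={words[i][-3:]}": 1,
    }

words = ["the", "cat", "sleeps"]
# one row of the attribute-value table per instance
table = [instantiate_templates(words, i) for i in range(len(words))]
print(table[1])  # {'curWord=cat': 1, 'prevWord=the': 1, 'suffix3=cat': 1}
```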
Feature selection • Dimensionality reduction • Feature selection • Wrapper methods • Filtering methods: • Mutual info, χ², Information gain, … • Feature extraction • Term clustering: • Latent semantic indexing (LSI)
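A hedged sketch of one filtering method, the χ² score for a binary feature against a binary class, computed from a 2×2 table of observed counts (the counts below are invented):

```python
def chi_square(n11, n10, n01, n00):
    """2x2 counts: n11 = feature present & class positive, and so on."""
    n = n11 + n10 + n01 + n00
    # standard shortcut formula for a 2x2 contingency table
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return num / den if den else 0.0

# score each feature, then keep the k highest-scoring ones
print(chi_square(40, 10, 60, 890))
```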
Multiclass → Binary • One-vs-all • All-pairs • Error-correcting Output Codes (ECOC)
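A sketch of the one-vs-all reduction; `BinaryLearner` is a stand-in for any binary classifier with `fit(X, y)` and a real-valued `score(x)`:

```python
def train_one_vs_all(X, y, classes, BinaryLearner):
    """One binary classifier per class: class c vs. everything else."""
    models = {}
    for c in classes:
        labels = [1 if yi == c else 0 for yi in y]
        models[c] = BinaryLearner().fit(X, labels)
    return models

def predict_one_vs_all(models, x):
    # pick the class whose binary classifier is most confident
    return max(models, key=lambda c: models[c].score(x))
```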
Step 2: Training and decoding • Choose an ML learner • Train and test on the development set with different settings of the non-model parameters • Choose the setting that performs best on the development set • Run the learner on the test data with that setting
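An illustrative version of that tuning loop; `train` and `evaluate` below are dummy stand-ins so the example runs, not real course code:

```python
def train(data, setting):
    return setting                          # stand-in "model"

def evaluate(model, data):
    return 1.0 / (1.0 + abs(model - 0.5))   # fake accuracy, peaks at 0.5

train_data, dev_data, test_data = [], [], []
settings = [0.1, 0.5, 1.0, 2.0]             # candidate non-model parameters
best = max(settings, key=lambda s: evaluate(train(train_data, s), dev_data))
final_model = train(train_data, best)
print(best, evaluate(final_model, test_data))  # report test results once
```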
Step 3: Post-processing • Label sequence → the output we want • System combination • Voting: majority voting, weighted voting • More sophisticated models
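A minimal sketch of weighted voting; the weights might come from each system's dev-set accuracy, and the labels and numbers below are toy values:

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """predictions: one label per system; weights: parallel list of floats."""
    totals = defaultdict(float)
    for label, w in zip(predictions, weights):
        totals[label] += w
    return max(totals, key=totals.get)

print(weighted_vote(["NN", "VB", "NN"], [0.9, 0.8, 0.7]))  # -> "NN"
```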
Main ideas • kNN and Rocchio: finding the nearest neighbors / prototypes • DT and DL: finding the right group • NB, MaxEnt: calculating P(y | x) • Bagging: Reducing the instability • Boosting: Forming a committee • TBL: Improving the current guess
ML learners • Modeling • Training • Testing (a.k.a. decoding)
Modeling • NB: assuming features are conditionally independent given the class: P(x | c) = ∏j P(fj | c) • MaxEnt: a log-linear model, p(y | x) = exp(Σj λj fj(x, y)) / Z(x), where Z(x) is the normalizer
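A small sketch of the log-linear model above; the feature function and weight are invented for illustration:

```python
import math

def maxent_prob(x, y, labels, feature_funcs, weights):
    """p(y | x) = exp(sum_j w_j * f_j(x, y)) / Z(x)."""
    def score(lab):
        return sum(w * f(x, lab) for f, w in zip(feature_funcs, weights))
    z = sum(math.exp(score(lab)) for lab in labels)  # partition function Z(x)
    return math.exp(score(y)) / z

# one invented feature function: fires when "free" occurs and the label is spam
feats = [lambda x, y: 1.0 if ("free" in x and y == "spam") else 0.0]
print(maxent_prob({"free", "offer"}, "spam", ["spam", "ham"], feats, [2.0]))
```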
Training • kNN: no training • Rocchio: calculate prototypes • DT: build a decision tree: • Choose a feature and then split the data • DL: build a decision list: • Choose a decision rule and then split the data • TBL: build a transformation list: • Choose a transformation and then update the current label field
Training (cont) • NB: calculate P(ci) and P(fj | ci) by simple counting. • MaxEnt: calculate the weights of feature functions by iteration. • Bagging: create bootstrap samples and learn base classifiers. • Boosting: learn base classifiers and their weights.
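A sketch of NB training by counting; add-one smoothing is an assumption here, since the slide only says "simple counting":

```python
from collections import Counter

def train_nb(data, vocab):
    """data: list of (feature_set, label) pairs; vocab: set of all features."""
    class_counts = Counter(label for _, label in data)
    feat_counts = Counter((label, f) for feats, label in data for f in feats)
    priors = {c: class_counts[c] / len(data) for c in class_counts}
    def p_feat(f, c):
        # P(f | c) with add-one smoothing (assumed, not stated on the slide)
        return (feat_counts[(c, f)] + 1) / (class_counts[c] + len(vocab))
    return priors, p_feat

priors, p_feat = train_nb([({"free"}, "spam"), ({"hi"}, "ham")], {"free", "hi"})
print(priors["spam"], p_feat("free", "spam"))  # 0.5 0.666...
```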
Testing • kNN: calculate distances between x and xi, find the closest neighbors. • Rocchio: calculate distances between x and prototypes. • DT: traverse the tree • DL: find the first matched decision rule. • TBL: apply transformations one by one.
Testing (cont) • NB: calculate argmaxc P(c) ∏j P(fj | c) • MaxEnt: calculate argmaxy p(y | x) • Bagging: run the base classifiers and choose the class with the highest vote total. • Boosting: run the base classifiers and calculate the weighted sum.
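A sketch of NB decoding, done in log space to avoid underflow; the probabilities below are toy values:

```python
import math

def classify_nb(feats, classes, priors, p_feat):
    """Return argmax_c [ log P(c) + sum_j log P(f_j | c) ]."""
    def log_score(c):
        return math.log(priors[c]) + sum(math.log(p_feat[(f, c)]) for f in feats)
    return max(classes, key=log_score)

priors = {"spam": 0.5, "ham": 0.5}                      # toy values
p_feat = {("free", "spam"): 0.6, ("free", "ham"): 0.1}  # toy values
print(classify_nb({"free"}, ["spam", "ham"], priors, p_feat))  # -> "spam"
```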
Sequence labeling problems • With classification algorithms: • Having features that refer to previous tags • Using beam search to find good sequences • With sequence labeling algorithms: • HMM • TBL • MEMM • CRF • …
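A minimal beam-search sketch for using a classifier on sequence labeling: keep the top-k partial tag sequences at each position. `score_fn` is a stand-in for any classifier that scores a tag given the input, position, and previous tags (e.g., a MaxEnt model); the toy scorer below is invented:

```python
import math

def beam_search(words, tags, score_fn, beam_size=3):
    beam = [([], 0.0)]  # (partial tag sequence, log prob)
    for i in range(len(words)):
        candidates = []
        for seq, logp in beam:
            for t in tags:
                p = score_fn(words, i, seq, t)
                candidates.append((seq + [t], logp + math.log(p)))
        # prune to the k best partial sequences
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beam[0][0]  # best complete sequence

def score_fn(words, i, prev_tags, t):
    """Toy scorer: prefers tag N right after the word 'the'."""
    if i > 0 and words[i - 1] == "the":
        return 0.9 if t == "N" else 0.1
    return 0.5

print(beam_search(["the", "cat"], ["N", "V"], score_fn))
```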
Semi-supervised algorithms • Main idea: adding some unlabeled data to the labeled data • Self-training • Co-training • …
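A self-training loop in outline; `Learner` is a stand-in with `fit(data)` and `predict_prob(x) -> (label, prob)`, and the confidence threshold is an arbitrary choice:

```python
def self_train(labeled, unlabeled, Learner, threshold=0.95, rounds=5):
    """labeled: list of (x, label); unlabeled: list of x."""
    model = Learner().fit(labeled)
    for _ in range(rounds):
        # label the pool; keep only high-confidence predictions
        guesses = [(x,) + model.predict_prob(x) for x in unlabeled]
        newly = [(x, lab) for x, lab, p in guesses if p >= threshold]
        if not newly:
            break
        labeled = labeled + newly
        added = {x for x, _ in newly}
        unlabeled = [x for x in unlabeled if x not in added]
        model = Learner().fit(labeled)   # retrain on the enlarged set
    return model
```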
Unsupervised algorithms • MLE • EM: • General algorithm: E-step, M-step • EM for PM models • Forward-backward for HMM • Inside-outside for PCFG • IBM models for MT
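EM in miniature, on a toy two-coin mixture rather than the HMM/PCFG cases above (equal mixing weights are assumed for brevity): which coin produced each flip sequence is hidden, and EM recovers the two biases.

```python
def em_two_coins(seqs, theta_a=0.6, theta_b=0.5, iters=20):
    """seqs: flip sequences like 'HHTHT'; the coin identity is hidden."""
    for _ in range(iters):
        ha = ta = hb = tb = 0.0
        for seq in seqs:
            h, t = seq.count("H"), seq.count("T")
            like_a = theta_a ** h * (1 - theta_a) ** t
            like_b = theta_b ** h * (1 - theta_b) ** t
            post_a = like_a / (like_a + like_b)       # E-step: P(coin A | seq)
            ha += post_a * h; ta += post_a * t
            hb += (1 - post_a) * h; tb += (1 - post_a) * t
        theta_a = ha / (ha + ta)                      # M-step: re-estimate
        theta_b = hb / (hb + tb)
    return theta_a, theta_b

print(em_two_coins(["HHHHT", "HTHTH", "HHHTT", "TTTHT", "HHTTT"]))
```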
Concepts • Attribute-value table • Feature templates vs. features • Weights: • Feature weights • Classifier weights • Instance weights • Feature values
Concepts (cont) • Maximum entropy vs. Maximum likelihood • Maximize likelihood vs. minimize training error • Training time vs. test time • Training error vs. test error • Greedy algorithm vs. iterative approach
Concepts (cont) • Local optima vs. global optima • Beam search vs. Viterbi algorithm • Sample vs. resample • Model parameters vs. non-model parameters
Assignments • Read code: • NB: binary features? • DT: difference between DT and C4.5 • Boosting: AdaBoost and AdaBoostM2 • MaxEnt: binary features? • Write code: • Info2Vectors • BinVectors • χ² • Complete two projects
Projects • Steps: • Preprocessing • Training and testing • Postprocessing • Two projects: • Project 1: Document classification • Project 2: IGT detection
Project 1: Document classification • A typical classification problem • Data are already prepared • Feature template: the words appearing in the document • Feature value: word frequency
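A sketch of that feature extraction, one feature per word with frequency as the value (the tokenization is a naive stand-in):

```python
from collections import Counter

def doc_to_vector(text):
    """One feature per word; feature value = word frequency."""
    return Counter(text.lower().split())

print(doc_to_vector("The cat saw the dog"))
# Counter({'the': 2, 'cat': 1, 'saw': 1, 'dog': 1})
```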
Project 2: IGT detection • Can be framed as a sequence labeling problem: • Preprocessing: define the label set • Postprocessing: tag sequence → spans • Solved as a sequence labeling problem using a classification algorithm with beam search • To use classifiers: • Preprocessing: • Define features • Choose feature values • …
Project 2 (cont) • Preprocessing: • Define the label set • Define feature templates • Decide on feature values • Training and decoding: • Write beam search • Postprocessing: • Convert label sequence → spans
Project 2 (cont) • Presentation • Final report • A typical conference paper: • Introduction • Previous work • Methodology • Experiments • Discussion • Conclusion
Using Mallet • Difficulties: • Java • A large package • Benefits: • Java • A large package • Many learning algorithms: comparing the implementation with “standard” algorithms
Bugs in Mallet? • In Hw9, include a new section: • Bugs • Complaints • Things you like about Mallet
Course summary • 9 weeks: 18 sessions • 2 kinds of problems • 9 supervised algorithms • 1 semi-supervised algorithm • 1 unsupervised algorithm • 4 related issues: feature selection, multiclass → binary, system combination, beam search • 2 projects • 1 well-known package • 9 assignments, including 1 presentation and 1 final report • N papers
What’s next? • Learn more about the algorithms covered in class • Learn new algorithms: • SVM, CRF, regression algorithms, graphical models, … • Try new tasks: • Parsing, spam filtering, reference resolution, …
Misc • Hw7: due tomorrow 11pm • Hw8: due Thursday 11pm • Hw9: due 3/13 11pm • Presentation: No more than 15+5 minutes
What must be included in the presentation? • Label set • Feature templates • Effect of beam search • 3+ ways to improve the system and results on dev data (test_data/) • Best system: results on dev data and the setting • Results on test data (more_test_data/)
Grades, etc. • 9 assignments + class participation • Hw1-Hw6: • Total: 740 • Max: 696.56 • Min: 346.52 • Ave: 548.74 • Median: 559.08