Mebi 591D – BHI Kaggle Class
Baselines
http://winter2014-mebi591d-kaggleclass.weebly.com/
Baseline (I.)
• What is a baseline for?
  (a) a reasonable first approach to your problem
  (b) quick to build, so that you get a system running end to end
  (c) a reference point that lets you see improvements
• What should be included?
  (a) your system should be able to take in any test set and output its predictions
  (b) you should be able to report evaluation scores on any test set presented
  (c) you should be able to visualize which instances your errors occur in
Due in 4 weeks: start early!
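The three requirements above (predict on any test set, score any test set, see which instances are wrong) can be sketched in a few lines. This is only an illustration, not the course's required design: the `MajorityBaseline` class, the `evaluate` and `list_errors` helpers, and the toy data are all hypothetical, and accuracy stands in for whatever metric your task actually uses.

```python
from collections import Counter


class MajorityBaseline:
    """Hypothetical baseline: always predict the most frequent training label."""

    def fit(self, instances, labels):
        # Remember the single most common label seen in training.
        self.label_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, instances):
        # (a) Works on any test set: one prediction per instance.
        return [self.label_ for _ in instances]


def evaluate(gold, predicted):
    """(b) Score any test set; accuracy is used here as a placeholder metric."""
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


def list_errors(instances, gold, predicted):
    """(c) Return (instance, gold label, predicted label) for every mistake."""
    return [(x, g, p) for x, g, p in zip(instances, gold, predicted) if g != p]


# Toy data, made up for illustration.
train_x = ["a", "b", "c", "d"]
train_y = ["pos", "pos", "neg", "pos"]
test_x = ["e", "f", "g"]
test_y = ["pos", "neg", "pos"]

model = MajorityBaseline().fit(train_x, train_y)
pred = model.predict(test_x)
accuracy = evaluate(test_y, pred)
errors = list_errors(test_x, test_y, pred)

print("accuracy:", accuracy)
for instance, gold_label, system_label in errors:
    print("error on", instance, "gold:", gold_label, "system:", system_label)
```

Keeping prediction, evaluation, and error listing as separate pieces means a later, better model only has to replace `MajorityBaseline`; the other two functions stay the same.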
Baseline (II.)
• Examples of how to use your baseline
• Case 1. Named-entity recognition task, using a sequential CRF implementation
  • Baseline: use unigram features
  • Further experiments: bigram features, part-of-speech (POS) tags, etc.
  • Change to 2-step classification, or change the tagging scheme
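To make the step from unigram to bigram features in Case 1 concrete, here is a minimal sketch of a per-token feature extractor. It assumes a CRF toolkit that accepts one feature dictionary per token; the function names, feature-name strings, and boundary markers `<s>` and `</s>` are hypothetical choices, and no particular CRF library is implied.

```python
def token_features(tokens, i, use_bigrams=False):
    """Build a feature dict for the token at position i."""
    word = tokens[i].lower()
    # Baseline: a single unigram feature for the current word.
    feats = {"word=" + word: 1.0}
    if use_bigrams:
        # Further experiment: pair the word with its left and right neighbors.
        prev_word = tokens[i - 1].lower() if i > 0 else "<s>"
        next_word = tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>"
        feats["prev+word=" + prev_word + "_" + word] = 1.0
        feats["word+next=" + word + "_" + next_word] = 1.0
    return feats


def sentence_features(tokens, use_bigrams=False):
    """One feature dict per token, in sentence order."""
    return [token_features(tokens, i, use_bigrams) for i in range(len(tokens))]


sentence = ["Seattle", "is", "rainy"]
baseline_feats = sentence_features(sentence)
richer_feats = sentence_features(sentence, use_bigrams=True)
print(baseline_feats[0])
print(richer_feats[0])
```

Because the bigram features sit behind a flag, the same evaluation can be rerun with and without them, which is what lets the baseline show whether the change helped.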
Baseline (II.)
• Case 2. Predict stock market prices
  • Baseline: HMM using the previous stock price at the same time
  • Further experiments: add derivative features, add features from news
  • Can try several other classifiers
  • Can use some kind of boosting algorithm
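The slide names an HMM as the baseline for Case 2. The sketch below deliberately swaps in something simpler, a persistence forecast that predicts each price as the previous observed price, to show how little is needed to get a first score on the board. The function names and the price series are made up for illustration, and root-mean-square error is assumed as the metric.

```python
import math


def persistence_forecast(prices):
    """Predict each price as the one observed just before it.

    The first price has no predecessor, so the result has len(prices) - 1 items.
    """
    return prices[:-1]


def rmse(gold, predicted):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(g - p) ** 2 for g, p in zip(gold, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Toy price series, made up for illustration.
prices = [10.0, 12.0, 11.0, 11.0]
predicted = persistence_forecast(prices)
gold = prices[1:]  # The values the forecasts are compared against.
error = rmse(gold, predicted)
print("RMSE of persistence baseline:", error)
```

Any richer model (an HMM, derivative features, features from news) can then be judged by whether it beats this number on the same data.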
Evaluation Metrics (I.)
• Recap from last time: never evaluate on the test set while building your system. Why?
  • You are cheating!
  • You overtrain on mistakes and noise (the system won’t generalize)
• Use a development set or cross-validation instead
  • A development set is another set you split out, just like the test set (~10%)
    • Used to evaluate
    • Used for tuning parameters
  • Cross-validation sets
    • Split the data into N pieces, train on N-1 pieces and test on the remaining piece, then repeat N times to see the variation in scores
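The N-fold procedure described above can be sketched as an index generator. This is a minimal version under stated assumptions: the name `k_fold_indices` is hypothetical, folds are formed by striding through the indices, and the data is assumed to be already shuffled (shuffle first if your file is sorted by label or date).

```python
def k_fold_indices(n_items, n_folds):
    """Yield (train_indices, test_indices) once per fold.

    Every item lands in exactly one test fold across the whole loop.
    """
    # Fold f gets items f, f + n_folds, f + 2 * n_folds, ...
    folds = [list(range(f, n_items, n_folds)) for f in range(n_folds)]
    for held_out in range(n_folds):
        test_idx = folds[held_out]
        train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(n_items=10, n_folds=5))
for train_idx, test_idx in splits:
    # Train on train_idx, score on test_idx, and collect the N scores.
    print("train:", train_idx, "test:", test_idx)
```

The spread of the N scores gives a rough sense of how much a result depends on which data happened to be held out.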
Evaluation Metrics (II.)
• Multi-class categorization
  • Precision, recall, F1-score
  • ROC curve and AUC (area under the curve)
  • Why may these not measure things well?
    • Class imbalance!
    • Use micro- and macro-averaged definitions
• Numeric predictions
  • RMS error (root-mean-square error)
  • Nearest neighbor error
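A small worked example shows why class imbalance matters and how the micro and macro definitions differ. In this sketch (helper names and data are hypothetical) the system always predicts the majority class: micro-averaged F1 pools the counts over all classes and stays high, while macro-averaged F1 averages the per-class scores and exposes the class that is never predicted.

```python
def per_class_counts(gold, predicted, labels):
    """Map each label to its (true positive, false positive, false negative) counts."""
    counts = {}
    for label in labels:
        tp = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, predicted) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, predicted) if g == label and p != label)
        counts[label] = (tp, fp, fn)
    return counts


def prf(tp, fp, fn):
    """Precision, recall, and F1, using 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def macro_f1(counts):
    """Average the per-class F1 scores, so every class weighs the same."""
    return sum(prf(*c)[2] for c in counts.values()) / len(counts)


def micro_f1(counts):
    """Pool the counts over all classes first, then compute a single F1."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return prf(tp, fp, fn)[2]


# Imbalanced toy data: 8 items of class "a", 2 of class "b"; system always says "a".
gold = ["a"] * 8 + ["b"] * 2
predicted = ["a"] * 10
counts = per_class_counts(gold, predicted, ["a", "b"])
micro = micro_f1(counts)
macro = macro_f1(counts)
print("micro F1:", micro, "macro F1:", macro)
```

Here the micro score is 0.8 while the macro score is roughly 0.44, because class "b" contributes an F1 of 0.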
Error analysis
• Good to see where your system makes errors so you can introduce better features (or a better model)
• Good to see where you are getting false positives and false negatives
• Confusion matrices for classification are helpful
  • It’s an (n labels) x (n labels) matrix where rows and columns represent gold labels and system predictions
  • Numbers in the matrix represent counts
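A confusion matrix as described above takes only a few lines to build. In this sketch rows are gold labels and columns are system predictions; that orientation, the function name, and the toy labels are assumptions made for illustration.

```python
from collections import Counter


def confusion_matrix(gold, predicted, labels):
    """Return an (n labels) x (n labels) list of counts.

    Row index = gold label, column index = system prediction.
    """
    pair_counts = Counter(zip(gold, predicted))
    return [[pair_counts[(g, p)] for p in labels] for g in labels]


# Toy labels, made up for illustration.
labels = ["cat", "dog", "bird"]
gold = ["cat", "cat", "dog", "dog", "bird", "bird"]
predicted = ["cat", "dog", "dog", "dog", "bird", "cat"]
matrix = confusion_matrix(gold, predicted, labels)

# Print the matrix with the gold label at the start of each row.
print("gold \\ system", labels)
for label, row in zip(labels, matrix):
    print(label, row)
```

Off-diagonal cells show which pairs of labels the system confuses, which is usually the quickest pointer to the features that are missing.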
Tasks
• Decide strategy (next week)
  • What your baseline is
  • How work will be divided
  • What resources you will use
• Baseline system (4 weeks)
  • Include a prediction module
  • Include an evaluation module
  • Be able to visualize your errors for error analysis