Presenter: Russell Greiner Learning to Predict
Vision Statement
Helping the world understand data … and make informed decisions.
• Single decision: determine
  • class label of an instance
  • set of labels of a set of pixels, …
  • value of a property of an instance, …
Motivation for Training a Predictor
• Need to know the "label" of an instance, to determine the appropriate action
  • PredictorMed( patient#2 ) =? "treatX is Ok"
• Unfortunately, Predictor(·) is not known a priori
• But there are many examples of ⟨patient, treatX⟩ pairs
Motivation for Training a Predictor
• Machine learning provides algorithms for mapping { ⟨patient, treatX⟩ } examples to a Predictor(·) function

  Temp.  Press.  Sore Throat  …  Colour  →  treatX
   35     95         Y        …   Pale       No
   22    110         N        …   Clear      Ok
    :      :         :             :          :
   10     87         N        …   Pale       No

• Query: ⟨Temp 32, Press. 90, Sore Throat N, …, Colour Pale⟩ → Predictor → treatX = Ok
Motivation for Training a Predictor
• Need to learn the predictor (not program it in) when it is…
  • … not known
  • … not expressible
  • … changing
  • … user dependent
• (Same training table as above; for the query ⟨Temp 32, Press. 90, Sore Throat N, …, Colour Pale⟩ the learned predictor here returns treatX = No)
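The table-to-predictor pipeline above can be sketched in a few lines. This is a minimal stand-in learner (1-nearest-neighbour on two of the features); the numeric values come from the slide's toy table, but the distance function and the restriction to two features are illustrative assumptions, not part of the talk.

```python
# A minimal sketch of learning a predictor from labelled patient records,
# using 1-nearest-neighbour as a stand-in for the learner. Only the
# Temp/Press columns of the toy table are used; the distance function
# is an illustrative choice.

def nearest_neighbour_predict(train, query):
    """train: list of (features, label) pairs; query: feature tuple."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(train, key=lambda row: dist(row[0], query))
    return label

# Toy training set: (temperature, blood pressure) -> treatX decision.
records = [((35, 95), "No"), ((22, 110), "Ok"), ((10, 87), "No")]

print(nearest_neighbour_predict(records, (32, 90)))  # closest row is (35, 95)
```

With only these two features the query lands nearest the first record, so this toy learner answers "No"; the full table (sore throat, colour, …) could of course change that.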
Personnel • PI synergy: • Greiner, Schuurmans, Holte, Sutton, Szepesvari, Goebel • 5 Postdocs • 16 Grad students (5 MSc, 11 PhD) • 5 Supporting technical staff + personnel for Bioinformatics thrust
Partners/Collaborators • 4 UofA CS profs • 1 UofAlberta Math/Stat • Non-UofA collaborators: Google, Yahoo!, Electronic Arts, UofMontreal, UofWaterloo, UofNebraska, NICTA, NRC-IIT,… + Bioinformatics thrust collaborators
Additional Resources • Grants • $225K CFI • $100K MITACS • $100K Google • Hardware • 68 processor, 2TB, Opteron Cluster • 54 processor, dual core, 1.5TB, Opteron Cluster + funds/data for Bioinformatics thrust
Highlights
• IJCAI 2005 – Distinguished Paper Prize
• UM 2003 – Best Student Paper Prize
• WebIC technology is the foundation for a start-up company
• Significant advances in extending SVMs to use unsupervised/semi-supervised data, and to structured data
+ Highlights from Bioinformatics thrust
Learning to Predict: Challenges
(Recall the pipeline: training table → Learner → Predictor.)
Simplifying assumptions re: training data
• IID / unstructured
• Lots of instances
• Low dimensions
• Complete features
• Completely labeled
• Balanced data
• … is sufficient
Learning to Predict: Challenges — IID / unstructured?
• Application: Segmenting Brain Tumors
• Approach: extensions to Conditional Random Fields, …
Learning to Predict: Challenges — Lots of instances? Low dimensions?
• N ≈ 10's, m ≈ 1000's
Learning to Predict: Challenges — Lots of instances? Low dimensions?
• N ≈ 20,000, m ≈ 100 (microarray, SNP chips, …)
• Approach: dimensionality reduction
  • L2 Model: Component Discovery
  • BiCluster Coding
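As a hedged illustration of what "dimensionality reduction" buys on wide microarray-style data, here is the simplest possible stand-in: keep only the k highest-variance features. The talk's actual methods (the L2 component-discovery model and BiCluster coding) are far more sophisticated; this just shows the mechanics of shrinking a feature matrix with many more columns than rows.

```python
# A minimal stand-in for dimensionality reduction: keep the k features
# with the highest variance across samples. Not the talk's L2/BiCluster
# methods -- just an illustration of pruning a wide feature matrix.

def top_variance_features(rows, k):
    """rows: list of equal-length feature vectors; returns indices of top-k."""
    n, m = len(rows), len(rows[0])
    variances = []
    for j in range(m):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        variances.append(sum((x - mean) ** 2 for x in col) / n)
    return sorted(range(m), key=lambda j: variances[j], reverse=True)[:k]

data = [[1.0, 5.0, 0.1],
        [1.1, 9.0, 0.1],
        [0.9, 1.0, 0.1]]
print(top_variance_features(data, 2))  # feature 1 varies most, feature 2 not at all
```

On real microarray data one would of course select against the labels (or learn components), not just raw variance; this is only the shape of the operation.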
Learning to Predict: Challenges — Complete features?
• Approach: Budgeted Learning
Learning to Predict: Challenges — Completely labeled?
• Approaches: Semi-Supervised Learning; Active Learning
Learning to Predict: Challenges — Balanced data?
• Approach: Cost Curves (analysis)
Learning to Predict: Challenges — is [all of the above] sufficient?
• Approaches: Robust SVM; Mixture Using Variance; Large Margin Bayes Net; Coordinated Classifiers; …
Projects and Status
(each project targets one of the simplifying assumptions: IID/unstructured, lots of instances, low dimensions, complete features, completely labeled, balanced data, beyond simple learners)
• Structured Prediction: Random Fields; Parsing; Unsupervised M3N
• Dimensionality Reduction (L2 Model: Component Discovery)
• Budgeted Learning
• Semi-Supervised Learning: large-margin (SVM); probabilistic (CRF); graph-based transduction
• Active Learning
• Cost Curves
• Robust SVM; Coordinated Classifiers; Mixture Using Variance; Large Margin Bayes Net
Poster #26
Technical Details: Budgeted Learning
Typical Supervised Learning
(Diagram: training instances — Person 1, Person 2, … — each with a response → Learner → Predictor.)
Active Learning
(Diagram: training instances with responses → Learner → Predictor.)
• The user is able to PURCHASE labels, at some cost
• … for which instances??
Budgeted Learning
(Diagram: training instances with responses → Learner → Predictor.)
• The user is able to PURCHASE values of features, at some cost
• … but which features, for which instances??
Budgeted Learning
• The user is able to PURCHASE values of features, at some cost
• … but which features, for which instances??
• Significantly different from ACTIVE learning:
  • correlations between feature values
(Plot: classification error vs. number of features purchased; 10 tests at $1/test, budget = $40, Beta(10,1) prior.)
Budgeted Learning… so far
• Defined framework
  • Ability to purchase individual feature values
  • Fixed LEARNING / CLASSIFICATION budget
• Theoretical results
  • NP-hard in general
  • Standard algorithms are not even approximation algorithms!
• Empirical results show…
  • Avoid Round Robin
  • Try clever algorithms: Biased Robin; Randomized Single Feature Lookahead
[Lizotte, Madani, Greiner: UAI'03], [Madani, Lizotte, Greiner: UAI'04], [Kapoor, Greiner: ECML'05]
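The Biased Robin policy named above can be sketched in a few lines: keep buying values of the same feature while each purchase "succeeds", and move to the next feature on a failure. The success test below is a random stand-in; in the cited papers success is judged by the effect of the purchase on the learner, which is omitted here.

```python
import random

# Sketch of the Biased Robin purchase policy from the budgeted-learning
# work: stay on a feature while purchases succeed, advance on failure.
# succeeded() below is a coin flip, standing in for the real criterion
# (did the purchased value help the learner?).

def biased_robin(num_features, budget, succeeded):
    """Returns the sequence of feature indices purchased under the budget."""
    purchases = []
    f = 0
    for _ in range(budget):
        purchases.append(f)
        if not succeeded(f):
            f = (f + 1) % num_features  # move to the next feature after a failure
    return purchases

random.seed(0)
spent = biased_robin(num_features=3, budget=10,
                     succeeded=lambda f: random.random() < 0.5)
print(spent)  # e.g. a run of feature 0, then 1, then 2, wrapping around
```

Round Robin, by contrast, would cycle `0, 1, 2, 0, 1, 2, …` regardless of outcomes; the slides report that this uniform spending is the policy to avoid.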
Future Work #1
(Diagram: training instances with responses → Learner → Classifier.)
Future Work #2
• Sample complexity of Budgeted Learning
  • How many (Ij, Xi) "probes" are required to PAC-learn?
• Develop policies with guarantees on learning performance
• More complex cost models… bundling tests, …
• Allow the learner to perform more powerful probes
  • e.g., purchase X3 in an instance where X7 = 0 & Y = 1
• More complex classifiers?
Future Work #3: Learning a Generative Model
(Diagram: training instances — Person 1, Person 2, … — with responses.)
Goal: find θ* = argmax_θ P_θ(D)
Projects and Status
• Structured Prediction (ongoing)
• Dimensionality Reduction (ongoing; RoBiC: Poster #8)
• Budgeted Learning (ongoing)
• Semi-Supervised Learning (ongoing)
• Active Learning (ongoing)
• Cost Curves (complete; Poster #26)
(Diagram: MTrain with ± labels → Learner finds BiClusters → BiCluster-membership features → Classifier applied to MTest.)
Technical Details: Using Variance Estimates to Combine Bayesian Classifiers
Motivation
(Scatter-plot sketch: regions C1–C4 of the instance space where different classifiers are most reliable.)
• Suppose there are many different classifiers …
• For each instance, we want each classifier to …
  • "know what it knows" …
  • … and shout LOUDEST when it knows best …
• "Loudness" ∝ 1 / Variance!
Mixture Using Variance
• Given a belief-net classifier with
  • fixed (correct) structure
  • parameters estimated from a (random) data sample
• Its response to a query "P(+c | -e, +w)" is…
  • asymptotically normal, with…
  • an (asymptotic) variance
• That variance is easy to compute…
  • for simple structures (Naïve Bayes, TAN) … and
  • for complete queries
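Combining the classifiers' responses with weight 1/variance (the "shout loudest when you know best" idea) can be sketched directly. The probabilities and variances below are made-up numbers, not values from the experiments; the point is only that the low-variance classifier dominates the mixture.

```python
# Sketch of inverse-variance combination: each classifier's estimate of
# P(+c | evidence) is weighted by 1/variance, so confident (low-variance)
# classifiers dominate. All numbers below are illustrative.

def mixture_using_variance(estimates, variances):
    weights = [1.0 / v for v in variances]
    return sum(w * p for w, p in zip(weights, estimates)) / sum(weights)

probs = [0.9, 0.4, 0.6]     # three classifiers' answers to P(+c | -e, +w)
var = [0.01, 0.25, 0.04]    # classifier 1 is by far the most certain

print(round(mixture_using_variance(probs, var), 3))  # ≈ 0.826, pulled toward 0.9
```

Note the combined estimate sits much closer to the confident classifier's 0.9 than a plain average (0.633) would; that is exactly the intended behaviour.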
Experiment #4b: MUV(kNB, AdaBoost, js) vs AdaBoost(NB)
• MUV significantly out-performs AdaBoost
  • even when using base classifiers that AdaBoost generated!
• MUV(kNB, AdaBoost, js) is better than AdaBoost[NB] with p < 0.023
MUV Results • Sound statistical foundation • Very effective classifier … • …across many real datasets • MUV(NB) better than AdaBoost(NB)! C. Lee, S. Wang and R. Greiner; ICML’06
Mixture Using Variance … next steps?
• Other structures (beyond NB, TAN)
• Beyond tabular CP-tables for discrete variables
  • Noisy-or
  • Gaussians
• Learn different base classifiers from different subsets of features
• Scaling up to many, MANY features
  • overfitting characteristics?
Confidence in Classifier
• Confidence of a prediction?
• Fit each (μj, σj²) to a Beta(aj, bj)
• Compute the area CDF_Beta(aj, bj)(0.5)
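The confidence computation above reduces to evaluating a Beta CDF at 0.5. For integer shape parameters the regularized incomplete beta function is just a binomial tail sum, so it can be computed with the standard library alone; the (a, b) values below are illustrative, not fitted to real predictions.

```python
from math import comb

# Area of a Beta(a, b) distribution below 0.5, i.e. CDF_Beta(a,b)(0.5).
# For integer a, b the regularized incomplete beta function equals a
# binomial tail sum, so no scipy is needed. Illustrative values only.

def beta_cdf_at_half(a, b):
    n = a + b - 1
    x = 0.5
    return sum(comb(n, k) * x ** k * (1 - x) ** (n - k) for k in range(a, n + 1))

print(beta_cdf_at_half(1, 1))   # uniform Beta(1,1): 0.5
print(beta_cdf_at_half(2, 1))   # density 2x, CDF x^2: 0.25 at 0.5
```

A prediction whose fitted Beta puts little mass below 0.5 (CDF near 0) is confidently positive; mass concentrated near 0.5 signals an uncertain prediction.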
Semi-Supervised Learning
(Diagram: labeled training data + UNlabeled training data → Learner → Classifier.)
Approaches
• Ignore the unlabeled data
  • Great if we have LOTS of labeled data
• Use the unlabeled data, as is…
  • "Semi-Supervised Learning"… based on
    • large margin (SVM)
    • graph
    • probabilistic model
• Pay to get labels for SOME unlabeled data
  • "Active Learning"
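The graph-based family listed above can be illustrated with the smallest possible transduction example: clamp the labeled nodes and repeatedly replace each unlabeled node's score by the mean of its neighbours. The graph, seed labels, and iteration count are toy choices, not from the talk's experiments.

```python
# Tiny sketch of graph-based transduction (label propagation): labelled
# seed nodes are clamped, unlabelled nodes take the mean of their
# neighbours until convergence. Toy 4-node chain graph.

def label_propagation(neighbours, labels, iters=200):
    """neighbours: node -> list of adjacent nodes; labels: node -> ±1 seeds."""
    scores = {v: labels.get(v, 0.0) for v in neighbours}
    for _ in range(iters):
        for v in neighbours:
            if v not in labels:          # seeds stay clamped
                nb = neighbours[v]
                scores[v] = sum(scores[u] for u in nb) / len(nb)
    return scores

chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
scores = label_propagation(chain, labels={0: 1.0, 3: -1.0})
print({v: round(s, 2) for v, s in scores.items()})  # nodes 1, 2 settle near ±1/3
```

The sign of each converged score gives the transduced label, and its magnitude gives a crude confidence: node 1 leans toward the positive seed, node 2 toward the negative one.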
Semi-Supervised Multi-class SVM
• Approach: find a labeling that would yield an optimal SVM classifier on the resulting training data.
• Hard, but:
  • semi-definite relaxations can approximate this objective surprisingly well
  • training procedures are computationally intensive, but produce high-quality generalization results.
L. Xu, J. Neufeld, B. Larson, D. Schuurmans. Maximum margin clustering. NIPS-04.
L. Xu and D. Schuurmans. Unsupervised and semi-supervised multi-class SVMs. AAAI-05.
Probabilistic Approach to Semi-Supervised Learning
• Probabilistic model: P(y|x)
• Context: non-IID data
  • Language modelling
  • Segmenting brain tumors from MR images
• Use unlabeled data as a regularizer
• Future: other applications…
C-H. Lee, S. Wang, F. Jiao, D. Schuurmans and R. Greiner. Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields. NIPS-06.
F. Jiao, S. Wang, C-H. Lee, R. Greiner, and D. Schuurmans. Semi-supervised conditional random fields for improved sequence segmentation and labeling. COLING/ACL-06.
Active Learning
• Pay for the label of the query xi that … maximizes the conditional mutual information about the unlabeled data
• How to determine yi?
  • Take the EXPECTATION w.r.t. Yi?
  • Use an OPTIMISTIC guess w.r.t. Yi?
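As a simplified stand-in for the selection step, the sketch below scores candidates by predictive entropy (plain uncertainty sampling) and shows the "optimistic guess" for each yi. This is an illustration only: the paper's criterion is conditional mutual information over the unlabeled pool, and its optimism scores the model retrained under the guessed label, both of which are omitted here. The candidate probabilities are made up.

```python
from math import log2

# Simplified stand-in for active-learning query selection: score each
# unlabeled candidate by predictive entropy, and form the OPTIMISTIC
# label guess (most likely label) for each. Not the paper's conditional
# mutual-information objective; probabilities below are made up.

def entropy(p):
    return 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))

candidates = {"x1": 0.95, "x2": 0.55, "x3": 0.30}   # current P(y=1 | x)

# Expectation-flavoured score: query where the model is most uncertain.
expected_pick = max(candidates, key=lambda x: entropy(candidates[x]))

# Optimistic guess for y_i: assume the most likely label before purchase.
optimistic_labels = {x: int(p >= 0.5) for x, p in candidates.items()}

print(expected_pick, optimistic_labels)  # x2 is the most uncertain candidate
```

Under this toy scoring, x2 (P ≈ 0.55) is queried first; the paper's results (next slide) argue that the optimistic variant, not plain uncertainty, is what makes the approach work.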
Optimistic Active Learning using Mutual Information
• Need optimism
• Need "on-line adjustment"
• Better than just MostUncertain, … (benchmark datasets: breast, pima)
Y. Guo and R. Greiner. Optimistic active learning using mutual information. IJCAI'07.
Future Work on Active Learning
• Understand WHY "optimism" works… + other applications of optimism
• Extend the framework to deal with
  • non-IID data
  • different qualities of labelers
  • …