Anti-Learning
Adam Kowalczyk, Statistical Machine Learning, NICTA, Canberra (Adam.Kowalczyk@nicta.com.au)
National ICT Australia Limited is funded and supported by: (sponsor logos)
Overview
• Anti-learning
  • Elevated XOR
• Natural data
  • Predicting Chemo-Radio-Therapy (CRT) response for Oesophageal Cancer
  • Classifying Aryl Hydrocarbon Receptor genes
• Synthetic data
  • High dimensional mimicry
• Conclusions
• Appendix: A Theory of Anti-learning
  • Perfect anti-learning
  • Class-symmetric kernels
Definition of anti-learning: systematically, off-training accuracy < random-guessing accuracy < training accuracy.
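In terms of the AROC measure used throughout the talk (where random guessing scores 0.5), the definition can be stated compactly; this inequality is my paraphrase of the slide, not a verbatim formula from it:

```latex
\mathrm{AROC}_{\mathrm{test}}(f) \;<\; 0.5 \;<\; \mathrm{AROC}_{\mathrm{train}}(f)
```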
Anti-learning in Low Dimensions. (Figure: the elevated XOR configuration; four points labelled ±1 plotted against x, y, z axes.)
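The slide's figure shows the elevated (3D) XOR set; a minimal sketch of the same effect using plain 2D XOR and a near hard-margin linear SVM via scikit-learn. The data and classifier choice here are my illustration, not the talk's exact construction:

```python
import numpy as np
from sklearn.svm import SVC

# XOR: each class sits on one diagonal.
X = np.array([[0., 0.], [1., 1.], [0., 1.], [1., 0.]])
y = np.array([+1, +1, -1, -1])

# Leave-one-out: train a near hard-margin linear SVM on 3 points,
# then test on the held-out point.
for i in range(4):
    mask = np.arange(4) != i
    clf = SVC(kernel="linear", C=1e6).fit(X[mask], y[mask])
    train_acc = clf.score(X[mask], y[mask])        # always 1.0 here
    pred = int(clf.predict(X[i:i + 1])[0])
    print(f"held out {X[i]}: true {y[i]:+d}, predicted {pred:+d}, "
          f"train acc {train_acc:.0%}")
# Every held-out point is misclassified: 100% training accuracy,
# 0% off-training accuracy -- the anti-learning signature.
```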
(Figure: two panels contrasting Anti-Learning and Learning.)
Evaluation Measure: Area under the Receiver Operating Characteristic (AROC). (Figure: ROC curve of a classifier f in the false positive / true positive plane; AROC(f) is the area under the curve; axis ticks at 0, 0.5, 1.)
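AROC equals the probability that a randomly chosen positive is scored above a randomly chosen negative (the Mann–Whitney statistic). A small self-contained sketch; the score and label vectors are hypothetical:

```python
import numpy as np

def aroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic:
    P(score of a random positive > score of a random negative),
    counting ties as 1/2."""
    pos = scores[labels == +1]
    neg = scores[labels == -1]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties

# Hypothetical scores: 0.5 = random guessing, < 0.5 = anti-learning.
scores = np.array([0.9, 0.2, 0.4, 0.8, 0.1])
labels = np.array([+1, +1, -1, -1, -1])
print(aroc(scores, labels))   # 0.667
```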
Learning and anti-learning modes of supervised classification. Random guessing: AROC = 0.5. (Figure: ROC curves for training and test sets in the FN/TP plane; in both modes the training ROC is perfect, while in learning mode the test AROC lies above 0.5 and in anti-learning mode below 0.5.)
KDD’02 task: identification of Aryl Hydrocarbon Receptor genes (AHR data)
Anti-learning in the AHR data set from KDD Cup 2002. Average of 100 trials; random splits, training : test = 66% : 34%.
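The evaluation protocol behind this slide, as I read it; `X` and `y` stand for the AHR feature matrix and labels (not reproduced here), the linear SVM is my stand-in learner, and the `aroc` helper is the one sketched above:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def split_experiment(X, y, trials=100, test_size=0.34, seed=0):
    """Average train/test AROC over random 66%/34% splits."""
    rng = np.random.RandomState(seed)
    train_auc, test_auc = [], []
    for _ in range(trials):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=rng)
        clf = SVC(kernel="linear").fit(X_tr, y_tr)
        train_auc.append(aroc(clf.decision_function(X_tr), y_tr))
        test_auc.append(aroc(clf.decision_function(X_te), y_te))
    # Anti-learning signature: train AROC near 1, test AROC below 0.5.
    return np.mean(train_auc), np.mean(test_auc)
```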
KDD Cup 2002 Yeast Gene Regulation Prediction Task (http://www.biostat.wisc.edu/~craven/kddcup/task2.ppt)
Single-class SVM: 38/84 training examples (1.3%/2.8% of the data used) in ~14,000 dimensions; classes "change" or "control" (Vogel, AI Insight: "change").
Paradox of High Dimensional Mimicry
• If detection is based on a large number of high dimensional features, and
• the imposters are samples from a distribution whose marginals perfectly match the distributions of the individual features of a finite genuine sample,
• then the imposters are perfectly detectable by ML filters in the anti-learning mode (see the construction sketched below).
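One way to realize the marginal-matching premise (my construction, not necessarily the talk's): permute each feature column of the genuine sample independently. This preserves every single-feature marginal exactly while destroying the joint dependence, which is the only thing left for a detector to exploit:

```python
import numpy as np

def make_imposters(X_genuine, rng):
    """Imposter sample whose per-feature marginals exactly match the
    genuine sample: each feature column is an independent permutation
    of the corresponding genuine column."""
    n = len(X_genuine)
    X_imp = np.empty_like(X_genuine)
    for j in range(X_genuine.shape[1]):
        X_imp[:, j] = X_genuine[rng.permutation(n), j]
    return X_imp

# Genuine data with strongly dependent features in high dimension d.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))                    # shared latent factor
X_gen = z + 0.1 * rng.normal(size=(200, 5000))   # d = 5000
X_imp = make_imposters(X_gen, rng)
# Every marginal of X_imp matches X_gen; only the joint structure differs.
```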
Quality of mimicry. (Figure: panels for d = 1000 and d = 5000, plotted against the ratio |nE| / |nX|; average over 50 independent test repeats.)
Formal result. (Theorem statement on slide.)
Proof idea 1: Geometry of the mimicry data. Key Lemma: (stated on slide).
Perfect learning/anti-learning for CS-kernels (Kowalczyk & Chapelle, ALT'05). (Figure: training ROC_T and test ROC_{S−T} in the false positive / true positive plane.)
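As a rough formalization (my notation, paraphrasing the class-symmetric condition of Kowalczyk & Chapelle, ALT'05, not a verbatim statement): a kernel is class-symmetric on a sample when off-diagonal similarities depend only on whether the labels agree, and the anti-learning regime is the one where cross-class similarity dominates:

```latex
k(x_i, x_j) =
\begin{cases}
r^2, & i = j, \\
a,   & i \neq j,\ y_i = y_j, \\
b,   & y_i \neq y_j,
\end{cases}
\qquad \text{with } b > a .
```

Under this condition the talk reports perfect training ROC together with a perfectly inverted off-training ROC.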
Perfect anti-learning theorem (Kowalczyk & Smola, Conditions for Anti-Learning).
Anti-learning in classification of the Hadamard dataset (Kowalczyk & Smola, Conditions for Anti-Learning).
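A minimal sketch of the Hadamard construction as I read it (my reconstruction: labels taken from one Hadamard column, features from the remaining columns; the difference-of-class-means scorer is my choice of learner):

```python
import numpy as np
from scipy.linalg import hadamard

n = 16
H = hadamard(n)               # mutually orthogonal rows, entries +/-1
y = H[:, 1]                   # one column gives balanced +/-1 labels
X = np.delete(H, 1, axis=1)   # the remaining columns are the features
# Gram matrix: X @ X.T = n*I - y*y^T, so for i != k the similarity is
# -y_i*y_k: cross-class similarity (+1) exceeds within-class (-1),
# the CS-kernel anti-learning condition above.

wrong = 0
for i in range(n):            # leave-one-out
    mask = np.arange(n) != i
    s = X[mask] @ X[i]        # linear-kernel similarities to training set
    score = s[y[mask] == 1].mean() - s[y[mask] == -1].mean()
    wrong += (np.sign(score) != y[i])
print(f"LOO error rate: {wrong / n:.0%}")  # 100% -- perfect anti-learning
```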
AHR data set from KDD Cup '02 (Kowalczyk & Smola, Conditions for Anti-Learning; Kowalczyk & Smola, submitted).
From anti-learning to learning: the class-symmetric (CS) kernel case (Kowalczyk & Chapelle, ALT'05).
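The slide's own transformation is not reproduced in this extract; as a simple illustration of the general idea (my sketch, not the talk's method): once a domain is known to be systematically in the anti-learning regime, negating the decision function converts a test AROC of α into 1 − α, turning an anti-learner into a learner:

```python
def flipped_scores(clf, X):
    """For a domain known to anti-learn systematically: since
    AROC(-f) = 1 - AROC(f), negating the scores of a classifier
    whose test AROC is well below 0.5 yields one well above 0.5."""
    return -clf.decision_function(X)
```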
More is not necessarily better! Perfect anti-learning: an i.i.d. learning curve. (Figure: random AROC, mean ± std, vs. the number of i.i.d. samples drawn from the perfect anti-learning set S; n = 100, nRand = 1000.)
Conclusions
• Statistics and machine learning are indispensable components of the forthcoming revolution in medical diagnostics based on genomic profiling.
• The high dimensionality of the data poses new challenges, pushing statistical techniques into uncharted waters.
• The challenges of biological data can stimulate novel directions in machine learning research.
Acknowledgements
• Telstra: Bhavani Raskutti
• Peter MacCallum Cancer Centre: David Bowtell, Coung Duong, Wayne Phillips
• MPI: Cheng Soon Ong, Olivier Chapelle
• NICTA: Alex Smola