Coping with Missing Data for Active Learning 02-750 Automation of Biological Research jgc@cs.cmu.edu
What is Missing? • In active learning the category label is missing, and we can query an oracle, mindful of cost • What else can be missing? • Features: we may not have enough for prediction • Feature combinations: beyond those the classifier is able to generate automatically (e.g. XOR, ratios) • Values of features: not all instances have values for all their features • Feature relevance: some features are noisy or irrelevant • Feature redundancy: e.g. high covariance between features
Reducing the Feature Space • Feature selection • Subsample features using information gain (IG), mutual information (MI), … • Well studied, e.g. Yang & Pedersen ICML 1997 • Wrapper methods • Inefficient but accurate, less studied • Feature projection (to lower dimensions) • LDA, SVD, LSI • Slow, well studied, e.g. Falluchi et al 2009 • Kernel functions on feature sub-spaces
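The filter-style selection named on this slide can be sketched in a few lines. This is only an illustration under assumptions not in the slides: features and labels are discrete, the toy columns (`copy_of_label`, `noise`, `constant`) are made up, and `select_top_k` is a hypothetical helper that keeps the k features with the highest mutual information with the label.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi


def select_top_k(columns, labels, k):
    """Hypothetical helper: names of the k features scoring highest on MI."""
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data (invented for illustration).
labels = [0, 0, 1, 1, 0, 1, 0, 1]
columns = {
    "copy_of_label": [0, 0, 1, 1, 0, 1, 0, 1],  # perfectly informative
    "noise": [0, 1, 0, 1, 0, 0, 1, 1],          # independent of the label
    "constant": [1] * 8,                        # carries no information
}

top_features = select_top_k(columns, labels, k=1)
```

A filter like this scores each feature on its own, so it cannot detect features that only help in combination (the XOR case mentioned earlier); that is one reason wrapper methods can be more accurate despite their cost.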
Missing Feature Values • Active learning of features • Not as extensively studied as active instance learning (see Saar-Tsechansky et al., 2007) • Determines which feature values to seek for given instances, or which features across the board • Can be combined with active instance learning • But what if there is no oracle? • Impossible to get feature values • Too costly or too time consuming • Do we ignore instances with missing features?
How to Cope with Missing Features • ML training assumes feature completeness • Filter out features that are mostly missing • Filter out instances with missing features • Impute values for missing features • Radically change ML algorithms • When do we do each of the above? • With lots of data and few missing features… • With sparse training data and few missing… • With sparse data and mostly missing features…
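The two filtering options can be sketched as follows. The representation is an assumption made for illustration: each instance is a dict, a missing value is `None`, and the 0.5 threshold and the toy rows are invented, not taken from the course.

```python
def drop_sparse_features(rows, max_missing_frac=0.5):
    """Drop every feature whose fraction of missing (None) values exceeds the threshold."""
    n = len(rows)
    keep = [
        name for name in rows[0]
        if sum(r[name] is None for r in rows) / n <= max_missing_frac
    ]
    return [{name: r[name] for name in keep} for r in rows]


def drop_incomplete_instances(rows):
    """Drop every instance that still has at least one missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


# Toy data: feature "c" is mostly missing, feature "a" is missing once.
rows = [
    {"a": 1.0, "b": 0.2, "c": None},
    {"a": None, "b": 0.4, "c": None},
    {"a": 3.0, "b": 0.1, "c": 7.0},
    {"a": 4.0, "b": 0.9, "c": None},
]

instances_only = drop_incomplete_instances(rows)
features_then_instances = drop_incomplete_instances(drop_sparse_features(rows))
```

On this toy data, filtering instances alone leaves a single row, while dropping the mostly missing feature first leaves three. With sparse training data that difference is often what decides between filtering and imputing.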
Missing Feature Imputation • How do we estimate missing feature values? • Infer the mean value across all instances • Infer the mean value in a neighborhood • Apply a classifier with the other features as input and the missing feature value as y (label) • How do we know if it makes a difference? • Sensitivity analysis (extrema, perturbations) • Train without instances with missing features vs. with instances whose missing features have imputed values
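The first two estimators can be compared side by side. This is a minimal sketch under stated assumptions: instances are dicts with `None` for a missing value, only one feature has gaps, all other features are fully observed and numeric, and "neighborhood" is taken to mean the k nearest donors by Euclidean distance. The function names and toy rows are hypothetical.

```python
import math


def mean_impute(rows, feature):
    """Fill missing values of `feature` with the mean over all observed values."""
    observed = [r[feature] for r in rows if r[feature] is not None]
    mean = sum(observed) / len(observed)
    return [dict(r, **{feature: mean}) if r[feature] is None else dict(r) for r in rows]


def knn_impute(rows, feature, k=2):
    """Fill missing values of `feature` with the mean over the k nearest donors.

    Distance is Euclidean over the remaining features, which are assumed complete.
    """
    others = [name for name in rows[0] if name != feature]
    donors = [r for r in rows if r[feature] is not None]
    filled = []
    for r in rows:
        if r[feature] is not None:
            filled.append(dict(r))
            continue
        point = [r[name] for name in others]
        nearest = sorted(
            donors, key=lambda d: math.dist(point, [d[name] for name in others])
        )[:k]
        estimate = sum(d[feature] for d in nearest) / len(nearest)
        filled.append(dict(r, **{feature: estimate}))
    return filled


# Toy data: two clusters; the incomplete instance sits in the low cluster.
rows = [
    {"x": 1.0, "y": 10.0},
    {"x": 2.0, "y": 12.0},
    {"x": 9.0, "y": 50.0},
    {"x": 10.0, "y": 54.0},
    {"x": 1.5, "y": None},
]

global_fill = mean_impute(rows, "y")[-1]["y"]
local_fill = knn_impute(rows, "y", k=2)[-1]["y"]
```

The global mean is pulled toward the distant cluster, while the neighborhood mean stays close to the instance's own cluster. Training once with each filled-in value and comparing the resulting models is a simple form of the sensitivity check listed on the slide.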
More on Missing Values • Missing Completely at Random (MCAR) • It is generally impossible to prove MCAR or MAR • Missing at Random (MAR) • Statisticians typically assume MAR by default • Missing values that depend on observables • Imputation via classification/regression • Missing values that depend on unobservables • Missingness that depends on the value itself
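A small simulation can make the difference between these mechanisms concrete. Everything in it is an assumption chosen for illustration: the data-generating process (`y = x + noise`), the drop probabilities, the 0.5 cut-offs, and the fixed seed. It compares the mean of the observed `y` values with the mean of the full data under three masking rules.

```python
import random


def simulate(n=20000, seed=0):
    """Compare the complete-case mean of y under three missingness mechanisms."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]           # always observed
    ys = [x + rng.gauss(0.0, 0.1) for x in xs]      # sometimes missing

    def observed_mean(keep):
        kept = [y for y, k in zip(ys, keep) if k]
        return sum(kept) / len(kept)

    # MCAR: every y is dropped with the same probability, regardless of the data.
    mcar = [rng.random() >= 0.3 for _ in xs]
    # MAR: the drop probability depends only on the observed feature x.
    mar = [not (x > 0.5 and rng.random() < 0.8) for x in xs]
    # Value-dependent: the drop probability depends on the missing value y itself.
    value_dependent = [not (y > 0.5 and rng.random() < 0.8) for y in ys]

    return {
        "full": sum(ys) / n,
        "mcar": observed_mean(mcar),
        "mar": observed_mean(mar),
        "value_dependent": observed_mean(value_dependent),
    }


means = simulate()
```

Under MCAR the observed mean stays close to the full-data mean; under the other two rules it is biased downward because large values are dropped more often. In the MAR case the bias can in principle be corrected by modeling `y` from `x`, which is why imputation via classification or regression is listed for missingness that depends on observables; when missingness depends on the value itself, the observed data alone does not reveal the problem.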
Imputation – Example [From: Fan 2008] • How to impute the missing SCL for patient #5? • Sample mean: (3.8 + 0.6 + 1.1 + 1.3)/4 = 1.7 • By age: (3.8 + 0.6)/2 = 2.2 • By sex: 1.1 • By education: 1.3 • By race: (3.8 + 0.6 + 1.3)/3 = 1.9 • By ADL: (1.1 + 1.3)/2 = 1.2 • Who is/are in the same “slice” as #5?
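The arithmetic on this slide can be reproduced directly. The patient table itself is not part of this transcript, so the sketch below makes no assumption about individual patients' attributes: each "slice" is simply the list of SCL values that appears in the corresponding sum on the slide.

```python
# SCL values of the patients who share each attribute with patient #5,
# copied from the sums on the slide (the underlying table is not available here).
slices = {
    "sample mean": [3.8, 0.6, 1.1, 1.3],
    "age": [3.8, 0.6],
    "sex": [1.1],
    "education": [1.3],
    "race": [3.8, 0.6, 1.3],
    "ADL": [1.1, 1.3],
}

# Conditional-mean imputation: average the observed SCL within each slice.
estimates = {name: round(sum(vals) / len(vals), 1) for name, vals in slices.items()}

# Spread of the candidate imputations, a rough indicator of sensitivity.
spread = round(max(estimates.values()) - min(estimates.values()), 1)
```

The estimates range from 1.1 to 2.2 depending on which attribute defines the slice, so the choice of conditioning variable matters as much as the decision to impute at all.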
Further Reading • Saar-Tsechansky & Provost: http://www.springerlink.com/content/k5m57475n1658723/fulltext.pdf • Yang, Y., Pedersen, J.O. A Comparative Study on Feature Selection in Text Categorization. ICML 1997, pp. 412-420 • Gelman chapter: http://www.stat.columbia.edu/~gelman/arm/missing.pdf • Applications in biomed: Lavori, P., R. Dawson and D. Shera (1995) “A Multiple Imputation Strategy for Clinical Trials with Truncation of Patient Data.” Statistics in Medicine 14: 1913-1925.