Coping with Missing Data for Active Learning

02-750 Automation of Biological Research • jgc@cs.cmu.edu

What is Missing?
• In active learning the category label is missing, and we can query an oracle, mindful of cost
• What else can be missing?
  • Features: we may not have enough for prediction
  • Feature combinations: beyond those the classifier is able to generate automatically (e.g. XOR, ratios)
  • Values of features: not all instances have values for all their features
  • Feature relevance: some features are noisy or irrelevant
  • Feature redundancy: e.g. high feature covariance

Reducing the Feature Space
• Feature selection
  • Subsample features using information gain (IG), mutual information (MI), …
  • Well studied, e.g. Yang & Pedersen, ICML 1997
• Wrapper methods
  • Inefficient but accurate, less studied
• Feature projection (to lower dimensions)
  • LDA, SVD, LSI
  • Slow, well studied, e.g. Falluchi et al. 2009
• Kernel functions on feature sub-spaces
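The feature-selection bullet can be illustrated with a minimal sketch that ranks discrete features by information gain and keeps the top k. This is an illustration only, not the method of any cited paper; the feature names (`f_informative`, `f_partial`, `f_noise`), the toy data, and the helper `select_top_k` are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Label entropy minus the expected label entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def select_top_k(columns, labels, k):
    """Return the names of the k features with the highest information gain."""
    ranked = sorted(columns, key=lambda name: information_gain(columns[name], labels),
                    reverse=True)
    return ranked[:k]

# Hypothetical toy data: three binary features and a binary label.
columns = {
    "f_informative": [1, 1, 0, 0, 1, 0],  # identical to the label
    "f_partial":     [1, 0, 0, 0, 1, 1],  # weakly related to the label
    "f_noise":       [1, 0, 1, 0, 0, 0],  # unrelated to the label
}
labels = [1, 1, 0, 0, 1, 0]
top = select_top_k(columns, labels, k=2)
```

Scoring each feature independently is cheap, which is why this filter style scales better than wrapper methods, but it cannot notice that two features are only useful in combination.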

Missing Feature Values
• Active learning of features
  • Not as extensively studied as active instance learning (see Saar-Tsechansky et al., 2007)
  • Determines which feature values to seek for given instances, or which features across the board
  • Can be combined with active instance learning
• But what if there is no oracle?
  • Impossible to get feature values
  • Too costly or too time-consuming
• Do we ignore instances with missing features?

Missing Data

How to Cope with Missing Features
• ML training assumes feature completeness
  • Filter out features that are mostly missing
  • Filter out instances with missing features
  • Impute values for missing features
  • Radically change ML algorithms
• When do we do each of the above?
  • With lots of data and few missing features…
  • With sparse training data and few missing…
  • With sparse data and mostly missing features…
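The two filtering options can be sketched as follows, assuming instances are lists of numbers with `None` marking a missing value. The 0.5 threshold, the toy rows, and the function names are assumptions made for illustration.

```python
def missing_fraction_per_feature(rows):
    """Fraction of instances in which each feature is missing (None)."""
    n_features = len(rows[0])
    return [sum(row[j] is None for row in rows) / len(rows) for j in range(n_features)]

def drop_sparse_features(rows, max_missing=0.5):
    """Keep only features whose missing fraction is at most max_missing."""
    fractions = missing_fraction_per_feature(rows)
    keep = [j for j, frac in enumerate(fractions) if frac <= max_missing]
    return [[row[j] for j in keep] for row in rows], keep

def drop_incomplete_instances(rows):
    """Keep only instances with no missing feature values."""
    return [row for row in rows if all(v is not None for v in row)]

# Hypothetical data: feature 1 is mostly missing, feature 2 is missing once.
rows = [
    [1.0, None, 3.0],
    [2.0, None, None],
    [3.0, None, 1.0],
    [4.0, 0.5, 2.0],
]
filtered_rows, kept = drop_sparse_features(rows, max_missing=0.5)
complete_rows = drop_incomplete_instances(filtered_rows)
```

Dropping the sparse feature first preserves three of the four instances; dropping incomplete instances first would leave only one, which is the trade-off the "when do we do each" bullets point at.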

Missing Feature Imputation
• How do we estimate missing feature values?
  • Infer the mean value across all instances
  • Infer the mean value in neighborhood
  • Apply a classifier with other features as input and missing feature value as y (label)
• How do we know if it makes a difference?
  • Sensitivity analysis (extrema, perturbations)
  • Train without instances with missing features vs. with instances that have imputed values for missing features
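The first two estimators can be compared in a minimal sketch. It assumes `None` marks a missing value, that only feature `j` has gaps, and that Euclidean distance over the remaining features defines the neighborhood. The data and function names are hypothetical.

```python
import math

def mean_impute(rows, j):
    """Replace missing values (None) of feature j with the mean of its observed values."""
    observed = [row[j] for row in rows if row[j] is not None]
    mean = sum(observed) / len(observed)
    return [row[:j] + [mean if row[j] is None else row[j]] + row[j + 1:] for row in rows]

def knn_impute(rows, j, k=2):
    """Replace missing values of feature j with the mean over the k nearest
    instances that have feature j observed. Distance uses the other features,
    which are assumed to be complete in this sketch."""
    donors = [row for row in rows if row[j] is not None]

    def distance(a, b):
        return math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(len(a)) if i != j))

    result = []
    for row in rows:
        if row[j] is None:
            nearest = sorted(donors, key=lambda d: distance(row, d))[:k]
            value = sum(d[j] for d in nearest) / len(nearest)
            result.append(row[:j] + [value] + row[j + 1:])
        else:
            result.append(list(row))
    return result

# Hypothetical data: two clusters; the last instance lacks feature 2.
rows = [
    [1.0, 1.0, 10.0],
    [1.2, 0.9, 12.0],
    [5.0, 5.0, 50.0],
    [5.1, 4.8, 54.0],
    [1.1, 1.0, None],
]
global_fill = mean_impute(rows, j=2)[-1][2]
local_fill = knn_impute(rows, j=2, k=2)[-1][2]
```

The gap between `global_fill` and `local_fill` is itself a crude sensitivity check: when the two disagree strongly, the downstream model is likely to depend on which imputation is chosen.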

More on Missing Values
• Missing Completely at Random (MCAR)
  • It is generally impossible to prove MCAR or MAR
• Missing at Random (MAR)
  • Statisticians assume MAR as default
• Missing values that depend on observables
  • Imputation via classification/regression
• Missing values that depend on unobservables
• Missing depending on the value itself
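A small simulation can make the mechanisms concrete. It is a sketch under invented assumptions: the variables `age` (always observed) and `score` (sometimes missing), the linear relationship between them, and all probabilities and thresholds are hypothetical.

```python
import random

def observed_mean(values, mask):
    """Mean of the values that stay observed (mask[i] is True when value i is missing)."""
    kept = [v for v, missing in zip(values, mask) if not missing]
    return sum(kept) / len(kept)

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

rng = random.Random(0)
n = 10000
age = [rng.uniform(20, 80) for _ in range(n)]      # always observed
score = [0.1 * a + rng.gauss(0, 1) for a in age]   # may go missing
full_mean = sum(score) / n

# MCAR: missingness ignores both observed and unobserved data.
mcar = [rng.random() < 0.3 for _ in range(n)]
# MAR: missingness depends only on an observed variable (age).
mar = [rng.random() < (0.6 if a > 50 else 0.1) for a in age]
# Missingness that depends on the (unobserved) value itself.
value_dependent = [s > 6.0 for s in score]

# Under MAR, regress score on age using observed cases, then impute the rest.
obs = [(a, s) for a, s, m in zip(age, score, mar) if not m]
slope, intercept = fit_line([a for a, _ in obs], [s for _, s in obs])
completed = [intercept + slope * a if m else s for a, s, m in zip(age, score, mar)]
completed_mean = sum(completed) / n

mcar_mean = observed_mean(score, mcar)
mar_mean = observed_mean(score, mar)
value_dependent_mean = observed_mean(score, value_dependent)
```

In this setup the complete-case mean stays close to the full mean under MCAR and is biased downward under the other two mechanisms. Regression on the observed predictor repairs the MAR case, but nothing observed can repair the value-dependent case, which is why that mechanism is the hardest to handle.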

Imputation – Example [From: Fan 2008]
• How to impute the missing SCL for patient #5?
  • Sample mean: (3.8 + 0.6 + 1.1 + 1.3)/4 = 1.7
  • By age: (3.8 + 0.6)/2 = 2.2
  • By sex: 1.1
  • By education: 1.3
  • By race: (3.8 + 0.6 + 1.3)/3 = 1.9
  • By ADL: (1.1 + 1.3)/2 = 1.2
• Who is/are in the same “slice” with #5?
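The arithmetic above can be reproduced directly. The sketch below uses only the SCL values that appear in the slide's formulas; the patient table itself is not part of this transcript, so each slice is written as the list of SCL values the slide averages, and the dictionary keys are labels chosen for illustration.

```python
# SCL values of the patients who share each attribute with patient #5,
# copied from the formulas on the slide (the underlying table is not shown).
slices = {
    "all":       [3.8, 0.6, 1.1, 1.3],
    "age":       [3.8, 0.6],
    "sex":       [1.1],
    "education": [1.3],
    "race":      [3.8, 0.6, 1.3],
    "ADL":       [1.1, 1.3],
}

# One candidate imputation per slice: the mean SCL within that slice.
estimates = {name: round(sum(values) / len(values), 1) for name, values in slices.items()}

# The spread across slices shows how sensitive the imputed value is
# to the choice of conditioning variable.
low, high = min(estimates.values()), max(estimates.values())
```

The candidates range from `low` to `high`, so the choice of slice matters about as much as the choice to impute at all; slices with a single member (sex, education) are really just copying one patient's value.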

Further Reading
• Saar-Tsechansky & Provost: http://www.springerlink.com/content/k5m57475n1658723/fulltext.pdf
• Yang, Y., Pedersen, J.O. A Comparative Study on Feature Selection in Text Categorization. ICML 1997, pp. 412-420
• Gelman chapter: http://www.stat.columbia.edu/~gelman/arm/missing.pdf
• Applications in biomed: Lavori, P., R. Dawson and D. Shera (1995) “A Multiple Imputation Strategy for Clinical Trials with Truncation of Patient Data.” Statistics in Medicine 14: 1913-1925.
