This research addresses dataset shift in satellite image classification. It proposes a method for retraining maximum likelihood classifiers with a low-rank model so that they better fit the test data. Because the class parameters are estimated with a low-rank approach, the spectral differences between the classes are largely preserved, which improves classification accuracy. Experiments on cloud detection in optical images and on tree cover mapping of tropical forest were successful and show the method's potential for overcoming dataset shift.
Retraining maximum likelihood classifiers using a low-rank model Arnt-Børre Salberg Norwegian Computing Center Oslo, Norway IGARSS July 25, 2011
Introduction • Challenge: the dataset shift problem • Training data match the test data poorly due to atmospheric, geographical, botanical and phenological variations in the image data → reduced classification performance • The class-dependent data distribution varies • between training images • between test and training images • Goal: develop a method that re-estimates the parameters so that the classifier fits the test data well
Introduction • Surface reflectance algorithms often require data from external sources • LEDAPS (Landsat): • ozone and water vapor measurements • Phenological, botanical and geographical variations, in addition to atmospheric ones, make the calibration problem even harder
An existing method… • Models the test image as a mixture distribution and estimates all parameters with the EM algorithm, using the parameters estimated from the training data as initial values • Too many degrees of freedom: the statistical fit is excellent, but the class labels get mixed up
Low-rank parameter modeling • Training image k: • Class mean vector and covariance matrix (class i) • Class mean vector and covariance matrix model for the test image • a and b are unknown parameter vectors to be estimated from the data
Low-rank data modeling • The proposed method for modeling the test data is a low-rank approach, since the number of parameters in a is L < D • This is far fewer than the C·D parameters needed to estimate every class mean vector m_i, i = 1, …, C, freely • Using a low-rank estimate of the class mean vectors of the test data preserves the spectral differences between the classes to a larger degree
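The exact model equations did not survive extraction, so the sketch below only illustrates the parameter-count argument. It assumes a hypothetical linear form m_i = F_i a + b, where F_i is a known D×L matrix per class (for example derived from the training images), a holds the L shared coefficients, and b is a D-dimensional offset. The names `F` and `low_rank_means` are illustrative and not taken from the slides.

```python
import numpy as np

def low_rank_means(F, a, b):
    """Hypothetical low-rank mean model m_i = F_i @ a + b.

    F : (C, D, L) array of known basis matrices, one per class (assumed)
    a : (L,) coefficient vector shared by all classes
    b : (D,) offset vector shared by all classes (assumed form)
    Returns a (C, D) array whose row i is the class mean m_i.
    """
    return np.einsum("cdl,l->cd", F, a) + b

# Six classes and two bands as in Experiment 1, one basis direction (L = 1 < D).
C, D, L = 6, 2, 1
rng = np.random.default_rng(0)
F = rng.normal(size=(C, D, L))
a = np.array([0.5])
b = np.zeros(D)

means = low_rank_means(F, a, b)
print(means.shape)          # (6, 2)
print(L + D, "vs", C * D)   # 3 vs 12 free mean parameters
```

Under this assumed form, all six class means move together through only L + D numbers. Estimating each m_i freely would require C·D numbers, and that extra freedom is what lets class labels drift apart.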
Parameter estimation • Procedure for estimating a and b: • Select N random samples {y1, y2, …, yN} from the test image • Model them using a Gaussian mixture distribution • Estimate the parameters by maximizing the likelihood
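The first two steps can be sketched as follows: draw N random pixel vectors from the test image, then evaluate the Gaussian mixture log-likelihood that the estimation step maximizes. This is a minimal illustration under stated assumptions. The image is taken to be an (H, W, D) array, and the class means would in practice come from the low-rank model. The helper names `sample_pixels` and `mixture_loglik` are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def sample_pixels(image, n, rng):
    """Draw n random pixel vectors (without replacement) from an (H, W, D) image."""
    pixels = image.reshape(-1, image.shape[-1])
    idx = rng.choice(pixels.shape[0], size=n, replace=False)
    return pixels[idx]

def mixture_loglik(Y, means, covs, priors):
    """Gaussian mixture log-likelihood: sum_n log sum_i pi_i N(y_n; m_i, Sigma_i)."""
    log_terms = np.stack([
        np.log(p) + multivariate_normal.logpdf(Y, mean=m, cov=S)
        for m, S, p in zip(means, covs, priors)
    ], axis=1)                              # shape (N, C)
    return logsumexp(log_terms, axis=1).sum()  # log-sum-exp avoids underflow

# Toy usage on a synthetic two-band "image".
rng = np.random.default_rng(1)
image = rng.normal(size=(50, 40, 2))
Y = sample_pixels(image, 500, rng)
means = np.array([[0.0, 0.0], [1.0, 1.0]])
covs = np.array([np.eye(2), np.eye(2)])
priors = np.array([0.5, 0.5])
ll = mixture_loglik(Y, means, covs, priors)
```

Working with a random subset of N pixels keeps each likelihood evaluation cheap, and the log-sum-exp form keeps it numerically stable when some class densities are tiny.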
Experiment 1: Cloud detection in optical images • 15 different QuickBird and WorldView-2 images covering 7 different scenes in Norway • Features • Band 2 (green) • Band 3 (red) • Classes • clouds, cloud shadows, vegetation, concrete/asphalt/etc., haze and water • Resolution down-sampled to 19.2 m (16.0 m) • 4 different training (sub)images
Experiment 1: Cloud detection in optical images • Model: d_i is the eigenvector corresponding to the largest eigenvalue ν_i of the matrix • (Figure: eigenvector and test average)
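The matrix whose leading eigenvector defines d_i is not recoverable from the slide. The sketch below shows only the generic operation of taking the eigenvector belonging to the largest eigenvalue of a symmetric matrix. As a purely hypothetical input, it uses the scatter of one class's mean vectors across the training images around their average. `dominant_eigvec` and `class_mean_scatter` are illustrative names.

```python
import numpy as np

def dominant_eigvec(M):
    """Return (eigenvector, eigenvalue) for the largest eigenvalue of symmetric M."""
    vals, vecs = np.linalg.eigh(M)   # eigh returns eigenvalues in ascending order
    return vecs[:, -1], vals[-1]

def class_mean_scatter(class_means):
    """Hypothetical choice of matrix: scatter of one class's mean vectors
    (one row per training image) around their average."""
    centred = class_means - class_means.mean(axis=0)
    return centred.T @ centred / class_means.shape[0]

# Class-i means from 4 training images in 2 bands, varying along the direction (1, 2).
mu = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])
d, nu = dominant_eigvec(class_mean_scatter(mu))
```

Here the training means vary only along (1, 2), so d is ±(1, 2)/√5. A single direction of this kind is the sort of low-rank basis that lets one scalar coefficient shift a class mean.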
Experiment 1: Cloud detection in optical images • Parameter estimation: at iteration l+1: where
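The slide's update equations were lost in extraction. As a hedged reference only, here is the standard EM iteration for a Gaussian mixture whose class means are linear in the unknowns. It assumes m_i = G_i θ with θ = [a; b] and G_i = [F_i  I_D] (a hypothetical form), with the covariances Σ_i held fixed at their training values. The first line is the E-step (responsibilities); the second is the M-step, a generalized least-squares solution plus the usual prior update.

```latex
\gamma_{ni}^{(l)} = \frac{\pi_i^{(l)}\,\mathcal{N}\!\left(y_n;\,G_i\theta^{(l)},\,\Sigma_i\right)}
                         {\sum_{j=1}^{C}\pi_j^{(l)}\,\mathcal{N}\!\left(y_n;\,G_j\theta^{(l)},\,\Sigma_j\right)},
\qquad N_i^{(l)} = \sum_{n=1}^{N}\gamma_{ni}^{(l)}

\theta^{(l+1)} = \Bigl(\sum_{i=1}^{C} N_i^{(l)}\,G_i^{\mathsf T}\Sigma_i^{-1}G_i\Bigr)^{-1}
                 \sum_{i=1}^{C} G_i^{\mathsf T}\Sigma_i^{-1}\sum_{n=1}^{N}\gamma_{ni}^{(l)}\,y_n,
\qquad \pi_i^{(l+1)} = \frac{N_i^{(l)}}{N}
```

The M-step follows from setting the gradient of the expected complete-data log-likelihood, -½ Σ_i Σ_n γ_ni (y_n - G_i θ)ᵀ Σ_i⁻¹ (y_n - G_i θ), to zero. It requires the bracketed matrix to be invertible, and like any EM iteration it converges only to a local maximum.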
Results: Cloud detection in optical images • (Figure panels: without retraining; with retraining)
Experiment 2: Tree cover mapping of tropical forest • 13 different Landsat TM images covering an area near Amani, Tanzania (path/row 166/063) • Features • Bands 1-5 and 7 • Classes • Forest, sparse forest, grass and soil • Two training images (1986-10-06 and 2010-02-10)
Experiment 2: Tree cover mapping of tropical forest • Model: a is constrained to contain only non-negative elements • The solution is found using non-negative least squares combined with iterative maximum-likelihood estimation
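One way to combine non-negative least squares with iterative maximum-likelihood estimation is to solve each EM M-step as an NNLS problem. The sketch below assumes a hypothetical model m_i = F_i a with a ≥ 0, fixed class covariances Σ_i, and responsibilities γ_ni supplied by the E-step. It whitens each class's residual with a Cholesky factor of Σ_i⁻¹ and stacks the classes into a single system for `scipy.optimize.nnls`. The helper name `nnls_mstep` is illustrative and not the author's code.

```python
import numpy as np
from scipy.optimize import nnls

def nnls_mstep(Y, resp, F, covs):
    """Constrained M-step for the hypothetical model m_i = F_i @ a, a >= 0.

    Y    : (N, D) sampled test pixels
    resp : (N, C) E-step responsibilities gamma_ni
    F    : (C, D, L) known basis matrices (assumed)
    covs : (C, D, D) class covariances, held fixed
    Minimises sum_i N_i (ybar_i - F_i a)^T Sigma_i^{-1} (ybar_i - F_i a) over a >= 0,
    which equals the weighted sum over pixels up to a constant.
    """
    rows, rhs = [], []
    for i in range(F.shape[0]):
        Ni = resp[:, i].sum()
        if Ni <= 1e-12:                      # skip classes with no support
            continue
        ybar = resp[:, i] @ Y / Ni           # responsibility-weighted class mean
        Lc = np.linalg.cholesky(np.linalg.inv(covs[i]))  # Sigma_i^{-1} = Lc @ Lc.T
        W = np.sqrt(Ni) * Lc.T               # ||W r||^2 = Ni * r^T Sigma_i^{-1} r
        rows.append(W @ F[i])
        rhs.append(W @ ybar)
    a, _ = nnls(np.vstack(rows), np.concatenate(rhs))
    return a

# Toy usage: noise-free pixels generated from a known non-negative a.
F = np.stack([np.eye(2), 2 * np.eye(2)])    # C=2 classes, D=2 bands, L=2
covs = np.stack([np.eye(2), np.eye(2)])
a_true = np.array([0.5, 1.0])
means = np.einsum("cdl,l->cd", F, a_true)
Y = np.repeat(means, 3, axis=0)              # 3 pixels per class
resp = np.repeat(np.eye(2), 3, axis=0)       # hard responsibilities
a_hat = nnls_mstep(Y, resp, F, covs)         # recovers a_true
```

Alternating this step with the usual E-step gives a constrained EM loop. If the unconstrained solution has a negative coefficient, NNLS clips it to zero instead of allowing an implausible shift.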
Experiment 2: Tree cover mapping of tropical forest • Parameter estimation: at iteration l+1: where
Results: Tree cover mapping of tropical forest • (Figure panels: July 2009; February 2010; without retraining; with retraining)
Summary and conclusion • Proposed a simple method for handling the dataset shift between training and test data • Cloud detection: evaluated successfully on many different QuickBird and WorldView-2 images • Haze versus clouds • Confuses snow and clouds • Guidelines on how to select the low-rank modeling functions are needed • The EM algorithm and the local-minima problem • More testing and evaluation of the method is necessary