
Efficient Direct Density Ratio Estimation for Non-stationarity Adaptation and Outlier Detection

This paper presents a method for efficiently estimating density ratios in order to adapt to non-stationary environments and detect outliers. The proposed approach estimates the importance (the density ratio) directly, without going through density estimation, which improves estimation accuracy and computational efficiency. Experimental results demonstrate the effectiveness of the method in applications such as covariate shift adaptation and outlier detection.



Presentation Transcript


  1. Efficient Direct Density Ratio Estimation for Non-stationarity Adaptation and Outlier Detection. Takafumi Kanamori, Shohei Hido. NIPS 2008

  2. Outline • Motivation • Importance Estimation • Direct Importance Estimation • Approximation Algorithm • Experiments • Conclusions

  3. Motivation • Importance Sampling • Covariate Shift • Outlier Detection

  4. Importance Sampling • Rather than sampling from the distribution p, importance sampling draws samples from an appropriately chosen distribution q to reduce the variance of the estimator Ê[f(X)]; samples from q can be more "important" for estimating the integral, hence the name importance sampling. • Other motivations include difficulty in drawing samples from p, or efficiency considerations. [2] R. Srinivasan, Importance Sampling: Applications in Communications and Detection, Springer-Verlag, Berlin, 2002. [3] P. J. Smith, M. Shafi, and H. Gao, "Quick simulation: A review of importance sampling techniques in communication systems," IEEE J. Select. Areas Commun., vol. 15, pp. 597-613, May 1997.
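The variance-reduction idea on this slide can be sketched numerically. The following is a minimal toy example (distributions and names chosen here for illustration, not from the slides): E_p[f(X)] for p = N(0, 1) is estimated from samples drawn from a wider proposal q = N(0, 2²), reweighted by the density ratio p/q.

```python
import numpy as np

# Importance sampling sketch: estimate E_p[f(X)] for p = N(0, 1)
# using samples from the proposal q = N(0, 2^2), reweighted by w = p/q.
rng = np.random.default_rng(0)

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

f = lambda x: x ** 2                       # E_p[X^2] = 1 for p = N(0, 1)
x = rng.normal(0.0, 2.0, size=200_000)     # samples drawn from q, not p
w = normal_pdf(x, 0.0, 1.0) / normal_pdf(x, 0.0, 2.0)  # importance weights
estimate = np.mean(w * f(x))               # unbiased estimate of E_p[f(X)]
```

The same estimator with q = p reduces to ordinary Monte Carlo; a well-chosen q concentrates samples where w(x) f(x) matters most.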

  5. Covariate Shift The distributions of the training and test inputs differ, while the conditional distribution of the output given the input is unchanged. Standard learning techniques such as MLE or CV are then biased. This is compensated for by weighting the training samples according to the importance w(x) = p_test(x) / p_train(x). [4] Jiayuan Huang, Alexander J. Smola, Arthur Gretton, et al. Correcting Sample Selection Bias by Unlabeled Data, NIPS 2006.
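The reweighting idea can be sketched with a toy importance-weighted least-squares fit. In this hypothetical setup the true importance weights are assumed known (in practice they must be estimated, which is exactly what the rest of the talk addresses):

```python
import numpy as np

# Covariate shift sketch: training inputs come from p_train = N(0, 1),
# test inputs from p_test = N(1, 1). Training samples are reweighted by
# the (here known) importance w(x) = p_test(x) / p_train(x) before fitting.
rng = np.random.default_rng(1)

x_tr = rng.normal(0.0, 1.0, size=500)                  # inputs ~ p_train
y_tr = np.sin(x_tr) + 0.1 * rng.normal(size=500)       # noisy outputs

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

w = normal_pdf(x_tr, 1.0, 1.0) / normal_pdf(x_tr, 0.0, 1.0)  # importance

# Importance-weighted linear fit y ~ a*x + b via weighted normal equations.
X = np.stack([x_tr, np.ones_like(x_tr)], axis=1)
W = np.diag(w)
a, b = np.linalg.solve(X.T @ W @ X, X.T @ W @ y_tr)
```

The weighted fit approximates the best linear predictor under the test distribution rather than under the training distribution.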

  6. Outlier Detection • The importance of regular samples is close to one, while the importance of outliers tends to deviate significantly from one. • The value of the importance can therefore be used as an index of the degree of outlyingness.

  7. Related Work • Kernel Density Estimation • Kernel Mean Matching: with Φ a map into the feature space, the expectation operator is μ(Pr) := E_{x~Pr(x)}[Φ(x)]. [4] Jiayuan Huang, Alexander J. Smola, Arthur Gretton, et al. Correcting Sample Selection Bias by Unlabeled Data, NIPS 2006.

  8. Direct Importance Estimation

  9. Least-squares Approach • Model w(x) with a linear model: ŵ(x) = Σ_{l=1}^{b} α_l φ_l(x), where the φ_l are basis functions. • Determine the parameters α so that the squared error J(α) = ½ ∫ (ŵ(x) − w(x))² p_train(x) dx is minimized.
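The squared-error objective this slide refers to can be expanded into a quadratic form (a standard reconstruction from the least-squares importance fitting formulation; the notation here is chosen for this transcript):

```latex
J(\alpha)
  = \frac{1}{2}\int \Bigl(\sum_{l=1}^{b}\alpha_l\,\varphi_l(x) - w(x)\Bigr)^{2} p_{\mathrm{tr}}(x)\,dx
  = \frac{1}{2}\,\alpha^{\top} H \alpha - h^{\top}\alpha + C,
\qquad
H_{l l'} = \int \varphi_l(x)\,\varphi_{l'}(x)\,p_{\mathrm{tr}}(x)\,dx,
\quad
h_l = \int \varphi_l(x)\,p_{\mathrm{te}}(x)\,dx,
```

where C does not depend on α, and the identity w(x) p_tr(x) = p_te(x) turns the cross term into an expectation over the test distribution. This is what makes the criterion usable: both H and h can be estimated from samples without knowing w.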

  10. Least-Squares Importance Fitting (LSIF) • The expectations in the objective are replaced by their empirical estimates. • A regularization term is added to avoid over-fitting.

  11. Model Selection for LSIF • Model: the regularization parameter λ and the basis functions φ. • Model selection: cross-validation.

  12. Heuristics for Basis Function Design • Gaussian kernels centered at the test samples
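The heuristic on this slide can be sketched as follows. The function name, bandwidth value, and subset size below are illustrative choices, not from the slides:

```python
import numpy as np

# Basis-design heuristic sketch: Gaussian kernels centered at a random
# subset of the test samples, evaluated as a design matrix Phi.
def gaussian_design_matrix(x, centers, sigma):
    """Phi[i, l] = exp(-||x_i - c_l||^2 / (2 sigma^2))."""
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(2)
x_te = rng.normal(1.0, 1.0, size=(300, 2))                # test samples
centers = x_te[rng.choice(300, size=50, replace=False)]   # kernel centres
Phi = gaussian_design_matrix(x_te, centers, sigma=1.0)    # (300, 50)
```

Centering the kernels at test samples places the basis functions where the importance needs to be accurate, i.e. in regions of high test density.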

  13. Unconstrained Least-squares Approach (uLSIF) • Ignore the non-negativity constraints, so the solution is obtained analytically. • The learned parameters could then be negative. • To compensate for the approximation error, the solution is modified by rounding negative parameters up to zero.
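The uLSIF computation can be sketched end to end on toy data. This is a minimal illustration assuming Gaussian basis functions centered at test samples and a fixed regularization parameter; variable names and data are hypothetical:

```python
import numpy as np

# uLSIF sketch: drop the non-negativity constraints, solve the regularized
# linear system in closed form, then round negative coefficients up to zero.
rng = np.random.default_rng(3)

def phi(x, centers, sigma=0.5):
    d2 = (x[:, None] - centers[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

x_tr = rng.normal(0.0, 1.0, size=400)    # samples from p_train
x_te = rng.normal(0.5, 1.0, size=400)    # samples from p_test
centers = x_te[:100]                     # Gaussian centres at test points

Phi_tr = phi(x_tr, centers)              # (n_tr, b) design on training set
Phi_te = phi(x_te, centers)              # (n_te, b) design on test set

H = Phi_tr.T @ Phi_tr / len(x_tr)        # empirical H (train second moments)
h = Phi_te.mean(axis=0)                  # empirical h (test means)
lam = 0.1                                # regularization parameter lambda
alpha = np.linalg.solve(H + lam * np.eye(len(centers)), h)  # closed form
alpha = np.maximum(alpha, 0.0)           # compensate: clip negatives to zero

w_hat = Phi_tr @ alpha                   # estimated importance on train set
```

The whole fit is a single regularized linear solve, which is the source of uLSIF's computational efficiency compared to the constrained LSIF quadratic program.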

  14. Efficient Computation of LOOCV • The i-th sample pair is held out, the parameters are learned without it, and the LOOCV score is computed on the held-out sample. • According to the Sherman-Morrison-Woodbury formula, the matrix inverse needs to be computed only once.
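The matrix identity behind this slide can be checked numerically. Removing one sample from a Gram-type matrix is a rank-one downdate, so each leave-one-out inverse follows from the full inverse via the Sherman-Morrison formula (the matrices below are a hypothetical stand-in for the regularized H of the method):

```python
import numpy as np

# Sherman-Morrison check: for invertible A and a rank-one downdate u u^T,
# (A - u u^T)^{-1} = A^{-1} + (A^{-1} u u^T A^{-1}) / (1 - u^T A^{-1} u),
# so A^{-1} only has to be computed once for all leave-one-out folds.
rng = np.random.default_rng(4)
b = 5
X = rng.normal(size=(50, b))
A = X.T @ X / 50 + 0.1 * np.eye(b)       # regularized Gram-type matrix
u = X[0] / np.sqrt(50)                   # contribution of held-out sample

A_inv = np.linalg.inv(A)                 # computed once
denom = 1.0 - u @ A_inv @ u
downdated = A_inv + np.outer(A_inv @ u, u @ A_inv) / denom

direct = np.linalg.inv(A - np.outer(u, u))   # recomputed from scratch
```

The downdate costs O(b²) per fold instead of O(b³), which is what makes exact LOOCV affordable here.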

  15. Experiments: Importance Estimation • p_train is the d-dimensional normal distribution with mean zero and identity covariance. • p_test is the d-dimensional normal distribution with mean (1, 0, …, 0)ᵀ and identity covariance. • Performance is measured by the normalized mean squared error.

  16. Covariate Shift Adaptation in Classification and Regression • Given the training samples, the test samples, and the outputs of the training samples. • The task is to predict the outputs for the test samples.

  17. Experimental Description • Divide the training samples into R disjoint subsets. • The function is learned by importance-weighted regularized least squares (IWRLS) and its mean test error over the remaining samples is computed.

  18. Covariate shift adaptation

  19. Experiment: Outlier Detection

  20. Conclusions • Applications • Covariate shift adaptation • Outlier detection • Feature selection • Conditional distribution estimation • ICA • …

  21. References • [1] T. Kanamori, S. Hido. Efficient direct density ratio estimation for non-stationarity adaptation and outlier detection. NIPS 2008. • [2] R. Srinivasan. Importance Sampling: Applications in Communications and Detection. Springer-Verlag, Berlin, 2002. • [3] P. J. Smith, M. Shafi, and H. Gao. "Quick simulation: A review of importance sampling techniques in communication systems." IEEE J. Select. Areas Commun., vol. 15, pp. 597-613, May 1997. • [4] J. Huang, A. J. Smola, A. Gretton, et al. Correcting Sample Selection Bias by Unlabeled Data. NIPS 2006. • [5] J. Jiang. A Literature Survey on Domain Adaptation of Statistical Classifiers.
