Study of Sparse Online Gaussian Process for Regression

Presentation Transcript


  1. Study of Sparse Online Gaussian Process for Regression EE645 Final Project, May 2005, Eric Saint Georges

  2. Contents • Introduction • OGP • Definition of Gaussian Process • Sparse Online GP algorithm (OGP) • Simulation Results • Comparison with LS SVM on Boston Housing data set (Batch) • Time Series Prediction using OGP • Optical Beam Position Optimization • Conclusion

  3. Introduction Possible application of OGP to optical free-space communication, for monitoring and optimization in a noisy environment, using the sparse OGP algorithm developed by Lehel Csató et al.

  4. Contents • Introduction • OGP • Definition of Gaussian Process • Sparse Online GP algorithm (OGP) • Simulation Results • Comparison with LS SVM on Boston Housing data set (Batch) • Time Series Prediction using OGP • Optical Beam Position Optimization • Conclusion

  5. Gaussian Process Definition Collection of indexed random variables • Mean • Covariance defined by a kernel function • Kernel: can be any positive semi-definite function • Defines the assumptions on the prior distribution • Wide scope of choices • Popular kernels are stationary functions: C(x, x') = f(x − x') • Index can be time, space, or anything else

  6. Online GP • Bayesian process: prior distribution (GP) + likelihood function → posterior distribution (using Bayes' rule)

  7. Solving a Gaussian Process: Given $n$ inputs $x_i$ and $n$ measures $t_i = y_i + e_i$, the noise $e_i$ being zero mean with variance $\sigma_e^2$. The prior distribution over the $y_i$ is given by the covariance matrix $K_{ij} = C(x_i, x_j)$; the prior distribution over the measures $t_i$ is given by $K + \sigma_e^2 I_n$. Prediction of the function $y^*$ at an input $x^*$ consists in calculating the mean and variance: $y^*(x^*) = \sum_i \alpha_i C(x_i, x^*)$ and $\sigma^2(x^*) = C(x^*, x^*) - k^T(x^*)(K + \sigma_e^2 I_n)^{-1} k(x^*)$, with $\alpha = (K + \sigma_e^2 I_n)^{-1} t$.
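
A minimal NumPy sketch of these prediction equations, assuming the RBF kernel used elsewhere in the deck; the names (rbf_kernel, gp_predict, sigma_e) are illustrative, not from the original toolbox.

```python
import numpy as np

def rbf_kernel(X1, X2, a=1.0, s=1.0):
    # C(x, x') = a^2 exp(-||x - x'||^2 / (2 s^2))
    d2 = (np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :]
          - 2.0 * X1 @ X2.T)
    return a**2 * np.exp(-d2 / (2.0 * s**2))

def gp_predict(X, t, X_star, sigma_e=0.1):
    # alpha = (K + sigma_e^2 I_n)^-1 t
    A = rbf_kernel(X, X) + sigma_e**2 * np.eye(len(X))
    alpha = np.linalg.solve(A, t)
    k_star = rbf_kernel(X, X_star)        # k(x*), one column per test point
    mean = k_star.T @ alpha               # y*(x*) = sum_i alpha_i C(x_i, x*)
    # sigma^2(x*) = C(x*, x*) - k^T(x*) (K + sigma_e^2 I_n)^-1 k(x*)
    var = (np.diag(rbf_kernel(X_star, X_star))
           - np.sum(k_star * np.linalg.solve(A, k_star), axis=0))
    return mean, var
```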

  8. Solving the Gaussian Process: Solving requires inversion of $(K + \sigma_e^2 I_n)$, an $n \times n$ matrix, $n$ being the number of training inputs. Memory scales as $n^2$ and CPU time as $n^3$.

  9. Sampling from a Gaussian Process: • Example of kernel (RBF): $C(x, x') = a^2 \exp(-\|x - x'\|^2 / (2 s^2))$ • a = amplitude • s = scale (smoothness)
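
A sketch of what the following "before training" slides show: drawing sample functions directly from the zero-mean GP prior, with the two scale values (1 and 100) from the next slides.

```python
import numpy as np

x = np.linspace(0.0, 100.0, 200)[:, None]        # 1-D input grid
for s in (1.0, 100.0):                           # small vs. large scale
    K = np.exp(-(x - x.T)**2 / (2.0 * s**2))     # amplitude a = 1
    K += 1e-8 * np.eye(len(x))                   # jitter for stability
    # each row of `draws` is one sample function from the GP prior
    draws = np.random.multivariate_normal(np.zeros(len(x)), K, size=3)
```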

  10. Sampling from a GP: before Training

  11. Sampling from a GP: before Training • Effect of scale: small scale = 1, large scale = 100

  12. Sampling from a GP: before Training • Effect of scale: small scale = 1, large scale = 100

  13. Sampling from a GP: After Training After 3 Training samples

  14. Sampling from a GP: After Training After 10 Training samples
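
A sketch of the conditioning behind these "after training" plots, reusing the algebra of slide 7; the training locations and targets below are made-up placeholders, not the slide's data.

```python
import numpy as np

def k(x1, x2, a=1.0, s=10.0):
    return a**2 * np.exp(-(x1[:, None] - x2[None, :])**2 / (2.0 * s**2))

def posterior_samples(x, t, x_star, sigma_e=0.1, n_draws=3):
    A = k(x, x) + sigma_e**2 * np.eye(len(x))    # K + sigma_e^2 I_n
    mean = k(x_star, x) @ np.linalg.solve(A, t)
    cov = (k(x_star, x_star)
           - k(x_star, x) @ np.linalg.solve(A, k(x, x_star)))
    cov += 1e-8 * np.eye(len(x_star))            # jitter for stability
    return np.random.multivariate_normal(mean, cov, size=n_draws)

# e.g. 3 training samples, as on slide 13 (placeholder data)
x_train = np.array([10.0, 40.0, 80.0])
t_train = np.sin(x_train / 20.0)
draws = posterior_samples(x_train, t_train, np.linspace(0.0, 100.0, 200))
```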

  15. Online Gaussian Process: Issues Two major issues with the GP approach: • Data set size is limited by memory and CPU • Posterior distribution is usually not Gaussian

  16. Sparse Online Gaussian Algorithm Algorithm developed by Csató et al. • Posterior distribution not usually Gaussian → Gaussian approximation • Data set size limited by memory and CPU → sparsity created using a limited number of SVs • Matlab software available on the Web
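
A heavily simplified sketch in the spirit of Csató and Opper's updates, for the Gaussian-likelihood case only. The real toolbox also performs KL-optimal basis-vector deletion when the budget is exceeded and handles non-Gaussian likelihoods via moment matching; here a low-novelty point simply triggers the reduced (non-growing) update. All names are illustrative.

```python
import numpy as np

class SparseOnlineGP:
    """Toy sparse online GP for regression (Gaussian likelihood only)."""

    def __init__(self, kernel, sigma_e=0.1, max_bv=30, tol=1e-6):
        self.k, self.sigma_e = kernel, sigma_e
        self.max_bv, self.tol = max_bv, tol
        self.BV = []                       # basis (support) vectors
        self.alpha = np.zeros(0)           # posterior mean coefficients
        self.C = np.zeros((0, 0))          # posterior covariance coefficients
        self.Kinv = np.zeros((0, 0))       # inverse Gram matrix over BV set

    def update(self, x, t):
        k_x = np.array([self.k(b, x) for b in self.BV])
        k_xx = self.k(x, x)
        mean = k_x @ self.alpha
        var = k_xx + k_x @ self.C @ k_x
        q = (t - mean) / (var + self.sigma_e**2)   # mean-update coefficient
        r = -1.0 / (var + self.sigma_e**2)         # variance-update coefficient
        e = self.Kinv @ k_x                        # projection onto BV span
        gamma = k_xx - k_x @ e                     # novelty of the new input
        if gamma > self.tol and len(self.BV) < self.max_bv:
            # full update: admit x as a new basis vector
            self.BV.append(x)
            s = np.append(self.C @ k_x, 1.0)
            self.alpha = np.append(self.alpha, 0.0) + q * s
            self.C = np.pad(self.C, ((0, 1), (0, 1))) + r * np.outer(s, s)
            e_ext = np.append(e, -1.0)             # rank-one update of Kinv
            self.Kinv = (np.pad(self.Kinv, ((0, 1), (0, 1)))
                         + np.outer(e_ext, e_ext) / gamma)
        else:
            # reduced update: absorb x without growing the BV set
            s = self.C @ k_x + e
            self.alpha = self.alpha + q * s
            self.C = self.C + r * np.outer(s, s)

# e.g. streaming noisy sin(x) observations with at most 20 basis vectors
rbf = lambda u, v: np.exp(-0.5 * (u - v)**2)
gp = SparseOnlineGP(rbf, sigma_e=0.1, max_bv=20)
for xi in np.random.rand(300) * 10.0:
    gp.update(xi, np.sin(xi) + 0.1 * np.random.randn())
```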

  17. Sparse Online Gaussian Algorithm SOGP process defined by: • Kernel parameters: (m + 2) vector for the RBF kernel • Support vectors: d × 1 vector (indexes) • GP parameters: • α: d × 1 vector • K: d × d matrix • m = dimension of input space • d = number of support vectors

  18. Contents • Introduction • OGP • Definition of Gaussian Process • Sparse Online GP algorithm (SOGP) • Simulation Results • Comparison with LS SVM on Boston Housing data set (Batch) • Time Series Prediction using OGP • Optical Beam Position Optimization • Conclusion

  19. LS SVM on Boston Housing Data Set RBF kernel, C = 10, s = 4. 304 training samples, averaged over 10 random draws. Average CPU time = 3 sec/run.
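
For reference, LS-SVM regression reduces to a single linear solve. A sketch with the slide's settings (RBF kernel, C = 10, s = 4); the exact RBF width parameterization of the original run is unknown, so the form below is an assumption.

```python
import numpy as np

def lssvm_fit(X, y, C=10.0, s=4.0):
    # RBF Gram matrix (width parameterization 2 s^2 assumed)
    d2 = (np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :]
          - 2.0 * X @ X.T)
    K = np.exp(-d2 / (2.0 * s**2))
    n = len(X)
    # LS-SVM dual system: [[0, 1^T], [1, K + I/C]] [b; alpha] = [0; y]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / C
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    b, alpha = sol[0], sol[1:]             # prediction: sum_i alpha_i K(x_i, x) + b
    return b, alpha
```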

  20. OGP on Boston Housing Data Set • Kernel: RBF • Initial hyper-parameters: one per input dimension (i = 1 to 13 for BH) • Number of hyper-parameter optimization iterations: tried between 3 and 6 • Max number of support vectors: variable
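
Since the hyper-parameters are indexed i = 1 to 13, the kernel presumably carries one length scale per Boston Housing input dimension; a sketch of such an ARD-style RBF kernel (the slide's actual formula and initial values were lost in extraction, so the form and defaults here are assumptions).

```python
import numpy as np

def ard_rbf(X1, X2, a=1.0, s=np.ones(13)):
    # one length scale s[i] per input dimension (13 for Boston Housing)
    D = (X1[:, None, :] - X2[None, :, :]) / s
    return a**2 * np.exp(-0.5 * np.sum(D**2, axis=2))
```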

  21. OGP on Boston Housing Data Set 6 Iterations, MaxBV between 10 and 250

  22. OGP on Boston Housing Data Set 3 Iterations, MaxBV between 10 and 150

  23. OGP on Boston Housing Data Set 4 Iterations, MaxBV between 10 and 150

  24. OGP on Boston Housing Data Set 6 Iterations, MaxBV between 10 and 150

  25. OGP on Boston Housing Data Set CPU time behaves as $a(b + \mathrm{SVs}^2)/\mathrm{SVs}$ as a function of the number of SVs
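
A sketch of how such a model can be fit to measured run times with SciPy; the timing values below are hypothetical placeholders, not the slide's data.

```python
import numpy as np
from scipy.optimize import curve_fit

def cpu_model(svs, a, b):
    # empirical model from the slide: a * (b + SVs^2) / SVs
    return a * (b + svs**2) / svs

svs = np.array([10.0, 30.0, 60.0, 100.0, 150.0, 250.0])   # placeholder
secs = np.array([40.0, 35.0, 45.0, 70.0, 100.0, 170.0])   # placeholder
(a, b), _ = curve_fit(cpu_model, svs, secs, p0=(0.5, 100.0))
```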

  26. OGP on Boston Housing Data Set Run with 4 Iterations, MaxBV between 10 and 60

  27. OGP on Boston Housing Data Set Final Run with 4 Iterations, MaxBV 30 and 40 Average over 50 random draws

  28. OGP on Boston Housing Data Set Conclusion MSE not as good as LS SVM (6.9 versus 6.5), but standard deviation better than LS SVM (1.1 versus 1.6). CPU time much longer (90 sec versus 3 sec per run), but it increases more slowly with the number of samples than LS SVM, so OGP might do better on large data sets.

  29. OGP on TSP TSP = Time Series Prediction
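
To apply OGP to time-series prediction, the series must first be turned into (input, target) pairs. A sketch using delay embedding; the embedding order m = 5 and the prediction horizon are assumptions, since the slides do not state them.

```python
import numpy as np

def embed(series, m=5, horizon=1):
    # inputs: m consecutive values; target: the value `horizon` steps ahead
    X = np.array([series[i:i + m]
                  for i in range(len(series) - m - horizon + 1)])
    t = series[m + horizon - 1:]
    return X, t
```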

  30. OGP on TSP: Initial Runs

  31. OGP on TSP: Initial Runs

  32. OGP on TSP: Initial Runs

  33. OGP on TSP: Local Minimum For both runs, initial kpar = 1e-2. One run: final kpar = 1.42e-2, MSE = 1132. Other run: final kpar = 2.45e-3, MSE = 95.

  34. OGP on TSP: Impact of Over-fitting

  35. OGP on TSP: Impact of Number of Samples on Prediction

  36. OGP on TSP: Impact of Number of Samples on Prediction cpu=6sec

  37. OGP on TSP: Impact of Number of Samples on Prediction cpu=16 sec

  38. OGP on TSP: Impact of Number of Samples on Prediction cpu=27sec

  39. OGP on TSP: Impact of Number of Samples on Prediction cpu=45sec

  40. OGP on TSP: Impact of Number of Samples on Prediction cpu=109sec

  41. OGP on TSP: Impact of Number of Samples on Prediction cpu=233sec

  42. OGP on TSP: Impact of Number SVs on Prediction

  43. OGP on TSP: Impact of Number SVs on Prediction cpu=19sec

  44. OGP on TSP: Impact of Number SVs on Prediction cpu=16sec

  45. OGP on TSP: Impact of Number SVs on Prediction cpu=16sec

  46. OGP on TSP: Final Runs

  47. OGP on TSP: Final Runs Running 200 samples at a time, with a 30-sample overlap (see the sketch below)
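
A sketch of this windowed scheme; train_window is a hypothetical stand-in for one OGP training run on a chunk of the series.

```python
def windowed_runs(X, t, train_window, window=200, overlap=30):
    # run on 200-sample windows, each re-using the previous run's last 30
    results, start = [], 0
    while start + window <= len(X):
        results.append(train_window(X[start:start + window],
                                    t[start:start + window]))
        start += window - overlap        # slide forward by 170 samples
    return results
```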

  48. OGP on TSP: Why an overlap?

  49. OGP on TSP: Final Runs Does not always behave!...

  50. OGP on TSP: Conclusion • Difficult to find the right set of parameters • Initial Kernel Parameter • Number of Support Vectors • Number of Training Samples per run
