

  1. Quasi-random resampling O. Teytaud *, S. Gelly *, S. Lallich **, E. Prudhomme ** *Equipe I&A-TAO, LRI, Université Paris-Sud, Inria, UMR-Cnrs 8623 **Equipe ERIC, Université Lyon 2 Email : teytaud@lri.fr, gelly@lri.fr, stephane.lallich@eric.univ-lyon2.fr, eprudhomme@eric.univ-lyon2.fr

  2. What is the problem ? Many tasks in AI are based on random resamplings : • cross-validation • bagging • bootstrap • ... Resampling is time-consuming : • cross-validation for choosing hyper-parameters • bagging on huge datasets ==> we want to obtain with n resamplings the same result as with N >> n resamplings

  3. A typical example You want to learn a relation x --> y on a huge dataset. The dataset is too large for your favorite learner. A traditional solution is subagging : average 100 learnings performed on random subsamples (1/20th of the dataset each). We propose : use QR-sampling to average only 40 learnings.

  4. Organization of the talk (1) why resampling is Monte-Carlo integration (2) quasi-random numbers (3) quasi-random numbers in strange spaces (4) applying quasi-random numbers in resampling (5) when does it work and when doesn't it work ?

  5. Why resampling is Monte-Carlo integration What is Monte-Carlo integration : E f(x) ≈ (1/n) sum_i f(x(i)) What is cross-validation : Error-rate ≈ E f(x) ≈ (1/n) sum_i f(x(i)) where f(x) = error rate obtained with the partitioning x
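To make the correspondence concrete, here is a minimal sketch (not from the slides; the toy data, the trivial threshold classifier and the 80/20 split are assumptions for illustration) of estimating a test error rate as a Monte-Carlo average over random partitions :

```python
# Error-rate ~ (1/n) sum_i f(x(i)), where each x(i) is a random train/test
# partition and f(x(i)) is the test error measured on that partition.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(size=200)                      # toy 1-D inputs (hypothetical data)
y = X + 0.1 * rng.normal(size=200) > 0.5       # noisy threshold labels

def f(test_mask):
    """Test error of a trivial classifier (threshold between the class means)."""
    train = ~test_mask
    threshold = 0.5 * (X[train][y[train]].mean() + X[train][~y[train]].mean())
    return np.mean((X[test_mask] > threshold) != y[test_mask])

n_resamplings = 100
estimates = [f(rng.uniform(size=len(X)) < 0.2) for _ in range(n_resamplings)]
print("Monte-Carlo estimate of the error rate :", np.mean(estimates))
```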

  6. An introduction to QR-numbers (1) why resampling is Monte-Carlo integration (2) quasi-random numbers (3) quasi-random numbers in strange spaces (4) applying quasi-random numbers in resampling (5) when does it work and when doesn't it work ?

  7. QR-numbers (2) quasi-random numbers (less randomized numbers) We have seen that resampling is Monte-Carlo integration; now we will see how Monte-Carlo integration has been strongly improved.

  8. Quasi-random numbers ? Random samples in [0,1]^d can be not-so-well distributed --> error in Monte-Carlo integration O(1/sqrt(n)) with n the number of points Pseudo-random samples ≈ random samples (we try to be very close to pure random) Quasi-random samples : error O(1/n) within logarithmic factors --> we don't try to be as close as possible to random --> number of samples much smaller for a given precision

  9. Discrepancy = Max |Area – Frequency | Quasi-random = low discrepancy ?
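In symbols, the usual (star) discrepancy behind the slide's "Max |Area – Frequency|" phrasing, taking "Area" over boxes anchored at the origin, reads :

```latex
D_n^{*}(x_1,\dots,x_n)
  \;=\; \sup_{a \in [0,1]^d}
  \left|\, \prod_{j=1}^{d} a_j \;-\; \frac{1}{n}\,\#\{\, i : x_i \in [0,a) \,\} \,\right|
```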

  10. Discrepancy_2 = mean( |Area – Frequency|^2 ) A better discrepancy ?
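As a quick numerical check, a sketch (assuming scipy >= 1.7, which ships scipy.stats.qmc; this is not the authors' code) comparing the L2-star discrepancy of a pseudo-random and a quasi-random point set :

```python
import numpy as np
from scipy.stats import qmc

n, d = 256, 2
pseudo = np.random.default_rng(0).uniform(size=(n, d))
quasi = qmc.Halton(d=d, scramble=True, seed=0).random(n)

print("pseudo-random :", qmc.discrepancy(pseudo, method='L2-star'))
print("quasi-random  :", qmc.discrepancy(quasi, method='L2-star'))
# The quasi-random set typically has a markedly smaller discrepancy.
```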

  11. Random --> Discrepancy ~ sqrt( 1/n ) Quasi-random --> Discrepancy ~ log(n)^d / n Koksma & Hlawka : error in Monte-Carlo integration ≤ Discrepancy x V, V = total variation (Hardy & Krause) ( many generalizations in Hickernell, A Generalized Discrepancy and Quadrature Error Bound, 1997 ) Existing bounds on low-discrepancy Monte-Carlo
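Written out, the Koksma-Hlawka inequality bounds the integration error by the product of the Hardy-Krause variation of f and the star discrepancy of the point set :

```latex
\left| \frac{1}{n}\sum_{i=1}^{n} f(x_i) \;-\; \int_{[0,1]^d} f(u)\,\mathrm{d}u \right|
  \;\le\; V_{HK}(f)\; D_n^{*}(x_1,\dots,x_n)
```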

  12. Which set do you trust ?

  13. Which quasi-random numbers ? « Halton-sequence with a simple scrambling scheme » • fast (as fast as pseudo-random numbers) ; • easy to implement ; • available freely if you don't want to implement it. (we will not detail how this sequence is built here) (also: Sobol sequence)
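For instance, one freely available implementation (scipy's scrambled Halton, used here as an assumption rather than the authors' own generator) already illustrates the point of slide 8: at equal n, the quasi-random estimate of a toy integral with exact value 1 is usually far more accurate than the pseudo-random one.

```python
import numpy as np
from scipy.stats import qmc

def f(u):                                       # toy integrand, exact integral = 1
    return np.prod(1.0 + 0.5 * (u - 0.5), axis=1)

n, d = 1024, 4
mc_points = np.random.default_rng(0).uniform(size=(n, d))
qmc_points = qmc.Halton(d=d, scramble=True, seed=0).random(n)

print("MC  error :", abs(f(mc_points).mean() - 1.0))
print("QMC error :", abs(f(qmc_points).mean() - 1.0))   # usually much smaller at equal n
```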

  14. What else than Monte-Carlo integration ? Thanks to various forms of quasi-random : • Numerical integration [thousands of papers; Niederreiter 92] • Learning [Cervellera et al, IEEE TNN 2004; Mary, PhD thesis 2005] • Optimization [Teytaud et al, EA'2005] • Modeling of random processes [Growe-Kruska et al, BPTP'03, Levy's method] • Path planning [Tuffin]

  15. ... and how to do in strange spaces ? (1) why resampling is Monte-Carlo integration (2) quasi-random numbers (3) quasi-random numbers in strange spaces (4) applying quasi-random numbers in resampling (5) when does it work and when doesn't it work ?

  16. Have fun with QR in strange spaces (3) quasi-random numbers in strange spaces We have seen that resampling is Monte-Carlo integration, and how Monte-Carlo is replaced by Quasi-Random Monte-Carlo. But resampling is random in a non-standard space. We will see how to do Quasi-Random Monte-Carlo in non-standard spaces.

  17. Quasi-random numbers in strange spaces We have seen hypercubes :

  18. ... but we need something else ! Sample of points ---> QR sample of points Sample of samples ---> QR sample of samples

  19. Quasi-random points in strange spaces Fortunately, some QR-points exist also in various spaces.

  20. Why not in something isotropic ? How to do it on the sphere ? Or for Gaussian distributions ?

  21. For the Gaussian : easy ! Generate x in [0,1]^d by quasi-random Build y : P( N < y(i) ) = x(i), i.e. y(i) = Φ^-1( x(i) ), the Gaussian quantile It works because the distribution is the product of the distributions of the y(i) What about the general case ?
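A minimal sketch of this transform (assuming scipy; norm.ppf is the Gaussian quantile Φ^-1, applied coordinate-wise to quasi-random points) :

```python
import numpy as np
from scipy.stats import norm, qmc

d = 2
x = qmc.Halton(d=d, scramble=True, seed=0).random(256)   # QR points in [0,1]^d
y = norm.ppf(x)                                          # y(i) = Phi^-1(x(i)), coordinate-wise
print(y.mean(axis=0), y.std(axis=0))                     # close to 0 and 1 in each coordinate
```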

  22. Ok ! - generate x in [0,1]^d - define y(i) such that P( t < y(i) | y(1), y(2), ..., y(i-1) ) = x(i), i.e. y(i) is the x(i)-quantile of the conditional distribution of the i-th coordinate given the previous ones Ok !
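A sketch of this conditional-quantile construction on a toy dependent distribution (the uniform law on the triangle 0 ≤ y1 ≤ y2 ≤ 1; the example and its closed-form quantiles are illustrations, not taken from the paper) :

```python
import numpy as np
from scipy.stats import qmc

x = qmc.Halton(d=2, scramble=True, seed=0).random(512)   # QR points in [0,1]^2

# Marginal of y1 has density 2(1 - y1) on [0,1]: F1(t) = 1 - (1 - t)^2,
# so y1 = F1^{-1}(x1) = 1 - sqrt(1 - x1).
y1 = 1.0 - np.sqrt(1.0 - x[:, 0])
# Given y1, y2 is uniform on [y1, 1]: y2 = y1 + x2 * (1 - y1).
y2 = y1 + x[:, 1] * (1.0 - y1)

assert np.all(y1 <= y2)          # every point lies in the triangle
print(y1.mean(), y2.mean())      # roughly 1/3 and 2/3, the triangle's centroid
```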

  23. However, we will do that • We do not have anything better than this general method for the strange distributions in which we are interested • At least we can prove the O(1/n) property (see the paper) • Perhaps there is something much better • Perhaps there is something much simpler

  24. The QR-numbers in resampling (4) applying quasi-random numbers in resampling We have seen that resampling is Monte-Carlo integration, and that we are able to generate quasi-random points for any distribution on continuous domains ==> it should work ==> let's see in detail how to move the problem to the continuous domain

  25. QR-numbers in resampling A very particular distribution for QR-points : bootstrap samples. How to move the problem to continuous spaces ? y(i) = x(r(i)) where r(i) = uniformly distributed on [[1,n]] ==> this is discrete

  26. QR-numbers in resampling A very particular distribution for QR-points : bootstrap samples. How to move the problem to continuous spaces ? y(i) = x(r(i)) where r(i) = uniformly distributed on [[1,n]] --> many solutions exist [diagram: we know how to map the rectangular uniform distribution onto any continuous distribution; we need to map a continuous distribution onto our discrete distribution]

  27. What are bootstrap samples ? Our technique works for various forms of resampling : - subsamples without replacement (random-CV, subagging) - subsamples with replacement (bagging, bootstrap) - random partitioning (k-CV). W.l.o.g., we present here the sampling of n elements from a sample of size n with replacement (= bootstrap resampling). (useful in e.g. bagging, bias/variance estimation, ...)

  28. A naive solution y(i) = x(r(i)) r(1),...,r(n) = ceil( n * qr ) where qr ∈ [0,1]^n QR in dimension n, with n the number of examples. Example with n = 5 : qr = (0.1, 0.9, 0.84, 0.9, 0.7) ==> indices r = (1, 5, 5, 5, 4) ==> multiplicities (1, 0, 0, 1, 3)

  33. A naive solution y(i) = x(r(i)) r(1),...,r(n) = ceil( n * qr ) where qr ∈ [0,1]^n QR in dimension n, with n the number of examples. ==> all permutations of ( 0.1, 0.9, 0.84, 0.9, 0.7 ) lead to the same resample !
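A sketch of this naive mapping (scipy's scrambled Halton stands in for the QR generator; it is an assumption for illustration, not the authors' code) :

```python
import numpy as np
from scipy.stats import qmc

n = 5
qr = qmc.Halton(d=n, scramble=True, seed=0).random(1)[0]   # one QR point in [0,1]^n
r = np.maximum(np.ceil(n * qr).astype(int), 1)             # indices r(i) = ceil(n * qr(i)) in [[1,n]]
multiplicities = np.bincount(r - 1, minlength=n)
print("qr =", np.round(qr, 2), " r =", r, " multiplicities =", multiplicities)
# Permuting the coordinates of qr permutes r but not the multiset of selected
# indices, so very different QR points can give exactly the same resample.
```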

  34. ...which does not work. In practice it does not work better than random. Two very distinct QR-points can lead to very similar resamples (permutations of a point lead to the same sample). We have to remove this symmetry.

  35. A less naive solution z(i) = number of times x(i) appears in the bootstrap sample z(1) = binomial z(2) | z(1) = binomial z(3) | z(1), z(2) = binomial ... z(n-1) | z(1), z(2), ..., z(n-2) = binomial z(n) | z(1), z(2), ..., z(n-1) = constant (the counts sum to n) ==> yes, it works ! ==> moreover, it works for many forms of resampling and not only bootstrap !

  36. With dimension-reduction it's better Put the x(i)'s in k clusters z(i) = number of times an element of cluster i appears in the bootstrap sample z(1) = binomial z(2) | z(1) = binomial z(3) | z(1), z(2) = binomial ... z(k-1) | z(1), z(2), ..., z(k-2) = binomial z(k) | z(1), z(2), ..., z(k-1) = constant (then, randomly draw the elements in each cluster)

  37. Let's summarize Put the x(i)'s in k clusters z(i) = number of times an element of cluster i appears in the bootstrap sample z(1) = binomial z(2) | z(1) = binomial ... z(k) | z(1), z(2), ..., z(k-1) = constant We quasi-randomize this vector z(1),...,z(k) Then, we randomly draw the elements in each cluster.
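A sketch of this construction under the assumption of equal-probability clusters: the conditional binomial parameters follow from the standard decomposition of a multinomial, and scipy's binom.ppf and scrambled Halton are used for illustration (not the authors' implementation).

```python
import numpy as np
from scipy.stats import binom, qmc

def qr_bootstrap_counts(u, n):
    """Map one QR point u in [0,1]^(k-1) to cluster counts z(1),...,z(k) summing to n."""
    k = len(u) + 1
    z, remaining = [], n
    for i, ui in enumerate(u):
        ui = float(np.clip(ui, 1e-12, 1 - 1e-12))          # guard the quantile inversion
        # z(i+1) | z(1..i) ~ Binomial(remaining, 1/(k-i)) : invert its CDF at ui
        zi = int(binom.ppf(ui, remaining, 1.0 / (k - i)))
        z.append(zi)
        remaining -= zi
    z.append(remaining)                                     # last count is deterministic
    return np.array(z)

n, k = 100, 8                                               # n examples grouped into k clusters
for u in qmc.Halton(d=k - 1, scramble=True, seed=0).random(4):
    z = qr_bootstrap_counts(u, n)
    print(z, z.sum())                                       # multiplicities, always summing to n
# With k = n (one cluster per example), z says how many times each example enters
# the bootstrap sample; otherwise draw z(i) elements at random inside cluster i.
```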

  38. Let's conclude (1) why resampling is Monte-Carlo integration (2) quasi-random numbers (3) quasi-random numbers in strange spaces (4) applying quasi-random numbers in resampling (5) when does it work and when doesn't it work ?

  39. Experiments In our (artificial) experiments : • QR-randomCV is better than randomCV • QR-bagging is better than bagging • QR-subagging is better than subagging • QR-Bsfd is better than Bsfd (a bootstrap) But QR-kCV is not better than kCV : kCV already has some derandomization, since each point appears the same number of times in the learning sets

  40. A typical example You want to learn a relation x --> y on a huge ordered dataset. The dataset is too large for your favorite learner. A traditional solution is subagging : average 100 learnings performed on random subsets (1/20th of the dataset each). We propose : use QR-sampling to average only 40 learnings. Or do you have a better solution for choosing 40 subsets of 1/20th of the data ?

  41. Conclusions Therefore : • perhaps simpler derandomizations are enough ? • perhaps in cases like CV, in which « symmetrizing » (picking each example the same number of times) is easy, QR is useless ? For bagging, subagging and bootstrap, simplifying the approach is not so simple ==> now, we use QR-bagging, QR-subagging and QR-bootstrap instead of bagging, subagging and bootstrap

  42. Further work Real-world experiments (in progress, for DP-applications) Other dimension reductions (the present one relies on clustering) Simplified derandomization methods (jittering, antithetic variables, ...) Random clustering for dimension reduction ? (yes, we have not tested it, sorry ...)

  43. Random --> Discrepancy ~ sqrt( 1/n ) Quasi-random --> Discrepancy ~ log(n)^d / n Koksma & Hlawka : error in Monte-Carlo integration ≤ Discrepancy x V, V = total variation (Hardy & Krause) ( many generalizations in Hickernell, A Generalized Discrepancy and Quadrature Error Bound, 1997 ) Low Discrepancy ?

  44. Dimension 1 • What would you do ?

  50. Dimension 1 • What would you do ? • --> Van der Corput : write n in base 2 and mirror its digits across the radix point • n=1, n=2, n=3, ... • in binary : n=1, n=10, n=11, n=100, n=101, n=110, ... • x=.1, x=.01, x=.11, x=.001, x=.101, x=.011, ...
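A minimal sketch of this base-2 digit-mirroring construction (written for any base; the function name is ours) :

```python
def van_der_corput(n, base=2):
    """n-th term of the Van der Corput sequence: mirror the base-`base` digits of n."""
    x, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        x += digit / denom
    return x

print([van_der_corput(n) for n in range(1, 7)])
# [0.5, 0.25, 0.75, 0.125, 0.625, 0.375]  i.e. .1, .01, .11, .001, .101, .011 in base 2
```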
