Quasi-random resampling
O. Teytaud*, S. Gelly*, S. Lallich**, E. Prudhomme**
*Equipe I&A-TAO, LRI, Université Paris-Sud, Inria, UMR CNRS 8623
**Equipe ERIC, Université Lyon 2
Email: teytaud@lri.fr, gelly@lri.fr, stephane.lallich@eric.univ-lyon2.fr, eprudhomme@eric.univ-lyon2.fr
What is the problem?
Many tasks in AI are based on random resamplings:
• cross-validation
• bagging
• bootstrap
• ...
Resampling is time-consuming:
• cross-validation for choosing hyper-parameters
• bagging on huge datasets
==> we want to get with n resamplings the same result as with N >> n resamplings
A typical example
You want to learn a relation x --> y on a huge dataset. The dataset is too large for your favorite learner. A traditional solution is subagging: average 100 learnings performed on random subsamples (1/20 of your dataset).
We propose: use QR-sampling and average only 40 learnings.
Organization of the talk
(1) why resampling is Monte-Carlo integration
(2) quasi-random numbers
(3) quasi-random numbers in strange spaces
(4) applying quasi-random numbers in resampling
(5) when does it work and when doesn't it work?
Why resampling is Monte-Carlo integration
What is Monte-Carlo integration: E[f(x)] ≈ (1/n) Σ f(x(i))
What is cross-validation: error rate ≈ E[f(x)] ≈ (1/n) Σ f(x(i)), where f(x) = the error rate obtained with the partitioning x
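To make the analogy concrete, here is a minimal Python sketch of plain Monte-Carlo integration; the integrand f and the 2-D domain are arbitrary placeholders, and in the cross-validation reading of the formula, f(x) would be the error rate obtained with the partitioning x rather than a closed-form function.

```python
import random

def f(x, y):
    # Placeholder integrand on [0,1]^2; in resampling, "f" would be the
    # error rate measured on one random partitioning / resample.
    return (x * x + y * y) ** 0.5

def monte_carlo_estimate(n, sampler):
    # E[f(X)] estimated by (1/n) * sum_i f(x(i)),
    # exactly the form of an average over n random resamplings.
    return sum(f(*sampler()) for _ in range(n)) / n

random.seed(0)
print(monte_carlo_estimate(10_000, lambda: (random.random(), random.random())))
```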
An introduction to QR-numbers
(1) why resampling is Monte-Carlo integration
(2) quasi-random numbers
(3) quasi-random numbers in strange spaces
(4) applying quasi-random numbers in resampling
(5) when does it work and when doesn't it work?
QR-numbers
(2) quasi-random numbers (less randomized numbers)
We have seen that resampling is Monte-Carlo integration; now we will see how Monte-Carlo integration has been strongly improved.
Quasi-random numbers?
Random samples in [0,1]^d can be not-so-well distributed --> error in Monte-Carlo integration O(1/sqrt(n)), with n the number of points.
Pseudo-random samples ≈ random samples (we try to be very close to pure random).
Quasi-random samples: error O(1/n) within logarithmic factors --> we don't try to be as close as possible to random --> the number of samples needed for a given precision is much smaller.
Quasi-random = low discrepancy?
Discrepancy = Max | Area – Frequency | (over boxes: area of the box vs. fraction of the points falling in it)
A better discrepancy?
Discrepancy₂ = mean( | Area – Frequency |² )
Existing bounds on low-discrepancy Monte-Carlo
Random --> Discrepancy ~ sqrt(1/n)
Quasi-random --> Discrepancy ~ log(n)^d / n
Koksma-Hlawka: error in Monte-Carlo integration ≤ Discrepancy × V, where V = total variation (Hardy & Krause).
(Many generalizations in Hickernell, A Generalized Discrepancy and Quadrature Error Bound, 1997.)
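For reference, the Koksma-Hlawka bound quoted above reads as follows in its usual form, with D_n^* the star discrepancy of the point set and V_HK the Hardy-Krause total variation:

```latex
\left| \frac{1}{n}\sum_{i=1}^{n} f(x_i) - \int_{[0,1]^d} f(u)\,du \right|
\;\le\; V_{\mathrm{HK}}(f) \cdot D_n^{*}(x_1,\dots,x_n)
```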
Which quasi-random numbers?
« Halton sequence with a simple scrambling scheme »
• fast (as fast as pseudo-random numbers);
• easy to implement;
• available freely if you don't want to implement it.
(We will not detail how this sequence is built here.)
(Also: Sobol sequence.)
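Below is a minimal sketch of a Halton sequence with one simple scrambling (a random digit permutation per base, keeping 0 fixed). It is an illustration under that assumption, not necessarily the exact scrambling scheme used by the authors; production code would rather use an existing library.

```python
import random

def primes(k):
    """First k prime numbers, used as Halton bases."""
    ps, n = [], 2
    while len(ps) < k:
        if all(n % p for p in ps):
            ps.append(n)
        n += 1
    return ps

def radical_inverse(n, base, perm=None):
    """Van der Corput radical inverse of n in the given base,
    optionally scrambled by a fixed digit permutation (perm[0] must be 0)."""
    x, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        if perm is not None:
            digit = perm[digit]
        denom *= base
        x += digit / denom
    return x

def scrambled_halton(n_points, dim, seed=0):
    """n_points points of a Halton sequence in [0,1]^dim, with one random
    digit permutation per base as a simple scrambling."""
    rng = random.Random(seed)
    perms = []
    for b in primes(dim):
        p = list(range(1, b))
        rng.shuffle(p)
        perms.append((b, [0] + p))  # keep 0 -> 0 so trailing zeros stay zero
    return [[radical_inverse(i, b, p) for b, p in perms]
            for i in range(1, n_points + 1)]

print(scrambled_halton(5, 2))
```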
What else besides Monte-Carlo integration?
Thanks to various forms of quasi-random numbers:
• numerical integration [thousands of papers; Niederreiter 92]
• learning [Cervellera et al., IEEE TNN 2004; Mary, PhD thesis 2005]
• optimization [Teytaud et al., EA'2005]
• modeling of random processes [Growe-Kruska et al., BPTP'03; Levy's method]
• path planning [Tuffin]
... and how to do it in strange spaces?
(1) why resampling is Monte-Carlo integration
(2) quasi-random numbers
(3) quasi-random numbers in strange spaces
(4) applying quasi-random numbers in resampling
(5) when does it work and when doesn't it work?
Have fun with QR in strange spaces (3) quasi-random numbers in strange spaces We have seen that resampling is Monte-Carlo integration, and how Monte-Carlo is replaced by Quasi-Random Monte-Carlo. But resampling is random in a non-standard space. We will see how to do Quasi-Random Monte-Carlo in non-standard spaces.
Quasi-random numbers in strange spaces
We have seen the hypercube [0,1]^d.
... but we need something else!
Sample of points ---> QR sample of points
Sample of samples ---> QR sample of samples
Quasi-random points in strange spaces
Fortunately, QR-point constructions also exist in various other spaces.
Why not something isotropic? How to do it on the sphere? Or for Gaussian distributions?
For the Gaussian: easy!
Generate x in [0,1]^d by quasi-random sampling.
Build y such that P(N < y(i)) = x(i), i.e. push each coordinate through the inverse CDF of the standard normal.
It works because the distribution is the product of the distributions of the y(i).
What about the general case?
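A minimal sketch of this Gaussian case, assuming an unscrambled Halton generator as the source of points x in [0,1]^d and the standard-normal inverse CDF from the Python standard library; the fixed list of prime bases is an illustrative assumption (it limits dim to 5 here).

```python
from statistics import NormalDist

def radical_inverse(n, base):
    # Van der Corput radical inverse (unscrambled), as in the Halton sketch above.
    x, denom = 0.0, 1.0
    while n > 0:
        n, d = divmod(n, base)
        denom *= base
        x += d / denom
    return x

def qr_gaussian(n_points, dim, bases=(2, 3, 5, 7, 11)):
    """Quasi-random standard-Gaussian points: take Halton points in [0,1]^dim
    and choose y(i) such that P(N < y(i)) = x(i), coordinate by coordinate."""
    inv = NormalDist().inv_cdf
    return [[inv(radical_inverse(i, bases[j])) for j in range(dim)]
            for i in range(1, n_points + 1)]

print(qr_gaussian(4, 2))
```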
Ok!
- Generate x in [0,1]^d.
- Define y(i) such that P(t < y(i) | y(1), y(2), ..., y(i-1)) = x(i), i.e. invert the conditional distribution functions one coordinate at a time.
However, we will do that
• We do not have anything better than this general method for the strange distributions we are interested in.
• At least we can prove the O(1/n) property (see the paper).
• Perhaps there is much better.
• Perhaps there is much simpler.
The QR-numbers in resampling
(4) applying quasi-random numbers in resampling
We have seen that resampling is Monte-Carlo integration, and that we are able to generate quasi-random points for any distribution in continuous domains.
==> it should work
==> let's see in detail how to move the problem to the continuous domain
QR-numbers in resampling
A very particular distribution for QR-points: bootstrap samples. How do we move the problem to continuous spaces?
y(i) = x(r(i)), where r(i) is uniformly distributed in {1, ..., n}
==> this is discrete
QR-numbers in resampling
A very particular distribution for QR-points: bootstrap samples. How do we move the problem to continuous spaces?
We know: QR points for the rectangular uniform distribution, and hence for any continuous distribution.
We need: QR points for our discrete distribution, y(i) = x(r(i)) with r(i) uniformly distributed in {1, ..., n}.
--> many solutions exist
What are bootstrap samples?
Our technique works for various forms of resampling:
- subsamples without replacement (random-CV, subagging)
- subsamples with replacement (bagging, bootstrap)
- random partitioning (k-CV)
W.l.o.g., we present here the sampling of n elements in a sample of size n, with replacement (= bootstrap resampling). (Useful in e.g. bagging, bias/variance estimation, ...)
A naive solution
y(i) = x(r(i)), with r(1), ..., r(n) = ceil(n × qr), where qr ∈ [0,1]^n is QR in dimension n, n being the number of examples.
Example with n = 5: qr = (0.1, 0.9, 0.84, 0.9, 0.7) gives r = (1, 5, 5, 5, 4), i.e. occurrence counts (1, 0, 0, 1, 3).
==> all permutations of (0.1, 0.9, 0.84, 0.9, 0.7) lead to the same result!
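A small runnable check of the example above (illustrative only, not code from the paper): it applies the ceil mapping, recovers the occurrence counts, and shows that permuting the QR point leaves the resample unchanged.

```python
import math
from collections import Counter

def naive_qr_bootstrap(qr_point, n):
    """Naive mapping: each coordinate u of a QR point in [0,1]^n
    picks one index in {1, ..., n} via ceil(n * u)."""
    indices = [max(1, math.ceil(n * u)) for u in qr_point]
    counts = [Counter(indices)[i] for i in range(1, n + 1)]
    return indices, counts

# The worked example from the slide: qr = (0.1, 0.9, 0.84, 0.9, 0.7), n = 5.
idx, counts = naive_qr_bootstrap((0.1, 0.9, 0.84, 0.9, 0.7), 5)
print(idx, counts)        # [1, 5, 5, 5, 4] [1, 0, 0, 1, 3]

# The symmetry problem: any permutation of the QR point gives the same resample.
idx2, counts2 = naive_qr_bootstrap((0.9, 0.7, 0.1, 0.9, 0.84), 5)
print(counts2 == counts)  # True
```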
... which does not work. In practice it does not work better than random. Two very distinct QR-points can lead to very similar resamples (a permutation of a point leads to the same sample). We have to remove this symmetry.
A less naive solution
z(i) = number of times x(i) appears in the bootstrap sample
z(1) = binomial
z(2) | z(1) = binomial
z(3) | z(1), z(2) = binomial
...
z(n-1) | z(1), z(2), ..., z(n-2) = binomial
z(n) | z(1), z(2), ..., z(n-1) = constant
==> yes, it works!
==> moreover, it works for many forms of resampling, not only bootstrap!
With dimension-reduction it's better
Put the x(i)'s in k clusters.
z(i) = number of times an element of cluster i appears in the bootstrap sample
z(1) = binomial
z(2) | z(1) = binomial
z(3) | z(1), z(2) = binomial
...
z(k-1) | z(1), z(2), ..., z(k-2) = binomial
z(k) | z(1), z(2), ..., z(k-1) = constant
(Then, randomly draw the elements in each cluster.)
Let's summarize
Put the x(i)'s in k clusters.
z(i) = number of times an element of cluster i appears in the bootstrap sample
z(1) = binomial
z(2) | z(1) = binomial
...
z(k) | z(1), z(2), ..., z(k-1) = constant
We quasi-randomize this (z(1), ..., z(k)). Then, we randomly draw the elements in each cluster.
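A minimal sketch of this summarized scheme, under two assumptions: the clusters are given, and the conditional binomials are those of the standard sequential decomposition of the multinomial cluster counts (z(i) given the previous counts is binomial over the remaining draws, with success probability proportional to the size of cluster i among the remaining clusters). The naive O(n) binomial inverse CDF is only for illustration, not the authors' implementation.

```python
import math
import random

def binom_inv_cdf(u, n, p):
    """Smallest z with P(Binomial(n, p) <= z) >= u (naive inverse CDF)."""
    cdf = 0.0
    for z in range(n + 1):
        cdf += math.comb(n, z) * p**z * (1 - p) ** (n - z)
        if cdf >= u:
            return z
    return n

def qr_bootstrap_counts(qr_point, cluster_sizes):
    """Map one QR point in [0,1]^(k-1) to cluster occurrence counts of a
    bootstrap resample of size n = sum(cluster_sizes), via the
    conditional-binomial chain: z(1) binomial, z(i) | previous counts binomial,
    last count constant given the others."""
    n = sum(cluster_sizes)
    remaining_draws, remaining_mass = n, n
    counts = []
    for u, c in zip(qr_point, cluster_sizes[:-1]):
        z = binom_inv_cdf(u, remaining_draws, c / remaining_mass)
        counts.append(z)
        remaining_draws -= z
        remaining_mass -= c
    counts.append(remaining_draws)  # last cluster: determined by the rest
    return counts

def qr_bootstrap_sample(data, cluster_of, qr_point, rng=random):
    """Turn cluster counts into an actual resample by drawing, at random and
    with replacement, the required number of elements inside each cluster."""
    k = max(cluster_of) + 1
    clusters = [[x for x, c in zip(data, cluster_of) if c == i] for i in range(k)]
    counts = qr_bootstrap_counts(qr_point, [len(cl) for cl in clusters])
    return [rng.choice(cl) for cl, z in zip(clusters, counts) for _ in range(z)]

# Toy usage: 6 points in 3 clusters, one QR point of dimension k-1 = 2.
data = [10, 11, 20, 21, 30, 31]
cluster_of = [0, 0, 1, 1, 2, 2]
print(qr_bootstrap_sample(data, cluster_of, (0.3, 0.8)))
```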
Let's conclude
(1) why resampling is Monte-Carlo integration
(2) quasi-random numbers
(3) quasi-random numbers in strange spaces
(4) applying quasi-random numbers in resampling
(5) when does it work and when doesn't it work?
Experiments
In our (artificial) experiments:
• QR-randomCV is better than randomCV
• QR-bagging is better than bagging
• QR-subagging is better than subagging
• QR-Bsfd is better than Bsfd (a bootstrap)
But QR-kCV is not better than kCV: kCV already has some derandomization, since each point appears the same number of times in learning.
A typical example
You want to learn a relation x --> y on a huge ordered dataset. The dataset is too large for your favorite learner. A traditional solution is subagging: average 100 learnings performed on random subsets (1/20 of your dataset).
We propose: use QR-sampling and average only 40 learnings. Or do you have a better solution for choosing 40 subsets of size 1/20?
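A high-level sketch of the subagging wiring (illustrative only): the learner, the predictor and the subsample selector below are toy stand-ins; in QR-subagging the selector would be driven by a quasi-random scheme such as the clustered construction sketched earlier, rather than by plain random sampling.

```python
import random

def subagging(data, labels, train, predict, subsample_indices,
              n_models=40, fraction=1 / 20):
    """Average `n_models` learners, each fitted on a small subsample chosen by
    `subsample_indices(m, n, size)`; plugging a quasi-random selector here
    gives QR-subagging."""
    size = max(1, int(len(data) * fraction))
    models = []
    for m in range(n_models):
        idx = subsample_indices(m, len(data), size)
        models.append(train([data[i] for i in idx], [labels[i] for i in idx]))
    return lambda x: sum(predict(mod, x) for mod in models) / len(models)

# Toy stand-ins (NOT the paper's learner or QR selector): a mean predictor and
# plain random subsampling, just to make the wiring runnable end to end.
rng = random.Random(0)
data = list(range(1000))
labels = [2.0 * x for x in data]
train = lambda xs, ys: sum(ys) / len(ys)        # "model" = mean of the labels
predict = lambda model, x: model
rand_subsample = lambda m, n, size: rng.sample(range(n), size)

ensemble = subagging(data, labels, train, predict, rand_subsample)
print(ensemble(123))
```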
Conclusions
Therefore:
• perhaps simpler derandomizations are enough?
• perhaps in cases like CV, in which « symmetrizing » (picking each example the same number of times) is easy, this is useless?
For bagging, subagging and bootstrap, simplifying the approach is not so simple.
==> now, we use QR-bagging, QR-subagging and QR-bootstrap instead of bagging, subagging and bootstrap
Further work
• Real-world experiments (in progress, for DP-applications)
• Other dimension reductions (the current one involves clustering)
• Simplified derandomization methods (jittering, antithetic variables, ...)
• Random clustering for dimension reduction? (yes, we have not tested it, sorry...)
Dimension 1: what would you do?
--> Van der Corput sequence: write n in base 2 and mirror its digits around the point.
• n = 1, n = 2, n = 3, ...
• in binary: n = 1, n = 10, n = 11, n = 100, n = 101, n = 110, ...
• mirrored: x = .1, x = .01, x = .11, x = .001, x = .101, ...