Optimization in R: algorithms, sequencing, and automatic differentiation James Thorson Aug. 26, 2011
Themes • Basic: algorithms, settings, starting location • Intermediate: sequenced optimization, phasing, parameterization, standard errors • Advanced: derivatives
Outline • One-dimensional • Two-dimensional • Using derivatives
Basic: Algorithms • Characteristics • Very fast • Somewhat unstable • Process • Starts with two points • Moves in the direction of the better point • Then searches between the two best points optimize(f = , interval = , ...)
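A minimal sketch of optimize() in use; the objective function here is illustrative, not from the talk:

# Minimize a one-dimensional function with optimize()
f <- function(x) (x - 2)^2 + 1
fit <- optimize(f = f, interval = c(-10, 10))
fit$minimum   # location of the minimum (~2)
fit$objective # value at the minimum (~1)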
Intermediate: Sequenced Sequencing: • Use a stable but slow method • Then use a fast method for fine-tuning One-dimensional sequencing • Grid search • Then use optimize() (see the sketch below)
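A sketch of one-dimensional sequencing, assuming an illustrative multimodal objective: a coarse grid search locates a good neighborhood, then optimize() fine-tunes within it.

# Stable but slow: evaluate the objective on a coarse grid
f <- function(x) sin(x) + 0.1 * x^2
grid <- seq(-10, 10, length.out = 100)
best <- grid[which.min(sapply(grid, f))]

# Fast fine-tuning: optimize() within one grid step of the best point
fit <- optimize(f = f, interval = c(best - 0.25, best + 0.25))
fit$minimum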
Basic: Algorithms Other one-dimensional functions • uniroot – Finds a single root where f(∙) = 0 (requires a sign change over the interval) • polyroot – Finds all roots of a polynomial, given its coefficients
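Sketches of both, on a simple polynomial of my own choosing:

# uniroot() finds one root inside an interval where f changes sign
uniroot(function(x) x^2 - 4, interval = c(0, 5))$root  # 2

# polyroot() takes polynomial coefficients in increasing order;
# x^2 - 4 has coefficients c(-4, 0, 1)
polyroot(c(-4, 0, 1))  # -2+0i, 2+0i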
Basic: Settings • trace = 1 • Means different things for different optimization routines • In general, gives output during optimization • Useful for diagnostics optimx(par = , fn = , lower = , upper = , control = list(trace = 1, follow.on = TRUE), method = c("nlminb", "L-BFGS-B"))
Basic: Settings • follow.on = TRUE • Starts each subsequent method at the previous method's stopping point • method = c("nlminb", "L-BFGS-B") • Lists the set and order of methods to use optimx(par = , fn = , lower = , upper = , control = list(trace = 1, follow.on = TRUE), method = c("nlminb", "L-BFGS-B")) See also calcMin() in the "PBSmodelling" package
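A runnable sketch of these settings together; the objective, bounds, and starting values are illustrative:

library(optimx)

fn <- function(p) (p[1] - 1)^2 + (p[2] - 2)^2

fit <- optimx(par = c(0, 0), fn = fn,
              lower = c(-5, -5), upper = c(5, 5),
              control = list(trace = 1, follow.on = TRUE),
              method = c("nlminb", "L-BFGS-B"))
fit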
Basic: Settings Constraints • Unbounded • Bounded • I recommend using bounds • Box constraints are common • Non-box constraints • Usually implemented in the objective function (see the sketch below)
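A sketch of a non-box constraint (here p[1] + p[2] <= 1, an assumption for illustration) implemented as a penalty inside the objective:

fn_constrained <- function(p) {
  penalty <- 1e6 * max(0, p[1] + p[2] - 1)^2  # zero when feasible
  (p[1] - 1)^2 + (p[2] - 2)^2 + penalty
}

optim(par = c(0, 0), fn = fn_constrained)  # default Nelder-Mead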
Basic: Algorithms Differences among algorithms: • Speed vs. accuracy • Unbounded vs. bounded • Whether derivatives can be used
Basic: Algorithms Nelder-Mead (a.k.a. “Simplex”) • Characteristics • Bounded (nlminb) • Unbounded (optimx) • Cannot use derivatives • Slow but good at following valleys • Easily stuck at local minima
Basic: Algorithms Nelder-Mead (a.k.a. “Simplex”) • Process • Uses a simplex with n+1 vertices (for n parameters) • Reflects the worst point through the centroid of the others • If worse: shrink the simplex • If better: accept and expand along that axis
Basic: Algorithms Rosenbrock “Banana” Function
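The function itself, with a Nelder-Mead run from the conventional starting point, as a sketch:

# Rosenbrock "banana": a curved valley with its minimum at (1, 1)
banana <- function(p) 100 * (p[2] - p[1]^2)^2 + (1 - p[1])^2

optim(par = c(-1.2, 1), fn = banana, method = "Nelder-Mead")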
Basic: Algorithms Quasi-Newton (“BFGS”) • Characteristics • Unbounded (optim, method="BFGS") • Bounded (optim, method="L-BFGS-B") • Can use derivatives • Fast but less accurate
Basic: Algorithms Quasi-Newton (“BFGS”) • Process • Approximates the gradient and Hessian • Uses Newton’s method to update the location • Updates the gradient and Hessian approximations by various other methods
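A sketch of BFGS on the banana function, supplying the analytic gradient via gr= so the gradient need not be approximated numerically:

banana <- function(p) 100 * (p[2] - p[1]^2)^2 + (1 - p[1])^2
banana_gr <- function(p) {
  c(-400 * p[1] * (p[2] - p[1]^2) - 2 * (1 - p[1]),  # d/dp1
     200 * (p[2] - p[1]^2))                          # d/dp2
}

optim(par = c(-1.2, 1), fn = banana, gr = banana_gr, method = "BFGS")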
Basic: Algorithms Quasi-Newton (“ucminf”) • A different variation on quasi-Newton, from the "ucminf" package
Basic: Algorithms Conjugate gradient • Characteristics: • Unbounded (optim, method="CG") • Very fast for near-quadratic problems • Low memory use • Generally unstable • I don’t recommend it for general usage
Basic: Algorithms Conjugate gradient • Process • Calculates derivatives numerically • Successive search directions are “conjugate” (i.e. they form an optimal linear basis for a quadratic problem)
Basic: Algorithms Many others! As one example… Spectral projected gradient • Characteristics • ??? • Process • ???
Basic: Algorithms Accuracy trials
Basic: starting location It’s important to provide a good starting location! • Some methods (like nlminb) find the nearest local minimum • Speeds convergence
Intermediate: Parameterization Suggestions: • Put all parameters on a similar scale • So that derivatives are approximately equal • One method: optimize unconstrained inputs, transformed with exp() and plogis() (see the sketch below) • Minimize covariance among parameters • Minimize changes in scale and covariance across the parameter space
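A sketch of the transformation idea, with an illustrative objective: optimize on an unconstrained scale, then back-transform.

nll <- function(p) {
  sigma <- exp(p[1])     # exp() keeps sigma > 0
  rho   <- plogis(p[2])  # plogis() (inverse logit) keeps rho in (0, 1)
  (sigma - 2)^2 + (rho - 0.3)^2  # stand-in for a real likelihood
}

fit <- optim(par = c(0, 0), fn = nll)
exp(fit$par[1])    # sigma estimate
plogis(fit$par[2]) # rho estimate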
Intermediate: Phasing Phasing • Estimate some parameters (with others fixed) in a first phase • Estimate more parameters in each phase • Eventually estimate all parameters Uses • Multi-species models • Estimate with linkages in later phases • Statistical catch-at-age • Estimate scale early
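A minimal sketch of the phasing idea above, using an illustrative two-parameter objective: estimate p1 with p2 fixed, then free both.

fn <- function(p) (p[1] - 1)^2 + 10 * (p[2] - p[1])^2

# Phase 1: estimate p1 only, holding p2 fixed at 0
phase1 <- optimize(f = function(p1) fn(c(p1, 0)), interval = c(-5, 5))

# Phase 2: estimate both parameters, starting from the phase-1 result
phase2 <- optim(par = c(phase1$minimum, 0), fn = fn)
phase2$par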
Intermediate: Standard errors Maximum likelihood allows asymptotic estimates of standard errors • Calculate the Hessian matrix at the maximum likelihood estimate • Second derivatives of the negative log-likelihood function • Invert the Hessian • Diagonal entries are variances • Square roots are standard errors
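A sketch of the whole recipe for a normal likelihood on simulated data (all values illustrative):

set.seed(1)
y <- rnorm(100, mean = 5, sd = 2)
# Negative log-likelihood with mean and log-sd as parameters
nll <- function(p) -sum(dnorm(y, mean = p[1], sd = exp(p[2]), log = TRUE))

fit <- optim(par = c(0, 0), fn = nll, hessian = TRUE)
vcov_hat <- solve(fit$hessian)  # inverse Hessian estimates the covariance
sqrt(diag(vcov_hat))            # standard errors of (mean, log-sd)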
Intermediate: Standard errors Calculation of the Hessian depends on parameter transformations • When using exp() or logit transformations, use the delta method to convert standard errors back to the natural (untransformed) scale
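A sketch of the delta method for an exp() transformation, with illustrative numbers: if sigma = exp(theta), then SE(sigma) ≈ exp(theta) × SE(theta).

theta_hat <- 0.7   # illustrative estimate of log(sigma)
se_theta  <- 0.07  # illustrative standard error of theta
se_sigma  <- exp(theta_hat) * se_theta  # d exp(theta)/d theta = exp(theta)
se_sigma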
Intermediate: Standard errors Gill and King (2004) “What to do when your Hessian is not invertible” gchol() – Generalized Cholesky (“kinship”) ginv() – Moore-Penrose Inverse (“MASS”)
Intermediate: Standard errors [ Switch over to R-screen to show mle() and solve(hess()) ]
Advanced: Differentiation Gradient: • Quasi-Newton • Conjugate gradient Hessian: • Quasi-Newton optimx(par = , fn = , gr = , hess = , lower = , upper = , control = list(trace = 1, follow.on = TRUE), method = c("nlminb", "L-BFGS-B"))
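A sketch of passing an analytic gradient through optimx; hess= works analogously for methods that accept a Hessian function. The objective is illustrative.

library(optimx)

fn <- function(p) (p[1] - 1)^2 + (p[2] - 2)^2
gr <- function(p) c(2 * (p[1] - 1), 2 * (p[2] - 2))

optimx(par = c(0, 0), fn = fn, gr = gr, method = "BFGS")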
Advanced: Differentiation Automatic differentiation • AD Model Builder • “radx” package (still in development) Semi-automatic differentiation • “rSymPy” package Symbolic differentiation • deriv() in base R BUT: none of these handle loops or sum()/prod(), so they’re not yet very helpful for statistics
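A sketch of symbolic differentiation with base R’s deriv(), which returns a function evaluating both the value and the gradient; note it cannot differentiate through loops or sum()/prod(), the limitation noted above:

g <- deriv(~ 100 * (y - x^2)^2 + (1 - x)^2, c("x", "y"),
           function.arg = c("x", "y"))
g(-1.2, 1)  # value, with the gradient in attr(, "gradient")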
Advanced: Differentiation Mixture distribution model (~ 15 params) • 10 seconds in R • 2 seconds in ADMB Multispecies catchability model (~ 150 params) • 4 hours in R (using trapezoid method) • 5 minutes in ADMB (using MCMC) Surplus production meta-analysis (~ 750 coefs) • 7 days in R (using trapezoid method) • 2 hours in ADMB (using trapezoid method)