CS B553: Algorithms for Optimization and Learning
Gradient Descent
Key Concepts
• Gradient descent
• Line search
• Convergence rates depend on scaling
• Variants: discrete analogues, coordinate descent
• Random restarts
The gradient direction ∇f(x) is orthogonal to the level sets (contours) of f, and points in the direction of steepest increase.
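To make this concrete, here is a small numerical sketch (the quadratic f(x, y) = x² + 4y² is an assumed example, not one from the slides): it walks around one contour of f and checks that the gradient is orthogonal to the contour's tangent direction at each sampled point.

import numpy as np

# f(x, y) = x^2 + 4*y^2; its contours are the ellipses x^2 + 4*y^2 = c.
def grad_f(p):
    x, y = p
    return np.array([2.0 * x, 8.0 * y])

# Sample points on the contour f = c and check that the gradient is
# orthogonal to the contour's tangent direction at each one.
c = 2.0
for t in np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False):
    point = np.array([np.sqrt(c) * np.cos(t), 0.5 * np.sqrt(c) * np.sin(t)])
    tangent = np.array([-np.sqrt(c) * np.sin(t), 0.5 * np.sqrt(c) * np.cos(t)])
    print(np.dot(grad_f(point), tangent))  # prints ~0, up to floating-point error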
Line search: pick the step size to lead to a decrease in the function value.
[Figure: f(x − α∇f(x)) plotted as a function of the step size α, with the minimizing step size marked α*.]
(Use your favorite univariate optimization method)
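One common univariate choice is a backtracking search: start with a full step and shrink it until a sufficient decrease is obtained. A minimal sketch, assuming the Armijo sufficient-decrease condition and illustrative constants alpha0, rho, c (none of which are specified in the slides):

import numpy as np

def backtracking_line_search(f, x, d, grad, alpha0=1.0, rho=0.5, c=1e-4):
    # grad is the gradient vector at x; d is assumed to be a descent
    # direction (e.g. d = -grad), so the loop terminates for smooth f.
    # Shrink alpha until f(x + alpha*d) <= f(x) + c * alpha * grad . d.
    alpha = alpha0
    while f(x + alpha * d) > f(x) + c * alpha * np.dot(grad, d):
        alpha *= rho
    return alpha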
Gradient Descent Pseudocode
• Input: f, starting value x1, termination tolerances εg and εx
• For t = 1, 2, …, maxIters:
  • Compute the search direction dt = −∇f(xt)
  • If ||dt|| < εg then: return "Converged to critical point", output xt
  • Find αt so that f(xt + αt dt) < f(xt) using line search
  • If ||αt dt|| < εx then: return "Converged in x", output xt
  • Let xt+1 = xt + αt dt
• Return "Max number of iterations reached", output xmaxIters
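A direct translation of this pseudocode into Python might look like the sketch below. The simple halving rule used for the line search is an assumption (the slides leave the univariate method open), and the test function at the end is just an illustrative poorly scaled quadratic.

import numpy as np

def gradient_descent(f, grad_f, x1, eps_g=1e-6, eps_x=1e-10, max_iters=1000):
    x = np.asarray(x1, dtype=float)
    for t in range(max_iters):
        d = -grad_f(x)                          # search direction d_t = -grad f(x_t)
        if np.linalg.norm(d) < eps_g:
            return x, "Converged to critical point"
        alpha = 1.0                             # crude line search: halve until f decreases
        while f(x + alpha * d) >= f(x) and alpha > 1e-16:
            alpha *= 0.5
        if np.linalg.norm(alpha * d) < eps_x:
            return x, "Converged in x"
        x = x + alpha * d                       # x_{t+1} = x_t + alpha_t * d_t
    return x, "Max number of iterations reached"

# Example: minimize a poorly scaled quadratic starting from (5, 1).
x_min, status = gradient_descent(lambda p: p[0]**2 + 10.0 * p[1]**2,
                                 lambda p: np.array([2.0 * p[0], 20.0 * p[1]]),
                                 [5.0, 1.0])
print(x_min, status)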
Related Methods
• Steepest descent (discrete)
• Coordinate descent
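As a rough illustration of the coordinate-wise idea, here is a sketch of a cyclic, discrete-step variant (an assumption for illustration, not necessarily the course's exact formulation): each coordinate is nudged by a fixed step in whichever direction lowers f.

import numpy as np

def coordinate_descent(f, x0, step=0.1, sweeps=200):
    # Cycle through the coordinates; try moving each one by +step or -step
    # and keep the move only if it strictly improves the function value.
    x = np.asarray(x0, dtype=float)
    for _ in range(sweeps):
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x.copy()
                trial[i] += delta
                if f(trial) < f(x):
                    x = trial
                    break
    return x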