Hand-written character recognition

Failure rate for test samples

MNIST: a data set of hand-written digits
• 60,000 training samples
• 10,000 test samples
• Each sample consists of 28 x 28 = 784 pixels

Various techniques have been tried:
• Linear classifier: 12.0%
• 2-layer BP net (300 hidden nodes): 4.7%
• 3-layer BP net (300+200 hidden nodes): 3.05%
• Support vector machine (SVM): 1.4%
• Convolutional net: 0.4%
• 6-layer BP net (7500 hidden nodes): 0.35%
Hand-written character recognition

Our own experiment: BP learning with a 784-300-10 architecture
• Total # of weights: 784×300 + 300×10 = 238,200
• Total # of Δw computed for each epoch: 238,200 weights × 60,000 training samples ≈ 1.4×10^10
• Ran 1 month before it stopped
• Test error rate: 5.0%
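As a quick sanity check on the arithmetic above, here is a minimal Python sketch; the per-sample update assumption (one Δw per weight per training sample) is ours, but it reproduces the slide's figure:

```python
# Weight count for a fully connected 784-300-10 network
# (no bias terms, matching the slide's arithmetic).
n_in, n_hidden, n_out = 784, 300, 10
n_weights = n_in * n_hidden + n_hidden * n_out
print(n_weights)            # 238200

# Delta-w computations per epoch: one per weight per training sample
# (assumption: plain per-sample backpropagation over all of MNIST).
n_train = 60_000
print(n_weights * n_train)  # 14292000000, i.e. ~1.4e10
```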
Risk-Averting Error Function
• Mean Squared Error (MSE)
• Risk-Averting Error (RAE)

James Ting-Ho Lo. Convexification for data fitting. Journal of Global Optimization, 46(2):307–315, February 2010.
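The slide's formulas did not survive extraction; the following is a reconstruction following the notation of Lo (2010), for K training pairs $(x_k, y_k)$ and a network $f$ with weight vector $w$:

$$Q(w) = \frac{1}{K} \sum_{k=1}^{K} \lVert y_k - f(x_k, w) \rVert^2 \qquad \text{(MSE)}$$

$$J_\lambda(w) = \sum_{k=1}^{K} \exp\!\left( \lambda \lVert y_k - f(x_k, w) \rVert^2 \right) \qquad \text{(RAE)}$$

The risk-sensitivity index $\lambda > 0$ exponentially magnifies large individual errors; per Lo (2010), increasing $\lambda$ convexifies the error surface and helps training avoid poor local minima.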
Normalized Risk-Averting Error
• Normalized Risk-Averting Error (NRAE)
• It can be simplified for numerically stable evaluation, as reconstructed below.
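The NRAE formulas were likewise lost in extraction; a reconstruction, writing $\varepsilon_k(w) = \lVert y_k - f(x_k, w) \rVert^2$ for the individual squared errors:

$$C_\lambda(w) = \frac{1}{\lambda} \ln\!\left( \frac{1}{K} J_\lambda(w) \right) = \frac{1}{\lambda} \ln\!\left( \frac{1}{K} \sum_{k=1}^{K} e^{\lambda \varepsilon_k(w)} \right)$$

Factoring out $\varepsilon_{\max}(w) = \max_k \varepsilon_k(w)$ (the standard log-sum-exp manipulation) gives the simplified, numerically stable form

$$C_\lambda(w) = \varepsilon_{\max}(w) + \frac{1}{\lambda} \ln\!\left( \frac{1}{K} \sum_{k=1}^{K} e^{\lambda \left( \varepsilon_k(w) - \varepsilon_{\max}(w) \right)} \right)$$

which can be evaluated without overflow even for very large $\lambda$.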
The Broyden-Fletcher-Goldfarb-Shanno (BFGS) Method
• A quasi-Newton method for solving nonlinear optimization problems
• Uses first-order gradient information to build an approximation to the Hessian (second-order derivative) matrix
• Avoiding the calculation of the exact Hessian significantly reduces the computational cost of the optimization
The Broyden-Fletcher-Goldfarb-Shanno (BFGS) Method

The BFGS Algorithm:
1. Generate an initial guess $x_0$ and an initial approximate Hessian matrix $B_0$ (e.g., $B_0 = I$).
2. Obtain a search direction $p_k$ at step k by solving $B_k p_k = -\nabla f(x_k)$, where $\nabla f(x_k)$ is the gradient of the objective function evaluated at $x_k$.
3. Perform a line search to find an acceptable stepsize $\alpha_k$ in the direction $p_k$, then update $x_{k+1} = x_k + \alpha_k p_k$.
The Broyden-Fletcher-Goldfarb-Shanno (BFGS) Method

4. Set $s_k = \alpha_k p_k$ and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$.
5. Update the approximate Hessian matrix by
$$B_{k+1} = B_k + \frac{y_k y_k^{\top}}{y_k^{\top} s_k} - \frac{B_k s_k s_k^{\top} B_k}{s_k^{\top} B_k s_k}$$
6. Repeat steps 2–5 until $x_k$ converges to the solution. Convergence can be checked by observing the norm of the gradient, $\lVert \nabla f(x_k) \rVert$.
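A minimal Python sketch of steps 1–6 follows. It illustrates the textbook update above rather than the authors' code; the Rosenbrock objective and the backtracking (Armijo) line search are our own assumptions for the demo:

```python
import numpy as np

def bfgs(f, grad, x0, tol=1e-6, max_iter=200):
    """Minimize f with the BFGS iteration described in steps 1-6."""
    x = x0.astype(float)
    B = np.eye(len(x0))               # step 1: B_0 = I
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # step 6: gradient-norm test
            break
        p = np.linalg.solve(B, -g)    # step 2: solve B_k p_k = -grad f(x_k)
        alpha = 1.0                   # step 3: backtracking Armijo line search
        while f(x + alpha * p) > f(x) + 1e-4 * alpha * (g @ p):
            alpha *= 0.5
        s = alpha * p                 # step 4: s_k and y_k
        x_new = x + s
        y = grad(x_new) - g
        if y @ s > 1e-12:             # skip update if it would break positive definiteness
            Bs = B @ s
            B = B + np.outer(y, y) / (y @ s) - np.outer(Bs, Bs) / (s @ Bs)  # step 5
        x = x_new
    return x

# Demo on the Rosenbrock function (an assumed example, not from the slides).
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                           200 * (x[1] - x[0]**2)])
print(bfgs(f, grad, np.array([-1.2, 1.0])))  # approaches [1., 1.]
```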
The Broyden-Fletcher-Goldfarb-Shanno (BFGS) Method

Limited-memory BFGS Method:
• A variation of the BFGS method
• Represents the approximate Hessian implicitly, using only a few stored vectors
• Requires far less memory
• Well suited for optimization problems with a large number of variables
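In practice one would typically call an off-the-shelf implementation rather than hand-rolling the two-loop recursion; a usage sketch with SciPy's limited-memory routine, reusing the demo objective from the BFGS sketch above:

```python
import numpy as np
from scipy.optimize import minimize

# Same assumed demo objective (Rosenbrock) as in the BFGS sketch.
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                           200 * (x[1] - x[0]**2)])

# 'L-BFGS-B' is SciPy's limited-memory BFGS (with optional bound constraints);
# it stores only a few (s_k, y_k) vector pairs instead of a full Hessian.
res = minimize(f, x0=np.array([-1.2, 1.0]), jac=grad, method='L-BFGS-B')
print(res.x)  # approaches [1., 1.]
```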
References
• J. T. Lo and D. Bassu. An adaptive method of training multilayer perceptrons. In Proceedings of the 2001 International Joint Conference on Neural Networks, volume 3, pages 2013–2018, July 2001.
• J. T. Lo. Convexification for data fitting. Journal of Global Optimization, 46(2):307–315, February 2010.
• Broyden–Fletcher–Goldfarb–Shanno algorithm, Wikipedia: http://en.wikipedia.org/wiki/BFGS