Best Practices for Convolutional NNs Applied to Visual Document Analysis (according to P. A. Simard, D. Steinkraus, and J. C. Platt)
outline • the task • training set expansion • network architecture • learning
the task • handwriting recognition • segmented handwritten digits • data: • benchmark set of English digit images (MNIST) • size-normalized to 28 x 28 pixels • 60,000 training patterns, 10,000 test patterns • goal: map each image vector to a digit class in {0, 1, …, 9}
the task • example from test set:
training set expansion • Etest − Etrain ~ 1/P (P = size of the training set), so the gap between test and training error shrinks as the training set grows • idea: apply transformations to the original data to generate additional training examples • the learning algorithm then learns invariance to these transformations (wrt. the original, non-transformed input)
training set expansion • examples of transformations: • translation • rotation • skewing • method: for every pixel in the original image, compute a new sampling location from a displacement field, e.g. Δx(x,y) = 1, Δy(x,y) = 0 (shift by one pixel in x) or Δx(x,y) = α·x, Δy(x,y) = α·y (scaling; interpolation is needed when the new location is not an integer) • elastic deformations
training set expansion • worked example: pixel A at (0,0) gets the new sampling location xnew(x,y) = 1.75, ynew(x,y) = −0.5 • surrounding gray levels (gl): gl(1,0) = 3, gl(2,0) = 7, gl(1,−1) = 5, gl(2,−1) = 9 • evaluate gl at (xnew, ynew) with bilinear interpolation: • over x, at y = 0: 3 + 0.75 · (7 − 3) = 6 • over x, at y = −1: 5 + 0.75 · (9 − 5) = 8 • over y: 8 + 0.5 · (6 − 8) = 7 → the new gray level of A is 7 • a code sketch of this interpolation follows
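To make the resampling step concrete, here is a minimal NumPy sketch (not the authors' code; bilinear_sample and warp are illustrative names and the boundary handling is a simplifying assumption). Sampling at (1.75, −0.5) on the gray levels above reproduces the value 7 from the worked example.

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Gray level of `img` at the non-integer location (x, y),
    computed with bilinear interpolation as in the worked example."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = x0 + 1, y0 + 1
    fx, fy = x - x0, y - y0                                  # fractional offsets
    # interpolate along x on the two bracketing rows, then along y
    row0 = img[y0, x0] + fx * (img[y0, x1] - img[y0, x0])
    row1 = img[y1, x0] + fx * (img[y1, x1] - img[y1, x0])
    return row0 + fy * (row1 - row0)

def warp(img, dx, dy):
    """Apply a displacement field (dx, dy): each output pixel (x, y)
    takes the interpolated gray level at (x + dx, y + dy)."""
    h, w = img.shape
    out = np.zeros_like(img, dtype=float)
    for y in range(h):
        for x in range(w):
            xs, ys = x + dx[y, x], y + dy[y, x]
            if 0 <= xs < w - 1 and 0 <= ys < h - 1:          # stay inside the image
                out[y, x] = bilinear_sample(img, xs, ys)
    return out
```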
training set expansion • elastic deformations • start from random displacement fields: Δx(x,y) = rand(−1, +1), Δy(x,y) = rand(−1, +1) • smooth each field with a Gaussian of a given standard deviation σ (in pixels) • if σ is large, the resulting values are very small (the random values average out to ≈ 0) • if σ is small, the field remains essentially random • intermediate σ: the field looks like an elastic deformation • finally multiply by a factor α that controls the intensity of the deformation • a sketch of this procedure follows
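A short sketch of that recipe, assuming NumPy and SciPy's gaussian_filter; the function name elastic_field and the example values σ = 4, α = 34 are illustrative choices, not taken from the slides.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def elastic_field(shape, sigma, alpha, rng=None):
    """Random displacement fields, smoothed with a Gaussian of std `sigma`
    (in pixels) and scaled by `alpha`, following the recipe on the slide."""
    if rng is None:
        rng = np.random.default_rng()
    dx = gaussian_filter(rng.uniform(-1, 1, shape), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, shape), sigma) * alpha
    return dx, dy

# example: a 28 x 28 field; warp() from the earlier sketch can then resample the image
dx, dy = elastic_field((28, 28), sigma=4.0, alpha=34.0)
```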
training set expansion • examples of distortions:
network architecture • account for topological properties of the input (shape of curves, edges, etc.) • gradually extract more and more complex features • simple features are extracted at higher resolutions; more complex features at coarser resolutions, where each unit covers a larger region of the input • the conversion from one resolution to the next is done by convolution • coarser resolutions are generated by sub-sampling
network architecture • a set of layers, each with one or more planes (feature maps) • each unit on a plane receives input from a small area of the planes in the previous layer → local receptive fields • weights are shared across all positions on a plane → reduces the number of parameters • multiple planes in each layer detect multiple features • once a feature has been detected, its exact position matters less → spatial sub-sampling (local averaging) • result: (partial) invariance to translation, rotation, scale, and deformation
network architecture • [architecture diagram] • C1: convolution, 5 feature maps (e.g. edge, ink, intersection), kernel size 5 x 5 • S1: sub-sampling by a factor of 2 • C2: convolution, 50 feature maps • S2: sub-sampling by a factor of 2 • fully connected layer with 100 hidden units, followed by the 10 digit outputs • a forward-pass sketch follows
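Below is a minimal NumPy forward-pass sketch of that layout, assuming each C/S pair is folded into a single stride-2 convolution; the parameter names (k1, b1, W1, ...), the tanh nonlinearity, and the resulting map sizes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def conv_layer(x, kernels, bias, stride=2):
    """Valid convolution with sub-sampling folded in (stride 2).
    x: (in_maps, H, W); kernels: (out_maps, in_maps, k, k); bias: (out_maps,)."""
    out_maps, in_maps, k, _ = kernels.shape
    H, W = x.shape[1:]
    oh, ow = (H - k) // stride + 1, (W - k) // stride + 1
    y = np.zeros((out_maps, oh, ow))
    for o in range(out_maps):
        for i in range(oh):
            for j in range(ow):
                patch = x[:, i*stride:i*stride+k, j*stride:j*stride+k]
                y[o, i, j] = np.sum(patch * kernels[o]) + bias[o]
    return np.tanh(y)                                   # squashing nonlinearity (illustrative)

def forward(img, params):
    """Forward pass through the sketched layout: C1/S1 (5 maps),
    C2/S2 (50 maps), 100 hidden units, 10 outputs."""
    h = conv_layer(img[None], params["k1"], params["b1"])   # (1, 28, 28) -> (5, 12, 12)
    h = conv_layer(h, params["k2"], params["b2"])           # -> (50, 4, 4)
    h = np.tanh(params["W1"] @ h.ravel() + params["c1"])    # 100 hidden units
    return params["W2"] @ h + params["c2"]                  # 10 output scores
```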
gradient-based learning • backpropagation • output: Yp = F(Xp, W) • loss function: Ep = D(Dp, F(Xp, W)) • Etrain(W): average of Ep over the training set {(X1, D1), …, (XP, DP)} • e.g. squared error: Ep = (Dp − F(Xp, W))² / 2 • Etrain(W) = 1/P · Σp Ep • simplest setting: find W that minimizes Etrain(W) • see the snippet below
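A tiny NumPy illustration of those two quantities; the function names are mine, and the squared-error form matches the slide.

```python
import numpy as np

def pattern_loss(d_p, y_p):
    # Ep = (Dp - F(Xp, W))^2 / 2, summed over the output units
    return 0.5 * np.sum((np.asarray(d_p) - np.asarray(y_p)) ** 2)

def training_error(targets, outputs):
    # Etrain(W) = (1/P) * sum_p Ep over the P training patterns
    return np.mean([pattern_loss(d, y) for d, y in zip(targets, outputs)])
```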
gradient-based learning • if E is differentiable wrt. W, gradient-based optimization can be used to find the minimum • each module computes: Xn = Fn(Xn-1, Wn) • Wn: the module's trainable parameters (a subset of W) • Xn-1: the module's input (the previous module's output) • X0: the input pattern Xp
gradient-based learning • if ∂Ep/∂Xn is known, then ∂Ep/∂Wn and ∂Ep/∂Xn-1 can be computed: • ∂Ep/∂Wn = ∂F/∂W(Wn, Xn-1) · ∂Ep/∂Xn (gradient wrt. the module's parameters) • ∂Ep/∂Xn-1 = ∂F/∂X(Wn, Xn-1) · ∂Ep/∂Xn (error propagated backward to the previous module) • ∂F/∂W(Wn, Xn-1): Jacobian of F wrt. W, evaluated at (Wn, Xn-1) • ∂F/∂X(Wn, Xn-1): Jacobian of F wrt. X, evaluated at (Wn, Xn-1) • J[F]: matrix containing the partial derivatives of all outputs wrt. all inputs, Jki = ∂(Xn)k / ∂xi
gradient-based learning • simplest minimization: gradient descent • W is iteratively adjusted: W(t) = W(t-1) − ε · ∂E/∂W • traditional backprop is a special case of gradient-based learning with: Yn = Wn Xn-1, Xn = F(Yn) • a minimal training-loop sketch follows
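The sketch below ties the update rule to a single traditional-backprop module (Yn = Wn Xn-1, Xn = F(Yn)); the sigmoid squashing function, learning rate, and initialization are illustrative choices, not the paper's settings.

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def train(X, D, epochs=100, eps=0.1, rng=None):
    """Gradient descent on one module Xn = F(Wn Xn-1) with squared-error loss:
    a minimal instance of the update W(t) = W(t-1) - eps * dE/dW."""
    if rng is None:
        rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(D.shape[1], X.shape[1]))
    for _ in range(epochs):
        for x, d in zip(X, D):                  # one pattern at a time
            y = W @ x                           # Yn = Wn Xn-1
            out = sigmoid(y)                    # Xn = F(Yn)
            dE_dout = out - d                   # dE/dXn for Ep = ||d - out||^2 / 2
            dE_dy = dE_dout * out * (1 - out)   # back through the sigmoid
            dE_dW = np.outer(dE_dy, x)          # dE/dWn, via the module's Jacobian
            W -= eps * dE_dW                    # W(t) = W(t-1) - eps * dE/dW
    return W
```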
application • zip-code scanning (and a generalized version over the time domain) • fax reading • similar techniques are used in other digital image recognition tasks (e.g. face recognition, X-ray, MRI, etc.) • a later version (2003): dynamically changing layer parameters