Combining Shape and Physical Models for Online Cursive Handwriting Synthesis
Jue Wang (University of Washington), Chenyu Wu (Carnegie Mellon University), Ying-Qing Xu (Microsoft Research Asia), Heung-Yeung Shum (Microsoft Research Asia)
International Journal on Document Analysis and Recognition (IJDAR), 2004
Introduction • Handwriting computing techniques (pen-based devices) • Handwriting recognition • makes it possible for computers to understand the information conveyed in handwriting • Handwriting manipulation • handwriting editing, error correction, script searching
Introduction • Handwriting Modeling & Synthesis • Movement-simulation techniques • based on motor models, these try to model the process of handwriting production • they focus on the representation and analysis of real handwriting signals rather than handwriting synthesis
Introduction • Shape-simulation methods • consider the static shape of the handwriting trajectory • more practical than movement-simulation techniques when dynamic information is not available • a straightforward approach: synthesize from collected handwritten glyphs • a learning-based cursive handwriting synthesis approach
Introduction • A successful handwriting synthesis algorithm must make the shapes of synthesized letters consistent with the training samples, and the connections between synthesized letters natural • A novel cursive handwriting synthesis technique • combines the advantages of the shape-simulation and the movement-simulation methods
Outline • Sample collection and segmentation • Learning strategies • Synthesis Strategies • Experimental results • Discussion and Conclusion
Sample Collection • About 200 words • Each letter appears more than 5 times • These handwriting samples are first passed through a low-pass filter and then re-sampled to produce equidistant points
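The re-sampling step above can be sketched as follows. This is our own illustration (the function name and spacing parameter are assumptions, and the low-pass filtering step is omitted):

```python
import numpy as np

def resample_equidistant(points, spacing):
    """Re-sample a pen trajectory so consecutive points are `spacing` apart.

    `points` is an (N, 2) array of (x, y) pen coordinates; this sketches the
    preprocessing described in the slide, not the paper's exact code.
    """
    points = np.asarray(points, dtype=float)
    # Cumulative arc length along the polyline.
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    # Target arc-length positions, equally spaced.
    targets = np.arange(0.0, s[-1], spacing)
    x = np.interp(targets, s, points[:, 0])
    y = np.interp(targets, s, points[:, 1])
    return np.column_stack([x, y])
```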
Sample Segmentation • Overview • Segmentation-based recognition method • Recognition-based segmentation (rely heavily on the performance of the recognition engine) • Level-building • simultaneously outputs the recognition and segmentation results • segmentation and recognition are merged to give an optimal result
A Two-level Framework • Framework of traditional handwriting segmentation approaches • Temporal handwriting sequence {z1,…,zT} • zt is a low-level feature that denotes the coordinates and velocity of the pen at time t
Segmentation • The segmentation problem is to find the identity string {I1,…,In}, with the corresponding segments of the sequence {S1,…,Sn}, S1 = {z1,…,zt1}, …, Sn = {ztn-1,…,zT}, that best explains the sequence
Segmentation • In training the writer-independent segmentation system, the low-level feature-based segmentation algorithm works well only for a small number of writers • A script code is therefore calculated from the handwriting data as a middle-level feature
Middle Level Feature • Five kinds of key points are extracted • points of maximum/minimum x-coordinate (X+, X-) • points of maximum/minimum y-coordinate (Y+, Y-) • crossing points • Average direction of the interval sequence between two adjacent key points
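A minimal sketch of the extrema part of this key-point extraction (crossing-point detection and the average-direction feature are omitted; names are ours):

```python
import numpy as np

def key_points(traj):
    """Indices of local extrema of the x and y coordinates along a trajectory.

    `traj` is an (N, 2) array; returns the X+/X- and Y+/Y- key points as
    index lists (maxima and minima mixed together in this simplified sketch).
    """
    x, y = traj[:, 0], traj[:, 1]

    def extrema(v):
        idx = []
        for i in range(1, len(v) - 1):
            # A local extremum is where the slope changes sign.
            if (v[i] - v[i - 1]) * (v[i + 1] - v[i]) < 0:
                idx.append(i)
        return idx

    return {"X": extrema(x), "Y": extrema(y)}
```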
Middle Level Feature • Samples of each character are divided into several clusters • those in the same cluster have a similar structural topology • Since the length of the script code may differ across samples, the similarity cannot be computed directly • The script code is therefore modeled as a homogeneous Markov chain
Middle Level Feature • Given two script codes T1, T2 • we can compute their stationary distributions π1, π2 and transition matrices A1, A2 • The similarity between the two script codes is measured by comparing these quantities
Middle Level Feature • The positions of π1, π2 and A1, A2 are enforced symmetrically • to balance the variance of the KL divergence and the difference in code length • If both the stationary distributions and the transition matrices of two script codes match well, and their code lengths are almost the same, then d(T1, T2) is close to 1
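The ingredients of this similarity measure can be illustrated as follows. The paper's exact formula for d(T1, T2) is not reproduced here; this sketch only estimates a transition matrix from a script code, computes its stationary distribution, and forms a symmetrized KL divergence (all names are ours):

```python
import numpy as np

def transition_matrix(code, n_states):
    """Estimate a row-stochastic transition matrix from a symbol sequence."""
    A = np.full((n_states, n_states), 1e-6)  # tiny prior avoids zero rows
    for a, b in zip(code[:-1], code[1:]):
        A[a, b] += 1.0
    return A / A.sum(axis=1, keepdims=True)

def stationary(A):
    """Stationary distribution: left eigenvector of A for eigenvalue 1."""
    vals, vecs = np.linalg.eig(A.T)
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    return v / v.sum()

def sym_kl(p, q):
    """Symmetrized KL divergence between two distributions."""
    p, q = np.asarray(p) + 1e-12, np.asarray(q) + 1e-12
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))
```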
Segmentation • After introducing the script code as a middle-level feature, the optimization problem is reformulated over these features • this improves the accuracy of segmentation • and dramatically reduces the computational complexity of level-building
Outline • Sample collection and segmentation • Learning strategies • Synthesis Strategies • Experimental results • Discussion and Conclusion
Learning Strategies • Data alignment • Trajectory matching • Training set alignment • Shape models
Trajectory Matching • Segmentation and reconstruction of on-line handwritten scripts (Pattern Recognition, 1998) • Each piece is a simple arc; points can be equidistantly sampled from it to represent the stroke
Trajectory Matching • Landmark-point-extraction method • pen-down, pen-up points • local extrema of curvature • inflection points of curvature • A handwriting sample can be divided into as many as six pieces • Samples of the same character are mostly composed of the same number of pieces, so they match each other naturally
Trajectory Matching • A handwriting sample can be represented by a point vector • s: number of static pieces segmented from the sample • ni: number of points extracted from the i th piece
Trajectory Matching • The next step is to align the different vectors into a common coordinate frame • estimate an affine transform for each sample that transforms the sample into the coordinate frame • Affine transformations: translation, rotation, scaling
Training Set Alignment • Iterative algorithm (Learning from One Example Through Shared Densities on Transforms, IEEE CVPR 2000) • A deformable-energy-based criterion E is defined over the training set
Training Set Alignment - Algorithm
• Maintain an affine transform matrix Ui for each sample, initialized to the identity
• Compute the deformable-energy-based criterion E
• Repeat until convergence:
  • For each of the six unit affine matrices [14], Aj, j = 1,…,6:
    • Let U'i = Aj Ui
    • Apply U'i to the sample and recalculate the criterion E
    • If E has been reduced, accept U'i; otherwise:
      • Let U'i = Aj^-1 Ui and apply again; if E has been reduced, accept U'i, otherwise revert to Ui
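A toy version of this greedy alignment loop. We substitute a simple spread-around-the-mean energy for the paper's deformable-energy criterion, and the six elementary affine generators below are our own choice, so treat this as a sketch of the control flow only:

```python
import numpy as np

def align(samples, steps=50, eps=0.01):
    """Greedily apply small affine updates (and their inverses) to each
    sample, keeping any update that lowers the spread around the mean shape.
    """
    # Six elementary affine generators: scalings, shears, small rotations.
    gens = [np.array(g, dtype=float) for g in (
        [[1 + eps, 0], [0, 1]], [[1, 0], [0, 1 + eps]],
        [[1, eps], [0, 1]], [[1, 0], [eps, 1]],
        [[1, -eps], [eps, 1]], [[1, eps], [-eps, 1]])]
    X = [np.asarray(s, dtype=float) for s in samples]
    # Stand-in energy: total squared deviation from the mean shape.
    energy = lambda shapes: sum(
        np.sum((x - np.mean(shapes, axis=0)) ** 2) for x in shapes)
    for _ in range(steps):
        improved = False
        for i in range(len(X)):
            for G in gens:
                for M in (G, np.linalg.inv(G)):  # try forward and inverse
                    cand = list(X)
                    cand[i] = X[i] @ M.T
                    if energy(cand) < energy(X):
                        X, improved = cand, True
        if not improved:
            break
    return X
```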
Shape Models • By modeling the distribution of the aligned vectors, new examples can be generated that are similar to those in the training set • As in the Active Shape Model, principal component analysis (PCA) is applied to the data (Statistical Models of Appearance for Computer Vision, draft report, 2000)
Shape Model
• Formally, the covariance of the aligned data is calculated as S = (1/m) Σi (xi − x̄)(xi − x̄)ᵀ
• The eigenvectors pi and corresponding eigenvalues λi of S are computed and sorted so that λ1 ≥ λ2 ≥ …
• The training set is approximated by x ≈ x̄ + P b, where P = (p1 | p2 | … | pt) contains the t eigenvectors corresponding to the largest eigenvalues
• b is a t-dimensional vector given by b = Pᵀ (x − x̄)
• By varying the elements of b, new handwriting trajectories can be generated from this model
• Limits of ±3√λi are applied to the elements bi
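Since the slide follows the standard Active Shape Model construction, the recipe can be sketched directly (function names and the sampling scheme for b are our own):

```python
import numpy as np

def fit_shape_model(X, t):
    """PCA shape model. X is (m, d): m aligned shape vectors of dimension d.
    Returns the mean shape, the top-t eigenvectors P (d, t) and eigenvalues.
    """
    mean = X.mean(axis=0)
    S = np.cov((X - mean).T)                 # covariance of the aligned data
    vals, vecs = np.linalg.eigh(S)           # ascending eigenvalues
    order = np.argsort(vals)[::-1][:t]       # keep the t largest
    return mean, vecs[:, order], vals[order]

def sample_shape(mean, P, lam, rng):
    """Generate a new shape x = mean + P b, with each b_i clipped to
    +/- 3*sqrt(lambda_i) as the slide prescribes."""
    b = rng.normal(scale=np.sqrt(lam))
    b = np.clip(b, -3 * np.sqrt(lam), 3 * np.sqrt(lam))
    return mean + P @ b
```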
Outline • Sample collection and segmentation • Learning strategies • Synthesis Strategies • Experimental results • Discussion and Conclusion
Synthesis Strategies • Generate each individual letter in the word • Then the baselines of these letters are aligned and the letters are juxtaposed in a sequence • Concatenating letters with their neighbors to form cursive handwriting cannot be easily achieved directly • To solve this problem, a delta log-normal model based conditional sampling algorithm is proposed
Delta Log-normal Model • A powerful tool for analyzing rapid human movements • With respect to handwriting generation, the movement of a simple stroke is controlled by its velocity • The magnitude of the velocity is described as the difference of two log-normal functions, |v(t)| = D1 Λ(t; t0, μ1, σ1²) − D2 Λ(t; t0, μ2, σ2²) (Why Handwriting Segmentation Can Be Misleading?, 13th International Conference on Pattern Recognition, 1996) • Parameters: t0: activation time; Di: amplitudes of the impulse commands; μi: mean time delays; σi: response times of the agonist and antagonist systems
Delta Log-normal Model • The angular velocity is calculated as the derivative of the pen direction along the trajectory • Given the curvilinear and angular velocities, the curvature along a stroke piece can be calculated • The static shape of the piece is an arc, characterized by its initial direction θ0 and a constant c0 (the arc length)
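The delta log-normal speed profile, the difference of two log-normal impulse responses, can be sketched as follows (function names and the specific parameter values used in testing are our own):

```python
import numpy as np

def lognormal_impulse(t, t0, mu, sigma):
    """Log-normal response to an impulse command at time t0 (zero before t0)."""
    out = np.zeros_like(t)
    m = t > t0
    z = np.log(t[m] - t0)
    out[m] = (np.exp(-(z - mu) ** 2 / (2 * sigma ** 2))
              / ((t[m] - t0) * sigma * np.sqrt(2 * np.pi)))
    return out

def delta_lognormal_speed(t, t0, D1, D2, mu1, sigma1, mu2, sigma2):
    """|v(t)| = D1*Lambda1 - D2*Lambda2: agonist minus antagonist response."""
    return (D1 * lognormal_impulse(t, t0, mu1, sigma1)
            - D2 * lognormal_impulse(t, t0, mu2, sigma2))
```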
Delta Log-normal Model-Example [Why Handwriting Segmentation Can Be Misleading, 1996 IEEE ICPR]
Conditional Sampling • First, the trajectories of the synthesized handwriting letters are decomposed into static pieces • The first piece of a trajectory is called the head piece, and the last piece is called the tail piece • In the concatenation process, the trajectories of the letters are deformed to produce natural cursive handwriting, by changing the delta log-normal parameters of the head and the tail pieces
Conditional Sampling • A deformation energy is defined for each stroke • A concatenation energy is defined between the i-th letter and the (i+1)-th letter • By minimizing the second and the third terms of the concatenation energy, the two letters are forced to connect with each other smoothly and naturally
Conditional Sampling • The concatenation energy of a whole word is calculated by summing over adjacent letter pairs • We must also ensure that the deformed letters remain consistent with the shape models, which is measured by a sampling energy • The whole energy formulation finally combines the concatenation and sampling energies
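A toy illustration of the concatenation idea, with an assumed energy (endpoint gap plus end-tangent mismatch) standing in for the paper's exact terms, which are not reproduced in the slides:

```python
import numpy as np

def concat_energy(a, b):
    """Toy concatenation energy between trajectory a's tail and b's head:
    squared gap between the endpoints plus mismatch of the end tangents."""
    gap = np.linalg.norm(a[-1] - b[0]) ** 2
    tangent_a = a[-1] - a[-2]
    tangent_b = b[1] - b[0]
    return gap + np.linalg.norm(tangent_a - tangent_b) ** 2

def join_by_translation(a, b):
    """Eliminate the gap term exactly by translating b onto a's endpoint
    (the paper instead deforms the head/tail pieces; this is the simplest
    possible stand-in)."""
    return b + (a[-1] - b[0])
```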
Synthesis - Iterative Approach
1. Randomly generate a vector b(i) for each letter initially
2. Generate trajectories Si of the letters and calculate an affine transform Ti for each letter (transforming it to its desired position)
3. For each pair of adjacent letters {Si, Si+1}, deform the pieces in these letters to minimize the concatenation energy Ec(i, i+1)
4. Project the deformed shapes into the model coordinate frame
5. Update the model parameters
6. If not converged, return to step 2
Discussion & Conclusion • Performance is limited by the samples used for training, since the shape models can only generate novel shapes within the variation of the training samples • Although some experimental results are shown, it is still not known how to make an objective evaluation of the synthesized scripts and compare different synthesis approaches
Markov chains
• A Markov chain on a space X with transition matrix T is a random process (an infinite sequence of random variables) (x(0), x(1), …, x(t), …) satisfying P(x(t) | x(t−1), …, x(0)) = P(x(t) | x(t−1))
• That is, the probability of being in a particular state at time t, given the state history, depends only on the state at time t−1
• If the transition probabilities are fixed for all t, the chain is considered homogeneous
• Example: states x1, x2, x3 with transition matrix
  T = [ 0.7 0.3 0.0 ; 0.3 0.4 0.3 ; 0.0 0.3 0.7 ]
Stationary distribution
• A distribution π is stationary for T if π T = π
• Consider the Markov chain given above: the stationary distribution is π = (1/3, 1/3, 1/3) ≈ (0.33, 0.33, 0.33)
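As a quick numeric check (our own illustration, not from the slides), the claimed stationary distribution can be verified against the example transition matrix:

```python
import numpy as np

# Transition matrix of the example chain (rows sum to 1).
T = np.array([[0.7, 0.3, 0.0],
              [0.3, 0.4, 0.3],
              [0.0, 0.3, 0.7]])

# A stationary distribution pi satisfies pi T = pi.
pi = np.array([1/3, 1/3, 1/3])
print(np.allclose(pi @ T, pi))  # prints True
```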