
Word Spotting DTW


maeko

Presentation Transcript


  1. Word Spotting DTW

  2. Word Spot DTW • Introduction • The Basic Idea • Pruning • DTW • Matching Words With DTW • Experimental Results • Summary

  3. Introduction • Libraries contain an enormous amount of hand-written historical documents. • They would like to make them available electronically. • Such large collections can only be accessed efficiently if a searchable index exists. • The current state-of-the-art approach is to create the index manually.

  4. Introduction – cont. • The quality of historical documents is degraded due to faded ink, stained paper, etc. • Traditional Optical Character Recognition (OCR) techniques, which usually recognize words character by character, fail on such documents.

  5. Introduction – cont.

  6. The Basic Idea • For handwritten manuscripts written by a single author, the images of multiple instances of the same word are likely to look similar. • The word spotting idea provides an alternative approach to index generation.

  7. Word Spotting • Each page in the document collection is segmented into words. • The different instances of a word are clustered together using image matching. • A human can tag the n most interesting clusters for indexing with the appropriate ASCII equivalent.

  8. Matching • Good matching performance can be achieved by a technique that: • Skews, resizes and aligns two candidate words. • Compares the words pixel-by-pixel. • We will use DTW.

  9. Pruning • Running a matching algorithm on every word pair becomes expensive as the collection grows. • Pruning techniques are used to discard unlikely matches before matching.

  10. Pruning Techniques • Pruning of word pairs based on the area and aspect ratio of their bounding boxes. • Require words to have the same number of descenders (strokes below the baseline). • The idea is to require that two candidate words have similar values for these pruning statistics.
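
The pruning rules on this slide can be sketched as a cheap pre-filter that runs before any matching. This is only an illustration: the `WordBox` record, the `may_match` name and the two tolerance values are hypothetical and are not taken from the slides, which do not state how "similar" is quantified.

```python
from dataclasses import dataclass


@dataclass
class WordBox:
    """Hypothetical per-word pruning statistics (names are assumptions)."""
    width: int        # bounding-box width in pixels
    height: int       # bounding-box height in pixels
    descenders: int   # number of strokes below the lower baseline


def may_match(a, b, area_tol=0.5, aspect_tol=0.5):
    """Return True if the pair survives pruning and should be matched.

    area_tol and aspect_tol are assumed thresholds on the ratio
    smaller/larger of the two values (1.0 means identical).
    """
    # Rule from the slide: same number of descenders.
    if a.descenders != b.descenders:
        return False
    # Rule from the slide: similar bounding-box area.
    area_a, area_b = a.width * a.height, b.width * b.height
    area_ratio = min(area_a, area_b) / max(area_a, area_b)
    # Rule from the slide: similar bounding-box aspect ratio.
    aspect_a, aspect_b = a.width / a.height, b.width / b.height
    aspect_ratio = min(aspect_a, aspect_b) / max(aspect_a, aspect_b)
    return area_ratio >= area_tol and aspect_ratio >= aspect_tol
```

Only pairs for which `may_match` returns True would be passed on to the more expensive matching step.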

  11. Ascenders Upper Baseline Lower Baseline Descenders

  12. DTW • Used to compute a distance between two time series. • A time series is a list of samples taken from a signal ordered by time. • Naive approach: resample one of them and then compare the series sample-by-sample. • This does not produce intuitive results, because it compares samples that might not correspond well.
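
The naive approach can be sketched as follows. The linear-interpolation resampling and the squared-difference cost are assumptions made for the illustration; the slide does not say which resampling or distance is meant, and the function names are hypothetical.

```python
def resample(series, n):
    """Linearly resample a list of numbers to n samples (assumed scheme)."""
    m = len(series)
    if n == 1 or m == 1:
        return [float(series[0])] * n
    out = []
    for i in range(n):
        pos = i * (m - 1) / (n - 1)      # fractional position in the input
        lo = int(pos)
        hi = min(lo + 1, m - 1)
        frac = pos - lo
        out.append(series[lo] * (1 - frac) + series[hi] * frac)
    return out


def naive_distance(a, b):
    """Resample b to the length of a, then compare sample-by-sample."""
    b_resampled = resample(b, len(a))
    return sum((x - y) ** 2 for x, y in zip(a, b_resampled))


# Two series with the same peak, shifted by one sample: the naive
# comparison penalizes both the old and the new peak position.
print(naive_distance([0, 1, 0, 0], [0, 0, 1, 0]))
```

The shifted peak is counted as a mismatch twice, which is the kind of unintuitive result the slide refers to.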

  13. DTW • DTW recovers optimal alignments between sample points in the two time series. • [Figure: alignment between the sample points of two time series, plotted against time]

  14. Comparison between Naive & DTW • [Figure: two pairs of time series plotted against time; in the naive comparison point i is aligned with point i, in the DTW comparison point i may be aligned with point i+2] • Any distance (Euclidean, Manhattan, …) which aligns the i-th point on one time series with the i-th point on the other will produce a poor similarity score. • A non-linear (elastic) alignment produces a more intuitive similarity measure, allowing similar shapes to match even if they are out of phase in the time axis.

  15. DTW • The DTW distance between two time series $x_1 \ldots x_m$ and $y_1 \ldots y_n$ is $D(m,n)$. • $D(i,j) = \min\{D(i,j-1),\ D(i-1,j),\ D(i-1,j-1)\} + d(i,j)$ • The local distance $d(i,j)$ between samples $x_i$ and $y_j$ varies with the application. • This calculation realizes a local continuity constraint.
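
The recurrence on this slide maps directly onto a dynamic-programming table. The sketch below is a minimal illustration, not the presenters' implementation: the boundary condition D(0,0) = 0 with all other border cells set to infinity is an assumption (the slide does not state one), and the squared difference is just one possible choice for d(i,j).

```python
import math


def dtw_distance(x, y, d=lambda a, b: (a - b) ** 2):
    """DTW distance D(m, n) between sequences x and y.

    d is the local distance; squared difference is an assumed default.
    """
    m, n = len(x), len(y)
    # D[i][j] holds the cost of the best alignment of x[:i] with y[:j].
    # Assumed boundary: D[0][0] = 0, every other border cell is infinite.
    D = [[math.inf] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Recurrence from the slide: best of the three predecessors
            # plus the local distance between x_i and y_j.
            D[i][j] = d(x[i - 1], y[j - 1]) + min(
                D[i][j - 1], D[i - 1][j], D[i - 1][j - 1]
            )
    return D[m][n]


# A peak shifted by one sample is absorbed by the elastic alignment.
print(dtw_distance([0, 1, 0, 0], [0, 0, 1, 0]))
```

Allowing only the three predecessors (i, j-1), (i-1, j) and (i-1, j-1) is what keeps the alignment monotone and without jumps.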

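  16. Warping Function • [Figure: alignment grid spanned by Time Series A (index $i_s$) and Time Series B (index $j_s$), with axis marks 1, n and m and a path running from $p_1$ through $p_s$ to $p_k$] • To find the best alignment between A and B one needs to find the path through the grid $P = p_1, \ldots, p_s, \ldots, p_k$, with $p_s = (i_s, j_s)$, which minimizes the total distance between them. • P is called a warping function.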

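  17. Time-Normalized Distance Measure • [Figure: the same alignment grid, with the path from $p_1$ through $p_s$ to $p_k$] • Time-normalized distance between A and B: $D(A,B) = \frac{\sum_{s=1}^{k} d(p_s)\, w_s}{\sum_{s=1}^{k} w_s}$ • $d(p_s)$: distance between $i_s$ and $j_s$. • $w_s > 0$: weighting coefficient. • Best alignment path between A and B: $P_0 = \arg\min_P D(A,B)$.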
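
A warping path $p_1, \ldots, p_k$ can be recovered by backtracking through the DTW table. The sketch below rests on several assumptions that are not in the slides: the absolute difference as local distance, the boundary condition D(0,0) = 0, and weights $w_s = 1$, so that the normalized value is the accumulated cost of the recovered path divided by its length k. With unit weights this is the normalized cost of the minimum-total-cost path, which is not guaranteed to be the minimum of the normalized measure over all paths.

```python
import math


def dtw_path(a, b, d=lambda u, v: abs(u - v)):
    """Return (total cost, path) where path is a list of 1-based (i_s, j_s)."""
    m, n = len(a), len(b)
    D = [[math.inf] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0  # assumed boundary condition
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = d(a[i - 1], b[j - 1]) + min(
                D[i - 1][j - 1], D[i - 1][j], D[i][j - 1]
            )
    # Backtrack from (m, n) towards (1, 1), always moving to the
    # cheapest of the three allowed predecessor cells.
    path = []
    i, j = m, n
    while i > 0 and j > 0:
        path.append((i, j))
        steps = [
            (D[i - 1][j - 1], i - 1, j - 1),
            (D[i - 1][j], i - 1, j),
            (D[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return D[m][n], path


def time_normalized_distance(a, b):
    """Accumulated path cost divided by path length (assumes w_s = 1)."""
    total, path = dtw_path(a, b)
    return total / len(path)
```

Dividing by the path length makes distances between pairs of different lengths more comparable, since longer paths otherwise accumulate more cost.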

  18. Matching words with DTW

  19. Matching words with DTW • The inter-character and intra-character spacing is subject to larger variations. • DTW offers a more flexible way to compensate for these variations than linear scaling. • We first normalize the slant and skew angle of candidate images. • From each word, four features per image column are extracted and combined into a single time series.

  20. Matching Words With DTW • For each image I with height h and width w, we extract a time series: • $X(I) = x_1, \ldots, x_w$ • $x_i = (f_1(I,i),\ f_2(I,i),\ f_3(I,i),\ f_4(I,i))$ • $f_1, \ldots, f_4$ are the four features extracted per image column.

  21. Matching Words With DTW • In order to run the DTW algorithm on two time series X(I) and Y(J), we define a local distance function: • $d(x_i, y_j) = \sum_{k=1}^{4} (f_k(I,i) - f_k(J,j))^2$ • Now, the DTW algorithm can be run to determine a warping path between X and Y: • $D(X,Y) = \sum_{k} d(x_{i_k}, y_{j_k})$, summed over the steps $(i_k, j_k)$ of the warping path.
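
Putting the local distance and the DTW recurrence together for word images might look like the sketch below. Each word is assumed to be already converted into a list of 4-tuples, one per image column; the function names and the boundary condition D(0,0) = 0 are assumptions for the illustration, and only two table rows are kept in memory as a design choice.

```python
import math


def local_distance(xi, yj):
    """Sum of squared differences over the per-column features."""
    return sum((p - q) ** 2 for p, q in zip(xi, yj))


def word_distance(X, Y):
    """Accumulated local distance along the best warping path of X and Y.

    X and Y are lists of feature tuples, one tuple per image column.
    """
    n = len(Y)
    # prev is row i-1 of the DTW table, cur is row i.
    prev = [0.0] + [math.inf] * n   # assumed boundary: D(0,0) = 0
    for xi in X:
        cur = [math.inf] * (n + 1)
        for j in range(1, n + 1):
            cur[j] = local_distance(xi, Y[j - 1]) + min(
                cur[j - 1], prev[j], prev[j - 1]
            )
        prev = cur
    return prev[n]


# The same column pattern, one column wider in Y, still matches exactly.
X = [(0, 0, 0, 0), (1, 2, 3, 1), (0, 0, 0, 0)]
Y = [(0, 0, 0, 0), (1, 2, 3, 1), (1, 2, 3, 1), (0, 0, 0, 0)]
print(word_distance(X, Y))
```

Because the features are summed without weighting, they would need comparable value ranges for this distance to be meaningful; how the features are scaled is not stated on the slide.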

  22. DTW Features • Projection Profiles • Word Profiles • Upper word profiles • Lower word profiles • Background/Ink transitions

  23. Projection Profile • Projection profiles capture the distribution of ink along one dimension in a word image. • A vertical projection profile is computed by summing the intensity values in each image column separately: • $PP(I,c) = \sum_{r=1}^{h} (255 - I(r,c))$
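
The projection profile formula can be sketched in a few lines. The image representation is an assumption: a list of rows, each a list of 8-bit grey values where 0 is black ink and 255 is white background, which is what the term 255 - I(r,c) suggests.

```python
def projection_profile(image):
    """Vertical projection profile: one value per image column.

    image is assumed to be a list of h rows with w grey values (0..255).
    """
    h = len(image)
    w = len(image[0])
    # For every column c, sum the inverted intensities over all rows r,
    # so that dark (ink) pixels contribute large values.
    return [sum(255 - image[r][c] for r in range(h)) for c in range(w)]


# 2 x 3 toy image: column 0 is blank, column 1 is all ink,
# column 2 has one ink pixel.
image = [
    [255, 0, 255],
    [255, 0, 0],
]
print(projection_profile(image))
```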

  24. (a) original image: slant/skew/baseline-normalized, cleaned. (b) normalized projection profile.

  25. Word Profiles • Word profiles capture part of the outlining shape of a word. • Both upper and lower word profiles are used. • They are computed by going along the upper (lower) boundary of a word’s bounding box and recording, for each image column, the distance to the nearest “ink” pixel in that column.

  26. Word Profiles • Due to a number of factors, some image columns may not contain ink pixels. • Therefore, these gaps are closed by linearly interpolating between the two closest points.
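
A possible way to compute the word profiles and close the gaps is sketched below. Several details are assumptions, since the slides do not specify them: ink is detected with a fixed grey-value threshold, distances are measured in rows from the top (upper profile) or bottom (lower profile) of the image, and gaps at the left or right edge copy the nearest known value because there is nothing to interpolate between.

```python
def word_profiles(image, ink_threshold=128):
    """Return (upper, lower) profiles; None marks columns without ink."""
    h, w = len(image), len(image[0])
    upper, lower = [], []
    for c in range(w):
        ink_rows = [r for r in range(h) if image[r][c] < ink_threshold]
        if ink_rows:
            upper.append(ink_rows[0])            # distance from the top
            lower.append(h - 1 - ink_rows[-1])   # distance from the bottom
        else:
            upper.append(None)
            lower.append(None)
    return upper, lower


def fill_gaps(profile):
    """Close None gaps by linear interpolation between known neighbours."""
    known = [c for c, v in enumerate(profile) if v is not None]
    if not known:
        return [0.0] * len(profile)   # assumed fallback for an empty image
    out = []
    for c, v in enumerate(profile):
        if v is not None:
            out.append(float(v))
            continue
        left = max((k for k in known if k < c), default=None)
        right = min((k for k in known if k > c), default=None)
        if left is None:
            out.append(float(profile[right]))   # edge gap: copy neighbour
        elif right is None:
            out.append(float(profile[left]))    # edge gap: copy neighbour
        else:
            t = (c - left) / (right - left)
            out.append(profile[left] + t * (profile[right] - profile[left]))
    return out
```

For example, `fill_gaps([0, None, 2])` yields `[0.0, 1.0, 2.0]`: the empty middle column receives the value halfway between its two neighbours.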

  27. Upper Boundary

  28. Lower Boundary

  29. Background/Ink Transitions • The features so far do not capture the inner structure of a word. • This feature records, for every image column, the number of transitions from background to ink pixels: • Ink pixels are determined by a threshold. • nbit(I, c).
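
Counting background-to-ink transitions per column could be sketched as below. The threshold value of 128, the function name and the convention that the area above the image counts as background are assumptions; the slide only states that a threshold is used.

```python
def ink_transitions(image, ink_threshold=128):
    """Number of background-to-ink transitions in every image column.

    image is assumed to be a list of rows of grey values (0 = black ink).
    """
    h, w = len(image), len(image[0])
    counts = []
    for c in range(w):
        n = 0
        prev_is_ink = False   # assumed: above the image is background
        for r in range(h):
            is_ink = image[r][c] < ink_threshold
            if is_ink and not prev_is_ink:
                n += 1        # entered an ink run from background
            prev_is_ink = is_ink
        counts.append(n)
    return counts


# Column 0 crosses two separate strokes, column 1 crosses one.
image = [
    [0, 255],
    [255, 255],
    [0, 0],
    [0, 255],
]
print(ink_transitions(image))
```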

  30. Experimental Results • Data sets and processing • Results

  31. Data Sets And Processing • Experiments were conducted on two test sets of different quality: • Acceptable quality (set 1). • Very degraded quality (set 2). • The tests are divided into four sets: • 15 images in test set 1. • Entire test set 1. • 32 images in test set 2. • Entire test set 2.

  32. Test Sets

  33. Results • SC • Shape context matching. • XOR • The images are aligned to compensate for shear and scale changes and then a difference image is computed. • EDM • Euclidean distance map. Larger regions are weighted more heavily.

  34. Results

  35. Summary & Conclusions • The DTW approach performs better than a number of other techniques. • Accuracy. • Speed. • Future work will focus on improvements in speed and accuracy. • Pruning. • Optimizations in DTW.
