Using Word Based Features for Word Clustering

Using Word Based Features for Word Clustering. Department of Electronics and Communications, Faculty of Engineering Cairo University. Research Team: Farhan M. A. Nashwan Prof. Dr. Mohsen A. A. Rashwan. Presented By: Farhan M. A. Nashwan.

Presentation Transcript


  1. Using Word Based Features for Word Clustering. Department of Electronics and Communications, Faculty of Engineering, Cairo University. Research Team: Farhan M. A. Nashwan, Prof. Dr. Mohsen A. A. Rashwan. Presented By: Farhan M. A. Nashwan. The Thirteenth Conference on Language Engineering, 11-12 December 2013.

  2. Contribution: • Reduce the vocabulary size • Increase recognition speed

  3. Proposed Approach: Generated word image → Preprocessing and word segmentor → Word grouping → Clustering → Groups and clusters for holistic recognition

  4. Grouping: • Extract subwords (PAWs) • Extract dots and diacritics • Use them to select the group

  5. Grouping: Generated word image → Preprocessing and word segmentor → Separation of secondaries using contour analysis → Recognition of secondaries using SVM → Grouping process → Groups
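A minimal sketch of the secondaries-separation step, for illustration only. The slide uses contour analysis; this sketch swaps in 8-connected component labeling on a binary image (1 = ink). It treats any component smaller than an assumed fraction (`size_ratio`) of the largest component as a secondary, and calls a secondary "upper" or "down" by comparing its mean row with the mean row of the main bodies. The helper names, the threshold and the above/below rule are all assumptions, and the SVM that recognizes each secondary is not shown.

```python
from collections import deque


def connected_components(img):
    """Return the 8-connected foreground components of a binary image (list of rows)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(pixels)
    return comps


def split_secondaries(img, size_ratio=0.2):
    """Hypothetical helper: split components into main bodies, upper and down secondaries.

    Assumptions: a component smaller than size_ratio * (largest component) is a
    secondary; it is 'upper' if its mean row lies above the mean row of the bodies.
    """
    comps = connected_components(img)
    if not comps:
        return [], [], []
    largest = max(len(p) for p in comps)
    bodies = [p for p in comps if len(p) >= size_ratio * largest]
    secondaries = [p for p in comps if len(p) < size_ratio * largest]
    body_mid = sum(y for p in bodies for y, _ in p) / sum(len(p) for p in bodies)

    def mean_row(p):
        return sum(y for y, _ in p) / len(p)

    upper = [p for p in secondaries if mean_row(p) < body_mid]
    down = [p for p in secondaries if mean_row(p) >= body_mid]
    return bodies, upper, down
```

On a real word image, `len(bodies)` would only approximate the PAW count; touching ("sticking") secondaries and noise would break the simple size rule, which is the kind of problem the slides list as challenges.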

  6. Grouping Example (grouping code = (PAWs, down secondaries, upper secondaries)): • PAW=1, Down Sec.=2 & 1, Upper Sec.=2: grouping code (1, 21, 2) • PAW=3, Down Sec.=0, Upper Sec.=2: grouping code (3, 0, 2) • PAW=4, Down Sec.=1 & 1, Upper Sec.=1 & 2: grouping code (4, 11, 12) • PAW=3, Down Sec.=2, Upper Sec.=2 & 1: grouping code (3, 2, 21) • PAW=2, Down Sec.=0, Upper Sec.=2: grouping code (2, 0, 2)
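The grouping codes in the example above can be reproduced by a small function: the PAW count, then the down-secondary labels concatenated in order, then the upper-secondary labels concatenated, with 0 when there are none. This is a sketch inferred from the five examples only; the label meaning (for instance the number of dots reported by the SVM) and the helper names `grouping_code` and `group_words` are assumptions.

```python
from collections import defaultdict


def grouping_code(paw_count, down_labels, upper_labels):
    """Build a (PAWs, down, upper) grouping code, as inferred from the slide examples.

    down_labels / upper_labels: per-secondary labels in reading order (assumed to be
    single-digit classes such as a dot count); an empty list is encoded as 0.
    """
    def encode(labels):
        return int("".join(str(x) for x in labels)) if labels else 0

    return (paw_count, encode(down_labels), encode(upper_labels))


def group_words(words):
    """Hypothetical helper: words is an iterable of (word_id, paws, down, upper)."""
    groups = defaultdict(list)
    for word_id, paws, down, upper in words:
        groups[grouping_code(paws, down, upper)].append(word_id)
    return dict(groups)
```

Words sharing a code fall into the same group, so a recognizer only needs to compare an input word against the vocabulary entries in its group.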

  7. Grouping is based on: • PAWs • Down secondaries • Upper secondaries. Challenges: • Sticking • Sensitivity to noise. Treatments: • Overlapping • SVM

  8. Clustering: • Complements grouping • Uses the LBG algorithm • Is applied to groups that contain many words • Uses the Euclidean distance. Groups → Feature extraction → Clustering using LBG → Clusters & groups
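A minimal sketch of LBG (Linde-Buzo-Gray) clustering with the Euclidean distance, assuming the textbook variant: start from the mean of all feature vectors, split every codeword into c(1+ε) and c(1−ε), and refine with nearest-codeword (Lloyd) iterations until the distortion stops improving. The function name, `eps`, `tol` and `max_iter` are assumptions, not values from the slides.

```python
def _dist2(a, b):
    # Squared Euclidean distance; it picks the same nearest codeword as Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def _nearest(v, codebook):
    return min(range(len(codebook)), key=lambda k: _dist2(v, codebook[k]))


def lbg(vectors, n_clusters, eps=0.01, tol=1e-6, max_iter=100):
    """LBG clustering. The codebook grows by doubling, so the result has the
    smallest power of two >= n_clusters codewords. Returns (codebook, labels)."""
    if not vectors:
        raise ValueError("need at least one feature vector")
    codebook = [_mean(vectors)]
    while len(codebook) < n_clusters:
        # Split each codeword c into c*(1+eps) and c*(1-eps).
        codebook = [[x * (1 + s) for x in c] for c in codebook for s in (eps, -eps)]
        prev = float("inf")
        for _ in range(max_iter):
            labels = [_nearest(v, codebook) for v in vectors]
            distortion = sum(_dist2(v, codebook[k]) for v, k in zip(vectors, labels)) / len(vectors)
            for k in range(len(codebook)):
                members = [v for v, lab in zip(vectors, labels) if lab == k]
                if members:  # an empty cell keeps its old codeword
                    codebook[k] = _mean(members)
            # Stop when the relative drop in mean distortion is small.
            if prev - distortion <= tol * distortion:
                break
            prev = distortion
    return codebook, [_nearest(v, codebook) for v in vectors]
```

The multiplicative split follows the classic description; a zero-mean feature set would need an additive perturbation instead, since c(1±ε) of a zero vector gives two identical codewords.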

  9. Features: 1- ICC: Image Centroid and Cells 2- DCT: Discrete Cosine Transform 3- BDCT: Block Discrete Cosine Transform 4- DCT-4B: Discrete Cosine Transform 4-Blocks 5- BDCT+ICC: hybrid of BDCT and ICC 6- ICC+DCT: hybrid of DCT and ICC 7- ICZ: Image Centroid and Zone 8- DCT+ICZ: hybrid of DCT and ICZ 9- DTW: Dynamic Time Warping 10- Moment invariant features

  10. Results: Table 1: Clustering rate of Simplified Arabic font using different features

  11. Table 2: Processing time for feature extraction and clustering of Simplified Arabic font using different features

  12. Conclusion: • Grouping and clustering words based on their holistic features: • increases recognition speed • removes unnecessary entries from the vocabulary • The total average time of ICC or Moments (0.29 ms) is better than that of the other methods, • but their clustering rates are not the best (98.69% for ICC and 82.61% for Moments). • DCT has a higher clustering rate (99.19%), but the worst time (~12 ms). • Taking both criteria (clustering rate and time) into account, ICC may be a good compromise.

  13. Thank you for your attention.

  14. Image Centroid and Cells (ICC) features: • Number of black pixels • Vertical transitions from black to white • Horizontal transitions from black to white
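A minimal sketch of per-cell features of this kind, assuming the word image is binary (1 = black) and split into a uniform grid of cells. The slide does not say how the image centroid defines the cells, so the uniform `grid_rows` x `grid_cols` grid, the counting of transitions only inside each cell, and the function name are all assumptions.

```python
def icc_features(img, grid_rows=4, grid_cols=4):
    """Per cell: [black pixel count, vertical black->white, horizontal black->white].

    Assumption: uniform grid; transitions that cross a cell border are not counted.
    """
    h, w = len(img), len(img[0])
    feats = []
    for gr in range(grid_rows):
        r0, r1 = gr * h // grid_rows, (gr + 1) * h // grid_rows
        for gc in range(grid_cols):
            c0, c1 = gc * w // grid_cols, (gc + 1) * w // grid_cols
            black = vert = horiz = 0
            for r in range(r0, r1):
                for c in range(c0, c1):
                    if not img[r][c]:
                        continue
                    black += 1
                    # Black pixel followed by a white one to the right (horizontal scan).
                    if c + 1 < c1 and not img[r][c + 1]:
                        horiz += 1
                    # Black pixel followed by a white one below (vertical scan).
                    if r + 1 < r1 and not img[r + 1][c]:
                        vert += 1
            feats.extend([black, vert, horiz])
    return feats
```

With a 4 x 4 grid this yields a 48-value vector, which stays cheap to compute; that is consistent with the low feature-extraction time the conclusion reports for ICC.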

  15. Discrete Cosine Transform (DCT): • Apply the DCT to the whole word image. • Form the feature vector by reading the DCT coefficients in zigzag order. • Usually keep the most significant DCT coefficients (160 coefficients).
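A minimal pure-Python sketch of DCT features: an orthonormal 2-D DCT-II (computed separably, rows then columns), a JPEG-style zigzag scan, and truncation to the first `k` coefficients (160 on the slide). The function names and the zigzag convention are assumptions; in practice `scipy.fft.dctn(img, norm="ortho")` computes the same transform much faster.

```python
import math


def dct1(vec):
    """Orthonormal 1-D DCT-II."""
    n = len(vec)
    return [
        (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
        * sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i, v in enumerate(vec))
        for k in range(n)
    ]


def dct2(img):
    """Separable 2-D DCT-II: transform every row, then every column."""
    rows = [dct1(row) for row in img]
    cols = [dct1(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # back to [row][col] layout


def zigzag(coeffs):
    """Read a 2-D array along anti-diagonals, alternating direction (JPEG order)."""
    m, n = len(coeffs), len(coeffs[0])
    order = sorted(
        ((r, c) for r in range(m) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )
    return [coeffs[r][c] for r, c in order]


def dct_features(img, k=160):
    """First k zigzag-ordered DCT coefficients (fewer if the image is small)."""
    return zigzag(dct2(img))[:k]
```

The zigzag scan puts the low-frequency coefficients, which carry most of the word's overall shape, at the front of the vector, so truncating it keeps the most significant ones.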

  16. Block Discrete Cosine Transform (BDCT): • Apply the DCT to each cell. • Take the average of the differences between all the DCT coefficients.

  17. Discrete Cosine Transform 4-Blocks (DCT-4B): 1- Compute the center of gravity of the input image. 2- Divide the word image into four parts, taking the center of gravity as the origin. 3- Apply the DCT to each part. 4- Concatenate the features taken from each part to form the feature set of the given word.
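The four DCT-4B steps can be sketched as follows, assuming a binary image (1 = black), the center of gravity rounded to the nearest pixel, and `k` zigzag coefficients kept per part (zero-padded when a part is small or empty). `k`, the padding rule and the function names are assumptions; the DCT helpers are repeated so the block runs on its own.

```python
import math


def dct1(vec):
    n = len(vec)
    return [
        (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
        * sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i, v in enumerate(vec))
        for k in range(n)
    ]


def dct2(img):
    rows = [dct1(row) for row in img]
    cols = [dct1(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def zigzag(coeffs):
    m, n = len(coeffs), len(coeffs[0])
    order = sorted(((r, c) for r in range(m) for c in range(n)),
                   key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))
    return [coeffs[r][c] for r, c in order]


def center_of_gravity(img):
    """Step 1: mean (row, col) of the black pixels, rounded to a pixel index."""
    pts = [(r, c) for r, row in enumerate(img) for c, v in enumerate(row) if v]
    if not pts:
        return len(img) // 2, len(img[0]) // 2
    return (round(sum(r for r, _ in pts) / len(pts)),
            round(sum(c for _, c in pts) / len(pts)))


def dct_4b_features(img, k=40):
    cy, cx = center_of_gravity(img)
    # Step 2: four parts with the center of gravity as the origin.
    parts = [[row[:cx] for row in img[:cy]], [row[cx:] for row in img[:cy]],
             [row[:cx] for row in img[cy:]], [row[cx:] for row in img[cy:]]]
    feats = []
    for part in parts:
        # Step 3: DCT of each part (an empty part contributes only zeros).
        coeffs = zigzag(dct2(part))[:k] if part and part[0] else []
        # Step 4: concatenate, padding every part to exactly k values.
        feats.extend(coeffs + [0.0] * (k - len(coeffs)))
    return feats
```

Fixing the length of each part's contribution keeps the concatenated vectors comparable under the Euclidean distance used for clustering.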

  18. Image Centroid and Zone (ICZ): Compute the average distance between the points in each zone and the centroid of the word image.
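A minimal sketch of ICZ, assuming the "points" are the black pixels of a binary image, the centroid is their mean position, and the image is split into a uniform grid of zones. Zones without black pixels get 0. The grid size and the function name are assumptions.

```python
import math


def icz_features(img, zone_rows=4, zone_cols=4):
    """Average Euclidean distance from the black pixels of each zone to the image centroid."""
    pts = [(r, c) for r, row in enumerate(img) for c, v in enumerate(row) if v]
    if not pts:
        return [0.0] * (zone_rows * zone_cols)
    # Centroid of the whole word image (mean position of its black pixels).
    cy = sum(r for r, _ in pts) / len(pts)
    cx = sum(c for _, c in pts) / len(pts)
    h, w = len(img), len(img[0])
    sums = [0.0] * (zone_rows * zone_cols)
    counts = [0] * (zone_rows * zone_cols)
    for r, c in pts:
        zone = (r * zone_rows // h) * zone_cols + (c * zone_cols // w)
        sums[zone] += math.hypot(r - cy, c - cx)
        counts[zone] += 1
    return [s / n if n else 0.0 for s, n in zip(sums, counts)]
```

Because every distance is measured from the word's own centroid, the feature is insensitive to where the word sits inside the image.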

  19. DTW (Dynamic Time Warping) Features: DTW is an algorithm for measuring the similarity between two sequences. The distance between two time series x1, ..., xM and y1, ..., yN is D(M,N), which is calculated with a dynamic programming approach. Three types of features are extracted from the binarized images and used in our DTW techniques: • X-axis and Y-axis histogram profiles • Profile features (upper, down, left and right) • Foreground/background transitions
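A minimal sketch of the DTW distance D(M,N) between two feature sequences, assuming the standard unconstrained recurrence D(i, j) = d(x_i, y_j) + min(D(i-1, j), D(i, j-1), D(i-1, j-1)), with D(0, 0) = 0 and infinity on the other border cells. The absolute difference as the local cost d and the function name are assumptions; the authors may use a windowed or weighted variant.

```python
def dtw_distance(x, y, dist=lambda a, b: abs(a - b)):
    """Dynamic-programming DTW distance between sequences x (length M) and y (length N)."""
    m, n = len(x), len(y)
    inf = float("inf")
    # D[i][j] = cost of the best alignment of x[:i] with y[:j].
    D = [[inf] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = dist(x[i - 1], y[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[m][n]
```

For example, a column histogram of a word and the same histogram with one column repeated align at zero cost, which is why DTW tolerates small horizontal stretching between word images.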

  20. DTW (Dynamic Time Warping) Features: Figure 1: The four profile features: (A) left profile, (B) up profile, (C) down profile, (D) right profile.
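A minimal sketch of the four profiles in Figure 1, assuming each profile value is the number of white pixels between an image border and the first black pixel: per row for the left and right profiles, per column for the up and down profiles. An empty row or column gets the full width or height. These conventions and the function name are assumptions; each resulting list is a sequence that a DTW distance can compare.

```python
def profiles(img):
    """Left/right profiles (one value per row) and up/down profiles (one per column)."""
    h, w = len(img), len(img[0])

    def first_black(seq, default):
        # Index of the first black pixel, or `default` if there is none.
        return next((i for i, v in enumerate(seq) if v), default)

    cols = [list(col) for col in zip(*img)]
    return {
        "left": [first_black(row, w) for row in img],
        "right": [first_black(row[::-1], w) for row in img],
        "up": [first_black(col, h) for col in cols],
        "down": [first_black(col[::-1], h) for col in cols],
    }
```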

  21. The Moment Invariant Features: Hu moments: Hu defined seven values, computed from central moments up to order three.
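A minimal pure-Python sketch of the seven Hu invariants for a binary word image (1 = black), using the usual normalized central moments η_pq = μ_pq / μ00^(1+(p+q)/2). The function name is an assumption; OpenCV's `cv2.moments` followed by `cv2.HuMoments` computes the same quantities.

```python
def hu_moments(img):
    """Seven Hu moment invariants of a binary image given as a list of rows."""
    pts = [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("image has no foreground pixels")
    m00 = float(len(pts))
    xc = sum(x for x, _ in pts) / m00
    yc = sum(y for _, y in pts) / m00

    def eta(p, q):
        # Normalized central moment (translation and scale invariant).
        mu = sum((x - xc) ** p * (y - yc) ** q for x, y in pts)
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        a ** 2 + b ** 2,
        (n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2) + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        (3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2) - (n30 - 3 * n12) * b * (3 * a ** 2 - b ** 2),
    ]
```

Seven numbers per word make this the most compact feature vector in the list, which fits its low processing time; the lower clustering rate reported for Moments suggests that so few values separate similar word shapes less well.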

  23. Moments: The moment invariant descriptors are calculated and fed into the feature vector.
