Lecture 8
Optical Character Recognition
Qurat-ul-Ain (Ainie) Akram, Sarmad Hussain
Center for Language Engineering
Al-Khawarizmi Institute of Computer Science
University of Engineering and Technology, Lahore, Pakistan
Syllable String Creation using lookup table ISSALE 2014
Project Presentation
• Front Page
  • Optical Character Recognition (in English)
  • Optical Character Recognition (in your language)
  • Document Image
  • Output of OCR (recognized syllable strings)
  • Syllable String Recognition Accuracy (correctly recognized syllables / total syllables × 100)
  • Group members' names
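The accuracy figure on the front page can be computed with a few lines of code. The sketch below is only one possible reading of the formula: it assumes the OCR output and the ground truth are already aligned syllable by syllable, and the function name `syllable_accuracy` is hypothetical.

```python
def syllable_accuracy(recognized, reference):
    """Percentage of reference syllables that the OCR recognized correctly.

    Assumption: both lists are aligned, so position i in `recognized`
    corresponds to position i in `reference`.
    """
    if not reference:
        raise ValueError("reference must contain at least one syllable")
    # Count positions where the OCR output matches the ground truth.
    correct = sum(1 for r, t in zip(recognized, reference) if r == t)
    return correct / len(reference) * 100
```

If segmentation inserts or drops syllables, the two lists no longer line up, and an edit-distance alignment would be needed before counting matches.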
Preprocessing
• Line Segmentation
  • Samples of line segmentation
  • Line segmentation accuracy results
  • Samples of incorrect line segmentation
• Syllable/Ligature Segmentation
  • Samples of syllable/ligature segmentation
  • Syllable/ligature segmentation accuracy results
  • Samples of incorrect syllable/ligature segmentation
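The slides do not say which line segmentation method is expected. One common choice is a horizontal projection profile: a row with no ink separates two text lines. The sketch below assumes a binarized image given as a list of rows with 1 for ink and 0 for background; the function name `segment_lines` is hypothetical.

```python
def segment_lines(image):
    """Return (top, bottom) row indices of each text line in a binary image.

    Assumption: `image` is a list of rows, each a list of 0 (background)
    or 1 (ink), and text lines are separated by at least one empty row.
    """
    lines = []
    start = None  # top row of the line currently being scanned
    for y, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # a new text line begins
        elif not has_ink and start is not None:
            lines.append((start, y - 1))   # the line ended on the previous row
            start = None
    if start is not None:                  # the image ends inside a text line
        lines.append((start, len(image) - 1))
    return lines
```

Diacritics that sit far above or below the main bodies can be split into their own "line" by this approach, which is one typical source of the incorrect segmentations the slide asks you to show.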
Pre-processing
• Main body and diacritics disambiguation
Classification and Recognition
• Data Description
  • 15 Main Body Types (DataSet-1)
    • Training Data (35 Tokens)
    • Testing Data (15 Tokens)
    • Image samples
  • Document Images (DataSet-2)
    • Testing Data
      • X Tokens of Y main body types
      • X Tokens of Y diacritics types
    • Image sample
Classification and Recognition Results
• Recognition Results on DataSet-1 using Decision Trees
  • Main body recognition accuracy
  • Diacritics recognition accuracy
• Recognition Results on DataSet-1 using Tesseract
  • Main body recognition accuracy
  • Diacritics recognition accuracy
Classification and Recognition Results
• Recognition Results on DataSet-2 using Decision Trees
  • Main body recognition accuracy
  • Diacritics recognition accuracy
OR
• Recognition Results on DataSet-2 using Tesseract
  • Main body recognition accuracy
  • Diacritics recognition accuracy
Post-processing
• Syllable String Creation
• Syllable String Recognition Accuracy
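Syllable string creation with a lookup table might look like the sketch below. The class labels (`MB1`, `D1`, ...), the output strings, and the names `SYLLABLE_TABLE` and `build_syllable_string` are all hypothetical placeholders; the real table depends on your script and on the labels your classifier emits.

```python
# Hypothetical lookup table:
# (main body class, diacritic class or None) -> syllable string.
SYLLABLE_TABLE = {
    ("MB1", None): "ka",
    ("MB1", "D1"): "ki",
    ("MB2", None): "ga",
    ("MB2", "D2"): "gu",
}


def build_syllable_string(recognized, table=SYLLABLE_TABLE, unknown="?"):
    """Join the syllables for a sequence of (main body, diacritic) pairs.

    Pairs missing from the table are replaced by `unknown`, so
    recognition errors stay visible in the output.
    """
    return " ".join(table.get(pair, unknown) for pair in recognized)
```

For example, `build_syllable_string([("MB1", "D1"), ("MB2", None)])` returns `"ki ga"`. Writing the result of each text line to Output.txt then gives the OCR output.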
Output of OCR
• Input Document Image
• OCR Output
Deliverables to Submit
• Presentation slides
• OCR Complete Code
  • Line segmentation
  • Syllable segmentation
  • Recognition of diacritics and main bodies
  • Syllable string creation using lookup table
  • Output.txt file generation
• DataSet-1
• DataSet-2
• Tesseract traineddata file
Document Image Creation
• Syllable_of_MB1_Samples_1 Syllable_of_MB2_Samples_1 Syllable_of_MB3_Samples_1 Syllable_of_MB4_Samples_1 Syllable_of_MB5_Samples_1 ... Syllable_of_MB15_Samples_1
• Syllable_of_MB1_Samples_2 Syllable_of_MB2_Samples_2 Syllable_of_MB3_Samples_2 Syllable_of_MB4_Samples_2 Syllable_of_MB5_Samples_2 ... Syllable_of_MB15_Samples_2
• Syllable_of_MB1_Samples_3 Syllable_of_MB2_Samples_3 Syllable_of_MB3_Samples_3 Syllable_of_MB4_Samples_3 Syllable_of_MB5_Samples_3 ... Syllable_of_MB15_Samples_3
• Syllable_of_MB1_Samples_4 Syllable_of_MB2_Samples_4 Syllable_of_MB3_Samples_4 Syllable_of_MB4_Samples_4 Syllable_of_MB5_Samples_4 ... Syllable_of_MB15_Samples_4
• ...
• Syllable_of_MB1_Samples_15 Syllable_of_MB2_Samples_15 Syllable_of_MB3_Samples_15 Syllable_of_MB4_Samples_15 Syllable_of_MB5_Samples_15 ... Syllable_of_MB15_Samples_15
Syllable = MB + Diacritics or Syllable = MB
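The layout above can be generated programmatically. This is a minimal sketch, assuming each text line of the document image holds one sample of every main body type in order (row k contains sample k of MB1 through MB15); the function name `document_layout` is hypothetical.

```python
def document_layout(num_main_bodies=15, num_samples=15):
    """Return the document as a list of text lines.

    Each text line is a list of syllable labels: line s holds sample s
    of every main body type, from MB1 to MB<num_main_bodies>.
    """
    return [
        [f"Syllable_of_MB{mb}_Samples_{s}" for mb in range(1, num_main_bodies + 1)]
        for s in range(1, num_samples + 1)
    ]
```

Because the label of every position is known in advance, the same grid can serve as ground truth when measuring segmentation and recognition accuracy on the document image.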
Examples of Document Image