Coursework for ISSALE - 2014 Project Demonstration
SINHALA LANGUAGE OCR
• Kasun Perera
• Chamila Liyanage
• Tharaka Viswakula
• Laksri Wijerathna
Sinhala Script consists of:
• 18 vowels
• 40 consonants
• 18 modifiers
• other symbols (rakaranshaya, yansaya)
Font: Abhaya, Font Size: 12
Document Image
The document image contains 16 different character types, with 11 samples of each character type.
Line and Main Body Segmentation
• All lines were segmented correctly
• Number of lines in the input image: 9
• Program outputs 9 line segments
• 100% accuracy
• All main bodies were segmented correctly (no diacritics)
• 100% accuracy
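The slides do not say which line segmentation method was used. As an illustration only, here is a minimal sketch of one common approach, a horizontal projection profile, which splits a binarized page wherever a row contains no ink. The function name `segment_lines` and the toy image are assumptions made for this example, not part of the project.

```python
def segment_lines(binary_image):
    """Return (start_row, end_row) pairs, end exclusive, one per text line.

    binary_image: list of rows, each a list of 0 (background) or 1 (ink).
    Hypothetical helper for illustration; not the project's actual code.
    """
    # Horizontal projection profile: number of ink pixels in each row.
    profile = [sum(row) for row in binary_image]

    lines = []
    start = None  # first row of the line currently being scanned, if any
    for y, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = y                  # entering a text line
        elif ink == 0 and start is not None:
            lines.append((start, y))   # leaving a text line
            start = None
    if start is not None:              # text line touches the bottom edge
        lines.append((start, len(profile)))
    return lines


# Toy 7x5 image (made up): two text lines separated by a blank row.
toy = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(segment_lines(toy))  # [(1, 3), (4, 6)]
```

Line segmentation accuracy can then be checked the way the slide reports it, by comparing the number of returned segments with the known number of lines in the input image.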
Decision Tree Recognition Results
• Creation of training data (35) and test data (15)
• Decision tree created with Weka from the training data
• Accuracy tested on the test data
Overall accuracy: 70%
Poorly recognized characters: 702 - නි / 708 - ල් / 711 - සි / 712 - ත්
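An overall accuracy figure and a list of poorly recognized characters can both be derived from the classifier's predictions on the test data. The sketch below shows one way to compute them. The function `accuracy_report` and the labels are hypothetical and made up for illustration. They are not the project's data, and the project itself used Weka rather than this code.

```python
from collections import Counter


def accuracy_report(true_labels, predicted_labels):
    """Return (overall_accuracy, per_class_accuracy) for two label lists.

    Hypothetical helper for illustration; not the project's actual code.
    """
    total = Counter(true_labels)
    # Count correct predictions per true class.
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    overall = sum(correct.values()) / len(true_labels)
    per_class = {label: correct[label] / n for label, n in total.items()}
    return overall, per_class


# Made-up labels for four character classes A-D (not the project's data).
true_labels = ["A", "A", "B", "B", "C", "C", "C", "C", "D", "D"]
predicted = ["A", "A", "B", "C", "C", "C", "C", "C", "D", "A"]

overall, per_class = accuracy_report(true_labels, predicted)
print(overall)    # 0.8
print(per_class)  # {'A': 1.0, 'B': 0.5, 'C': 1.0, 'D': 0.5}

# Classes with the lowest per-class accuracy are the "poorly recognized" ones.
worst = sorted(label for label, acc in per_class.items() if acc < overall)
print(worst)      # ['B', 'D']
```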
Tesseract Recognition Results: overall accuracy 93.181%
Complete OCR, DT (Decision Tree) Method: overall accuracy 28%
Complete OCR, Tesseract: overall accuracy 92.8%
Conclusion
Test dataset (15)
• Tesseract accuracy: 93%
• DT accuracy: 70%
Document image
• Tesseract accuracy: 92.8%
• DT accuracy: 28%
ස්තුතියි...! (Thank you...!)