Multi-lingual Mathematical Document Recognition by InftyReader
Oct. 20, 2008, Milano @Science Conference
M. Suzuki
Kyushu University
InftyProject (Research group)
Science Accessibility Net (NPO)

InftyReader
• InftyReader:
  • OCR software to recognize mathematical documents.
  • Output: XML (IML, MathML), LaTeX, MS Word, etc.
http://www.inftyproject.org

Image → Accessible Doc
• Input: Scanned Image, PDF
• InftyReader: IML; LaTeX, MathML, MS Word, etc.
• ChattyInfty (Infty): Speech Output, Braille
• Other Developers: DAISY, Braille

Demo 1
• Handbook of Mathematical Formulas and Integrals (A. Jeffrey)
  • Geometry (pp. 13-15)
  • Series (pp. 11-12)
• University Textbook
• PDF recognition
  • J. Math. Soc. Japan, Vol. 53, pp. 485-500
  • Partially Colored PDF

InftyReader Flow
• Layout analysis.
  • Text areas (including math), figures, tables
  • Line segmentation
• Recognition of ordinary texts and separation of math expressions.
• Recognition of math expressions.
• Output of the results.
  • Logical structure analysis
  • Output into various formats.
Current version: combination of three different OCR engines, Infty + two commercial OCRs (Toshiba + Media Drive)

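The last step of the flow, output into various formats, can be pictured as one recognized expression tree handed to several serializers. The following is only a minimal sketch: the node classes `Sym`, `Frac` and `Sup` are hypothetical and are not InftyReader's internal IML format, which the slides do not show. The MathML element names (`mi`, `mn`, `mfrac`, `msup`) are standard presentation MathML.

```python
from dataclasses import dataclass
from typing import Union
from xml.sax.saxutils import escape

# Hypothetical expression nodes (NOT the real IML structures).
@dataclass
class Sym:
    text: str            # an identifier or a number, e.g. "x" or "2"

@dataclass
class Frac:
    num: "Node"
    den: "Node"

@dataclass
class Sup:
    base: "Node"
    exp: "Node"

Node = Union[Sym, Frac, Sup]

def to_latex(n: Node) -> str:
    """Serialize the tree as LaTeX source."""
    if isinstance(n, Sym):
        return n.text
    if isinstance(n, Frac):
        return r"\frac{" + to_latex(n.num) + "}{" + to_latex(n.den) + "}"
    if isinstance(n, Sup):
        return "{" + to_latex(n.base) + "}^{" + to_latex(n.exp) + "}"
    raise TypeError(f"unknown node: {n!r}")

def to_mathml(n: Node) -> str:
    """Serialize the same tree as presentation MathML."""
    if isinstance(n, Sym):
        tag = "mn" if n.text.isdigit() else "mi"
        return f"<{tag}>{escape(n.text)}</{tag}>"
    if isinstance(n, Frac):
        return f"<mfrac>{to_mathml(n.num)}{to_mathml(n.den)}</mfrac>"
    if isinstance(n, Sup):
        return f"<msup>{to_mathml(n.base)}{to_mathml(n.exp)}</msup>"
    raise TypeError(f"unknown node: {n!r}")

# x squared over y, rendered in both formats from a single tree.
expr = Frac(Sup(Sym("x"), Sym("2")), Sym("y"))
print(to_latex(expr))
print(to_mathml(expr))
```

Keeping a single tree and adding one function per target format is what makes it cheap to support a further output later.
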
The new version
• For further improvement of ordinary text recognition + multi-lingual recognition.
• Introduction of the ABBYY FineReader engine.
• Method and effect …

Methods
Method 1 (Recognition of words)
• F : Nühama-gun,
• E : Niiharna-gun,
• I : Nii/ιama-gun,
• Result : Niihamagun,

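The slides do not spell out how the three candidate readings are merged into one result. One plausible way to picture it, assuming a word dictionary is available, is to pick the dictionary entry that is on average closest to all candidates. The sketch below uses Python's standard `difflib.SequenceMatcher`; the function name `pick_word` and the three-word dictionary are hypothetical, chosen only for illustration.

```python
from difflib import SequenceMatcher

def pick_word(candidates, dictionary):
    """Return the dictionary word with the highest summed similarity
    to all OCR candidate readings."""
    def score(word):
        return sum(SequenceMatcher(None, word, c).ratio() for c in candidates)
    return max(dictionary, key=score)

# The three engine outputs shown on the slide (trailing commas dropped).
candidates = ["Nühama-gun", "Niiharna-gun", "Nii/ιama-gun"]

# Hypothetical dictionary; a real one would hold many more entries.
dictionary = ["Niihama-gun", "Nagahama-gun", "Niigata-ken"]

best = pick_word(candidates, dictionary)
print(best)
```

Each engine makes a different mistake here ("ii" read as "ü", "m" read as "rn", "h" read as "/ι"), so no single candidate is correct, yet all three stay close to the same dictionary word.
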
Methods
Method 2 (Separation of math expressions)

The new version
• For further improvement of ordinary text recognition + multi-lingual recognition.
• Introduction of the ABBYY FineReader engine.
• Method and effect …
• Demo …
• We need further dictionaries of different languages. Dictionaries of short words of length < 5 are sufficient.

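As a small illustration of the dictionary requirement, the following sketch builds such a short-word list from any word list. The helper name `short_word_dictionary` and the sample words are assumptions for the example; the slides do not describe how the dictionaries are built or stored.

```python
def short_word_dictionary(words, max_len=5):
    """Keep the unique, lowercased words shorter than max_len characters."""
    return sorted({w.lower() for w in words if len(w) < max_len})

# Hypothetical mixed-language sample (English, German, French).
sample = ["The", "of", "theorem", "und", "Gleichung", "est", "fonction", "the"]
print(short_word_dictionary(sample))
```
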
“InftyReader”: OCR for mathematical documents
Thank you!
InftyProject: http://www.inftyproject.org/
sAccessNet: http://www.sciaccess.net/