Ancient Greek OCR with Gamera and the Google/Perseus Greek and Latin Collection

Bruce Robertson, Mount Allison University

ἀλήθεια ‘truth’ / Ἀλήθεια
• ‘Breathing’ marks on vowels at beginning of a word
• Accents possible on all vowels
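The two points above can be made concrete with Unicode decomposition. The following is a minimal sketch, not part of the original slides: it uses Python's standard `unicodedata` module to separate each letter of ἀλήθεια into its base character and the combining marks attached to it. The function name `describe_marks` is a hypothetical helper introduced only for illustration.

```python
import unicodedata

def describe_marks(word):
    """Split each letter of `word` into its base character and the names
    of the combining marks (breathings, accents) attached to it."""
    letters = []
    # NFD puts every base letter first, followed by its combining marks.
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and letters:
            letters[-1][1].append(unicodedata.name(ch))
        else:
            letters.append((ch, []))
    return letters

# The word from the slide, written with escapes so the code points are explicit.
word = "\u1f00\u03bb\u03ae\u03b8\u03b5\u03b9\u03b1"  # ἀλήθεια

for base, marks in describe_marks(word):
    print(base, marks)
```

The initial alpha carries a smooth breathing (Unicode calls it COMBINING COMMA ABOVE) and the eta carries an acute accent, so an OCR classifier has to distinguish many more glyph shapes than the 24 bare letters.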

Diversity of Greek Fonts in 19th C.

Other Examples

Greek OCR With Gamera
• Dalitz and Brandt provide an experimental framework
• I added splitting, grouping, SQL output, etc.
• Teams of undergraduates making multiple classifiers
• Based on families of fonts
• Comparing strategies of composite characters, splitting, etc.
• Must also train for Latin scripts used
• Not yet working on post-processing
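The slides do not say how a "splitting" strategy would put a letter back together once its base and diacritics have been recognised separately. As an illustration only, and assuming the recogniser emits a base letter plus a list of Unicode combining marks, recombination could rely on NFC normalization. The names `MARK_ORDER` and `compose` are hypothetical and are not Gamera functions.

```python
import unicodedata

# Assumed rank of each mark after the base letter. NFC only builds the
# precomposed Greek character when breathing/diaeresis precede the accent.
MARK_ORDER = {
    "\u0313": 0,  # smooth breathing (psili)
    "\u0314": 0,  # rough breathing (dasia)
    "\u0308": 0,  # diaeresis
    "\u0301": 1,  # acute
    "\u0300": 1,  # grave
    "\u0342": 1,  # circumflex (perispomeni)
    "\u0345": 2,  # iota subscript
}

def compose(base, marks):
    """Combine a base letter and separately recognised marks into one
    precomposed character where Unicode defines one."""
    ordered = sorted(marks, key=lambda m: MARK_ORDER[m])
    return unicodedata.normalize("NFC", base + "".join(ordered))

# Marks may arrive in any order from the recogniser; sorting fixes that.
print(compose("\u03b1", ["\u0301", "\u0313"]))
```

The composite-character strategy avoids this step by training one class per accented letter, at the cost of many more classes to train.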

Good Results

Systematic Approach to Automated Greek OCR
• Remove the curator from the loop: especially important for journals, monographs, etc.
• Assign classifier by computational means
• Using:
  • Federico Boschetti’s ground-truth-less Greek text evaluator
  • Atlantic Computational Excellence Network, Atlantic Canada’s parallel computing network

Process
• 160 Greek-heavy texts chosen
• Of these, random samples of 10 pages were taken
• Each was processed with each of the 20 classifiers made this summer
• The results were evaluated and given a ‘Boschetti score’ from 0 to 1
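The selection step implied by this process can be sketched as follows. The slides do not state how per-page scores are aggregated, so taking the mean over the sampled pages is an assumption, and the classifier names and score values below are invented placeholders rather than reported results.

```python
from statistics import mean

def best_classifier(scores):
    """Pick the classifier with the highest mean score for one text.

    `scores` maps a classifier name to the list of per-page scores
    (each between 0 and 1) obtained on that text's sample pages.
    """
    averages = {name: mean(vals) for name, vals in scores.items() if vals}
    if not averages:
        raise ValueError("no scores supplied")
    winner = max(averages, key=averages.get)
    return winner, averages[winner]

# Hypothetical scores for one text; real runs would have 20 classifiers
# and 10 sampled pages each.
sample_scores = {
    "classifier_a": [0.81, 0.77, 0.90],
    "classifier_b": [0.62, 0.70, 0.66],
}

name, score = best_classifier(sample_scores)
print(name, round(score, 3))
```

Because each text and classifier pair is scored independently, the work divides naturally across a parallel cluster.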

Google/ABBYY Line Splitting

Gamera’s Text Line Finding (bbox_merging)

Replaced with runlength_smearing
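For readers unfamiliar with the technique, the core idea of run-length smearing is to fill short white gaps between black pixels so that the characters on a line fuse into one blob. The sketch below shows a horizontal pass only, on an image represented as lists of 0 (white) and 1 (black). It is a simplified illustration in plain Python, not Gamera's implementation; `smear_row`, `smear_horizontal` and `max_gap` are hypothetical names, and filling only gaps bounded by black on both sides is a choice made here.

```python
def smear_row(row, max_gap):
    """Fill white runs of at most `max_gap` pixels that lie between
    two black pixels in a single image row."""
    out = list(row)
    last_black = None
    for i, px in enumerate(row):
        if px:
            gap = i - last_black - 1 if last_black is not None else 0
            if 0 < gap <= max_gap:
                for j in range(last_black + 1, i):
                    out[j] = 1
            last_black = i
    return out

def smear_horizontal(image, max_gap):
    """Apply the horizontal smear to every row of a binary image."""
    return [smear_row(row, max_gap) for row in image]

# The 2-pixel gap is filled; the 4-pixel gap and the trailing white stay.
print(smear_row([1, 0, 0, 1, 0, 0, 0, 0, 1, 0], max_gap=2))
```

A full segmentation would typically combine horizontal and vertical passes and then take connected components of the result as candidate text lines.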

Two-step processing

Future Work
• Combining and re-optimizing classifiers?
• Assign classifier based on Latin text
  • Is ‘Oxford’, ‘Clarendon’ or ‘Oxonii’ in the first pages of output?
• Align with Google’s output, and provide Google with corrected Greek
• Implement line-splitting from other OCR engines
• Discover badly OCR’d Greek in others’ output
• Implement OCR correction frameworks described here
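The imprint-keyword idea in the list above could look something like this. It is a sketch under assumptions: the three keywords come from the slide, but the classifier label `oxford_family`, the function `guess_classifier` and the simple case-insensitive substring match are all hypothetical.

```python
def guess_classifier(first_pages, keyword_to_classifier, default=None):
    """Return the classifier associated with the first imprint keyword
    found in the OCR output of a book's opening pages."""
    text = " ".join(first_pages).lower()
    for keyword, classifier in keyword_to_classifier.items():
        if keyword.lower() in text:
            return classifier
    return default

# Keywords from the slide; the classifier label is a placeholder.
OXFORD_HINTS = {
    "Oxford": "oxford_family",
    "Clarendon": "oxford_family",
    "Oxonii": "oxford_family",
}

pages = ["SOPHOCLIS FABULAE", "OXONII E TYPOGRAPHEO CLARENDONIANO"]
print(guess_classifier(pages, OXFORD_HINTS))
```

Substring matching is deliberately loose so that inflected Latin forms such as "Clarendoniano" still match; OCR errors in the title page would need fuzzier matching.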

Common Problems
• Assessments of pre-processing strategies and tools
• Schemas for page description

Thanks
• Colleagues in Dynamic Variorum Editions:
  • Greg Crane at Perseus / Tufts
  • Brian Fuchs at Imperial College
  • Federico Boschetti
• AceNet, especially tech. support of Sergiy Khan
