Richard Goerwitz Information Technology Services Carleton College An Accentual "Spell Checker" for the Masoretic Text
What are Masoretic “Accents”? • The Masoretic text is written in a script consisting of: • Consonants • Vowels • Cantillation marks or accents • The accents function: • To denote musical motifs for chant • To highlight phrasal/clausal structure • Accents are useful guides for both linguists and exegetes
The Problem • Despite the accents' utility to both linguists and exegetes... • Few people pay much attention to them • Even fewer can effectively proofread them • As a result: • Publishers find it virtually impossible to produce accentually correct editions of the Hebrew Bible
The Solution • Let publishers continue to employ proofreaders as they have before • But develop advanced error-catching techniques: • Focus on electronic editions • Utilize tractable, cross-platform formats and standards • XML • Unicode, etc. • Develop automated validation/checking tools to minimize human error
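As a small illustration of why a Unicode-encoded edition is tractable: the Hebrew cantillation marks occupy the contiguous code-point range U+0591 to U+05AE, so the accents of a word can be pulled out with a few lines of Python. This is only a sketch; the function name `accent_names` is invented for illustration and is not part of any tool mentioned in these slides.

```python
import unicodedata
from typing import List

# Unicode block positions of the Hebrew accents (cantillation marks).
ACCENT_FIRST = 0x0591  # HEBREW ACCENT ETNAHTA
ACCENT_LAST = 0x05AE   # HEBREW ACCENT ZINOR


def accent_names(text: str) -> List[str]:
    """Return the Unicode names of the cantillation marks in text, in order.

    Limitation: silluq shares its code point (U+05BD) with meteg, which
    lies outside this range, so this sketch does not report it.
    """
    return [
        unicodedata.name(ch)
        for ch in text
        if ACCENT_FIRST <= ord(ch) <= ACCENT_LAST
    ]


# The first word of Genesis 1:1, written with escapes so that the
# consonants, vowel points, and the single accent (U+0596) are explicit.
word = "\u05d1\u05b0\u05bc\u05e8\u05b5\u05d0\u05e9\u05b4\u05c1\u0596\u05d9\u05ea"
print(accent_names(word))  # ['HEBREW ACCENT TIPEHA']
```

A checker would feed a sequence like this, one verse at a time, to the grammar-based validation step.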
An Accentual Grammar • To automate accent checking, we must • Be able to say, formally, what is (or is not) a correctly accented Hebrew verse • In other words, we must construct an accentual grammar • That grammar must • Be simple and tractable • Be easily convertible into a practical computer-based program
Convertibility Problem • To be readily convertible to a computer program, a grammar must be • Not only context-free, but also • LR-parsable • The Masoretic accents are not even context-free • Still less LR-parsable • How, then, can we possibly convert them to a computer program?
Convertibility Solution • To construct a grammar convertible to a computer program, we simply cheat • By “cheat” I mean • We use a grammar that is very close to the “real” accentual grammar, but is • Context-free • LR-parsable • By doing this we • Make it possible to convert our grammar into a computer-based accent checker
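To make the idea of a simplified, mechanically checkable grammar concrete, here is a minimal sketch. The three rules below are a hypothetical toy fragment that borrows phrase names from the Accents output; they are not the grammar shipped with Accents. The recognizer is hand-written with one token of backtracking rather than generated by an LR parser generator, which is enough for a fragment this small.

```python
from typing import List

# Hypothetical toy grammar (an assumption for illustration only):
#   silluq_clause -> [tifcha_phrase] silluq_phrase
#   tifcha_phrase -> [mereka] tifcha
#   silluq_phrase -> [mereka] silluq


def is_valid(accents: List[str]) -> bool:
    """Return True if the accent sequence matches the toy silluq_clause."""
    pos = 0

    def phrase(head: str) -> bool:
        # Match an optional 'mereka' followed by the required head accent.
        # On failure, restore the position so the caller can try another rule.
        nonlocal pos
        start = pos
        if pos < len(accents) and accents[pos] == "mereka":
            pos += 1
        if pos < len(accents) and accents[pos] == head:
            pos += 1
            return True
        pos = start
        return False

    phrase("tifcha")           # optional tifcha_phrase
    if not phrase("silluq"):   # required silluq_phrase
        return False
    return pos == len(accents)  # reject trailing, unparsed accents


print(is_valid(["mereka", "tifcha", "mereka", "silluq"]))  # True
print(is_valid(["mereka", "zaqef", "mereka", "silluq"]))   # False
```

A real checker would need many more rules and would report where the parse failed instead of returning a bare boolean.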
The Trouble with Cheating • The trouble with cheating, of course, is that this sort of move can render a grammar • Capable of rejecting valid constructs as invalid (false positives) • Capable of accepting invalid constructs as valid (missed errors) • In fact, though, the cheats needed to render our accentual grammar LR-parsable • Cause it to miss only a few invalid constructs • Do not trigger any (known) false positives
Cheating (Continued) • The cheats allowed into the grammar mainly • Gloss over proximity rules (if accent X is too close to accent Y, convert it to accent Z) • Omit rules based on extra-accentual factors like • Word boundaries • Syllable structure • Cause few errors to be missed • Example of missed error: • Mahpak mistaken for yetiv before pashta (not a huge problem for printed texts, since mahpak/yetiv are visually similar)
Proof of the Concept • The accent-checking technique outlined here works • How do we know this? • Because a proof-of-concept computer program exists • Program is simply called Accents • Available at • http://www.goerwitz.com/software/accents/accents-1.1.4.tar.gz • (Program source code includes full accentual grammar)
Sample Error: Background • Exodus 28:11 • Leningrad MS has a stray dot • Makes revia look like zaqef
Sample Error: Printed Edition • BHS transcribes stray dot • Replaces revia with zaqef
Sample Error: Resolution • Accents parse tree for Exodus 28:1
Exodus 28:1
0 silluq_clause
  1 atnach_clause
    2 tifcha_clause
      3 tevir_clause
        4 pazer_phrase pazer
        4 tevir_clause
          5 geresh_phrase munach telishaqetanna azla geresh
          5 tevir_phrase ERROR
      3 tifcha_phrase mereka tifcha
    2 atnach_phrase atnach
  1 silluq_clause
    2 zaqef_phrase zaqefgadol
    2 silluq_clause
      3 tifcha_clause
        4 tevir_phrase darga tevir
        4 tifcha_phrase mereka tifcha
      3 silluq_phrase mereka silluq
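A listing like this one is easy to post-process. The sketch below assumes a line-oriented format of "depth, node label, then accents or the word ERROR", which is inferred from the slide and may not match the program's actual output; `error_paths` is a hypothetical helper, not part of Accents. It reports the chain of clauses leading to each flagged phrase.

```python
from typing import List

# Excerpt of the listing, in the ASSUMED "depth label tokens..." format.
LISTING = """\
0 silluq_clause
1 atnach_clause
2 tifcha_clause
3 tevir_clause
4 pazer_phrase pazer
4 tevir_clause
5 geresh_phrase munach telishaqetanna azla geresh
5 tevir_phrase ERROR
3 tifcha_phrase mereka tifcha
2 atnach_phrase atnach
"""


def error_paths(listing: str) -> List[str]:
    """Return the root-to-node path of every node marked ERROR."""
    stack: List[str] = []   # labels of the current node's ancestors
    paths: List[str] = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # skip blank lines and headers such as the verse name
        depth, label = int(fields[0]), fields[1]
        del stack[depth:]   # drop nodes at this depth or deeper
        stack.append(label)
        if "ERROR" in fields[2:]:
            paths.append(" > ".join(stack))
    return paths


for path in error_paths(LISTING):
    print(path)
```

For the excerpt above, the single reported path ends in `tevir_clause > tevir_phrase`, the phrase flagged on the slide.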
Conclusion • Masoretic accents tractable (sort of) • Representable with an approximate LR-parsable grammar • Model-able as a computer program • Using such a program, we can • Speed the production of Hebrew Bible texts • Virtually eliminate accentual errors