
An Accentual "Spell Checker" for the Masoretic Text

Richard Goerwitz, Information Technology Services, Carleton College



  1. Richard Goerwitz Information Technology Services Carleton College An Accentual "Spell Checker" for the Masoretic Text

  2. What are Masoretic “Accents”? • The Masoretic text is written in a script consisting of: • Consonants • Vowels • Cantillation marks or accents • The accents function: • To denote musical motifs for chant • To highlight phrasal/clausal structure • Accents are useful guides for both linguists and exegetes

  3. The Problem • Despite the accents' utility to both linguists and exegetes... • Few people pay much attention to them • Even fewer can effectively proofread them • As a result: • Publishers find it virtually impossible to produce accentually correct editions of the Hebrew Bible

  4. The Solution • Let publishers continue to employ proofreaders as they have before • But develop advanced error-catching techniques: • Focus on electronic editions • Utilize tractable, cross-platform formats and standards • XML • Unicode, etc. • Develop automated validation/checking tools to minimize human error
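Because the proposal rests on Unicode-encoded electronic editions, the first step of any checker is to pull the accent marks out of the pointed text. The following is a minimal sketch, not taken from the presentation: the function name `accent_names` is hypothetical, and it relies only on the fact that Unicode names the cantillation marks in the Hebrew block with the prefix `HEBREW ACCENT`.

```python
import unicodedata

ACCENT_PREFIX = "HEBREW ACCENT "


def accent_names(text):
    """Return the Unicode names of the cantillation marks in text, in order.

    Hypothetical helper for illustration. Consonants and vowel points are
    skipped because their names start with HEBREW LETTER / HEBREW POINT.
    Note: silluq shares a code point with meteg (U+05BD, HEBREW POINT METEG),
    so it is NOT picked up by this prefix test and would need separate
    handling in a real checker.
    """
    names = []
    for ch in text:
        name = unicodedata.name(ch, "")
        if name.startswith(ACCENT_PREFIX):
            names.append(name[len(ACCENT_PREFIX):])
    return names


# alef + merkha, bet + tipeha (written as escapes to keep the source ASCII)
sample = "\u05d0\u05a5\u05d1\u0596"
print(accent_names(sample))
```

The resulting list of accent names is the kind of token stream that an accentual grammar can then be run against.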

  5. An Accentual Grammar • To automate accent checking, we must • Be able to say, formally, what is (or is not) a correctly accented Hebrew verse • In other words, we must construct an accentual grammar • That grammar must • Be simple and tractable • Easily convertible into a practical computer-based program

  6. Convertibility Problem • To be readily convertible to a computer program, a grammar must be • Not only context-free, but also • LR-parsable • The Masoretic accents are not even context-free • Still less LR-parsable • How, then, can we possibly convert them to a computer program?

  7. Convertibility Solution • To construct a grammar convertible to a computer program, we simply cheat • By “cheat” I mean • We use a grammar that is very close to the “real” accentual grammar, but is • Context-free • LR-parsable • By doing this we • Make it possible to convert our grammar into a computer-based accent checker
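To make the idea of an approximate, context-free accentual grammar concrete, here is a deliberately tiny sketch. It is not the grammar shipped with the Accents program: the rules below are a drastically reduced toy invented for illustration, the nonterminal names merely echo those visible in the program's parse-tree output, and a hand-written recursive-descent recognizer stands in for the LR parser that a parser generator would produce.

```python
# Toy grammar (an assumption for illustration, not the real accentual grammar):
#   verse         := [atnach_clause] silluq_clause
#   atnach_clause := [tifcha_phrase] atnach_phrase
#   silluq_clause := zaqef_phrase silluq_clause
#                  | [tifcha_phrase] silluq_phrase
#   tifcha_phrase := [mereka] tifcha
#   atnach_phrase := [munach] atnach
#   zaqef_phrase  := [munach] zaqef
#   silluq_phrase := [mereka] silluq


def check_verse(accents):
    """Return True if the list of accent names is a verse of the toy grammar."""
    pos = 0

    def take(name):
        # Consume one token if it matches; otherwise leave the input alone.
        nonlocal pos
        if pos < len(accents) and accents[pos] == name:
            pos += 1
            return True
        return False

    def phrase(conjunctive, disjunctive):
        # phrase := [conjunctive] disjunctive, with backtracking on failure.
        nonlocal pos
        start = pos
        take(conjunctive)
        if take(disjunctive):
            return True
        pos = start
        return False

    def atnach_clause():
        nonlocal pos
        start = pos
        phrase("mereka", "tifcha")          # optional tifcha_phrase
        if phrase("munach", "atnach"):
            return True
        pos = start
        return False

    def silluq_clause():
        nonlocal pos
        start = pos
        if phrase("munach", "zaqef"):       # zaqef_phrase silluq_clause
            if silluq_clause():
                return True
            pos = start
            return False
        phrase("mereka", "tifcha")          # optional tifcha_phrase
        if phrase("mereka", "silluq"):
            return True
        pos = start
        return False

    atnach_clause()                          # optional first half of the verse
    return silluq_clause() and pos == len(accents)


good = ["mereka", "tifcha", "munach", "atnach", "mereka", "tifcha", "mereka", "silluq"]
bad = ["atnach", "tifcha"]  # verse never reaches silluq
print(check_verse(good), check_verse(bad))
```

Even at this scale the trade-off described in the presentation is visible: the recognizer looks only at the sequence of accents, so rules that depend on word boundaries, syllable structure, or the distance between accents are simply outside what it can express.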

  8. The Trouble with Cheating • The trouble with cheating, of course, is that this sort of move can render a grammar • Capable of rejecting valid constructs as invalid (false positives) • Capable of accepting invalid constructs as valid (missed errors) • In fact, though, the cheats needed to render our accentual grammar LR-parsable • Cause it to miss only a few invalid constructs • Do not trigger any (known) false positives

  9. Cheating (Continued) • The cheats allowed into the grammar mainly • Gloss over proximity rules (if accent X is too close to accent Y, convert it to accent Z) • Omit rules based on extra-accentual factors like • Word boundaries • Syllable structure • Cause few errors to be missed • Example of missed error: • Mahpak mistaken for yetiv before pashta (not a huge problem for printed texts, since mahpak/yetiv are visually similar)

  10. Proof of the Concept • The accent-checking technique outlined here works • How do we know this? • Because a proof-of-concept computer program exists • Program is simply called Accents • Available at • http://www.goerwitz.com/software/accents/accents-1.1.4.tar.gz • (Program source code includes full accentual grammar)

  11. Sample Error: Background • Exodus 28:1 • Leningrad MS has a stray dot • Makes revia look like zaqef

  12. Sample Error: Printed Edition • BHS transcribes stray dot • Replaces revia with zaqef

  13. Sample Error: Resolution • Accents parse tree for Exodus 28:1

      Exodus 28:1
      0 silluq_clause
        1 atnach_clause
          2 tifcha_clause
            3 tevir_clause
              4 pazer_phrase pazer
              4 tevir_clause
                5 geresh_phrase munach telishaqetanna azla geresh
                5 tevir_phrase ERROR
            3 tifcha_phrase mereka tifcha
          2 atnach_phrase atnach
        1 silluq_clause
          2 zaqef_phrase zaqefgadol
          2 silluq_clause
            3 tifcha_clause
              4 tevir_phrase darga tevir
              4 tifcha_phrase mereka tifcha
            3 silluq_phrase mereka silluq
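The depth-numbered listing above can be post-processed mechanically to report where in the verse structure the parser gave up. The sketch below assumes a line-oriented format of `depth label [accents...]` with the literal token `ERROR` marking a failed node; that layout is inferred from the slide, not from the program's documentation, and `find_errors` is a hypothetical helper.

```python
def find_errors(lines):
    """Return, for each node marked ERROR, the labels from the root down to it.

    Assumed input: one node per line, "<depth> <label> [tokens...]".
    Lines that do not start with a depth number (such as the verse
    reference header) are ignored.
    """
    path = []      # labels of the current node's ancestors, indexed by depth
    errors = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        depth, label, rest = int(fields[0]), fields[1], fields[2:]
        del path[depth:]          # climb back up to this node's parent
        path.append(label)
        if "ERROR" in rest:
            errors.append(list(path))
    return errors


listing = """\
Exodus 28:1
0 silluq_clause
1 atnach_clause
2 tifcha_clause
3 tevir_clause
4 pazer_phrase pazer
4 tevir_clause
5 geresh_phrase munach telishaqetanna azla geresh
5 tevir_phrase ERROR
3 tifcha_phrase mereka tifcha
2 atnach_phrase atnach
""".splitlines()

for error_path in find_errors(listing):
    print(" > ".join(error_path))
```

Reporting the full path rather than just the failing node shows a proofreader which clause of the verse to inspect, here the tevir phrase that should follow the geresh phrase in the first half of the verse.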

  14. Conclusion • Masoretic accents tractable (sort of) • Representable with an approximate LR-parsable grammar • Model-able as a computer program • Using such a program, we can • Speed the production of Hebrew Bible texts • Virtually eliminate accentual errors
