Creation of a Russian-English Translation Program Karen Shiells
Purpose • Object-oriented approach • Interactive machine translation • Designed for aid, not independent translation • Explore algorithms used in machine translation • Identify grammatical obstacles to translation • Create a base to expand later
Scope of Study • Machine translation is and will be imperfect • Modern translation uses statistical methods • Project is limited to: • Separating base words from morphological endings • Constructing syntax trees from source text • Generating simple English output from tree • Identifying words already known to the program
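The first item in the project scope, separating base words from morphological endings, can be sketched as follows. This is a minimal illustration only: the ending list, the function name `split_ending`, and the transliterated example words are assumptions, not the author's data or code.

```python
# Hypothetical, deliberately tiny list of transliterated noun endings.
ENDINGS = ["ami", "ov", "oj", "am", "a", "u", "y", "e"]

def split_ending(word, known_stems):
    """Return (stem, ending), preferring a split whose stem is already known."""
    candidates = [(word, "")]  # the word may carry no ending at all
    for ending in ENDINGS:
        if word.endswith(ending) and len(word) > len(ending):
            candidates.append((word[: -len(ending)], ending))
    for stem, ending in candidates:
        if stem in known_stems:
            return stem, ending
    # No known stem: fall back to the split with the longest ending.
    return max(candidates, key=lambda c: len(c[1]))

stems = {"knig", "stol"}               # "book", "table"
print(split_ending("knigu", stems))    # ('knig', 'u')
print(split_ending("stolami", stems))  # ('stol', 'ami')
```

Checking candidate stems against the dictionary first is one way to cover the last scope item as well, since a hit identifies a word already known to the program.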
Other Research • Part-of-speech tagging: • Uses probability to identify parts of speech • Applied to unknown words and structures • Complex labeling systems, beyond conventional parts of speech • Translation algorithms: • Massive dictionaries store words and information • Aided by verb categorization • Omit unknown words and translate without them • Usually comprehensible, but require human revision
Old Methods • Direct Translation • First method • Rearranges sentences without parsing • Based on rules of transfer for specific languages • Interlingua • From era of international languages • Uses one representation as an intermediary • Intermediary is usually a constructed language • Easier to add language pairs
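The claim that an interlingua makes it easier to add language pairs can be made concrete with a small count. The sketch below assumes the usual simplification that a direct system needs one module per ordered language pair, while an interlingua system needs one analyzer and one generator per language; the function names are illustrative.

```python
def direct_modules(n_languages):
    """One translation module for every ordered (source, target) pair."""
    return n_languages * (n_languages - 1)

def interlingua_modules(n_languages):
    """One analyzer into and one generator out of the intermediary per language."""
    return 2 * n_languages

for n in (2, 3, 5, 10):
    print(n, direct_modules(n), interlingua_modules(n))
```

Under this assumption, adding a sixth language to a five-language system means ten new direct modules but only two new interlingua modules.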
Syntactic Transfer • Similar to interlingua • Generates syntax tree using specific parser • Rearranges tree to fit target structure • Uses specific generation method to form output • Entire algorithm specific to one language pair • Best quality translations • Relatively new • Not as common in commercial software
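The three stages named on this slide (parse, rearrange the tree, generate) might look roughly like the sketch below. It skips the parser and starts from a hand-built tree; the `Node` class, the role labels, the single reordering rule, and the toy lexicon are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                     # e.g. "S", "SUBJ", "VERB", "OBJ"
    word: str = ""                 # filled in only on leaves
    children: list = field(default_factory=list)

# Hypothetical transfer rule: English wants subject, verb, object.
ENGLISH_ORDER = {"S": ["SUBJ", "VERB", "OBJ"]}

def transfer(node):
    """Return a copy of the tree with children reordered for the target language."""
    kids = [transfer(c) for c in node.children]
    order = ENGLISH_ORDER.get(node.label)
    if order:
        # Labels not named in the rule are pushed to the end, in their old order.
        kids.sort(key=lambda c: order.index(c.label) if c.label in order else len(order))
    return Node(node.label, node.word, kids)

def generate(node, lexicon):
    """Walk the tree left to right and look each leaf up in the lexicon."""
    if not node.children:
        return [lexicon.get(node.word, node.word)]
    out = []
    for child in node.children:
        out.extend(generate(child, lexicon))
    return out

# "knigu chitaet mal'chik": object, verb, subject in the source order.
source = Node("S", children=[Node("OBJ", "knigu"),
                             Node("VERB", "chitaet"),
                             Node("SUBJ", "mal'chik")])
lexicon = {"mal'chik": "boy", "chitaet": "reads", "knigu": "book"}
print(" ".join(generate(transfer(source), lexicon)))  # boy reads book
```

The output has no articles, which is the kind of detail a pair-specific generation step would have to supply.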
Alternative Structures • Valency • Stores number of complements for each word • Type of complements not specified • Occupies less space in dictionary • Phrase-Structure Representation • Most familiar: noun phrase, verb phrase, etc. • Breaks sentence into superstructures • Puts terminal symbols only in leaves • Non-terminal symbols for branches
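Both structures on this slide are small enough to show side by side. In the sketch below, the valency table stores only a count per verb (counting the subject as a complement is an assumption), and the phrase-structure tree is a nested tuple whose terminal symbols appear only in the leaves. All names and example words are illustrative.

```python
# Valency: only the number of complements is stored, not their types.
VALENCY = {"spat'": 1, "chitat'": 2, "davat'": 3}  # sleep, read, give

def is_saturated(verb, complements):
    """True when the verb has exactly as many complements as its valency."""
    return len(complements) == VALENCY[verb]

# Phrase structure: (label, child, ...) with words only at the leaves.
TREE = ("S",
        ("NP", "mal'chik"),
        ("VP", ("V", "chitaet"), ("NP", "knigu")))

def leaves(tree):
    """Collect the terminal symbols of a phrase-structure tree, left to right."""
    label, *rest = tree
    if len(rest) == 1 and isinstance(rest[0], str):
        return [rest[0]]
    words = []
    for child in rest:
        words.extend(leaves(child))
    return words

print(is_saturated("chitat'", ["mal'chik", "knigu"]))  # True
print(leaves(TREE))  # ["mal'chik", 'chitaet', 'knigu']
```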
Dependency Trees • Uses words as nodes, not just leaves • Examples: • Verb dependent on subject • Objects dependent on verb • Adjectives dependent on nouns • Prepositions vary by type of prepositional phrase • Easier to verify agreement between words • Occupies less space
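A dependency tree that uses words as nodes, with agreement verified at attachment time, could be sketched like this. The `Word` class, its feature names, and the `attach` helper are hypothetical and only show why agreement checks are simple in this representation.

```python
from dataclasses import dataclass, field

@dataclass
class Word:
    text: str
    pos: str
    case: str = ""
    number: str = ""
    gender: str = ""
    dependents: list = field(default_factory=list)  # every word is a node

def agrees(head, dep, features=("case", "number", "gender")):
    """True when head and dependent share all the listed features."""
    return all(getattr(head, f) == getattr(dep, f) for f in features)

def attach(head, dep):
    """Attach dep under head only if the two words agree."""
    if agrees(head, dep):
        head.dependents.append(dep)
        return True
    return False

noun = Word("knigu", "noun", "acc", "sg", "fem")
print(attach(noun, Word("novuju", "adj", "acc", "sg", "fem")))  # True
print(attach(noun, Word("novyj", "adj", "nom", "sg", "masc")))  # False
```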
Object Orientation • Object-oriented approach allows more flexibility • Endings, cases, and declensions are classes • Fewer hard-coded rules • Methods for locating dependents are in classes • Modular design allows gradual changes • Changes in lexical analysis do not affect parsing • Changes in dictionary do not affect translation
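As a rough picture of endings and declensions as classes, the sketch below keeps the inflection data in objects rather than in hard-coded rules. The class names, the three sample endings of the feminine -a declension, and the `analyze` method are assumptions for illustration, not the program's actual design.

```python
class Ending:
    """One inflectional ending together with the case and number it marks."""
    def __init__(self, suffix, case, number):
        self.suffix, self.case, self.number = suffix, case, number

    def strip(self, word):
        """Return the stem if word carries this ending, else None."""
        if word.endswith(self.suffix) and len(word) > len(self.suffix):
            return word[: -len(self.suffix)]
        return None

class Declension:
    """A named set of endings; new declensions are data, not new code."""
    def __init__(self, name, endings):
        self.name = name
        self.endings = endings

    def analyze(self, word):
        """List every (stem, case, number) reading this declension allows."""
        readings = []
        for ending in self.endings:
            stem = ending.strip(word)
            if stem is not None:
                readings.append((stem, ending.case, ending.number))
        return readings

FEMININE_A = Declension("feminine -a", [
    Ending("a", "nom", "sg"),     # kniga
    Ending("u", "acc", "sg"),     # knigu
    Ending("oj", "instr", "sg"),  # knigoj
])
print(FEMININE_A.analyze("knigu"))  # [('knig', 'acc', 'sg')]
```

Because the parser would only see the readings returned by `analyze`, the ending tables can change without touching it, which is the modularity the slide describes.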
Verb Typing • Divides verbs into categories, for example: • Transitive • Intransitive • Directional or Non-directional motion • Condenses structure storage • Dictionary stores only the type of a verb • Particular structures are derived from the general type • Code can apply to general structures, not specific verbs
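The storage saving from verb typing can be illustrated with two small tables: one frame per type, and only a type name per verb. The frames, the verbs, and the function name below are assumptions chosen for the example.

```python
# One general structure per verb type (hypothetical frames).
VERB_FRAMES = {
    "transitive": ["nominative", "accusative"],
    "intransitive": ["nominative"],
}

# The dictionary stores only the type of each verb.
VERB_TYPES = {"chitat'": "transitive", "spat'": "intransitive"}  # read, sleep

def expected_cases(verb):
    """Look up the general structure for a verb through its type."""
    return VERB_FRAMES[VERB_TYPES[verb]]

print(expected_cases("chitat'"))  # ['nominative', 'accusative']
print(expected_cases("spat'"))    # ['nominative']
```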
Dictionary • Open, save, add, remove, and search functions • Stores: • Russian nominative • English nominatives • Part of speech • Noun/pronoun attributes • Verb types
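A dictionary with the five functions listed above might be sketched as follows. The JSON file format, the class and method names (with `load` standing in for "open"), and the entry layout are assumptions; the slides do not say how the original stores its data.

```python
import json

class Dictionary:
    """Toy dictionary keyed by the Russian nominative form."""
    def __init__(self):
        self.entries = {}

    def add(self, russian, english, pos, **attributes):
        # attributes holds noun/pronoun attributes or a verb type
        self.entries[russian] = {"english": list(english), "pos": pos, **attributes}

    def remove(self, russian):
        self.entries.pop(russian, None)

    def search(self, russian):
        """Return the entry for a word, or None if it is unknown."""
        return self.entries.get(russian)

    def save(self, path):
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.entries, f, ensure_ascii=False, indent=2)

    @classmethod
    def load(cls, path):
        dictionary = cls()
        with open(path, encoding="utf-8") as f:
            dictionary.entries = json.load(f)
        return dictionary

d = Dictionary()
d.add("kniga", ["book"], "noun", gender="fem")
d.add("chitat'", ["read"], "verb", verb_type="transitive")
print(d.search("kniga"))
```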
Translator • Uses transliteration for ease of testing • Can be easily converted to Unicode Cyrillic • Debugging output to terminal window
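Converting transliterated text to Unicode Cyrillic is mostly a table lookup with longest-match-first, as in the sketch below. The transliteration scheme shown here is an assumption (the slides do not specify one), and the table is incomplete.

```python
# Hypothetical Latin-to-Cyrillic table; multi-letter keys must win over single letters.
TRANSLIT = {
    "shch": "щ", "zh": "ж", "kh": "х", "ch": "ч", "sh": "ш", "ju": "ю", "ja": "я",
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "e": "е", "z": "з",
    "i": "и", "j": "й", "k": "к", "l": "л", "m": "м", "n": "н", "o": "о",
    "p": "п", "r": "р", "s": "с", "t": "т", "u": "у", "f": "ф", "c": "ц",
    "y": "ы", "'": "ь",
}
KEYS = sorted(TRANSLIT, key=len, reverse=True)  # try longest keys first

def to_cyrillic(text):
    out = []
    i = 0
    while i < len(text):
        for key in KEYS:
            if text.startswith(key, i):
                out.append(TRANSLIT[key])
                i += len(key)
                break
        else:
            out.append(text[i])  # pass through spaces, punctuation, unknown letters
            i += 1
    return "".join(out)

print(to_cyrillic("mal'chik chitaet novuju knigu"))  # мальчик читает новую книгу
```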
Results • Subject, verb, direct object translated • Subject is first nominative • Verb matched by gender, number, and person • Direct object is first accusative • Adjectives matched to nouns • Matched by case, number, and gender • Word order not considered • Word order should be accounted for, but isn't • Adjectives should go to the nearest matching noun, not just any matching noun • Prepositional objects should be nearby
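The matching rules reported on this slide can be sketched as a short role-assignment pass. The dictionary-based word records, the feature names, and the helper functions are assumptions; the point is only that the subject and object are picked by case and the verb by agreement, with no use of word order.

```python
AGREEMENT = ("gender", "number", "person")

def first(words, pos, case):
    """Return the first word with the given part of speech and case."""
    for w in words:
        if w["pos"] == pos and w.get("case") == case:
            return w
    return None

def find_verb(words, subject):
    """Return the first verb agreeing with the subject."""
    for w in words:
        if w["pos"] != "verb":
            continue
        # Compare only the features this verb form actually carries.
        if all(w[f] == subject.get(f) for f in AGREEMENT if f in w):
            return w
    return None

def assign_roles(words):
    subject = first(words, "noun", "nom")
    return {
        "subject": subject,
        "verb": find_verb(words, subject) if subject else None,
        "object": first(words, "noun", "acc"),
    }

sentence = [
    {"text": "knigu", "pos": "noun", "case": "acc", "number": "sg", "gender": "fem"},
    {"text": "chitaet", "pos": "verb", "number": "sg", "person": 3},
    {"text": "mal'chik", "pos": "noun", "case": "nom", "number": "sg",
     "gender": "masc", "person": 3},
]
roles = assign_roles(sentence)
print({k: v["text"] for k, v in roles.items()})
```

Because nothing here looks at positions, the same roles come out for any ordering of the three words, which is both why free word order is handled and why the proximity problems listed above arise.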
Conclusions • Part-of-speech guessing could be added easily • When a subordinate is not found, add to list • For each unmatched word, prompt user • Allow selection between subordinates not found • Verb typing would be harder, but helpful • Restricting complements makes matching more precise • More efficient: no search for every possible complement • Prepositions could be associated with nouns • Even in inflecting languages, word order matters • Subordinates should be located by proximity • Multiple grammatical functions use the same inflections
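Locating subordinates by proximity, as proposed above, could be sketched as choosing the nearest agreeing candidate rather than the first one. The word records, feature names, and `nearest_head` function are hypothetical; a `None` result marks a word that would go on the unmatched list for the user to resolve.

```python
def agrees(a, b, features=("case", "number", "gender")):
    """True when both words share all the listed features."""
    return all(a.get(f) == b.get(f) for f in features)

def nearest_head(words, index, head_pos="noun"):
    """Return the index of the closest agreeing head for words[index], or None."""
    dependent = words[index]
    best = None
    for i, w in enumerate(words):
        if i == index or w.get("pos") != head_pos or not agrees(dependent, w):
            continue
        # Strict comparison: on a tie, the earlier candidate is kept.
        if best is None or abs(i - index) < abs(best - index):
            best = i
    return best

FEM_ACC = {"case": "acc", "number": "sg", "gender": "fem"}
sentence = [
    {"text": "sestru", "pos": "noun", **FEM_ACC},   # also agrees, but farther away
    {"text": "vidit", "pos": "verb"},
    {"text": "i", "pos": "conj"},
    {"text": "chitaet", "pos": "verb"},
    {"text": "novuju", "pos": "adj", **FEM_ACC},
    {"text": "knigu", "pos": "noun", **FEM_ACC},
]
print(sentence[nearest_head(sentence, 4)]["text"])  # knigu
```

A first-match search would have attached "novuju" to "sestru" here, since both nouns share the same inflection.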