Jason Ji Computer Systems Laboratory 2004-2005 Natural Language Processing: Using Machine Translation in Creation of a German-English Translator
Machine Translation • a field that has been around for decades • several methods exist to solve the problem • none of them resembles human methods • this program attempts to use human methods to translate
Direct Approach • original translation strategy • translate each word directly in one-to-one dictionary look-up • then perform some local reordering • doesn't consider semantic information
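The direct approach can be illustrated with a minimal sketch: a one-to-one dictionary look-up followed by a single local reordering rule. The dictionary entries, the function name `direct_translate`, and the reordering rule are all hypothetical and chosen only for illustration; they are not taken from any real direct-translation system.

```python
# Toy one-to-one dictionary (illustrative entries only).
DICTIONARY = {"ich": "I", "habe": "have", "den": "the", "hund": "dog", "gesehen": "seen"}

def direct_translate(sentence):
    """Word-for-word look-up plus one local reordering rule (hypothetical)."""
    words = sentence.lower().rstrip(".").split()
    # Step 1: translate each word on its own; unknown words pass through unchanged.
    out = [DICTIONARY.get(w, w) for w in words]
    # Step 2: local reordering. German puts the past participle at the end of
    # the clause; move it to directly after the auxiliary "have".
    if len(out) > 2 and out[1] == "have" and out[-1] == "seen":
        out.insert(2, out.pop())
    return " ".join(out)

print(direct_translate("Ich habe den Hund gesehen."))  # I have seen the dog
```

Note that "den" is always mapped to "the" without any regard to case or gender, which is the sense in which this strategy ignores semantic information.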
Indirect Approach • Interlingua and Transfer Approaches • translate from source language to some intermediary, unnatural language, including semantic information, etc • translate from intermediary to the target language • other methods • knowledge based, etc, more complicated, less human-like
Theory • no current translation method is 100% effective • no current translation method closely resembles human approach • humans can be 100% effective translators • therefore, use human approach with machines to have more effective translations?
Overview of Method • separate input string into each word • first look-up: a list that maps each word to its part of speech • second look-up: each part-of-speech-specific list maps each word to its translation and semantic information • past tense forms, irregular conjugations, etc
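The two look-ups described above might be sketched as follows. This is only an illustration under assumptions: the tables are in-memory dictionaries rather than files, the entries and field names (`POS_LIST`, `POS_TABLES`, `look_up`, the keys inside the semantic information) are hypothetical, and the direction is English to German to match the example sentences in the results.

```python
# First look-up: word -> part of speech (stands in for the master word list).
POS_LIST = {"i": "pronoun", "see": "verb", "the": "article", "dog": "noun"}

# Second look-up: one table per part of speech,
# word -> (translation, semantic information). All entries are illustrative.
POS_TABLES = {
    "pronoun": {"i": ("ich", {"person": 1, "number": "singular"})},
    "verb": {"see": ("sehen", {"type": "strong", "object_case": "accusative"})},
    "article": {"the": ("der", {"definite": True})},
    "noun": {"dog": ("Hund", {"gender": "masculine", "plural": "Hunde"})},
}

def look_up(sentence):
    """Split the input into words and run both look-ups on each word."""
    entries = []
    for word in sentence.lower().rstrip(".").split():
        pos = POS_LIST[word]                       # KeyError for unknown words
        translation, info = POS_TABLES[pos][word]  # part-of-speech-specific table
        entries.append((word, pos, translation, info))
    return entries

for entry in look_up("I see the dog"):
    print(entry)
```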
Development • Assumptions: • article must precede noun • preposition must be followed by a noun • verb must be preceded by a noun • look-up: • find word in list.txt, then redirect to other text files
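The redirect from list.txt to the other text files could look roughly like the sketch below. Only the name list.txt comes from the slides; the line formats ("word filename" in list.txt, "word field field ..." in the part-of-speech files), the file names nouns.txt and verbs.txt, and the function `find_entry` are assumptions made up for this example.

```python
import os
import tempfile

def find_entry(data_dir, word):
    """Find the word in list.txt, then follow the redirect to its part-of-speech file."""
    with open(os.path.join(data_dir, "list.txt"), encoding="utf-8") as index:
        for line in index:
            entry, pos_file = line.split()
            if entry == word:
                break
        else:
            raise KeyError(word)  # word is not in list.txt
    with open(os.path.join(data_dir, pos_file), encoding="utf-8") as table:
        for line in table:
            fields = line.split()
            if fields[0] == word:
                return pos_file, fields[1:]
    raise KeyError(word)  # listed in list.txt but missing from its table

# Demo with throwaway files in a hypothetical format.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "list.txt"), "w", encoding="utf-8") as f:
        f.write("dog nouns.txt\nsee verbs.txt\n")
    with open(os.path.join(d, "nouns.txt"), "w", encoding="utf-8") as f:
        f.write("dog Hund masculine Hunde\n")
    with open(os.path.join(d, "verbs.txt"), "w", encoding="utf-8") as f:
        f.write("see sehen strong\n")
    result = find_entry(d, "dog")

print(result)  # ('nouns.txt', ['Hund', 'masculine', 'Hunde'])
```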
Development • each line of semantic information is chopped up into fields • various subclass objects (Noun, Verb, etc.) are created • pronouns are created in the nominative case • articles are created with gender and case unidentified • verbs are created in infinitive form
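As a rough sketch of this step, the classes below show one way the subclass objects and their starting states could be modelled. The class layout, the field names, and the line format ("part-of-speech english german extra...") are assumptions for illustration; the slides do not show the original class definitions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Word:
    english: str
    german: str

@dataclass
class Noun(Word):
    gender: str

@dataclass
class Verb(Word):
    kind: str                    # "weak" or "strong"
    form: str = "infinitive"     # verbs start out unconjugated

@dataclass
class Pronoun(Word):
    case: str = "nominative"     # pronouns start out in the nominative

@dataclass
class Article(Word):
    gender: Optional[str] = None  # unidentified until the noun is checked
    case: Optional[str] = None

def build_word(line):
    """Chop one line of semantic information into fields and build the matching object."""
    pos, english, german, *extra = line.split()
    if pos == "noun":
        return Noun(english, german, gender=extra[0])
    if pos == "verb":
        return Verb(english, german, kind=extra[0])
    if pos == "pronoun":
        return Pronoun(english, german)
    if pos == "article":
        return Article(english, german)
    raise ValueError(f"unknown part of speech: {pos}")

lines = ["pronoun I ich", "verb see sehen strong", "article the der", "noun dog Hund masculine"]
words = [build_word(line) for line in lines]
```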
Development • Correct article genders and cases • for article in pos x, check noun in pos x+1 • check case of nearestModifier() • correct verb conjugations • for verb in pos x, check subject in pos x-1 • search in verblist, find weak or strong • weak: follow conjugation pattern; strong: read in conjugations from list
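The two corrections on this slide can be sketched as table look-ups. The sketch assumes the case has already been determined elsewhere (the slides do this through nearestModifier(), which is not reproduced here), covers only the definite article in three cases, and ignores weak-verb spelling exceptions such as stems ending in -t or -d. The names `DEFINITE_ARTICLE`, `WEAK_ENDINGS`, `STRONG_FORMS`, `correct_article`, and `conjugate` are hypothetical.

```python
# Definite article by (gender or "plural", case); genitive omitted for brevity.
DEFINITE_ARTICLE = {
    ("masculine", "nominative"): "der", ("masculine", "accusative"): "den", ("masculine", "dative"): "dem",
    ("feminine", "nominative"): "die", ("feminine", "accusative"): "die", ("feminine", "dative"): "der",
    ("neuter", "nominative"): "das", ("neuter", "accusative"): "das", ("neuter", "dative"): "dem",
    ("plural", "nominative"): "die", ("plural", "accusative"): "die", ("plural", "dative"): "den",
}

# Present-tense endings for weak (regular) verbs, keyed by (person, number).
WEAK_ENDINGS = {
    (1, "sg"): "e", (2, "sg"): "st", (3, "sg"): "t",
    (1, "pl"): "en", (2, "pl"): "t", (3, "pl"): "en",
}

# Strong verbs are read from a list instead of following the pattern.
STRONG_FORMS = {
    "sehen": {
        (1, "sg"): "sehe", (2, "sg"): "siehst", (3, "sg"): "sieht",
        (1, "pl"): "sehen", (2, "pl"): "seht", (3, "pl"): "sehen",
    },
}

def correct_article(noun_gender, case):
    """Pick the article form from the gender of the following noun and its case."""
    return DEFINITE_ARTICLE[(noun_gender, case)]

def conjugate(infinitive, person, number):
    """Conjugate in the present tense according to the subject's person and number."""
    if infinitive in STRONG_FORMS:
        return STRONG_FORMS[infinitive][(person, number)]
    stem = infinitive[:-2]  # drop the "-en" of the infinitive
    return stem + WEAK_ENDINGS[(person, number)]

# "I see the dog": first-person singular subject, masculine noun as direct object.
print("Ich", conjugate("sehen", 1, "sg"), correct_article("masculine", "accusative"), "Hund.")
```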
Development • Correct pronoun cases • for pronoun in pos x, check verb or preposition in pos x-1 • append all corrected Strings together and display in text field
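A minimal sketch of the pronoun step, assuming each token carries its part of speech and, for verbs and prepositions, the case it governs. The token layout, the `governs` field, and the small pronoun table are hypothetical; the input sentence is constructed for illustration and starts with the pronoun still in its initial nominative form.

```python
# Accusative and dative forms for a few nominative pronouns (illustrative subset).
PRONOUN_FORMS = {
    "ich": {"accusative": "mich", "dative": "mir"},
    "du": {"accusative": "dich", "dative": "dir"},
    "er": {"accusative": "ihn", "dative": "ihm"},
    "wir": {"accusative": "uns", "dative": "uns"},
}

def correct_pronouns(tokens):
    """For a pronoun at position x, check the verb or preposition at x-1,
    then append all corrected strings together."""
    corrected = []
    for x, token in enumerate(tokens):
        text = token["text"]
        if token["pos"] == "pronoun" and x > 0:
            previous = tokens[x - 1]
            if previous["pos"] in ("verb", "preposition"):
                text = PRONOUN_FORMS[text][previous["governs"]]
        corrected.append(text)
    return " ".join(corrected)

tokens = [
    {"text": "Der", "pos": "article"},
    {"text": "Hund", "pos": "noun"},
    {"text": "sieht", "pos": "verb", "governs": "accusative"},
    {"text": "ich", "pos": "pronoun"},  # still nominative before correction
]
print(correct_pronouns(tokens))  # Der Hund sieht mich
```

A pronoun in first position has nothing before it, so it keeps the nominative form it was created with.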
Results/Conclusion • I see the dog / Ich sehe den Hund. • Correct for pronoun, present-tense verb conjugation, direct-object case correction • The cats help the dogs / Die Katzen helfen den Hunden. • Correct for nominative pluralizations, verb conjugation, and dative pluralization due to verb
Results/Conclusion • The cats are the dogs / RUNTIME ERRORS • fails with irregular verbs in English, including “to be” • The cats ate the pie / Die Katzen essen die Torte. • fails with past tense verbs • recognizes a past tense verb, but does not conjugate it, so the output stays in the present tense (essen instead of aßen)
Results/Conclusion • succeeds in its limited goals • not practical or applicable to real-world use • highly fragile • runtime errors for basically any input that doesn’t follow the exact expected form • inefficient: list.txt with 53 words was 4 KB; at that rate a list of 1,000,000 words would be 75.5 MB
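The size estimate follows from a simple linear extrapolation, reproduced below. The sketch assumes decimal units (1 KB = 1000 bytes, 1 MB = 1,000,000 bytes), which is the reading under which the figures on the slide agree with each other.

```python
list_size_bytes = 4 * 1000        # 4 KB, assuming 1 KB = 1000 bytes
words_in_list = 53
bytes_per_word = list_size_bytes / words_in_list

target_words = 1_000_000
projected_mb = bytes_per_word * target_words / 1_000_000  # bytes -> MB

print(f"{bytes_per_word:.1f} bytes per word, {projected_mb:.1f} MB for 1,000,000 words")
# 75.5 bytes per word, 75.5 MB for 1,000,000 words
```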