Development of a German-English Translator

Felix Zhang, Period 5, 2007-2008. Thomas Jefferson High School for Science and Technology, Computer Systems Research Lab. Development of a German-English Translator. Summary of Quarter 2: NP chunking, lemmatization, dictionary lookup, inflection, noun-verb agreement. Scope for this quarter.

Presentation Transcript


  1. Felix Zhang Period 5 2007-2008 Thomas Jefferson High School for Science and Technology Computer Systems Research Lab Development of a German-English Translator

  2. Summary of Quarter 2 • NP Chunking • Lemmatization • Dictionary Lookup • Inflection • Noun-verb agreement

  3. Scope for this quarter • Focus less on statistical methods • Get rudimentary grammar system working • Fix all the bugs I’ve made since September

  4. New and Modified Components • More info stored in NP chunking • Better noun-verb agreement • Grammar • Element Assignment • Priority Number Assignment

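  5. Noun-verb agreement • Simple method to eliminate more ambiguities: every noun other than the chosen subject is treated as non-nominative
      def eliminateother(attribs, sub, closest):
          # attribs: list of [[word, tag], readings]; sub: the entry chosen as subject.
          # closest is not used by this method.
          for x in attribs:
              if x[0][1] == "nou" and x != sub:
                  # Drop nominative readings by rebuilding the list in place;
                  # calling remove() while iterating over the same list skips items.
                  x[1][:] = [y for y in x[1] if y[0] != "nom"]
          return attribs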
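The full run on slide 9 shows what the agreement step does to this sentence: "machen" loses its first-person reading and "Kinder" keeps only its nominative reading. Below is a minimal sketch of one rule that reproduces that result. It is not the code in proj.py; the function name `agree` and the rule itself (the first noun with a nominative reading whose number matches the verb is the subject, and a noun subject is third person) are assumptions for illustration.

```python
def agree(attribs):
    """Hypothetical subject-verb agreement over [[word, tag], readings] entries.

    Assumed rule: the first noun with a nominative reading whose number matches
    one of the verb's numbers is the subject; a noun subject is third person.
    Mutates attribs in place and returns the subject entry (or None).
    """
    verb = next((a for a in attribs if a[0][1] == "ver"), None)
    if verb is None:
        return None
    # Numbers the verb allows, e.g. {"pl"} for [['1', 'pl'], ['3', 'pl'], 'pres']
    verb_numbers = {r[1] for r in verb[1] if isinstance(r, list)}
    for entry in attribs:
        if entry[0][1] != "nou":
            continue
        if any(r[0] == "nom" and r[1] in verb_numbers for r in entry[1]):
            # The subject keeps only its nominative readings.
            entry[1][:] = [r for r in entry[1] if r[0] == "nom"]
            # Keep third-person verb readings and non-list tags such as 'pres'.
            verb[1][:] = [r for r in verb[1] if not isinstance(r, list) or r[0] == "3"]
            return entry
    return None


# Morphological analysis copied from the run on slide 9.
attribs = [[['Mann', 'nou'], [['akk', 'mas'], ['dat', 'pl']]],
           [['machen', 'ver'], [['1', 'pl'], ['3', 'pl'], 'pres']],
           [['kleinen', 'adj'], [['nom', 'pl'], ['akk', 'pl']]],
           [['Kinder', 'nou'], [['nom', 'pl'], ['akk', 'pl']]]]
subject = agree(attribs)
print(subject)     # [['Kinder', 'nou'], [['nom', 'pl']]]
print(attribs[1])  # [['machen', 'ver'], [['3', 'pl'], 'pres']]
```

The returned subject entry is what a follow-up step such as eliminateother would take as its sub argument.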

  6. Noun phrase chunking • Now used for English sentences • Stores more info for later methods • “the man make the small children” • NP Chunked English: [[['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]]], ['make', 'ver', [['3', 'pl'], 'pres']], [['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]]]]
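One simple way to produce the grouping shown above is to collect articles and adjectives until a noun closes the phrase. This is a sketch under that assumption, not necessarily how the project chunks; the name `np_chunk` is hypothetical, and tokens use the same [word, tag, ...] lists as the run output.

```python
def np_chunk(tagged):
    """Hypothetical NP chunker: a run of articles/adjectives closed by a noun.

    Other tokens (e.g. verbs) stay as single entries; articles/adjectives with
    no following noun are emitted as bare tokens.
    """
    chunks, pending = [], []
    for token in tagged:
        tag = token[1]
        if tag in ("art", "adj"):
            pending.append(token)   # wait for the head noun
        elif tag == "nou":
            pending.append(token)
            chunks.append(pending)  # close the NP at its head noun
            pending = []
        else:
            chunks.extend(pending)  # flush unattached articles/adjectives
            pending = []
            chunks.append(token)
    chunks.extend(pending)
    return chunks


tagged = [['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]],
          ['make', 'ver', [['3', 'pl'], 'pres']],
          ['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]]]
print(np_chunk(tagged))
```

On this input the result matches the "NP Chunked English" list on the slide.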

  7. Element Assignment • Based on linguistic information • If case is nominative, chunk is subject • If accusative, chunk is direct object • [[['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]], 'dobj'], ['make', 'ver', [['3', 'pl'], 'pres'], 'mverb'], [['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]], 'sub']]
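The case rules above can be sketched as a pass over the chunks that checks the head noun's readings in a fixed order. Assumptions: the head noun is the last token of an NP, a bare verb token is the main verb, and dative marks the indirect object (the slide only names nominative and accusative). The name `assign_elements` is hypothetical.

```python
def assign_elements(chunks):
    """Hypothetical element assignment: append a role label to each chunk."""
    labelled = []
    for chunk in chunks:
        if isinstance(chunk[0], str):
            # Bare token such as ['make', 'ver', [...]]
            role = "mverb" if chunk[1] == "ver" else None
        else:
            head = chunk[-1]  # assumed: the head noun closes the NP
            readings = head[2] if len(head) > 2 else []
            cases = [reading[0] for reading in readings]
            if "nom" in cases:
                role = "sub"
            elif "akk" in cases:
                role = "dobj"
            elif "dat" in cases:
                role = "iobj"  # assumption: dative marks the indirect object
            else:
                role = None
        labelled.append(chunk + [role])
    return labelled


chunks = [[['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]]],
          ['make', 'ver', [['3', 'pl'], 'pres']],
          [['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]]]]
print(assign_elements(chunks))
```

Checking nominative first means a noun that is ambiguous between nominative and accusative is labelled as the subject, which is exactly where the ambiguity problems on slide 10 come from.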

  8. Priority Assignment • Each sentence element is assigned a priority number • Based on its position in an English sentence • Assignments: • sub 1 • mverb 2 • auxverb 3 • iobj 4 • dobj 5 • Sort by number to get English word order
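Using the priority table from this slide, the prefix-and-sort step can be sketched as follows. The name `rearrange` is hypothetical; priorities are stored as strings only to mirror the run output on slide 9, and every chunk is assumed to carry one of the five labels.

```python
# Priority numbers from the slide; lower numbers come first in English order.
PRIORITY = {"sub": 1, "mverb": 2, "auxverb": 3, "iobj": 4, "dobj": 5}


def rearrange(labelled):
    """Hypothetical sketch: prefix each labelled chunk with its priority, then sort.

    Assumes the last item of every chunk is a key of PRIORITY.
    """
    prioritized = [[str(PRIORITY[chunk[-1]])] + chunk for chunk in labelled]
    # sorted() is stable, so chunks with equal priority keep their German order.
    return sorted(prioritized, key=lambda chunk: int(chunk[0]))


labelled = [[['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]], 'dobj'],
            ['make', 'ver', [['3', 'pl'], 'pres'], 'mverb'],
            [['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]], 'sub']]
for chunk in rearrange(labelled):
    print(chunk[0], chunk[-1])  # prints "1 sub", "2 mverb", "5 dobj" on separate lines
```

Sorting with an integer key rather than the raw strings keeps the order correct if priorities ever reach two digits.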

  9. Full run of program
     Input: “den Mann machen die kleinen Kinder” (English: “The small children make the man”)
     fzhang@ltsp1 ~/research $ python proj.py
     Part of speech tags: [['den', 'art'], ['Mann', 'nou'], ['machen', 'ver'], ['die', 'art'], ['kleinen', 'adj'], ['Kinder', 'nou']]
     Morphological analysis: [[['Mann', 'nou'], [['akk', 'mas'], ['dat', 'pl']]], [['machen', 'ver'], [['1', 'pl'], ['3', 'pl'], 'pres']], [['kleinen', 'adj'], [['nom', 'pl'], ['akk', 'pl']]], [['Kinder', 'nou'], [['nom', 'pl'], ['akk', 'pl']]]]
     Disambiguated after noun-verb agreement: [[['Mann', 'nou'], [['akk', 'mas'], ['dat', 'pl']]], [['machen', 'ver'], [['3', 'pl'], 'pres']], [['kleinen', 'adj'], [['nom', 'pl'], ['akk', 'pl']]], [['Kinder', 'nou'], [['nom', 'pl']]]]
     Lemmatized: [['Mann', ['Mann', 'Man']], ['machen', ['machen']], ['kleinen', ['klein']], ['Kinder', ['Kind']]]
     Root translated: [['den', 'the'], ['Mann', 'man'], ['machen', 'make'], ['die', 'the'], ['kleinen', 'small'], ['Kinder', 'child']]
     NP Chunked English: [[['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]]], ['make', 'ver', [['3', 'pl'], 'pres']], [['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]]]]
     Inflected (only works before chunking): ['the', 'the'] ['man', ['akk', 'mas'], 'man'] ['man', ['dat', 'pl'], 'mans'] ['make', ['3', 'pl'], 'make'] ['the', 'the'] ['small', 'small'] ['child', ['nom', 'pl'], 'childs']
     Assigned an element type: [[['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]], 'dobj'], ['make', 'ver', [['3', 'pl'], 'pres'], 'mverb'], [['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]], 'sub']]
     Assigned priority: [['5', ['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]], 'dobj'], ['2', 'make', 'ver', [['3', 'pl'], 'pres'], 'mverb'], ['1', ['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]], 'sub']]
     Rearranged to English structure: [['1', ['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]], 'sub'], ['2', 'make', 'ver', [['3', 'pl'], 'pres'], 'mverb'], ['5', ['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]], 'dobj']]

  10. Problems • Ambiguities (again) • One ambiguity can change the entire structure of the sentence • “I gave a horse the hat” vs. “I gave the hat a horse” • Possible fixes: attempt all possible permutations • User disambiguation
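One reading of "attempt all possible permutations" is to enumerate every consistent assignment of roles to the ambiguous chunks, render each in priority order, and let the user choose. A sketch under stated assumptions: each role is filled at most once, the priority numbers come from slide 8, the chunks are hand-written English glosses of the slide's example, and the name `readings` is hypothetical.

```python
from itertools import product

# Priority numbers from slide 8.
PRIORITY = {"sub": 1, "mverb": 2, "auxverb": 3, "iobj": 4, "dobj": 5}


def readings(chunks):
    """Hypothetical sketch: render every role assignment as an English word order.

    chunks: list of (text, candidate_roles) pairs.
    """
    texts = [text for text, _ in chunks]
    options = [roles for _, roles in chunks]
    sentences = []
    for combo in product(*options):
        if len(set(combo)) != len(combo):
            continue  # assumed: each role is filled at most once
        order = sorted(range(len(texts)), key=lambda i: PRIORITY[combo[i]])
        sentences.append(" ".join(texts[i] for i in order))
    return sentences


chunks = [("I", ["sub"]), ("gave", ["mverb"]),
          ("a horse", ["iobj", "dobj"]), ("the hat", ["iobj", "dobj"])]
for number, sentence in enumerate(readings(chunks), start=1):
    print(number, sentence)  # the user would pick one of these
```

Here the two surviving assignments yield “I gave a horse the hat” and “I gave the hat a horse”, the pair from the slide. The number of combinations grows exponentially with the number of ambiguous chunks, which is one reason user disambiguation is listed as an alternative.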

  11. Problems • Inflexible • Grammar can only be rearranged in one specific way • Subject – Main verb – Indirect object – Direct object – Auxiliary verb • Does not accommodate prepositions, conjunctions, etc.

  12. Future research • Implement more statistical methods • Morphological info • Actual translation – bilingual corpus • Create better parse tree – Dependency grammar
