Development of a German-English Translator
Felix Zhang, Period 5, 2007-2008
Thomas Jefferson High School for Science and Technology, Computer Systems Research Lab
Summary of Quarter 2
• NP Chunking
• Lemmatization
• Dictionary Lookup
• Inflection
• Noun-verb agreement
Scope for this quarter
• Focus less on statistical methods
• Get a rudimentary grammar system working
• Fix all the bugs I’ve made since September
New and Modified Components
• More info stored in NP chunking
• Better noun-verb agreement
• Grammar
• Element Assignment
• Priority Number Assignment
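Noun-verb agreement
• A simple method to eliminate further ambiguities: remove the nominative readings of every noun other than the subject

    def eliminateother(attribs, sub, closest):
        # attribs: one [[word, tag], readings] entry per word; sub: the entry chosen as subject.
        # closest is accepted but not used by this function.
        for x in attribs:
            if x[0][1] == "nou" and x != sub:
                # Iterate over a copy: removing from x[1] while looping over it
                # would skip the reading that follows each removed one.
                for y in list(x[1]):
                    if y[0] == "nom":
                        x[1].remove(y)
        return attribs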
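The eliminateother helper above only handles nouns other than the subject; the step that chooses the subject is not shown on the slide. Judging from the full program run (where machen loses its ['1', 'pl'] reading and Kinder loses ['akk', 'pl']), that step might be sketched as follows. This is a hypothetical reconstruction, not the code in proj.py: the names agree and number_of, the 'sg' label for singular verb readings, and the rule that every noun is third person are all assumptions.

```python
def number_of(reading):
    """Number of a noun reading.

    Noun readings look like [case, gender] in the singular (e.g. ['akk', 'mas'])
    and [case, 'pl'] in the plural; 'sg' is an assumed label for singular.
    """
    return "pl" if reading[1] == "pl" else "sg"


def agree(attribs):
    """Hypothetical noun-verb agreement pass over [[word, tag], readings] entries."""
    verb = next((entry for entry in attribs if entry[0][1] == "ver"), None)
    if verb is None:
        return attribs
    nouns = [entry for entry in attribs if entry[0][1] == "nou"]

    # Numbers a (third-person) subject could have, from every nominative noun reading.
    subject_numbers = {number_of(r) for noun in nouns for r in noun[1] if r[0] == "nom"}

    # Keep 3rd-person verb readings whose number some nominative noun can match;
    # non-list readings such as the tense marker 'pres' are kept unchanged.
    verb[1] = [r for r in verb[1]
               if not isinstance(r, list) or (r[0] == "3" and r[1] in subject_numbers)]
    verb_numbers = {r[1] for r in verb[1] if isinstance(r, list)}

    # The first noun with a matching nominative reading is treated as the subject
    # and loses its other readings; all other nouns are left untouched.
    for noun in nouns:
        nominative = [r for r in noun[1] if r[0] == "nom" and number_of(r) in verb_numbers]
        if nominative:
            noun[1] = nominative
            break
    return attribs
```

Applied to the morphological analysis of “den Mann machen die kleinen Kinder”, this sketch yields the same disambiguated list that the program prints.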
Noun phrase chunking
• Now used for English sentences
• Stores more information for later methods: each word keeps its part-of-speech tag and, for nouns and verbs, its case/number readings
• Example: “the man make the children”
• NP Chunked English: [[['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]]], ['make', 'ver', [['3', 'pl'], 'pres']], [['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]]]]
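A minimal sketch of a chunking rule consistent with the output above, assuming that articles and adjectives accumulate until a noun closes the phrase and that every other token (here, the verb) stands on its own. The function name chunk and the tag set NP_TAGS are hypothetical, not taken from proj.py.

```python
# Tags that may precede the head noun inside a noun phrase (assumed set).
NP_TAGS = {"art", "adj"}


def chunk(tokens):
    """Group [word, tag, ...] tokens into noun-phrase chunks.

    Articles and adjectives are collected until a noun closes the phrase;
    any other token (such as a verb) is emitted on its own.
    """
    chunks, pending = [], []
    for token in tokens:
        tag = token[1]
        if tag in NP_TAGS:
            pending.append(token)
        elif tag == "nou":
            pending.append(token)
            chunks.append(pending)
            pending = []
        else:
            if pending:              # article/adjective with no following noun
                chunks.append(pending)
                pending = []
            chunks.append(token)
    if pending:                      # leftover modifiers at the end of the sentence
        chunks.append(pending)
    return chunks
```

Because a noun always closes the current phrase, this rule cannot build noun phrases containing prepositional or genitive modifiers; it only covers the article-adjective-noun pattern in the example.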
Element Assignment
• Based on linguistic information: the case readings of the noun in each chunk
• If the case is nominative, the chunk is the subject
• If the case is accusative, the chunk is the direct object
• [[['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]], 'dobj'], ['make', 'ver', [['3', 'pl'], 'pres'], 'mverb'], [['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]], 'sub']]
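The two case rules can be sketched over the chunk format shown above. assign_element is a hypothetical name; verbs are labeled mverb as in the example, and a chunk whose noun has neither a nominative nor an accusative reading is returned unlabeled, since the slide gives no rule for other cases (such as a dative indirect object).

```python
def assign_element(chunk):
    """Append a role label to one chunk, based on the case readings of its noun.

    A lone token such as ['make', 'ver', readings] is not a noun phrase; verbs
    are labeled 'mverb'. Noun-phrase chunks are lists of tokens.
    """
    if isinstance(chunk[0], str):            # a single token, not a noun phrase
        return chunk + ["mverb"] if chunk[1] == "ver" else chunk
    noun = next((token for token in chunk if token[1] == "nou"), None)
    cases = [reading[0] for reading in noun[2]] if noun is not None else []
    if "nom" in cases:                       # nominative -> subject
        return chunk + ["sub"]
    if "akk" in cases:                       # accusative -> direct object
        return chunk + ["dobj"]
    return chunk                             # no rule on the slide for other cases
```

Checking nominative before accusative means a noun that is still ambiguous between the two is treated as the subject, which is one reason the noun-verb agreement step runs first.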
Priority Assignment
• Each sentence element is assigned a priority number
• The number is based on the element's position in an English sentence
• Assignments:
  • sub: 1
  • mverb: 2
  • auxverb: 3
  • iobj: 4
  • dobj: 5
• Sort by this number to obtain English word order
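Priority assignment and the final sort can be sketched as a dictionary lookup followed by a stable sort. The names PRIORITY, assign_priority, and rearrange are hypothetical; the priority is prepended as a string to match the program output, and the sort key converts it to int so that the order would stay correct if priorities ever reached two digits.

```python
# Priority numbers from the slide: lower numbers come first in English.
PRIORITY = {"sub": 1, "mverb": 2, "auxverb": 3, "iobj": 4, "dobj": 5}


def assign_priority(elements):
    """Prepend the priority (as a string, like the program output) to each element.

    Each element ends with its role label, e.g. [..., 'dobj'].
    """
    return [[str(PRIORITY[element[-1]])] + element for element in elements]


def rearrange(prioritized):
    """Sort elements into English order; int() avoids '10' < '2' string ordering."""
    return sorted(prioritized, key=lambda element: int(element[0]))
```

Applied to the element-assigned list above, assign_priority and rearrange reproduce the "Assigned priority" and "Rearranged to English structure" lines of the program run.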
Full run of program
Input: “den Mann machen die kleinen Kinder”
Output: The small children make the man

    fzhang@ltsp1 ~/research $ python proj.py
    Part of speech tags: [['den', 'art'], ['Mann', 'nou'], ['machen', 'ver'], ['die', 'art'], ['kleinen', 'adj'], ['Kinder', 'nou']]
    Morphological analysis: [[['Mann', 'nou'], [['akk', 'mas'], ['dat', 'pl']]], [['machen', 'ver'], [['1', 'pl'], ['3', 'pl'], 'pres']], [['kleinen', 'adj'], [['nom', 'pl'], ['akk', 'pl']]], [['Kinder', 'nou'], [['nom', 'pl'], ['akk', 'pl']]]]
    Disambiguated after noun-verb agreement: [[['Mann', 'nou'], [['akk', 'mas'], ['dat', 'pl']]], [['machen', 'ver'], [['3', 'pl'], 'pres']], [['kleinen', 'adj'], [['nom', 'pl'], ['akk', 'pl']]], [['Kinder', 'nou'], [['nom', 'pl']]]]
    Lemmatized: [['Mann', ['Mann', 'Man']], ['machen', ['machen']], ['kleinen', ['klein']], ['Kinder', ['Kind']]]
    Root translated: [['den', 'the'], ['Mann', 'man'], ['machen', 'make'], ['die', 'the'], ['kleinen', 'small'], ['Kinder', 'child']]
    NP Chunked English: [[['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]]], ['make', 'ver', [['3', 'pl'], 'pres']], [['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]]]]
    Inflected (only works before chunking):
    ['the', 'the']
    ['man', ['akk', 'mas'], 'man']
    ['man', ['dat', 'pl'], 'mans']
    ['make', ['3', 'pl'], 'make']
    ['the', 'the']
    ['small', 'small']
    ['child', ['nom', 'pl'], 'childs']
    Assigned an element type: [[['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]], 'dobj'], ['make', 'ver', [['3', 'pl'], 'pres'], 'mverb'], [['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]], 'sub']]
    Assigned priority: [['5', ['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]], 'dobj'], ['2', 'make', 'ver', [['3', 'pl'], 'pres'], 'mverb'], ['1', ['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]], 'sub']]
    Rearranged to English structure: [['1', ['the', 'art'], ['small', 'adj'], ['child', 'nou', [['nom', 'pl']]], 'sub'], ['2', 'make', 'ver', [['3', 'pl'], 'pres'], 'mverb'], ['5', ['the', 'art'], ['man', 'nou', [['akk', 'mas'], ['dat', 'pl']]], 'dobj']]
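In the inflection stage of this run, the plural readings come out as 'mans' and 'childs'. A common remedy is to consult a table of irregular English plurals before applying the regular suffix rules. The sketch below assumes readings in the [case, gender-or-'pl'] format used throughout; inflect_noun and IRREGULAR_PLURALS are hypothetical names, and the table is only a small sample.

```python
# A small sample of irregular English plurals; a real table would be larger.
IRREGULAR_PLURALS = {"man": "men", "woman": "women", "child": "children",
                     "foot": "feet", "mouse": "mice", "person": "people"}


def inflect_noun(lemma, reading):
    """Return the English surface form of a noun lemma for one case/number reading."""
    if reading[1] != "pl":           # singular readings carry a gender instead of 'pl'
        return lemma
    if lemma in IRREGULAR_PLURALS:   # irregular forms take precedence over the rules
        return IRREGULAR_PLURALS[lemma]
    if lemma.endswith(("s", "x", "z", "ch", "sh")):
        return lemma + "es"
    if lemma.endswith("y") and len(lemma) > 1 and lemma[-2] not in "aeiou":
        return lemma[:-1] + "ies"    # consonant + y -> ies
    return lemma + "s"
```

With this table, ['child', ['nom', 'pl']] would become 'children' and ['man', ['dat', 'pl']] would become 'men'.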
Problems
• Ambiguities (again)
• One ambiguity can change the entire structure of the sentence
  • “I gave a horse the hat” vs. “I gave the hat a horse”
• Attempt all possible permutations
• User disambiguation
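Attempting all permutations could look like the sketch below: each chunk lists the roles it might fill, itertools.product enumerates every combination, combinations that reuse a role are discarded, and each survivor is ordered by the priority numbers from the Priority Assignment slide. For brevity the chunks here are plain English strings; role_assignments and candidate_sentences are hypothetical names, not functions from proj.py.

```python
from itertools import product

# Priority numbers from the Priority Assignment slide.
PRIORITY = {"sub": 1, "mverb": 2, "auxverb": 3, "iobj": 4, "dobj": 5}


def role_assignments(candidates):
    """Yield every role combination (one role per chunk) that uses no role twice."""
    for roles in product(*candidates):
        if len(set(roles)) == len(roles):
            yield roles


def candidate_sentences(chunks, candidates):
    """Build one English word order per valid role assignment."""
    sentences = []
    for roles in role_assignments(candidates):
        ordered = sorted(zip(roles, chunks), key=lambda pair: PRIORITY[pair[0]])
        sentences.append(" ".join(text for _, text in ordered))
    return sentences
```

For example, candidate_sentences(["I", "gave", "a horse", "the hat"], [["sub"], ["mverb"], ["iobj", "dobj"], ["iobj", "dobj"]]) returns ['I gave a horse the hat', 'I gave the hat a horse'], the two readings on the slide, which could then be shown to the user for disambiguation. The number of combinations grows multiplicatively with each ambiguous chunk, so this only stays cheap for short sentences.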
Problems
• Inflexible
• Grammar can only be rearranged in one specific way:
  • Subject – Main verb – Indirect object – Direct object – Auxiliary verb
• Does not accommodate prepositions, conjunctions, etc.
Future research
• Implement more statistical methods
  • Morphological info
  • Actual translation – bilingual corpus
• Create a better parse tree – dependency grammar