ZRINKA DUJMOVIĆ, University of Zagreb/ETF
JRC Workshop: Exploiting parallel corpora in up to 20 Languages, Arona, 25-27 September 2005
STATISTICAL ANALYSIS OF NOUN LEMMAS IN THE ITALIAN AND SWISS CONSTITUTION AND THEIR TRANSLATIONS INTO CROATIAN
What?
• Constitution of the Republic of Italy (original in Italian + translation into Croatian): 139 articles + transitory provisions; in force since 1948.
• Federal Constitution of the Swiss Confederation (original in Italian + translation into Croatian from the It./Ger./Eng. versions): 196 articles + transitory provisions; in force since 2000.
Why?
• Objective: test terminological consistency between the source language (SL) and the target language (TL)
• Prerequisites:
  - parallel corpora as rich resources of translation equivalents
  - small corpora
How? Data processing:
• Conversion into HTML format
• Sentence alignment
• Lemmatisation (inflectionally rich language!)
• Corpus annotation (POS tagging)
• Word alignment
• Word frequency lists
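The last step of the pipeline above, building a word frequency list, can be sketched as follows. This is a minimal illustration only: it assumes the corpus has already been lemmatised and POS-tagged into (lemma, tag) pairs, and the function name `noun_lemma_frequencies`, the tag label `"NOUN"`, and the sample tokens are all hypothetical, not taken from the actual corpus or tools used in the study.

```python
from collections import Counter


def noun_lemma_frequencies(tagged_tokens, noun_tag="NOUN"):
    """Return (lemma, count) pairs for noun lemmas, most frequent first.

    tagged_tokens: iterable of (lemma, pos_tag) pairs, i.e. the output of
    lemmatisation and POS tagging. The tag label is an assumption.
    """
    counts = Counter(lemma for lemma, tag in tagged_tokens if tag == noun_tag)
    return counts.most_common()


# Hypothetical hand-made sample, not real corpus data.
sample = [
    ("legge", "NOUN"), ("essere", "VERB"), ("diritto", "NOUN"),
    ("legge", "NOUN"), ("stato", "NOUN"), ("legge", "NOUN"),
    ("stato", "NOUN"),
]
freqs = noun_lemma_frequencies(sample)
for lemma, count in freqs:
    print(lemma, count)
```

Running the same function over the Italian original and the Croatian translation would give the two frequency lists that the later comparison relies on.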
Testing terminological consistency of translation
1. HYPOTHESIS: 1 Italian noun lemma = 1 translation equivalent in the Croatian translation of the Constitution
2. STATISTICAL TESTING
• the least squares method
• Y = a + bX
• Correlation coefficient (R)
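The fit Y = a + bX and the correlation coefficient R can be computed with ordinary least squares, as in the sketch below. It assumes that X is the frequency of an Italian noun lemma and Y the frequency of its Croatian equivalent (the slides do not state this explicitly); the function name `fit_line` and the frequency values are hypothetical. Under the 1:1 hypothesis one would expect a close to 0, b close to 1, and R close to 1.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x.

    Returns (a, b, r), where r is the Pearson correlation coefficient.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


# Hypothetical lemma frequencies (assumption: X = Italian, Y = Croatian).
italian = [5, 8, 11, 20, 35, 50]
croatian = [5, 7, 11, 20, 34, 50]
a, b, r = fit_line(italian, croatian)
print(f"a = {a:.3f}, b = {b:.3f}, R = {r:.3f}")
```

Lemma pairs that fall far from the fitted line are the ones worth inspecting by hand for inconsistent translation.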
Correlation of the most frequent Italian and Croatian noun lemmas in the Federal Constitution of the Swiss Confederation (51)
a = 0.009 ± 0.039
b = 0.999 ± 0.030
R = 0.978
Correlation of the most frequent Italian and Croatian noun lemmas in the Constitution of the Republic of Italy (31)
a = 0.075 ± 0.07305
b = 0.938 ± 0.03970
R = 0.975
Deviation from linearity
• (a) Accidental (translators' mistakes)
• (b) Justified (though still not expected!)
  - stylistic differences, e.g. use of a relative pronoun instead of a noun (1:0)
  - polysemy (1:2), e.g. It. titolo (11×) = Cr. naslov (6×) (Eng. title) = Cr. vrijednosni papiri (1×) (Eng. securities)
  - as an idiom: 1) a titolo transitorio = privremeno (Eng. temporarily); 2) a titolo oneroso = za plaću (Eng. against payment)
Italian noun lemmas present in both the Italian and Swiss constitutions = candidates for glossary
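Selecting such candidates amounts to intersecting the two noun lemma frequency lists. The sketch below is one possible way to do it; the function name `glossary_candidates`, the `min_count` threshold, and all frequency values are hypothetical and serve only as an illustration.

```python
def glossary_candidates(freq_a, freq_b, min_count=1):
    """Return lemmas present in both frequency dictionaries.

    Each result is (lemma, count_in_a, count_in_b). Lemmas below
    min_count in either text are dropped (the threshold is an assumption).
    Results are ordered by the smaller of the two counts, highest first.
    """
    shared = set(freq_a) & set(freq_b)
    candidates = [
        (lemma, freq_a[lemma], freq_b[lemma])
        for lemma in shared
        if freq_a[lemma] >= min_count and freq_b[lemma] >= min_count
    ]
    return sorted(candidates, key=lambda c: (-min(c[1], c[2]), c[0]))


# Hypothetical counts, not real corpus data.
italian_const = {"legge": 40, "diritto": 25, "regione": 30}
swiss_const = {"legge": 35, "diritto": 20, "cantone": 50}
candidates = glossary_candidates(italian_const, swiss_const, min_count=10)
for lemma, n_it, n_ch in candidates:
    print(lemma, n_it, n_ch)
```

Ordering by the smaller of the two counts favours lemmas that are frequent in both texts rather than in only one of them.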
Conclusions
• the least squares method appeared to be adequate for verifying the translation
• the verification does not have to be carried out on the entire sample, only on the lemmas with the highest frequencies, covering at least one order of magnitude
• the best candidates for the glossary are those lemmas that recur with high frequency in both constitutions