Contrasting Polish and English Derivational Groups. Karolina Tymowicz. Based on Jadacka, H. Rzeczownik polski jako baza derywacyjna, WN-PWN 1995; independent contrastive study of 540 Polish-English pairs of derivatives. November 28th 2000. Outline. Defining terms: Derivational group
Contrasting Polish and English Derivational Groups
Karolina Tymowicz
• based on
  • Jadacka, H. Rzeczownik polski jako baza derywacyjna, WN-PWN 1995
  • independent contrastive study of 540 Polish-English pairs of derivatives
November 28th 2000
Outline • Defining terms: • Derivational group • Derivational base • Affixes • Similarity of and within derivational groups • Procedure of comparison • Conclusions
Derivational group • A well-ordered system constructed around an underived entry word, bringing together all the derivatives connected with it by a direct or indirect process of derivation • a hierarchical structure in which each element functions as a link between other derivatives and the BASE
Derivational base • The item to which an affix is added to derive a new word-form • the word-forms consisting of the derivational base and an affix are called DERIVATIVES • e.g. STYLE - STYLIZE - STYLIZER • e.g. CENTRE - CENTRIC - CENTRICALLY
Affix • a morpheme that is added to a word and changes the meaning or function of the word • affixes are bound forms that can be added: • to the beginning of a word = a prefix, e.g.: unkind • to the end of a word = a suffix, e.g.: kindness
Similarity within derivational groups Four kinds of similarities within derivational groups are considered. Three types of translational similarity • translational similarity between morphemes • translational similarity between derivatives • translational similarity between derivational groups and one type of grapho-etymological similarity • graphemic and etymological similarity between bases
Degrees of translational similarity between morphemes (incl. bases)
def.: translational similarity between L1 and L2 morphemes is the degree to which an L1 morpheme can correctly be rendered as the corresponding L2 morpheme (i.e. the morpheme occupying the same position with respect to the base).
• no similarity, e.g. ponad- vs. -less (in P. ponad-czasowy, E. time-less)
• 1st degree of similarity, e.g. bez- vs. -less (in P. bez-głośny, E. voice-less)
• 2nd degree of similarity, e.g. -ik vs. -er (in P. głośn-ik, E. loudspeak-er), -czas- vs. time- (in P. ponad-czas-owy, E. time-less)
Degrees of translational similarity between derivatives
def.: the joint translational similarity between all the corresponding morphemes of the Polish and English derivatives, whereby two morphemes are corresponding iff they occupy the same position with respect to the base.
e.g.
Pol.     Eng.
za-    = a-
leś-   = forest
-ać      (no English counterpart)
Degrees of translational similarity between derivational groups
The similarity between derivational groups is a function of • the grapho-etymological similarity of their bases, • and the translational similarity of all their derivatives.
Degrees of graphemic-etymological similarity between derivational bases
def.: similarity established between two bases with respect to their etymological and graphemic features, on the assumption of their translational equivalence and irrespective of the translational equivalence of their derivatives
• no similarity, e.g. dom vs. house
• remote similarity, e.g. brat vs. brother
• close similarity, e.g. styl vs. style
Scale of translational similarity between derivatives
The scale used here has 12 levels of similarity, numbered from 0 to 11, where 0 stands for the lowest level of similarity and 11 denotes the highest.
Treatment of compound derivatives
If a single compound derivative of the form “A-B” or “AB” (but not “A B”) has an equivalent in the other language in the form of two separate words “C D”, then it is included in our classification as long as
• C is a direct translation of A and D is a direct translation of B
• or C is a direct translation of B and D is a direct translation of A.
This convention has been adopted because
• Jadacka’s derivational groups contain only derivatives of the form ‘AB’ or ‘A-B’, but no ‘A B’ derivatives
• Jadacka’s work constituted the main and most reliable source of derivatives and derivational groups considered in the study.
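The inclusion convention above can be sketched as a small check. This is only an illustration: the `DIRECT` dictionary and the function names are hypothetical, and what counts as a "direct translation" was a judgement made in the study, not a lookup.

```python
# Hypothetical toy set of direct translation pairs (an assumption for illustration).
# Pairs are stored unordered so the check works in either translation direction.
DIRECT = {
    frozenset(("styl", "style")),
    frozenset(("wolny", "free")),
}

def is_direct(x, y):
    """True if x and y are listed as direct translations of each other."""
    return frozenset((x, y)) in DIRECT

def include_pair(a, b, c, d):
    """Compound 'A-B' or 'AB' in one language vs. two separate words 'C D' in the other.

    The pair is included if C translates A and D translates B,
    or if C translates B and D translates A.
    """
    straight = is_direct(c, a) and is_direct(d, b)
    crossed = is_direct(c, b) and is_direct(d, a)
    return straight or crossed

# free-style vs. styl wolny: C (styl) translates B (style), D (wolny) translates A (free)
print(include_pair("free", "style", "styl", "wolny"))
```

The crossed case is what admits pairs such as free-style / styl wolny, where the word order differs between the two languages.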
Compound derivatives 1
Scale of similarity
11. P. BASE1 + BASE2 + SUFFIX = E. BASE1 + BASE2 + SUFFIX
    e.g.: słowo - word: słowo-twór-stwo = word form-ation
10. E. BASE1 + (BASE2 + SUFFIX) = P. (BASE2 + SUFFIX) + BASE1
    e.g.: krew - blood: blood-stain-ed = poplamio-ny krwią
9. E. BASE1 + BASE2 = P. BASE2 + (BASE1 + SUFFIX)
    e.g.: głos - voice: voice-mail = poczta głos-owa
Compound derivatives 2
Scale of similarity
8. P. BASE1 + BASE2 = E. BASE1 + BASE2
    e.g.: słowo - word: pół-słowo = half-word
7. E. BASE1 + BASE2 = P. BASE2 + BASE1
    e.g.: styl - style: free-style = styl wolny
Single derivatives 1
Scale of similarity
6. • P. BASE + SUFFIX = E. BASE + SUFFIX
     e.g.: las - forest: leś-nik = forest-er
   • P. BASE + SUFFIX + SUFFIX = E. BASE + SUFFIX + SUFFIX
     e.g.: styl - style: styl-ist-yczny = styl-ist-ic
   • P. PREFIX + BASE + SUFFIX = E. PREFIX + BASE + SUFFIX
     e.g.: las - forest: wy-leś-anie = de-forest-ation
   • P. PREFIX + BASE + SUFFIX + SUFFIX = E. PREFIX + BASE + SUFFIX + SUFFIX
     e.g.: centrum - centre: de-centr-al-izować = de-centr-al-ize
Single derivatives 2
Scale of similarity
5. P. PREFIX + BASE + SUFFIX = E. BASE + SUFFIX + SUFFIX
    e.g.: dziecko - child: bez-dziet-ność = child-less-ness
4. P. PREFIX + BASE + SUFFIX = E. BASE + SUFFIX
    e.g.: pan - lord: wielko-pań-ski = lord-ly
3. P. PREFIX + BASE + SUFFIX = E. PREFIX + BASE
    e.g.: las - forest: za-leś-ać = a-forest
Single derivatives 3
Scale of similarity
2. • P. BASE + SUFFIX = E. BASE + ____
     e.g.: słowo - word: słow-nik = word-book
   • P. BASE + SUFFIX = E. BASE
     e.g.: dziecko - child: dziec-inka = child
1. • P. BASE + SUFFIX + SUFFIX = E. ____ + ____ + SUFFIX
     e.g.: słowo - word: słow-nik-arz = lexico-graph-er
   • P. BASE + SUFFIX = E. ____ + SUFFIX
     e.g.: znak - sign: znacz-nik = mark-er
Single derivatives 4
Scale of similarity
0. • E. BASE + BASE = P. ____
     e.g.: time - czas: time-piece = zegarek
   • P. BASE + SUFFIX = E. ____
     e.g.: kość - bone: kos-tka = ankle
   • E. PREFIX + BASE = P. ____
     e.g.: child - dziecko: grand-child = wnuk
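As a rough illustration, the single-derivative part of the scale (levels 6 to 1) can be written as a lookup from a pair of structural patterns to a level. The encoding below is an assumption made for this sketch: the tuple representation, the `OTHER` label for the blank slots, and the `LEVELS` and `level` names are not part of the study, and morphological segmentation is taken as given.

```python
# Slot labels; OTHER stands for the blank "____" slots on the scale (an assumption).
P, B, S, O = "PREFIX", "BASE", "SUFFIX", "OTHER"

# (Polish pattern, English pattern) -> similarity level, single derivatives only.
LEVELS = {
    ((B, S), (B, S)): 6,
    ((B, S, S), (B, S, S)): 6,
    ((P, B, S), (P, B, S)): 6,
    ((P, B, S, S), (P, B, S, S)): 6,
    ((P, B, S), (B, S, S)): 5,   # bez-dziet-ność / child-less-ness
    ((P, B, S), (B, S)): 4,      # wielko-pań-ski / lord-ly
    ((P, B, S), (P, B)): 3,      # za-leś-ać / a-forest
    ((B, S), (B, O)): 2,         # słow-nik / word-book
    ((B, S), (B,)): 2,           # dziec-inka / child
    ((B, S, S), (O, O, S)): 1,   # słow-nik-arz / lexico-graph-er
    ((B, S), (O, S)): 1,         # znacz-nik / mark-er
}

def level(polish_pattern, english_pattern):
    """Return the similarity level for a pattern pair, or None if it is not listed."""
    return LEVELS.get((tuple(polish_pattern), tuple(english_pattern)))

print(level((P, B, S), (B, S, S)))
```

Pattern pairs that the sketch does not list (compounds at levels 11 to 7 and the level 0 cases) return `None` rather than a guessed level.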
Experiment • 540 Polish-English pairs of derivatives were rated for similarity on the 12-point scale presented above • the translational similarity scores obtained for each pair of derivatives, together with the grapho-etymological similarity between their Polish and English bases, were analysed statistically
Statistical tests applied in the study
• in spite of the non-normality of the data, the following parametric tests were applied:
  • MANOVA for translational similarity between derivatives by
    • grapho-etymological similarity between the bases these derivatives were obtained from, and
    • direction of translation (Polish-English: based on Jadacka ‘95 and Collins Polish-English Electronic Dictionary; English-Polish: based on Harper-Collins Electronic Dictionary and Collins English-Polish Electronic Dictionary)
  • Multiple Range Tests for translational similarity of the derivatives, irrespective of whether they were obtained through Polish-English or English-Polish translation, by grapho-etymological similarity between the Polish and English bases they were derived from
  • Multiple Range Tests for translational similarity of the derivatives obtained through Polish-English translation, by grapho-etymological similarity between the Polish and English bases they were derived from
• additionally, some non-parametric tests were applied:
  • Mann-Whitney W test to compare the medians of the similarity points obtained for the derivatives in Polish-English translation with the medians of the similarity points obtained for the derivatives in English-Polish translation
Some results: MANOVA
• Type III Sums of Squares were used
• All F-ratios were based on the residual mean square error.

Source                         Sum of Squares    Df   Mean Square   F-Ratio   P-Value
A:graph_ethym_sim_betw_bases   590,704            2   295,352       53,53     0,0000
B:direction_of_translation     195,227            1   195,227       35,38     0,0000
RESIDUAL                       2957,27          536   5,5173
TOTAL (CORRECTED)              3903,44          539

The P-values test the statistical significance of each of the sources. Since both P-values are less than 0,05, the grapho-etymological similarity between bases and the direction of translation each have a statistically significant effect on the translational similarity between the derivatives obtained from these bases at the 95,0% confidence level.
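The mean squares and F-ratios in the table can be re-derived from the sums of squares and degrees of freedom. The short check below uses only the values reported on the slide (with decimal commas written as points); the variable names are illustrative.

```python
# Values copied from the table above; decimal commas converted to points.
sum_sq = {"A": 590.704, "B": 195.227}   # A: grapho-etym. similarity, B: direction
df = {"A": 2, "B": 1}
ss_residual, df_residual = 2957.27, 536

# Mean square = sum of squares / degrees of freedom.
ms_residual = ss_residual / df_residual
mean_sq = {k: sum_sq[k] / df[k] for k in sum_sq}

# Each F-ratio is the source's mean square over the residual mean square.
f_ratio = {k: mean_sq[k] / ms_residual for k in mean_sq}

for k in sorted(f_ratio):
    print(k, round(mean_sq[k], 3), round(f_ratio[k], 2))
```

Rounded to two decimals, the recomputed F-ratios agree with the 53,53 and 35,38 reported in the table.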
Some results: Multiple Range Tests

Contrast   Difference   +/- Limits
0 - 1       0,197742    1,25397
0 - 2      *-2,60124    0,488299
1 - 2      *-2,79898    1,30672

* denotes a statistically significant difference.
(0 = no, 1 = remote, 2 = close grapho-etymological similarity between the bases)

This means that the derivational groups of bases judged to bear no grapho-etymological similarity and the derivational groups of bases judged to be remotely similar (contrast 0 - 1) do not differ significantly in the translational similarity of the derivatives that constitute them.
On the other hand, the derivational groups of closely similar bases differ significantly from both of the other groups (contrasts 0 - 2 and 1 - 2) as far as the translational similarity of their derivatives is concerned; the negative differences indicate a higher mean similarity for the closely similar bases.
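The asterisks in the table are consistent with the usual reading of such output, in which a contrast is flagged when the absolute difference exceeds its limit. The sketch below applies that reading to the reported values; it assumes this decision rule and does not recompute the limits themselves.

```python
# (group_i, group_j) -> (difference, +/- limit), copied from the table above.
contrasts = {
    (0, 1): (0.197742, 1.25397),
    (0, 2): (-2.60124, 0.488299),
    (1, 2): (-2.79898, 1.30672),
}

# Assumed rule: a contrast is significant when |difference| > limit.
significant = {
    pair: abs(diff) > limit
    for pair, (diff, limit) in contrasts.items()
}

for pair, flag in significant.items():
    print(pair, "significant" if flag else "not significant")
```

Under this rule only the contrasts involving group 2 (close similarity) are flagged, matching the asterisks on the slide.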
[Figure: frequency and cumulative % chart; 540 observations = 100%]
Applications of the study The results of the study provide insights into the possibility of automatic translation of UNKNOWN L1 derivatives on the basis of • the L2 equivalents of the component morphemes of the L1 derivative • the degree of grapho-etymological similarity between the bases of these derivatives
For example: assume
• we do not know the equivalent of the derivative leśnik
• we can interpret bases even if they are modified by other morphemes (las → leś-)
• we know the equivalents of the component morphemes: leś- (= las) = forest, -nik = -er
• we know the grapho-etymological similarity between the bases (= 0)
Hence, we guess with relatively low certainty that the English equivalent of leśnik is forester
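The guessing procedure in this example can be sketched as a morpheme-by-morpheme lookup. Everything named here (`ALLOMORPHS`, `EQUIVALENTS`, `translate`) is hypothetical, the input is assumed to be already segmented, and the sketch covers only the case where both languages use the same morpheme order.

```python
# Hypothetical lookup tables built from the example above.
ALLOMORPHS = {"leś": "las"}                      # modified base -> citation form
EQUIVALENTS = {"las": "forest", "-nik": "-er"}   # Polish morpheme -> English morpheme

def translate(segments):
    """Compose an English guess from the equivalents of the Polish morphemes.

    Returns None if any morpheme has no known equivalent.
    """
    parts = []
    for seg in segments:
        citation = ALLOMORPHS.get(seg, seg)      # recover the base form if modified
        equivalent = EQUIVALENTS.get(citation)
        if equivalent is None:
            return None
        parts.append(equivalent.strip("-"))      # drop affix hyphens before joining
    return "".join(parts)

print(translate(["leś", "-nik"]))  # forester
```

The output is only a guess: as the slide notes, with a grapho-etymological similarity of 0 between the bases, the certainty attached to such a composed form is relatively low.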
Pessimistic scenario for automatic translation of derivatives [Figure: scale of similarity, levels 0 to 11]
Optimistic scenario for automatic translation of derivatives [Figure: scale of similarity, levels 0 to 11]
Very optimistic scenario for automatic translation of derivatives [Figure: scale of similarity, levels 0 to 11]
Conclusions • COMPOSITIONALITY: The meaning of the derivative is a direct function of the meaning of its morphemes in approx. 38-56% of cases • Assuming we know the equivalents of all the morphemes of an L1 derivative, we have an approx. 38-56% chance of producing a comprehensible L2 derivative • The grapho-etymological similarity of L1 and L2 bases influences the translational similarity of their derivational groups