In two minds: How to teach translation students to learn from parallel corpora

Tomaž Erjavec, Department of Intelligent Systems, Jožef Stefan Institute, tomaz.erjavec@ijs.si
Špela Vintar, Department of Translation and Interpreting, University of Ljubljana, spela.vintar@guest.arnes.si


  1. In two minds: How to teach translation students to learn from parallel corpora Tomaž Erjavec Department of Intelligent Systems Jožef Stefan Institute tomaz.erjavec@ijs.si Špela Vintar Department of Translation and Interpreting University of Ljubljana spela.vintar@guest.arnes.si

  2. Overview • The corpus and concordancer • Using the resource to teach students

  3. The IJS-ELAN parallel corpus • EU MLIS project ELAN: IJS • Slovene-English parallel texts • 1 million words, 15 texts • sentence aligned, tokenised • TEI encoded • freely available http://nl.ijs.si/elan/

  4. Example TU
     <tu lang="sl-en" id="spor.902">
       <seg lang="sl"><w type=dig>117.</w> <w>&ccaron;len</w></seg>
       <seg lang="en"><w>Article</w> <w type=dig>117</w></seg>
     </tu>
     <tu lang="en-sl" id="gnpo.303">
       <seg lang="en"><w>Memory</w> <w>exhausted</w></seg>
       <seg lang="sl"><w>zmanjkalo</w> <w>pomnilnika</w></seg>
     </tu>
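
To make the structure of a translation unit concrete, here is a minimal Python sketch that reads one TU into a per-language list of tokens. It assumes the TU has first been normalised to well-formed XML (quoted attribute values, entities such as &ccaron; already resolved), which the SGML-style example on the slide is not. The function name `read_tu` is hypothetical and not part of the corpus tooling.

```python
import xml.etree.ElementTree as ET

# Second TU from the slide; it is already well-formed XML as shown there.
TU_XML = """
<tu lang="en-sl" id="gnpo.303">
  <seg lang="en"><w>Memory</w> <w>exhausted</w></seg>
  <seg lang="sl"><w>zmanjkalo</w> <w>pomnilnika</w></seg>
</tu>
"""


def read_tu(xml_text):
    """Return (tu_id, {language: [tokens]}) for one <tu> element."""
    tu = ET.fromstring(xml_text.strip())
    segments = {}
    for seg in tu.findall("seg"):
        # Each <w> element holds one token of the aligned segment.
        segments[seg.get("lang")] = [w.text for w in seg.findall("w")]
    return tu.get("id"), segments


tu_id, segments = read_tu(TU_XML)
print(tu_id, segments)
```

The dictionary keyed by language keeps the two sides of the alignment together, which is the property the concordancer relies on when it shows a hit with its translation.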

  5. Web concordance • IMS CQP backend • CGI Perl interface • Apache server

  6. Queries • Vanilla queries: dog*, *dog • Full regular expressions: "dog.*" • Positional attributes: [num="dual"] • Expressions over tokens • Constraints on aligned segments
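
The relation between the "vanilla" wildcard queries and full regular expressions can be sketched in a few lines of Python. This is only an illustration of the idea, assuming that `*` stands for any run of characters within a token; it is not the code of the actual CGI interface, and the helper names `vanilla_to_regex` and `match_tokens` are hypothetical.

```python
import re


def vanilla_to_regex(query):
    """Turn a wildcard query such as 'dog*' into the regex 'dog.*'."""
    # Escape everything except the wildcard, then join the parts with '.*'.
    return ".*".join(re.escape(part) for part in query.split("*"))


def match_tokens(query, tokens):
    """Return the tokens that match the wildcard query as a whole token."""
    pattern = re.compile(vanilla_to_regex(query))
    return [token for token in tokens if pattern.fullmatch(token)]


tokens = ["dog", "dogs", "bulldog", "cat"]
print(vanilla_to_regex("dog*"))       # dog.*
print(match_tokens("dog*", tokens))   # ['dog', 'dogs']
print(match_tokens("*dog", tokens))   # ['dog', 'bulldog']
```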

  7. Using the corpus in translator training: Developing corpus literacy • what is a corpus? • what’s in the corpus? • how to find things in the corpus? • how to use the results?

  8. Formulating corpus queries • learning to formalize language • wordform vs. lemma (Slovene!) • using parallel search to filter out unwanted examples
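
Using the parallel side of the corpus as a filter can be sketched as follows. The example assumes the aligned segments are available as (text id, Slovene, English) tuples; the three pairs are shortened from the concordance for lokaln* samouprav* later in the talk, and the function `parallel_search` is a hypothetical helper, not part of the concordancer.

```python
import re

# (text id, Slovene segment, English segment), shortened from the slides.
PAIRS = [
    ("kuca", "z ustreznim razmerjem med državo in lokalno samoupravo",
     "an appropriate relationship between the state and local government"),
    ("parl", "Specifične oblike lokalne samouprave so Slovenci poznali pod imenom župa",
     "Specific forms of local self-administration were known to Slovenes by the term župa"),
    ("ekol", "V ta sklop sodi tudi raven lokalne samouprave",
     "This also includes the level of local self-government"),
]


def parallel_search(pairs, source_pattern, target_pattern=None,
                    keep_if_target_matches=True):
    """Keep pairs whose source matches, optionally filtered by the target side."""
    source_re = re.compile(source_pattern)
    target_re = re.compile(target_pattern) if target_pattern else None
    hits = []
    for text_id, source, target in pairs:
        if not source_re.search(source):
            continue
        if target_re is not None:
            found = bool(target_re.search(target))
            if found != keep_if_target_matches:
                continue  # the target side filters this example out
        hits.append((text_id, source, target))
    return hits


# All hits for the Slovene term, then only those NOT rendered as "local government".
all_hits = parallel_search(PAIRS, r"lokaln\w* samouprav\w*")
other_renderings = parallel_search(PAIRS, r"lokaln\w* samouprav\w*",
                                   r"local government",
                                   keep_if_target_matches=False)
print([hit[0] for hit in other_renderings])
```

Excluding the expected translation is one way to surface the less obvious renderings that students might otherwise overlook.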

  9. Evaluating the results • critical eye: corpus translations may be false or bad • before relying on quantitative data, consider corpus composition • corpus != dictionary

  10. Types of activities • frontal presentations • group work • individual work - translating with the corpus • seminar assignments

  11. Things to observe • translation (in)equivalence, terminological variety • word-formation strategies • pragmatic/cultural conventions of text types • contrastive analysis • other translation strategies

  12. lokaln* samouprav*  ?
      kuca: z ustreznim razmerjem med državo in lokalno samoupravo, med središčem države in
            A society with an appropriate relationship between the state and local government, between the national centre and individual regions.
      parl: obstajati. Specifične oblike lokalne samouprave so Slovenci poznali pod imenom župa,
            Specific forms of local self-administration were known to Slovenes by the term župa, which meant one or more villages led by a župan.
      ecmr: reforme javne uprave, razvoj lokalne samouprave, pa tudi oceno kadrovskih potreb in
            It is therefore an operative document which, apart from strategic goals, defines the areas of reforms, macro - and micro-economic policy measures, development of judicial system, public administration reform, development of local administration, as well as an estimate of the staff and financing requirements for realisation of those reforms.
      ekol: okolja33. V ta sklop sodi tudi raven lokalne samouprave s svojimi pristojnostmi na področju
            This also includes the level of local self-government with its responsibilities in the area of environmental protection, which otherwise are dealt with in a special chapter.

  14. *hrošč*
        11 hroščev
         6 hrošču
         5 razhroščevanje
         5 hroščih
         4 hrošče
         3 hrošč
         2 razhroščevanja
         2 razhroščevalnega
         2 hrošči
         2 Razhroščevanje
         1 razhroščujejo
         1 razhroščiti
         1 razhroščevanju
         1 razhroščevalniku
         1 razhroščevalniki
         1 razhroščevalnik
         1 razhroščevalnih
         1 razhroščevalne
         1 hroščem
         1 hroščati
         1 hroščat
         1 hrošča
      *bug*
        20 bugs
        13 bug
         9 debugging
         8 debug
         3 buggers
         3 bug-free
         2 buggy
         2 Debugging
         1 tar-bugs@gnu.ai.mit.edu
         1 request@bugs.debian.org
         1 debuggers
         1 debugger
         1 bug-wget@gnu.org
         1 bug-gnu-utils@gnu.org
         1 bug-fixes
         1 bug-fileutils@gnu.org
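
A word-form frequency list of this kind can be approximated with a short Python sketch. It assumes a plain list of tokens and treats `*` as "any run of characters"; the token list below is a toy sample rather than corpus data, and `wordform_frequencies` is a hypothetical helper. Matching is case-sensitive, which is why forms such as Debugging and debugging are counted separately in lists like the one above.

```python
import re
from collections import Counter


def wordform_frequencies(tokens, wildcard):
    """Count the distinct word forms that match a wildcard such as '*bug*'."""
    pattern = re.compile(".*".join(re.escape(p) for p in wildcard.split("*")))
    counts = Counter(token for token in tokens if pattern.fullmatch(token))
    # Most frequent first; ties are broken alphabetically (a choice made here).
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy token sample, for illustration only.
sample_tokens = "bug bugs debug bugs dog debugging bug bugs".split()
frequencies = wordform_frequencies(sample_tokens, "*bug*")
for form, count in frequencies:
    print(count, form)
```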

  16. Ways of translating deontic modality - shall
      usta: Within its own territory, Slovenia shall protect human rights and fundamental
            Država na svojem ozemlju varuje človekove pravice in temeljne svoboščine.
      usta: 11 The official language of Slovenia shall be Slovenian. In those areas where
            Uradni jezik v Sloveniji je slovenščina.
      spor: This schedule shall provide for a phasing-out
            Ta razpored mora predvideti postopno opuščanje tako uvedenih carin, s katerim je treba začeti najkasneje dve leti po uvedbi dajatev, in sicer po enakih letnih stopnjah.
      orwl: " " Obviously we shall put it off as long as
            " Nujno jo morava odložiti za tako dolgo, kot moreva. "
      kuca: a state which shall be fair to all,
            Je pa v moči vseh državljank in državljanov, da si ustvarijo tako državo, ki bo pravična do vseh, ne glede na njihove poglede na svet, politično prepričanje ali narodno pripadnost.
      kuca: world. Thus we shall create harmony
            Tako bomo ustvarjali ravnovesje v sebi, z drugimi in z okoljem.

  19. A peek into the log file • ~1,900 different queries since 1999 • L2 search: prevarication, forfeiture, runlevel, kernel • lexical-gap words: bias, retrieve, prepoznavnost • culturally bound words: potica, kozolec • (multiword) terms: legira.* (alloy steel)
