Corpus linguistics for translators Amanda Saksida University of Nova Gorica
... He cast a sideways look at Harry under his bushy eyebrows. “Be grateful if yeh didn't mention that ter anyone at Hogwarts,” he said. “I'm – er – not supposed ter do magic, strictly speakin'.” Hedwig Harry Hogwarts Hagrid Quidditch ... wart hog = Phacochoerus aethiopicus
Course outline • Introduction: what corpora are, history, typology, online corpora • Areas where corpora are used • Corpus-based translation studies: interesting examples • Tools for building and using corpora
What is a corpus? • A corpus is a collection of pieces of language that are selected and ordered according to explicit linguistic criteria in order to be used as a sample of the language. • Computer corpus: a corpus which is encoded in a standardised and homogeneous way for open-ended retrieval tasks. Its constituent pieces of language are documented as to their origins and provenance. • (Guidelines of the Expert Advisory Group on Language Engineering Standards, 1996) • Large collections of modern texts • Electronic form • Representative of a language/dialect • Basis for descriptive studies (not prescriptive!)
Brief history of corpus linguistics • 1964: Brown corpus (1 M words) • John Sinclair and the Cobuild revolution => Bank of English (470 M) • British National Corpus (100 M) => other languages (Czech, Hungarian, Croatian, Slovak, …) • Web as corpus: with the digital revolution, more and more texts are available on the net => programs that build corpora from online texts (WebBootCat, http://www.sketchengine.co.uk/auth/wbc/mycorp.cgi)
Types of corpora • Kinds of corpora: • Medium: written texts / spoken language • Size: reference corpora / specialized corpora • Time span: synchronic / diachronic corpora • Tagging: lemmatized / POS-tagged corpus • Language: mono- or multilingual corpora: • parallel • comparable • translational
Corpus usage • Lexicography • Descriptive grammars • Translation tools and studies • Foreign language learning • Sociolinguistic studies • Language technologies
Keywords • Concordance • KWIC (Keyword in Context) • Type / Token • Tag / Lemma • Collocation
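Two of these keywords, the KWIC concordance and the type/token distinction, can be illustrated with a short Python sketch. It is only a toy under stated assumptions: the function names (`tokenize`, `kwic`, `type_token_ratio`) and the sample sentence are made up for illustration, and the tokenizer is a crude regular expression rather than what a real concordancer uses. Tags and lemmas are not covered, since they require an annotated corpus.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (hypothetical, very crude)."""
    return re.findall(r"[a-z']+", text.lower())

def kwic(tokens, keyword, width=3):
    """Return one (left context, keyword, right context) triple per hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

def type_token_ratio(tokens):
    """Types are distinct word forms; tokens are running words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

# Made-up sample text, not taken from any corpus.
sample = "The owl saw the cat and the cat saw the owl"
tokens = tokenize(sample)

# Print the hits with the keyword aligned in a column, as a concordancer would.
for left, kw, right in kwic(tokens, "cat", width=2):
    print(f"{left:>12} [{kw}] {right}")

print(len(tokens), "tokens,", len(set(tokens)), "types")
```

The keyword is padded so that it lines up in a single column, which is what makes recurring patterns to its left and right easy to spot.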
What can a corpus tell us? • Word frequency • How frequent is a word / word form (compared to other words)? • Lexical information • Which words frequently co-occur? • Which affixes can a word take? • Syntactic information • In which syntactic structures can a word occur? • Semantic information • What are the possible meanings of a word? • Pragmatic information • In which texts can we find a word? What stylistic information does a word or its context carry? Is the usage of a word stagnating, or is its frequency increasing or decreasing?
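The first two questions (how frequent is a word, and which words co-occur with it) can be approximated with plain counting. The Python sketch below is a minimal illustration, assuming a made-up sample text and hypothetical helper names (`word_frequencies`, `cooccurrences`). It reports raw counts only; real corpus tools usually rank collocates with an association measure instead, which is not shown here.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (hypothetical, very crude)."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(tokens):
    """Count how often each word form occurs."""
    return Counter(tokens)

def cooccurrences(tokens, node, window=2):
    """Count the words found within `window` tokens of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[tokens[j]] += 1
    return counts

# Made-up sample text, not taken from any corpus.
text = "strong tea strong coffee strong tea weak tea"
tokens = tokenize(text)

print(word_frequencies(tokens).most_common(2))
print(cooccurrences(tokens, "tea", window=1).most_common())
```

With a window of one token on each side, the counts show which words sit directly next to the node word; widening the window trades precision for coverage.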
What can a corpus tell us? • Translation studies: • Parallel corpus studies can reveal characteristics of translated texts, such as tendencies towards explicitness and avoidance of repetition. • Comparison between the translation part of the corpus and a corpus of texts of the same genre, written in the target language for the translation corpus, reveals a tendency towards what we might call the Eliza Doolittle phenomenon: the translated texts, more than the texts in the control corpus, tend to contain those TL phrases, structures, and so on, which, from a comparative point of view, seem particularly characteristic of the TL. (Malmkjaer 1996)
Some of the online corpora • British National Corpus • http://www.natcorp.ox.ac.uk/ • http://view.byu.edu • Bank of English • http://www.collins.co.uk/Corpus/CorpusSearch.aspx • CORIS • http://corpus.cilta.unibo.it:8080/DEMOCORISCorpQuery.html • FidaPLUS: • www.fidaplus.net • Good link: • http://devoted.to/corpora
Tools for translating • Sentence alignment: • TRADOS WinAlign • ATRIL DejaVu • Vanilla Aligner (Unix/Linux) • Concordancers • Wordsmith Tools (www.lexically.net) • Sketch Engine (http://www.sketchengine.co.uk) • MonoConc/ParaConc (www.athel.com) • aConCorde - good for Arabic (http://www.comp.leeds.ac.uk/andyr/software/aConCorde/) • CQP (ims.uni-stuttgart.de) • Manatee / Bonito (www.textforge.cz)
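To give an idea of what sentence alignment involves, here is a toy Python sketch of length-based alignment, loosely in the spirit of the Gale and Church approach. It does not reproduce how any of the tools listed above works. The function name `align_by_length`, the `merge_penalty` value, and the English/German sample sentences are all assumptions made for illustration. The sketch only allows 1-1, 1-2 and 2-1 groupings and scores them by the difference in character length.

```python
def align_by_length(src, tgt, merge_penalty=3):
    """Align two sentence lists by dynamic programming over character lengths.

    Allowed groupings: 1-1, 1-2 and 2-1 sentences. The cost of a grouping is
    the absolute length difference, plus an (arbitrary) penalty when sentences
    are merged.
    """
    inf = float("inf")
    n, m = len(src), len(tgt)
    # cost[i][j]: cheapest alignment of the first i source and j target sentences
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in ((1, 1), (1, 2), (2, 1)):
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                step = abs(s_len - t_len)
                if (di, dj) != (1, 1):
                    step += merge_penalty
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)
    if cost[n][m] == inf:
        raise ValueError("no alignment possible with the allowed groupings")
    # Walk back from the end to recover the chosen groupings.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    return pairs[::-1]

# Made-up example: the translator merged the first two sentences into one.
source = ["He looked at Harry.", "Then he smiled.", "Nobody spoke."]
target = ["Er sah Harry an und grinste dann.", "Niemand sprach."]

for src_group, tgt_group in align_by_length(source, target):
    print(" ".join(src_group), "<=>", " ".join(tgt_group))
```

Length is a surprisingly useful signal because long sentences tend to be translated as long sentences, but a real aligner would also handle insertions and deletions and would normally be checked by hand afterwards.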
Corpus linguistics in Turkey • Kemal Oflazer: http://www.andrew.cmu.edu/user/ko/ • Informatics Institute corpus: http://www.ii.metu.edu.tr/~corpus/