Using corpora in translation studies

What is a corpus?*
A corpus is defined in terms of
• form
• purpose
The word corpus is used to describe a collection of examples of language collected for linguistic study. It can also describe collections of texts stored and accessed electronically (Hunston 2002). Corpus planning and design serve a linguistic purpose. It is on this basis that texts are selected and stored, so that they can be studied quantitatively and qualitatively.
*Ref. text: Hunston, S. (2002). Corpora in Applied Linguistics.

What are corpora used for?
• Corpora are often used for language teaching and learning. They give information about how a language works.
• They also help calculate the relative frequency of different features.
• Exploring corpora can help students to observe nuances of usage and to make comparisons between languages.
• Corpora are also used to investigate cultural attitudes expressed through language.
• NB: a corpus will not give information about whether something is possible or not, only whether it is frequent or not!

Using corpora in translation
• Corpora are also used in translation.
• Comparable corpora make it possible to compare the use of apparent equivalents.
• Parallel corpora make it possible to see how words and phrases have been translated in the past.
• General corpora can be used to establish norms of frequency and usage.

What can a corpus do?
• Corpus access software is used to rearrange the information which has been stored, so that observations of various kinds can be made.
• It is not the corpus which gives new information about language. It is the software which gives new perspectives on what is already familiar.
• Software packages process the data to show frequency, phraseology and collocation.

Frequency
• Corpus processing allows words to be compared by means of frequency lists.
• Grammar words are generally more frequent than lexical words, which is why they are found at the top of the list.
• Frequency lists can be useful for identifying differences between corpora. But comparisons can be made only if the corpora are comparable, i.e. if their lengths are approximately the same.
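
The idea of a frequency list can be sketched in a few lines of Python. This is only an illustration, not what any particular corpus package does: the tokenizer is a deliberately crude rule, and the names `tokenize`, `frequency_list` and `sample` are invented for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (a very simple rule)."""
    return re.findall(r"[a-z']+", text.lower())

def frequency_list(text, top_n=5):
    """Return the top_n most frequent word forms with their raw counts."""
    return Counter(tokenize(text)).most_common(top_n)

# Hypothetical sample text standing in for a real corpus.
sample = ("The corpus is a collection of texts. The texts in the corpus "
          "are selected for a purpose, and the purpose shapes the corpus.")

for word, count in frequency_list(sample):
    print(f"{word:10} {count}")
```

Even in this tiny sample the grammar word "the" comes out on top, ahead of the lexical word "corpus".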

Concordance
• The most common way to access a corpus is through a concordancing program.
• Concordance lines bring together instances of use of words or phrases, so that regularities in use can be observed.
• Concordances also help to understand how nouns or adjectives are used.
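
What a concordancing program produces can be approximated with a small keyword-in-context (KWIC) sketch. The function name `concordance`, the `width` parameter and the sample text are assumptions made for this example; real concordancers offer sorting, tagging and much more.

```python
import re

def concordance(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of keyword."""
    lines = []
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keyword lines up in a column.
        lines.append(f"{left:>{width}} [{m.group()}] {right:<{width}}")
    return lines

# Hypothetical sample text standing in for a real corpus.
sample = ("A general corpus is usually larger than a specialized corpus. "
          "A monitor corpus grows over time.")

for line in concordance(sample, "corpus", width=25):
    print(line)
```

Aligning the keyword in one column is what makes recurring patterns to its left and right easy to spot.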

Collocation
• Collocation is the tendency of words to co-occur.
• The collocates of a given word are those words which often occur in conjunction with it.
• Collocation can indicate pairs of lexical items, or the association between a lexical word and its frequent grammatical environment. In the latter case, the term used is colligation.
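
A rough way to see collocates is to count the words that fall within a few positions of a chosen node word. The sketch below uses raw co-occurrence counts only; corpus software typically also applies statistical association measures, which are not shown here. The names `collocates`, `node`, `window` and the sample sentence are invented for illustration.

```python
import re
from collections import Counter

def collocates(text, node, window=2):
    """Count the words occurring within `window` tokens of each occurrence of node."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the node word itself
                    counts[tokens[j]] += 1
    return counts

# Hypothetical sample text standing in for a real corpus.
sample = ("strong tea and strong coffee, but heavy rain and heavy traffic; "
          "she drank strong tea in heavy rain")

print(collocates(sample, "strong", window=1).most_common(3))
print(collocates(sample, "heavy", window=1).most_common(3))
```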

Types of corpora
A corpus is designed for a particular purpose. Consequently, the type of corpus depends on its purpose:
• Specialized corpus
• General corpus
• Comparable corpora
• Parallel corpora
• Learner corpus
• Historical or diachronic corpus
• Monitor corpus

• Specialized corpus: a corpus of texts of a particular type (editorials, academic articles, lectures, essays, etc.). Specialized corpora reflect the type of language a researcher wants to explore. You may also restrict the corpus to a time frame, a social setting or a given topic.
• General corpus: a corpus of texts of many types, of written or spoken language, or of both. A general corpus is usually much larger than a specialized corpus. Since it can be used to produce reference materials, it is sometimes called a reference corpus.
• Comparable corpora: two or more corpora in different languages, or in different varieties of a language. They are designed to contain the same proportion of texts (e.g. newspaper texts, essays, novels, conversations). They can be used by translators and learners to identify differences and equivalences in each language.
• Parallel corpora: two or more corpora in different languages, containing translated texts, or texts produced simultaneously in two or more languages (e.g. EU texts). They can be used by translators and learners to find potential equivalents in each language, and to investigate differences between languages.
• Learner corpus: a collection of texts produced by learners of a language. It is used to identify differences among learners, the frequency and types of mistakes, etc.
• Historical or diachronic corpus: a corpus of texts from different periods of time. It helps to trace the development of a language over time.
• Monitor corpus: a corpus used to track current changes in a language. It rapidly increases in size, since it is added to annually, monthly, daily, etc. The proportion of text types has to remain constant, so that each year is comparable with every other.

Key terms
• Type
• Token
• Hapax
• Lemma
• Word-form
• Tag
• Parse
• Annotate
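
The slide lists these terms without defining them. As a small illustration of the first three, the sketch below assumes the usual readings: a token is each running word, a type is each distinct word form, and a hapax is a form that occurs only once. The function name `type_token_stats` and the example sentence are invented here, and the tokenizer is a crude one.

```python
import re
from collections import Counter

def type_token_stats(text):
    """Count tokens and types, and list the forms that occur exactly once."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    hapaxes = sorted(word for word, n in counts.items() if n == 1)
    return {"tokens": len(tokens), "types": len(counts), "hapaxes": hapaxes}

# Hypothetical example sentence.
stats = type_token_stats("the cat sat on the mat and the dog sat too")
print(stats)
```

In the example sentence, "the" counts as three tokens but only one type.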