The learner as corpus designer … or the art of fruit salads
Guy Aston, SSLMIT, University of Bologna, guy@sslmit.unibo.it
Learner uses of corpora • Form-focussed (data-driven learning) • Meaning-focussed (learning the culture) • Skill-focussed (reading practice) • Browsing environment (serendipity) • Reference tool for other tasks (reading/writing aid)
Why make your own corpus? • You can devise your own recipe • You know what’s in it • You learn how to do it • Can be fun • Can provide practice in language use
Devising your own recipe • Only the text-type(s) you want • Only the texts you want • The quantity you want … small and specialised is beautiful
You know what’s in it • Top-down knowledge of corpus • Top-down knowledge of texts
You learn how to do it • Can be a useful skill for many language workers • technical writers • translators • teachers • Can make you a more critical corpus user
It can be fun • Provides a challenge • Gives a sense of achievement/satisfaction
Practice in language use • Design/construction/evaluation of corpora can be communicative activities
Why use standard corpora? • Less effort • More reliable • Better packaging • You don’t want to learn to make your own
More reliable • if it’s well designed • if it fits your needs
Better packaging • Metatextual information • Annotation • Corpus-specific software
A compromise strategy: make your own subcorpus • assemble using the pre-prepared ingredients of a larger corpus or in other words… go to a (fruit) salad bar
You have a choice of • text-types • individual texts • selection by pre-determined criteria • selection by hand • … or both
You know what went in • so top-down processing is easier
Little effort • in comparison with making your own
Good packaging • Metatextual information • Linguistic annotation • Can use software designed for full corpus • Indexed
You get to learn • what are(n’t) useful subcorpora • what are(n’t) useful design criteria • how to do it
It can be fun • challenge / achievement / satisfaction
You can talk about its • design / construction / evaluation
Talking about fruit salad BNC Sampler: KC2
And now to details … the Sampler awaits!
You can create subcorpora of • specific corpus texts • texts containing solutions to a query • encoded categories of texts • your own categories of texts • and compare them with • other subcorpora • the full corpus
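As a rough illustration of the four ways of defining a subcorpus listed above, here is a minimal Python sketch. It assumes a hypothetical in-memory representation (a `Text` record with an ID, one encoded category and a token list); this is not the BNC Sampler's actual file format or any query tool's interface, and all function names are illustrative.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Text:
    # Hypothetical record for one corpus text (not the BNC Sampler's real format).
    text_id: str
    category: str       # an encoded category, e.g. "monologue" or "dialogue"
    tokens: list[str]


def by_ids(corpus, ids):
    """Specific corpus texts, chosen by hand."""
    wanted = set(ids)
    return [t for t in corpus if t.text_id in wanted]


def by_query(corpus, pattern):
    """Texts containing at least one solution to a query (a regex matched against whole tokens)."""
    rx = re.compile(pattern)
    return [t for t in corpus if any(rx.fullmatch(tok) for tok in t.tokens)]


def by_category(corpus, value):
    """Texts with a given encoded category."""
    return [t for t in corpus if t.category == value]


def by_own_labels(corpus, labels, value):
    """Your own categories: `labels` maps text_id -> a label you assigned."""
    return [t for t in corpus if labels.get(t.text_id) == value]
```

Each function returns an ordinary list of texts, so any subcorpus can then be compared with another subcorpus or with the full corpus using the same counting code.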
Choosing specific texts (Text analysis: selecting)
Viewing the index
A bad language subcorpus: texts containing solutions to a query
Collocates of f.*k.* (f_ words)
Collocates of oh
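Collocate lists like the two above can be approximated by counting the words that occur within a fixed window around each match of a node pattern. The sketch below is a simple frequency-only version: the ±4-word window is an assumption, no association measure (such as mutual information) is applied, and `collocates` is a hypothetical helper, not a function of any corpus tool. Note that a pattern such as `f.*k.*`, matched against whole tokens, also catches words like "folk" or "fork".

```python
import re
from collections import Counter


def collocates(tokens, pattern, span=4):
    """Count words occurring within `span` tokens either side of each node match."""
    node = re.compile(pattern)
    counts = Counter()
    for i, tok in enumerate(tokens):
        if node.fullmatch(tok):
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            counts.update(left + right)
    return counts
```

Calling `collocates(tokens, r"oh").most_common(20)` on a subcorpus's token list would give a crude top-20 collocate list for "oh".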
Making subcorpora using encoded categories • ‘context-governed’ spoken texts • - monologue: 17 texts • - dialogue: 29 texts
Monologue vs Dialogue • More frequent in monologue* • could • had • he • know • their • were • when • who • your • More frequent in dialogue* • 'll • 'm • any • no • pounds • right • yeah • yes
*ranked 20+ positions higher in the top 100 of the frequency list
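The criterion in the footnote can be sketched as a comparison of two frequency lists. In this minimal version, ranks come from raw token counts, and a word missing from the other list is treated as ranked just below that list's last word; both choices are assumptions, and the names `ranks` and `ranked_higher` are hypothetical.

```python
from collections import Counter


def ranks(tokens):
    """Map each word to its 1-based rank in a frequency list (most frequent = 1)."""
    return {w: i for i, (w, _) in enumerate(Counter(tokens).most_common(), start=1)}


def ranked_higher(tokens_a, tokens_b, top_n=100, min_gap=20):
    """Words in A's top_n whose rank in A is at least min_gap positions
    higher (numerically smaller) than their rank in B."""
    rank_a, rank_b = ranks(tokens_a), ranks(tokens_b)
    absent = len(rank_b) + 1  # assumption: words missing from B rank just below its last word
    return sorted(w for w, r in rank_a.items()
                  if r <= top_n and rank_b.get(w, absent) - r >= min_gap)
```

With monologue tokens as A and dialogue tokens as B this yields the "more frequent in monologue" list; swapping the arguments yields the dialogue list.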
Investigating the differences • no occurrences of all right in monologue • when you’re / you’ll / you’d / you’ve is more common in monologue than when we’re / we’ll / we’d / we’ve; vice versa in dialogue
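A comparison like "when you're/you'll/you'd/you've" versus "when we're/we'll/we'd/we've" can be sketched as a regular-expression count over running text. This assumes plain text with contractions attached to the pronoun by a straight or curly apostrophe; a tokenized corpus that splits clitics off as separate tokens would need a different query. The function name is hypothetical.

```python
import re

# "when" followed by you/we plus 're, 'll, 'd or 've (straight or curly apostrophe)
WHEN_PRONOUN = re.compile(r"\bwhen (you|we)['’](?:re|ll|d|ve)\b", re.IGNORECASE)


def when_pronoun_counts(text):
    """Count 'when you + contraction' and 'when we + contraction' in a text."""
    counts = {"you": 0, "we": 0}
    for m in WHEN_PRONOUN.finditer(text):
        counts[m.group(1).lower()] += 1
    return counts
```

Running it separately on the monologue and dialogue subcorpora gives the two sets of counts to compare.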
you and we
               you     we
Monologue     4253   2014
Dialogue      6635   4949
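The counts in this table can be turned into you/we ratios and, as one possible way of checking whether the difference is more than chance, a log-likelihood (G²) test on the 2×2 table. The choice of G² is an assumption made for illustration and not necessarily how these figures were analysed; the function name is hypothetical.

```python
import math


def log_likelihood(table):
    """Log-likelihood (G2) statistic for a 2x2 contingency table [[a, b], [c, d]]."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            if observed > 0:
                g2 += observed * math.log(observed / expected)
    return 2 * g2


you_we = [[4253, 2014],   # monologue: you, we
          [6635, 4949]]   # dialogue:  you, we

for label, (you, we) in zip(["monologue", "dialogue"], you_we):
    print(f"{label}: you/we = {you / we:.2f}")   # 2.11 and 1.34
print(f"G2 = {log_likelihood(you_we):.1f}")
```

The resulting G² is well above 10.83, the usual critical value for p < 0.001 with one degree of freedom. This suggests that "you" outweighs "we" more strongly in monologue, though the subcorpora are small.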
Subcorpora using your own categories: David Lee’s book genres • academic non-fiction (13 texts) • non-academic non-fiction (15 texts) • prose fiction (13 texts)
Distinctive -ly adverbs of: • academic non-fiction • accordingly, essentially, eventually, largely, namely, notably, respectively, surprisingly • non-academic non-fiction • effectively, merely, normally, obviously, possibly, specially • prose fiction • carefully, quietly, slightly, slowly, softly, surely, truly
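The slide does not say how "distinctive" was measured. One possible criterion, sketched below purely as an assumption, keeps a word for a genre if its rate per million tokens is at least twice its rate in every other genre. The zero-count floor and all names are hypothetical. Note also that a bare "-ly" suffix filter also catches non-adverbs such as "only" or "family"; part-of-speech annotation, where available, would allow filtering by tag instead.

```python
from collections import Counter


def ly_rates(tokens):
    """Occurrences per million tokens of each word ending in -ly."""
    total = len(tokens)
    counts = Counter(t.lower() for t in tokens if t.lower().endswith("ly"))
    return {w: c * 1_000_000 / total for w, c in counts.items()}


def distinctive_ly(subcorpora, factor=2.0, floor=1.0):
    """For each label, the -ly words whose rate is at least `factor` times their
    rate in every other subcorpus (other rates below `floor` are raised to it)."""
    rates = {label: ly_rates(toks) for label, toks in subcorpora.items()}
    result = {}
    for label, own in rates.items():
        others = [r for other, r in rates.items() if other != label]
        result[label] = sorted(
            w for w, rate in own.items()
            if all(rate >= factor * max(o.get(w, 0.0), floor) for o in others)
        )
    return result
```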
largely (academic non-fiction)
Working with subcorpora can allow • study/comparison of forms/meanings in particular texts/text-types • better-focussed reading practice • more appropriate reference tools for particular tasks • more focussed browsing
Subcorpora • may not be representative (but nor is most language learning data) • are good for forming hypotheses to be tested more widely • will allow more interesting uses when extracted from a larger corpus
Making your own provides • better preparation and motivation for corpus use • more critical awareness • lots to talk about