The Cambridge Learner Corpus, English Profile, the Sketch Engine and the Kelly Project Adam Kilgarriff Lexical Computing Ltd http://www.sketchengine.co.uk
The Cambridge Learner Corpus, English Profile, the Sketch Engine, “freely available”, HOO, DANTE and the Kelly Project Adam Kilgarriff Lexical Computing Ltd http://www.sketchengine.co.uk
Cambridge Learner Corpus (CLC) • Since 1993 • Nearly as old as CECL • Leading resource (like ICLE) • CUP and Cambridge ESOL • For better dictionaries, ELT courses, tests • Material: all from exams (levels A1-C2) • 45m words; 22m error-tagged • 200,000 scripts, 138 L1s, 203 nationalities
English Profile • From 2006 • Cambridge Univ, Univ Press, ESOL (+ others) • Goal • for each CEFR level, find characteristic lexis and grammar • Main resource: CLC • Talk on Thursday • Theodora Alexopoulou, Helen Yannakoudakis
Sketch Engine • Leading corpus tool • Word sketches • One-page summaries of a word’s grammatical and collocational behaviour • In use at OUP, CUP, Collins, Macmillan, INL … • 42 languages • Over 150 corpora • Since May including CHILDES: demo • Since last year including CLC
Error-coded corpus • Challenge • Intuitive to search for x • anywhere • only where it is part of an error • only where it is part of a correction • where x can be a word, phrase, grammar pattern … • Requirement for CLC in Sketch Engine
Sample text • We will only use those informations to take part of our guest survey
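The three search scopes for an error-coded corpus (anywhere, only inside an error, only inside a correction) can be sketched on the sample sentence. This is a minimal illustration, not how the Sketch Engine or the CLC encode errors: the `Token` layers, the `search` function and the corrections themselves ("that information", "in") are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str
    layer: str  # "plain", "error" (learner's wording) or "correction"


def segment(words, layer):
    """Split a string into tokens that all belong to one layer."""
    return [Token(w, layer) for w in words.split()]


# Hypothetical annotation of the sample sentence; the corrections are
# illustrative guesses, not taken from the CLC.
SAMPLE = (
    segment("We will only use", "plain")
    + segment("those informations", "error")
    + segment("that information", "correction")
    + segment("to take part", "plain")
    + segment("of", "error")
    + segment("in", "correction")
    + segment("our guest survey", "plain")
)


def reading(tokens, drop):
    """One linear reading of the text: every token except the dropped layer."""
    return [(i, t) for i, t in enumerate(tokens) if t.layer != drop]


def find(view, words, must_include=None):
    """Index tuples of windows in `view` whose lowercased text equals `words`."""
    hits = set()
    n = len(words)
    if n == 0:
        return hits
    for start in range(len(view) - n + 1):
        window = view[start:start + n]
        if [t.text.lower() for _, t in window] != words:
            continue
        if must_include and not any(t.layer == must_include for _, t in window):
            continue
        hits.add(tuple(i for i, _ in window))
    return hits


def search(tokens, phrase, scope="anywhere"):
    """Search for a word or phrase in one of the three scopes."""
    words = phrase.lower().split()
    learner = reading(tokens, drop="correction")   # what the learner wrote
    corrected = reading(tokens, drop="error")      # the corrected version
    if scope == "error":
        return sorted(find(learner, words, "error"))
    if scope == "correction":
        return sorted(find(corrected, words, "correction"))
    if scope == "anywhere":
        return sorted(find(learner, words) | find(corrected, words))
    raise ValueError(f"unknown scope: {scope}")
```

Under these assumptions, `search(SAMPLE, "take part in", "correction")` finds one hit, while the same phrase with scope `"error"` finds none, because the learner wrote "take part of". Grammar patterns would need a richer query language than this word-sequence match.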
Error-coded corpora in SkE • demo
freely available • Free (MED online) • Sense 1: not costing anything • Sense 4: not limited by rules … anyone can get hold of it?? • Available • To download onto your com • To use
Non-geeks • Access is important, not download • Web is beautiful
HOO / HOO+ • Helping Our Own • HOO: English-NNS NLP researchers • Developer = user: motivation • Shared task/competitive evaluation • Organisers define task and prepare ‘gold standard’ • Teams participate by running their software over test data • Six teams (incl Tübingen), workshop end Sept
HOO+ (2012) • Probably • English: learner data from CLC • Other languages? • Tasks • Essay scoring • Determiner, preposition errors • ? • http://www.clt.mq.edu.au/research/projects/hoo/
DANTE Highlights of English lexicography
DANTE http://webdante.com Flyers
The KELLY Project • EU Lifelong Learning Project • Word cards • 9 languages • Arabic Chinese English Greek Italian Norwegian Polish Russian Swedish • All 36 pairs • Words the learner should know (at A1 … C2) • Partners • Stockholm Univ, Gothenburg Univ, Adam Mickiewicz Univ, ILSP Athens, CNR Pisa, Oslo Univ, Leeds Univ, Keewords A/S, Lexical Computing Ltd
Interesting question • How close to purely corpus-based can a pedagogic list be?
Method • Take a general corpus • Count • Review, add, delete using other lists and corpora • Translate (72 directed language pairs) • Words not in source list which occur in translations: • Review source list • http://kelly.sketchengine.co.uk
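A rough sketch of the countable parts of this method: the pair arithmetic (9 languages give 36 unordered pairs and 72 directed pairs), the frequency count, and the check for words that turn up in translations but are missing from the source list. The function names and the toy data are hypothetical; the manual review, add and delete steps are not modelled.

```python
from collections import Counter
from itertools import combinations, permutations

LANGUAGES = [
    "Arabic", "Chinese", "English", "Greek", "Italian",
    "Norwegian", "Polish", "Russian", "Swedish",
]

# 9 languages: 36 unordered pairs, 72 directed (source, target) pairs.
unordered_pairs = list(combinations(LANGUAGES, 2))
directed_pairs = list(permutations(LANGUAGES, 2))


def frequency_list(tokens, size):
    """The `size` most frequent word forms in a tokenised corpus (lowercased)."""
    counts = Counter(t.lower() for t in tokens)
    return [word for word, _ in counts.most_common(size)]


def review_candidates(source_list, incoming_translations):
    """Words that other languages translate into but the source list lacks.

    `incoming_translations` maps a foreign word to its translations into
    the language of `source_list` (an assumed, simplified data shape).
    """
    known = set(source_list)
    seen = {t for targets in incoming_translations.values() for t in targets}
    return sorted(seen - known)
```

For example, with an English list `["sun", "music"]` and Swedish translations `{"sol": ["sun"], "sjukhus": ["hospital"]}`, `review_candidates` flags "hospital" as a word to reconsider for the English list.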
Symmetrical pairs: <x,y> and <y,x> • Cliques: • For x, y, z, … all pairs are symmetrical • 9-language cliques (English members) • hospital library music sun theory
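The definitions of symmetrical pairs and cliques can be sketched as follows. This is an illustration under assumptions, not the project's code: words are represented as hypothetical (language, word) tuples, `translations` maps each word to the set of words it was translated into, and the three-language toy data is invented for the example.

```python
from itertools import combinations


def symmetric(translations, x, y):
    """True if x translates to y and y translates back to x."""
    return y in translations.get(x, set()) and x in translations.get(y, set())


def is_clique(translations, words):
    """True if every pair of words in the group is symmetrical."""
    return all(symmetric(translations, x, y) for x, y in combinations(words, 2))


def cliques_from(translations, seed, languages):
    """Cliques containing `seed` plus one word from every other language."""
    partial = [[seed]]
    for lang in languages:
        if lang == seed[0]:
            continue
        extended = []
        for clique in partial:
            for cand in translations.get(seed, set()):
                # A candidate must be symmetrical with every member so far.
                if cand[0] == lang and all(
                    symmetric(translations, cand, member) for member in clique
                ):
                    extended.append(clique + [cand])
        partial = extended
    return partial


# Toy data (invented): "musik" is not translated to "musica", so the
# music group fails to form a clique while the sun group succeeds.
TOY = {
    ("en", "sun"): {("sv", "sol"), ("it", "sole")},
    ("sv", "sol"): {("en", "sun"), ("it", "sole")},
    ("it", "sole"): {("en", "sun"), ("sv", "sol")},
    ("en", "music"): {("sv", "musik"), ("it", "musica")},
    ("sv", "musik"): {("en", "music")},
    ("it", "musica"): {("en", "music"), ("sv", "musik")},
}
```

With the toy data, `cliques_from(TOY, ("en", "sun"), ["en", "sv", "it"])` returns one three-language clique, and the same call seeded with `("en", "music")` returns none. A 9-language clique is the same check run over all nine languages.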