Birmingham Corpus Linguistics Conference 2007
Spoken multimedia corpora for pedagogical purposes
Sabine Braun (University of Surrey), Pascual Pérez-Paredes (Universidad de Murcia), Ylva Berglund (Oxford University)
Introduction
• The usefulness of corpora in language pedagogy is widely recognised.
• However, there is a need for pedagogically relevant corpora, as reflected e.g. in initiatives to create 'ad hoc' corpora in pedagogical contexts.
• The creation of pedagogically relevant corpora raises challenges for corpus design.
• Past and current initiatives have largely focussed on written corpora, yet spoken discourse is becoming more important in pedagogical contexts.
• The creation of pedagogically relevant spoken corpora raises additional challenges for corpus design.
The challenges (1)
Corpus design: traditional reference corpora (content, size, data format, transcription, annotation, query)
Corpus exploitation: Data-Driven Learning (focus on non-linear reading: concordances and co-texts)
• Corpora contain textual records of discourse; interpreting them requires (re-)contextualisation.
• Learners may have difficulty analysing corpus data; they require pedagogical mediation.
• Pedagogical uses of corpora differ from linguistic description; this requires e.g. pedagogically motivated query options.
• Corpora need to be integrated with curricula; this requires e.g. complementary content and effective delivery.
Traditional corpus design and exploitation thus do not fully support pedagogical requirements.
The challenges (2)
Corpus design: traditionally, representation in written format
Corpus exploitation: work with text-only data and e.g. conversational markup
• Spoken discourse depends more on a shared physical context than written discourse does.
• It is adjusted to aural and online (real-time) perception (e.g. chunking).
• It is affected by limitations of processing capacity (false starts, repair).
• It is marked by accents.
• It is multimodal.
Again, this does not fully support pedagogical requirements.
Requirements
• Format: multimedia, to retain the multimodal character of spoken language
• Content: complementary to curriculum topics, with more coherence than in traditional corpora
• Pedagogically motivated transcription, annotation and alignment (transcript-video)
• Combination of query methods: text-based exploration and application of corpus techniques
• Pedagogical enrichment of corpora with complementary resources (e.g. exercises, explanations)
• Effective delivery of corpora and additional resources to learners/teachers
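The transcript-video alignment requirement can be pictured as a list of time-stamped segments. The Python sketch below is a minimal illustration under our own assumptions (the hypothetical `Segment` and `AlignedTranscript` names, times in seconds, non-overlapping segments); it is not the format used by ELISA or SACODEYL. It looks up the utterance being spoken at a given playback time, which a player would need in order to highlight the current transcript line, and lists the clips containing a word so a learner can jump to them.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One transcribed utterance aligned to a stretch of video (times in seconds)."""
    start: float
    end: float
    speaker: str
    text: str


class AlignedTranscript:
    """Hypothetical transcript-video alignment; assumes segments do not overlap."""

    def __init__(self, segments):
        # Keep segments ordered by start time so they can be binary-searched.
        self.segments = sorted(segments, key=lambda s: s.start)
        self._starts = [s.start for s in self.segments]

    def segment_at(self, t):
        """Return the segment spoken at playback time t, or None (e.g. during a pause)."""
        i = bisect_right(self._starts, t) - 1
        if i >= 0 and t < self.segments[i].end:
            return self.segments[i]
        return None

    def find(self, word):
        """Return (start, end) clips of all segments whose text contains `word`."""
        w = word.lower()
        return [(s.start, s.end) for s in self.segments if w in s.text.lower()]


# Invented placeholder segments; speaker IDs borrowed from the ES02 metadata sample.
transcript = AlignedTranscript([
    Segment(3.5, 9.0, "Chico", "(student's answer about Europe)"),
    Segment(0.0, 3.2, "E", "(interviewer's question about Europe)"),
])
```

Binary search over start times keeps lookups fast even for long interviews, which matters when the player queries the current segment many times per second.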
Corpus creation (1)
• Examples: ELISA and SACODEYL
  ELISA: professional English; accounts of professional life; different varieties
  SACODEYL: 7 European languages; youth language corpora; speakers aged 13-15 and 16-18
• Interview format
• Video clips with transcripts
• Communicatively relevant topics, e.g. in SACODEYL, topics outlined in the Common European Framework
• Elicitation process: briefing informants and prompting them during the interview, while ensuring naturally flowing discourse
[Figure: Corpus creation (1), example of topics in SACODEYL]
Corpus creation (2)
Continuum: from raw, orthographic transcription to annotated corpora
• Transcription with the SACODEYL Transcriptor, yielding TEI-compliant corpora
• Markup and pedagogic annotation with the SACODEYL Annotator, yielding XML files
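The continuum from transcription to annotated XML can be illustrated with Python's standard `xml.etree.ElementTree`. This is a simplified, TEI-inspired sketch, not the actual output of the SACODEYL Transcriptor or Annotator: `<u who="...">` is the TEI element for a spoken utterance, but the TEI namespace and the header elements a schema-valid TEI file requires are omitted, and representing the pedagogic annotation as one `<div type="topic">` per interview section is our assumption.

```python
import xml.etree.ElementTree as ET


def build_transcript_xml(title, sections):
    """Build a simplified, TEI-inspired XML tree for one interview.

    sections: list of (topic_label, [(speaker_id, utterance_text), ...]) pairs.
    """
    tei = ET.Element("TEI")  # TEI namespace omitted for brevity
    file_desc = ET.SubElement(ET.SubElement(tei, "teiHeader"), "fileDesc")
    title_stmt = ET.SubElement(file_desc, "titleStmt")
    ET.SubElement(title_stmt, "title").text = title

    body = ET.SubElement(ET.SubElement(tei, "text"), "body")
    for topic, utterances in sections:
        # Pedagogic annotation layer (assumed): one <div> per topic-labelled section.
        div = ET.SubElement(body, "div", type="topic", n=topic)
        for speaker, text in utterances:
            # Markup layer: a TEI-style <u> element whose @who points at the speaker.
            ET.SubElement(div, "u", who=f"#{speaker}").text = text
    return tei


# Hypothetical topic label and placeholder utterances.
doc = build_transcript_xml(
    "La Unión Europea une a los ciudadanos",
    [("europe", [("E", "(question)"), ("Chico", "(answer)")])],
)
print(ET.tostring(doc, encoding="unicode"))
```

Keeping markup (utterances) and pedagogic annotation (topic sections) as separate layers mirrors the continuum on the slide: the same transcript can be reused with or without the pedagogic layer.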
Corpus creation (2): sample transcript metadata (interview ES02)
[METADATA]
Title: La Unión Europea une a los ciudadanos ("The European Union unites citizens")
Date Recording: 2006-11-05
Date Transcription: 2007-02-02
Locale: I.E.S. Floridablanca (a secondary school), Murcia, España (Spain)
Principal Investigator: Pascual Perez-Paredes
Researcher: Pascual Perez-Paredes
Transcriber: Encarnación Tornero Valero
Editor:
Autority: SACODEYL
Project ID:
Language: ES
MediaFileName: ES02.avi
Participants:
  person: Chico (boy)
  name:
  role: Entrevistado (interviewee)
  sex: Hombre (male)
  age: 16
  description:
  person: E
  name: Andrés Mercader Rodríguez
  role: Entrevistador (interviewer)
  sex: Hombre (male)
  age: 32
  description:
[/METADATA]
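A metadata header like this can be read into a structured record before conversion to XML. The Python sketch below assumes (our assumption, since the slide shows only flattened text) that each field sits on its own line as `Key: value` between `[METADATA]` and `[/METADATA]`, and that each `person:` line starts a new participant whose lower-case fields follow it. The `parse_metadata` function is hypothetical and not part of the SACODEYL tools.

```python
# Fields assumed to belong to the most recent "person:" entry.
PARTICIPANT_FIELDS = {"name", "role", "sex", "age", "description"}


def parse_metadata(text):
    """Parse a [METADATA] ... [/METADATA] block into a dict (assumed line format)."""
    meta = {"participants": []}
    inside = False
    current = None  # participant currently being filled in
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[METADATA]":
            inside = True
        elif line == "[/METADATA]":
            break
        elif inside and ":" in line:
            key, _, value = line.partition(":")  # split on the first colon only
            key, value = key.strip(), value.strip()
            if key == "Participants":
                continue  # section label only; participants follow
            if key == "person":
                current = {"person": value}
                meta["participants"].append(current)
            elif current is not None and key in PARTICIPANT_FIELDS:
                current[key] = value
            else:
                meta[key] = value  # header field, e.g. "Date Recording"
    return meta


# Excerpt of the ES02 header from the slide, one field per line.
sample = """[METADATA]
Title: La Unión Europea une a los ciudadanos
Date Recording:2006-11-05
Language:ES
MediaFileName:ES02.avi
Participants:
person:Chico
role: Entrevistado
age: 16
person: E
role: Entrevistador
age: 32
[/METADATA]"""

record = parse_metadata(sample)
```

Splitting on the first colon only keeps values such as dates or file names intact, and tolerates the inconsistent spacing after colons seen in the sample.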
Corpus query
• Query options will support both text-based and corpus-based exploration and will include e.g.:
  • easy access to entire interviews
  • a topic index supporting the analysis of similar sections across interviews ("topic concordances")
  • other indices based on the annotation categories
  • ready-made data (e.g. frequency lists for each interview; selective concordances)
  • a concordancer for extended/advanced searches, adapted to pedagogical requirements
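Several of these query options reduce to simple operations over the transcripts. The Python sketch below illustrates three of them under our own assumptions: a frequency list for one interview, a topic concordance over interviews stored as `{interview_id: {topic_label: section_text}}`, and a keyword-in-context (KWIC) concordance. The function names and data layout are hypothetical, not the SACODEYL query interface.

```python
import re
from collections import Counter


def tokens(text):
    """Lower-cased word tokens; word characters include accented letters such as 'ó'."""
    return re.findall(r"\w+", text.lower())


def frequency_list(text, n=10):
    """Ready-made data: the n most frequent word forms in one interview."""
    return Counter(tokens(text)).most_common(n)


def topic_concordance(interviews, topic):
    """Collect the section labelled `topic` from every interview that has one."""
    return {iid: sections[topic]
            for iid, sections in interviews.items() if topic in sections}


def kwic(text, keyword, width=30):
    """Keyword-in-context lines with `width` characters of co-text on each side."""
    flat = " ".join(text.split())  # collapse line breaks so each hit fits on one line
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        lines.append(f"{left:>{width}}[{m.group(0)}]{right}")  # right-align left co-text
    return lines


for line in kwic("the Union unites the citizens of the Union", "union", width=12):
    print(line)
```

Whole-word matching (`\b`) keeps a search for "union" from returning "reunion"; a pedagogical concordancer might deliberately offer both modes.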
Pedagogical enrichment
• The corpora will be enriched with prototypical learning activities.
• These will focus on one interview section, one interview as a whole, or sections across interviews…
• They will include e.g.:
  • linguistic and cultural explanations and exercises (form-focussed as well as communication-oriented)
  • (listening) comprehension and production tasks
  • explorative tasks (concordance-based as well as interview-based)
• The authoring tool Telos Language Partner will be used to create learning packages with a range of activities.
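As an illustration of a form-focussed exercise derived from corpus data, the sketch below turns a transcript passage into a numbered gap-fill exercise with an answer key. It is a hypothetical Python helper, not a feature of Telos Language Partner; whole-word, case-insensitive matching is our design choice.

```python
import re


def gap_fill(text, targets, gap="_____"):
    """Replace each whole-word occurrence of the target forms with a numbered gap.

    Returns (exercise_text, answers), with answers in order of appearance.
    """
    if not targets:
        return text, []
    # Longest forms first so a multi-word target such as "going to" wins over "going".
    alternatives = [re.escape(t) for t in sorted(targets, key=len, reverse=True)]
    pattern = re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)
    answers = []

    def blank(match):
        answers.append(match.group(0))  # keep the original casing for the key
        return f"({len(answers)}) {gap}"

    return pattern.sub(blank, text), answers


exercise, key = gap_fill("I went to school and I liked school.", ["school"])
```

Numbering the gaps lets the answer key be checked item by item, and the same function works on a single interview section or on concordance lines collected across interviews.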
Corpus delivery
• Effective delivery is a further prerequisite for integration into the curriculum.
• SACODEYL uses the Moodle learning platform, which gives access to:
  • the corpora (query interfaces)
  • resources created in the project (different types of learning activities)
  • resources created by future corpus users
Summary
• The method outlined is transferable to other pedagogical contexts, topics and languages.
• It helps to use corpora more efficiently in pedagogical contexts, turning them from a sporadically used resource into one that is exploited systematically.
• Corpus creation complies with standards, which facilitates reuse of the corpora in other contexts (e.g. research).
Contact
Sabine Braun: s.braun@surrey.ac.uk
Pascual Pérez-Paredes: sacodeyl@um.es
Ylva Berglund: ylva.berglund@oucs.ox.ac.uk
And visit our poster session, as well as our websites:
www.um.es/sacodeyl
www.corpora4learning.net/elisa