An annotated Spanish corpus for Corpus-based CALL in professional contexts. María Sánchez-Tornel, Pascual Pérez-Paredes, José M. Alcaraz Calero. Authenticating Language Learning: Web Collaboration Meets Pedagogic Corpora, February 17-19, 2011, University of Tübingen.
OUTLINE:
• Corpora in FLT
• Background
• Our proposal
• The Backbone project
• The Spanish subcorpus
• Features
• Our approach
• Corpus compilation
• Pedagogic enrichment
• Corpus exploitation
• Conclusion
1. Scant scholarly attention Source: Boulton (2010)
2. Different contexts: different students with different objectives, abilities and needs
CLIL – Content and Language Integrated Learning Coyle (2007)
CLIL – Content and Language Integrated Learning Content Language Eurydice (2006:51)
Large, general corpora Novice users Photo: Kordite, Flickr
• Homogeneous and systematic
• Thematic relevance
• Recontextualisation, authentication
• Easy-to-use query tools and search options
Braun (2006)
Past initiatives
ELISA (English Language Interview Corpus as a Second-Language Application), 2003 – 2004
• 25 video interviews in English
• 5-15 minutes per interview
• 60,000 words
• Search interface
• Learning materials
2005 – 2008
• Video interviews in 7 EU languages - Teen talk
• Corpus compilation and exploitation tools
• Learning materials
• Corpora + tools freely available
DIY approach – small spoken corpora
• Non-standard regional varieties
• Lesser-taught languages
• Non-native varieties of ELF
CLIL settings: vocational training, secondary education, university.
• Language authentication
• Small homogeneous corpora
• Pedagogically relevant topics
• Full texts – sections – concordances
• Multimodality
Data-driven learning but… pedagogically selected, annotated and enriched data
Size
• 25 interviews
• 53,000 words
• 300 minutes of video recordings
Regional varieties
• Speakers from 9 different provinces
• Northern and southern accents
Topics
• The environment
• Cultural issues
• World of work
• Social issues
• Urban and rural life
• Education
• Economic issues
• Science and research
• Government and politics
• Healthcare and social security
MATERIALS DEVELOPMENT • COLLABORATIVE ANNOTATION • TRANSCRIPTION
Corpus compilation
Speaker selection
• Age range: 18-83
• 9 provinces
• Diverse professional fields: ex-lawyer, doctor, archaeologist, bio-farmer, top researcher, entrepreneur, sportswoman, confectioner, teacher
Transcription
• Orthographic
• TEI-compliant markup: <trunc> </trunc>, <break/>, <unclear> </unclear>, <foreign> </foreign>, <alternative> </alternative>
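As a rough illustration of how markup of this kind can be processed, the following Python sketch parses one transcribed utterance and derives a tag count and a plain reading text. The sample sentence, the `<u>` wrapper element and the helper names `tag_counts` and `plain_text` are assumptions made for this example; they are not taken from the Backbone corpus or its tools, and the slides do not specify the exact meaning of each element.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical utterance: the <u> wrapper and the wording are invented
# for illustration; only the inner tag names come from the slide.
SAMPLE = (
    "<u>Trabajo en un <trunc>labo</trunc> laboratorio <break/>"
    "desde hace <unclear>diez</unclear> años, con mucho "
    "<foreign>feedback</foreign> de los colegas.</u>"
)

def tag_counts(xml_text):
    """Count how often each markup element occurs inside the utterance."""
    root = ET.fromstring(xml_text)
    return Counter(el.tag for el in root.iter() if el is not root)

def plain_text(xml_text):
    """Return the utterance with all markup stripped and whitespace normalised."""
    root = ET.fromstring(xml_text)
    return " ".join("".join(root.itertext()).split())

print(tag_counts(SAMPLE))
print(plain_text(SAMPLE))
```

A plain-text view like this is the sort of output a learner-facing concordance could display, while the tagged version stays available for researchers.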
Backbone Transcriptor
• Transcribing and sectioning
• Video formats: DIVX, XVID, AVI, MPEG, QuickTime, RM
• Audio formats: MP3, WAV, ASF
• Supports metadata information
• Timestamping audio-text
Step 1: Annotation Photo: J_O_I_D
Pedagogic annotation Unit-bound not text-bound Pérez-Paredes (2010)
Pedagogic annotation Teacher-driven & Learner-oriented Pérez-Paredes (2010)
Pedagogic annotation
• Corpus management: opening, closing, saving...
• Adding, deleting, opening and closing documents
• Collaborative functionalities
• Configuration options
• Help and tutorial
• Customisable taxonomy tree: topics, grammar, lexis, notions and functions, CEF level ... and whatever you need!
• An annotated section: categories applied, highlighted keywords
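The idea of unit-bound annotation with a customisable taxonomy can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `Section` class, the `find_sections` helper, the "branch/label" category strings and the sample sections are all hypothetical and do not reflect the actual data model of the Backbone Annotator.

```python
from dataclasses import dataclass, field

@dataclass
class Section:
    """One annotated unit: a section of an interview, not the whole text."""
    interview_id: str
    title: str
    # Categories are written as "branch/label" paths in a taxonomy tree
    # (hypothetical naming scheme chosen for this sketch).
    categories: set = field(default_factory=set)

def find_sections(sections, *wanted):
    """Return the sections that carry every requested category."""
    wanted = set(wanted)
    return [s for s in sections if wanted <= s.categories]

# Invented sample data for illustration only.
sections = [
    Section("es01", "Daily work in the lab",
            {"topics/science and research", "CEF level/B2"}),
    Section("es01", "Funding",
            {"topics/economic issues", "CEF level/C1"}),
    Section("es02", "Organic farming",
            {"topics/the environment", "CEF level/B2"}),
]

for s in find_sections(sections, "CEF level/B2"):
    print(s.interview_id, s.title)
```

Because categories attach to sections rather than to whole interviews, a teacher could retrieve only the stretches of an interview that match a given topic and level.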
Backbone Annotator
• TEI-compliant XML
• Drag & drop
• Edit options in XML
• Manages several corpora
• Integrated with Transcriptor and Search Tool
Corpus Management Tool • Backbone Search Tool • CORPORA • OUTPUT
Sánchez-Tornel et al. (Forthcoming)
Step 2: Materials development Photo: thewavingcat
Thematic relevance
• Two types of materials: learning modules; corpus-based communicative and exploratory activities
• The Virtual Resource Pool
• Integration in Search Tool
Learning modules
• 19 modules – 107 activities
• Telos Language Partner
• 1 section – 1 module – several activities
• Comprehension & focus on form
Learning modules Sample module: Science and society
Learning modules Comprehension activities
Learning modules Multiple choice: comprehension