Developing annotation solutions for online data-driven learning Pascual Pérez-Paredes and Jose María Alcaraz SACODEYL Universidad de Murcia, Spain EUROCALL 2007 - University of Ulster, 5 - 8 September
System Aided Compilation and Open Distribution of European Youth Language
225836-CP-1-2005-1-ES-MINERVA-M

Developing annotation solutions for online data-driven learning
• Annotation in CL
• Annotating corpora for the FL classroom
• Challenges of pedagogical annotation
• Developing annotation solutions
• SACODEYL annotator
Domain analysis
Requirements and software specification
1. Annotation in Corpus Linguistics

Annotation in Corpus Linguistics
• Add-on
• Needs of the research community
• Annotation = analysis
• Annotation = processing
Why annotate?
Annotation gives corpus users both refined information retrieval capabilities and the means for subsequent treatment of the data.

Annotation
• Can be automatic, semi-automatic or manual
• Can be performed by one or several annotators or software operators
• Reflects the differing aims of the meta-information being added to the corpus
Non-polysemic ambiguity: Poesio and Artstein (2005)
Interest in L2 speakers’ errors: Abe and Tono (2005)

Strong research paradigm rooted in grammatical tagging, including morphological and syntactic information (Garside, Leech and McEnery 1997).
2. Annotating corpora for the FL classroom
2.1 Corpora in the FL classroom

Interest in corpora and FLT:
• Volumes: Sinclair 2004; Braun, Kohn and Mukherjee 2006; Hidalgo, Quereda and Santana 2007
• SIG EUROCALL
• 1st International Conference on Corpus-Based Approaches to ELT, November 2007

Normalisation is still an issue:
• Mauranen (2004:99) points out that for a teaching method to become an important innovation, it has to “make its way to the normal classroom where teachers and students can use it as part of their everyday routine, with not too much extra hassle”.
• Chambers 2007: major obstacles
• Braun 2007: secondary education
2. Annotating corpora for the FL classroom
2.2 Annotating with a view to learning

Braun (2007): pedagogically motivated corpora (a) provide a more systematic range of material than individual texts or scattered collections of activities and, if well-designed, (b) offer a wider range of idiolects than the average material.

Braun (2006) states that thematic annotation, including topic keys and section titles, is particularly useful in the implementation of pedagogically motivated corpora.

The annotators have a pedagogical use of the text in mind when approaching the annotation stage.
• The tags <topic_title>, <topic_key> and <content_key> highlight the relevance of the communicative purpose of texts, that is, the topics and the contents that characterize them.
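To illustrate how thematic tags of this kind can support retrieval, here is a minimal sketch in Python. Only the three tag names (<topic_title>, <topic_key>, <content_key>) come from the slide. The <corpus>, <section> and <text> wrappers, the sample sentences and the function name sections_by_topic are assumptions made for illustration and are not taken from the SACODEYL annotation scheme.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated sample. Only the tag names topic_title, topic_key and
# content_key come from the slide; the wrappers and values are invented.
SAMPLE = """
<corpus>
  <section>
    <topic_title>Weekend plans</topic_title>
    <topic_key>free time</topic_key>
    <content_key>hobbies</content_key>
    <text>I usually play football with my friends on Saturdays.</text>
  </section>
  <section>
    <topic_title>My school day</topic_title>
    <topic_key>school</topic_key>
    <content_key>timetable</content_key>
    <text>Lessons start at half past eight.</text>
  </section>
</corpus>
"""


def sections_by_topic(xml_source, topic_key):
    """Return (title, text) pairs for sections whose <topic_key> matches."""
    root = ET.fromstring(xml_source.strip())
    hits = []
    for section in root.iter("section"):
        if section.findtext("topic_key") == topic_key:
            hits.append((section.findtext("topic_title"),
                         section.findtext("text")))
    return hits


if __name__ == "__main__":
    # A teacher looking for material on a given topic filters by its key.
    print(sections_by_topic(SAMPLE, "school"))
```

The point of the sketch is that the query is driven by topic rather than by word form or part-of-speech tag, which is the kind of retrieval a teacher or learner is likely to need.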
3. Annotation challenges

Remember the “Why annotate?” slide
Annotation gives corpus users both refined information retrieval capabilities and the means for subsequent treatment of the data.
PEDAGOGY
Linguistic analysis of interest in FLT
• Tsui (2004)
• Corpus-based studies focus on 4 areas of description:
• Lexical collocation
• Syntactic patterning
• Genre analysis
• Discourse structure and cohesion
Word based and relying on co-occurrence of grammatical word-class tags

Linguistic analysis of interest in FLT -> Linguistics comes first -> DDL materials
Concordances and corpus
Researcher/Linguist
End user

Pedagogical analysis (and annotation) of language corpora -> Pedagogy comes first -> Pedagogy-driven DDL
Material developer/Teacher/Learner
End user
CHALLENGES
• Problem-oriented tagging
• Corpus applications in FLT still need to gain a status of their own

CHALLENGES
DESIGN
TECHNOLOGY
EPISTEMOLOGY

DESIGN
Leech (1993) maxims
• it should be possible to remove the annotation from the text;
• it should be possible to extract the annotation on its own, if desired;
• the annotation should be based on guidelines available to everyone;
• it should be made clear how and by whom the annotation was carried out;
• it should be based on widely agreed and theory-neutral principles
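The first two maxims (annotation can be removed from the text, and annotation can be extracted on its own) can be sketched in a few lines of Python. The word_TAG inline format and the tag labels below are assumptions chosen for illustration; they are not the format used by SACODEYL, and the function names are hypothetical.

```python
def split_annotated(text):
    """Split 'word_TAG' tokens into (word, tag) pairs; untagged tokens get None."""
    pairs = []
    for token in text.split():
        word, sep, tag = token.rpartition("_")
        pairs.append((word, tag) if sep else (token, None))
    return pairs


def raw_text(text):
    """Maxim 1: remove the annotation and recover the plain text."""
    return " ".join(word for word, _ in split_annotated(text))


def tags_only(text):
    """Maxim 2: extract the annotation by itself."""
    return [tag for _, tag in split_annotated(text) if tag is not None]


if __name__ == "__main__":
    annotated = "The_AT learner_NN writes_VBZ"  # illustrative tags only
    print(raw_text(annotated))
    print(tags_only(annotated))
```

Keeping text and annotation separable in this way means the same corpus can carry several annotation layers, for example a grammatical one and a pedagogical one, without either layer damaging the original text.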
EPISTEMOLOGY
• Presuppositions and foundations: antecedent implications in the literature
• Annotation oriented towards pedagogical uses

EPISTEMOLOGY
• Mukherjee (2006): corpora in language pedagogy for (a) dictionaries and material, (b) database and (c) representative samples of learner language.

EPISTEMOLOGY
• Meunier (2002): methodological influence: use of classroom concordancing and an inductive approach to learning, leading to a “rehabilitation” of grammar (p. 135)

EPISTEMOLOGY
• Bernardini (2000): inductive and deductive learning, a probabilistic notion of language, and a learning pedagogy that resolves the attention to form/meaning dichotomy

EPISTEMOLOGY
• Bernardini (2000): learners as either researchers or travellers

EPISTEMOLOGY
• Bernardini (2004): potential of corpora as a linguistic aid: they favour descriptive insights and discovery learning

EPISTEMOLOGY
• Pérez-Paredes (2003, 2004): integrative paradigm of CL in FLT
TECHNOLOGY
• User-friendly: non-computational linguists
• Multilingual support
• Standard-compliant: reusability and valorisation
4. Developing Annotation Solutions

Developing Annotation Solutions
From a software engineering perspective, development can be considered as the following process:
• From Challenges To Requirements
• From Requirements To Solutions

Input Requirements
• Input = User Requirement
• Changing Approach = Changing Requirements
• Identifying New Requirements
• Five Perspectives
Actors & Context. Linguistic Engineering vs Pedagogical Engineering
Teaching
• Pedagogic Tool
• Learning Oriented
• Friendly
• General Domain
• Practical
• Simplicity
• Organizational
• Optional
Researching
• Powerful Tool
• Research Oriented
• Extensible & Modular
• Specific Domain
• Efficient
• Complexity
• Ad-Hoc Solutions
• Mandatory

Data. Grammatical vs Pedagogical
Linguistic Engineering
• Large amount of data (representative corpora)
• Grammatical Annotation
• Oriented to retrieve statistical information
Learning
• Reduced set of data
• Pedagogical Annotation
• Oriented to retrieve learning information (Hierarchical Structures & Selective Information)

Epistemological & Empirical
• Multi-Disciplinary support
• Multi-Lingual support
• Multi-Corpus Management
• Multi-Purpose Support
• Based on Standards
Choosing Software Life Cycle
Spiral Approach
Why?

5. SACODEYL Annotator

Output. SACODEYL Annotator
SACODEYL Annotator characteristics:
• Pedagogical Motivation
• Teaching Oriented
• Friendly Interface
• Multi-Language (UTF)
• Standardization (TEI)
• Multi-Purpose
Developing annotation solutions for online data-driven learning
Contact information
Pascual Pérez-Paredes pascualf@um.es
Jose María Alcaraz jmalcaraz@gmail.com
Universidad de Murcia, Spain