How could training of language examiners be related to the Common European Framework?

Inaugural Conference of EALTA, Kranjska Gora, Slovenia, May 14th-16th 2004. How could training of language examiners be related to the Common European Framework? A case study based on the experience of the Hungarian Examinations Reform Teacher Support Project of the British Council. Ildikó Csépes, University of Debrecen, Hungary

Assessing speaking skills → subjective assessment. The Common European Framework of Reference for Languages (2001, p.188): Subjectivity in assessment can be reduced, and validity and reliability thus increased, by taking steps such as • adopting standard procedures governing how the assessments should be carried out (Guideline 1) • basing judgements in direct tests on specific defined criteria (Guideline 2) • using pooled judgements to rate performances (Guideline 3) • undertaking appropriate training in relation to assessment guidelines (Guideline 4)

There is an increased interest in QUALITY CONTROL → The training of language examiners has become an important issue in English language education in Europe.

In this presentation, some important aspects of quality control will be highlighted in relation to training oral examiners: • the use of an Interlocutor Frame to conduct the speaking exam • the role of benchmarking in assessor training

A set of suggested training procedures for oral examiner training will also be presented. The model speaking examination and the interlocutor/assessor training model have been developed and piloted by the Hungarian Examinations Reform Teacher Support Project of the British Council. The original aim of the Project was to develop a new English school-leaving examination. Now only a model exam and the related training courses are available. The training model can be easily adapted to other contexts.

According to the CEF (Guideline 1), standard procedures should be adopted to carry out the assessments. One way of standardising the elicitation of oral performances is to use an Interlocutor Frame, which helps the interlocutor conduct the exam in a standard manner, following a standard procedure.

The Interlocutor Frame developed by the Project • describes in detail how the exam should be conducted • gives standardised wording for: beginning the examination; giving instructions; providing transition from one part of the examination to the other; intervening; rounding off the examination

Overview of the Model Speaking Examination

Part 1: focuses on candidates’ general interactional skills and ability to use English for social purposes. Part 2: candidates demonstrate their ability to produce transactional long turns by comparing and contrasting visual prompts and to answer scripted supplementary questions asked by the interlocutor. In Parts 1 and 2, the interlocutor’s contributions (questions and instructions) are carefully guided and described in as much detail as possible in the Interlocutor Frame.

Part 3: candidates produce both transactional and interactional short turns • The interlocutor and the candidate interact with each other in order to reach a decision about a problem that is posed by the interlocutor. • The candidate has a small number of prompts to work with while the interlocutor has specific guidelines for contributing to the exchange. In Part 3, the interlocutor’s contributions are also carefully guided but the interlocutor has more freedom to express him or herself when participating in the simulated discussion task.

According to the CEF (Guideline 2), assessors’ judgements should be based on specific defined criteria. Performances are rated by the assessor according to set criteria, which consist of • communicative impact • grammar and coherence • vocabulary • sound, stress and intonation

The Analytic Rating Scale It consists of 8 bands: • 5 of these bands (0, 1, 3, 5, 7) are defined by band descriptors • 3 of them (2, 4, 6) are empty bands, which are provided for evaluating performances which are better than the level below, but worse than the level above
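The structure of the scale can be written down as a small sketch. This is only an illustration: the constant and function names are hypothetical, and the band descriptors themselves are not reproduced because they are not given here.

```python
# Illustrative sketch only; all names are hypothetical, not the Project's.
CRITERIA = (
    "communicative impact",
    "grammar and coherence",
    "vocabulary",
    "sound, stress and intonation",
)
ALL_BANDS = tuple(range(8))                          # bands 0..7, eight in total
DEFINED_BANDS = frozenset({0, 1, 3, 5, 7})           # bands with a descriptor
EMPTY_BANDS = frozenset(ALL_BANDS) - DEFINED_BANDS   # bands 2, 4 and 6


def describe_band(band: int) -> str:
    """Say how a band is anchored on the scale."""
    if band not in ALL_BANDS:
        raise ValueError(f"band must be between 0 and 7, got {band}")
    if band in DEFINED_BANDS:
        return f"band {band}: defined by its own descriptor"
    # An empty band sits between two defined neighbours.
    return (f"band {band}: empty band, better than band {band - 1} "
            f"but worse than band {band + 1}")


print(describe_band(4))
```

An assessor would give one such band mark for each of the four criteria.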

According to the CEF (Guideline 3), pooled judgements should be used to rate performances. Pooled judgements are represented as benchmarks for sample performances. Benchmarked performances can enhance and ensure the reliability of subjective marking. Benchmarked performances can illustrate band descriptors (different levels of achievement). Without benchmarked performances, assessors may interpret candidates’ performances in their own terms.

The benchmarking procedures were designed by Charles Alderson (the advisor of the Project). In the Hungarian context they consisted of four main phases: • selecting sample performances and judges • home marking by judges • live benchmarking • editing and standardising justifications

Phase 1: Selecting Sample Performances and Judges. The Assessor-Interlocutor Training Team members selected a wide range of oral performance samples (12) that had been videoed during pilot examinations. These were subsequently used for the benchmarking exercise. Fifteen experts were invited who were thought to have particular expertise in and experience of the assessment of oral performances in English, both at secondary and at tertiary level, and who were expected to have some familiarity with the CEF.

Phase 2: Home Marking. Judges were asked to • study the documents of the Benchmarking Pack carefully • view the videoed performances on tape and mark them according to the appropriate rating scale, using the mark sheets provided • view the videos again once all performances had been marked, and make any necessary adjustments to the marks

Phase 2: Home Marking Judges were asked to • note down any features of each performance that justified the mark for each criterion, always referring to the band descriptors in the scale. • make a list of examples of candidate language, which would contribute to the final list to be compiled after the benchmarking exercise and to be used for training assessors in the future.

Mark sheets and notes were sent in electronic format to Györgyi Együd, the coordinator of the benchmarking exercise. • She collated all the marks and notes for each performance sample and assigned an ID number to each judge. • For each candidate → a table of results by criterion and judge and a table of justifications
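A minimal sketch of how home marks might be collated into a table of results by criterion and judge. The record layout, judge IDs and marks below are invented for illustration; the Project's actual file format is not described here.

```python
from collections import defaultdict

# Hypothetical home-marking records: (judge ID, candidate, criterion, mark).
records = [
    ("J01", "Candidate A", "vocabulary", 5),
    ("J02", "Candidate A", "vocabulary", 4),
    ("J01", "Candidate A", "communicative impact", 5),
    ("J02", "Candidate A", "communicative impact", 6),
]


def collate(records):
    """Return a nested table: {candidate: {criterion: {judge_id: mark}}}."""
    table = defaultdict(lambda: defaultdict(dict))
    for judge_id, candidate, criterion, mark in records:
        table[candidate][criterion][judge_id] = mark
    return table


table = collate(records)
# One row per criterion, one column per judge, for a single candidate.
for criterion, marks_by_judge in table["Candidate A"].items():
    print(criterion, marks_by_judge)
```

Using anonymous judge IDs as keys mirrors the step of assigning an ID number to each judge before the marks are shared.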

Phase 3: The Live Benchmarking. STEP 1: Judges viewed and marked each video again without the notes they had made previously. However, they were encouraged to take notes and to underline the relevant aspects of the scales that led them to their decisions. STEP 2: Judges were asked to reveal their marks after each video sample.

Phase 3: The Live Benchmarking. STEP 3: Judges looked at the table of marks given in the preparation phase together with the collated justifications. Meanwhile, the marks were recorded so that each judge’s first and second marks could be compared (intra-rater reliability). STEP 4: The candidate’s performance was then discussed with reference to the justifications and the current rating session.
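The comparison of first (home) and second (live) marks can be sketched as follows. Which statistic the Project actually computed is not stated, so the agreement measures below are an assumption chosen for simplicity, and the marks are invented.

```python
def intra_rater_summary(first_marks, second_marks):
    """Compare one judge's home marks with the same judge's live marks.

    Both arguments are equal-length lists of band marks for the same
    performances and the same criterion, in the same order.
    """
    if len(first_marks) != len(second_marks) or not first_marks:
        raise ValueError("need two non-empty mark lists of equal length")
    diffs = [abs(a - b) for a, b in zip(first_marks, second_marks)]
    n = len(diffs)
    return {
        "exact_agreement": sum(d == 0 for d in diffs) / n,   # same band twice
        "within_one_band": sum(d <= 1 for d in diffs) / n,   # at most one band apart
        "mean_abs_difference": sum(diffs) / n,
    }


# Hypothetical marks from one judge for four performances.
home = [5, 3, 6, 1]
live = [5, 4, 6, 3]
summary = intra_rater_summary(home, live)
print(summary)
```

A judge whose two sets of marks differ widely would be a natural focus for the discussion in Step 4.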

Phase 3: The Live Benchmarking. STEP 5: Judges voted for the final benchmarks. STEP 6: The individual mark sheets were handed in for central recording after the performance sample had been benchmarked. STEP 7: Judges discussed major and minor errors in relation to the benchmarked performance.
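One way to picture the vote in Step 5 is a simple tally per criterion. How the votes were actually counted, and how ties were settled, is not described here; the plurality rule and the function name below are assumptions for illustration only.

```python
from collections import Counter


def tally_votes(votes):
    """Return (winning_bands, counts) for one criterion.

    `votes` is a list of band marks, one per judge. More than one
    winning band means a tie that the panel would still have to resolve.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    top = max(counts.values())
    winners = sorted(band for band, c in counts.items() if c == top)
    return winners, counts


# Hypothetical votes from seven judges for one criterion.
winners, counts = tally_votes([5, 5, 4, 5, 6, 4, 5])
print(winners, dict(counts))
```

Reporting ties instead of silently breaking them leaves the final decision with the judges, in keeping with the idea of pooled judgement.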

The main purpose of the benchmarking workshop: to reach agreement on grades using the Project’s scales. • Relating the performances to the Common European Framework could only be a supplementary exercise. • For this purpose the 9-point scale (Overall Spoken Interaction) on page 74 of the Framework was used. After each video sample, judges had to indicate which of the 9 levels best described the candidate.

Phase 4: Editing and Standardising Justifications. Reasons for editing and standardising the justifications: • the justifications had to be worded in harmony with the wording of the Speaking Assessment Scales as much as possible in order to make the assessor training more effective • participants seemed to be more ready to accept the benchmarks when they saw that the justifications used the same terms (printed in bold) as the band descriptors in the scales

Phase 4: Editing and Standardising Justifications • the examples of minor and major mistakes, included in the justifications for support and illustration, had to be selected from the list of examples of candidate language that had been agreed on by all the judges • the justifications or notes produced by the individual expert judges in the home marking phase were rather varied with respect to both content and format, so they had to be collated and standardised in terms of layout in order to produce the final justifications for each candidate

The Use of Benchmarked Performances in the Training of Assessors The benchmarks and justifications produced by the judges in the benchmarking sessions are used for supporting the pre-course tasks and the face-to-face assessor training course.  Benchmarked performance samples illustrate candidate performance at different levels of the scales.

The Use of Benchmarked Performances in the Training of Assessors. When the wording of the assessment scales contains expressions such as ‘major and minor mistakes’, or ‘wide and limited range of vocabulary’, only benchmarked performance samples on video together with standardised, written justifications can help future assessors to come to an agreement about what level of performance the band descriptors actually refer to.

In the face-to-face training phase, the benchmarks and justifications are revealed to course participants in different ways at different stages of the training.

Stage 1 • Step 1: Individual assessor’s decision • Step 2: Justifications • Step 3: Benchmarks

Stage 2 • Step 1: Individual assessor’s decision • Step 2: Justifications • Step 3: Group decision • Step 4: Benchmarks

Stage 3 • Step 1: Individual assessor’s decision • Step 2: Group decision (revealed) • Step 3: Benchmarks • Step 4: Justifications

Stage 4 • Step 1: Individual assessor’s decision (revealed) + taking notes • Step 2: Benchmarks • Step 3: Groups write justifications • Step 4: Justifications

A Colour-coded Overview of the Techniques

According to the CEF (Guideline 4), future oral examiners should undertake appropriate training. The training procedures developed by the Project have the following aims: • to provide participants with sufficient information about the model speaking examination they are going to be trained for (outline, task types, mode) • to familiarise participants with standard interlocutor behaviour • to familiarise participants with the main principles and procedures of assessing speaking performances

Further aims: • to introduce the idea and practice of using analytic rating scales for assessing oral performances • to enable participants to develop the necessary interlocuting and assessing skills • to ensure valid and reliable assessment of live performances through standardisation • to equip trainees with transferable skills (there is a special need for this in Hungary)

The Outline of the Training Model. Stage 1: pre-course distance learning • self-study of an Introductory Training Pack with a pre-course video • accomplishing the pre-course tasks (analysing and marking sample video performances)

The Introductory Training Pack contains • An overview of the speaking examination • Guidelines for interlocutor behaviour • Guidelines for assessor behaviour • Pre-course tasks • Self-assessment questions • Appendices (e.g. Benchmarks & Justifications for the Sample Speaking Tests, Examples of Candidate Language, CEF Scales, Glossary)

The Outline of the Training Model • Stage 2A: live interlocutor training course (a series of workshop sessions – Day 1) • discussing the experiences of the distance phase • analysing video samples of both standard and non-standard interlocutor behaviour • standardisation of the administration procedure through simulated examination situations (role plays)

The Outline of the Training Model • Stage 2B: live assessor training course (a series of workshop sessions – Day 2) • discussing the experiences of the distance phase • introduction to assessing oral performances: modes and techniques of assessment • familiarisation with the analytic rating scale • standardisation of the assessment procedure • comparing performances at different levels

The Outline of the Training Model. Stage 3: a distance phase. Practical application of the acquired skills in mock speaking tests • Participants do the mock exams in co-operation with another course participant, so they take the role of both the interlocutor and the assessor. • They can observe each other and share their experiences. • They have to report on their experience in detail.

Sample Materials from the Interlocutor Training Model. Sample 1: Analysing non-standard interlocutor behaviour. After seeing and discussing standard interlocutor behaviour, participants are asked to compare it with non-standard performances. • They have to identify instances where the interlocutor’s behaviour deviates from the Interlocutor Frame and the suggested guidelines.

Sample Materials from the Interlocutor Training Model. Sample 2: Simulating difficult examination situations. Participants role play difficult examination situations in groups of three: • an observer • the candidate • the interlocutor. For each part of the model speaking exam, there are three role-play tasks → all participants will have experienced all three roles by the end of the training.

Role-play Cards for Part 1 (The Interview) • Candidate: You are a shy, not very talkative candidate who tends to wait for guiding questions. You often reply with one or two short sentences only. • Interlocutor: You are the interlocutor who asks the questions of the first part of the speaking test. You have to elicit as much speech from the candidate as possible. Please remember to ask the questions listed in the Interlocutor Frame.

Conclusions • It is impossible to become a trained interlocutor and assessor without formal training. • Training should involve both distance and face-to-face elements to ensure that future interlocutors and assessors go through each and every phase of the difficult and complex standardisation process. • One training course is not enough. • Only further practice and monitoring of interlocutor and assessor behaviour can ensure that candidates’ speaking ability is assessed in a standard manner and that the assessments are valid and reliable.

INTO EUROPE. Series Editor: J. Charles Alderson. The Speaking Handbook, Ildikó Csépes & Györgyi Együd. The Handbook is accompanied by a 5-hour DVD. Published by Teleki László Foundation & The British Council. Distributor: Libro Trade. Info: books@librotrade.hu. Email: icsepes@delfin.unideb.hu
