11th ICAEA Forum: Rating for ICAO Language Proficiency Standards
ENAC, 6th-7th September 2011
Inter-rater Reliability
École nationale de l’aviation civile • The French Civil Aviation University
John Kennedy
www.enac.fr
Presentation Outline
• Remote rating
• Recruitment and training
• Measures of inter-rater reliability
• Inter-rater reliability: data obtained
• Towards further harmonization
What is Remote Rating?
• All tests are recorded
• Sound files are sent to remote raters
• Sound files are evaluated the following week
Remote Rating for the MTF_ALP
The MTF (the ENAC test approved by the DGAC for ATCOs)
The interlocutor elicits a thirty-minute rateable sample:
- Section 1: Picture / photo
- Section 2: Listening to pilot messages
- Section 3: Short text (prompt for discussion)
The administrator distributes the recorded sound files to:
- two separate raters, who rate independently
- a third rater (where the first two raters disagree)
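As a minimal sketch of the resolution logic described above (two independent ratings, with a third rater only when the first two disagree), the snippet below is illustrative only and not ENAC's actual software; the function name and error handling are assumptions.

```python
# Sketch of the two-rater / third-rater resolution described on this slide.
def final_icao_level(rating_a: int, rating_b: int, third_rating=None) -> int:
    """Return the reported ICAO level for one candidate."""
    if rating_a == rating_b:
        return rating_a                 # the two raters agree: done
    if third_rating is None:
        raise ValueError("raters disagree: a third, independent rating is needed")
    return third_rating                 # the third rater settles the level

# Example: the first two raters give 4 and 5, the third rater decides.
print(final_icao_level(4, 5, third_rating=4))   # -> 4
```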
Advantages of the Remote Rating Procedure
• Voice only
• Objectivity / anonymity (protection)
• Reduction of test anxiety
• Flexible work schedules
• Continuous monitoring of interlocutor behaviour
• Continuous monitoring of individual rater performances
Disadvantages of the Remote Rating Procedure
The only one we have experienced so far:
• Time required to produce results
Also to be taken into consideration:
• Administration involved
• Cost of the procedure
• Loneliness of the rating job
Recruitment and Training of the Team
• Started with a team of eight raters (aviation English teachers)
• All experienced with language testing and the ICAO scale
• Initial two-day training / harmonization session (April 2009)
• Regular refresher training (at least once a year)
• Ad hoc feedback provided
• All raters rate regularly (4-8 tests per month)
• Initial training of operational raters (October 2010)
• Candidates rated by one language teacher and one operational rater
• Commitment of individual raters to the project
Initial Measures of Inter-rater Reliability
[Table of inter-rater reliability figures: * before rater training, ** after rater training]
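The figures themselves are not reproduced in this transcript. For background, here is a sketch of two standard measures that can be applied to paired ratings like these, simple percentage agreement and Cohen's kappa; the ICAO-level ratings below are invented for illustration and are not MTF data.

```python
# Two common measures of inter-rater reliability for paired ratings:
# simple percentage agreement and Cohen's kappa (agreement corrected for chance).
from collections import Counter

def percent_agreement(a, b):
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    n = len(a)
    p_o = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # expected agreement if the two raters assigned levels independently
    p_e = sum((counts_a[c] / n) * (counts_b[c] / n) for c in set(a) | set(b))
    return (p_o - p_e) / (1 - p_e)

rater_1 = [4, 4, 5, 3, 4, 5, 6, 4, 3, 5]   # invented ratings
rater_2 = [4, 5, 5, 3, 4, 4, 6, 4, 3, 5]
print(percent_agreement(rater_1, rater_2))          # 0.8
print(round(cohens_kappa(rater_1, rater_2), 2))     # 0.71
```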
Rater Training
Initial rater training took place after completion of the development / trialling phases. It was based on the ICAO speech samples (2005 DVD) and the corresponding MTF benchmark samples.
Overall Evaluation of Rater Performance
Last year, 29 out of 100 recordings were sent to a third rater for evaluation (who then gave the ‘correct’ level).
In 29 out of 100 tests, one rater gave an ‘incorrect’ level.
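A small worked reading of that figure (my arithmetic, not a number reported on the slide): if 29 of 100 recordings needed a third rater, the first two raters agreed on the other 71.

```python
# Observed pairwise agreement implied by the slide's figure.
tests = 100
sent_to_third_rater = 29
observed_pairwise_agreement = (tests - sent_to_third_rater) / tests
print(observed_pairwise_agreement)   # 0.71
```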
Language Teachers / Operational Raters
Differences between raters, 2010/2011
Understanding and Improving the Procedure
[Two diagrams, each placing two rater judgements (X) on a scale spanning the level 4 and level 5 bands]
In which case above do the two raters demonstrate a marked difference?
Example
100 expert raters independently rate a candidate who is right on the boundary between levels 4 and 5.
(----------level 4-----------)X(-----------level 5-----------)
How many raters will give level 4? How many will give level 5? What is the ‘correct’ level?
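One way to make this thought experiment concrete is a quick simulation in which each expert rater perceives the borderline candidate with a little rater-to-rater noise. The scale, the noise level, and the resulting near 50/50 split are illustrative assumptions, not MTF data.

```python
# Back-of-the-envelope simulation of the boundary example above.
import random

random.seed(0)
boundary = 0.0          # the level 4 / level 5 cut-off on an arbitrary scale
true_ability = 0.0      # the candidate sits exactly on the boundary

ratings = []
for _ in range(100):
    perceived = true_ability + random.gauss(0, 0.3)   # rater-to-rater noise
    ratings.append(5 if perceived >= boundary else 4)

print("level 4:", ratings.count(4), "level 5:", ratings.count(5))
# A split close to 50/50: 'disagreement' here reflects the candidate's
# position on the scale rather than poor rating.
```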
Borderline Cases – Further Understanding
It is in such cases that ‘disagreement’ is to be expected.
Thanks to remote rating, the procedure is subject to continuous monitoring to ensure the validity of the final results.
Where ‘disagreement’ exists, the candidate will have the benefit of three entirely independent and subjective evaluations.
Non-Borderline Cases
Inter-rater reliability was checked for non-borderline cases in the MTF and found to be higher than 95%.
This was done by choosing candidates who appeared to be right in the middle of their respective bands:
               X                              X                              X
(-----------level 3-----------)(----------level 4-----------)(-----------level 5-----------)
The corresponding sound files were then sent to all raters (acknowledgement: Sergey Melnichenko).
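A sketch of how such a check could be computed: every rater rates the same clearly mid-band recordings, and agreement is measured against the majority level for each recording. The ratings below are invented; the >95% figure above comes from the actual MTF data, not from this example.

```python
# Agreement with the majority level across all raters, per recording.
from collections import Counter

def agreement_with_majority(ratings_per_recording):
    hits = total = 0
    for ratings in ratings_per_recording:
        majority_level, _ = Counter(ratings).most_common(1)[0]
        hits += sum(r == majority_level for r in ratings)
        total += len(ratings)
    return hits / total

all_rater_levels = [
    [4, 4, 4, 4, 4, 4, 4, 4],   # a clear mid-band level 4 recording
    [3, 3, 3, 3, 3, 3, 4, 3],   # one dissenting rating
    [5, 5, 5, 5, 5, 5, 5, 5],
]
print(round(agreement_with_majority(all_rater_levels), 3))   # 0.958
```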
Towards Further Harmonization
• The ICAO Speech Samples Rater Training Project: use in the next refresher training session
• Contact with other test providers: cross-rating
• Further and more detailed analysis of borderline cases: defining the borders
Contacts
john.kennedy@enac.fr
scott.stroud@enac.fr
michael.odonoghue@enac.fr
Thank you!