Exploring Untrained Raters' Perceptions on L2 Fluency: A Comparative Study Between Trained and Untrained Raters

This research investigates the rating patterns and perceptions of untrained raters on L2 fluency, comparing their performance with trained raters. The study delves into assessment criteria, visual and audio impacts on ratings, and the influence of Mandarin variety. Research methodologies, findings, and implications are discussed.

Presentation Transcript


  1. Perceptions on L2 fluency: perspectives of untrained raters • The Foreign Language Assessment Group • CFP meeting, 06 March 2009

  2. Classroom recording (video + audio) • Annotation of Speaker Turn • Transcription • Random Selection • Extracted Audio files • Automated Phone Segmentation • Manual Checking • Data Analysis • Human Raters’ Perceptions • Data Analysis • Data Analysis

  3. Status of the two rating studies • Summer & Fall 2008 • 1st Rating study • 38 untrained raters from different Mandarin-speaking regions • Data collection completed • Data analysis currently underway • Spring 2009 • 2nd Rating study • Target 10-20 participants (i.e. trained raters) from linguistics-related disciplines • Determine factors that affect raters’ perceptions of L2 fluency, visual impacts on rating performance, and the influence of the variety of Mandarin

  4. Research Purpose • To explore untrained raters’ rating patterns and their perceptions of L2 fluency • To compare the rating performance of untrained and trained raters

  5. Research Questions • What kinds of rating patterns do untrained raters show? • Which assessment criteria predict L2 fluency? • Are there any interaction effects between visual and audio inputs based on the rating results by untrained raters? • What are the implications of these results for the automated speech recognition tool?

  6. Research Procedures (1) • Target raters • 38 Native Speakers of Chinese • “Untrained” in rating • Mini-training session • Familiarization: 1.5-hour workshop • Brief directions on the rating scale descriptors and the rating procedures

  7. Research Procedures (2) • Session: First Rating → two weeks → Second Rating • Type: Audio, Video (First Rating); Audio, Video (Second Rating) • Raters: R1-R19, R20-R38 (First Rating); R20-R38, R1-R19 (Second Rating) • 120 120 120 120 120 120 120 120
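The counterbalanced layout on slide 7 can be sketched in code. This is a minimal illustration, assuming the slide's column order means that raters R1-R19 rated audio in the first session and video in the second, while R20-R38 did the reverse. All function and variable names are hypothetical and do not come from the study.

```python
# Hypothetical sketch of the counterbalanced design on slide 7.
# Assumption: the slide's column order maps (session, input type) to rater groups.

def rater_ids(first, last):
    """Return rater labels such as 'R1' ... 'R19'."""
    return [f"R{i}" for i in range(first, last + 1)]

GROUP_A = rater_ids(1, 19)    # R1-R19
GROUP_B = rater_ids(20, 38)   # R20-R38

# (session, input type) -> raters who rate that input type in that session
DESIGN = {
    ("first", "audio"): GROUP_A,
    ("first", "video"): GROUP_B,
    ("second", "audio"): GROUP_B,
    ("second", "video"): GROUP_A,
}

def schedule_for(rater):
    """List the (session, input type) cells assigned to one rater."""
    return [cell for cell, raters in DESIGN.items() if rater in raters]

if __name__ == "__main__":
    print(schedule_for("R5"))
    print(schedule_for("R30"))
```

Under this reading every rater sees both input types, once per session, so input type and session order are not confounded with rater group.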

  8. Rater Procedures • 38 Native Speakers of Chinese • 6 or 7 assessment criteria used depending on method types • Web-rating tool used • Note 1: e.g. Disfluency, Pronunciation, Nativeness, Communication, Syntax, Lexicon, Gesture • Note 2: e.g. Audio/video

  9. Web-Rating Frame used

  10. Methodology • Target Raters • Rating results of 33 untrained raters were analyzed • Data Analyses • Descriptive Statistics, Correlation analysis • Analysis of Repeated Measures to look at the audio/visual interaction effects • Logistic Regression
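Of the analyses listed on slide 10, the correlation step is the simplest to illustrate. The sketch below computes Pearson's r between ratings on one criterion and fluency ratings. It is an assumption that Pearson's r was the coefficient used, since the slides do not name one. The function name and the sample ratings are hypothetical and are not data from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length rating lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)

if __name__ == "__main__":
    # Hypothetical ratings for six speech samples (NOT study data)
    fluency = [3, 4, 2, 5, 4, 3]
    pronunciation = [3, 5, 2, 4, 4, 3]
    print(round(pearson_r(pronunciation, fluency), 3))
```

Because rating-scale scores are ordinal, a rank-based coefficient such as Spearman's rho would be a reasonable alternative.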

  11. Q1. What kinds of rating patterns do untrained raters show?

  12. Severity level (Mean of ratings)

  13. Correlation Analysis (1)

  14. Correlation Analysis (2)

  15. Q2. Which assessment criteria can predict fluency (Logistic_1)?

  16. Q2. Which assessment criteria can predict fluency (1_2)? • Group 1 (S2): Y = -11.358 + 1.515X1 + 0.301X2 + 0.859X3 + 0.547X4 + 0.643X5 + 0.449X6 + 0.102X7 • Group 2 (S1): Y = -12.620 + 1.457X1 + 0.731X2 + 0.643X3 + 0.738X4 + 0.598X5 + 0.855X6 - 0.070X7
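To show how the fitted equations on slide 16 would be used, the sketch below evaluates the linear predictor Y and converts it to a probability with the logistic function. It assumes that Y is the log-odds of a binary fluency outcome; the slides do not state how fluency was coded, which criterion each of X1-X7 stands for, or the range of the rating scale. The coefficients are copied from the slide, while the function names and the input ratings are hypothetical.

```python
import math

# Coefficients copied from slide 16: (intercept, [b1, ..., b7])
GROUP_1_S2 = (-11.358, [1.515, 0.301, 0.859, 0.547, 0.643, 0.449, 0.102])
GROUP_2_S1 = (-12.620, [1.457, 0.731, 0.643, 0.738, 0.598, 0.855, -0.070])

def linear_predictor(model, ratings):
    """Y = intercept + sum(b_i * X_i) for one set of criterion ratings."""
    intercept, coefs = model
    if len(ratings) != len(coefs):
        raise ValueError("expected one rating per coefficient")
    return intercept + sum(b * x for b, x in zip(coefs, ratings))

def probability(model, ratings):
    """Logistic transform of Y (assumes Y is a log-odds value)."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(model, ratings)))

if __name__ == "__main__":
    # Hypothetical ratings of 3 on every criterion (NOT study data)
    ratings = [3, 3, 3, 3, 3, 3, 3]
    for name, model in (("Group 1 (S2)", GROUP_1_S2), ("Group 2 (S1)", GROUP_2_S1)):
        y = linear_predictor(model, ratings)
        print(name, round(y, 3), round(probability(model, ratings), 3))
```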

  17. Q2. Which assessment criteria can predict fluency (Logistic_1)?

  18. Q2. Which assessment criteria can predict fluency (2_2)? • Group 1 (S1): Y = -10.188 + 1.290X1 + 0.389X2 + 0.883X3 + 0.405X4 + 0.597X5 + 0.460X6 • Group 2 (S2): Y = -12.606 + 1.806X1 + 1.610X2 + 0.050X3 + 0.649X4 + 0.308X5 + 0.651X6
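Logistic regression coefficients are often read as odds ratios, exp(b), which give the multiplicative change in the odds for a one-point increase in a predictor with the others held fixed. The sketch below applies that standard conversion to the coefficients on slide 18, assuming a binary-outcome logistic model. The slides report no standard errors or significance tests, so these values alone do not show which criteria are reliable predictors. The function name is hypothetical.

```python
import math

# Slope coefficients copied from slide 18 (intercepts omitted on purpose:
# an odds ratio is defined per predictor, not for the intercept)
GROUP_1_S1 = [1.290, 0.389, 0.883, 0.405, 0.597, 0.460]
GROUP_2_S2 = [1.806, 1.610, 0.050, 0.649, 0.308, 0.651]

def odds_ratios(coefs):
    """Map predictor labels X1..Xn to exp(coefficient)."""
    return {f"X{i}": math.exp(b) for i, b in enumerate(coefs, start=1)}

if __name__ == "__main__":
    for name, coefs in (("Group 1 (S1)", GROUP_1_S1), ("Group 2 (S2)", GROUP_2_S2)):
        rounded = {label: round(ratio, 2) for label, ratio in odds_ratios(coefs).items()}
        print(name, rounded)
```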

  19. Q3. Are there any interaction effects between visual and audio inputs based on the rating results by untrained raters?

  20. Findings (1) • Gestures show relatively low correlation with Fluency in both rating sessions. • Gestures and Pronunciation are variables that do not predict the fluency level in G1 (video samples). • Nativeness and syntax do not predict the fluency level in G2 (audio samples).

  21. Findings (2) • Interaction effects are significant. • This implies that raters show different rating patterns depending on the rating session and input type (audio/video).

  22. 2nd Rating Study of Trained Raters • Comparison with the rating results of the untrained raters in the 1st study • Identify differences in ratings between the two groups

  23. Methodology of 2nd rating study • Rating scale • 6-7 assessment criteria used in the 1st study • Same speech samples used in the 1st study • Visual/audio effects on rating • Same input types used in the 1st study • Same rating procedures used in the 1st study • Individual raters’ rating patterns

  24. Raters • Target raters • 10-20 Native Speakers of Chinese • Teaching experience at UIUC and other areas • Propose a rater training model for trained raters

  25. Rater Training Model • STEP 1: Training Workshop • STEP 2: Practice & Discussion Session I • STEP 3: Practice & Discussion Session II • STEP 4: Actual Ratings

  26. Rater Training • Training materials • Using the same rating scale descriptors and rating procedures as in the 1st study • One-day workshop (3-4 hours) • More practice during on-site workshops • Familiarization and norming sessions through lecture, practice, and discussion

  27. Methodology in the 2nd Study • For trained raters • Repeated measures to look at the audio/visual interaction effect • Logistic regression analysis • Correlation analysis • For comparisons between the two groups • T-test for mean differences between the two groups on each assessment criterion • FACETS analysis
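The planned two-group comparison can be sketched as a t-test on the mean ratings of untrained and trained raters for one criterion. The slide does not say which t-test variant is intended; the sketch below uses Welch's unequal-variance version as an assumption, since the two rater groups differ in size. The function name and the sample ratings are hypothetical and are not data from the study.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least 2 observations")
    # Sample variances (n - 1 in the denominator), scaled by group size
    va = variance(sample_a) / n_a
    vb = variance(sample_b) / n_b
    if va + vb == 0:
        raise ValueError("t is undefined when both groups are constant")
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df

if __name__ == "__main__":
    # Hypothetical mean ratings per rater on one criterion (NOT study data)
    untrained = [3.1, 3.8, 2.9, 4.2, 3.5, 3.0, 3.9]
    trained = [3.4, 3.6, 3.5, 3.3, 3.7]
    t, df = welch_t(untrained, trained)
    print(round(t, 3), round(df, 1))
```

The t statistic and degrees of freedom would then be compared against a t distribution to obtain a p-value, for example with a statistics package.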

  28. Validity is the underlying objective – to validate the measures developed in this project. • Messick, 1989: Validity is a unitary concept which includes test use and consequences. “Validity is an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment.” • Kane, 2006: Validation comprises the procedures that connect test scores and score-based inferences to test use and the consequences of test use. “To validate an interpretation or use of measurements is to evaluate the rationale, or argument, for the claims being made, and this in turn requires a clear statement of the proposed interpretations and uses and a critical evaluation of these interpretations and uses.”

  29. Given the above definitions of validity: • Further refinement of the rating scale may be suggested based on: empirical evidence to support changes made to the rating scale • New rating tool for untrained raters • Wording of descriptors, e.g. Phonological Control • 7 assessment criteria used in the study • Propose a training model for the trained rater group

  30. Thank you for your time!!

  31. CHINESE FLUENCY PROJECT–RATING RUBRICS Version 0.43

  32. Correlation Analysis (Audio 1)

  33. Correlation Analysis (Audio 2)
