Developing Valid, Reliable, and Appropriate Assessments for Language Proficiency
Dorry M. Kenyon, Director, Language Testing Division
First, A Plug
• National Research Council. (2002). Performance assessments for adult education: Exploring the measurement issues. Report of a workshop. Committee for the Workshop on Alternatives for Assessing Adult Education and Literacy Programs, Robert J. Mislevy and Kaeli T. Knowles, Editors. Washington, DC: National Academy Press. http://www.nap.edu/catalog/10366.html
Issues/Challenges
1. What type of assessment is required by the NRS: proficiency or achievement?
2. What type of assessment will be appropriate for the NRS?
3. Can there be such a thing as a valid test for the NRS?
4. Can performance assessments really be reliable?
Issue 1 • What type of language assessment is required by the NRS: proficiency or achievement?
Testing for Achievement • Goal: To assess whether students have “learned what they have been taught”
Testing for Achievement
• "Achievement testing has a simple, single purpose: to measure what a student has learned."
• Cizek, G.J. (1998). Filling in the blanks: Putting standardized tests to the test. Fordham Report, 2(11), October 1998, p. 1.
Testing for Proficiency • Goal: To assess what a learner "can do"; performance in "real-life" situations
Testing for (Language) Proficiency
• …"language performance in terms of the ability to use the language effectively and appropriately in real-life situations"
• Buck, K., Byrnes, H., & Thompson, I. (Eds.). (1989). The ACTFL oral proficiency interview tester training manual. Yonkers, NY: ACTFL, p. 1.1.
• NOTE: the original language proficiency movement stressed that proficiency is assessed without regard to how the language was learned or acquired
NRS Descriptors • Define “educational functioning levels” (NRS, 2001)
Sample NRS Descriptors
• Beginning ESL / Speaking and Listening: Individual can understand frequently used words in context and very simple phrases spoken slowly with some repetition
• High Intermediate ESL / Speaking and Listening: Can communicate basic survival needs with some help
Sample NRS Descriptors
• High Advanced ESL / Speaking and Listening: Individual can understand and participate effectively in face-to-face conversations on everyday subjects spoken at normal speed
Challenges • What is the relationship between curriculum, classroom teaching, and proficiency (versus achievement) outcomes?
Challenges
• What role do factors outside the control of the instructional program play in the development of proficiency?
• Personality characteristics?
• Prior educational and life experiences?
• Opportunities for language acquisition outside the classroom?
Issue 2 • What type of assessment will be appropriate for the NRS?
Language Testing Model
• Bachman, L.F., & Palmer, A.S. (1996). Language testing in practice. Oxford: Oxford University Press.
[Diagram: the correspondence between language use in test performance and "real-world" language use.]
National Research Council Report • “…performance assessments…generally require test takers to demonstrate their skills and knowledge in a manner that closely resembles a real-life situation or setting…” p. 7
Challenges
• To what extent can proficiency ("educational functioning levels") be adequately assessed without some demonstration of performance (i.e., through performance assessment rather than multiple-choice items)?
• To what extent do factors outside the control of educational programs influence outcomes on such performance assessments?
Challenges • Performance assessments are not easy to develop, administer, score, and validate.
[Diagram: a model of performance assessment. A student's underlying knowledge, skills, and abilities (K/S/A) produce a performance on a task, shaped by the task's qualities and conditions and by the administrator; a rater, applying a scale/criteria under particular scoring conditions, turns that performance into a score/measure.]
(References)
• Kenyon, D.M. (1992). Introductory remarks at symposium on Development and use of rating scales in language testing. 14th Language Testing Research Colloquium, Vancouver, February 27-March 1.
• McNamara, T. (1996). Measuring second language performance. London: Longman.
• Skehan, P. (1998). A cognitive approach to language learning. Oxford: Oxford University Press.
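One way to make the model above concrete is to read each facet of the diagram as a field in a scoring record. A minimal sketch in Python, with all identifiers hypothetical: the point is only that every observed score depends jointly on the student, the task and its administration, and the rater and scale.

```python
from dataclasses import dataclass

# A minimal, hypothetical data-model reading of the diagram above:
# every observed score arises jointly from the student's underlying
# knowledge/skills/abilities, the task and its administration
# conditions, and a rater applying a scale under scoring conditions.

@dataclass
class ScoringRecord:
    student_id: str        # whose underlying K/S/A we want to infer
    task_id: str           # the task and its qualities/conditions
    administrator_id: str  # who administered the task
    rater_id: str          # who judged the performance
    scale: str             # the scale/criteria applied
    score: int             # the resulting score/measure

record = ScoringRecord(
    student_id="S001", task_id="oral-interview-3",
    administrator_id="A01", rater_id="R07",
    scale="NRS-ESL-speaking", score=4,
)
print(record)
```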
Issue 3 • Can there be such a thing as a valid test for the NRS?
Definition of Validity (Messick)
• "Validity is an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment" (italics original)
• Messick, S. (1989). Validity. In R. Linn (Ed.), Educational measurement (3rd ed., pp. 11-103). New York: Macmillan, p. 11.
Definition of Validity (Standards)
• "…the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests" (italics added)
• American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: Author, p. 9.
Implication
• Any assessment used for NRS purposes will be valid only to the extent that it can be shown that inferences about the learner made on the basis of test scores are supported by theoretical rationales and empirical evidence that relate directly to the NRS level descriptors
Challenges
• Is the primary use reporting for the NRS?
• What about uses that are important to programs: diagnosis? Improving curriculum? Measuring achievement?
• How do we accumulate evidence to support the validity of the assessment? It is a long, ongoing process; one small strand of such evidence is sketched below.
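As one illustration of evidence gathering, a program might check whether test scores order learners the same way as independently assigned NRS educational functioning levels. A minimal sketch in Python, with invented toy data; a strong correlation would be one strand of the validity argument, not proof of validity.

```python
# A hypothetical strand of empirical validity evidence: do test
# scores rank learners consistently with independently assigned
# NRS educational functioning levels? (Invented toy data.)
from scipy.stats import spearmanr

test_scores = [12, 18, 25, 31, 40, 44, 52, 57, 63, 70]
nrs_levels  = [1,  1,  2,  2,  3,  3,  4,  5,  5,  6]  # 1 = Beginning ESL ... 6 = High Advanced ESL

rho, p_value = spearmanr(test_scores, nrs_levels)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
# A strong monotonic relationship is one piece of evidence, not proof:
# validity is an accumulating argument, not a single statistic.
```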
Issue 4 • Can performance assessments really be reliable?
Definition of Reliability
• "...the consistency of…measurement when the testing procedure is repeated on a population of individuals or groups" (italics added)
• American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: Author, p. 25.
Potential Sources of Inconsistency
• Tasks
• Administrators/Administration Procedures
• Forms
• Occasions
• Evaluators/Raters (a small sketch of rater consistency follows this list)
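To make "consistency" concrete for the last of these sources, here is a minimal sketch in Python (invented toy ratings) of two common rater-consistency indices: percent exact agreement and Cohen's kappa, which corrects observed agreement for the agreement expected by chance.

```python
from collections import Counter

def cohen_kappa(r1, r2):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(r1)
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    expected = sum(c1[k] * c2[k] for k in c1) / (n * n)  # chance agreement
    return (observed - expected) / (1 - expected)

# Toy data: two raters scoring the same 10 performances on a 1-6 scale.
rater_a = [2, 3, 3, 4, 1, 5, 4, 2, 6, 3]
rater_b = [2, 3, 4, 4, 1, 5, 4, 3, 6, 3]

exact = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
print(f"Exact agreement: {exact:.0%}, kappa: {cohen_kappa(rater_a, rater_b):.2f}")
```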
[Diagram, repeated: the model of performance assessment shown earlier (student K/S/A, task qualities and conditions, administrator, performance, rater, scale/criteria and scoring conditions, score/measure); each element is a potential source of inconsistency.]
Documenting Reliability
• "Potential" reliability (demonstrated by the test developer)
• HOWEVER (AERA/APA/NCME Standards, pp. 30-31): "Typically, developers and distributors of tests have primary responsibility for obtaining and reporting evidence of reliability … In some instances, however, local users of a test or procedures must accept at least partial responsibility for documenting the precision of measurement. This obligation holds when … users must rely on local scorers who are trained to use the scoring rubrics provided by the test developer. In such settings, local factors may materially affect the magnitude of error variance and observed score variance. Therefore, the reliability of scores may differ appreciably from that reported by the developer."
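One plausible way (not prescribed by the Standards) for a local program to take on that responsibility is to compare its trained scorers against benchmark performances pre-scored by the test developer. A minimal sketch in Python, with invented data and an invented local policy threshold:

```python
# A hypothetical local-scorer check (data and thresholds invented):
# compare a trained local rater's scores on benchmark performances
# against the test developer's reference scores.

reference   = [3, 4, 2, 5, 1, 4, 3, 6, 2, 5]  # developer's benchmark scores
local_rater = [3, 4, 3, 5, 1, 3, 3, 6, 2, 4]  # local rater's scores

n = len(reference)
exact    = sum(r == l for r, l in zip(reference, local_rater)) / n
adjacent = sum(abs(r - l) <= 1 for r, l in zip(reference, local_rater)) / n

print(f"Exact agreement:    {exact:.0%}")
print(f"Adjacent agreement: {adjacent:.0%}")
# An invented local policy might require, say, 70% exact and 100%
# adjacent agreement before a rater scores operationally.
```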
Implication
• Assessment implementation and maintenance become larger concerns with performance-based assessments
Summary (1/2)
• Challenges (NRC Report):
• Are there resources for development? For staff training? For implementation and maintenance?
• Is there time for assessment? For learning opportunities?
Summary (2/2)
• What do we lose if we don't work together to face these challenges?
• Meaningfulness of outcome criteria (e.g., the NRS descriptors)
• Valid alignment with desired outcomes (e.g., assessing the oral language skills of adult ESL learners)
• Improvement of instruction to make progress toward outcomes