
The use of the Test Standards with evolving program uses

This presentation explores the use of test standards in evolving educational assessment programs, including K-12 statewide assessments, admissions testing, and classroom-based assessments. The speaker discusses the five sources of validity evidence and highlights the development processes, response processes, and consequences associated with each type of assessment. The presentation also addresses the challenges and options for improving validity arguments in testing programs.


Presentation Transcript


  1. The use of the Test Standards with evolving program uses. Presented at the annual meeting of the National Council on Measurement in Education, April 29, 2017. Andrew Wiley, PhD

  2. TEST STANDARDS
  • Standard 1.1: The test developer should set forth clearly how test scores are intended to be interpreted and consequently used.
  • Standard 1.2: A rationale should be presented for each intended interpretation of test scores for a given use.
  • Standard 1.3: If validity for some common or likely interpretation for a given use has not been evaluated, …, that fact should be made clear …

  3. Test Standards – 5 sources of validity evidence
  • Evidence based on test content
  • Evidence based on response processes
  • Evidence based on internal structure (see the sketch below)
  • Evidence based on relations to other variables
  • Evidence for validity and consequences of testing
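To make the internal-structure source a little more concrete, here is a minimal Python sketch of coefficient alpha, one internal-consistency statistic commonly reported as part of internal-structure evidence. The score matrix and function are illustrative, not from the presentation.

```python
# A minimal sketch (hypothetical data) of coefficient alpha,
# one common piece of internal-structure/consistency evidence.
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Coefficient alpha for an (examinees x items) score matrix."""
    k = item_scores.shape[1]                         # number of items
    item_vars = item_scores.var(axis=0, ddof=1)      # per-item variances
    total_var = item_scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Toy data: 5 examinees x 4 dichotomously scored items.
scores = np.array([
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
])
print(f"alpha = {cronbach_alpha(scores):.2f}")
```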

  4. So let’s think about different types of educational assessment programs
  • K-12 statewide assessment programs
    • Grades 3 to 8
    • Grades 10, 11, EOC
  • Admissions testing
  • Classroom-based assessments
    • Formative
    • Interim

  5. K-12 statewide assessment program
  • Developmental processes
    • Who was involved?
    • What steps/policies did they follow?
    • What review procedures were in place?
  • Response processes
    • How did they ensure that test takers were responding as expected?
  • Consequences
    • How did the introduction of the assessment program impact students’ progression across grade levels?

  6. ADMISSIONS Assessment
  • Relations with other variables
    • What is the relationship between test scores and eventual performance in the school/program of interest?
  • Fairness
    • Do test scores predict accurately and consistently across critical demographic groups (e.g., race/ethnicity, gender, language)? (see the sketch below)
  • Development processes
    • Does the content of the assessment reflect critical KSAs required for success in the program of interest?
  • Consequences
    • How does the use of the assessment for admissions impact the overall admissions process, including who applies and who is accepted?
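The fairness question on this slide is often examined with a Cleary-style differential-prediction model: regress the criterion on test score, group membership, and their interaction, and look for large group or interaction effects. A hedged sketch follows; the variable names (gpa, score, group) and the data are hypothetical illustrations, not the speaker's analysis.

```python
# A hedged sketch of a Cleary-style differential-prediction check.
# All data and variable names below are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "gpa":   [3.1, 2.4, 3.8, 2.9, 3.5, 2.2, 3.0, 3.6, 2.7, 3.3],
    "score": [ 55,  40,  70,  52,  66,  38,  50,  68,  45,  60],
    "group": ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"],
})

# A large group main effect or score-by-group interaction suggests the
# test over- or under-predicts the criterion for one subgroup.
model = smf.ols("gpa ~ score * C(group)", data=df).fit()
print(model.summary())
```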

  7. CLASSROOM BASED ASSESSMENT
  • Developmental procedures
    • What steps/procedures were followed when determining the KSAs that would be assessed?
  • Relations with other variables
    • What type of alignment evidence is available, and what procedures were followed during this work?
    • What is the relationship between scores on this assessment and other high-stakes assessments (e.g., statewide, admissions)? (see the sketch below)
  • Consequences
    • How has the use of the program impacted other aspects of the educational program (e.g., time available for teaching, classroom activities, impact on curriculum)?
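Relations-to-other-variables evidence can start as simply as a correlation between classroom assessment scores and a high-stakes assessment taken by the same students. A minimal sketch with hypothetical score vectors:

```python
# A minimal sketch: correlation between classroom assessment scores and a
# statewide test for the same students (all numbers hypothetical).
from scipy.stats import pearsonr

classroom = [72, 65, 88, 54, 91, 70, 63, 80]
statewide = [68, 60, 85, 50, 94, 74, 58, 77]

r, p = pearsonr(classroom, statewide)
print(f"r = {r:.2f} (p = {p:.3f})")  # one piece of relations-to-other-variables evidence
```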

  8. But then things get a little messy
  • Use of an admissions test for K-12 accountability
  • Diagnostic feedback from a K-12 assessment
    • Students
    • Teachers
    • Schools
  • Use of classroom assessments for
    • Promotion to the next grade
    • Teacher evaluations
    • School funding

  9. Building a comprehensive validity argument
  • I think we have to acknowledge that, for most testing programs, the evidence supporting the program is gathered in an environment where resources are limited and budgets do not allow every appropriate study to be completed.
  • Can we figure out better ways to identify which components should be considered essential, which could wait a little while, and which can reasonably be postponed for significant periods of time?

  10. Let’s highlight some specific standards

  11. Evolving from classroom to higher stakes
  • Test and item development procedures
  • Administration practices
  • Fairness
  • Validation

  12. Test and item development procedures
  • Standard 4.0: Test developers should document steps taken during the design and development process to provide evidence of fairness, reliability, and validity for intended uses …
  • Standard 4.9: When items or test form tryouts are conducted, the procedures used for selecting the sample(s) of test takers, as well as the resulting characteristics of the sample(s), should be documented.

  13. Test and item development procedures: as stakes begin to rise
  • Revamp/replace the entire item pool
    • Costly / lost resources (items)
    • Time-consuming; long time to implement
  • Link to historical test forms (see the equating sketch below)
    • Gradual revision following best practices
    • Uneven performance on items/test forms
    • Timeline for “reaching the goal” is tenuous
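Linking to historical test forms typically involves some form of equating. Below is a minimal sketch of linear equating under a random-groups assumption; the design, seed values, and scores are hypothetical and stand in for whatever linking plan a program actually documents.

```python
# A hedged sketch of linear equating (random-groups design): map new-form
# scores onto the old form's scale by matching means and standard deviations.
import numpy as np

def linear_equate(x, old_scores, new_scores):
    """Transform a new-form score x onto the old form's scale."""
    mu_old, sd_old = np.mean(old_scores), np.std(old_scores, ddof=1)
    mu_new, sd_new = np.mean(new_scores), np.std(new_scores, ddof=1)
    return mu_old + (sd_old / sd_new) * (x - mu_new)

old = np.random.default_rng(0).normal(500, 100, 2000)  # historical form (hypothetical)
new = np.random.default_rng(1).normal(480,  95, 2000)  # new form (hypothetical)
print(round(linear_equate(480, old, new)))  # ~500: new-form mean maps near old-form mean
```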

  14. Test and item development procedures: as stakes begin to rise
  Some other options to consider:
  • Retroactively conduct independent reviews of items
  • Focus groups to evaluate how students read/react to items

  15. Test administration
  • Standard 6.0: …Assessment instruments should have established procedures for test administration, scoring, reporting, and interpretation. Those responsible for administering, scoring, reporting, and interpreting should have sufficient training and supports to help them follow the established procedures.

  16. Test administration
  Some options to consider:
  • Introduction of new standardized procedures
  • Loss of flexibility
  • Resource requirements for test administration
  • Comparability between old and new test administrations
  • Customer compliance with new/updated administration protocols

  17. Fairness in testing
  • Standard 3.3: Those responsible for test development should include relevant subgroups in validity, reliability/precision, and other preliminary studies used when constructing the test.
  • Standard 3.5: Test developers should specify and document provisions that have been made to test administration and scoring procedures to remove construct-irrelevant barriers for all relevant subgroups in the test-taker population.
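Standard 3.3's call to include relevant subgroups in preliminary studies is often operationalized with differential item functioning (DIF) analyses. Here is a minimal sketch of the Mantel-Haenszel procedure; the stratum counts are hypothetical and the slide itself does not prescribe this method.

```python
# A minimal Mantel-Haenszel DIF sketch: compare item performance for the
# reference vs. focal group within strata of matched total score.
import numpy as np

# Per score stratum: [reference right, reference wrong, focal right, focal wrong]
strata = np.array([
    [30, 20, 22, 28],
    [45, 15, 35, 25],
    [60, 10, 50, 20],
])

A, B, C, D = strata.T
N = strata.sum(axis=1)
alpha_mh = (A * D / N).sum() / (B * C / N).sum()  # common odds ratio
mh_d_dif = -2.35 * np.log(alpha_mh)               # ETS delta metric
print(f"alpha_MH = {alpha_mh:.2f}, MH D-DIF = {mh_d_dif:.2f}")
```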

  18. Fairness in testing
  Some options to consider:
  • Changing performance standards moving forward
  • Continuity between old and new content
    • Use of the “old content” while the new content is being developed
    • Length of time to create new content
  • Retroactive committee review of the current item pool
    • Loss of items/test forms
    • Can be slightly less time-consuming for the initial phase

  19. Validation
  • Standard 1.0: Clear articulation of each intended test score interpretation for a specified use should be set forth, and appropriate validity evidence in support of each intended interpretation should be provided.

  20. Validation – Employment testing
  [Diagram: predictor measure and criterion measure linked to the predictor construct domain and criterion construct domain, with five numbered inferential links.]

  21. Some other possible changes in use
  • Use of an admissions test for K-12 accountability
  • Diagnostic feedback from a K-12 assessment

  22. Validation – Employment testing
  [Diagram repeated: predictor measure and criterion measure linked to the predictor construct domain and criterion construct domain, with five numbered inferential links.]

  23. Some more stuff
  • I want to add some material here relating K-12 and admissions testing to the Test Standards, including:
    • The rights of test takers to fair and accurate information
    • The right to appropriate information to help them prepare (repeat test takers’ right to know why they failed)
  • Also, consequential validity issues when admissions testing is used for K-12:
    • Value to the community
    • Load of testing requirements
