
An Introduction to Validity Arguments for Alternate Assessments

Presentation Transcript


  1. An Introduction to Validity Arguments for Alternate Assessments Scott Marion Center for Assessment Eighth Annual MARCES Conference University of Maryland October 11-12, 2007

  2. Overview • A little validity background • Creating and evaluating a validity argument…or translating Kane (and others) to AA-AAS • Can we make it practical? • A focus on validity in technical documentation

  3. Validation is “a lengthy, even endless process” (Cronbach, 1989, p. 151). Good for consultants, but not so great for state folks and contractors. Are you nervous yet…?

  4. Validity Should be Central • We argue that the purpose of the technical documentation is to provide data to support or refute the validity of the inferences from the alternate assessments at both the student and program level

  5. Unified Conception of Validity • Drawing on the work of Cronbach, Messick, Shepard, and Kane, the proposed evaluation of technical quality is built around a unified conception of validity • centered on the inferences related to the construct, including significant attention to the social consequences of the assessment

  6. But what is a validity argument and how do we evaluate the validity of our inferences?

  7. A little history • Kane traces the history of validity theory from the criterion model through the content model to the construct model. • It is worth stopping briefly to discuss the content model, because that appears to be where many still operate.

  8. “The content model interprets test scores based on a sample of performances in some area of activity as an estimate of overall level of skill in that activity.” The sample of items/tasks and observed performances must be: representative of the domain, evaluated appropriately and fairly, and part of a large enough sample. So, this sounds good, right?

  9. Concerns with the content model • “Messick (1989) argued that content-based validity evidence does not involve test scores or the performances on which the scores are based and therefore cannot be used to justify conclusions about the interpretation of test scores.” (p. 17) • Huh? More simply…content evidence is a matching exercise and doesn’t really help us get at the interpretations we make from scores • Is it useful? Sure, but with the intense focus on alignment these days, content evidence appears to be privileged over efforts to build arguments for the meaning of test scores

  10. The Construct Model • We can trace this evolution from Cronbach and Meehl (1955) through Loevinger (1957) to Cronbach (1971), culminating in Messick (1989) • Focused attention on the many factors associated with the interpretations and uses of test scores (and not simply with correlations) • Emphasized the important effect of assumptions in score interpretations and the need to check these assumptions • Allowed for the possibility of alternative explanations for test scores—in fact, this model even encouraged falsification

  11. Limitations of the Construct Model • Does not provide clear guidance for the validation of a test score interpretation and/or use • Does not help evaluators prioritize validity studies • If, as Anastasi (1986) noted, “almost any information gathered in the process of developing or using a test is relevant to its validity” (p. 3), where should one start, and how do you know when you’re done, or are you ever done?

  12. Transitioning to argument… • The call for careful examination of alternative explanations within the construct model is helpful for directing a program of validity research

  13. Kane’s argument-based framework • “…assumes that the proposed interpretations and uses will be explicitly stated as an argument, or network of inferences and supporting assumptions, leading from observations to the conclusions and decisions. Validation involves an appraisal of the coherence of this argument and of the plausibility of its inferences and assumptions (Kane, 2006, p. 17).” • Sounds easy, right…

  14. Two Types of Arguments • An interpretative argument specifies the proposed interpretations and uses of test results by laying out the network of inferences and assumptions leading from the observed performances to the conclusions and decisions based on the performances • The validity argument provides an evaluation of the interpretative argument (Kane, 2006)

  15. Kane’s framework provides a more pragmatic approach to validation, “…involving the specification of proposed interpretations and uses, the development of a measurement procedure that is consistent with this proposal, and a critical evaluation of the coherence of the proposal and the plausibility of its inferences and assumptions.” The challenge is that most assessments do not start from explicit attention to validity in the design phase

  16. The Interpretative Argument • Essentially a mini-theory—the interpretative argument provides a framework for interpretation and use of test scores • Like a theory, the interpretative argument guides the data collection and methods; most importantly, it is falsifiable as we critically evaluate the evidence and arguments

  17. Two stages of the interpretative argument • Development stage—focus on development of measurement tools and procedures as well as the corresponding interpretative argument • A confirmationist bias is appropriate in this stage, since the developers (state and contractors) are trying to make the program the best it can be • Appraisal stage—focus on critical evaluation of the interpretative argument • Should be more neutral and “arm’s-length” to provide a more convincing evaluation of the proposed interpretations and uses • “Falsification, obviously, is something we prefer to do unto the constructions of others” (Cronbach, 1989, p. 153)

  18. Interpretative argument • “Difficulty in specifying an interpretative argument…may indicate a fundamental problem. If it is not possible to come up with a test plan and plausible rationale for a proposed interpretation and use, it is not likely that this interpretation and use will be considered valid” (Kane, 2006, p. 26). • Think of the interpretative argument as a series of “if-then” statements… • E.g., if the student performs the task in a certain way, then the observed score should have a certain value (see the sketch below)
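
To make the “if-then” framing concrete, here is a minimal sketch (not from the original presentation) that writes an interpretative argument down as an explicit chain of inferences, each with the assumptions that must hold for the step to go through. The step names loosely follow Kane’s scoring/generalization/extrapolation chain; all of the wording, names, and assumptions are hypothetical and illustrative.

```python
# A minimal sketch, assuming a simple representation of Kane-style inference
# steps; the step names, wording, and assumptions below are hypothetical.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Inference:
    name: str                 # label for the inference step
    if_clause: str            # the condition ("if ...")
    then_claim: str           # the conclusion the step licenses ("then ...")
    assumptions: List[str] = field(default_factory=list)  # warrants to check


interpretative_argument = [
    Inference(
        name="scoring",
        if_clause="the student performs the task in a certain way",
        then_claim="the observed score should have a certain value",
        assumptions=["the rubric is applied consistently",
                     "administration conditions did not distort performance"],
    ),
    Inference(
        name="generalization",
        if_clause="the observed score has that value",
        then_claim="the student would perform similarly on comparable tasks",
        assumptions=["tasks are representative of the domain",
                     "enough tasks are sampled to limit error"],
    ),
    Inference(
        name="extrapolation",
        if_clause="performance generalizes across tasks",
        then_claim="the score reflects learning of the academic content",
        assumptions=["scores are not driven mainly by level of functioning"],
    ),
]

# Writing the argument out this way makes each link and its warrants explicit,
# so a validity evaluation can target the least plausible assumptions first.
for step in interpretative_argument:
    print(f"[{step.name}] IF {step.if_clause}, THEN {step.then_claim}")
    for warrant in step.assumptions:
        print(f"    assumes: {warrant}")
```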

  19. Criteria for Evaluating Interpretative Arguments • Clarity—should be clearly stated as a framework for validation. Inferences and warrants specified in enough detail to make proposed claims explicit. • Coherence—assuming the individual inferences are plausible, the network of inferences leading from the observations to conclusions and decisions makes sense • Plausibility—assumptions in particular are judged in terms of all the evidence for and against them

  20. One of the most effective challenges to interpretative arguments (or scientific theories) is to propose and substantiate an alternative argument that is more plausible. With AA-AAS we have to seriously consider and challenge ourselves with competing alternative explanations for test scores, for example… “higher scores on our state’s AA-AAS reflect greater learning of the content frameworks” OR “higher scores on our state’s AA-AAS reflect higher levels of student functioning”

  21. Categories of interpretative arguments (Kane, 2006) • Trait interpretations • Theory-based interpretations • Qualitative interpretations • Decision procedures • Like scientific theories, the specific type of interpretative argument for test-based inferences guides models, data collection, assumptions, analyses, and claims

  22. Decision Procedures • Evaluating a decision procedure requires an evaluation of values and consequences • “To evaluate a testing program as an instrument of policy [e.g., AA-AAS under NCLB], it is necessary to evaluate its consequences” (Kane, 2006, p. 53) • Therefore, values inherent in the testing program must be made explicit and the consequences of the decisions made on the basis of test scores must be evaluated!

  23. Prioritizing and Focusing • Shepard (1993) advocated a straightforward means to prioritize validity questions. Using an evaluation framework, she proposed that validity studies be organized in response to the questions: • What does the testing practice claim to do; • What are the arguments for and against the intended aims of the test; and • What does the test do in the system other than what it claims, for good or bad? (Shepard, 1993, p. 429). • The questions are directed to concerns about the construct, relevance, interpretation, and social consequences, respectively.

  24. A heuristic to help organize and focus the validity evaluation (Marion, Quenemoen, & Kearns, 2006) • Cognition: Student Population, Academic Content, Theory of Learning • Observation: Assessment System, Test Development, Administration, Scoring • Interpretation: Reporting, Alignment, Item Analysis/DIF/Bias, Measurement Error, Scaling and Equating, Standard Setting • Validity Evaluation: Empirical Evidence, Theory and Logic (argument), Consequential Features
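
As one way to see how this heuristic might drive the organization of technical documentation, the sketch below (an illustration, not part of the presentation) maps the vertices of the assessment triangle to the evidence categories listed above and flags the categories for which no evidence has yet been gathered. The dictionary layout and the hypothetical `collected` set are assumptions for the example; the category names come from the slide.

```python
# A minimal sketch, assuming one simply wants an inventory of validity
# evidence organized by the assessment-triangle heuristic. The "collected"
# entries are hypothetical placeholders.
validity_framework = {
    "Cognition": ["Student Population", "Academic Content", "Theory of Learning"],
    "Observation": ["Assessment System", "Test Development",
                    "Administration", "Scoring"],
    "Interpretation": ["Reporting", "Alignment", "Item Analysis/DIF/Bias",
                       "Measurement Error", "Scaling and Equating",
                       "Standard Setting"],
}

# Hypothetical record of evidence gathered so far.
collected = {"Alignment", "Scoring", "Standard Setting"}

# List missing evidence vertex by vertex so the validity evaluation
# (and the technical documentation) can prioritize the remaining studies.
for vertex, categories in validity_framework.items():
    missing = [c for c in categories if c not in collected]
    print(f"{vertex}: missing evidence for {', '.join(missing) or 'none'}")
```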

  25. Synthesizing and Integrating • Haertel (1999) reminded us that the individual pieces of evidence (typically presented in separate chapters of technical documents) do not, by themselves, make the assessment system valid or invalid; it is only by synthesizing this evidence to evaluate the interpretative argument that we can judge the validity of the assessment program.

  26. NHEAI/NAAC Technical Documentation • The “Nuts and Bolts” • The Validity Evaluation • The Stakeholder Summary • The Transition Document

  27. The Validity Evaluation • Author: Independent contractor with considerable input from the state DOE • Audience: State policy makers, state DOE, district assessment and special education directors, state TAC members, special education teachers, and other key stakeholders. This will also contribute to the legal defensibility of the system. • Notes: This will be a dynamic volume in which new evidence is collected and evaluated over time.

  28. Table of Contents • Overview of the Assessment System • Who are the students? • What is the content? • Introduction of the Validity Framework and Argument • Empirical Evidence • Evaluating the Validity Argument

  29. Chapter VI: The Validity Evaluation • Revisiting the interpretative argument • Logical/theoretical relationships among the content, students, learning, and assessment—revisiting the assessment triangle • The specific validity evaluation questions addressed in this volume • Synthesizing and weighing the various sources of evidence • Arguments for the validity of the system • Arguments against the validity of the system • An overall judgment about the defensibility of inferences from the scores of the AA-AAS in the context of specific uses and purposes
