Validity in Action: State Assessment Validity Evidence for Compliance with NCLB William D. Schafer, Joyce Wang, and Vivian Wang University of Maryland
Objectives • review the evidence that state testing programs provide to the United States Department of Education on the validity of their assessments • examine in detail the validity evidence that certain selected states provided for their peer reviews • make recommendations for improving the evidence submissions supporting validity for state assessments
Data Sources • official decision letters on each state's final assessment system under NCLB from USED; publicly available at www.ed.gov • peer review reports for five selected states • available technical reports for states that have received full approval from USED, downloaded from each state's web site
Types of Validity Evidence • the AERA/APA/NCME Standards lists five types of validity evidence • content-based evidence • response-process-based evidence • evidence based on internal structure • evidence based on relationships with other variables • evidence based on consequences • we will look at the judgments that each type should support in the context of statewide assessments of educational achievement
Content-Based Evidence judgments that need to be supported: • the domain is described in the academic content standards at the grade level • the test items sample that content domain appropriately • achievement level descriptions refer back to the content domain of the test
Response-Process-Based Evidence judgment that needs to be supported: • the activities the test demands of students are consistent with the cognitive processes the test is supposed to represent (as implied by the content standards)
Evidence Based on Internal Structure judgment that needs to be supported: • test score relationships are consistent with the strand structures of the academic content standards
Evidence Based on Relationships with Other Variables judgments that need to be supported: • higher correlations occur when traits are more similar • low correlations (perhaps partial on ability) exist with specific traits (e.g., gender, race-ethnicity, disability)
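To make the "partial on ability" idea concrete, here is a minimal Python sketch (all data are simulated, and partial_corr is an illustrative helper, not any state's operational tool) of how a raw score-by-group correlation can be re-examined after the linear effect of ability is removed:

```python
# Minimal sketch: partial correlation of a test score with a group
# indicator, controlling for an ability measure. All data are simulated.
import numpy as np

rng = np.random.default_rng(0)
n = 500
ability = rng.normal(size=n)                                   # hypothetical ability measure
group = (ability + rng.normal(scale=1.5, size=n) > 0).astype(float)  # groups differ in mean ability
score = 0.8 * ability + rng.normal(scale=0.6, size=n)          # score driven by ability only

def partial_corr(x, y, z):
    """Correlate x and y after removing the linear effect of z from each."""
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(rx, ry)[0, 1]

print("raw r(score, group):      ", np.corrcoef(score, group)[0, 1])
print("r(score, group) | ability:", partial_corr(score, group, ability))
```

Here the raw correlation is positive only because the simulated groups differ in ability; it should shrink toward zero once ability is partialed out, which is the pattern the judgment above calls for.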
Evidence Based on Consequences judgments that need to be supported: • test use maximizes positive outcomes • test use minimizes negative outcomes
Decision Letters • decision letters were viewed at the USED web site – they are public documents • 19 of the states were required to provide additional validity evidence • the evidence was not classified by USED, but we classified it into the five types to make the project manageable • because decision-letter evidence is mandated by USED, these elements may be thought of as necessary for states to submit
Content-Based Evidence • evidence to show that assessments measure the academic content standards, not characteristics outside those standards or the grade-level expectations • blueprints, item specifications, and test development procedures • evidence of alignment with content standards – this is an emphasis in peer review • explanations of design and scoring • standard setting process, results, and impact
Response-Process-Based Evidence • evidence to show that items are tapping the intended cognitive processes – this sort of evidence is commonly a part of alignment studies
Evidence Based on Internal Structure • item interrelationships • subscale score correlations showing they are consistent with the structures inherent to the academic content standards • scoring and reporting are consistent with the subdomain structure of the content standards • justification of score use when observed subdomain correlations are higher between content areas than within them
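A minimal sketch, with simulated strand scores and hypothetical strand names, of the subscore-correlation check these bullets describe; within-content correlations should come out higher than between-content ones when the strand structure holds:

```python
# Minimal sketch: subscore intercorrelations (simulated data,
# hypothetical strand names).
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 300
g = rng.normal(size=n)                   # general achievement factor
read_f = rng.normal(scale=0.7, size=n)   # reading-specific factor
math_f = rng.normal(scale=0.7, size=n)   # math-specific factor

subscores = pd.DataFrame({
    "reading_lit":  g + read_f + rng.normal(scale=0.5, size=n),
    "reading_info": g + read_f + rng.normal(scale=0.5, size=n),
    "math_number":  g + math_f + rng.normal(scale=0.5, size=n),
    "math_algebra": g + math_f + rng.normal(scale=0.5, size=n),
})

# Within-content correlations (e.g., reading_lit with reading_info) should
# exceed between-content correlations if the strand structure holds.
print(subscores.corr().round(2))
```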
Evidence Based on Relationships with Other Variables • criterion validity • relationships between test scores and external variables
Evidence Based on Consequences • studies of intended and unintended consequences
Evidence from State Submissions • each state submitted voluminous evidence to USED • the Peer Review Reports included descriptions of the evidence submitted • we had sets of Reports for five states • this evidence may be over and above what is actually required
Evidence of Purposes • each state was asked to provide evidence about the purposes of their assessments • each state did that • this is an important part of Kane’s (2006) concept of a validity argument • because it does not fall into the categories of validity evidence in the USED Peer Review Guidance, we did not include it in our review
Content-Based Evidence • test blueprints & construction process • alignment reports • categorical concurrence (each content strand has enough items for a subscore report) • range of knowledge (the number of content elements in each strand that have items associated with them) • balance of representation (the distribution of items across the content elements within each strand) • achievement level descriptions (ALDs) compared with the strand structure
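As an illustration of how mechanical part of this is, a Webb-style categorical-concurrence tally needs nothing more than item-to-strand codings; this sketch assumes a hypothetical blueprint and the common six-items-per-strand rule of thumb:

```python
# Minimal sketch: categorical concurrence tally (hypothetical item codings).
from collections import Counter

# Each entry is the content strand an item was coded to during alignment.
item_strands = ["NUM", "NUM", "ALG", "GEO", "NUM", "ALG", "GEO",
                "GEO", "ALG", "NUM", "NUM", "ALG", "GEO", "NUM"]

for strand, k in sorted(Counter(item_strands).items()):
    # Rule of thumb: at least six items per strand to support reporting.
    status = "meets" if k >= 6 else "below"
    print(f"{strand}: {k} items ({status} the six-item guideline)")
```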
Response-Process-Based Evidence • alignment reports • depth of knowledge (relates the cognition tapped by each item to that implied in the statement of the element in the content standards the item is associated with) • think-aloud studies (proposed)
Evidence Based on Internal Structure • dimensional analysis at the item level • principal components analysis • dimensionality hypothesis testing • intercorrelations among the subtest scores
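A minimal sketch of a principal-components dimensionality check on simulated dichotomous item responses (operational work would use real data and often more specialized indices); a dominant first eigenvalue is consistent with essential unidimensionality:

```python
# Minimal sketch: eigenvalues of an item-score correlation matrix
# (simulated unidimensional responses to 30 dichotomous items).
import numpy as np

rng = np.random.default_rng(2)
theta = rng.normal(size=(1000, 1))                 # one latent dimension
p_correct = 1 / (1 + np.exp(-theta))               # simple logistic model
items = (rng.random((1000, 30)) < p_correct).astype(float)

eigvals = np.linalg.eigvalsh(np.corrcoef(items, rowvar=False))[::-1]
print("first five eigenvalues:", np.round(eigvals[:5], 2))
```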
Evidence Based on Relationships with Other Variables • correlations with external tests of similar constructs (and dissimilar constructs) • correlations with student demographics and course-taking patterns • choosing and implementing accommodations for disabilities and limited English proficiency • bias studies (e.g., DIF) and passage reviews • universal design principles • monitoring of test administration procedures
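For the DIF bullet above, the Mantel-Haenszel common odds ratio is one standard screen; this sketch computes it for a single simulated item with no true DIF, matching examinees on a coarse total-score stratum (values near 1.0 indicate no DIF):

```python
# Minimal sketch: Mantel-Haenszel DIF screen for one item (simulated data).
import numpy as np

rng = np.random.default_rng(3)
n = 2000
group = rng.integers(0, 2, size=n)        # 0 = reference, 1 = focal
total = rng.integers(0, 6, size=n)        # matching variable: total-score stratum
p = 1 / (1 + np.exp(-(total - 2.5)))      # item success tied to ability only
correct = (rng.random(n) < p).astype(int)

num, den = 0.0, 0.0
for s in np.unique(total):
    m = total == s
    a = np.sum((group[m] == 0) & (correct[m] == 1))  # reference, correct
    b = np.sum((group[m] == 0) & (correct[m] == 0))  # reference, incorrect
    c = np.sum((group[m] == 1) & (correct[m] == 1))  # focal, correct
    d = np.sum((group[m] == 1) & (correct[m] == 0))  # focal, incorrect
    t = m.sum()
    num += a * d / t
    den += b * c / t

print("MH common odds ratio:", round(num / den, 2))  # ~1.0 means no DIF
```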
Evidence Based on Consequences • longitudinal change in dropout and graduation rates and NAEP results • use of results to evaluate schools and districts • use of test data to improve curriculum & instruction • use of adequate yearly progress reports • use of tests to make promotion & graduation decisions
Synthesis of Evidentiary Needs • it would be useful to have a minimum list for state regulatory submissions • can we use these studies to generate a list? • any list based on our evidence would most likely be over-inclusive • as soon as we propose one, it will surely be challenged • it seems reasonable to submit the following • for each test series (e.g., regular, alternate) • for each tested content and grade combination
Content Evidence • content standards • test blueprint • item (and passage) development process • item categorization rules and process • forms development process (e.g., item sampling; item location; section timing) • results of alignment studies
Process Evidence • test blueprint (if it has a process dimension) • item categorization rules and method (if items are categorized by process) • results of alignment studies • results of other studies, such as think-alouds
Internal Structure Evidence • subscore correlations • item-subscore correlations • dimensionality analyses
Relations with Other Variables • convergent evidence • correlations with independent, standardized measures • correlations with within-class variables, such as grades • discriminant evidence • correlations with standardized tests of other traits (e.g., math with reading) • correlations with within-class variables, such as grades in other contents • correlations with irrelevant student characteristics (e.g., gender) • item-level (e.g., DIF) studies
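A minimal sketch of the convergent/discriminant pattern this slide describes, using simulated state and external ("norm") test scores with hypothetical variable names; the convergent correlation should clearly exceed the discriminant one:

```python
# Minimal sketch: convergent vs. discriminant correlations (simulated data).
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n = 400
math_t = rng.normal(size=n)   # latent math achievement
read_t = rng.normal(size=n)   # latent reading achievement

df = pd.DataFrame({
    "state_math": math_t + rng.normal(scale=0.5, size=n),
    "norm_math":  math_t + rng.normal(scale=0.5, size=n),  # external math test
    "state_read": read_t + rng.normal(scale=0.5, size=n),
    "norm_read":  read_t + rng.normal(scale=0.5, size=n),  # external reading test
})

r = df.corr()
print("convergent   r(state_math, norm_math):", round(r.loc["state_math", "norm_math"], 2))
print("discriminant r(state_math, norm_read):", round(r.loc["state_math", "norm_read"], 2))
```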
Consequential Evidence • purposes of the test – as they describe intended consequences • uses of results by educators • trends over time • studies that generate and evaluate positive and negative aspects from user input
Validity in the Accountability Context – Role of Processes • the majority of the evidence submitted capitalizes on well-known methods for studying the validity of a particular test form – a product • but the object of study in accountability is actually the process by which tests are developed & used • a test form is important only as a representative of a process of test development • programs are expected to engage in a continual process of self-evaluation and improvement
Process Evidence • assume it is useful to distinguish between product evidence and process evidence • product evidence focuses on a particular test • process evidence focuses on a testing program • we will review and extend some suggestions for process evidence that were originally proposed in the context of state assessment and accountability peer reviews
What is a Process? • a recurring activity that takes material, operates on it, and produces a product • concept is borrowed from project management • could be as large as the entire assessment and accountability program • could be as small as, say, the production of a test item • one challenge is to organize the activities of a program into useful processes
Is Validity a Process Concept? • i.e., is there a sense in which we can use the concept of the validity of a process? • validity is justification for an interpretation of a score • a test form is a static element that can contribute support for an interpretation • a process is a dynamic element that can contribute support for future interpretations • so we give this one a tentative “yes”
Elements of Process Evidence • process • The process is described • The inputs and operating rules are laid out • product • The results of the process are presented or described • evaluation (how these questions are considered) • is the process adequate? • can (or how can) it be improved? • should it be improved (e.g., do the benefits justify the costs)? • improvement (how the consideration is acted on) • The recommendations from the evaluation are considered for implementation in order to improve the process
Examples of Process Evidence • three examples of these four elements of process evidence follow • they vary markedly in scope, from small to large • they illustrate the nature of process evidence for different contexts within an assessment and accountability program
Bias and Sensitivity Committee Selection • process. desired composition, generation of committee members, contacting potential members, proposed meeting schedule, etc. • product. committee composition, especially the constituencies represented. • evaluation. comparison of actual with desired composition, follow up with persons who declined, suggestions for improvement. • improvement. who has responsibility to consider the recommendations generated by the evaluation, how they go about their analysis, how change is implemented in the system, examples of changes that were made in the past to document responsiveness
Alignment • process. test blueprint, items, item categorizations, sampling processes • product. a test form • evaluation. alignment study • improvement. review of study recommendations, plan for future
Psychometric Adequacy of a Test Form • process. the analyses that are performed. • product. technical manual • evaluation. review by a group such as a TAC, recommendations for the manual as well as the testing program • improvement. consideration of recommendations, plan for future
Making Judgments About Processes • two typically independent layers of judgment • first layer is an evaluation that makes recommendations about improvement • second layer considers them • in many cases, second layer would be an excellent way for a state to use its TAC
Judging Process Evidence • process evidence by definition describes processes • it should be judged by how well it describes processes that support interpretations based on future assessments • it should also be judged on how well it describes processes that lead to improvements in the program
Possible Criteria for Process Evidence • data are collected from all relevant sources • data are reported completely and efficiently • data are reviewed by persons with appropriate expertise • review is conducted fairly • review results are reported completely and efficiently • recommendations are suggested in the reports • consideration is given to the recommendations • past actions based on recommendations are presented as evidence that the process results in improvement