Update on MCAS: Is it Working? Is it Fair? Ronald K. Hambleton University of Massachusetts at Amherst EPRA Seminar, November 5, 2005. (revised)
Purposes • Address some of the misconceptions that exist about the MCAS. • In addressing the misconceptions, provide some insights about MCAS, and answer questions about its level of success and its fairness. • Share some of my own concerns about MCAS and next steps.
General Goals of State Testing Programs Like MCAS • Provide students, their parents, and teachers with feedback on student educational progress in relation to state curricula. • Compile data for monitoring progress or change in student performance over time. • Provide data for educational accountability. (e.g., NCLB legislation)
Characteristics of MCAS Assessments • English Language Arts and Mathematics assessments at several grades. • Science, social studies/history, and second language proficiency assessments are coming in Massachusetts. • Multiple-choice (60%) and performance tasks (40%). • Assessments include a core of items for student reporting, and other items (for field-testing, curriculum evaluation, and linking test forms over time). • Performance standards set by educators.
MCAS is not just about testing! It is about: -- substantially increased funding for education -- curriculum reform -- integration of curricula, instruction, and assessment -- improving administrator and teacher training, and educational facilities -- addressing the special needs of students
1. State tests encourage teachers to “teach to the test” and this narrows the curriculum taught. • This is potentially a valid concern. It is problematic with NRTs (norm-referenced tests), which cover only 10 to 20% of the curricula, use multiple-choice items only, and assess the same skills and items each year. There, teaching narrowly to the skills and content of the test improves test scores but not learning of the broader curricula. But MCAS assessments are not NRTs!
1. State tests encourage teachers to “teach to the test” and this narrows the curriculum taught. • MCAS assessments are representative of the curricula, and new items are used each year. What does “teaching to the test” mean when the tests are a sample of the curricula? Teach the curricula! • Consider the next set of displays: 85% or more of the MCAS curricula are assessed in every three-year cycle.
Comparison of Percent of Learning Standards Assessed in Math at Grades 4, 6, 8, and 10 from 2001 to 2004. (about 40/grade)
Percent of learning standards assessed in Mathematics at grades 4, 6, 8, and 10 in the time periods 2001 to 2003 and 2002 to 2004.
In sum, there is no justification for narrowing the curricula: the assessments are representative of the curricula, and over three-year periods, over 85% of learning standards are assessed. (Results at all grades and subjects are equally good.) Teaching to the test/assessment means teaching the curricula! • Other states, too (e.g., Minnesota), have found that when tests and curricula are aligned, teachers are considerably more supportive of educational assessments. Teachers need to see these alignment results in Massachusetts!
2. Important decisions about students should not turn on one test. • The AERA, APA, and NCME Test Standards highlight the importance of measurement error and the undesirability of a single test driving an important decision. • Reality: State tests (e.g., grade 10 ELA and math) are not the only requirements for students; English, mathematics, science, and history credits at the high school level are required, as well as regular attendance.
2. Important decisions about students should not turn on one test. • Students have five chances to pass grade 10 ELA and math assessments during their high school years. • Appeals process (for students close to the passing score, and who are attending school regularly, and taking the assessments). • Math assessment is available in Spanish too (to reduce bias).
2. Important decisions about students should not turn on one test. • DOE expects schools to be doing their own assessments too (using multiple methods, such as projects, portfolios, work samples, classroom tests, etc.). I think one might question high school grading practices at grades 11 and 12 if a grade 10 test were a major block to graduation. • In sum, the criticism does not have merit.
3. State assessments are full of flawed and/or biased test items. • Item writing is not a perfect science, and mistakes will be made. • Massachusetts releases all operational items on the website shortly after their use. Find the flawed items if you can. (I can't, and I have seriously looked.) This is a remarkable situation; few states release items. It is excellent for instructional purposes and for critics. If critics think items are flawed, report them.
3. State assessments are full of flawed and/or biased test items. • The process of preparing items in Massachusetts is state-of-the-art: qualified and culturally diverse item writers; content and bias reviews by committees, the department, and contractors; field testing; study of statistical evidence for bias; and care in item selection (statistically optimal and content-valid).
3. State assessments are full of flawed and/or biased test items. • UMass has looked at over 1,000 items across years, grades, and tests, and found little statistical evidence of gender or racial bias. • I just don't see the merit of this criticism; I have studied these tests to find flaws and biases and cannot (except for a few items in science).
4. Student testing takes up too much time and money. • Quality tests are expensive and require student time. (Reliability of scores needs to be high; testing takes about six hours in some grades, such as grades 4 and 10.) • In Massachusetts, for example, all students at grades 3, 4, 6, 7, 8, and 10 are tested in some subjects.
4. Student testing takes up too much time and money. (Cont.) • 4 to 6 hours per student, or about 0.5% of instructional time per year (one day out of 180!). • The state spends $7.0 billion on education and $25 million per year on assessment, about $20.00 per student: roughly 0.3% of the education budget, or 1 of every 300 dollars, goes to MCAS assessments! It seems obvious that the time and cost of the assessments are not out of line with their value.
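As a quick check of this arithmetic (taking the $7.0 billion education budget, the $25 million annual assessment cost, and a 180-day school year with roughly six instructional hours per day as given; the six-hour day is an assumption, not a figure from the slides):

$$
\frac{\$25\ \text{million}}{\$7{,}000\ \text{million}} \approx 0.36\%,
\qquad
\frac{5\ \text{hours}}{180\ \text{days} \times 6\ \text{hours/day}} \approx 0.46\%.
$$

Both ratios are broadly in line with the roughly 0.3% of budget (about 1 dollar in every 300) and 0.5% of instructional time cited above.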
Changes in timing or scheduling of the test (a reaction to criticisms in 1998): • Administer the test in short periods. • Administer at a time of day that takes into account the student's medical needs or learning style. • Time of testing varies by grade, but takes less than one day (total) of the 180-day school year; not all grades are assessed, and diagnostic results for students and groups can be used to improve instructional practices!
One Example: Using Item Analysis Results at the School Level (reproduced with permission of MDOE). [Chart: students performing at the Proficient level, with “Your school” highlighted.]
5. Passing scores are set too high. • Too often, judgments about passing scores are based only on failure rates. • Look at the process used by the states, and look for validity evidence. • Who is setting the passing scores, and what method are they using? • What is the evidence that performance standards are set too high in Massachusetts? It doesn't exist, in my judgment.
5. Passing scores are set too high. • Typically, passing scores are set by educators and school administrators; sometimes parents and local community members are included too. In Massachusetts, teachers dominated the panels (52% of panelists). • Critics need to study the procedures used in setting passing scores and the validity evidence. • As an expert on this topic, I can tell you that the state used exemplary procedures.
5. Passing scores are set too high. • Test scores are placed on a new reporting scale with scores from 200 to 280; 220 is passing. • In 2005, more than 75% of grade 10 students passed both the ELA and math assessments on their first attempt, and pass rates were over 80% for each assessment among first-time takers. • I don't see merit in the criticism.
6. There is little or no evidence that MCAS is producing results. • Internal evidence (sample): --At the grade 10 level, pass rates have been steadily increasing. --Research evidence from Learning Innovations (2000): 90% of schools indicated changes in curricula, with the changes influenced by test results; over 70% of teachers reported that MCAS results influenced their instruction.
6. There is little or no evidence that MCAS is producing results. • External evidence: --The state received very positive reviews of the MCAS curricula from Achieve, Inc. (a national review group): among the best in the country. --NAEP scores are up since 1992 for White, Black, and Hispanic students. --SAT scores are up, and more students are taking the SAT.
NAEP 2005 Massachusetts and National Results: Percentages at NAEP Achievement Levels (Mathematics Grade 4; Reading Grade 4)
Mathematics Grade 4: Percentage at NAEP Achievement Levels Source: Massachusetts Snapshot Report 2005; US DOE, IES, NCES
Reading Grade 4: Percentage at NAEP Achievement Levels Source: Massachusetts Snapshot Report 2005; US DOE, IES, NCES
1994-2004 Mean SAT Scores, Combined Verbal & Math: Massachusetts vs. Nation
Personal Concerns • Dropout rates have increased, especially for inner-city students. But how much? Why? What can be done if true? • Retention rates at the ninth grade are up. How much? Why? What can be done? • Consequential validity studies are needed. Intended and unintended outcomes, both positive and negative, need to be identified and addressed.
Personal Concerns • Funding of schools. Is it sufficient? Are we spending the money on the right items and in the appropriate amounts—teachers, special programs, school facilities, etc.? (Assessment results provide clues, at least, to problem areas.)
Conclusions • I am encouraged by educational reform in Massachusetts; there are many positive signs: funding, curricula, assessments, concern for students who need special assistance, etc. • Internal and external validity evidence is very encouraging. • Important problems remain, notably the achievement gap and funding issues.
Conclusions • I am troubled by the misconceptions that are so widely held about the MCAS. They interfere with effective implementation. • I would like to see everyone get behind educational reform and make it work for more students. Continue with the strengths and address the problems. --Compile substantial validity evidence, then make the necessary changes, with the goal of making education in Massachusetts meet the needs of all students.
Follow-up reading: • R. P. Phelps. (Ed.). (2005). Defending standardized testing. Mahwah, NJ: Lawrence Erlbaum Publishers.
Please contact me at rkh@educ.umass.edu for a copy of the slides, or to forward your questions and reactions.
Some extra slides. Not used in the presentation because of limited time.
State approach to minimizing dropouts: • Provide a clear understanding to students about what is needed. • Improve students' classroom curricula and instruction. • Offer after-school and summer programs. • Find new roles for community colleges to meet student needs. • Do research to identify reasons for dropping out, and then respond where possible.
7. Testing accommodations are not provided to students with disabilities. • Federal legislation is very clear on the need for states to provide test accommodations to students who need them (ADA and IDEA legislation). • Without appropriate accommodations, the validity of scores is threatened. • The state provides a large set of accommodations.
Long List of Available Accommodations • About 20 accommodations organized into four main categories—(a) changes in timing, (b) changes in setting, (c) changes in administration, and (d) changes in responding.
b. Changes in test setting • Administer to a small group or in a private room • Administer individually • Administer in a carrel • Administer with the student wearing noise buffers • Administer with the administrator facing the student
c. Changes in test administration • Using magnifying equipment or enlargement devices • Clarifying instructions • Using large-print or Braille editions • Using tracking items • Using amplification equipment • Translating into American Sign Language
d. Changes in how the student responds to test questions • Answers dictated • Answers recorded
8. State tests must be flawed because failure rates are high and better students go on to jobs and colleges. • Actually, failure rates at the grade 10 level are not high (80% pass both tests on the first attempt). • NAEP results are not that far out of line with state results in New England. [In fact, the results are close.] • Too many colleges must offer basic reading and math courses.
8. State tests must be flawed because failure rates are high and better students go on to jobs and colleges. • Internationally, we are about the middle of the pack. In one of the recent studies, we were right there with Latvia and New Zealand, and trailing Korea, Singapore, and many other industrialized countries.
9. Test items are biased against minorities. • Another excellent validity concern, but the evidence does not support the charge in Massachusetts. • We have analyzed the available 1998, 2000, and 2001 assessments (grades 4, 8, and 10; ELA, math, science, and history) for male-female, Black-White, and Hispanic-Black differences.
Conditional P-Value Plot of Uniform DIF (SDIF=0.135, UDIF=0.136)
Conditional P-Value Plot of Non-Uniform DIF (SDIF=0.060, UDIF=0.029)
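The two plots above summarize conditional p-value comparisons between groups, with standardized DIF (differential item functioning) indices. As a minimal sketch of how a standardized p-difference index of this general kind can be computed (an illustration only, not the state's actual analysis code; the function and variable names below are hypothetical), assuming 0/1 item scores and total test scores for a reference group and a focal group:

```python
import numpy as np

def standardized_p_difference(ref_item, foc_item, ref_total, foc_total):
    """Standardized p-difference DIF index (Dorans & Kulick style sketch).

    ref_item, foc_item: 0/1 scores on the studied item for the reference
    and focal groups; ref_total, foc_total: total test scores used as the
    matching variable. Returns the focal-group-weighted average of the
    conditional p-value differences (focal minus reference).
    """
    result_num, result_den = 0.0, 0.0
    for s in np.union1d(ref_total, foc_total):
        ref_at_s = ref_item[ref_total == s]
        foc_at_s = foc_item[foc_total == s]
        if len(ref_at_s) == 0 or len(foc_at_s) == 0:
            continue  # no comparison possible at this score level
        diff = foc_at_s.mean() - ref_at_s.mean()  # conditional p-value gap
        result_num += len(foc_at_s) * diff        # weight by focal-group count
        result_den += len(foc_at_s)
    return result_num / result_den if result_den > 0 else 0.0
```

Larger absolute values of such an index indicate a bigger gap between groups of comparable overall proficiency and flag the item for further review.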