OVERVIEW OF CRITERIA FOR EVALUATING EDUCATIONAL ASSESSMENT: the BIG 3
• Reliability (Chapter 3)
• Validity (Chapter 4)
• Absence-of-Bias (Chapter 5)
Reliability of Assessment: Chapter 3 (pp. 61-82), W. James Popham
RELIABILITY of assessment:
• Standardized tests are evaluated by their reliability.
• However, an individual’s performances and responses vary from one occasion to another, even under controlled conditions.
• “An individual’s score and the average score of a group will always reflect at least a small amount of measurement error” (qtd. in American Educational Research Association, 1999).
RELIABILITY = CONSISTENCY
1. STABILITY RELIABILITY: consistency of test results over time.
• Test students on one occasion, wait a week or two, and test them again using the same assessment.
• Compare the scores from the two testing occasions to determine the test’s stability (see Table 3.2, p. 65, and the sketch below).
• (However, teachers don’t normally determine the reliability of their own classroom tests unless they’re doing research, evaluating an end-of-term test, etc.)
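A minimal sketch of how such a stability check could be computed: correlate the scores from the two occasions. The student scores and the Python code are illustrative assumptions; Popham describes the procedure, not this computation.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson correlation between two paired lists of scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))

occasion_1 = [72, 85, 90, 64, 78, 88, 70, 95]  # first administration (invented)
occasion_2 = [70, 87, 88, 66, 75, 90, 72, 93]  # same test, two weeks later

print(f"stability (test-retest) coefficient: {pearson_r(occasion_1, occasion_2):.2f}")
```

A coefficient near 1.0 would indicate that students held roughly the same relative standing across the two occasions.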
RELIABILITY = CONSISTENCY
• “In general, if you construct your own classroom tests WITH CARE, those tests will be sufficiently reliable for the decisions you will base on the tests’ results” (Popham 75).
• “In short, you need to be at least knowledgeable about the fundamental meaning of reliability, but I do not suggest you make your own classroom tests pass any sort of reliability muster” (i.e., inspection) (Popham 75).
RELIABILITY = CONSISTENCY
2. ALTERNATE-FORM RELIABILITY: using two different test forms (Form A, Form B) that are allegedly equivalent.
• To determine alternate-form consistency, give both test forms to the same individuals.
• Compare the students’ scores from the two test forms, and decide on a level of performance you would consider passing (see the sketch after this slide).
• Commercially published or state-developed tests should have evidence to support that their alternate test forms are indeed equivalent (67).
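Given a passing level, one way to express alternate-form consistency is classification agreement: how often the two forms place the same student on the same side of the cut score. The scores and cut score below are invented for illustration; this is one of several ways to compare forms, not Popham's prescribed method.

```python
form_a = [72, 85, 90, 64, 78, 88, 70, 95]  # scores on Form A (invented)
form_b = [75, 82, 91, 60, 80, 85, 73, 92]  # same students on Form B

cut_score = 75  # hypothetical passing level

# Count students classified the same way (pass/fail) by both forms.
agreements = sum((a >= cut_score) == (b >= cut_score)
                 for a, b in zip(form_a, form_b))
print(f"classification agreement: {agreements / len(form_a):.0%}")
```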
RELIABILITY = CONSISTENCY
3. INTERNAL CONSISTENCY RELIABILITY: are the items in a test doing their measurement job in a consistent manner?
• The items on a test should be homogeneous, i.e., designed to measure a single variable such as “reading achievement” or “mathematical problem solving” (69).
• “The more items on an educational assessment, the more reliable it will tend to be” (69). (Example: a 100-item mathematics test will give you a more reliable fix on a student’s math ability than a 20-item test. A sketch of one common index appears below.)
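A widely used internal-consistency index is coefficient (Cronbach's) alpha. Popham's chapter does not prescribe this particular computation, so treat the sketch below, with invented 0/1 item scores, as illustrative only.

```python
from statistics import pvariance

# rows = students, columns = items (1 = correct, 0 = incorrect; invented data)
responses = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 0, 1],
]

k = len(responses[0])  # number of items
item_vars = [pvariance([row[i] for row in responses]) for i in range(k)]
total_var = pvariance([sum(row) for row in responses])

# alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(f"coefficient alpha: {alpha:.2f}")
```

The k/(k-1) factor in the formula is one place where the slide's point shows up directly: other things being equal, more items push the coefficient higher.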
STANDARD ERROR OF MEASUREMENT (SEM)
• The consistency of an individual’s scores if given the same assessment again and again and again.
• The higher the reliability of the test, the smaller the SEM will be (72).
• (See the formula for SEM on p. 73; a sketch of the conventional formula appears below.)
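The conventional formula, presumably the one Popham presents on p. 73, is SEM = SD x sqrt(1 - reliability). A minimal sketch with invented numbers, which also shows why higher reliability means a smaller SEM:

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), the conventional formula."""
    return sd * math.sqrt(1 - reliability)

# Invented values: a test with a score SD of 10 and reliability of 0.91.
sem = standard_error_of_measurement(sd=10, reliability=0.91)
print(f"SEM = {sem:.1f}")  # 3.0

# A student scoring 75 would be expected to fall within roughly
# 75 +/- 3 (one SEM) on about 68% of hypothetical retests.
```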
STANDARD ERROR OF MEASUREMENT (SEM)
The SEM is linked to the way students’ performances on state accountability tests are reported, for example:
• (Exemplary, Exceeds Standards, Meets Standards, Approaching Standards, Academic Warning)
• (Advanced, Proficient, Basic, Below Basic) (74).
What do teachers really need to know about RELIABILITY?
• Reliability is a central concept in the measurement of an assessment’s consistency (75).
• Teachers may be called on to explain to parents the meaning of a student’s important test scores, and the reliability of the test is at stake.
• Teachers should be knowledgeable about the three types of reliability evidence. Test manuals should supply the supporting reliability evidence.
• Popham doesn’t think teachers need to devote time to calculating the reliability of their own tests; however, teachers DO need a general knowledge of reliability and why it’s important (77).
Validity: Chapter 4 (pp. 83-110), W. James Popham
“VALIDITY is the most significant concept in assessment”
• However, according to Popham, “There is no such thing as a ‘valid test’” (85). Rather, validity centers on the accuracy of the inferences that teachers make about their students through evidence gathered formally or informally (87).
• Popham suggests that teachers focus on score-based and test-based inferences and make sure they are accurate/valid (87).
VALIDITY EVIDENCE
1. CONTENT RELATED:
• The content standards, or the skills and knowledge that need to be mastered, are represented on the assessment.
• Teachers should focus their instructional efforts on content standards or curricular aims.
VALIDITY EVIDENCE
2. CRITERION RELATED:
• An assessment used to predict how well a student will perform at some later point.
• An example is the relationship between (1) students’ scores on an aptitude test taken in high school, such as the SAT or ACT, and (2) the grades those students later earn in college; the test scores are used to predict how well the student is apt to perform (97).
• However, Popham says that “these tests are far from perfect.” In fact, he suggests that “25% of students’ college grades can be explained by their scores on aptitude tests. Other factors such as motivation and study habits account for the other 75%” (98). (The arithmetic behind this figure is sketched below.)
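The 25% figure follows from squaring a validity coefficient: if the correlation between aptitude scores and college grades is about r = 0.50, then r squared is 0.25, i.e., 25% of the variance in grades. The r = 0.50 value here is an illustrative assumption, not a figure quoted from Popham's text.

```python
# Variance explained is the square of the predictive-validity correlation.
r = 0.50  # assumed correlation between aptitude scores and college grades
print(f"variance explained: {r ** 2:.0%}")  # 25%
```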
VALIDITY EVIDENCE
3. CONSTRUCT RELATED:
• The test items are accurately measuring the construct they are supposed to be measuring.
• Construct-related evidence shows, for example, that a student’s ability to write an essay is being accurately assessed.
What do teachers really need to know about VALIDITY?
• Remember that it is not the ‘test’ itself that is valid; it is the score-based inferences drawn from it. “And because score-based inferences reflect judgments made by people, some of those judgments will be in error” (Popham 106).
• Again, Popham doesn’t think teachers should worry about gathering validity evidence; however, they DO need a reasonable understanding of the 3 types of validity evidence. They must be especially concerned about content-related evidence.
Absence-of-Bias: Chapter 5 (pp. 111-137), W. James Popham
“Assessment bias” is:
• Any element in an assessment procedure that offends or unfairly penalizes students because of personal characteristics.
• These characteristics include students’ gender, race, ethnicity, socioeconomic status, religion, etc. (111).
Examples of “offensive” content on test items:
• Only males are shown in high-paying, prestigious positions (attorneys, doctors), while women are portrayed in low-paying, unprestigious positions (housewives, clerks).
• Word problems are based on competitive sports and use mostly boys’ names, suggesting that girls are less skilled.
Examples of “unfair penalization” on test items:
• Problem-solving items that deal with, say, attending operas and symphonies may advantage children from affluent families over lower-socioeconomic students who may lack these experiences (113).
CHILDREN WITH DISABILITIES AND FEDERAL LAW
• 1975: The Education for All Handicapped Children Act (Public Law 94-142), which introduced the “Individualized Education Program” (IEP).
• 1997: The Individuals with Disabilities Education Act (IDEA).
• Special education was significantly altered in 2002 with No Child Left Behind (NCLB).
• NCLB required adequate yearly progress (AYP) not only for students overall, but also for subgroups, including children with disabilities (123).
ENGLISH LANGUAGE LEARNERS (ELL)
• Students whose first language is not English and who know very little English, if any;
• Students who are beginning to learn English;
• Students who are proficient in English but need additional assistance.
• Popham states, “To provide equivalent tests in English and all other first languages spoken by students is an essentially impossible task” (130).
ASSESSMENT MODIFICATIONS?
• Today’s federal laws oblige teachers to test ELL students and students with disabilities on the same curricular goals as all other children.
• Assessment accommodations or modifications are neither completely satisfactory nor completely accurate.
• “Absence-of-bias is especially difficult to attain in cases of ELL students and students with disabilities” (Popham 133).