Assessment for Educational Leaders, W. J. Popham, 2006 • Loree Korb • Matt McLelland • Kelsea Quillin • Cheryl Vogler
A Call for Comparisons in Competitive Contexts • Three factors should always be applied to comparative data: • Validity • Reliability • Absence of Bias • Norm-referenced interpretations make sense out of tests and data
What Do Statistics Tell Us About Our Educational System? • Organizations such as the OECD (Organization for Economic Co-operation and Development) report on comparative educational data. • The three-yearly OECD Program for International Student Assessment (PISA) report, which compares the knowledge and skills of 15-year-olds in 70 countries around the world, ranked the United States 14th out of 34 OECD countries for reading skills, 17th for science and a below-average 25th for mathematics. • Out of 34 OECD countries, the United States ranked: • Reading: 14th • Science: 17th • Math: 25th • Source: http://www.huffingtonpost.com/2010/12/07/us-falls-in-world-education-rankings_n_793185.html
The Normal Curve • The normal curve is: • Symmetrical • A bell-shaped curve whose mean, median and mode are identical • Realistically, scores in the classroom are not usually this evenly distributed
Percentiles • A popular technique for comparing individuals’ test scores is by percentiles. A percentile is the point on the distribution below which a given percentage of scores fall. • 25th Percentile: First Quartile • 50th Percentile: Median and Second Quartile • 75th Percentile: Third Quartile • A quartile is a point, not a range of scores.
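The percentile idea above can be sketched in a few lines of Python. This is only an illustration: the score list is made up, and `percentile_rank` is a hypothetical helper that uses one common definition (percent of scores strictly below the given score). Other definitions treat tied scores differently.

```python
import statistics

def percentile_rank(score, scores):
    """Percent of scores in the group that fall strictly below `score`."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

# Hypothetical class scores, for illustration only.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

rank_of_80 = percentile_rank(80, scores)

# Quartiles are three cut points (25th, 50th, 75th percentiles), not ranges.
q1, q2, q3 = statistics.quantiles(scores, n=4)
```

Note that `statistics.quantiles` returns three points, which matches the slide's remark that a quartile is a point rather than a range of scores.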
Standard Scores • Standard scores indicate how far the student’s score is from the mean of the distribution of the test scores. • A z-score tells you how far a raw score is above or below the mean of its distribution, in standard deviation units: z = (X − X̄) / s • X = raw score; X̄ = the mean of the distribution; s = the standard deviation of the distribution • A key point is that calculating z requires the population mean and the population standard deviation, not the sample mean or sample standard deviation. It requires knowing the population parameters, not the statistics of a sample drawn from the population of interest. But knowing the true standard deviation of a population is often unrealistic except in cases such as standardized testing, where the entire population is measured. In cases where it is impossible to measure every member of a population, the standard deviation may be estimated using a random sample.
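A minimal sketch of the z-score calculation, assuming the made-up list below is the entire population being measured (so the population standard deviation, `statistics.pstdev`, is the appropriate one). The function name `z_score` and the data are hypothetical.

```python
import statistics

def z_score(raw, mean, sd):
    """Distance of a raw score from the mean, in standard deviation units."""
    return (raw - mean) / sd

# Hypothetical scores, treated here as the whole population.
scores = [70, 75, 80, 85, 90]

mean = statistics.mean(scores)   # population mean
sd = statistics.pstdev(scores)   # population standard deviation

z_top = z_score(90, mean, sd)    # positive: above the mean
z_mid = z_score(80, mean, sd)    # zero: exactly at the mean
```

If only a random sample were available, `statistics.stdev` (the sample standard deviation) would be the estimate to use instead.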
Standard Scores: T-Scores • The T-score formula is: T = 50 + 10z • The T-score is merely a z-score that has been multiplied by 10 (to get rid of the decimals) and had 50 added to it (to get rid of the minus values). • Some call the T-score a “Transformed Standard Score” because it is just a z-score transformed.
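The transformation on this slide is simple enough to check by hand. A small sketch, where `t_score` is a hypothetical helper name:

```python
def t_score(z):
    """Convert a z-score to a T-score: multiply by 10, then add 50."""
    return 50 + 10 * z

# A student exactly at the mean (z = 0) gets T = 50.
t_at_mean = t_score(0)

# A student 1.5 standard deviations below the mean gets a positive,
# decimal-free T-score instead of a negative z-score.
t_below = t_score(-1.5)
```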
Normal Curve Equivalents • The Normal Curve Equivalent, or NCE, is a way of measuring where a student falls along the normal curve. • The numbers on the NCE line run from 1 to 99, similar to percentile ranks, which indicate an individual student’s rank, or how many students out of a hundred had a lower score. • NCE scores have a major advantage over percentiles in that they can be averaged, because they form an equal-interval scale. That is an important characteristic when studying overall school performance, and in particular, in measuring school-wide gains and losses in student achievement.
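The slide does not give a formula, so the sketch below assumes the commonly cited conversion NCE = 50 + 21.06z, where z is the normal-curve z-score that corresponds to a percentile rank. The helper name `nce_from_percentile` is hypothetical.

```python
from statistics import NormalDist

def nce_from_percentile(percentile_rank):
    """Convert a percentile rank (strictly between 0 and 100) to an NCE.

    Assumes the commonly cited conversion NCE = 50 + 21.06 * z.
    """
    z = NormalDist().inv_cdf(percentile_rank / 100)
    return 50 + 21.06 * z

# NCEs and percentile ranks line up at 1, 50 and 99, but not in between.
nce_50 = nce_from_percentile(50)
nce_99 = nce_from_percentile(99)
nce_75 = nce_from_percentile(75)
```

Under this assumption, the 75th percentile maps to an NCE noticeably below 75, which illustrates why percentile ranks and NCEs should not be read interchangeably.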
Stanine Scores • A stanine is a score on a scale from one through nine that shows the placement of a student’s overall ranking on a standardized test. • The low end of the scale is 1, 2 and 3. The high end is 7, 8 and 9. In the middle are 4, 5 and 6.
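One way to see how stanines relate to percentiles is the sketch below. It assumes the commonly published stanine bands (4%, 7%, 12%, 17%, 20%, 17%, 12%, 7%, 4% of scores), and the handling of scores that fall exactly on a boundary is an assumption of this example. The name `stanine_from_percentile` is hypothetical.

```python
from bisect import bisect_right

# Cumulative upper bounds (in percent) for stanines 1 through 8;
# anything at or above the last bound is stanine 9.
STANINE_CUTS = [4, 11, 23, 40, 60, 77, 89, 96]

def stanine_from_percentile(percentile_rank):
    """Map a percentile rank (0-100) to a stanine from 1 to 9.

    Assumption: a rank exactly on a cut point goes to the higher stanine.
    """
    return bisect_right(STANINE_CUTS, percentile_rank) + 1

low = stanine_from_percentile(2)      # low end of the scale
middle = stanine_from_percentile(50)  # middle of the scale
high = stanine_from_percentile(98)    # high end of the scale
```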
Scale Scores • Scale scores are transformed raw scores. For every possible raw score on a test form, there is a corresponding scale score, although a scale score may represent more than one raw score depending on the distribution of the results. • When multiple forms of a test are used, or when results are compared from year to year, scale scores are needed to adjust for possible differences in test form length or difficulty. • Scale scores provide a useful measurement tool for many assessment programs. They are used in numerous national testing programs, including the ACT and SAT examinations, which are typically part of the admissions process for colleges and universities. • Scale scores are also routinely used in many other statewide testing programs, providing the basis for long-term, meaningful comparisons of student results across different test administrations. • Source: Arkansas Comprehensive Testing, Assessment, and Accountability Program
Item Response Theory (IRT) • IRT takes into account the difficulty of each item on the test and adjusts the scores accordingly • Rasch Model (Georg Rasch, Danish mathematician) • IRT is not normally used by classroom educators • The goal is to compare apples to apples
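To make the Rasch model mentioned above a little more concrete, here is a minimal sketch of its item response function: the probability of a correct answer depends only on the difference between the student's ability and the item's difficulty. The function name and the example values are hypothetical, and this is not a full IRT scoring procedure.

```python
import math

def rasch_probability(ability, difficulty):
    """Rasch model: probability that a student answers an item correctly.

    Both arguments are on the same (logit) scale.
    """
    return 1 / (1 + math.exp(-(ability - difficulty)))

# When ability equals item difficulty, the chance of success is 50%.
p_even = rasch_probability(0.0, 0.0)

# The same student has a better chance on an easier item
# and a worse chance on a harder one.
p_easy = rasch_probability(0.0, -1.0)
p_hard = rasch_probability(0.0, 1.0)
```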
Grade-Equivalents • Grade-equivalent score • The test has been given to a large enough number of students that it is possible to estimate what the average (median) student will score on entering a certain grade in a certain month of instruction • Ex: 4.3 (fourth grade, third month of instruction) • Used frequently for interpretive purposes because they seem so “blessedly simple to understand”
Grade equivalents yield no clues as to percentile standing • A student might get a higher grade-equivalent score on a reading test than on a math test, yet have a lower percentile rank on the reading test than on the math test.
Grade Equivalents Continued • A grade equivalent of 7.5 earned by a fourth-grade student does NOT indicate that the fourth-grader is doing well in seventh-grade math. Rather, it says that the fourth-grader has done as well as a seventh-grader would have done on the fourth-grade test. • Misleading
Say What??? • Similarly, a math grade-equivalent of 2.5 earned by a fifth-grader does not mean that he or she is doing fifth-grade math work as well as a second-grader would (because a second-grader obviously isn’t given fifth-grade math). • About the best you can say based on the test results is that the fifth-grader seems to be lagging several years behind grade level.
Take it with a grain of salt … • Many misguidedly interpret what is as what should be, thinking that all eighth-graders should earn a grade-equivalent score of 8.0 • In fact, the grade-equivalent score is the median, so by definition, half of the examinees at that grade level will fall below this score. • For these reasons, grade-equivalent scores should be used with considerable caution and plenty of disclaimers!
Norms • Typically, before developers of a new reading test publish it, they administer that test to a large number of students, who will from this point on be referred to as the norm group. Results are summarized in tabular form. • These norm tables are then used to identify the percentile equivalent for any student’s raw score. • This essentially makes it easier for the common person to make sense out of the scores.
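The norm-table idea can be sketched as follows. The norm group here is tiny and entirely made up (real norm groups are large), and `build_norm_table` is a hypothetical helper that defines the percentile as the percent of the norm group scoring strictly below a raw score.

```python
from bisect import bisect_left

def build_norm_table(norm_group_scores):
    """Map each raw score seen in the norm group to a percentile.

    Percentile = percent of norm-group scores strictly below the raw score.
    """
    ordered = sorted(norm_group_scores)
    n = len(ordered)
    return {raw: 100 * bisect_left(ordered, raw) / n for raw in set(ordered)}

# Hypothetical raw scores from a (very small) norm group.
norm_group = [10, 12, 12, 15, 18, 20, 20, 20, 22, 25]
norm_table = build_norm_table(norm_group)

# Look up the percentile equivalent of a later student's raw score of 20.
student_percentile = norm_table[20]
```

Once the table exists, interpreting any student's raw score is just a lookup, which is what makes the scores easier to make sense of.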
Criteria for Judging Normative Data • Ingredients: • Sample size • Representativeness • Recency • Publishers generally update data about every 6 years • Descriptive Procedures • How the test will be run
“Circumscribing what’s to be measured” or… Delimiting that which can be quantified • In other words: • Teachers need to understand the way a particular curricular aim has been made operational (functional) by a test. • Busy teachers need access to descriptions of curricular aims that are both clear and concise. • (Judging by the title of this section, the author is clearly neither!)
Traditional “Tying-Down” Techniques • “Test-item specifications” • Refer explicitly to the rules for generating the items on a test • “Test Specifications” • A document that tells how many items of certain types should be on a test form
Practical Requirements for Classroom Tests • Will tests be hand-scored or machine-scored? • How long will the test be? • How long are teachers willing to spend grading? • Essay • Scantron • Accommodations for disabled students • Visually/auditorily impaired • Students with IEPs • Test security • New forms each year • Retakes • Absences • Later class periods • Timed or not timed? • Resources needed to complete the exam • Dictionaries • Calculators • Laptops
Assessment Options • Inexperienced assessors tend to gravitate toward tests with which they are familiar: • Multiple choice • True/False • Instead, try: • Observation of student performance • Appraisal of student-generated products • Portfolio collections of student-selected work • Short answer • Matching • Binary-choice • Essay • Best of all … try a combination of these.
Test Delineation Alternatives • Two-way grid • One of the most popular strategies of test development companies. • One dimension reflects the content and the other describes the cognitive behavior to be assessed. • Can be used as a check to see if there are too few or too many items of a particular kind.
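A two-way grid is easy to represent and check in code. In the sketch below, the content areas, cognitive levels and item counts are all invented for illustration; only the structure (content on one dimension, cognitive behavior on the other) comes from the slide.

```python
from collections import Counter

# Hypothetical two-way grid: (content area, cognitive level) -> number of items.
grid = {
    ("Fractions", "Recall"): 4,
    ("Fractions", "Application"): 3,
    ("Fractions", "Analysis"): 1,
    ("Decimals", "Recall"): 3,
    ("Decimals", "Application"): 3,
    ("Geometry", "Recall"): 2,
    ("Geometry", "Application"): 2,
    ("Geometry", "Analysis"): 2,
}

content_totals = Counter()
level_totals = Counter()
for (content, level), item_count in grid.items():
    content_totals[content] += item_count
    level_totals[level] += item_count

total_items = sum(grid.values())
```

Reading the two sets of totals side by side shows at a glance whether one kind of item, such as the higher-level "Analysis" items in this made-up grid, is under-represented.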
Content or Skill Listings • A list of content categories along with the number of test items per category. • Can identify any disproportionate numbers of items per content standard • Simplified versions can be used by teachers
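A content listing can be checked for disproportionate coverage with a simple count. Everything in this sketch is hypothetical: the content standards, the item tags, and especially the 15% and 50% thresholds, which a teacher would set according to the intended emphasis of the test.

```python
from collections import Counter

# Hypothetical test: each item is tagged with the content standard it measures.
item_standards = (
    ["Number sense"] * 12 + ["Measurement"] * 2 + ["Data analysis"] * 6
)

counts = Counter(item_standards)
total = sum(counts.values())
shares = {standard: n / total for standard, n in counts.items()}

# Assumed thresholds for "too few" and "too many" items per standard.
MIN_SHARE, MAX_SHARE = 0.15, 0.50

flagged = {
    standard: share
    for standard, share in shares.items()
    if share < MIN_SHARE or share > MAX_SHARE
}
```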
Instructional Role of Test Delineation • Provides teachers with targets (educational outcomes) • Helps teachers orient instruction toward a curricular aim rather than aiming at a particular set of test items.
Assessment descriptions • Must communicate with clarity • Must be concise and well written so that busy teachers are willing to read them • Promote student mastery of curricular aims being assessed due to better understanding by teachers of their instructional goals
Opportunity or Obligation? • If the descriptors for your state’s high-stakes test are insufficiently clear, push hard at the state level for clarity regarding what’s to be assessed.
Qualities of Assessment Descriptions • 2 critical components of assessment descriptions: • A delineation of the kinds of tasks that students may be asked to carry out • The evaluative procedures that will be used to judge the quality of students’ responses to whatever task or tasks they are given
Selected Item Response • Do educational leaders need to know how to construct test items? • YES! • Educational leaders are asked to work with teachers whose classroom assessments need help. • Only if the educational leaders have experience in assessment will other educators believe in their assessment competence.
Either / Or • Selected Response: test items requiring students to choose their answer from 2 or more options • True/False • Multiple Choice • Choice of magazine during free reading • Constructed Response: test items requiring students to generate their own responses • Essay • Short answer • Extemporaneous speech • Create a project (bookends, poster, etc.)