Interpreting Test Scores: Making Sense of the Numbers

Anjanette Pelletier, Program Specialist
Karol Wright, Resource Specialist
Newark Unified School District

Test Scores – What do they Mean?
• Experience tells us some things:
  – Teachers, parents, and even psychologists think they know and understand simple scores such as total scores, grade equivalents, percent correct, and perhaps percentile ranks… when they don't.
  – Teachers and parents rarely claim to know and understand how to interpret scaled or standard scores… but they need to.
  – Most assessment systems use scaled or standard scores, and training in how to interpret them is available.
  – Interpreting scaled and standard scores can support the development of appropriate IEP goals and objectives.

What is Average?

Normal Distribution

Bell Curve
• "Average" is the range within one standard deviation of the mean. In a normal distribution, this range contains about 68% of scores (dark blue), while two standard deviations (medium and dark blue) contain about 95% and three standard deviations (light, medium, and dark blue) contain about 99.7%.
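The 68%, 95%, and 99.7% figures come straight from the normal curve. As a quick check, this Python sketch (standard library only) computes the share of a normal distribution that lies within k standard deviations of the mean, using the identity P(|Z| < k) = erf(k/√2):

```python
import math

def share_within(k: float) -> float:
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

It prints roughly 68.3%, 95.4%, and 99.7%. Real score distributions are only approximately normal, so treat these as model values.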

Standard Scores and Scaled Scores
• Measure how far the student scored from the average, in units of the standard deviation (the typical spread of scores) for the whole group. A standard score of 115 or a scaled score of 13 means the student scored one standard deviation above the average (the 84th percentile rank). A standard score of 85 or a scaled score of 7 means the student scored one standard deviation below the average (a percentile rank of 16).
• Equal units: can be added, subtracted, multiplied, divided, or averaged.
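Because standard scores and scaled scores differ only in their mean and standard deviation, both can be converted to the same z-score and, assuming normally distributed scores, to a percentile rank. A minimal Python sketch using the standard library's `statistics.NormalDist`; the helper names are illustrative:

```python
from statistics import NormalDist

STANDARD = NormalDist(mu=100, sigma=15)  # standard scores: mean 100, SD 15
SCALED = NormalDist(mu=10, sigma=3)      # scaled scores: mean 10, SD 3

def z_score(score: float, scale: NormalDist) -> float:
    """Distance from the mean in standard-deviation units."""
    return (score - scale.mean) / scale.stdev

def percentile_rank(score: float, scale: NormalDist) -> int:
    """Percentile rank under a normal model, reported on the usual 1-99 range."""
    return max(1, min(99, round(100 * scale.cdf(score))))

print(z_score(115, STANDARD), percentile_rank(115, STANDARD))  # 1.0 84
print(z_score(13, SCALED), percentile_rank(13, SCALED))        # 1.0 84
print(z_score(85, STANDARD), percentile_rank(85, STANDARD))    # -1.0 16
```

Both 115 and 13 land at z = 1.0 and the 84th percentile rank, which is why the two scales can be read interchangeably once converted.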

What are Scaled Scores?
• A scaled score converts a student's raw score on a test to a common scale that allows numerical comparisons between students. Because most major testing programs use multiple versions of a test, the scale is used to adjust for slight differences from one version of the test to the next.
• Scaled scores are particularly useful for comparing test scores over time, such as measuring an individual student's year-to-year growth in a content area.
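Testing programs use their own equating procedures, often more elaborate than this (many are based on item response theory). As a simplified illustration only, the sketch below uses linear (mean-sigma) conversion: each raw score is expressed relative to its own form's mean and standard deviation, then placed on a common reporting scale. The form statistics and the 300/50 reporting scale are hypothetical, and the method assumes comparable groups took each form.

```python
from dataclasses import dataclass

@dataclass
class FormStats:
    """Raw-score mean and SD for one version (form) of a test; values are hypothetical."""
    mean: float
    sd: float

SCALE_MEAN, SCALE_SD = 300.0, 50.0  # hypothetical reporting scale

def to_scaled(raw: float, form: FormStats) -> float:
    """Linear conversion: the same relative standing on any form gets the same scaled score."""
    z = (raw - form.mean) / form.sd
    return SCALE_MEAN + SCALE_SD * z

form_a = FormStats(mean=30, sd=6)  # slightly easier version
form_b = FormStats(mean=27, sd=6)  # slightly harder version

print(to_scaled(36, form_a), to_scaled(33, form_b))  # 350.0 350.0
```

A raw score of 36 on the easier form and 33 on the harder form represent the same relative standing, so both map to a scaled score of 350.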

What are Percentile Scores?
• Percentiles describe in more detail how a child compares with the other students who took the test, using scores that range from 1 to 99.
• For example, a student who scored at the 66th percentile on a test scored as high as or higher than 66% of the other students who took the test.
• Do not confuse percentile scores with percent-correct scores. Percentile scores compare one student's score with those of a group of students who took the test. Percent-correct scores simply report the share of items a student answered correctly out of the total number of items.
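The difference between the two kinds of scores is easy to see in code. This sketch, with a hypothetical 40-item test and a hypothetical group of ten raw scores, computes percent correct from the student's own answers and a percentile rank from the group, using the "tied or beaten" (at or below) definition:

```python
def percent_correct(items_right: int, items_total: int) -> float:
    """Share of items answered correctly; ignores how anyone else did."""
    return 100 * items_right / items_total

def percentile_rank_in_group(score: float, group_scores: list[float]) -> float:
    """Percent of the group scoring at or below this score ("tied or beaten")."""
    at_or_below = sum(1 for s in group_scores if s <= score)
    return 100 * at_or_below / len(group_scores)

# Hypothetical raw scores for ten students on a 40-item test
group = [18, 22, 25, 27, 28, 30, 31, 33, 35, 38]

print(percent_correct(30, 40))              # 75.0
print(percentile_rank_in_group(30, group))  # 60.0
```

The same 30-of-40 performance is 75 percent correct, but its percentile rank (60 here) would change with a different comparison group. Published tests compute percentile ranks against a large norming sample, not a small group like this one.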

Percentile Ranks
• The percent of students whose scores were tied or beaten by this student. A percentile rank of 37 means your student scored as high as or higher than 37 percent of the students in the test's norming sample; a percentile rank of 99 means your student was in the highest one percent of the group. If 1,000 students took the test, your student performed the same as or better than 370 or 990 of them, respectively.
• Nothing to do with percent correct. (Never use the % sign in an abbreviation!)
• Not equal units: cannot be added, subtracted, multiplied, divided, or averaged.
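Why percentile ranks are not equal units: near the middle of the distribution, scores are crowded together, so a small change in performance moves the percentile rank a lot, while near the extremes the same 10-point change in percentile rank requires a much larger change in performance. A sketch, assuming normally distributed standard scores (mean 100, SD 15):

```python
from statistics import NormalDist

STANDARD = NormalDist(mu=100, sigma=15)  # standard-score scale, assumed normal

def standard_score_at(pr: float) -> float:
    """Standard score that sits at a given percentile rank."""
    return STANDARD.inv_cdf(pr / 100)

mid_gap = standard_score_at(60) - standard_score_at(50)  # a 10-point PR step in the middle...
top_gap = standard_score_at(99) - standard_score_at(89)  # ...and the same step near the top
print(f"PR 50 -> 60: {mid_gap:.1f} points; PR 89 -> 99: {top_gap:.1f} points")
```

Moving from the 50th to the 60th percentile rank takes about 3.8 standard-score points, while moving from the 89th to the 99th takes about 16.5, which is why averaging or subtracting percentile ranks gives misleading results.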

Grade-Equivalent Scores
• Grade-equivalent scores are determined by giving a test developed for one grade to students in other grades. For instance, test designers establish grade equivalents for a 4th-grade test by giving that same test to students in the 6th and 2nd grades.
• Grade-equivalent scores are often misunderstood, so be careful when you interpret them. If a 4th grader receives a 7th-grade equivalent score on a 4th-grade reading achievement test, the parents may believe their child is ready for 7th-grade material. Actually, the score means that the child reads 4th-grade material as well as the average 7th grader. Conversely, a 2nd-grade equivalent score means that the child reads 4th-grade material as well as the average 2nd grader.
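In effect, a grade equivalent is a lookup: find the grade whose average (here, median) raw score matches the student's raw score, interpolating between the grades that were actually tested. The sketch below assumes simple linear interpolation and uses made-up norms for a 4th-grade test given to 2nd, 4th, and 6th graders; publishers' actual procedures vary.

```python
from bisect import bisect_left

# Hypothetical norms for a 4th-grade test: (grade placement, median raw score)
NORMS = [(2.0, 12), (4.0, 25), (6.0, 34)]

def grade_equivalent(raw: float) -> float:
    """Grade whose median raw score matches `raw`, by linear interpolation."""
    medians = [m for _, m in NORMS]
    # Outside the tested range this sketch clamps; some publishers extrapolate instead.
    if raw <= medians[0]:
        return NORMS[0][0]
    if raw >= medians[-1]:
        return NORMS[-1][0]
    i = bisect_left(medians, raw)  # first tested grade whose median >= raw
    (g0, m0), (g1, m1) = NORMS[i - 1], NORMS[i]
    return g0 + (g1 - g0) * (raw - m0) / (m1 - m0)

print(grade_equivalent(29.5))  # 5.0
```

A raw score of 29.5 yields a grade equivalent of 5.0 even though no 5th graders took the test in this example; the value is interpolated, not observed, and it says nothing about whether the student can handle 5th-grade material.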

Grade-Equivalent Scores
• Not equal units: cannot be added, subtracted, multiplied, divided, or averaged.
• Do not reflect the student's actual functioning level.
• May not be real scores at all.
• May not even correspond to grade levels included in the test.
• Example: A student cannot be said to have made growth if....

Age vs. Grade Based Scores
• When grade or age equivalents are used, parents and teachers usually come away believing that the child has approximately the skills of the specified age or grade, which is not at all the case. An age or grade equivalent simply means that the child obtained the raw score corresponding to the average raw score of a group of children at the specified age or grade.
• Most tests lack a sufficient number of items at any specific age or grade level of difficulty, because each test covers a broad age or grade range.

Age vs. Grade Based Scores
• Because grade equivalents are simple transformations of raw scores, a student may in fact do no work at all at the grade level reported. The child might do unusually accurate work below the reported level, or the opposite.
• Variability in scores is not constant across the range of the scale, so a "two-year deficit" is not as severe at some ages as it may be at others.
• Grade and age equivalents appear to be more sample-sensitive than standard scores; that is, grade and age equivalents seem to vary more from test to test, while standard scores are more consistent across tests.

Age vs. Grade Based Scores
• §300.541 Criteria for determining the existence of a specific learning disability.
• (a) A team may determine that a child has a specific learning disability if:
  (1) The child does not achieve commensurate with his or her age and ability levels in one or more of the areas listed in paragraph (a)(2) of this section, if provided with learning experiences appropriate for the child's age and ability levels; and
  (2) The team finds that a child has a severe discrepancy between achievement and intellectual ability in one or more of the following areas:
• Notice that the criterion is "does not achieve commensurate with his or her age and ability," not "grade and ability."

Reasons for Caution: Reading Comprehension Score Inconsistency
• Among the various influences dictating the stratum of reading comprehension difficulty, all of which must be fully taken into account for a comprehensive assessment, are sentence length, word length (measured in syllables), and the grammatical or syntactic complexity and structure of the sentences.
  41 words, 41 words/sentence, 5.8 letters/word: Flesch-Kincaid grade 12.0

Reasons for Caution: Reading Comprehension Score Inconsistency
• Many things make reading harder or easier. We must look at all of them. One is how long the sentences are. Long sentences are harder to read. Another is how long the words are. Long words have more syllables. They are hard to read. A third thing is the grammar. Complicated grammar is hard to read.
  56 words, 6.2 words/sentence, 4.1 letters/word: Flesch-Kincaid grade 2.0

Reasons for Caution: Reading Comprehension Score Inconsistency
• Good job, John! You have done it again. How do you come up with these ideas?
  16 words, 5.3 words/sentence, 3.5 letters/word: Flesch-Kincaid grade 0.0
• Excellent job, John! You have done it again. How do you come up with these ideas?
  16 words, 5.3 words/sentence, 3.8 letters/word: Flesch-Kincaid grade 1.2
• Excellent job, John!
  3 words, 3.0 words/sentence, 5.3 letters/word: Flesch-Kincaid grade 5.2
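The grade levels in these examples come from the Flesch-Kincaid formula, which uses average sentence length and average syllables per word (not letters per word): grade = 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. The sketch below pairs the formula with a rough vowel-group syllable counter; the counter is a simplifying assumption, and word processors use their own rules. With this heuristic, the two "Excellent job, John!" examples come out at about 5.2 and 1.2, matching the figures above, and the "Good job, John!" passage comes out slightly below zero, which the slide reports as 0.0.

```python
import re

def fk_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, dropping a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def fk_text(text: str) -> float:
    """Estimate a passage's grade level using the heuristic syllable count."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        raise ValueError("no words found")
    syllables = sum(count_syllables(w) for w in words)
    return fk_grade(len(words), sentences, syllables)

for passage in ("Excellent job, John!",
                "Good job, John! You have done it again. How do you come up with these ideas?"):
    print(f"{fk_text(passage):.1f}  {passage}")
```

Swapping one two-syllable-longer word ("Good" to "Excellent") is enough to move the estimate by more than a grade, which is the point the slides make about reading-level scores.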

Stanines
• Almost equal units: can be added, subtracted, multiplied, divided, or averaged.
• Fairly easy to explain and understand.
• Test scores are scaled to stanine scores using the following algorithm:
  – Rank results from lowest to highest.
  – Give the lowest 4% a stanine of 1, the next 7% a stanine of 2, etc.
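The ranking procedure above can be sketched in Python. This version assumes the standard stanine percentages (4, 7, 12, 17, 20, 17, 12, 7, 4) and places each score by its mid-percentile rank (the percent of scores below it plus half of any ties), so tied scores always share a stanine; that tie rule is one reasonable choice, not the only one.

```python
from bisect import bisect_left, bisect_right

# Cumulative percentages at the top of stanines 1-8 (bands of 4, 7, 12, 17, 20, 17, 12, 7, 4)
CUTS = [4, 11, 23, 40, 60, 77, 89, 96]

def stanines(scores: list[float]) -> list[int]:
    """Assign a stanine (1-9) to each score based on its rank within the group."""
    ordered = sorted(scores)
    n = len(ordered)
    result = []
    for s in scores:
        below = bisect_left(ordered, s)          # scores strictly lower than s
        ties = bisect_right(ordered, s) - below  # scores equal to s (including s)
        pct = 100 * (below + 0.5 * ties) / n     # mid-percentile rank
        result.append(bisect_right(CUTS, pct) + 1)
    return result

print(stanines([55, 70, 70, 82, 91, 64, 48, 77, 99, 60]))
```

For 100 distinct scores, this assigns exactly 4, 7, 12, 17, 20, 17, 12, 7, and 4 students to stanines 1 through 9.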

Stanines

Let’s Review
• PERCENTILE RANKS state the percent of persons in the norming sample who scored the same as or lower than the student. A percentile rank of 50 is average: as high as or higher than 50% of the norming sample and lower than the other 50%. The middle half of scores falls between percentile ranks of 25 and 75.
• STANINES (standard nines) are a nine-point scoring system. Stanines 4, 5, and 6 are approximately the middle half of scores, or average range. Stanines 1, 2, and 3 are approximately the lowest one fourth. Stanines 7, 8, and 9 are approximately the highest one fourth.

Let’s Review
• STANDARD SCORES have an average (mean) of 100 and a standard deviation of 15. A standard score of 100 is at the 50th percentile rank. The middle half of standard scores falls between 90 and 110.
• SCALED SCORES have an average (mean) of 10 and a standard deviation of 3. A scaled score of 10 is also at the 50th percentile rank. The middle half of scaled scores falls between 8 and 12.
• Avoid using grade-equivalent and age-equivalent scores.
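The review figures can be checked with a few lines of Python, assuming normally distributed scores. The sketch computes the 25th-to-75th percentile range (the "middle half") for each scale, converts a standard score to the equivalent scaled score, and maps a z-score to a stanine using the fact that stanines have a mean of 5 and a standard deviation of 2, with each band half a standard deviation wide:

```python
import math
from statistics import NormalDist

def middle_half(mean: float, sd: float) -> tuple[float, float]:
    """Scores at the 25th and 75th percentile ranks of a normal scale."""
    scale = NormalDist(mu=mean, sigma=sd)
    return scale.inv_cdf(0.25), scale.inv_cdf(0.75)

def standard_to_scaled(ss: float) -> float:
    """Keep the same z-score, re-expressed on the mean-10, SD-3 scale."""
    return 10 + 3 * (ss - 100) / 15

def stanine_from_z(z: float) -> int:
    """Stanine = 2z + 5, rounded half up and limited to 1-9."""
    return max(1, min(9, math.floor(2 * z + 5.5)))

low, high = middle_half(100, 15)
print(f"standard scores: {low:.0f}-{high:.0f}")  # 90-110
low, high = middle_half(10, 3)
print(f"scaled scores: {low:.0f}-{high:.0f}")    # 8-12
print(standard_to_scaled(115), stanine_from_z((115 - 100) / 15))  # 13.0 7
```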

Types of Scoring
• Norm-Referencing: This scoring technique shows a student's results in comparison to a "norm" group of students at the same grade level. The student's number of correct answers is compared with the norm (or average) and reported as a percentile. For example, if your child scored at the 38th percentile on her reading test, she scored as high as or higher than 38 out of every 100 students in the national norm group.
• Most national achievement tests use norm-referenced scoring, including the California Achievement Test (CAT). Aptitude tests such as the SAT and "IQ" tests are also norm-referenced.

Types of Scoring
• Criterion-Referencing: This scoring technique shows a student's results in comparison to a benchmark or set standard of acceptable performance. The test maker sets a level of proficiency and scores the student in relation to that level. Many achievement tests created and administered at the state level, such as the California Standards Test (CST/STAR), use criterion-referenced scoring.
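The practical difference between the two approaches is what the score is compared against. In this sketch, a norm-referenced result (a percentile rank) shifts when the comparison group changes, while a criterion-referenced result depends only on a fixed cut score. The scores, groups, and the cut score of 350 are hypothetical.

```python
def norm_referenced(score: float, norm_group: list[float]) -> float:
    """Percentile rank: depends on how everyone else in the norm group scored."""
    return 100 * sum(s <= score for s in norm_group) / len(norm_group)

def criterion_referenced(score: float, cut_score: float = 350) -> str:
    """Compares the score with a fixed standard; other students don't matter."""
    return "meets standard" if score >= cut_score else "below standard"

# Hypothetical scale scores for two different comparison groups
group_a = [300, 320, 340, 360, 380]
group_b = [360, 380, 400, 420, 440]

student = 365
print(norm_referenced(student, group_a), criterion_referenced(student))  # 80.0 meets standard
print(norm_referenced(student, group_b), criterion_referenced(student))  # 20.0 meets standard
```

The same score of 365 is at the 80th percentile in one group and the 20th in the other, yet it meets the fixed standard either way.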

Definitions
• Achievement Test: A standardized test that measures content-area knowledge (e.g., science, math, English, and social studies) and academic skills.
• Criterion-Referencing: A scoring technique that shows a student's results in comparison to a benchmark or set standard of acceptable performance.
• Norm-Referencing: A scoring technique that shows a student's results in comparison to a "norm" group of students. Norm-referenced tests are often designed so that students in the norm group answer about half of the questions correctly.
• Standards: Content and performance descriptions of what students should know at each grade level and in each subject.

Achievement Tests
• The Woodcock-Johnson III (WJ-III) and the Wechsler Individual Achievement Test, Second Edition (WIAT-II) are two of the most commonly used cognitive and academic assessments for students with disabilities.
• Others include the Brigance, KeyMath, CIBS, KTEA, etc.

Strengths of WJ-III
• Provides a comprehensive system for measuring general intellectual ability, specific cognitive abilities, scholastic aptitude, oral language, and academic achievement.
• Diagnose learning disabilities
• Determine discrepancies
• Plan educational programs
• Plan individual programs
• Assess growth
• Provide guidance in educational settings

Strengths of WIAT-II
• Comprehensive measurement tool useful for achievement skills assessment, learning disability diagnosis, special education placement, curriculum planning, and clinical appraisal for preschool children through adults.
• Guidance for intervention and IEP planning.
• Identify at-risk students as part of analysis for IDEA requirements.
• Linked with the Wechsler intelligence scales. Provides valid discrepancy scores to help make meaningful comparisons between achievement and ability and plan appropriately.

Intelligence Tests
• Required for eligibility in most categories.
• Testing for cognitive processing shortchanges teams in two ways: 1) we can derive falsely high or low scores, and 2) we do not learn in what ways learning is being interfered with.
• Typical assessment includes fluid intelligence (abstract thinking, problem solving), verbal intelligence, and nonverbal intelligence. Unfortunately, it is often lower-level cognitive processing that causes problems for our students.

Processing Skills to Note
• Long-term retrieval - The ability to retrieve information on demand.
• Short-term memory - The ability to hold information in one's immediate awareness long enough to think about it.
• Working memory - The ability to remember information long enough to think about it and use the information to solve a problem.

Processing Skills to Note
• Phonological awareness - How well one understands that words are made up of sounds.
• Orthographic ability - How well one perceives and retains visual letter patterns.
• Fine motor ability - The ability to rapidly perform fine motor tasks, such as handwriting.
• Processing speed or automaticity - How rapidly and automatically one can perform simple tasks (affects routine abilities like sight word knowledge and math facts).

Formulating Goals
• First, diagnose the problem. Because skills and understanding build on one another as a student learns, one weak skill or misunderstanding can significantly interfere with a student's progress.
• Using grade-level STANDARDS is very appropriate here. Identify grade-level skills, then identify the components of each skill that the child needs to develop or can achieve.
• Realize that many students may need to work in STEPS to achieve grade-level skills.

Formulating Goals
• Look for multiple indicators to confirm and support strengths and weaknesses. Discuss available resources that can help pinpoint these areas, then work with the IEP team to write goals to improve learning.
• Work samples, non-standardized tests, curriculum-based measures, and observation can be just as valid here as standardized test scores.

Formulating Goals
• A single test score is not a perfect indicator of ability, so don't overemphasize any one test result. There are numerous other factors to consider, such as illness, attitude, motivation, classroom environment, and social relationships; one or more of these may be affecting a student's performance.
• Help the team identify motivation strategies and supports for student achievement.
