So Much Data, So Little Time: What Parents and Teachers Need to Know About Interpreting Test Results Lee Ann R. Sharman, M.S. ORBIDA Lecture Series April 13, 2010
If You Aren’t Confused… You’re Not Paying Attention!
Dibels SWIS Response to Intervention Patterns of Strengths and Weaknesses Severe Discrepancy AYP OAKS
Working Smarter: Learning Goals • By the end of our session, you will understand these key terms/concepts: • Different strokes for different folks: all tests are not equal! • Basic statistics you MUST know • Reliability and validity • The Bell Curve – a beautiful thing • Error, and why it matters • Common mistakes that lead to poor decisions
Why Test? • Entitlement decisions (eligibility) • Skills assessment (diagnostic) • Screening and Progress Monitoring (RtI) • Instructional planning, accommodations and modifications • Curriculum evaluation – is it working? • Increased focus on data-based decision making to measure outcomes
The Assessment “Toolbox” Working smarter: ask the right questions! • Understand the differences between types of tests and what they were designed to measure: • Curriculum-based measures (Dibels, Aimsweb) • Teacher-made criterion-referenced tests • Published criterion-referenced tests • Norm-referenced tests of Achievement • OAKS • Woodcock-Johnson III • Norm-referenced tests of Cognitive Ability The test you choose depends on what questions you want to answer.
Other Assessment Procedures • School records (file reviews) • Interviews • Medical and Developmental histories • Error analyses • Use of portfolios • Observations
Using Your Assessment “Tools” -Two Main Applications • The Snapshot: Point in Time Performance • Measuring Improvement (Change) and Growth • …You can use a hammer to push in a screw, but a screwdriver will be easier and more efficient
For Example, Let’s Talk About OAKS… • What OAKS is: • OAKS is a “Point in Time” measurement, intended to be used more as a Summative Assessment. It’s a SNAPSHOT. • Gives stakeholders information on group achievement toward state standards: “Are enough students in our district meeting benchmarks?”
What OAKS is NOT: • A source of information intended to guide instruction or interventions (see OARs) • A tool designed for Progress Monitoring • A measure of aptitude or ability • A comprehensive measure of identified content
New Developments in Assessment • Response to Intervention – RtI • All models involve tiers of interventions, progress monitoring, and cut scores to determine who is a “responder” (or not) • Dibels is a commonly used tool for progress monitoring (one common decision rule is sketched below)
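The decision rule behind progress monitoring can be made concrete. Below is a minimal Python sketch of one common approach, not a feature of any specific product such as Dibels: fit a trend line to weekly oral reading fluency scores and compare the observed growth rate to the student’s aimline. All scores, goals, and timelines are invented for illustration.

```python
# Minimal sketch of a common RtI progress-monitoring decision rule.
# All numbers here are hypothetical.
from statistics import linear_regression

weeks = [1, 2, 3, 4, 5, 6, 7, 8]
wcpm = [42, 44, 43, 47, 48, 50, 51, 54]  # words correct per minute, weekly probes

# Observed growth rate: slope of the ordinary least-squares trend line
slope, intercept = linear_regression(weeks, wcpm)

# Aimline: the growth rate needed to reach a (hypothetical) benchmark goal
baseline, goal, total_weeks = 42, 70, 18
aimline_slope = (goal - baseline) / total_weeks

print(f"Observed growth: {slope:.2f} wcpm/week; needed: {aimline_slope:.2f}")
if slope >= aimline_slope:
    print("On track: continue the current intervention.")
else:
    print("Below the aimline: consider intensifying the intervention.")
```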
New Developments, Cont’d.: Patterns of Strengths and Weaknesses (PSW) • A few different models exist, but in this case we refer to measurement of the cognitive abilities underlying areas of unexpected low academic achievement • Specific cognitive abilities (processing measures, e.g. Rapid Automatic Naming, Phonemic Awareness, Long-Term Retrieval) predict the acquisition of reading, writing, and math skills
Questions to Ask: the Cheat Sheet 1. WHAT KIND OF TEST IS THIS? e.g., Norm- or Criterion-referenced 2. What is it used for – the purpose? 3. Is it valid for the stated purpose (what it measures or doesn’t measure)? 4. Is the person administering the test a qualified administrator? 5. Are the results valid (test conditions optimal, etc.)?
A “Comprehensive” Evaluation • Parental permission…true informed consent • Screening for sensory impairments or physical problems • File review of school records • Parent/caregiver interview • Documented interventions and quality instruction • Intellectual and academic assessment • Behavioral assessment or observation • Summary and recommendations
What Every Good Report Contains I. Identifying data – the Who, What, When II. Background Information • Student history • Reason for Referral • Classroom Observation • Parent Information • Instruction received/strategies implemented
A Good Report, cont’d • Test Results • Test Interpretation • Summary and Conclusions • Summary • Recommendations for instruction • Recommendations for further assessment, if needed
Assessment is rocket science! • Must haves: • Skilled examiner • Optimal test conditions • Cultural bias – be aware • Validity/reliability • Appropriate measures for goal
What the Skilled Assessor Knows Kids are more than the scores – the “rule outs”: • Home/Environmental issues • Sensory acuity problems • Previous educational history • Language factors • Second language and/or language disorders • Social/Emotional/Behavioral issues
The Skilled Assessor, Cont’d • The Matthew Effect – poor reading skills depress IQ scores (Stanovich)…”The rich get richer” • The Flynn Effect – IQ is increasing in the population over time; tests are renormed to reflect this phenomenon
Know What the Numbers Mean • The devil is in those details…learn the basic principles of Statistics
Statistics?!$%&^%! Simply stated: • Statistics are used to measure things and describe relationships between things, using numbers
Some Basics: The Big Four • Standard Scores (SS) and Scaled Scores (ss) • Percentile Ranks (% rank) • Age and Grade Equivalents (AE/GE) • Relative Proficiency Index (RPI)
And Let’s Not Forget… The Bell Curve OR, The Normal Frequency Distribution
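One reason the normal distribution is so useful: fixed, predictable shares of the norm group fall within each standard-deviation band, which is what makes standard scores comparable across tests. A quick Python sketch, using the familiar mean-100, SD-15 scale:

```python
# Share of a normal population within 1, 2, and 3 standard deviations
# of the mean, on a mean-100, SD-15 scale.
from statistics import NormalDist

bell = NormalDist(mu=100, sigma=15)
for k in (1, 2, 3):
    low, high = 100 - k * 15, 100 + k * 15
    share = bell.cdf(high) - bell.cdf(low)
    print(f"Within ±{k} SD (scores {low}-{high}): {share:.1%}")
# -> roughly 68%, 95%, and 99.7%
```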
Test Results: Look For These • The mean and standard deviation of the test used • Standard scores, percentile ranks, and standard errors of measurement, with explanations of each • Both composite (broad) scores and subtest scores, with an explanation of each
Which Should Lead To… • Information about developmental ceilings, functional levels, skill sequences, and instructional needs – the assessment/curriculum linkages used to write the IEP goals
Standard Scores (SS) • These are raw scores that have been transformed to have a given mean (average) and standard deviation (a fixed unit of spread). The student’s test score is compared to that average: a standard score expresses how far a student’s score lies above or below the average of the total distribution of scores.
[Diagram: Raw Score In → Standard Score Out]
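To make the transformation concrete, here is a minimal Python sketch of the raw-score-in, standard-score-out idea; the norm-group mean and standard deviation below are invented for illustration.

```python
# Sketch of the "raw score in, standard score out" transformation.
# The norm-group statistics are invented.
def standard_score(raw, norm_mean, norm_sd, scale_mean=100, scale_sd=15):
    """Convert a raw score to a standard score via a z-score."""
    z = (raw - norm_mean) / norm_sd      # distance from the norm-group average
    return scale_mean + z * scale_sd     # re-expressed on the familiar scale

print(standard_score(34, norm_mean=28, norm_sd=6))  # -> 115.0, one SD above average
```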
Percentile Ranks (PR) • Similar to SS, but in a different form. Allows us to determine a student’s position (relative ranking) compared to the standardization sample • Percentile rank is NOT the same as a percent score! A PR refers to a percentage of persons; a percent score refers to the percentage of test items answered correctly.
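Under the normal model, a standard score converts directly to a percentile rank. A small Python sketch of the conversion, which also shows why a PR of 50 means “average,” not “half the items correct”:

```python
# Standard score -> percentile rank on a normal curve (mean 100, SD 15).
from statistics import NormalDist

scale = NormalDist(mu=100, sigma=15)

def percentile_rank(ss):
    """Percent of the norm group scoring below the given standard score."""
    return scale.cdf(ss) * 100

for ss in (85, 100, 115, 130):
    print(f"SS {ss} -> PR {percentile_rank(ss):.0f}")
# SS 100 -> PR 50: exactly average, NOT "50% of items correct"
```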
Relative Proficiency Index (RPI) • Valuable statistic, found only on the WJ-III • Written as a fraction out of 90 (e.g., 75/90): the student’s predicted percent success on tasks that students in the comparison group perform with 90% success. Correlated with Independent, Instructional, and Frustration levels (see sample, and the illustrative sketch below)
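The actual RPI computation is internal to the WJ-III’s W-score (Rasch) scaling, so the Python sketch below illustrates only how a reported RPI might be read against instructional levels. The cutoffs are approximate, unofficial values chosen for illustration.

```python
# Illustrative only: reading an RPI of x/90 against instructional levels.
# These cutoffs are approximations, not official WJ-III values.
def rpi_level(x):
    """Interpret the x in an RPI of x/90 (hypothetical cutoffs)."""
    if x >= 96:
        return "independent (grade-level tasks will feel easy)"
    if x >= 76:
        return "instructional (manageable with support)"
    return "frustration (tasks will feel very difficult)"

# Peers succeed 90% of the time; this student is predicted to succeed 75%.
print(f"RPI 75/90 -> {rpi_level(75)}")
```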
Common Errors… • Making faulty comparisons: compare only data sets measuring the same content, with good content/construct validity, that are NORMED ON THE SAME POPULATION • Using an AE/GE as a measure of the child’s proficiency/skill mastery of grade-level material • Forgetting that error exists! Don’t ignore the confidence intervals • SEM creates uncertainty around reporting a single number (see the sketch below) • Confusing Percentile RANKS with Percentages: • PR = relative ranking out of 100 • Percentage = percentage of items correct
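A minimal sketch of the confidence-interval point: the standard error of measurement (SEM) turns a single obtained score into a range for the likely “true” score. The SEM value below is invented; real values appear in each test’s technical manual.

```python
# Confidence interval around an obtained score, using the test's SEM.
# The SEM value here is invented for illustration.
from statistics import NormalDist

z95 = NormalDist().inv_cdf(0.975)  # ~1.96 for a 95% interval

def confidence_interval(obtained, sem, z=z95):
    margin = z * sem
    return obtained - margin, obtained + margin

low, high = confidence_interval(obtained=92, sem=3.0)
print(f"SS 92 (SEM 3): true score likely between {low:.0f} and {high:.0f}")
# -> roughly 86-98: report the range, not a single number
```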
Age and Grade Equivalents • Age equivalents are developed by finding the average test score (the mean) for a group of children of a certain age taking the test; an AE names the age a score matches, not the skills mastered • Grade equivalents are developed the same way, using the average test score for students in each grade.
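A hedged sketch of the derivation: an age equivalent is simply the age whose median raw score matches the student’s raw score, interpolated from a norm table. The table below is invented, and the sketch shows why an AE names an age, not a skill level.

```python
# How an age equivalent is derived: match the raw score to the age whose
# median raw score is the same. The norm table here is invented.
median_raw_by_age = {7.0: 21, 8.0: 26, 9.0: 30, 10.0: 33}

def age_equivalent(raw):
    """Linearly interpolate the age whose median raw score equals `raw`."""
    ages = sorted(median_raw_by_age)
    for lo, hi in zip(ages, ages[1:]):
        s_lo, s_hi = median_raw_by_age[lo], median_raw_by_age[hi]
        if s_lo <= raw <= s_hi:
            return lo + (hi - lo) * (raw - s_lo) / (s_hi - s_lo)
    return None  # raw score falls outside the norm table

print(age_equivalent(28))  # -> 8.5: the age the score matches, not skills mastered
```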
Age and Grade Equivalents • Commonly used • Misleading • Misunderstood • Difficult to explain • May have little relevance • Avoid in favor of Standard Scores/%Ranks
Percentage vs. Percentile • “When assessed with teacher made tests, Sally locates information within the text with 60% accuracy.” VS. • “Sally’s performance on the OLSAT falls at the 60th %ile rank.”
Broad Scores Can Be Deceptive! • Are the student’s skills better developed in one part of a domain than another? For example: “While Susan’s Broad Math score was within the low average range, she performed at an average level on a subtest that assesses problem solving, but scored well below average on a subtest that assesses basic math calculation.”
What If… • Test scores don’t support the teacher report of a weakness? • First, compare the task demands of the testing situation with those of the classroom when hypothesizing a reason for the difference. • Look at the student’s Proficiency score (RPI) vs. Standard Score (SS)
“Although weaknesses in mathematics were noted as a concern by Billy’s teacher, Billy scored in the average range on assessments of math skills. These tests required Billy to perform calculations and to solve word problems that were read aloud to him. It was noted he often paused for 10 seconds or more before starting paper and pencil tasks in mathematics.”
“Billy’s teacher stated that he does well in spelling. However, he scored well below average on a subtest of spelling skills. Billy appeared to be bored while taking the spelling test, so a lack of vigilance in his effort may have depressed his score. Also, the school spelling tests use words he has been practicing for a week.
The lower score may indicate that he is maintaining the correct spelling of studied words in long-term memory, but is not able to correctly encode new words he has not had time to study.”
Remember the Cheat Sheet 1. WHAT KIND OF TEST IS THIS? e.g., Norm- or Criterion-referenced 2. What is it used for – the purpose? 3. Is it valid for the stated purpose (what it measures or doesn’t measure)? 4. Is the person administering the test a qualified administrator? 5. Are the results valid (test conditions optimal, etc.)?
Times are Changing… • The good news: We are moving away from the old “Test and Place” mentality. • The challenge: School teams are using more comprehensive data sets, which require more knowledge to interpret • More good news: The best decisions are made using multiple sources of good information
Food For Thought • “The true utility of assessment is the extent to which it enables us to find the match between the student and an intervention that is effective in getting him or her on track to reach a meaningful and important goal. The true validity of any assessment tool should be evaluated by the impact it has on student outcomes.” (Cummings/McKenna 2007)
A Memorable Quote • “…one of the problems of writing about intelligence is how to remind readers often enough how little an IQ score tells you about whether or not the human being next to you is someone whom you will admire or cherish.” (Herrnstein and Murray)