Selecting Effective Early Reading Assessments Natalie Rathvon, Ph.D.
What We’ll Cover • A research-based framework for selecting early reading assessments • Application of the framework to selected early reading instruments • Early reading assessment case examples • Resources for early reading assessment and intervention
So many tests, so few guidelines . . . • Growing number of print and online tests that claim to assess or predict reading • Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999) • Provides general guidelines, not specific criteria, for evaluating psychometric quality
Myths about Early Reading Assessments • All claims that a reading measure is “scientifically based” are equally valid. • A valid and reliable measure is equally valid and reliable for all examinees. • All measures of the same reading component yield similar results for the same examinee.
Why does this happen? • Tests vary widely in their psychometric characteristics and overall soundness. • The framework presented here is drawn from Early Reading Assessment: A Practitioner’s Handbook (Rathvon, 2004).
Early Reading Assessment Models • Traditional: “standard battery” (one size fits all); assumes reading problems arise from internal child deficits; designed to provide a categorical label for programming purposes • Component-based: targets domains related to the identified deficits; assumes most reading problems arise from experiential and/or instructional deficits; designed to provide information for guiding instruction
10 Key Reading Components • 4 Cognitive-linguistic variables: phonological processing, rapid naming, orthographic processing, oral language • 6 Literacy skills: print awareness, alphabet knowledge, single word reading, contextual reading, reading comprehension, written language
Considerations in Selecting Early Reading Assessments • Technical adequacy: Psychometric soundness • Usability: Degree to which practitioners can actually use a measure in applied settings
Five Key Technical Adequacy Characteristics • Norms • Test floors • Item gradients • Reliability • Validity
How can we examine a test’s technical characteristics? • Test manuals? Tremendous variation in quality and quantity of the psychometric information provided • WJ III: 2 examiner manuals, separate 209-page technical manual • Dyslexia Early Screening Test: 7 pages in 45-page manual • Research literature? • Continuing stream of validation data
Norms: How do we interpret performance? • Norm-referenced measures: Comparisons with age/grade peers • Criterion-referenced measures: Comparisons with predetermined performance standards • Nonstandardized measures: Research norms or examiner judgment
Evaluating the Adequacy of Norms • Are they representative? • Criteria: Should match a national or appropriate reference population • Are they recent? • Criteria: No more than 7-12 years old • Are subgroup and sample sizes large enough? • Criteria: At least 100 (subgroup size) & 1,000 (sample size)
Evaluating Norms, II • Are norm table intervals small enough to reflect small changes in skill development and small differences among examinees? • Criteria: • No more than 6 months for students aged 7-11 and younger • No more than 1 year for students aged 8-0 to 18
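The recency, sample-size, and interval criteria above can be applied mechanically. A minimal Python sketch with illustrative values only; the function name and the 12-year outer bound on recency are assumptions for this example, not part of any published standard, and representativeness still has to be judged against the reference population:

```python
# Sketch of the norm-adequacy checks from the two preceding slides.
# All names and example values are illustrative.
def norms_meet_criteria(norm_age_years: int, subgroup_n: int, total_n: int,
                        interval_months: int, examinee_age_years: float) -> bool:
    recent = norm_age_years <= 12                       # "no more than 7-12 years old"
    large_enough = subgroup_n >= 100 and total_n >= 1000
    max_interval = 6 if examinee_age_years < 8 else 12  # 6 months below age 8-0; 1 year for 8-0 to 18
    return recent and large_enough and interval_months <= max_interval

# Hypothetical test: norms 5 years old, 150 examinees per subgroup,
# 1,800 in the total sample, 3-month norm table intervals, 7-year-old examinee.
print(norms_meet_criteria(5, 150, 1800, 3, 7))  # True
```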
Norms example 1: Expressive Vocabulary Test (AGS, 1997) • Norm data collected 1995-1996 (age norms only) • Total norm group = 2,725 examinees • 5-0 to 6-11 group = 119-122 examinees tested in each 6-month interval • Derived scores = 2-month increments • Derived scores for the 5-0 to 6-11 age group are therefore based on only 39-56 examinees, well below the 100-examinee subgroup criterion.
Reliability: Are scores consistent and accurate? • Alternate-form: Form A vs. Form B • Internal consistency: Item A vs. Item B • Test-retest: Time A vs. Time B • Interscorer: Scorer A vs. Scorer B • Criteria: ≥ .80 for screening measures; ≥ .90 for diagnostic measures
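A minimal sketch of checking a reliability coefficient against these criteria. The scores below are made up; in practice the coefficient comes from the test manual or from correlating two sets of actual scores (Time A vs. Time B here, but the same check applies to alternate-form or interscorer data):

```python
# Pearson correlation between two administrations of the same measure,
# checked against the screening (.80) and diagnostic (.90) criteria.
from statistics import correlation  # Python 3.10+

time_a = [92, 85, 101, 78, 110, 95, 88, 104]  # hypothetical Time A standard scores
time_b = [90, 88, 99, 80, 108, 97, 85, 106]   # hypothetical Time B standard scores

r = correlation(time_a, time_b)
print(f"Test-retest r = {r:.2f}")
print("Meets screening criterion (>= .80):", r >= 0.80)
print("Meets diagnostic criterion (>= .90):", r >= 0.90)
```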
Hidden Threat to Reliability • Examiner variance: Differences among assessors in administering tasks and recording responses • Especially likely on: • Live-voice tasks (phoneme blending) • Fluency-based tasks (rapid naming) • Tasks with complex administration or scoring systems (LAC–3)
Reliability Example: TOWRE (PRO-ED, 1999) • Internal consistency = .93 and above • Alternate form = .90 and above • Test-retest = .90 and above in a study with examinees ages 6-9 (n = 29) • Interscorer = .99, based on the agreement of 2 independent scorers across 30 completed protocols
Test Floors: Can the Test Detect Poor Readers? • Test floor: Lowest possible standard score when a student answers 1 item correctly • Adequate floors: Permit identification of students with very weak skills • Inadequate floors: Overestimate students’ level of skills
Test Floor Criteria • A subtest raw score of 1 should yield a standard score more than 2 standard deviations below the subtest mean. • SS of 3 or less for a subtest mean of 10 (SD = 3) • SS of 69 or less for a subtest mean of 100 (SD = 15)
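A minimal sketch of the floor check, using the two standard score metrics above (M = 10, SD = 3; M = 100, SD = 15). The floor values passed in are illustrative, not taken from any real test:

```python
def floor_is_adequate(floor_ss: float, mean: float, sd: float) -> bool:
    """A floor is adequate if a raw score of 1 yields a standard score
    more than 2 standard deviations below the subtest mean."""
    return floor_ss < mean - 2 * sd

print(floor_is_adequate(3, mean=10, sd=3))     # True: 3 is below 10 - 2(3) = 4
print(floor_is_adequate(72, mean=100, sd=15))  # False: 72 is not below 100 - 2(15) = 70
```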
Which Tests Are Likely to Display Floor Effects? • “Cradle-to-grave” tests • Phonemic manipulation tasks (deletion, substitution, reversal) • Oral reading fluency tests • Pseudoword reading tests • Spelling tests • Reading comprehension tests
Item Gradients: Can the Test Detect Small Differences? • Item gradient: Steepness with which standard scores change from 1 raw score unit to another • Adequate gradient: Sensitive to small differences in performance • Steep gradient: Obscures differences among performance levels
Item Gradient Criteria • 6 or more items between subtest floor and mean (M = 10) or • 10 or more items between subtest floor and mean (M = 100) • Caution: Item gradients should be evaluated in the context of test floors.
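A minimal sketch of the gradient check, assuming a hypothetical raw-score-to-standard-score conversion table and one straightforward way of counting the items between the subtest floor and the mean:

```python
# Hypothetical conversion table for a subtest with M = 10: raw score -> standard score.
conversion = {1: 4, 2: 5, 3: 6, 4: 7, 5: 8, 6: 9, 7: 10, 8: 11}

MEAN = 10
raw_at_floor = 1                                                      # floor = raw score of 1
raw_at_mean = min(raw for raw, ss in conversion.items() if ss >= MEAN)
items_between = raw_at_mean - raw_at_floor

# Criterion from the slide: 6 or more items between floor and mean when M = 10.
print(f"Items between floor and mean: {items_between}")
print("Adequate gradient" if items_between >= 6 else "Gradient too steep")
```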
Test Floors and Item Gradients: Special Cases • Screening tests • Critical issue is cutoff score accuracy, not floor/gradient violations • Tests not yielding standard scores • Deciles, percentiles, quartiles, stanines • Rasch-model tests • Preclude direct inspection of raw score-standard score relationships • WJ family: WJ III, WRMT-R/NU, WDRB
Floor & Gradient Example: GORT-4 (PRO-ED, 2001) • Item gradients = adequate • Floors • Rate = inadequate below 8-0 for both forms • Accuracy = inadequate below 7-6 for Form A and below 8-0 for Form B • Comprehension = inadequate below 8-0 for Form A and below 9-0 for Form B • ORQ = inadequate below 6-6 for Form A and below 7-6 for Form B
Validity: Are the Results Meaningful? • Content validity: Effectiveness in assessing the relevant domain • Criterion-related validity: Effectiveness in predicting performance now (concurrent validity) or later (predictive validity) • Construct validity: Effectiveness in measuring the construct the test is intended to measure • Criteria: Evidence of all three types of validity for the target population
Validity Example: WJ III ACH • Content validity: remarkably little content validity evidence • Criterion-related validity: correlates .63 to .82 with the WIAT • WJ III Written Expression mean standard scores = more than 10 points higher than WIAT Written Expression mean standard scores
WJ III ACH Validity Example, Cont. • Diagnostic utility: a study of 48 students with ADHD, ages 6-17 • The ADHD group scored significantly lower than the norm group on 3 of 8 WJ III ACH tests (Oral Comprehension, Passage Comprehension, and Calculation)
The Untold Story: Usability Considerations • Usability often has more influence on test selection and use than technical adequacy does. • There is virtually no research on the impact of usability on test selection and use.
Do these comments sound familiar? • “I know how to give it.” • “It doesn’t take long to give.” • “It’s easy to carry around.” • “I think I saw one in the storage closet.” • “I think that test kit has all the parts.”
Key Practical Characteristics • Test construction • Administration • Accommodations and adaptations • Scores and scoring • Interpretation • Links to intervention
Usability Example: DEST (PsyCorp, 1996) • Inexpensive ($130.00) • Has numerous stimulus materials to manage, increasing administration time • Letter Naming subtest: 4 cards for 12 items • Digit Naming subtest: 3 cards for 9 items • Requires calibrating a postural stability balance tester • Manual is not spiral bound, so it doesn’t lie flat during administration.
Increasing the Effectiveness of Early Reading Assessments • Begin with measures that target domains directly related to the referral problem. • Supplement norm-referenced measures with criterion-referenced measures to ensure adequate coverage and increase instructionally relevant information. • Know the psychometric strengths and limitations of each measure you use.
Increasing Effectiveness, II • Evaluate the presence of attentional, behavioral, and motivational problems. • Key predictors of response to intervention • See The Unmotivated Child • Assess environmental and instructional variables.
The Golden Rule of Assessment • The best-designed assessment with the most reliable and valid measures, administered by the best-trained examiner, won’t change a child’s reading trajectory . . . unless someone in the child’s life does something different. (Effective School Interventions: Strategies for Enhancing Academic Achievement and Social Competence)
Early Reading Assessment and Intervention Resources • AERA, APA, & NCME. (1999). Standards for educational and psychological testing. Washington, DC: AERA. www.apa.org • Buros Institute of Mental Measurements. www.unl.edu/buros • Center for Equity and Excellence in Education Test Database. http://ceee.gwu.edu/standards_assessments/sa.htm • ERIC Clearinghouse on Assessment and Evaluation. http://www.ericae.net • Florida Center for Reading Research. http://www.fcrr.org
More Resources • Rathvon, N. (2004). Early Reading Assessment: A Practitioner’s Handbook. New York: Guilford. www.guilford.com • Rathvon, N. (1999). Effective School Interventions: Strategies for Enhancing Academic Achievement and Social Competence. New York: Guilford. www.guilford.com • Rathvon, N. (1996). The Unmotivated Child: How to Help Your Underachiever Become a Successful Student. New York: Simon & Schuster. www.simonsays.com • Southwest Educational Development Laboratory (SEDL). www.sedl.org/reading/rad