Chapter 1 Assessment in Social and Educational Contexts (Salvia, Ysseldyke & Bolt, 2012) Dr. Julie Esparza Brown SPED 512: Diagnostic Assessment Winter 2013 Chapters 1, 11, 12, 13, and 14 are included in this presentation
AGENDA – Week 3 • Questions for the Good of the Group • Instruction and Lab Time: Continue WJ-III • Break • Group activity to process Chapters 1, 11, 12, 13, and 14 • PowerPoint overview of Chapters 1, 11, 12, 13, and 14
Individualized Support • Schools must provide support as a function of individual student need • To what extent is the current level of instruction working? • How much instruction is needed? • What kind of instruction is needed? • Are additional supports necessary?
Assessment Defined • Assessment is the process of collecting information (data) for the purpose of making decisions about students • e.g., what to teach, how to teach, and whether the student is eligible for special services
How Are Assessment Data Collected? • Assessment extends beyond testing and may include: • Record review • Observations • Tests • Professional judgments • Recollections
Why Care About Assessment? • A direct link exists between assessment and the decisions that we make. Sometimes these decisions are markedly important. • Thus, the procedures for gathering data are of interest to many people – and rightfully so. • Why might students, parents, and teachers care? • The general public? • Certification boards?
Common Themes Moving Forward • Not all tests are created equal • Differences in content, reliability, validity, and utility • Assessment practices are dynamic • Changes in the political, technological, and cultural landscape drive a continuous process of revision
Common Themes Moving Forward • The importance of assessment in education • Educators are faced with difficult decisions • Effective decision-making will require knowledge of effective assessment • Assessment can be intimidating, but significant improvements have happened and continue to happen • More confidence in the technical adequacy of instruments • Improvements in the utility and relevance of assessment practices • MTSS framework
Chapter 11 Assessment of Academic Achievement with Multiple-Skill Devices
Achievement Tests • Norm-referenced • Allow for comparisons between students • Criterion-referenced • Allow for comparisons between individual students and a skill benchmark • Why do we use achievement tests? • Assist teachers in determining skills students do and do not have • Inform instruction • Academic screening • Progress evaluation
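The two interpretive frames above can be sketched in a few lines of code. This is an illustration only: the norm-group mean and standard deviation, the mastery benchmark, and the function names are all made-up values for the sketch, not parameters of any real test.

```python
from statistics import NormalDist

# Hypothetical norm-group parameters and mastery benchmark -- illustration
# values only, not drawn from any published test.
NORM_MEAN, NORM_SD = 40, 8
BENCHMARK = 35  # raw score treated as demonstrating mastery of the skill

def norm_referenced(raw):
    """Norm-referenced question: how does this student compare with peers?
    Returns an (approximate) percentile rank in the norm group."""
    return round(100 * NormalDist(NORM_MEAN, NORM_SD).cdf(raw))

def criterion_referenced(raw):
    """Criterion-referenced question: has the student met the benchmark?"""
    return "mastered" if raw >= BENCHMARK else "not yet mastered"

raw = 38
print(norm_referenced(raw), criterion_referenced(raw))  # → 40 mastered
```

The same raw score supports both decisions: the student is below average relative to peers (40th percentile) yet has still met the skill benchmark, which is why the two test types answer different instructional questions.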
Classifying Achievement Tests • Tests can be classified along two dimensions: achievement (screening) versus diagnostic, and the number of students who can be tested at once (high for group administration, low for individual administration) • Achievement, group-administered: more efficient administration – comparisons between students can be made, but very little power in determining strengths and weaknesses • Achievement, individually administered: efficient administration – typically only quantitative data are available • Diagnostic, group-administered: less efficient administration – allows for more qualitative information about the student • Diagnostic, individually administered: less efficient administration – dense content and numerous items allow teachers to uncover specific strengths and weaknesses
Considerations for Selecting a Test • Four Factors • Content validity • What the test actually measures should match its intended use • Stimulus-response modes • Students should not be hindered by the manner of test administration or required response • Standards used in state • Relevant norms • Does the student population being assessed match the population from which the normative data were acquired?
Tests of Academic Achievement • Peabody Individual Achievement Test (PIAT-R/NU) • Wide Range Achievement Test 4 (WRAT4) • Wechsler Individual Achievement Test 3 (WIAT-III)
Peabody Individual Achievement Test-Revised/Normative Update (PIAT-R/NU) • In general… • Individually administered; norm-referenced for K-12 students • Norm population • Most recent update was completed in 1998 • Representative of each grade level • No changes to test structure
PIAT-R/NU • Scores • For all but one subtest (written expression), response to each item is pass/fail • Raw scores converted into: • Standard scores • Percentile ranks • Normal curve equivalents • Stanines • 3 composite scores • Total reading • Total test • Written language
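The derived scores listed above all follow from a single z-score against the norm group. The sketch below uses the standard conventions (standard scores with mean 100 and SD 15, NCE = 50 + 21.06z, stanines in half-SD bands); the raw-score mean and SD passed in are made-up illustration values, not PIAT-R/NU norms.

```python
import math

def derived_scores(raw, norm_mean, norm_sd):
    """Convert a raw score to common norm-referenced derived scores."""
    z = (raw - norm_mean) / norm_sd               # z-score in the norm group
    standard = 100 + 15 * z                       # standard score: mean 100, SD 15
    percentile = 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))  # normal CDF
    nce = 50 + 21.06 * z                          # normal curve equivalent
    stanine = min(9, max(1, math.floor((z + 0.25) / 0.5) + 5))  # 0.5-SD bands
    return {"z": round(z, 2), "standard": round(standard),
            "percentile": round(percentile, 1), "nce": round(nce, 1),
            "stanine": stanine}

# A raw score one SD above a hypothetical norm mean of 50 (SD 10):
print(derived_scores(60, 50, 10))
# → {'z': 1.0, 'standard': 115, 'percentile': 84.1, 'nce': 71.1, 'stanine': 7}
```

Note how every derived score is just a different re-expression of the same z-score, which is why they always agree on a student's relative standing.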
PIAT-R/NU • Reliability and Validity • Despite new norms, reliability and validity data are only available for the original PIAT-R (1989) • Previous reliability and validity data are likely outdated • Outdated tests may not be relevant in the current educational context
Wide Range Achievement Test 4 (WRAT4) • In general… • Individually administered • 15-45 minutes to administer, depending on age (ages 5-94) • Norm-referenced, but covers a limited sample of behaviors in 4 content areas • Norm population • Stratified across age, gender, ethnicity, geographic region, and parental education
WRAT4 • Scores • Raw scores converted to: • Standard scores, confidence intervals, percentiles, grade equivalents, and stanines • Reading composite available • Reliability • Internal consistency and alternate-form data are sufficient for screening purposes • Validity • Performance increases with age • WRAT4 is linked to other tests that have since been updated; additional evidence is necessary
Wechsler Individual Achievement Test – Third Edition (WIAT-III) • General • Diagnostic, norm-referenced achievement test • Reading, mathematics, written expression, listening, and speaking • Ages 4-19 • Norm Population • Stratified sampling was used to sample within several common demographic variables: • Pre K – 12, age, race/ethnicity, sex, parent education, geographic region
WIAT-III • Subtests and scores • 16 subtests arranged into 7 domain composite scores and one total achievement score (structure provided on next slide) • Raw scores converted to: • Standard scores, percentile ranks, normal curve equivalents, stanines, age and grade equivalents, and growth scale value scores.
WIAT-III • Reliability • Adequate reliability evidence • Split-half • Test-retest • Interrater agreement • Validity • Adequate validity evidence • Content • Construct • Criterion • Clinical Utility • Stronger reliability and validity evidence increase the relevance of information derived from the WIAT-III
Getting the Most Out of an Achievement Test • Helpful but not sufficient – most tests allow teachers to find an appropriate starting point • What is the nature of the behaviors being sampled by the test? • Need to seek out additional information concerning student strengths and weaknesses • Which items did the student excel on? Which did he or she struggle with? • Were there patterns of responding?
Chapter Twelve Using Diagnostic Reading Tests
Why Do We Assess Reading? • Reading is fundamental to success in our society, and therefore reading skill development should be closely monitored • Diagnostic tests can help to plan appropriate intervention • Diagnostic tests can help determine a student’s continuing need for special services
The Ways in Which Reading is Taught • The effectiveness of different approaches is heavily debated • Whole-word vs. code-based approaches • Over time, research has supported the importance of phonemic awareness and phonics
Skills Assessed by Diagnostic Approaches • Oral Reading • Rate of Reading • Oral Reading Errors • Teacher pronunciation/aid • Hesitation • Gross mispronunciation • Partial mispronunciation • Omission of a word • Insertion • Substitution • Repetition • Inversion
Skills Assessed by Diagnostic Approaches (cont.) • Reading Comprehension • Literal comprehension • Inferential comprehension • Critical comprehension • Affective comprehension • Lexical comprehension
Skills Assessed by Diagnostic Approaches (cont.) • Word-Attack Skills (i.e., word analysis skills) – use of letter-sound correspondence and sound blending to identify words • Word Recognition Skills – “sight vocabulary”
Diagnostic Reading Tests • See Table 12.1 • Group Reading Assessment and Diagnostic Evaluation (GRADE) • DIBELS Next • Test of Phonemic Awareness – 2 Plus (TOPA 2+)
GRADE (Williams, 2001) • Preschool to 12th grade • 60 to 90 minutes • Assesses pre-reading, reading readiness, vocabulary, comprehension, and oral language • Norm group is missing some important demographic information; total-score reliabilities are high (subscale reliabilities are lower); adequate information supports the validity of the total score
DIBELS Next (Good & Kaminski, 2010) • Kindergarten-6th grade • Very brief administration (used for screening and monitoring) • First Sound Fluency, Letter Naming Fluency, Phoneme Segmentation Fluency, Nonsense Word Fluency, Oral Reading Fluency, and DAZE (comprehension) • Use of benchmark expectations or development of local norms • Multiple administrations necessary for making important decisions
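The last two bullets describe two ways to turn a fluency score into a decision: compare it to a published benchmark goal, or to a local norm. The sketch below illustrates both; the goal values and the 20th-percentile cut are invented for the example and are not actual DIBELS Next benchmarks.

```python
# Hypothetical winter oral-reading-fluency goals (words correct per minute)
# by grade -- illustration only, NOT actual DIBELS Next benchmark values.
WINTER_ORF_GOALS = {1: 23, 2: 72, 3: 86}

def benchmark_status(grade, wcpm, goals=WINTER_ORF_GOALS):
    """Compare a student's score against a fixed benchmark expectation."""
    return "at or above benchmark" if wcpm >= goals[grade] else "below benchmark"

def local_norm_status(wcpm, local_scores, cut_percentile=20):
    """Alternative: flag students below a percentile cut within local norms."""
    below = sum(score < wcpm for score in local_scores)
    percentile = 100 * below / len(local_scores)
    return "below benchmark" if percentile < cut_percentile else "at or above benchmark"

# A 2nd grader reading 80 wcpm against the (hypothetical) winter goal of 72:
print(benchmark_status(2, 80))
```

Because single fluency probes are brief and noisy, the "multiple administrations" bullet matters: important decisions should rest on several probes, not one classification like those above.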
TOPA 2+ (Torgesen & Bryant, 2004) • Ages 5 to 8 • Phonemic awareness and letter-sound correspondence • Good norms description • Reliability better for kindergarteners than for more advanced students • Adequate overall validity
Chapter 13 Using Diagnostic Mathematics Measures
Why Do We Assess Mathematics? • Multiple-skill assessments provide broad levels of information, but lack specificity when compared to diagnostic assessments • More intensive assessment of mathematics helps educators: • Assess the extent to which current instruction is working • Plan individualized instruction • Make informed eligibility decisions
Ways to Teach Mathematics • Before 1960: Emphasis on basic facts and algorithms, deductive reasoning, and proofs • 1960s: New Math – movement away from traditional approaches to mathematics instruction • 1980s: Constructivist approach – standards-based math; students construct knowledge with little or no help from teachers • After 2000: Evidence supports explicit and systematic instruction (most similar to “traditional” approaches to instruction)
Behaviors Sampled by Diagnostic Mathematics Tests • National Council of Teachers of Mathematics (NCTM) • Content Standards • Number and operations • Algebra • Geometry • Measurement • Data analysis and probability • Process Standards • Problem solving • Reasoning and proof • Communication • Connections • Representation
Specific Diagnostic Math Tests • Group Mathematics Assessment and Diagnostic Evaluation (G●MADE) • KeyMath-3 Diagnostic Assessment (KeyMath-3 DA)
G●MADE • General • Group administered, norm-referenced, standards-based test • Used to identify specific math skill strengths and weaknesses • Students K-12 • 9 levels of difficulty teachers may select from
G●MADE • Subtests • Concepts and communication • Language, vocabulary, and representations of math • Operations and computation • Addition, subtraction, multiplication, and division • Process and applications • Applying appropriate operations and computations to solve word problems
G●MADE • Scores • Raw scores converted to: • Standard scores, grade scores, stanines, percentiles, normal curve equivalents, and growth scale values • Norm population • 2002 and 2003; nearly 28,000 students • Selected based on geographic region, community type, socioeconomic status, and students with disabilities
G●MADE • Reliability • Acceptable levels of split-half and alternate-form reliability • Validity • Based on NCTM standards (content validity) • Strong criterion-related evidence
KeyMath-3 Diagnostic Assessment (KeyMath-3 DA) • General • Comprehensive assessment of math skills and concepts • Untimed, individually administered, norm-referenced test; 30-40 minutes • 4 years 6 months through 21 years
KeyMath-3 DA Subtests • Numeration • Algebra • Geometry • Measurement • Data analysis and probability • Mental computation and estimation • Addition and subtraction • Multiplication and division • Foundations of problem solving • Applied problem solving
KeyMath-3 DA • Scores • Raw scores converted to: • Standard scores, scaled scores, percentile ranks, grade and age equivalents, growth scale values • Composite scores • Operations, basic concepts, and application • Norm population • 3,630 individuals • Ages 4 years 6 months through 21 years – demographic distribution approximates data reported in the 2004 census
KeyMath-3 DA • Reliability • Internal consistency, alternate-form, and test-retest reliability • Adequate for screening and diagnostic purposes • Validity • Adequate content and criterion-related validity evidence for all composite scores
Chapter 14 Using Measures of Oral and Written Language