Early Childhood Assessment and Accountability Presented by Anita Skop, Bill Stroud and Deena Abu-Lughod SAF Conference, July 1, 2008
Presentation • Rationale and purpose • Big Issues, Experiences, Challenges and Benefits • Alignment: ECLAS 2 and the ELA • Statistical Exploration: Examples from ECLAS 2, DIBELS and Running Records
Rationale and Purpose • SAF advisors for development of early childhood progress report • Consider methods that could measure value added while controlling for known demographic trends • Evaluate students or programs? • Ensure that EC schools get meaningful feedback, including backmapped data • Familiarize colleagues with several EC assessments to enable them to advise IQTs on their utility
The Big Questions • What outcomes are we trying to achieve with an Early Childhood accountability system? • What do we want to develop in children as a result of their experiences in school?
EC Program Accountability in Other States: • New Jersey - Early Childhood Environmental Rating Scale. Random samples from Abbott districts to assess instructional practices in language and math in order to plan professional development and other supports for learning. • Texas - Texas Primary Reading Inventory and a social skills test to rate program quality. • Virginia - Pre-school rating system based on level of teacher training, class size, and expert observation (CLASS: Classroom Assessment Scoring System).
Recommendations(from the National Early Childhood Accountability Task Force) • Create a unified system that connects standards, assessments, data, and professional development for teachers. • Align comprehensive standards, curriculum, and assessments as a continuum pre-K - 3. • Assure that all child assessments and program evaluations are valid, reliable, and well-suited for their intended purpose. • ELL students should be evaluated in both their primary language and their language of instruction. • Adaptations in assessment tools and procedures should be made to allow children with disabilities to participate in the same assessments as their peers to allow a valid assessment of their knowledge and abilities.
Concerns Associated with EC Accountability Systems:(NECATF) • Adequacy of tools. Assessments do not cover all domains nor capture normal fluctuations of children’s development. • Data Integrity: Integrity can be challenged when assessments are administered by individual teachers under conditions of high stakes accountability. • Sample size: Small schools may exhibit substantial fluctuations due to changes in the characteristics of a few children. • Investment: More benefits could come from investing to remedy deficiencies in program quality and staff training rather than developing an accountability system at scale. • Potential consequences: Using child assessment data for high-stakes decisions could lead to serious negative consequences for children as curriculum and instruction narrow to focus more heavily on the assessment measures.
Potential Positive Effects of High Quality Accountability System: • Development of aligned student assessments to draw attention to trajectories of children’s progress K - 3. • Development of aligned institutional assessments for environment, opportunities, instruction, and quality of implementation • Development of vertical teams of teachers and administrators from each grade/age level to review data, plan and adjust practices, and support children’s continuous progress. • Development of focused professional development efforts coordinated K - 3. • Development of a stronger sense of shared responsibility for children’s success across the K - 3 continuum.
What do we know? • What are the advantages of early childhood testing? • What are the limitations?
Child Assessment Option #1: Observational Tools • Widely used to generate ratings or estimates of knowledge, skills, or abilities based on performance, behavior or work. • Criterion referenced (compares the student against criteria/standards) • Advantages: cover all domains; multiple opportunities to observe over time and various contexts; teachers use this format already for instructional purposes so results need only be standardized and aggregated; risks of ‘teaching to the test’ are minimized because it does not involve individual questions. • Limitations: assessors must be well trained; possible teacher bias related to culture/language; accuracy of ratings can decline or drift over time; risks of inflating ratings to show rapid progress if results are used to evaluate the program.
Child Assessment Option #2: Direct Assessment Tools • Standardized “Direct” or “On Demand” instruments: use a common set of questions or tasks. • Norm referenced. • An adapted direct approach uses a 2-stage method, adjusting the difficulty depending on children’s responses to an initial set of items. This is quicker, reduces risk of frustration or boredom, and reduces risks of pre-coaching. • Advantages: Lower risk of errors based on assessor’s judgment; common set of questions creates perception that results are objective; scope, depth and costs of training are lower. • Limitations: Assessors must be well trained; children must feel comfortable with the assessor; students must be able to process language well; cultural differences and pedagogical practices may influence how children respond to questions or tasks; do not assess social-emotional goals; reliance on a specific set of questions creates risks that teachers can coach children to inflate outcomes.
NYC has laid important groundwork • NYC has vertically-aligned standards-based early childhood assessments (e.g., ECLAS 2) • Inquiry team work provides a framework for developing vertical teams to review data, plan and adjust practice • Expertise exists for coordinated professional development • Required EC testing has increased the shared sense of responsibility for success across the K-12 continuum
Paper bag over our heads • Early childhood schools have NEVER received the results of backmapping. • “This is like working with a paper bag over our head. We don’t know how our students do after they leave us.” • “DIBELS is too easy. More students were on benchmark than I believe.” (Informal backmapping showed that 90% of her former 2nd grade students scored in Levels 3+4 in 2007.)
Removing the Paper Bag • Given the issues identified earlier, how well do our Early Childhood assessment tools help us predict how the students will perform on the Grade 3 ELA? • What are the implications?
Gr 2 ECLAS 2 Spring 2007 and Grade 3 ELA 2008 (sample school #1; n=65) • Strong correlations between some ECLAS 2 components and Grade 3 ELA Proficiency Rates • Reading comprehension, oral fluency and reading accuracy scores are highly correlated
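A minimal sketch of how these component-by-component correlations could be checked at a school, assuming a merged student-level file; the file name and column names (reading_comprehension, oral_fluency, reading_accuracy, ela_scale_score) are hypothetical placeholders, not actual export fields.

```python
# Sketch: correlate ECLAS 2 component scores with Grade 3 ELA results.
# All file and column names below are hypothetical.
import pandas as pd

df = pd.read_csv("eclas2_ela_merged.csv")  # one row per student

components = ["reading_comprehension", "oral_fluency", "reading_accuracy"]
for col in components:
    r = df[col].corr(df["ela_scale_score"])  # Pearson r by default
    print(f"{col}: r = {r:.2f}")
```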
Grade 2 EPAL and Gr 3 ELA • EPAL has listening, reading and writing components, each scored in-house on a 3-point rubric. • In a sample school, a very weak relationship was found between EPAL and ELA scores. • Exploring the mismatches indicated mis-scoring of student responses and misadministration of running records. • Implication: The school has decided to work on grade-level and schoolwide standard setting to ensure that all teachers understand the standards and rubrics in the same way.
DIBELS (Dynamic Indicators of Basic Early Literacy Skills) SIX COMPONENTS, Available for Pre-K to Grade 3 • ISF: Initial Sounds Fluency -- The student must select a picture of an object whose name begins with a given phoneme. The teacher monitors both fluency and accuracy. (PreK and K) • LNF: Letter Naming Fluency -- The student must identify as many upper-case and lower-case letters as possible within one minute. (K and 1) • PSF: Phoneme Segmentation Fluency -- The teacher says a word aloud, and the student must quickly repeat that word, inserting a clear pause between each phoneme. The student must do this for as many words as possible within one minute. (K and 1) • NWF: Nonsense Word Fluency -- The student must correctly pronounce as many nonsense words as possible in one minute. (K and 1) • ORF: Oral Reading Fluency -- The student must read aloud as much of a passage of text as possible in one minute. After reading aloud, the student must also describe or retell the content of the passage of text. (1, 2, 3) • WUF: Word Use Fluency -- The student is given a word to use in a sentence or to define, and the teacher monitors both the accuracy of the use or definition as well as the number of words the student uses in his or her response. (K, 1, 2, 3)
Sample School 2 • In this sample school, 37% of its 108 3rd graders scored at or above proficiency on the 2008 Grade 3 ELA. • Last year, 30% of those 108 scored at benchmark on the Grade 2 DIBELS. • How well did the DIBELS predict the results?
Crosstab: Grade 3 ELA 2008 by Grade 2 DIBELS EOY 2007 [Table: Grade 3 ELA Level by DIBELS Instructional Recommendation; cell counts not reproduced here]
Statistically Significant Relationship • The relationship between DIBELS scores and the Gr 3 ELA proficiency rates is statistically significant. • Chi-square between categories is 39.766 (significant at .000 level) • Correlation between actual ORF score and ELA proficiency rate is .578** (significant at .000 level)
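A hedged sketch of how the two statistics cited above could be reproduced with scipy, assuming a student-level file for this school; the column names (dibels_recommendation, ela_level, orf_score, ela_proficient) are assumptions for illustration.

```python
# Sketch: chi-square on the crosstab, plus correlation of ORF score with proficiency.
# File and column names are assumptions.
import pandas as pd
from scipy.stats import chi2_contingency, pearsonr

df = pd.read_csv("school2_gr3.csv")  # one row per student, hypothetical file

# Chi-square: DIBELS instructional recommendation vs. Grade 3 ELA level
table = pd.crosstab(df["dibels_recommendation"], df["ela_level"])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.3f}, p = {p:.4f}")

# Correlation of the actual ORF score with proficiency (1 = Level 3 or 4, else 0)
r, p_r = pearsonr(df["orf_score"], df["ela_proficient"])
print(f"r = {r:.3f}, p = {p_r:.4f}")
```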
But is it meaningful? • 4 students who were “high risk” scored in Level 3. Here’s how their teachers explained these false negatives. • “I taught the students to be careful readers, to self correct and monitor for sense.” • “I am interested in increasing their comprehension, not their fluency. I couldn’t in good conscience give them different directions for the test.” • These false negatives suggest that the results can be muddied when the assessment is misadministered.
What about false positives? • Half of the false positives (low risk on DIBELS but scoring at Level 2 on the ELA) were ELLs. They took the DIBELS as instructed (read as fast as you can; don’t worry about mistakes), but fast oral reading by itself does not guarantee comprehension. • The other two students were described as “very bright” but had motivational issues and were “uninterested” in the ELA. • The high representation of ELLs among the false positives suggests that interpreting results for ELLs requires additional considerations.
DIBELS Gr 3 October Oral Reading Fluency and January ELA Proficiency Rate (sample school 3) [Scatterplot: Oct 2007 ORF Score vs. Gr 3 ELA Proficiency Rate 2008]
DIBELS Gr 3 October Retell Fluency and January ELA Proficiency Rate [Scatterplot: Oct 2007 RTF Score vs. Gr 3 ELA Proficiency Rate 2008]
What about DIBELS in Grade 1? • At a Reading First school, we were able to backmap September 2005 DIBELS scores (beginning 1st grade) for 129 current 3rd graders. • 51% were at benchmark on DIBELS in Grade 1; 50% were proficient on the ELA in Grade 3. • Did their BOY 1st grade DIBELS scores (LNF, NWF, PSF) allow us to predict how well they would perform on the Grade 3 ELA?
Crosstab: Gr 3 ELA by Gr 1 DIBELS (sample school #3, n=66) [Table: Grade 3 ELA Level by DIBELS Instructional Recommendation; cell counts not reproduced here]
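A minimal sketch of how such a backmapping crosstab could be built with pandas, under the assumption of a student-level file joining the September 2005 DIBELS results to the 2008 ELA; the file and column names are hypothetical.

```python
# Sketch: backmapping crosstab of Grade 3 ELA level by Grade 1 BOY DIBELS
# instructional recommendation. File and column names are hypothetical.
import pandas as pd

df = pd.read_csv("school3_backmap.csv")

# Counts, with row and column totals
print(pd.crosstab(df["ela_level_2008"], df["dibels_rec_2005"], margins=True))

# Row percentages: how each DIBELS group fared on the Grade 3 ELA
print(pd.crosstab(df["dibels_rec_2005"], df["ela_level_2008"],
                  normalize="index").round(2))
```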
Running Records – Fairly Accurate In a sample of 83 students, where 60 were proficient on the ELA: • All Level 4 students were above the F&P benchmark • 63% of Level 3 students were at or above the F&P benchmark • 5% of Level 2 students were at the F&P benchmark • No Level 1 students approached the benchmark
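A short sketch of the check summarized above: the share of students at or above the F&P benchmark within each ELA level. The at_benchmark flag (derived by comparing the running-record level to the grade benchmark) and the other names are assumptions.

```python
# Sketch: percent of students at or above the F&P running-record benchmark,
# broken out by Grade 3 ELA level. File and column names are assumptions.
import pandas as pd

df = pd.read_csv("running_records_ela.csv")  # 83 students, hypothetical file

# at_benchmark: boolean flag indicating the F&P level met or exceeded the benchmark
pct = df.groupby("ela_level")["at_benchmark"].mean().mul(100).round(0)
print(pct)
```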
Reading and Writing Continuums • Reading and writing progress is best monitored through the use of continuums, keeping track of the date on which particular behaviors associated with particular levels are observed or evidenced in written samples. • Certain behaviors associated with a higher level may be observed before the child actually achieves that level. • These continuums are especially useful for tracking progress in Kindergarten, where teachers often fail to push students who arrive with advanced skills after attending high-quality pre-K schools.
Data findings: • The Grade 1 DIBELS components, especially LNF, are strong predictors of 3rd grade outcomes. • The reliability of the Grade 2 DIBELS ORF component could not be determined due to misadministration. Word calling ≠ comprehension: fluency in early grades is not necessarily a good predictor. • Well-administered running records are good predictors. Mismatches between running records and ELA scores suggest inconsistent administration of running records (lenient scoring of comprehension questions and oral retells). • Inquiry Team relevance: DIBELS results, running records and ECLAS 2 can all provide very important formative assessment information for guiding instruction and targeting skills, provided they are administered properly. An investment in observing administration will generate benefits to schools. • Dilemma: For accountability purposes, assessments requiring assessor judgment may be best administered by someone other than the teacher, but this can create stress for the child and lead to unreliable administration.
Improving data use • Expand initial exploration: The correlation and regression of citywide DIBELS, ECLAS 2 and other EC assessment data with the Grade 3 ELA results should be conducted to verify whether the patterns explored here are replicated on a broader scale (a sketch of one such regression follows this list). • Data transparency: Schools should receive information about the relationship between EC assessment results and the ELA to build their confidence in these instruments and become more vigilant about administration. • Use our online tools! Both ECLAS 2 and DIBELS results are available online through very user-friendly interfaces from WGEN. • Study the outliers: the false negatives will help us generate hypotheses about how to beat the odds; the false positives will teach us to look in more nuanced ways. • Evaluate students and programs: It is important to collect both student assessment data and data on program quality.
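One way the citywide follow-up suggested above could look, sketched as a logistic regression of Grade 3 ELA proficiency on earlier DIBELS and ECLAS 2 results; the statsmodels formula and all file and column names are illustrative assumptions, not an adopted methodology.

```python
# Sketch: citywide regression of Grade 3 ELA proficiency (0/1) on earlier
# DIBELS ORF and an ECLAS 2 comprehension score. Names are assumptions.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("citywide_backmap.csv")  # student-level merged file, hypothetical

model = smf.logit("ela_proficient ~ orf_score + eclas_comprehension", data=df).fit()
print(model.summary())
```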
Implications • What are the implications of this information for your work with inquiry teams?
Resources • Database of EC Assessments • http://www.sedl.org/reading/rad/database.html • “Taking Stock: Assessing and Improving Early Childhood Learning and Program Quality” - The Report of the National Early Childhood Accountability Task Force