Instructional Tools in Educational Measurement and Statistics (ITEMS) for School Personnel: Development and Evaluation of Three Web-Based Training Modules Rebecca Zwick, U.C. Santa Barbara Measured Progress, August 2007
Overview of Presentation • 1. What was the impetus for the project? • 2. How is the project structured? • 3. What’s in the modules, and how are statistical concepts presented? • 4. How effective are the modules? • 5. What have been the challenges and successes? • 6. Clip from Module 3: “What’s the Difference?”
In today’s NCLB era… • Teachers and administrators are expected to use test results to make decisions about instruction and resource allocation and to explain results to students, parents, the school board, and the press. • Many educators have not received the measurement and statistics training needed to use test scores productively.
Stiggins, Education Week, 2002: • “only a few states explicitly require competence in assessment as a condition for being licensed to teach. No licensing examination now in place … verifies competence in assessment … • almost no states require competence in assessment for licensure as a principal or school administrator at any level.”
Evidence from Preliminary Assessment Literacy Survey (Brown & Daw, 2004) Of 24 UCSB M.Ed./credential students, only: • 10 could choose the correct definition of a Z-score • 10 could choose the definition of measurement error Of 10 experienced teachers/administrators, only: • 5 could choose the correct combined average when told “20 students averaged 90 on an exam and 30 students averaged 40.” • 1 could choose the definition of measurement error
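For readers who want to check the combined-average survey item, the correct answer is a weighted mean of the two group averages rather than the simple midpoint of 90 and 40. A minimal sketch of the arithmetic, using the numbers quoted in the item:

```python
# Combined (weighted) average for the survey item:
# 20 students averaged 90, 30 students averaged 40.
group_sizes = [20, 30]
group_means = [90, 40]

total_points = sum(n * m for n, m in zip(group_sizes, group_means))  # 20*90 + 30*40 = 3000
total_students = sum(group_sizes)                                    # 50
combined_mean = total_points / total_students                        # 3000 / 50
print(combined_mean)  # 60.0, not (90 + 40) / 2 = 65
```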
Goal of ITEMS • Create three 25-minute Web-based modules to increase the “assessment literacy” of K-12 educators by teaching basic concepts in educational measurement and statistics, as applied to test score interpretation. • Assess the effectiveness of the modules. Funded by the National Science Foundation, 2004-2008
Who works on the project? Staff: • Rebecca Zwick, Project Director • Jeff Sklar (Statistics Dept., Cal Poly, San Luis Obispo), Senior Researcher • Alex Norman (Media Arts & Technology, UCSB), Technical Specialist • Cris Hamilton, independent animator/designer • Pamela Yeagley (Education, UCSB), Project Evaluator • Liz Alix (Education, UCSB), Project Administrator
Advisory Committee • Kevin Almeroth, Computer Science, UCSB • Beth Chance, Statistics Department, Cal Poly • Willis Copeland, Education, UCSB • Raya Feldman, Statistics, UCSB • Mary Hegarty, Psychology, UCSB • Richard Mayer, Psychology, UCSB • Tine Sloan, Acting Director, Teacher Ed, UCSB • 4 administrators & 2 teachers (local districts)
Work cycle: Develop and evaluate one module per year • Fall: Develop module • Winter/spring: Collect data on module effectiveness • Summer: Analyze data; post module on our Website with supplementary materials; distribute CDs/DVDs. • Modules 1 & 2 are posted; Module 3 will be posted soon.
Module Administration and Evaluation • On Website, participants view module & take an assessment literacy quiz tailored to its content. • Participants are randomly assigned to take quiz either before or after viewing module. • Hypothesis: mean score for Module-first (treatment) group will be higher than mean for Quiz-first (control) group. • Participants get $15 Borders (electronic) gift “card” and can print out a personalized completion certificate.
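The presentation does not spell out how the two randomly assigned groups are compared statistically, but the design amounts to a two-group comparison of quiz means. A hypothetical sketch of such an analysis (the scores and the choice of a t-test are illustrative assumptions, not the project's documented method):

```python
# Hypothetical two-sample comparison of quiz scores.
# The actual analysis method used by ITEMS is not stated in this presentation.
from scipy import stats

module_first_scores = [14, 16, 13, 15, 17, 12, 16]  # illustrative scores, treatment group
quiz_first_scores = [11, 12, 14, 10, 13, 12, 11]     # illustrative scores, control group

# Question of interest: is the module-first mean higher than the quiz-first mean?
t_stat, p_two_sided = stats.ttest_ind(module_first_scores, quiz_first_scores)
print(f"t = {t_stat:.2f}, two-sided p = {p_two_sided:.3f}")
```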
Later phases of data collection: • One-month follow-up: Participants take quiz again to check retention (another Borders card) • Participants respond to Web-based project evaluation survey asking their opinions on the module (no gift card!)
3. What’s in the modules? How are statistical concepts presented?
Module Content • Module 1 (2005): “What’s the Score?” -Test score distributions and their properties, types of test scores, score interpretations • Module 2 (2006): “What Test Scores Do and Don’t Tell Us” -Measurement error and sampling error; imprecision in individual and average test scores • Module 3 (2007): “What’s the Difference?” -Interpretation of test score trends and group differences; data aggregation issues
Modules use cognitive psychology principles to enhance learning • Multimedia: Present concepts using both words and pictures (see Mayer, Multimedia learning, 2001) • Prior knowledge: Use words and pictures that invoke participants’ prior knowledge (Narayanan & Hegarty, 2002); use analogies, metaphors (English, 1997) • Use conversational (informal) style
“Embedded questions” (Modules 2 and 3) • Each module segment includes a question designed to allow participants to check their understanding of the material. • If their answer is incorrect, they’re encouraged to go back and view the segment again. • Found helpful by nearly all participants (Year 3) • An example appears in the upcoming clip.
Goals for Presentation of Technical Concepts • Clear and accurate, but without formulas or jargon • Based on realistic examples; no abstractions. • Engaging; not just “talking heads” • Decision: Use animated characters
Module 1: How to explain “distribution” of test scores? • Show test papers being tossed into bins, gradually forming a distribution. • Then discuss mean, median, SD, skewness of distribution.
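For reference, the summary statistics Module 1 discusses can be computed directly from a set of scores; a minimal sketch (the scores below are made up for illustration):

```python
# Illustrative test score distribution and the summary statistics Module 1 covers.
import statistics

scores = [55, 62, 68, 70, 72, 75, 75, 78, 81, 85, 90, 98]  # made-up scores

mean = statistics.mean(scores)
median = statistics.median(scores)
sd = statistics.stdev(scores)  # sample standard deviation

# A simple moment-based skewness measure: positive values indicate a longer right tail.
skewness = sum((x - mean) ** 3 for x in scores) / (len(scores) * sd ** 3)

print(mean, median, sd, round(skewness, 2))
```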
Module 2: How to convey the idea of measurement error? “Multiple Edgars:” • A child takes a test repeatedly. • His brain is magically purged of his memory of the test between administrations. • For various reasons, he gets different scores each time.
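The “multiple Edgars” idea can be mimicked with a small simulation: a fixed true score plus random error on each memory-free administration. This is only an illustration of the concept, not the module’s own code, and the numbers are arbitrary:

```python
# Simulate repeated, memory-free test administrations for one student.
import random

random.seed(42)
true_score = 75   # the student's hypothetical "true" score
error_sd = 5      # spread of measurement error on any single administration

observed_scores = [round(random.gauss(true_score, error_sd)) for _ in range(6)]
print(observed_scores)  # six different observed scores scattered around the same true score
```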
Module 3: How to explain data aggregation complexities and paradoxes? • No abstractions! • Use realistic and specific examples: • Performance for every student group can increase while overall school performance decreases (Simpson’s paradox/amalgamation paradox) …
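A small numeric illustration (the numbers are invented, not taken from the module) shows how every group’s average can rise while the school average falls, simply because enrollment shifts toward the lower-scoring group:

```python
# Invented example of the amalgamation (Simpson's) paradox.
def school_average(groups):
    """groups: list of (number_of_students, group_average) pairs."""
    total_points = sum(n * avg for n, avg in groups)
    total_students = sum(n for n, _ in groups)
    return total_points / total_students

year1 = [(40, 80), (60, 50)]  # Group A: 40 students avg 80; Group B: 60 students avg 50
year2 = [(20, 82), (80, 52)]  # both group averages go UP, but Group B's share grows

print(school_average(year1))  # 62.0
print(school_average(year2))  # 58.0 -- the overall average goes DOWN
```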
Module 3: How to explain sampling error (of a change in test score averages)? • Especially complex in the case of NCLB-type testing. • Models based on random sampling are not only hard to explain, but don’t apply! • Solution: Show that the change in test score averages is more “sensitive” to extreme values when N is small.
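The “sensitivity” point can also be illustrated with a simulation (again, an illustration under assumed numbers, not the module’s own approach): draw two years of scores from the same population for a small school and a large school, and compare how much the year-to-year change in the average swings by chance alone.

```python
# How much can a school's average change from year to year purely by chance?
import random
import statistics

random.seed(0)

def simulated_changes(n_students, n_trials=2000):
    """Change in the school average between two simulated years of n_students scores."""
    changes = []
    for _ in range(n_trials):
        year1 = [random.gauss(70, 15) for _ in range(n_students)]
        year2 = [random.gauss(70, 15) for _ in range(n_students)]
        changes.append(statistics.mean(year2) - statistics.mean(year1))
    return changes

print(statistics.stdev(simulated_changes(20)))   # large swings: small school
print(statistics.stdev(simulated_changes(500)))  # small swings: large school
```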
Later… • A clip from Module 3 • Module 3 includes upgrades: a professional animator, professional actors, and a sound studio.
How effective are the modules? • Quiz results • Program evaluation results • Informal emails
Quiz Results for Module 1 Evaluation (N = 113): Average number of correct responses out of 20 items [chart not reproduced here]
Quiz Results for Module 2 Evaluation (N = 104): Average number of correct responses out of 16 items [chart not reproduced here]
Module 3 quiz results • Major recruitment problems, N = 23 • Module-first and quiz-first groups both scored an average of 10.4 on a 14-item quiz. • Possible reason: Only 4 of the 23 participants were teacher education students. • A supplementary data analysis may be conducted with CSU Fresno teacher education students.
One-month follow-up • Quiz results tended to be the same or better at one-month follow-up • However, follow-up samples are small (N= 11, 38, and 10 for the three years) and are not a random subgroup of initial participants
Conclusion on quiz outcomes: • Modules are probably most effective for those who are new to the classroom. • We hope to encourage their use in teacher education programs and in in-service training programs for new teachers.
Formal “independent” program evaluation • Year 1: phone interviews and paper surveys on presentation, content, impact • Years 2 and 3: Web-based surveys • Responses to above were positive, but participation rates were only 10-12%.
Formal program evaluation (continued) • Comments entered in boxes during participation were mixed: • Some negative comments on navigational features (later improved) and on animation • Comments on content and utility were favorable
Sample of Email Comments Received • “Very helpful and right to the point. If I were a building principal or a department chair today all of the staff would go through this until everyone really understood it.” • “I am inclined to recommend [this] as required viewing for all new hires in our K-12 district, and it certainly will be recommended … for inclusion in professional development on assessment literacy.” • “I will be sharing [this] with my Assistant Superintendent with the hope of promoting it as a part of our new teacher induction process.”
The big challenge: publicity and recruitment. Recruitment remained difficult despite: • Ads in two educational magazines • Personal contacts with school districts • District participation on the advisory committee • Contacts with professional organizations • Contacts with the California State Dept. of Education and other state organizations • Dean’s letter to 100+ superintendents • Website and blog postings
Successes • Automated system has facilitated administration and evaluation of module; module quality has improved. • Quiz results show Modules 1 and 2 were effective, mainly for teacher education students. • Participant comments indicated that modules were found useful by many.
The future… • “Repackaging project?” • Redo modules with superior production values, as in Module 3: professional animation, professional actors, sound studio • Unify “look and feel” across the modules • Work on mechanisms for disseminating as a package
More information? • See http://items.education.ucsb.edu • See Zwick, Sklar, Wakefield, & Folsom, Educational Measurement: Issues and Practice, in press. • Email us at rzwick@education.ucsb.edu OR items@education.ucsb.edu
Disclaimer • Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Clip from Module 3: “What’s the Difference?” • Topic: How the number of students affects the interpretation of score trends • Context: Press conference • 2 reporters ask questions about a recent test score release. • Superintendent Florence and two teachers, Stan and Norma, respond.