Evaluation Metrics February 12, 2010
A break in the usual order of things… • Today’s Probing Question will be discussed later in the class rather than at the beginning • Your responses to this (those of you who responded) were the most thoughtful ones I’ve seen all semester • You really engaged with the implications, both at an educational level and a policy level
Today’s Class • Evaluation Metrics • Last Wednesday’s Probing Question • Assignments
Starting from the simplest metric… • Pre-test • Post-test • Of what the student (hopefully) learned during the learning intervention
Post-test • What is “SQUIRREL” in Japanese? • People named Adam not allowed to answer
Is there ever a case where you don’t need to do a pre-test? (or shouldn’t do one?)
Is there ever a case where you don’t need to do a pre-test? (or shouldn’t do one?) • Al Corbett did not use pre-tests for some research on the LISP tutor; instead, he screened out participants who had ever used LISP or Scheme before, on the logic that LISP was so different from other programming paradigms that there would be essentially no overlap • What do you think?
Is there ever a case where you don’t need to do a pre-test? (or shouldn’t do one?) • A dangerous decision, in my opinion • Singley & Anderson (1989), and many others, find that there can be surprising and unexpected degrees of transfer
How can you mess up your tests? • I’m not asking about ways to do a better test • E.g. Bransford & Schwartz would say PFL is better than a standard pre-test of knowledge • But things you could do that will result in useless data
How can you mess up your tests? • Multiple choice with terrible alternatives • What is the capital of Tajikistan? • Raise your hand if you know the answer
How can you mess up your tests? • Multiple choice with terrible alternatives • What is the capital of Tajikistan? • Boston • Worcester • Tokyo • Dushanbe
How can you mess up your tests? • Using the same items for both pre-test and post-test for any given student • “Gee, this looks familiar…”
How can you mess up your tests? • Using pre-tests and post-tests of different difficulty • Pre-test: What is the capital of Tajikistan? • Post-test: What is the capital of Japan? • Look how great my geography tutor is!
How can you mess up your tests? • Using pre-tests and post-tests of different difficulty • (Even worse if you put the easy items on the pre-test and the hard items on the post-test!) • The most common approach is to counter-balance the tests • Half of students: Pre-test Form A, Post-test Form B • Half of students: Pre-test Form B, Post-test Form A
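The counter-balancing scheme above can be sketched in a few lines. This is only an illustration: the function name `counterbalance`, the student IDs, and the fixed random seed are assumptions made for the example, not part of the original slides.

```python
import random


def counterbalance(student_ids, seed=0):
    """Assign each student a (pre-test form, post-test form) pair.

    Students are shuffled, then alternated between the two orders,
    so the A-then-B and B-then-A groups differ in size by at most one.
    """
    rng = random.Random(seed)  # fixed seed only so the example is repeatable
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    assignment = {}
    for i, sid in enumerate(shuffled):
        assignment[sid] = ("A", "B") if i % 2 == 0 else ("B", "A")
    return assignment


# Hypothetical roster of six students
forms = counterbalance(["s1", "s2", "s3", "s4", "s5", "s6"])
for sid in sorted(forms):
    pre_form, post_form = forms[sid]
    print(sid, "pre:", pre_form, "post:", post_form)
```

Because no student sees the same form twice and each form appears equally often as a pre-test and as a post-test, any difference in difficulty between Form A and Form B averages out across the group.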
How can you mess up your tests? • Letting students “help” each other during the tests • Raise your hand if you’ve ever seen this
How can you mess up your tests? • Letting the teacher give a student the answer during the post-test • Raise your hand if you’ve ever seen this
How can you mess up your tests? • Not communicating that an online test is not a tutor • “Hey, how come this tutor doesn’t have any feedback?”
Pre-Post Comparison (4 ways) • t-test on Post-test - Pre-test for each group • Advantages? Disadvantages?
Pre-Post Comparison (4 ways) • t-test on Post-test – Pre-test for each group • Advantages? Disadvantages? • Vulnerable to ceiling effects [Figure: test score (0% to 100%) at pre-test and post-test]
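A minimal sketch of the ceiling problem with raw gain scores follows. All scores are made up for illustration, and `welch_t` is a helper written for this example that returns only the t statistic (no p-value); in practice a statistics package would be used for the full test.

```python
from math import sqrt
from statistics import mean, variance


def gains(pre, post):
    """Raw gain score (post - pre) for each student."""
    return [b - a for a, b in zip(pre, post)]


def welch_t(x, y):
    """Welch's t statistic for two independent samples."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


# Made-up scores as proportions correct
low_pre, low_post = [0.20, 0.30, 0.25, 0.35], [0.60, 0.70, 0.60, 0.80]
high_pre, high_post = [0.85, 0.90, 0.80, 0.95], [0.95, 1.00, 0.95, 1.00]

low_gain = gains(low_pre, low_post)
high_gain = gains(high_pre, high_post)

# A student can gain at most (1 - pre), so the high-pre-test group
# is capped at small gains no matter how much it learned.
print("mean gain, low pre-test group: ", round(mean(low_gain), 3))
print("mean gain, high pre-test group:", round(mean(high_gain), 3))
print("t statistic on gains:", round(welch_t(low_gain, high_gain), 2))
```

The large t statistic here says more about how much room each group had to improve than about which group learned more.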
Pre-Post Comparison (4 ways) • t-test on (Post-test – Pre-test)/(1-Pre-test) for each group • Advantages? Disadvantages?
Pre-Post Comparison (4 ways) • t-test on (Post-test – Pre-test)/(1-Pre-test) for each group • Accounts for high performers… • But has weird effects if anyone does worse on post-test than pre-test • Pre = 20%, Post = 10%, Res = -12.5% • Pre = 100%, Post = 90%, Res = -∞%
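The formula on this slide can be written as a small function, sketched here under the assumption that scores are proportions between 0 and 1. The name `normalized_gain` and the choice to return NaN or negative infinity at the ceiling are illustrative decisions, not something the slides specify.

```python
import math


def normalized_gain(pre, post):
    """(post - pre) / (1 - pre), with scores as proportions in [0, 1]."""
    room = 1.0 - pre  # how much the student could still improve
    if room == 0:
        # Pre-test already at ceiling: the ratio is undefined if nothing
        # changed, and unboundedly negative if the student got worse.
        return math.nan if post == pre else -math.inf
    return (post - pre) / room


print(normalized_gain(0.20, 0.60))  # improved: gained half of the available room
print(normalized_gain(0.20, 0.10))  # did worse on the post-test
print(normalized_gain(1.00, 0.90))  # perfect pre-test, then did worse
```

Any analysis built on this measure has to decide what to do with the undefined and infinite cases (drop them, cap them, or switch measures) before running a t-test.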
Pre-Post Comparison (4 ways) • Regression set up as Post-test = a0 Pre-test + a1 Condition + a2 • allows you to find mean difference in conditions while controlling for each student’s pre-test score • Advantages? Disadvantages?
Pre-Post Comparison (4 ways) • Regression set up as Post-test = a0 Pre-test + a1 Condition + a2 • allows you to find mean difference in conditions while controlling for each student’s pre-test score • You need to check that condition differences are not actually pre-test differences between conditions using Pre-test = a0 Condition + a1
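As a rough sketch of the two regressions on this slide, the code below fits both by ordinary least squares using only the standard library. The data are fabricated so that Post-test follows 0.5 × Pre-test + 0.2 × Condition + 0.3 exactly, and the helpers `solve` and `ols` are written just for this example; a real analysis would use a statistics package and report standard errors.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


# Fabricated data: condition 1 starts with slightly higher pre-test scores
pre = [0.2, 0.4, 0.6, 0.8, 0.3, 0.5, 0.7, 0.9]
cond = [0, 0, 0, 0, 1, 1, 1, 1]
post = [0.5 * p + 0.2 * c + 0.3 for p, c in zip(pre, cond)]

# Post-test = a0 * Pre-test + a1 * Condition + a2
a0, a1, a2 = ols([[p, c, 1.0] for p, c in zip(pre, cond)], post)
# Check: Pre-test = b0 * Condition + b1
b0, b1 = ols([[c, 1.0] for c in cond], pre)

print("condition effect controlling for pre-test:", round(a1, 3))
print("pre-test difference between conditions:   ", round(b0, 3))
```

In this made-up data the raw post-test means differ by 0.25 between conditions, but the regression attributes only 0.20 to Condition; the rest comes from the 0.10 pre-test difference that the second regression exposes.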
Pre-Post Comparison (4 ways) • Effect Size: (Mean Gain in Experimental – Mean Gain in Control)/ St Dev in Control • Advantages? Disadvantages?
Pre-Post Comparison (4 ways) • Effect Size: (Mean Gain in Experimental – Mean Gain in Control)/ St Dev in Control • How big is the difference between groups? (Not just how likely the difference would be if chance were all there was)
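The effect size formula can be sketched as follows. The slide does not say which standard deviation "St Dev in Control" refers to; this example assumes it is the sample standard deviation of the control group's gain scores, and the gain values are invented.

```python
from statistics import mean, stdev


def effect_size(exp_gains, ctrl_gains):
    """(mean experimental gain - mean control gain) / SD of control gains.

    Assumption: the denominator is the sample standard deviation of the
    control group's gain scores.
    """
    return (mean(exp_gains) - mean(ctrl_gains)) / stdev(ctrl_gains)


# Invented gain scores (post - pre) as proportions
exp_gains = [0.30, 0.40, 0.35, 0.45]
ctrl_gains = [0.10, 0.20, 0.15, 0.25]

d = effect_size(exp_gains, ctrl_gains)
print("effect size in control-group standard deviations:", round(d, 2))
```

Unlike a p-value, this number does not shrink or grow with sample size alone, which is why it answers "how big" rather than "how likely".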
(Some Types of) Contents of Tests • Multiple-choice • Fill-in-the-blank • Essay • Complete Problem-solving • Decomposed Problem-solving
Types I believe you already know • Multiple-choice • Fill-in-the-blank • Essay • Complete Problem-solving • Decomposed Problem-solving
Complete Problem-Solving Draw a scatterplot of this fake data
Decomposed Problem-Solving What variables would you use to draw a scatterplot of this data?
Have them turn in their answer • (Or go to the next webpage)
Decomposed Problem-Solving What is a good scale for Population?
Have them turn in their answer • (Or go to the next webpage)
Decomposed Problem-Solving What is a good upper and lower bound for Population?
Have them turn in their answer • (Or go to the next webpage)
Decomposed Problem-Solving Label the axes with values (have Population go from 0 to 700 with a scale of 50, and Number of Restaurants go from 0 to 80 with a scale of 10) [Figure: blank axes labeled Number of Restaurants and Population]
Advantages/Disadvantages? • Multiple-choice • Fill-in-the-blank • Essay • Complete Problem-solving • Decomposed Problem-solving
“Contingent Correctness” Grading • Some researchers try to deal with the issue of partial correctness in complete problem-solving by grading contingent correctness • i.e. If step A is wrong, but step B is correct based on step A, count step B as correct • E.g. if the student used the wrong variable, but plotted the points correctly, the point plotting is contingently correct • Time-consuming and tricky to do
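One way to picture contingent correctness is a grader that checks each step against what would be correct given the student's own earlier answers. The sketch below uses a hypothetical two-step problem (sum, then mean) rather than the scatterplot example, and the function `grade` and its `contingent` flag are invented for illustration.

```python
def grade(student_answers, steps, given, contingent=True):
    """Grade a multi-step problem step by step.

    steps: list of (name, fn), where fn(given, prior) returns the answer
    that is correct for this step given the values of earlier steps.
    With contingent=True, "earlier steps" means the student's own answers,
    right or wrong; with contingent=False, it means the true answers.
    """
    prior, grades = {}, {}
    for name, expected_from in steps:
        expected = expected_from(given, prior)
        grades[name] = student_answers[name] == expected
        prior[name] = student_answers[name] if contingent else expected
    return grades


# Hypothetical problem: find the sum, then the mean, of four numbers
data = [2, 4, 6, 8]
steps = [
    ("sum", lambda given, prior: sum(given)),
    ("mean", lambda given, prior: prior["sum"] / len(given)),
]
# The student adds wrong (24 instead of 20) but divides correctly
student = {"sum": 24, "mean": 6}

contingent_grades = grade(student, steps, data, contingent=True)
strict_grades = grade(student, steps, data, contingent=False)
print("contingent:", contingent_grades)
print("strict:    ", strict_grades)
```

Even in this toy case the grader needs a rule for every step that says what "correct given earlier work" means, which hints at why hand-grading this way is time-consuming and tricky.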
Learning Efficiency • Perhaps two conditions have equal learning, but one condition takes significantly more time than another condition • Advantages? Disadvantages?
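The slide does not define a formula for learning efficiency. As one possible operationalization, assumed here purely for illustration, efficiency can be taken as mean gain divided by mean time on task; the numbers below are invented.

```python
def learning_efficiency(mean_gain, mean_minutes):
    """Gain per minute of instruction (an assumed definition, not the slide's)."""
    if mean_minutes <= 0:
        raise ValueError("time on task must be positive")
    return mean_gain / mean_minutes


# Invented example: equal learning, unequal time
eff_a = learning_efficiency(mean_gain=0.30, mean_minutes=30)
eff_b = learning_efficiency(mean_gain=0.30, mean_minutes=45)
print("condition A gain per minute:", round(eff_a, 4))
print("condition B gain per minute:", round(eff_b, 4))
```

A ratio like this hides whether the extra time bought anything else, such as better retention or transfer, so it is best read alongside the raw gains rather than instead of them.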