Overview • Introduction • Some theoretical issues • The failings of human intuitions in prediction • Issues in formal prediction • Inference from class membership: The individual versus group problem (and its only solution) • Some predictive tests • Prediction in science and psychometrics
Predictive Tests • Many tests are used to make predictions: of levels of achievement or success, of the likelihood of recidivism, or of diagnostic category • Two kinds of predictions: • Categorical: predict which category this subject will fall into (diagnosis, occupation) • Numerical: predict the value of a relevant numerical variable (GPA, economic return to company)
The failings of human intuition • We have already seen many ways in which humans succumb to errors in numerical reasoning • Kahneman & Tversky asked subjects about areas of graduate specialization, collecting three kinds of judgment: estimates of base rates, estimates (from a description) of a student's similarity to other students in each field, and predictions of the student's field (also from the description)
Results • Results: • Similarity and prediction correlate at 0.97 • Similarity and base rates correlate at -0.65 • What does this result remind you of? • What do these subjects need to be taught?
6 Errors discussed by Kahneman & Tversky • Representativeness error: assuming that predictions are no different from assessments of similarity • Insufficient regression error: people fail to take into account that when predictive validity is less than perfect (the correlation between predictor and performance is < 1), predictions should be regressed toward the mean • Central tendency error: subjects making judgments tend to avoid extremes, and compress their judgments into a smaller range than the phenomenon being judged
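The correction for insufficient regression can be made concrete in standard-score units: under a simple linear (bivariate-normal) model, the best prediction of the criterion z-score is the predictor z-score multiplied by the validity r. The sketch below assumes that model; the applicant's score, the validity of 0.4, and the GPA scale are all hypothetical.

```python
# Sketch: regressive prediction in z-score units, assuming a simple
# linear (bivariate-normal) model. All numbers below are hypothetical.

def regressed_prediction(z_predictor, validity_r):
    """Best linear prediction of the criterion z-score: z_y_hat = r * z_x."""
    return validity_r * z_predictor

def to_raw(z, mean, sd):
    """Convert a z-score back to the raw criterion scale."""
    return mean + z * sd

z_x = 2.0  # hypothetical: applicant is 2 SDs above the mean on the test
r = 0.4    # hypothetical predictive validity

naive = z_x                           # insufficient regression: predict equal extremity
regressed = regressed_prediction(z_x, r)

# hypothetical criterion scale: GPA with mean 3.0 and SD 0.4
print(f"naive GPA = {to_raw(naive, 3.0, 0.4):.2f}")
print(f"regressed GPA = {to_raw(regressed, 3.0, 0.4):.2f}")
```

With r = 0.4, a test score 2 SDs above the mean supports a criterion prediction only 0.8 SDs above the mean.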
6 Errors discussed by Kahneman & Tversky • Discounting of prior probabilities: human predictors will throw out base rate info for almost any reason • Overweighting of coherence: there is greater confidence in predictions based on consistent input than on inconsistent input with the same average (e.g. two B's are felt to be better than an A and a C for predicting a B average) • Overweighting of extremes: confidence in judgment is over-weighted at extremes, especially positive extremes (= j-shaped confidence function)
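Discounting of prior probabilities is easiest to see against Bayes' rule, which combines a base rate with case-specific evidence. The two fields, their base rates, and the likelihoods below are hypothetical, chosen only to show how a low base rate can outweigh a description that "sounds like" the rarer field.

```python
# Sketch: combining base rates with case evidence via Bayes' rule.
# Field names and all probabilities are hypothetical.

def posterior(priors, likelihoods):
    """P(field | description) is proportional to P(field) * P(description | field)."""
    joint = {field: priors[field] * likelihoods[field] for field in priors}
    total = sum(joint.values())
    return {field: p / total for field, p in joint.items()}

priors = {"field_A": 0.10, "field_B": 0.90}       # hypothetical base rates
likelihoods = {"field_A": 0.80, "field_B": 0.20}  # description 4x more typical of field_A

post = posterior(priors, likelihoods)
for field, p in post.items():
    print(f"{field}: {p:.3f}")
```

Even though the description is four times more typical of field_A, field_B remains the more probable answer (about 0.69 versus 0.31) once the base rates are respected.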
What do we need to make good predictions? • We need three pieces of information: • 1.) Base rates • 2.) Relevant predictors in the individual case • 3.) Bounds on accuracy (cutting scores) • Kahneman & Tversky's experimental evidence (previous slides) shows that subjects usually fail to weight any of these three properly
Review: Measuring validation error • Coefficient of non-determination: $k = 1 - r^2$, where r is the correlation of the test score with the performance being predicted (strictly, the coefficient of alienation is the square root of this quantity, $\sqrt{1 - r^2}$) • k = the proportion of the error inherent in guessing that remains in your estimate (the percentage of variance not accounted for) • If k = 1.0, you have 100% of the error you’d have had if you just guessed (since this means your r was 0) • If k = 0, you have achieved perfection: your r was 1, and there was no error at all* • If k = 0.6, you have 60% of the error you’d have had if you guessed * N.B. This never happens.
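The relationship between r and k is simple arithmetic; a short sketch computing k = 1 − r² for a few correlations reproduces the r/k pairs used in this review.

```python
# Sketch: k = 1 - r**2, the share of criterion variance not accounted for
# by the test (the coefficient of non-determination).

def non_determination(r):
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie in [-1, 1]")
    return 1.0 - r ** 2

for r in (0.0, 0.6, 0.7, 0.9, 1.0):
    print(f"r = {r:.1f}  ->  k = {non_determination(r):.2f}")
```

Note how slowly k falls: r must reach about 0.71 before even half of the variance is accounted for.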
Why should we care? • We care because r and k are useful in interpreting the accuracy of an individual’s scores • r = 0.6 (good), k = 0.64 (not good) • r = 0.7 (great), k = 0.51 (not so great) • r = 0.9 (fantastic!), k = 0.19 (so-so) • Since even high values of r leave a fairly large proportion of variance unaccounted for, the prediction of any individual’s criterion score is always accompanied by a wide margin of error • Recall: $S_{mr} = S\sqrt{1 - r}$ --> individual error margins are a function of how good our correlation is • The moral: predicting individual performance is really hard to do!
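A closely related textbook quantity makes the "wide margin of error" concrete: the standard error of estimate for predicting a criterion from a test with validity r, $S_{est} = S_y\sqrt{1 - r^2} = S_y\sqrt{k}$. This is a different quantity from the error formula recalled on the slide. The sketch assumes a hypothetical criterion SD of 0.5 GPA points and an approximate ±1.96 band, which in turn assumes normally distributed prediction errors.

```python
import math

def std_error_of_estimate(sd_criterion, r):
    """S_est = S_y * sqrt(1 - r^2): spread of actual criterion scores around the prediction."""
    return sd_criterion * math.sqrt(1.0 - r ** 2)

SD_GPA = 0.5  # hypothetical SD of the criterion (e.g. university GPA)

for r in (0.6, 0.7, 0.9):
    s_est = std_error_of_estimate(SD_GPA, r)
    # approximate 95% band, assuming normally distributed prediction errors
    print(f"r = {r}: S_est = {s_est:.3f}, 95% band = +/-{1.96 * s_est:.2f} GPA points")
```

Even at r = 0.9 the band is still roughly ±0.43 GPA points around each individual prediction under these assumptions.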
What can we infer from class membership? • Some commentators have suggested that inference from class membership is inherently fallacious • e.g. 25% of first-degree relatives of those diagnosed with malignant melanoma (skin cancer) will also develop melanoma • I am a first-degree relative of two persons diagnosed with melanoma, so I take my odds of developing the disease to be >= 25% • Critics of the inference say: no, it is either 0% (I don't develop the disease) or 100% (I do), i.e. group probabilities don't apply to individuals
Do group probabilities apply to individuals? • Meehl's response: "If nothing is rationally inferable from membership in a class, no empirical prediction is ever possible" • The argument is a restatement of the necessity of inference: even when predicting an individual's behavior from that individual's own data, we need to consider the pattern over past data • Moreover, the claim of 'certainty' is philosophical, not real: in the absence of knowing which group you are in, there is only probability, not knowledge
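One way to make Meehl's point quantitative is to score forecasts. The sketch below uses the Brier score (squared error between a forecast probability and the 0/1 outcome); the choice of scoring rule is our assumption, not the slide's. For someone whose subgroup is unknown, the expected score is best when the forecast equals the group rate, and the critics' "0% or 100%" forecasts do worse.

```python
# Sketch: expected Brier score of a probability forecast when the outcome
# occurs at a known group rate. Lower is better. The scoring rule is an
# illustrative choice.

def expected_brier(forecast, true_rate):
    """Expected squared error between the forecast and a 0/1 outcome."""
    return true_rate * (1.0 - forecast) ** 2 + (1.0 - true_rate) * forecast ** 2

GROUP_RATE = 0.25  # the 25% figure from the melanoma example

for q in (0.0, 0.25, 0.5, 1.0):
    print(f"forecast {q:.2f}: expected Brier = {expected_brier(q, GROUP_RATE):.4f}")
```

Forecasting 0.25 gives 0.1875, while forecasting 0 gives 0.25 and forecasting 1 gives 0.75: the group probability is the best available statement about the individual.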
"One incident that occurred while [future Nobel Laureate Kenneth Arrow] was forecasting the weather illustrates both uncertainty and the human unwillingness to accept it. Some officers had been assigned the task of forecasting the weather a month ahead, but Arrow and his statisticians found that their long-range forecasts were no better than numbers pulled out of a hat. The forecasters agreed and asked their superiors to be relieved of this duty. The reply was: 'The Commanding General is well aware that the forecasts are no good. However, he needs them for planning purposes'." Peter Bernstein, Against the Gods: The Remarkable Story of Risk
Some Predictive Tests: Standardized admission tests • Thanks to Lily Tsui for these GRE slides • Scholastic aptitude tests (SAT, GRE) are highly reliable tests developed to painstaking psychometric standards • The general GRE has four sections: verbal (including reading comprehension), quantitative (including chart comprehension), analytical, and a random test section • The subject test has 215 multiple-choice questions • On the psychology subject test: 40% experimental/natural science; 43% social science; 17% general • The test is timed and corrected for guessing
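The slide does not say which correction for guessing is applied. A common textbook version ("formula scoring") subtracts a fraction of the wrong answers, R − W/(k − 1) for items with k options, so that blind guessing gains nothing on average. The sketch assumes that formula and five-option items; the counts are hypothetical.

```python
# Sketch of classic formula scoring (an assumed correction, not necessarily
# the one used by the GRE): right - wrong / (options - 1).

def corrected_score(right, wrong, n_options):
    """Omitted items are neither rewarded nor penalized."""
    if n_options < 2:
        raise ValueError("need at least two options per item")
    return right - wrong / (n_options - 1)

# hypothetical: 215 five-option items, 150 right, 40 wrong, 25 omitted
print(corrected_score(150, 40, 5))
```

A pure guesser on five-option items expects one right answer for every four wrong ones, which this formula scores as zero.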
Sample Verbal Questions • Analogies: ETERNAL : END a. precursory : beginning b. grammatical : sentence c. implausible : credibility d. invaluable : worth e. frenetic : movement
Sample Verbal Questions • Sentence Completions Museums, which house many paintings and sculptures, are good places for students of _____. a. art b. science c. religion d. dichotomy e. democracy
Sample Verbal Questions • Antonyms MALADROIT a. ill-willed b. dexterous c. cowardly d. enduring e. sluggish
Sample Quantitative Questions • Quantitative Comparison If y > 2: Column A: y - 6 Column B: -3 a. the quantity in column A is always greater b. the quantity in column B is always greater c. the quantities are always equal d. the relationship cannot be determined from the information given
Sample Quantitative Questions • Problem Solving The sum of x distinct integers greater than zero is less than 75. What is the greatest possible value of x ? a. 8 b. 9 c. 10 d. 11 e. 12
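The problem-solving item turns on one observation: the smallest possible sum of x distinct positive integers is 1 + 2 + ... + x = x(x + 1)/2, so the answer is the largest x for which that sum stays below 75. A minimal sketch:

```python
def max_count_distinct_positive(limit):
    """Largest x such that some x distinct positive integers sum to less than `limit`.

    The smallest sum of x distinct positive integers is x(x + 1) / 2,
    so we look for the largest x with x(x + 1) / 2 < limit.
    """
    x = 0
    while (x + 1) * (x + 2) // 2 < limit:
        x += 1
    return x

print(max_count_distinct_positive(75))  # 11  (1 + ... + 11 = 66; 1 + ... + 12 = 78)
```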
Sample Analytical Questions A pastry shop will feature 5 desserts (V, W, X, Y and Z) to be served Monday through Friday, one dessert a day, in an order that conforms to the following restrictions: Y must be served before V. X and Y must be served on consecutive days. Z may not be the second dessert to be served.
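The slide gives only the scheduling rules, not the question asked about them. As a sanity check on the rules themselves, the sketch below brute-forces every ordering of the five desserts (reading day 0 as Monday) and keeps those satisfying all three restrictions; test-takers are of course expected to reason, not enumerate.

```python
from itertools import permutations

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]

def satisfies_rules(order):
    """order[i] is the dessert served on day i (0 = Monday)."""
    pos = {dessert: day for day, dessert in enumerate(order)}
    return (
        pos["Y"] < pos["V"]                 # Y must be served before V
        and abs(pos["X"] - pos["Y"]) == 1   # X and Y on consecutive days
        and pos["Z"] != 1                   # Z is not the second dessert
    )

valid = [order for order in permutations("VWXYZ") if satisfies_rules(order)]
print(len(valid), "valid schedules")
for order in valid[:3]:
    print(dict(zip(DAYS, order)))
```

Under this reading, 22 of the 120 possible orderings satisfy the restrictions.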
Reliability • Within-test reliability is 0.9 • Test-retest reliability is weaker: repeat test takers show an average score gain of 20-30 points on both tests • This can move a student a long way: more than 10 percentiles
Predictive Validity • In one meta-analysis, Sternberg and Williams point out that the empirical validities of the GRE vary somewhat by field • GRE: correlations between various combinations of GRE scores and graduate school performance are only between 0.25 and 0.35, and only marginally better (0.4) if undergraduate grades are included
Construct Validity • Is the GRE getting at anything related to graduate school? • What about motivation, creativity, devotion, conscientiousness, and the other qualities that make a successful graduate student? • Some complaints: • Graduate assignments require students to develop research skills, but the GRE does not test this • The GRE is timed, but real life is rarely timed • The GRE is taken individually, but real work usually involves collaboration
Why is the GRE so popular? • Because it is in the public eye • Since average admission scores on tests such as the GRE are published, schools are under pressure to keep the average scores of the students they accept high, so that they remain “competitive” with other institutions in the public eye • One strength of the GRE is that it has college-specific regression equations, i.e. it can predict future performance at a particular college independently • Because applicants vary relatively little in their reference letters and undergraduate GPAs --> GRE scores are one of the main sources of the variation needed to rank applicants
Some Predictive Tests: The SAT • SAT: r = 0.4 with university GPA • By comparison, high school grades: r = 0.48 • The two together: r = 0.55
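How much a second predictor adds depends on how much it overlaps with the first. The standard two-predictor formula for the multiple correlation is $R = \sqrt{(r_{y1}^2 + r_{y2}^2 - 2 r_{y1} r_{y2} r_{12}) / (1 - r_{12}^2)}$. The slide does not give the correlation between SAT scores and high school grades, so the values of r_12 below are hypothetical; an intercorrelation of about 0.3 happens to reproduce the combined figure of 0.55.

```python
import math

def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion with two predictors."""
    r_sq = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_sq)

R_SAT, R_HS = 0.40, 0.48      # validities from the slide
for r_12 in (0.0, 0.3, 0.6):  # hypothetical SAT / high-school-grade intercorrelations
    print(f"r_12 = {r_12}: R = {multiple_r(R_SAT, R_HS, r_12):.3f}")
```

The more the two predictors overlap, the less the second one adds: R falls from about 0.62 with uncorrelated predictors to 0.50 when they correlate at 0.6.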
Can you beat the standards? • Notwithstanding the huge industry waiting to take money from anxious high school students, studying for the SAT doesn't help much • SAT coaching increases scores by about 15 points, which is 0.15 SDs • Repeat testing increases scores a little less, by about 12 points or 0.12 SDs • How much should we pay for 0.1 SDs?
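Gains in SD units are easier to judge as percentile changes. The sketch takes the slide's effect sizes (0.15 and 0.12 SDs) and assumes normally distributed scores and a student starting at the median (both assumptions), using Python's `statistics.NormalDist`.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # scores expressed in SD units

def percentile_gain(start_percentile, gain_sd):
    """Percentile points gained from `start_percentile`, assuming normal scores."""
    z = STD_NORMAL.inv_cdf(start_percentile / 100)
    return 100 * STD_NORMAL.cdf(z + gain_sd) - start_percentile

for label, gain in (("coaching", 0.15), ("repeat testing", 0.12)):
    print(f"{label}: +{percentile_gain(50, gain):.1f} percentile points from the median")
```

Under these assumptions coaching moves a median student up by about 6 percentile points and repeat testing by about 5.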
Some Predictive Tests: Professional tests • Professional school tests (MCAT, LSAT) • MCAT reliability: r in the low .80s • LSAT reliability: r > 0.9 • There is relatively little evidence of validity • They predict performance about as well as undergraduate GPA alone: r = 0.25 - 0.3
Some Predictive Tests: The Strong Interest Inventory • The Strong (1927) Interest Inventory (Strong-Campbell, 1981): a widely used test of interests as predictors of professional aptitude • Empirically constructed for concurrent validity, by comparing each vocational group to the overall average • Has 325 items and 162 scales covering 85 occupations • Reliability is high • 0.9+ test-retest over weeks; 0.6-0.7 over years, unless respondents were older (25+ years) at first testing, in which case 0.8+ even after 20 years • Does not predict success or satisfaction in a profession • Does predict the likelihood of entering and remaining in a profession: a person has a 50% chance of ending up in the profession most strongly predicted (A score), and only a 12% chance of ending up in one least predicted (C score)
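Empirical construction of this kind can be sketched as criterion keying: an item joins an occupational scale only if the occupational group answers it differently from a general reference group. The sketch below is a simplified illustration in that spirit, not the Strong's actual scoring procedure; the item names, endorsement rates, and the 0.15 threshold are all hypothetical, and only "Like" answers are scored.

```python
# Hypothetical item statistics: proportion answering "Like" in an occupational
# group versus a general reference sample. All names and numbers are made up.
ITEM_STATS = {
    "repairing machines": (0.70, 0.40),
    "reading poetry": (0.20, 0.45),
    "going to parties": (0.50, 0.52),
    "solving math puzzles": (0.65, 0.35),
}
THRESHOLD = 0.15  # hypothetical minimum group-vs-reference difference

def build_key(stats, threshold):
    """Key +1 items the group likes more than average, -1 items it likes less."""
    key = {}
    for item, (p_group, p_reference) in stats.items():
        diff = p_group - p_reference
        if diff >= threshold:
            key[item] = +1
        elif diff <= -threshold:
            key[item] = -1
    return key

def scale_score(responses, key):
    """responses maps item -> True if the respondent answered 'Like'."""
    return sum(weight for item, weight in key.items() if responses.get(item, False))

key = build_key(ITEM_STATS, THRESHOLD)
respondent = {"repairing machines": True, "reading poetry": True,
              "solving math puzzles": True}
print(key)
print(scale_score(respondent, key))
```

Items that fail to separate the groups ("going to parties" here) simply drop out of the scale, which is why empirically keyed scales can contain items with no obvious face validity.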
Prediction in scientific psychology • Prediction & scientific explanation are related • We admire Newton's laws precisely because they are accurate in predicting real phenomena • Many cognitive models in psychology are purely descriptive: they make no attempt to predict how a person will perform on unseen stimuli • There are many ways to do so, if you have sufficient variation in predictors: multiple regression, neural networks, 'cheap' methods (e.g. the best single predictor)
Some lessons about scientific prediction • Models can 'cheat' by exploiting variance in the input data set that does not transfer to unseen data, so you must test your predictions on unseen data (= cross-validation) • Some very good models may be very good precisely because they are very good at exploiting this 'within-set' variation • Very simple (3-variable) non-linear models may do as well as or better than much more complex models, especially linear models, and may exclude highly correlated variables • Different measures of successful prediction may yield quite different results (e.g. test correlation versus 0.5 SD correlation)
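The "cheating" problem can be demonstrated with synthetic data. The sketch below compares a least-squares line with a deliberately extreme "memorizer" (a 1-nearest-neighbour lookup that reproduces the training set exactly). The data-generating rule y = 2x + noise, the sample sizes, and the seed are arbitrary illustrative choices, and a single hold-out split stands in for the more common k-fold cross-validation.

```python
import random

def make_data(n, rng):
    """Synthetic data: y = 2x + Gaussian noise (SD 1). Purely illustrative."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + rng.gauss(0, 1)) for x in xs]

def fit_line(data):
    """Ordinary least-squares line through the training data."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxy = sum((x - mx) * (y - my) for x, y in data)
    sxx = sum((x - mx) ** 2 for x, _ in data)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def fit_memorizer(data):
    """1-nearest-neighbour 'model': reproduces the training set perfectly."""
    return lambda x: min(data, key=lambda pair: abs(pair[0] - x))[1]

def mse(model, data):
    return sum((y - model(x)) ** 2 for x, y in data) / len(data)

rng = random.Random(0)
train, test = make_data(30, rng), make_data(1000, rng)

for name, fit in (("linear", fit_line), ("memorizer", fit_memorizer)):
    model = fit(train)
    print(f"{name:9s} train MSE = {mse(model, train):.2f}  test MSE = {mse(model, test):.2f}")
```

The memorizer wins on the training set (zero error) and loses on the held-out set, roughly doubling the error because it has learned the training noise.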
Some lessons about scientific prediction • Linear assumptions may be limiting: you may hide variance just by adopting the assumption • More predictive power may sometimes (perhaps often) be obtained by dropping the assumption of linear relations between the predictors and the quality to be predicted
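A toy case shows how a linear assumption can hide a relationship completely. With the hypothetical lawful relation y = x² over a symmetric range, the linear (Pearson) correlation between x and y is exactly zero, yet a predictor that drops the linearity assumption (here, simply x²) accounts for all of the variance.

```python
def pearson_r(xs, ys):
    """Pearson (linear) correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # a perfectly lawful but non-linear relation

print(pearson_r(xs, ys))                    # 0.0: a linear model finds nothing
print(pearson_r([x ** 2 for x in xs], ys))  # 1.0: a squared predictor finds everything
```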