
Measuring Forecaster Performance

Measuring Forecaster Performance. Lt Col James E. Kajdasz, Ph.D., USAF. Scholarship of Intelligence Analysis.


Presentation Transcript


  1. Measuring Forecaster Performance • Lt Col James E. Kajdasz, Ph.D., USAF

  2. Scholarship of Intelligence Analysis • “A comprehensive review of the literature indicates that while much has been written, largely there has not been a progression of thinking relative to the core aspect and competencies of doing intelligence analysis.” (Mangio & Wilkinson, 2008) • “Do [they] teach structured methods because they are the best way to do analysis, or do they teach structured methods because that’s what they can teach?” (Marrin, 2009)

  3. Grade forecasters on % correct judgments? • We could grade forecaster accuracy like a T/F test (yes/no answers). • Will Qadhafi still be in Libya at this time next year? No • Will the government of Yemen fall in the next year? No • Will I still be driving my 2001 Corolla in the year 2020? Yes • Wait until outcomes occur/don’t occur, then calculate the percent of correct forecasts. • Compare Forecaster A to Forecaster B by seeing who has the higher % correct.
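As a quick sketch of this grading scheme, percent correct can be computed like a T/F test score. The forecasts and outcomes below are hypothetical, not data from the talk:

```python
# Hypothetical yes/no forecasts from two forecasters, and what actually happened.
forecasts_a = ["no", "no", "yes", "no", "yes"]
forecasts_b = ["no", "yes", "yes", "no", "no"]
outcomes    = ["no", "no", "yes", "yes", "yes"]

def percent_correct(forecasts, outcomes):
    """Fraction of forecasts that matched the eventual outcome."""
    hits = sum(f == o for f, o in zip(forecasts, outcomes))
    return hits / len(outcomes)

print(percent_correct(forecasts_a, outcomes))  # 0.8
print(percent_correct(forecasts_b, outcomes))  # 0.4
```

Forecaster A outscores B here, but this scheme cannot credit degrees of confidence, which motivates the next slide.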

  4. What about probabilistic judgments? • When there is a high level of uncertainty, laypeople and even experts often qualify judgments. • Will Qadhafi still be in Libya at this time next year? No (70% confidence) • Will the government of Yemen fall in the next year? No (60% confidence) • Will I still be driving my 2001 Corolla in the year 2020? Yes (95% confidence)

  5. What about probabilistic judgments? • [Figure: an 11-point probability scale from 0 to 1.0 in steps of .1, with verbal anchors including Impossible, Highly unlikely, Somewhat unlikely, As likely as other two possibilities combined, Somewhat likely, Highly likely, and Certainty. Tetlock, 2005]

  6. Let’s compare analysts… • So which analyst performed best? • It’s hard to say… We need a summary statistic that captures total performance.

  7. Mean Probability Score • Probability Score or Brier Score: PS = (f − d)² • Estimate (f): probability provided by the forecaster, .00 – 1.00 • Outcome (d): 0 (if the event did not occur) or 1 (if the event did occur)

  8. Mean Probability Score • Probability Score or Brier Score • Forecaster says there is a 70% probability X will occur (f = .70). • X occurs (d = 1). • PS = (.70 − 1)² = .09

  9. Mean Probability Score • Mean Probability Score or Mean Brier Score: the average of the individual probability scores across all N forecasts, Mean PS = (1/N) · Σᵢ (fᵢ − dᵢ)²
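A minimal sketch of the probability score and its mean, using hypothetical forecasts (the .70 line reproduces the slide 8 example):

```python
# f = stated probability that the event occurs, d = outcome (1 = occurred).
forecasts = [0.70, 0.60, 0.95, 0.20]  # hypothetical
outcomes  = [1,    0,    1,    0]

def probability_score(f, d):
    """Brier probability score for one forecast: (f - d)^2."""
    return (f - d) ** 2

def mean_probability_score(fs, ds):
    """Mean Brier score over a set of forecasts; lower is better."""
    return sum(probability_score(f, d) for f, d in zip(fs, ds)) / len(fs)

print(round(probability_score(0.70, 1), 4))  # 0.09, as on slide 8
print(round(mean_probability_score(forecasts, outcomes), 4))
```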

  10. Let’s Compare analysts…

  11. Components of Total Forecaster Error • Several things contribute to overall error, not all of which can be controlled by the forecaster: • Discrimination errors • Calibration errors • Variance of the outcome

  12. Decomposing Mean Probability Score • PS = Var(d) + Bias² + Scat + Slope²·Var(d) − 2·Slope·Var(d) • Components: Slope, Bias, Scatter, and Var(d), the variance of the outcome

  13. Decomposing PS: Bias • Bias = f̄ − d̄ • Where: f̄ = mean estimate, d̄ = mean outcome • [Figure: estimated probability of survival (f) vs. outcome index (d)] • Arkes, Dawson, Speroff, et al. (1995)

  14. Decomposing PS: Slope • Slope = f̄₁ − f̄₀ • Where: f̄₁ = mean estimate when the outcome was 1, f̄₀ = mean estimate when the outcome was 0 • [Figure: estimated probability of survival (f) vs. outcome index (d)] • Arkes, Dawson, Speroff, et al. (1995)

  15. Decomposing PS: Scatter • Scat = [N₁·Var(f₁) + N₀·Var(f₀)] / N • Where: Var(f₁) = variance of the estimates when the outcome was 1, Var(f₀) = variance of the estimates when the outcome was 0 • [Figure: estimated probability of survival (f) vs. outcome index (d)] • Arkes, Dawson, Speroff, et al. (1995)
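Putting the components together, here is a sketch with hypothetical forecasts, assuming the Yates-style covariance decomposition used by Arkes et al., in which PS = Var(d) + Bias² + Scat + Slope²·Var(d) − 2·Slope·Var(d):

```python
# Hypothetical forecasts f and outcomes d; population (1/N) variances throughout.
forecasts = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
outcomes  = [1,   1,   0,   1,   0,   0]

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Split estimates by whether the event occurred.
f1 = [f for f, d in zip(forecasts, outcomes) if d == 1]
f0 = [f for f, d in zip(forecasts, outcomes) if d == 0]

bias  = mean(forecasts) - mean(outcomes)   # f-bar minus d-bar
slope = mean(f1) - mean(f0)                # f-bar(1) minus f-bar(0)
scat  = (len(f1) * var(f1) + len(f0) * var(f0)) / len(forecasts)

# Check that the components recombine into the mean probability score.
ps = mean([(f - d) ** 2 for f, d in zip(forecasts, outcomes)])
vd = var(outcomes)
recombined = vd + bias**2 + scat + slope**2 * vd - 2 * slope * vd
print(round(ps, 6), round(recombined, 6))  # the two values agree
```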

  16. Patients vs. Doctors • Patients: PS = .18, Bias = −0.11, Slope = .26, Scatter = .05 • Doctors: PS = .23, Bias = 0.13, Slope = .13, Scatter = .05 • [Figure: estimated probability of survival (f) vs. outcome index (d) for each group] • Arkes, Dawson, Speroff, et al. (1995)

  17. Prediction Markets

  18. A Priori Hypotheses • H1: Discrimination will improve as the event nears • Slope measure will increase over time. • H2: Scatter will decrease as the event nears • Scatter measure will get smaller over time. • H3: Analysts will be biased toward predicting the status quo • Bias measure will be negative.

  19. T-70 Days

  20. T-60 Days

  21. T-50 Days

  22. T-40 Days

  23. T-30 Days

  24. T-20 Days

  25. T-10 Days

  26. Total Error over Time • PS is a measure of overall error • Low PS is better • Graph suggests a curvilinear relationship with time

  27. Components of Error • PS is composed of Bias, Slope, Scatter, and the variance of the outcome • Graph suggests the decrease in error is primarily due to improvement in slope • Slope is a measure of discrimination • High slope is better

  28. Modeling Slope Over Time • The observed slope was modeled. • Curvilinear relationship modeled with Days and Days² terms • Adj. R² = .834, p = .01 • H1 supported: discrimination improves as the date approaches.
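A sketch of the modeling step, assuming an ordinary least-squares fit with Days and Days² terms; the slope values below are illustrative, not the study's data:

```python
# Illustrative slope (discrimination) values at 10-day intervals before the event.
import numpy as np

days  = np.array([-70.0, -60.0, -50.0, -40.0, -30.0, -20.0, -10.0])
slope = np.array([0.05, 0.08, 0.10, 0.18, 0.25, 0.38, 0.45])

# Quadratic model: slope ~ a*Days^2 + b*Days + c
coeffs = np.polyfit(days, slope, deg=2)
fitted = np.polyval(coeffs, days)

# Unadjusted R^2 of the quadratic fit.
ss_res = float(np.sum((slope - fitted) ** 2))
ss_tot = float(np.sum((slope - slope.mean()) ** 2))
r2 = 1.0 - ss_res / ss_tot
print(round(r2, 3))
```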

  29. Scatter Over Time • Scatter is a measure of ‘spread’ of probability estimates. • Slight linear trend not significant. • H2 not supported.

  30. Bias Over Time • Questions were recoded so that probability ‘0’ represented a continuation of the status quo and probability ‘1’ represented a change in the status quo • Analysts were biased toward predicting a change in the status quo • Indicated by positive bias numbers • t(6) = 4.73, p < .01 • H3 not supported, BUT results were significant in the direction opposite that hypothesized. • Linear trend over time not statistically significant.
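The test on slide 30 can be sketched as a one-sample t-test of the seven bias values against zero; the values below are illustrative (the study's actual result was t(6) = 4.73):

```python
# Illustrative bias values (one per time point, n = 7); positive bias means
# forecasts leaned toward predicting a change in the status quo.
from statistics import mean, stdev
from math import sqrt

biases = [0.04, 0.06, 0.05, 0.08, 0.03, 0.07, 0.06]

# One-sample t-test against 0 with df = n - 1 = 6.
n = len(biases)
t = mean(biases) / (stdev(biases) / sqrt(n))
print(round(t, 2))
```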

  31. Lt Col James E. Kajdasz, Ph.D., USAF James.kajdasz@dia.mil The views expressed in this presentation are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.
