"Let's leave that up to the computer" Chris Snijders

Chris Snijders (Prof. dr. C.C.P. Snijders), Dept. of Technology Management, Technische Universiteit Eindhoven, Postbus 513, 5600 MB Eindhoven, c.c.p.snijders@tue.nl

Overview • Case: Cook County Hospital • The science behind it • The computer as a decision maker

Case: Cook County Hospital • Emergency Department • 250,000 patients per year • many people without insurance • not enough rooms, overworked staff • 1996: Brendan Reilly becomes director • (see Gladwell, 2005)

Problem 1: acute chest pain. Diagnosis through: blood pressure, stethoscope (fluid in the lungs), how long have you been experiencing pain, how does it feel precisely, where does it hurt, does it always hurt or only when you exercise, have you had heart problems before, how about your cholesterol, do you have diabetes, let's look at your ECG, are there any heart problems in the family, do you use drugs, how old are you, are you in shape, do you smoke, do you drink, check appearance (stressed, overweight, ...). • High risk: 8 • Medium risk: 12 • Go home • 30 p/day

Reilly finds Goldman: based on 10,000 cases, only 4 things matter: • ECG • blood pressure • fluid in the lungs • "unstable angina"
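To make "only 4 things matter" concrete, here is a minimal sketch of a triage rule that looks at nothing but these four cues. The slide does not say how Goldman combines them: the field names, the unspecified "low blood pressure" threshold and the mapping onto the three categories from the previous slide are hypothetical, and this is not Goldman's actual decision tree.

```python
from dataclasses import dataclass


@dataclass
class ChestPainPatient:
    # Hypothetical binary encoding of the four cues named on the slide.
    abnormal_ecg: bool
    low_blood_pressure: bool  # threshold left unspecified (assumption)
    fluid_in_lungs: bool
    unstable_angina: bool


def triage(p: ChestPainPatient) -> str:
    """Hypothetical rule: the ECG plus a count of the other three cues."""
    risk_factors = sum([p.low_blood_pressure, p.fluid_in_lungs, p.unstable_angina])
    if p.abnormal_ecg and risk_factors >= 2:
        return "high risk"
    if p.abnormal_ecg or risk_factors >= 1:
        return "medium risk"
    return "go home"


print(triage(ChestPainPatient(True, True, True, False)))    # high risk
print(triage(ChestPainPatient(False, False, False, False)))  # go home
```

The point of such a rule is not the particular thresholds but that it is fixed: every physician who applies it gives the same answer for the same patient.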

Great! So let's do that! Or not... Implementation: … physicians protest … A test: 20 cases were given to several physicians. Hardly any agreement between physicians!
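The slide does not say how agreement was measured. One simple option, assumed here, is raw pairwise agreement: for every pair of physicians, the share of cases they put in the same category. The ratings below are hypothetical; chance-corrected measures such as Cohen's or Fleiss' kappa would be stricter.

```python
from itertools import combinations


def pairwise_agreement(ratings):
    """ratings: one list of triage labels per physician, same cases in the same order.
    Returns the share of (physician pair, case) combinations with identical labels."""
    agree = total = 0
    for r1, r2 in combinations(ratings, 2):
        for x, y in zip(r1, r2):
            agree += x == y
            total += 1
    return agree / total


# Hypothetical labels from three physicians for four cases.
ratings = [
    ["high", "medium", "go home", "medium"],
    ["medium", "medium", "high", "go home"],
    ["high", "go home", "go home", "medium"],
]
print(round(pairwise_agreement(ratings), 2))  # 0.33
```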

Reilly tests Goldman’s idea: physicians 82% vs. Goldman’s scheme 95%

A literature check … Clinical versus statistical prediction. For instance (see Grove et al., 2000): • Survival probabilities in medical procedures • Probability of recidivism • Probability of success of start-up firms • Choice of job candidates • Diagnosing schizophrenia • Predicting school success • …

The results … Over 160 studies. When given the same information, the number of cases in which the expert wins = ??

… and that is understandable: • Our memory fools us (Wagenaar) • “Dealing with probabilities / base rate neglect” (Bar-Hillel) • “We emphasize the improbable” (Stickler) • Confirmation bias (Edwards, Wason) • Mental sets (Redelmeier, Tversky) • Hindsight bias (Fischhoff) • Cognitive dissonance (Festinger)

And there are more of these "Mental Floating Frankfurters"

Restriction 2: Memory “Where were you, when …” Shuttle Columbia Crew Lost Feb. 1, 2003

Restriction 3: the “availability heuristic”. What is more likely, a plane crash or a car crash?

Restriction 4: dealing with probabilities. Suppose a manager has good business intuition: • when a problem will arise, he gets a gut feeling that something is wrong with probability 90% • when no problem will arise, he gets a gut feeling that something is wrong with probability 10%. On average, there is a problem in 5% of the transactions. The manager starts a transaction, and he gets a gut feeling that something might be wrong. What is the probability that something is indeed wrong?
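Bayes' rule gives the answer. A gut feeling occurs with probability 0.90 · 0.05 + 0.10 · 0.95 = 0.14, and only 0.045 of that comes from transactions that really have a problem, so the probability is 0.045 / 0.14 ≈ 0.32: roughly one in three, not 90%. A minimal sketch of the calculation:

```python
def posterior(prior, hit_rate, false_alarm_rate):
    """P(problem | gut feeling) by Bayes' rule."""
    # Total probability of a gut feeling: true alarms plus false alarms.
    p_gut_feeling = hit_rate * prior + false_alarm_rate * (1 - prior)
    return hit_rate * prior / p_gut_feeling


print(round(posterior(prior=0.05, hit_rate=0.90, false_alarm_rate=0.10), 3))  # 0.321
```

The low base rate (5%) is what drags the answer down: most gut feelings are false alarms on the 95% of transactions without a problem.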

Restriction 4: dealing with probabilities. A murder has been committed. The only evidence available is DNA, found at the murder scene. DNA research shows a match with your DNA. The probability that two persons are diagnosed as having the same DNA is about 1 in 100,000. How likely is it that you are the murderer?
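The answer depends on a prior the slide leaves open: how many people could have left that DNA. A minimal sketch, assuming N equally likely candidates, exactly one of whom is the murderer (and certainly matches), while each innocent candidate matches by chance with probability 1/100,000. The pool sizes below are hypothetical.

```python
def prob_murderer_given_match(n_candidates, match_prob):
    # Prior 1/N for you; each of the N - 1 innocent candidates matches with probability match_prob.
    # Posterior = (1/N) / ((1/N) + ((N - 1)/N) * match_prob) = 1 / (1 + (N - 1) * match_prob)
    return 1 / (1 + (n_candidates - 1) * match_prob)


for n in (1_000, 100_000, 1_000_000):  # hypothetical pools of possible sources
    print(n, round(prob_murderer_given_match(n, 1 / 100_000), 3))
```

With a thousand possible sources the match is strong evidence (about 99%); with a million possible sources, about ten innocent people are expected to match as well, and the probability drops to about 9%.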

Restriction 5: overconfidence. Trivial Pursuit: estimate how many questions you will know. Estimates are generally too high ... and this gets worse with expertise!

Restriction 6: finding non-existent patterns

Restriction 7: the noble art of finding a broken leg

Decision making = Store, retrieve, combine

Our own experiments: purchasing managers vs. the computer

A database of purchasing transactions • Collected at Utrecht and Eindhoven Universities • Since 1995 • >4000 purchasing transactions • >1500 firms • 300 characteristics per transaction

The role of the purchasing manager [diagram: purchasing transaction to be started → purchasing transaction completed] By the way, we chose economics because it is relatively rare in this kind of research.

Main idea • Purchasing professionals should be able to judge, for a given purchasing transaction, how likely it is that problems will arise. • The test: we present purchasing professionals with a set of purchasing transactions. The professionals then judge how likely it is that problems occur. • But … these transactions were sampled from our database of purchasing transactions. For all these transactions, we know which and how many problems occurred. • We can therefore compare the predictions of the professionals with what actually happened.
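One way to compare the professionals' predictions with what actually happened, assumed here, is the Pearson correlation between each judged likelihood of problems and the number of problems that actually occurred. The numbers below are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between predictions xs and observed outcomes ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: a professional's judged likelihood of problems (0-100)
# and the number of problems actually recorded for the same transactions.
predicted = [20, 70, 40, 90, 10, 50]
actual_problems = [0, 1, 2, 3, 0, 1]
print(round(pearson(predicted, actual_problems), 2))
```

A model's predictions for the same transactions can be scored in exactly the same way, which puts professionals and model on one scale.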

Lay vs. computer vs. student vs. manager [figure: grades on our test, using cases from the database: 6.2, 6.2, 6.9 and 5.4]

Model-based prediction • based on cross-validated “Multivariate Adaptive Regression Splines” (of order 1)
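MARS of order 1 builds an additive model (no interactions) out of hinge functions max(0, x − t) and max(0, t − x). A full MARS fit chooses its knots in a forward pass and prunes terms in a backward pass; the sketch below skips both and uses fixed knots at the quartiles of each cue (an assumption), only to show the hinge basis and the cross-validation loop. The data are synthetic.

```python
import numpy as np


def quartile_knots(X):
    """Candidate knots per column: the 25th, 50th and 75th percentiles (assumption)."""
    return [np.quantile(X[:, j], [0.25, 0.5, 0.75]) for j in range(X.shape[1])]


def hinge_basis(X, knots):
    """Intercept plus a mirrored pair of hinge functions per knot.
    Every term involves a single variable, i.e. an additive (order-1) model."""
    cols = [np.ones(X.shape[0])]
    for j, ks in enumerate(knots):
        for t in ks:
            cols.append(np.maximum(0.0, X[:, j] - t))
            cols.append(np.maximum(0.0, t - X[:, j]))
    return np.column_stack(cols)


def cv_mse(X, y, k=5, seed=0):
    """k-fold cross-validated mean squared error of the hinge model."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        knots = quartile_knots(X[train])  # knots from the training folds only
        coef, *_ = np.linalg.lstsq(hinge_basis(X[train], knots), y[train], rcond=None)
        pred = hinge_basis(X[test], knots) @ coef
        errors.append(np.mean((pred - y[test]) ** 2))
    return float(np.mean(errors))


# Synthetic data: only the first of three cues matters, with a kink at 0.5.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(200, 3))
y = 2 * np.maximum(0, X[:, 0] - 0.5) + rng.normal(0, 0.01, 200)
print(cv_mse(X, y))
```

Cross-validation matters here because the prediction error is always measured on transactions the model did not see while fitting.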

A set of experiments • Original test (91 participants) • I&L test 1. A call in the Dutch “Tijdschrift voor inkoop en logistiek” (72 participants; web version) • I&L test 2. After publication of the results in I&L, we left the website version open (72 additional participants) • MC. MasterClass in Strategic Purchasing and Supply Management, Corsendonk, Belgium (13 participants) • Rep03. Replication of the original experiment, including a small extension (118 participants) • PC03. 249 PanelClix participants (opt-in panel in NL) • IK05. 148 participants (48 professionals) as part of a class assignment in Information Sciences

Extensions (1) • “This is not what I do in my regular job”: ask before, during and after the test whether professionals feel they are doing a good job. • “I would have done better if I had had more information”: use either 7 or 14 pieces of information. • “I usually perform better under pressure”: do it again, now with pressure; do it again, with a time constraint.

Extensions (2) • Accountability: have purchasing managers - explain to the interviewer why they decide as they do - decide in groups of two • How about differences between persons? - younger professionals perform a bit better - those who “decide by instinct” perform a bit worse

The risk of experience [figure: trust and test score plotted against experience] … trust increases with experience, … but the test score does not ...

Seasoned vs. young pro. Experienced: • Relatively crude • Associating and fast • Comparison with experienced ‘prototypical cases’ • Story telling. But: • Overgeneralization • May seem efficient, but is not effective. Young pro: • More ‘fine-tuning’ • Deliberate and slow • Less experienced, and thereby less bothered by ‘prototypical cases’ • More abstract • Fewer mistakes • Decisions may seem less efficient, but are relatively effective. "Recognition Primed Decision making" (Klein)

Improving experts • Expert system 1: averaging per case across professionals. For each case, use the average score of the experts. Increase from 0.19 → 0.23, still < 0.32. • Expert system 2: averaging within professionals, across cases. For each expert, calculate the implicit model they are using, and use that model to predict again (based on volume, length of relation, ability to judge price/quality). No improvement.
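Both expert systems are easy to sketch. The first averages, per case, the scores of all professionals; the second (known as "bootstrapping" the judge) fits each professional's own judgments to the case cues and then uses that fitted model in place of the professional. That the implicit model is linear in the three cues, and the array layout, are assumptions here.

```python
import numpy as np


def expert_system_1(judgments):
    """Average, per case, the scores given by all professionals.
    judgments: array of shape (n_experts, n_cases)."""
    return np.asarray(judgments, dtype=float).mean(axis=0)


def expert_system_2(judgments, cues):
    """For each professional, regress their own judgments on the case cues
    (a linear implicit model is an assumption) and return the fitted values.
    cues: array of shape (n_cases, n_cues), e.g. volume, length of relation,
    ability to judge price/quality."""
    judgments = np.asarray(judgments, dtype=float)
    X = np.column_stack([np.ones(len(cues)), cues])  # intercept + cues
    fitted = []
    for own_scores in judgments:
        beta, *_ = np.linalg.lstsq(X, own_scores, rcond=None)
        fitted.append(X @ beta)  # the expert's model, applied to the same cases
    return np.array(fitted)
```

The idea behind expert system 2 is that the fitted model keeps an expert's policy but drops the case-to-case inconsistency; on this slide that did not help.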

Improving experts (2) • Averaging professionals with the computer model (50% pro, 50% model): increase from 0.19 → 0.27, still < 0.32. • Combining expert system 2 with the computer model (50% expert system 2, 50% model): increase from 0.19 → 0.36 > 0.32 (p = 0.024, 1-sided).
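A 50/50 combination needs both predictions on a common scale. A minimal sketch, assuming both are standardized (z-scores) before averaging and that the scores on the slide are correlations with actual outcomes; neither assumption is stated on the slide.

```python
import numpy as np


def zscore(v):
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()


def combine(expert_scores, model_scores, weight_expert=0.5):
    """Weighted average of standardized expert and model predictions (50/50 by default)."""
    return weight_expert * zscore(expert_scores) + (1 - weight_expert) * zscore(model_scores)


def validity(predictions, outcomes):
    """Correlation between predictions and what actually happened."""
    return float(np.corrcoef(predictions, outcomes)[0, 1])
```

The same `combine` works for both rows of the slide: pass the raw professional scores, or the fitted values of expert system 2, as `expert_scores`.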

Kinds of e-purchasing • Decrease tedious paperwork • Viewing your partners’ supplies • Online marketplaces [auctioning] • Structuring decisions … but there is an opportunity for … • ... letting the computer decide! [real business intelligence]

For which kinds of decisions? [diagram: purchasing to be done → purchasing done] Criteria: 1) quantitative, 2) data available, or willingness to collect it, 3) recurring decisions. From those, choose the ones that are most important to you. • How do I choose my purchasing team (how many persons, which levels, etc.)? • How much time do I invest in writing a detailed list of requirements? • How shall I choose my supplier? • What kind of contract shall I use (tailor-made, standard, ...)? • How much time and effort am I going to put into this transaction to make sure that it runs smoothly? • Which other ways of making sure that this transaction runs smoothly shall I use? • ….

Beware. So purchasing managers / experts are no longer necessary? [diagram: e-decisions, other decisions, other tasks]

Experts decide in different ways • are more selective in their search for info • store information faster • use less information, and often this information is combined in a non-linear way • have a more active pattern of contingent search: they consider subsets of variables, different subsets in each case • compare given information with the information in their knowledge base … • … but choices are often based on over-generalization of specific cases • use more “broken-leg cues”

… but not better “process performance paradox”: in a large number of tasks, the experts decide in different ways, but not better than those with a minimal amount of training. (cf. Camerer and Johnson, 1991)

Do not take this to extremes...

Models in, intuition out? No. The RPD model works well for specific tasks, and you need to train your intuition properly: • deliberate practice • compile an extensive experience bank • obtain accurate and timely feedback • enrich experiences by reviewing prior ones

Gigerenzer: fast and frugal • Humans use “fast and frugal” strategies (Simon: bounded rationality, satisficing) or “Probabilistic Mental Models”. By employing simple decision strategies, individuals with limited knowledge are able to arrive at predictions that are equally correct as, or even more correct than, those of individuals with extensive knowledge. • Gigerenzer’s experiment: which city has the most inhabitants, Arnhem or Eindhoven? • Try to see how such fast-and-frugal strategies can play a role in the experiments

Example: #homeless. Predict which of 2 US cities has the most homeless people, based on: • vacancy rate • temperature • unemployment rate • poverty rate • public housing [all binary variables, cut at the median]

Strategies [training set ≠ test set] • Minimalist - • Take-the-best 63 • Dawes’ Rule 61 • Multiple regression 60 • Bayesian network 65
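The three simplest strategies in this list fit in a few lines each. A sketch, assuming binary cues from a median split and coded so that 1 points towards "more homeless people"; the cue keys are hypothetical labels for the cues listed on the previous slide, the city values are made up, and the cue order for take-the-best (normally sorted by cue validity) is simply assumed.

```python
import random
from statistics import median


def median_split(values):
    """Binarize a cue: 1 if above the median, else 0."""
    m = median(values)
    return [1 if v > m else 0 for v in values]


def take_the_best(a, b, cues_by_validity):
    """Check cues from most to least valid; decide on the first cue that discriminates."""
    for cue in cues_by_validity:
        if a[cue] != b[cue]:
            return "a" if a[cue] > b[cue] else "b"
    return "guess"


def minimalist(a, b, cues, rng):
    """Like take-the-best, but the cues are checked in random order."""
    order = list(cues)
    rng.shuffle(order)
    return take_the_best(a, b, order)


def dawes_rule(a, b, cues):
    """Unit weights: the city with more positive cues wins."""
    score = sum(a[c] for c in cues) - sum(b[c] for c in cues)
    return "a" if score > 0 else "b" if score < 0 else "guess"


# Hypothetical cue order (assumed validity ranking) and hypothetical city profiles.
CUES = ["unemployment", "poverty", "temperature", "vacancy_rate", "public_housing"]
city_a = {"unemployment": 1, "poverty": 0, "temperature": 1, "vacancy_rate": 0, "public_housing": 0}
city_b = {"unemployment": 1, "poverty": 1, "temperature": 0, "vacancy_rate": 1, "public_housing": 1}
print(take_the_best(city_a, city_b, CUES))  # b: poverty is the first cue that differs
print(dawes_rule(city_a, city_b, CUES))  # b: 2 positive cues vs. 4
print(minimalist(city_a, city_b, CUES, random.Random(0)))
```

Take-the-best and Minimalist stop at the first discriminating cue and ignore everything else, which is what makes them "frugal"; Dawes' rule looks at all cues but does not weight them.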

Fast-and-frugal design: left / same / right • Additional factor: time pressure • Participants: 48 purchasing pros, 50 students, 50 super-laypeople

Results (1) [figure: all cases; only extremes]

Results (2)

Results (3) (only using the larger differences): • Perfect 100 • Model 60 • Take-the-best 56 • Linear regression 43 • Unit regression (std) 43 • Random 33

Some conclusions • In our managerial context, model-based decision making clearly outperforms professional decision making • Good model scores can be achieved by fast-and-frugal strategies... • ... but the humans in our experiment clearly did not use this kind of logic

Implications • one might be able to teach people to improve their judgment(s) • in actual managerial practice, there might often be no need to develop fancy models to get a close-to-optimal improvement • decision support by creating expert systems that mimic experts makes no sense in this context • To get this implemented, you need two things: a Goldman and a Reilly
