Bayesian Networks: why smart data is better than big data. Bayesian Seminar, 16 October 2015. Norman Fenton, Queen Mary University of London and Agena Ltd. Outline: From Bayes to Bayesian networks; Why pure machine learning is insufficient; Applications; Way forward.
Introducing Bayes. We have a hypothesis H (Person has disease?). We get some evidence E (Positive test?). The disease affects 1 in 1,000 people. The test is 100% accurate for those with the disease and 95% accurate for those without. What is the probability a person has the disease if they test positive?
Bayes Theorem. We have a prior P(H) = 0.001. We know the (likelihood) values for P(E|H). But we want the posterior P(H|E):
$$P(H|E) = \frac{P(E|H)\,P(H)}{P(E)} = \frac{P(E|H)\,P(H)}{P(E|H)\,P(H) + P(E|\text{not } H)\,P(\text{not } H)}$$
$$P(H|E) = \frac{1 \times 0.001}{1 \times 0.001 + 0.05 \times 0.999} = \frac{0.001}{0.05095} \approx 0.0196 \approx 2\%$$
Waste of time showing this to most people!!!
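The arithmetic on this slide can be checked with a few lines of Python. This is only a minimal sketch: the function name `posterior` and its parameter names are illustrative choices, not something from the talk; the numbers are the ones given on the slides (prior 1 in 1,000, no false negatives, 5% false positives).

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Return P(H|E) for a binary hypothesis H using Bayes' theorem."""
    numerator = p_e_given_h * prior
    # P(E) expanded over the two cases: H true and H false
    evidence = numerator + p_e_given_not_h * (1.0 - prior)
    return numerator / evidence


# Values from the slides: P(H) = 0.001, P(E|H) = 1, P(E|not H) = 0.05
p = posterior(prior=0.001, p_e_given_h=1.0, p_e_given_not_h=0.05)
print(f"P(H|E) = {p:.4f}")  # about 0.0196, i.e. roughly 2%
```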
Imagine 1,000 people
One has the disease
But about 5% of the remaining 999 people without the disease test positive. That is about 50 people.
So about 1 out of 50 who test positive actually have the disease. That’s about 2%. That’s very different from the 95% assumed by most medics.
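The "imagine 1,000 people" argument can be written as expected head counts instead of probabilities. The sketch below assumes the rates stated on the earlier slides; the variable names are illustrative only.

```python
population = 1000

# Expected head counts under the rates given on the slides
with_disease = population * 0.001            # 1 person
without_disease = population - with_disease  # 999 people

true_positives = with_disease * 1.0          # test never misses the disease
false_positives = without_disease * 0.05     # 5% of healthy people test positive

share_with_disease = true_positives / (true_positives + false_positives)

print(f"Expected false positives: {false_positives:.2f}")
print(f"Share of positives who have the disease: {share_with_disease:.1%}")
```

Counting people this way gives the same answer as the formula, which is the point of the slides: the frequency framing is easier to follow than the algebra.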
A more realistic scenario. This is a Bayesian network. Nodes: Cause 1, Cause 2; Disease X, Disease Y, Disease Z; Symptom 1, Symptom 2; Test A, Test B. The necessary Bayesian propagation calculations quickly become extremely complex.
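To give a feel for what "propagation" means, here is a minimal sketch of exact inference by brute-force enumeration on a three-node network (Disease → Test, Disease → Symptom). It is much smaller than the network on the slide, and the symptom probabilities are hypothetical values invented for illustration; only the disease prior and test accuracy come from the earlier slides. Enumeration cost doubles with every extra binary node, which is one reason the calculations become complex so quickly.

```python
from itertools import product

# From the slides: prior and test accuracy
P_D = {True: 0.001, False: 0.999}
P_T_GIVEN_D = {True: 1.0, False: 0.05}  # P(Test positive | Disease)
# HYPOTHETICAL values, chosen only for illustration
P_S_GIVEN_D = {True: 0.9, False: 0.1}   # P(Symptom present | Disease)


def joint(d, t, s):
    """Joint probability P(D=d, T=t, S=s) from the network factorisation."""
    p_t = P_T_GIVEN_D[d] if t else 1.0 - P_T_GIVEN_D[d]
    p_s = P_S_GIVEN_D[d] if s else 1.0 - P_S_GIVEN_D[d]
    return P_D[d] * p_t * p_s


def query_disease(evidence):
    """P(D=True | evidence), summing the joint over unobserved variables.

    `evidence` maps "T" and/or "S" to an observed True/False value.
    """
    totals = {True: 0.0, False: 0.0}
    for d, t, s in product([True, False], repeat=3):
        assignment = {"T": t, "S": s}
        if all(assignment[name] == value for name, value in evidence.items()):
            totals[d] += joint(d, t, s)
    return totals[True] / (totals[True] + totals[False])


print(f"P(D | T+)     = {query_disease({'T': True}):.4f}")
print(f"P(D | T+, S+) = {query_disease({'T': True, 'S': True}):.4f}")
```

With a positive test alone the result matches the 2% figure from the earlier slides; adding the (hypothetical) symptom observation raises it further.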
The usual big mistake. Nodes: Combined Hypothesis; Combined Evidence/data.
The Barry George case. Nodes: George fired gun; Evidence.
Late 1980s breakthrough: Pearl; Lauritzen and Spiegelhalter.
How to develop complex models. Can we really LEARN this kind of model from data?
Idioms: Definitional idiom; Cause consequence idiom; Induction idiom; Measurement idiom.
Bayesian net objects.
Ranked nodes.
BN model learnt purely from data. Nodes: Age, Brain scan result, Injury type, Outcome, Delay in arrival, Arterial pressure, Pupil dilation.
Regression model learnt purely from data. Nodes: Brain scan result, Injury type, Arterial pressure, Delay in arrival, Pupil dilation, Age, Outcome.
Expert causal BN with hidden explanatory and intervention variables. Nodes: Arterial pressure, Injury type, Brain scan result, Delay in arrival, Pupil dilation, Seriousness of injury, Age, Ability to recover, Treatment, Outcome.
Danger of pure data driven decision making: Example of a Bank database on loans
Other examples. Massive databases cannot learn even tiny models. The massive shadow cast by Simpson’s paradox. See: www.probabilityandlawblogspot.co.uk
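Simpson's paradox is easy to reproduce with a small table. The counts below are illustrative only and are not taken from the talk: treatment A has the higher success rate inside each subgroup, yet treatment B looks better once the subgroups are pooled, because the two treatments were applied to the subgroups in very different proportions.

```python
# ILLUSTRATIVE counts (successes, patients); not data from the talk
counts = {
    "group 1": {"A": (81, 87), "B": (234, 270)},
    "group 2": {"A": (192, 263), "B": (55, 80)},
}


def rate(successes, total):
    """Success rate as a fraction."""
    return successes / total


# Within each subgroup, A has the higher success rate
for group, by_treatment in counts.items():
    a = rate(*by_treatment["A"])
    b = rate(*by_treatment["B"])
    print(f"{group}: A = {a:.3f}, B = {b:.3f}")

# Pooling the subgroups reverses the ranking
overall = {}
for treatment in ("A", "B"):
    successes = sum(counts[g][treatment][0] for g in counts)
    total = sum(counts[g][treatment][1] for g in counts)
    overall[treatment] = rate(successes, total)
print(f"overall: A = {overall['A']:.3f}, B = {overall['B']:.3f}")
```

A model learnt purely from the pooled table would prefer B; knowing that the subgroup variable matters causally is what lets a modeller avoid that conclusion.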
Final prediction www.pi-football.com Constantinou, A., N. E. Fenton and M. Neil (2013): "Profiting from an Inefficient Association Football Gambling Market: Prediction, Risk and Uncertainty Using Bayesian Networks". Knowledge-Based Systems. Vol 50, 60-86
Trauma Care Case Study • QM RIM Group • The Royal London Hospital • US Army Institute of Surgical Research