Bayesian Networks: Why Smart Data Is Better Than Big Data Seminar

Bayesian Networks: why smart data is better than big data. Bayesian Seminar, 16 October 2015. Norman Fenton, Queen Mary University of London and Agena Ltd

Outline: From Bayes to Bayesian networks; Why pure machine learning is insufficient; Applications; Way forward

From Bayes to Bayesian networks

Introducing Bayes. We have a hypothesis H (person has the disease?) and we get some evidence E (positive test?). The disease affects 1 in 1,000 people. The test is 100% accurate for those with the disease and 95% accurate for those without. What is the probability a person has the disease if they test positive?

Imagine 1,000 people

One has the disease

But about 5% of the remaining 999 people without the disease test positive. That is about 50 people.

So about 1 out of 50 who test positive actually have the disease. That’s about 2%. That’s very different from the 95% assumed by most medics.
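
The same answer can be checked directly with Bayes' theorem. The sketch below is only an illustration of the arithmetic on the preceding slides; the function name posterior and its parameter names are assumptions introduced here, not part of the seminar.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return P(H | E) for a binary hypothesis H given a positive test E."""
    # P(E and H): person has the disease and tests positive
    true_positive = prior * sensitivity
    # P(E and not H): person does not have the disease but tests positive
    false_positive = (1 - prior) * false_positive_rate
    # Bayes' theorem: P(H | E) = P(E and H) / P(E)
    return true_positive / (true_positive + false_positive)


# Figures from the slides: 1 in 1,000 prevalence, 100% accurate for those
# with the disease, 95% accurate for those without (5% false positives).
p = posterior(prior=1 / 1000, sensitivity=1.0, false_positive_rate=0.05)
print(f"P(disease | positive test) = {p:.4f}")  # about 0.0196, i.e. roughly 2%
```

The exact figure is 1 in about 51 rather than 1 in 50, because the one true positive is also counted among those who test positive.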

A more realistic scenario. This is a Bayesian network. Nodes: Cause 1, Cause 2, Disease X, Disease Y, Disease Z, Symptom 1, Symptom 2, Test A, Test B. The necessary Bayesian propagation calculations quickly become extremely complex.

The usual big mistake. Diagram nodes: Combined Hypothesis; Combined Evidence/data

The Barry George case

The Barry George case. Diagram nodes: Evidence; George fired gun

Late 1980s breakthrough: Pearl; Lauritzen and Spiegelhalter

A Classic BN

Marginals

Dyspnoea observed

Also non-smoker

Positive x-ray

...but recent visit to Asia
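
The sequence of slides above (marginals, then dyspnoea, non-smoker, positive x-ray, visit to Asia) can be reproduced in miniature by brute-force enumeration of the joint distribution. The sketch below is a rough illustration only. The slides do not list the probability tables, so the values used here are an assumption: they are the ones commonly quoted for the Lauritzen and Spiegelhalter "Asia" example. The node names and the helpers bern, joint and query are likewise introduced here for illustration. Real BN tools use far more efficient propagation algorithms than enumeration.

```python
from itertools import product

# ASSUMED probability tables (commonly quoted "Asia" values, not from the slides)
P_ASIA = 0.01                             # P(recent visit to Asia)
P_SMOKER = 0.5                            # P(smoker)
P_TB = {True: 0.05, False: 0.01}          # P(tuberculosis | visit to Asia)
P_CANCER = {True: 0.10, False: 0.01}      # P(lung cancer | smoker)
P_BRONC = {True: 0.60, False: 0.30}       # P(bronchitis | smoker)
P_XRAY = {True: 0.98, False: 0.05}        # P(positive x-ray | tb or cancer)
P_DYSP = {                                # P(dyspnoea | tb or cancer, bronchitis)
    (True, True): 0.9, (True, False): 0.7,
    (False, True): 0.8, (False, False): 0.1,
}

NAMES = ("asia", "smoker", "tb", "cancer", "bronchitis", "xray", "dyspnoea")


def bern(p, value):
    """Probability of a Boolean value when P(True) = p."""
    return p if value else 1.0 - p


def joint(w):
    """Joint probability of one full assignment w (dict of name -> bool)."""
    either = w["tb"] or w["cancer"]       # deterministic "TB or cancer" node
    return (bern(P_ASIA, w["asia"])
            * bern(P_SMOKER, w["smoker"])
            * bern(P_TB[w["asia"]], w["tb"])
            * bern(P_CANCER[w["smoker"]], w["cancer"])
            * bern(P_BRONC[w["smoker"]], w["bronchitis"])
            * bern(P_XRAY[either], w["xray"])
            * bern(P_DYSP[(either, w["bronchitis"])], w["dyspnoea"]))


def query(target, evidence=None):
    """P(target = True | evidence), summing over all 2**7 assignments."""
    evidence = evidence or {}
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(NAMES)):
        w = dict(zip(NAMES, values))
        if any(w[name] != value for name, value in evidence.items()):
            continue                      # inconsistent with the evidence
        p = joint(w)
        denominator += p
        if w[target]:
            numerator += p
    return numerator / denominator


# Evidence entered in the same order as the slides
steps = [
    ("Marginals", {}),
    ("Dyspnoea observed", {"dyspnoea": True}),
    ("Also non-smoker", {"dyspnoea": True, "smoker": False}),
    ("Positive x-ray", {"dyspnoea": True, "smoker": False, "xray": True}),
    ("Recent visit to Asia",
     {"dyspnoea": True, "smoker": False, "xray": True, "asia": True}),
]
for label, evidence in steps:
    beliefs = {d: round(query(d, evidence), 3)
               for d in ("tb", "cancer", "bronchitis")}
    print(label, beliefs)
```

Each printed line shows how the beliefs in the three diseases shift as a new observation is entered, which is the behaviour the slides demonstrate graphically.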

How to develop complex models. Can we really LEARN this kind of model from data?

How to develop complex models: Idioms. Definitional idiom; Cause-consequence idiom; Induction idiom; Measurement idiom

How to develop complex models: Bayesian net objects

How to develop complex models: Ranked nodes

Static discretisation: marginals

Dynamic discretisation: marginals

Static discretisation with observations

Dynamic discretisation with observations

Why pure machine learning is insufficient

A typical data-driven study

BN model learnt purely from data. Nodes: Age, Brain scan result, Injury type, Outcome, Delay in arrival, Arterial pressure, Pupil dilation

Regression model learnt purely from data. Nodes: Brain scan result, Injury type, Arterial pressure, Delay in arrival, Pupil dilation, Age, Outcome

Expert causal BN with hidden explanatory and intervention variables. Nodes: Arterial pressure, Injury type, Brain scan result, Delay in arrival, Pupil dilation, Seriousness of injury, Age, Ability to recover, Treatment, Outcome

Danger of purely data-driven decision making: example of a bank database on loans

Other examples. Massive databases cannot learn even tiny models. The massive shadow cast by Simpson’s paradox. See: www.probabilityandlawblogspot.co.uk
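
Simpson's paradox is easy to demonstrate with a few counts. The sketch below is a minimal illustration, assuming made-up (successes, trials) figures that are not taken from the seminar; the treatment labels "A" and "B", the subgroup labels and the helper names are all hypothetical. Treatment A does better within each subgroup, yet B looks better once the subgroups are pooled, because A was given mostly to the severe cases.

```python
# ILLUSTRATIVE counts only: (successes, trials) per treatment and subgroup
counts = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}


def subgroup_rate(treatment, subgroup):
    """Success rate of one treatment within one subgroup."""
    successes, trials = counts[treatment][subgroup]
    return successes / trials


def pooled_rate(treatment):
    """Success rate of one treatment with the subgroups merged."""
    successes = sum(s for s, _ in counts[treatment].values())
    trials = sum(n for _, n in counts[treatment].values())
    return successes / trials


for subgroup in ("mild", "severe"):
    print(subgroup,
          f"A={subgroup_rate('A', subgroup):.3f}",
          f"B={subgroup_rate('B', subgroup):.3f}")
print("pooled", f"A={pooled_rate('A'):.3f}", f"B={pooled_rate('B'):.3f}")
```

A model learnt from the pooled table alone would recommend B. Only the causal knowledge that severity influences both the choice of treatment and the outcome reveals that the data should be conditioned on severity.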

Applications

Legal arguments and forensics

Football prediction overview

Parameter learning from past data

Game specific information

Taking account of fatigue

Incorporating recent match data

Final prediction

Final prediction. www.pi-football.com. Constantinou, A., N. E. Fenton and M. Neil (2013): "Profiting from an Inefficient Association Football Gambling Market: Prediction, Risk and Uncertainty Using Bayesian Networks". Knowledge-Based Systems, Vol. 50, 60-86

Trauma Care Case Study • QM RIM Group • The Royal London Hospital • US Army Institute of Surgical Research

Improving on MESS Score method

Life Saving: Prediction of Physiological Disorders

Limb Saving: Prediction of Limb Viability

www.traumamodels.com
