Confidence in neural networks: methodological issues arising from a review of safety-related applications. P.J.G. Lisboa, p.j.lisboa@livjm.ac.uk. Computing and Mathematical Sciences, Liverpool John Moores University.
Outline • Developments in commercial safety-related systems comprising artificial neural networks • Increasing demand for decision support e.g. healthcare • Where is the evidence of healthcare benefit from ANNs? • Framework for assuring confidence in neural networks • Design assurance • Risk analysis • Evidence of effectiveness • Methodological issues arising from the review
Fire alarm for office blocks • Siemens FP11 • FirePrint Technology • Very high specificity
Commercial safety-related systems • Automotive industry • Tow-by-wire: NACT (http://www.mech.gla.ac.uk/~nact/nact.html) • Fuel injection: FAMIMO (http://iridia.ulb.ac.be/~famimo/) • Electronic ABS: H2C (http://www.control.lth.se/H2C/) Lisboa, P.J.G. ‘Industrial use of safety-related artificial neural networks’ HSE CR, 2001 http://www.hse.gov.uk/crr_pdf/crr01327.pdf
Papnet • Cytology screening • FDA approved for secondary screening • Proven sensitivity • Specificity left to user • Cost-effective
Epidemiology of medical error • In the US 44,000 - 98,000 preventable deaths attributed to medical errors (Weingart et al, BMJ 2000) • Exceeds combined toll from • motor crashes • suicides • falls • poisonings • drowning
Epidemiology of medical error: managing error • The just-world hypothesis • Systemic approaches (Reason, BMJ 2000)
Randomised Controlled Trials (table columns: Oncology, Critical care, Cardiology, Other)
• Diagnosis and staging: Prostatic cancer (2). Cervical cancer. Breast cancer. Acute leukemia. Intracranial haemorrhage in neonates. Acute Myocardial Infarction (2). Appendicitis.
• Outcome prediction: Response to therapy in head & neck cancer. Recurrence of breast cancer in axillary node-negative patients. Length-of-stay in preterm neonates. Tacrolimus blood levels. Effect of treatment in schizophrenia and depression. Rib fracture injury.
• Radiology: MRI of osteosarcoma. Perfusion scintigraphy for detection of coronary stenosis. Doppler microembolic signal counts in patients with prosthetic heart valves.
• Physiological monitoring: Fetal surveillance during labour from fetal ECG.
Lisboa, P.J.G. ‘A review of evidence of health benefit from artificial neural networks in medical intervention’, Neural Networks, 15, 1, 9-37, 2002.
Clinical Trials (table columns: Oncology, Critical care, Cardiology, Neurology, Other)
• Diagnosis and staging: Cervical cancer (3). Pre-cancerous breast. Transient ischaemia. Acute ischaemia. Embolus detection in stroke. Spontaneous EEG. Sleep EEG (2). Quantitative EEG. Ventilation mode recognition. Referral methods for patients with third molars. Bladder outlet obstruction. Tear protein patterns. Haemodialysis. Ovulation time. Pure tone thresholds.
• Outcome prediction: Multiple myeloma. Stone growth after lithotripsy.
• Radiology: Myocardial perfusion images. Detection of stenoses from Doppler u/s waveforms. MRS of epilepsy. PET of 5-HT reuptake sites. PET in Alzheimer’s. MRS of muscle.
• Physiological monitoring: EEG in Pediatrics. Single trial PVEP (2). Correlation of EEG and MEG. Lorazepam and sleep EEG. Evoked potentials in multiple-sclerosis. Oxygenation in infants. EGG of gastric emptying (2). Subcutaneous adipose tissue. Nonstress tests in obstetrics. Bone demineralization.
Prostatic cancer
• Reference: Gamito et al, 2000. No. of subjects: 4,133. Clinical function: Prediction of risk of lymph node spread (LNS) from age, race, PSA, PSA velocity, Gleason sum and TNM. Performance assessment: External validation (n=660). Results: 98% accuracy in detection of low risk of LNS with a MLP.
Cervical cancer
• Reference: Prismatic team, 1999. No. of subjects: NNA. Clinical function: Assessment as a primary screening tool for categorization of cervical smears as negative, mild, moderate or severe dyskaryosis, invasion, glandular neoplasia and borderline nuclear changes. Performance assessment: External validation (n=21,700). Results: 89.9% agreement across all classes was found between PAPNET and conventional primary screening. Similar sensitivity (82% cf. 83%), with PAPNET having improved specificity (77% cf. 42%) and faster processing (3.9 min. cf. 10.4 min.).
• Reference: Doornewaard et al, 1999. No. of subjects: NNA. Clinical function: Assessment as a primary screening tool for the early detection of cervical dysplasia. Performance assessment: External validation (n=6,063). Results: PAPNET testing has similar diagnostic value to conventional screening of Pap smears, with AUROC 95% CIs of 78-82% for control and 77-81% for PAPNET.
• Reference: Mango et al, 1998. No. of subjects: NNA. Clinical function: Comparison of yield in re-screening of node-negative PAP smears between NNA and conventional unassisted cytology. Performance assessment: External validation (n=10,000). Results: PAPNET returned a yield of 6.2% versus 0.6% for manual re-screening.
Neonates
• Reference: Zernikow et al, 1999. No. of subjects: 2,144. Clinical function: Predicting length-of-stay in preterm neonates from 40 first-day-of-life items. Performance assessment: Train/test/validation. Results: First-day-of-life data is predictive of length-of-stay of pre-term neonates, with correlation CIs of 0.85-0.90 for MLR and 0.87-0.92 for MLP.
Ischaemia
• Reference: Polak et al, 1997. No. of subjects: 1,367. Clinical function: Prediction of transient ischaemia during ambulatory Holter monitoring, from a resting 12-lead ECG. Univariate t-tests were used to inform model selection. Performance assessment: Train/test. Results: LDA and adaptive logic networks were superior to the MLP in predicting the likelihood of ischaemic episodes.
• Reference: Selker et al, 1995. No. of subjects: 3,453. Clinical function: Clinical indicators available within 10 minutes of emergency department care were used to predict AMI and unstable angina pectoris, in a real-time clinical setting. Performance assessment: External validation (n=2,320). Results: Limiting the inputs to 8 readily available variables, AUROCs for LogR, CART and MLP were 0.887, 0.858 and 0.902, respectively. Each is a clinically useful predictor of clinical outcome.
Is there evidence of clinical benefit?
• Clinician performance patient outcome
• Primary to secondary care referrals of patients with third molars (Sens. / Spec. / Acc.):
  1) Control group: 0.97 / 0.22 / 0.83
  2) Paper-based clinical algorithm: 0.56 / 0.93 / 0.73
  3) MLP-based recommendation: 0.56 / 0.79 / 0.67
• Which is the best performing system?
Is there evidence of clinical benefit?
• Clinician performance patient outcome
• Primary to secondary care referrals of patients with third molars (Sens. / Spec. / Acc.):
  1) Control group: 0.97 / 0.22 / 0.83
  2) Paper-based clinical algorithm: 0.56 / 0.93 / 0.73
  3) MLP-based recommendation: 0.56 / 0.79 / 0.67
• 1) 1.2 2) 8.0 3) 2.7
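The three figures that close the slide are not labelled in the source. To the precision shown they agree with the positive likelihood ratio, sensitivity / (1 - specificity), computed from the table. That reading is an assumption, not something the slide states; the sketch below only checks the arithmetic.

```python
# Positive likelihood ratio: LR+ = sensitivity / (1 - specificity).
# ASSUMPTION: this is the quantity behind the unlabelled figures
# 1.2, 8.0 and 2.7 on the slide; the slide itself does not name it.


def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """Return LR+ for a binary decision rule."""
    if not 0.0 <= specificity < 1.0:
        raise ValueError("specificity must be in [0, 1)")
    return sensitivity / (1.0 - specificity)


# (sensitivity, specificity) pairs taken from the third molar referral table.
systems = {
    "Control group": (0.97, 0.22),
    "Paper-based clinical algorithm": (0.56, 0.93),
    "MLP-based recommendation": (0.56, 0.79),
}

lr_plus = {
    name: positive_likelihood_ratio(sens, spec)
    for name, (sens, spec) in systems.items()
}

for name, value in lr_plus.items():
    print(f"{name}: LR+ = {value:.1f}")
```

On this measure the control group ranks last despite having the highest accuracy, which is one way to read the slide's question about the best performing system.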
Performance estimation • ROC framework • Boost factor: PPV = True positives / Predicted positives
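As a minimal sketch of the PPV formula on this slide: the first function applies it directly to counts, and the second rewrites it in terms of sensitivity, specificity and prevalence (Bayes' rule) to show how strongly PPV depends on case mix. The counts and the prevalence values are hypothetical, chosen only for illustration.

```python
# PPV = true positives / predicted positives, as on the slide.
# All numeric inputs below are hypothetical illustration values.


def ppv_from_counts(true_positives: int, false_positives: int) -> float:
    """PPV from confusion-matrix counts (predicted positives = TP + FP)."""
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        raise ValueError("no predicted positives")
    return true_positives / predicted_positives


def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """The same quantity expressed through operating point and prevalence."""
    tp_rate = sensitivity * prevalence
    fp_rate = (1.0 - specificity) * (1.0 - prevalence)
    return tp_rate / (tp_rate + fp_rate)


example_ppv = ppv_from_counts(true_positives=30, false_positives=10)

# Same operating point, two hypothetical prevalences.
ppv_common = ppv_from_rates(0.56, 0.93, prevalence=0.50)
ppv_rare = ppv_from_rates(0.56, 0.93, prevalence=0.05)

print(example_ppv, round(ppv_common, 3), round(ppv_rare, 3))
```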
Predictive models Deficiencies in standard modelling methods: (Altman & Royston, Stat. Med. 2000) 1. Overoptimistic assessment of predictive performance 2. Multiple regression using stepwise variable selection 3. EPV < 10 (samples/parameters) 4. Case-mix (cohort variations) 5. External evaluation (protocol changes) • Retrospective vs. Temporal • Prospective vs. External
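Item 3 can be checked mechanically. The sketch below takes EPV in the usual sense of events per variable (the number of cases in the rarer outcome class divided by the number of fitted parameters), which is slightly narrower than the slide's "samples/parameters" gloss. The function name and the example numbers are hypothetical.

```python
# Events-per-variable (EPV) check, using the threshold of 10 from the slide.
# ASSUMPTION: "events" means cases in the rarer outcome class.


def events_per_variable(n_events: int, n_non_events: int, n_parameters: int) -> float:
    """Return EPV based on the less frequent outcome class."""
    if n_parameters <= 0:
        raise ValueError("n_parameters must be positive")
    return min(n_events, n_non_events) / n_parameters


# Hypothetical cohort: 80 events, 920 non-events, 10 fitted parameters.
epv = events_per_variable(n_events=80, n_non_events=920, n_parameters=10)
underpowered = epv < 10

print(f"EPV = {epv:.1f}, below threshold: {underpowered}")
```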
Continuum of inference models (diagram)
• Axes: numeric to numeric, numeric to symbolic, symbolic to symbolic; unsupervised vs. supervised; data driven vs. knowledge driven
• Families: neural networks, signal processing, statistical methods
• Methods placed on the diagram: k-means clustering, kernel methods inc. SVM, FFT, production rules, SOM/GTM, CART, logistic regression, control, multi-layer perceptron, ART, axiomatic rule induction, wavelets, radial basis functions, independent components analysis, reinforcement learning, fuzzy logic, rule extraction
Software life-cycle
• Systems level: Requirements analysis & specification / System evaluation
• Integration level: Functional specification & data requirements / Integration test
• Unit level: Model design / Test of model predictions
• Implementation & training
Risk analysis Extract from the FDA guidelines
The continuum of evidence (Drug development)
• Pre-clinical: Rationale. Animal models
• Phase I: Modelling. Healthy humans
• Phase II: Exploratory trial. Clinical trials
• Phase III: Definitive trial. RCT
• Phase IV: Follow-up. Post-market surveillance
The continuum of evidence (Campbell et al, 2000)
• Pre-clinical: Rationale. Methodology
• Phase I: Modelling. Retrospective studies
• Phase II: Exploratory trial. Prospective studies
• Phase III: Definitive trial. RCT
• Phase IV: Follow-up. Post-market surveillance
The continuum of evidence
• Ph I: Theory. Regularisation framework
• Ph II: Performance optimisation. Complexity control
• Ph III: Generalisation. HAZOP/FMEA
• RCT: case-control study. Clinical effectiveness
• Stages of evidence: Methodology, Retrospective studies, Prospective studies, RCT, Post-market surveillance
The continuum of evidence
• Ph I: Theory. Regularisation framework
• Ph II: Performance optimisation. Complexity control
• Ph III: Generalisation. HAZOP/FMEA
• RCT: case-control study. Clinical effectiveness
• Medical Devices Directives: Essential requirements. Performance validation. Doctrine of Substantially Equivalent Products. Model evaluation. Requirement for Learned Intermediaries. Risk assessment. Procedure for post-marketing surveillance. H & S requirements
Methodological issues arising • Confidence • Data • Regularisation • Calibration • Transparency • Rule-extraction • Linear-in-the-parameters statistical inference • Fuzzy or rule-based supervisory models • Effectiveness • Performance estimation • Reliability • Novelty-detection
Performance estimation • Power calculations • Bootstrap CI
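The slide's formula for the bootstrap confidence interval did not survive in this transcript. As a hedged illustration only, the sketch below uses the plain percentile bootstrap, which may differ from the variant shown in the talk. The helper names, the seed and the test-set outcomes are hypothetical.

```python
import random

# Percentile bootstrap confidence interval for a performance statistic.
# ASSUMPTION: percentile method; the original slide's formula is not available.


def bootstrap_ci(values, statistic, n_resamples=2000, alpha=0.05, seed=0):
    """Return (lower, upper) percentile-bootstrap bounds for statistic(values)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    stats = sorted(
        statistic([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def mean(xs):
    return sum(xs) / len(xs)


# Hypothetical test set: 1 = correct prediction, 0 = error (83 of 100 correct).
correct = [1] * 83 + [0] * 17

accuracy = mean(correct)
ci_lower, ci_upper = bootstrap_ci(correct, mean)

print(f"accuracy = {accuracy:.2f}, 95% CI = ({ci_lower:.2f}, {ci_upper:.2f})")
```

The same function works for any statistic that can be recomputed on a resample, such as sensitivity or AUROC, provided the resampling unit is the patient rather than the individual measurement.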
Rule extraction • Axis parallel boxes & network pruning
Lisboa, P.J.G., Etchells, T.A. and Pountney, D.C. ‘Minimal MLPs do not model the XOR logic’ Neurocomputing, Rapid communication, 48, 1-4, 1033-1037, 2002.
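To make "axis parallel boxes" concrete: a rule of this kind is a conjunction of interval conditions, one per input variable, so it can be read off directly by a clinician. The sketch below only represents such a rule and scores it against labelled data. It is not the extraction or pruning procedure from the talk, and the class, variable names and data are hypothetical.

```python
from dataclasses import dataclass

# An axis-parallel box rule: "IF lo_i <= x_i <= hi_i for every listed
# variable THEN positive". Illustration only; not the talk's algorithm.


@dataclass(frozen=True)
class BoxRule:
    bounds: dict  # variable name -> (low, high), both inclusive

    def covers(self, sample: dict) -> bool:
        """True if the sample lies inside the box on every listed axis."""
        return all(lo <= sample[name] <= hi for name, (lo, hi) in self.bounds.items())


def rule_quality(rule: BoxRule, samples: list, labels: list) -> tuple:
    """Return (coverage, precision) of the rule on labelled samples."""
    covered = [y for x, y in zip(samples, labels) if rule.covers(x)]
    coverage = len(covered) / len(samples)
    precision = sum(covered) / len(covered) if covered else 0.0
    return coverage, precision


# Hypothetical two-variable data set with binary labels.
samples = [
    {"x1": 0.2, "x2": 0.90},
    {"x1": 0.3, "x2": 0.80},
    {"x1": 0.7, "x2": 0.10},
    {"x1": 0.9, "x2": 0.95},
]
labels = [1, 1, 0, 0]

rule = BoxRule({"x1": (0.0, 0.5), "x2": (0.5, 1.0)})
coverage, precision = rule_quality(rule, samples, labels)

print(f"coverage = {coverage:.2f}, precision = {precision:.2f}")
```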
Embodying a safety-culture: good practice
• Specification: statistically significant vs. clinically useful
• Transparency: verify against clinical prior knowledge
• HAZOP & FMEA
• Reliability: novelty detection?
• Maintainability: incremental learning?
Summary • Assuring confidence in complex systems is not an issue specific to neural networks but applies to all inference systems • A framework can be constructed based on a life-cycle model of safety-related software: • Good practice in the design of data-based models • Need to switch between evidence-based and knowledge-based cultures for verification and validation (V&V).