Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003
Statistics and science “If your experiment needs statistics, you ought to have done a better experiment” Ernest Rutherford (1871-1937)
Graphical models [Figure: diagram linking Mathematics, Modelling, Algorithms and Inference]
Graphical models and related areas: Markov chains, spatial statistics, genetics, regression, AI, statistical physics, sufficiency, covariance selection, contingency tables
1. Modelling
Structured systems: a framework for building models, especially probabilistic models, for empirical data. Key idea - • understand complex system • through global model • built from small pieces: comprehensible, each with only a few variables, modular
Mendelian inheritance - a natural structured model [Figure: family tree with labels AB, AO, AO, OO, OO and A, AB, A, O, O; Mendel]
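As a rough illustration of why Mendelian inheritance is a structured model, the sketch below computes a child's ABO genotype distribution from the parents' genotypes alone. It assumes the standard textbook rules (each parent passes on one of two alleles with probability 1/2; A and B are codominant, O is recessive). The function names are hypothetical and are not taken from the talk.

```python
from fractions import Fraction
from itertools import product


def child_genotype_dist(mother, father):
    """Distribution of a child's ABO genotype given both parents' genotypes.

    Each parent passes on one of their two alleles with probability 1/2,
    independently of the other parent.
    """
    dist = {}
    for a, b in product(mother, father):
        genotype = "".join(sorted((a, b)))  # unordered pair, e.g. "AO"
        dist[genotype] = dist.get(genotype, Fraction(0)) + Fraction(1, 4)
    return dist


def phenotype(genotype):
    """Observed blood group: A and B are codominant, O is recessive."""
    alleles = set(genotype) - {"O"}
    return "".join(sorted(alleles)) or "O"


# Two AO parents: the child's genotype depends only on the parents' genotypes.
dist = child_genotype_dist("AO", "AO")
for genotype, prob in sorted(dist.items()):
    print(genotype, phenotype(genotype), prob)
```

Each individual contributes one small local piece (genotype given parents, phenotype given genotype), and a whole pedigree is modelled by combining such pieces.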
Ion channel model (Hodgson and Green, Proc Roy Soc Lond A, 1999) [Figure: graph with nodes model indicator, transition rates, hidden state, binary signal, levels & variances, data]
Ion channel model [Figure: graph with nodes model indicator, transition rates, hidden state (states C1, C2, C3, O1, O2), binary signal, levels & variances, data]
Gene expression using Affymetrix chips [Figure, slide courtesy of Affymetrix: zoom from the image of a hybridised array (1.28cm; approx. ½ million different complementary oligonucleotides) to a single hybridised spot (20µm; millions of copies of a specific oligonucleotide sequence element) with single stranded, labeled RNA sample; expressed and non-expressed genes marked]
Gene expression is a hierarchical process • Substantive question • Experimental design • Sample preparation • Array design & manufacture • Gene expression matrix • Probe level data • Image level data
Mapping of rare diseases using a hidden Markov model (G & Richardson, 2002) [Figure: maps of larynx cancer in females in France, 1986-1993: standardised ratios; posterior probability of excess risk]
2. Mathematics
Graphical models [Figure: graph on nodes A, B, C, D, E, F] Use ideas from graph theory to • represent structure of a joint probability distribution • by encoding conditional independencies
Where does the graph come from? • Genetics: pedigree (family connections) • Lattice systems: interaction graph (e.g. nearest neighbours) • Gaussian case: graph determined by non-zeroes in inverse variance matrix
Inverse of (co)variance matrix: independent case [Figure: matrix with rows and columns A, B, C, D; graph on nodes A, B, C, D]
Inverse of (co)variance matrix: dependent case [Figure: matrix with rows and columns A, B, C, D, with non-zero off-diagonal entries marked; graph on nodes A, B, C, D] Few links imply few parameters - Occam’s razor
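The link between the graph and the inverse (co)variance matrix can be sketched numerically. The example below assumes a hypothetical chain A - B - C - D with made-up entries: the inverse covariance (precision) matrix has zeros wherever a link is missing, while the covariance matrix itself has no zeros at all. The matrix values and helper names are illustrative assumptions, not taken from the slides.

```python
from fractions import Fraction


def inverse(matrix):
    """Invert a small square matrix exactly by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment with the identity: [M | I] is reduced to [I | M^-1].
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def edges(precision, names):
    """Links of the graph: pairs with a non-zero off-diagonal precision entry."""
    n = len(names)
    return [(names[i], names[j]) for i in range(n) for j in range(i + 1, n)
            if precision[i][j] != 0]


names = ["A", "B", "C", "D"]
# Hypothetical precision (inverse covariance) matrix for a chain A - B - C - D.
precision = [[2, -1, 0, 0],
             [-1, 2, -1, 0],
             [0, -1, 2, -1],
             [0, 0, -1, 2]]
covariance = inverse(precision)

print(edges(precision, names))   # only the three chain links
print(covariance[0][3])          # A and D are still correlated
```

Here ten free entries of a general 4 x 4 symmetric matrix reduce to seven, because three links are absent.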
Conditional independence • X and Z are conditionally independent given Y if, knowing Y, discovering Z tells you nothing more about X: p(X|Y,Z) = p(X|Y) • X ⊥ Z | Y [Figure: graph X - Y - Z]
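The definition p(X|Y,Z) = p(X|Y) can be checked directly on a small discrete example. The sketch below assumes three hypothetical binary variables with made-up probabilities, builds the joint distribution from p(y) p(x|y) p(z|y), and verifies that X and Z are conditionally independent given Y even though they are dependent marginally.

```python
from fractions import Fraction
from itertools import product

# Hypothetical binary variables; the numbers are made up for illustration.
p_y = {0: Fraction(1, 2), 1: Fraction(1, 2)}
p_x_given_y = {0: Fraction(1, 10), 1: Fraction(4, 5)}   # P(X=1 | Y=y)
p_z_given_y = {0: Fraction(1, 5), 1: Fraction(7, 10)}   # P(Z=1 | Y=y)


def bern(p_one, value):
    """Probability of `value` for a binary variable with P(value = 1) = p_one."""
    return p_one if value == 1 else 1 - p_one


# Joint built from the factorisation p(x, y, z) = p(y) p(x|y) p(z|y).
joint = {(x, y, z): p_y[y] * bern(p_x_given_y[y], x) * bern(p_z_given_y[y], z)
         for x, y, z in product((0, 1), repeat=3)}


def prob(**fixed):
    """Marginal probability of the event described by the keyword arguments."""
    return sum(p for (x, y, z), p in joint.items()
               if all({"x": x, "y": y, "z": z}[k] == v for k, v in fixed.items()))


def cond_x(x, **given):
    """p(X = x | given)."""
    return prob(x=x, **given) / prob(**given)


# p(X | Y, Z) = p(X | Y) for every combination of values.
ci_holds = all(cond_x(x, y=y, z=z) == cond_x(x, y=y)
               for x, y, z in product((0, 1), repeat=3))
# Without conditioning on Y, X and Z are not independent.
marginally_independent = all(prob(x=x, z=z) == prob(x=x) * prob(z=z)
                             for x, z in product((0, 1), repeat=2))
print(ci_holds, marginally_independent)
```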
Conditional independence, as seen in data on perinatal mortality vs. ante-natal care: does survival depend on ante-natal care? What if you know the clinic?
Conditional independence [Figure: graph on nodes survival, ante, clinic] survival and clinic are dependent, and ante and clinic are dependent, but survival and ante are CI given clinic
Conditional independence provides a mathematical basis for splitting up a large system into smaller components [Figure: graph on nodes A, B, C, D, E, F]
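[Figure: the graph on nodes A, B, C, D, E, F split into smaller components, with nodes B, D and E repeated across components]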
3. Inference
Bayesian paradigm in structured modelling • ‘borrowing strength’ • automatically integrates out all sources of uncertainty • properly accounting for variability at all levels • including, in principle, uncertainty in model itself • avoids over-optimistic claims of certainty
Bayesian structured modelling • ‘borrowing strength’ • automatically integrates out all sources of uncertainty • … for example in forensic statistics with DNA probe data
4. Algorithms
Algorithms for probability and likelihood calculations Exploiting graphical structure: • Markov chain Monte Carlo • Probability propagation (Bayes nets) • Expectation-Maximisation • Variational methods
Markov chain Monte Carlo • Subgroups of one or more variables updated randomly, maintaining detailed balance with respect to target distribution • Ensemble converges to equilibrium = target distribution (e.g. the Bayesian posterior)
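Detailed balance and convergence can both be checked exactly on a tiny state space. The sketch below assumes a hypothetical four-state target distribution and a Metropolis sampler with a uniform proposal, which is one of many possible MCMC constructions and not necessarily the one used in the talk. It builds the transition matrix, verifies detailed balance, and pushes a point-mass starting distribution towards the target.

```python
from fractions import Fraction

# Hypothetical target distribution on four states (made up for illustration).
target = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]


def metropolis_kernel(target):
    """Transition matrix of a Metropolis sampler with a uniform proposal."""
    n = len(target)
    kernel = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                proposal = Fraction(1, n - 1)   # symmetric proposal
                accept = min(Fraction(1), target[j] / target[i])
                kernel[i][j] = proposal * accept
        kernel[i][i] = 1 - sum(kernel[i])       # rejected moves stay put
    return kernel


def step(dist, kernel):
    """Push a distribution over states through one transition of the chain."""
    n = len(dist)
    return [sum(dist[i] * kernel[i][j] for i in range(n)) for j in range(n)]


kernel = metropolis_kernel(target)
n = len(target)

# Detailed balance: target[i] * P(i -> j) == target[j] * P(j -> i).
detailed_balance = all(target[i] * kernel[i][j] == target[j] * kernel[j][i]
                       for i in range(n) for j in range(n))

# Start with all mass on state 0 and run the chain forward.
dist = [Fraction(1), Fraction(0), Fraction(0), Fraction(0)]
for _ in range(30):
    dist = step(dist, kernel)
max_gap = max(abs(float(d - t)) for d, t in zip(dist, target))
print(detailed_balance, max_gap)
```

Detailed balance implies that the target is left unchanged by one step of the chain, which is why the ensemble settles there.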
Markov chain Monte Carlo [Figure: graph with the variables being updated marked “?”] Updating - need only look at neighbours
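The point that an update needs only the neighbours can be sketched with a Gibbs sampler for a nearest-neighbour lattice model. The example assumes a hypothetical Ising-type model with p(x) proportional to exp(beta * sum of x_i x_j over neighbouring pairs), made-up settings, and a systematic sweep rather than a random choice of site. None of these specifics come from the slides.

```python
import math
import random


def neighbours(i, j, size):
    """Nearest neighbours of site (i, j) on a size x size grid (free boundary)."""
    candidates = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return [(a, b) for a, b in candidates if 0 <= a < size and 0 <= b < size]


def prob_plus(grid, i, j, beta):
    """P(spin at (i, j) = +1 | all other spins); only the neighbours are read."""
    field = sum(grid[a][b] for a, b in neighbours(i, j, len(grid)))
    return 1.0 / (1.0 + math.exp(-2.0 * beta * field))


def gibbs_sweep(grid, beta, rng):
    """Update every site in turn from its full conditional distribution."""
    size = len(grid)
    for i in range(size):
        for j in range(size):
            grid[i][j] = 1 if rng.random() < prob_plus(grid, i, j, beta) else -1


rng = random.Random(0)
size, beta = 8, 0.3   # hypothetical settings
grid = [[rng.choice((-1, 1)) for _ in range(size)] for _ in range(size)]
for _ in range(100):
    gibbs_sweep(grid, beta, rng)
```

The cost of one update depends on the number of neighbours, not on the size of the lattice, which is what makes these samplers practical for large structured models.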
Probability propagation: form junction tree [Figure: graph on nodes 1-7; junction tree with cliques 267, 236, 3456, 12 and separators 26, 36, 2]
Message passing in junction tree [Figure: junction tree with the root marked]
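Message passing can be sketched on a chain, which is about the simplest tree-structured case. The example assumes a hypothetical chain of four binary variables with made-up pairwise potentials. Messages are first collected towards a root at the end of the chain and then distributed back, and the resulting marginals are compared with brute-force enumeration. This is a simplified stand-in for propagation in a general junction tree, not the algorithm as presented in the talk.

```python
from fractions import Fraction
from itertools import product

# Hypothetical chain x0 - x1 - x2 - x3 of binary variables. psi[k][a][b] is the
# potential linking x_k = a and x_{k+1} = b (numbers made up for illustration).
psi = [
    [[Fraction(3), Fraction(1)], [Fraction(1), Fraction(2)]],
    [[Fraction(1), Fraction(4)], [Fraction(2), Fraction(1)]],
    [[Fraction(5), Fraction(1)], [Fraction(1), Fraction(1)]],
]


def marginals_by_messages(psi):
    """Single-variable marginals from one inward and one outward pass."""
    n = len(psi) + 1
    # Collect: messages flow from x0 towards the root x_{n-1}.
    forward = [[Fraction(1), Fraction(1)]]
    for k in range(n - 1):
        forward.append([sum(forward[k][a] * psi[k][a][b] for a in (0, 1))
                        for b in (0, 1)])
    # Distribute: messages flow back from the root towards x0.
    backward = [[Fraction(1), Fraction(1)] for _ in range(n)]
    for k in range(n - 2, -1, -1):
        backward[k] = [sum(psi[k][a][b] * backward[k + 1][b] for b in (0, 1))
                       for a in (0, 1)]
    result = []
    for k in range(n):
        belief = [forward[k][v] * backward[k][v] for v in (0, 1)]
        total = sum(belief)
        result.append([b / total for b in belief])
    return result


def marginals_by_enumeration(psi):
    """The same marginals by summing over all 2^n configurations."""
    n = len(psi) + 1
    weights = {}
    for xs in product((0, 1), repeat=n):
        w = Fraction(1)
        for k in range(n - 1):
            w *= psi[k][xs[k]][xs[k + 1]]
        weights[xs] = w
    total = sum(weights.values())
    return [[sum(w for xs, w in weights.items() if xs[k] == v) / total
             for v in (0, 1)] for k in range(n)]


fast = marginals_by_messages(psi)
slow = marginals_by_enumeration(psi)
print(fast == slow)
```

The message-passing version does work proportional to the length of the chain, whereas enumeration grows exponentially with it.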
Structured systems’ success stories include... • Genomics & bioinformatics: DNA & protein sequencing, gene mapping, evolutionary genetics • Spatial statistics: image analysis, environmetrics, geographical epidemiology, ecology • Temporal problems: longitudinal data, financial time series, signal processing
http://www.stats.bris.ac.uk/~peter P.J.Green@bristol.ac.uk …thanks to many