Explore the relationship between structure and uncertainty in statistics and science through graphical models, mathematical modeling, algorithms, and inference. Understand complex systems through global models built from small pieces. Discover the power of Bayesian structured modeling in dealing with uncertainty. Learn about algorithms for probability and likelihood calculations, as well as the use of Markov chain Monte Carlo, probability propagation, and message passing. Discover the success stories of structured systems in genomics, spatial statistics, and temporal problems.
Structure and Uncertainty Peter Green, University of Bristol, 10 July 2003
Statistics and science “If your experiment needs statistics, you ought to have done a better experiment” Ernest Rutherford (1871-1937)
Graphical models Mathematics Modelling Algorithms Inference
Markov chains Spatial statistics Genetics Regression AI Statistical physics Sufficiency Covariance selection Contingency tables Graphical models
1. Modelling Mathematics Modelling Algorithms Inference
Structured systems A framework for building models, especially probabilistic models, for empirical data Key idea - • understand complex system • through global model • built from small pieces • comprehensible • each with only a few variables • modular
Mendelian inheritance - a natural structured model [Figure: pedigree with genotypes AB, AO, AO, OO, OO and phenotypes A, AB, A, O, O; portrait of Mendel]
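As a minimal sketch of why Mendelian inheritance is a "small piece" of a structured model: each child's genotype depends only on its two parents' genotypes. The function names below are hypothetical, and the example parents (AO and AO) are chosen for illustration rather than read off the slide's pedigree. The ABO rule used is the standard one: A and B are codominant, O is recessive.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(mother, father):
    """Distribution of a child's genotype: one allele drawn uniformly from each parent."""
    counts = Counter("".join(sorted(a + b)) for a, b in product(mother, father))
    return {genotype: Fraction(count, 4) for genotype, count in counts.items()}


def phenotype(genotype):
    """ABO blood group: A and B are codominant, O shows only in genotype OO."""
    alleles = set(genotype) - {"O"}
    return "".join(sorted(alleles)) or "O"


# Illustrative parents, both of blood group A but carrying the O allele.
child = offspring_genotypes("AO", "AO")
for genotype, prob in sorted(child.items()):
    print(genotype, phenotype(genotype), prob)
```

A whole pedigree model is then the product of one such local table per child, which is the modularity the slide points to.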
Ion channel model [Figure: layered model: model indicator, transition rates, hidden state, binary signal, levels & variances, data] Hodgson and Green, Proc Roy Soc Lond A, 1999
[Figure: the same ion channel model with hidden states C1, C2, C3, O1, O2: model indicator, transition rates, hidden state, binary signal, levels & variances, data]
Gene expression using Affymetrix chips [Figure: zoom from the image of a hybridised array (1.28 cm) to a single hybridised spot (20 µm); each oligonucleotide element holds millions of copies of a specific oligonucleotide sequence; approx. ½ million different complementary oligonucleotides; single-stranded, labelled RNA sample; expressed and non-expressed genes] Slide courtesy of Affymetrix
Gene expression is a hierarchical process • Substantive question • Experimental design • Sample preparation • Array design & manufacture • Gene expression matrix • Probe level data • Image level data
Mapping of rare diseases using Hidden Markov model Larynx cancer in females in France, 1986-1993 (standardised ratios) Posterior probability of excess risk G & Richardson, 2002
2. Mathematics Mathematics Modelling Algorithms Inference
Graphical models [Figure: graph on nodes A, B, C, D, E, F] Use ideas from graph theory to • represent structure of a joint probability distribution • by encoding conditional independencies
Where does the graph come from? • Genetics • pedigree (family connections) • Lattice systems • interaction graph (e.g. nearest neighbours) • Gaussian case • graph determined by non-zeroes in inverse variance matrix
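Inverse of (co)variance matrix: independent case [Figure: matrix with rows and columns A, B, C, D, and the corresponding graph on A, B, C, D]
Inverse of (co)variance matrix: dependent case [Figure: matrix with rows and columns A, B, C, D, with two entries marked non-zero, and the corresponding graph on A, B, C, D] Few links imply few parameters - Occam’s razor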
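The link between the graph and the inverse variance matrix can be sketched numerically. The example below assumes a hypothetical chain graph A - B - C - D (the slide's own graph is not recoverable from the text) with made-up entries. It inverts the tridiagonal inverse variance matrix exactly and shows that the variance matrix itself has no zeros, so the sparsity lives in the inverse.

```python
from fractions import Fraction


def invert(matrix):
    """Invert a square matrix exactly by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment with the identity, using exact rational arithmetic.
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap a row with a non-zero pivot into place, then normalise it.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


names = "ABCD"
# Hypothetical inverse variance matrix for the chain A - B - C - D:
# off-diagonal zeros mark the missing links A-C, A-D and B-D.
precision = [[2, -1, 0, 0],
             [-1, 2, -1, 0],
             [0, -1, 2, -1],
             [0, 0, -1, 2]]
covariance = invert(precision)

links = [names[i] + names[j] for i in range(4) for j in range(i + 1, 4)
         if precision[i][j] != 0]
print("links read off the inverse:", links)
print("zeros in the variance matrix:",
      sum(1 for row in covariance for x in row if x == 0))
```

Here A and D are correlated (their covariance is non-zero) although there is no A-D link: they are dependent, but conditionally independent given the variables between them.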
Conditional independence • X and Z are conditionally independent given Y if, knowing Y, discovering Z tells you nothing more about X: p(X|Y,Z) = p(X|Y) • written X ⊥ Z | Y [Figure: graph X - Y - Z]
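The definition can be checked directly on a small example. This sketch assumes three binary variables with made-up probability tables (not data from the talk), builds a joint distribution of the form p(y) p(x|y) p(z|y), and verifies that p(X|Y,Z) = p(X|Y) holds even though X and Z are dependent when Y is ignored.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical tables: p(y), p(x | y) indexed [y][x], p(z | y) indexed [y][z].
p_y = {0: F(1, 2), 1: F(1, 2)}
p_x_given_y = {0: {0: F(9, 10), 1: F(1, 10)}, 1: {0: F(3, 10), 1: F(7, 10)}}
p_z_given_y = {0: {0: F(4, 5), 1: F(1, 5)}, 1: {0: F(2, 5), 1: F(3, 5)}}

joint = {(x, y, z): p_y[y] * p_x_given_y[y][x] * p_z_given_y[y][z]
         for x, y, z in product((0, 1), repeat=3)}


def prob(**fixed):
    """Sum the joint over all cells that match the fixed values of x, y, z."""
    return sum(p for (x, y, z), p in joint.items()
               if all({"x": x, "y": y, "z": z}[k] == v for k, v in fixed.items()))


# p(x | y, z) == p(x | y) for every combination of values.
ci_given_y = all(
    prob(x=x, y=y, z=z) / prob(y=y, z=z) == prob(x=x, y=y) / prob(y=y)
    for x, y, z in product((0, 1), repeat=3))

# p(x, z) == p(x) p(z) would mean X and Z are independent without knowing Y.
marginally_independent = all(
    prob(x=x, z=z) == prob(x=x) * prob(z=z)
    for x, z in product((0, 1), repeat=2))

print("X and Z conditionally independent given Y:", ci_given_y)
print("X and Z marginally independent:", marginally_independent)
```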
Conditional independence as seen in data on perinatal mortality vs. ante-natal care…. Does survival depend on ante-natal care? .... what if you know the clinic?
Conditional independence [Figure: graph on survival, ante, clinic] survival and clinic are dependent, and ante and clinic are dependent, but survival and ante are CI given clinic
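[Figure: graph on nodes A, B, C, D, E, F] Conditional independence provides a mathematical basis for splitting up a large system into smaller components
[Figure: the graph on nodes A, B, C, D, E, F drawn as separate components; nodes B, D and E each appear in more than one component]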
3. Inference Mathematics Modelling Algorithms Inference
Bayesian paradigm in structured modelling • ‘borrowing strength’ • automatically integrates out all sources of uncertainty • properly accounting for variability at all levels • including, in principle, uncertainty in model itself • avoids over-optimistic claims of certainty
Bayesian structured modelling • ‘borrowing strength’ • automatically integrates out all sources of uncertainty • … for example in forensic statistics with DNA probe data…..
4. Algorithms Mathematics Modelling Algorithms Inference
Algorithms for probability and likelihood calculations Exploiting graphical structure: • Markov chain Monte Carlo • Probability propagation (Bayes nets) • Expectation-Maximisation • Variational methods
Markov chain Monte Carlo • Subgroups of one or more variables updated randomly, • maintaining detailed balance with respect to target distribution • Ensemble converges to equilibrium = target distribution ( = Bayesian posterior, e.g.)
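A minimal sketch of the detailed balance idea, assuming a random-walk Metropolis sampler on five states with a made-up unnormalised target (neither is taken from the talk). The transition matrix is built exactly, so detailed balance and stationarity can be checked as identities; a short seeded simulation then compares visit frequencies with the target.

```python
import random
from fractions import Fraction

weights = [1, 2, 4, 2, 1]  # hypothetical unnormalised target distribution
n = len(weights)
target = [Fraction(w, sum(weights)) for w in weights]


def transition_matrix():
    """Exact transition probabilities of random-walk Metropolis on states 0..n-1."""
    P = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i + 1):  # propose a step left or right, each with prob 1/2
            if 0 <= j < n:
                accept = min(Fraction(1), Fraction(weights[j], weights[i]))
                P[i][j] = Fraction(1, 2) * accept
        P[i][i] = 1 - sum(P[i])  # rejected or out-of-range proposals stay put
    return P


def simulate(steps, seed=0):
    """Run the chain and return the fraction of time spent in each state."""
    rng = random.Random(seed)
    state, counts = 0, [0] * n
    for _ in range(steps):
        proposal = state + rng.choice((-1, 1))
        if 0 <= proposal < n and rng.random() < weights[proposal] / weights[state]:
            state = proposal
        counts[state] += 1
    return [c / steps for c in counts]


P = transition_matrix()
detailed_balance = all(target[i] * P[i][j] == target[j] * P[j][i]
                       for i in range(n) for j in range(n))
stationary = all(sum(target[i] * P[i][j] for i in range(n)) == target[j]
                 for j in range(n))
frequencies = simulate(50_000)

print("detailed balance holds:", detailed_balance)
print("target is stationary:", stationary)
print("target:     ", [float(t) for t in target])
print("frequencies:", [round(f, 3) for f in frequencies])
```

Only ratios of target weights enter the acceptance step, which is why the normalising constant of a Bayesian posterior is never needed.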
Markov chain Monte Carlo ? ? Updating - need only look at neighbours
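The "need only look at neighbours" point can be sketched with a Gibbs sampler for a small Ising model, used here as an assumed stand-in for a lattice system with nearest-neighbour interactions. The grid size and interaction strength are made-up values. The code computes the conditional distribution of one site twice, once from the full joint and once from its neighbours alone, and checks that they agree.

```python
import math
import random

BETA = 0.4  # hypothetical interaction strength
SIZE = 4    # hypothetical grid size, free boundary


def neighbours(i, j):
    """Nearest neighbours of site (i, j) that lie inside the grid."""
    return [(a, b) for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
            if 0 <= a < SIZE and 0 <= b < SIZE]


def log_density(grid):
    """Unnormalised log-probability: BETA times the sum of s_a * s_b over linked pairs."""
    total = 0
    for i in range(SIZE):
        for j in range(SIZE):
            if i + 1 < SIZE:
                total += grid[i][j] * grid[i + 1][j]
            if j + 1 < SIZE:
                total += grid[i][j] * grid[i][j + 1]
    return BETA * total


def conditional_from_joint(grid, i, j):
    """P(site = +1 | everything else), computed the slow way from the whole grid."""
    up = [row[:] for row in grid]
    up[i][j] = 1
    down = [row[:] for row in grid]
    down[i][j] = -1
    a, b = math.exp(log_density(up)), math.exp(log_density(down))
    return a / (a + b)


def conditional_from_neighbours(grid, i, j):
    """The same probability, using only the neighbouring sites."""
    field = sum(grid[a][b] for a, b in neighbours(i, j))
    return 1 / (1 + math.exp(-2 * BETA * field))


def gibbs_sweep(grid, rng):
    """Update every site in turn from its conditional given its neighbours."""
    for i in range(SIZE):
        for j in range(SIZE):
            p_up = conditional_from_neighbours(grid, i, j)
            grid[i][j] = 1 if rng.random() < p_up else -1


rng = random.Random(1)
grid = [[rng.choice((-1, 1)) for _ in range(SIZE)] for _ in range(SIZE)]
max_gap = max(abs(conditional_from_joint(grid, i, j)
                  - conditional_from_neighbours(grid, i, j))
              for i in range(SIZE) for j in range(SIZE))
gibbs_sweep(grid, rng)
print("largest disagreement between the two conditionals:", max_gap)
```

The sweep here visits sites in a fixed order for simplicity; choosing the site at random, as the previous slide describes, uses the same local conditional.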
Probability propagation: form junction tree [Figure: graph on nodes 1, 2, 3, 4, 5, 6, 7; junction tree with cliques 12, 267, 236, 3456 and separators 2, 26, 36]
Message passing in junction tree root root
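Message passing can be sketched on a simplified case: a tree whose nodes are single binary variables rather than the cliques of a junction tree. The tree shape and potential tables below are made up for illustration. Each directed message is computed once and cached, which corresponds to one inward pass towards a root and one outward pass; the resulting marginals are checked against brute-force summation over all configurations.

```python
from fractions import Fraction
from itertools import product

# Hypothetical tree 0 - 1, 1 - 2, 1 - 3 with pairwise potentials psi[(a, b)][x_a][x_b].
edges = {(0, 1): [[4, 1], [1, 4]],
         (1, 2): [[3, 1], [1, 2]],
         (1, 3): [[1, 2], [5, 1]]}
nodes = [0, 1, 2, 3]
messages = {}  # cache: (source, destination) -> message over the destination's values


def potential(a, b, xa, xb):
    """Look up the potential on the link between a and b, in either orientation."""
    if (a, b) in edges:
        return edges[(a, b)][xa][xb]
    return edges[(b, a)][xb][xa]


def neighbours(a):
    return [b for b in nodes if (a, b) in edges or (b, a) in edges]


def message(src, dst):
    """Sum out src, combining the messages arriving at src from its other neighbours."""
    if (src, dst) not in messages:
        out = []
        for xd in (0, 1):
            total = 0
            for xs in (0, 1):
                term = potential(src, dst, xs, xd)
                for other in neighbours(src):
                    if other != dst:
                        term *= message(other, src)[xs]
                total += term
            out.append(total)
        messages[(src, dst)] = out
    return messages[(src, dst)]


def marginal(node):
    """Multiply all incoming messages and normalise."""
    belief = [1, 1]
    for nb in neighbours(node):
        m = message(nb, node)
        belief = [belief[x] * m[x] for x in (0, 1)]
    return [Fraction(b, sum(belief)) for b in belief]


def brute_force_marginal(node):
    """Reference answer: sum the product of potentials over every configuration."""
    belief = [0, 0]
    for xs in product((0, 1), repeat=len(nodes)):
        weight = 1
        for (a, b), table in edges.items():
            weight *= table[xs[a]][xs[b]]
        belief[xs[node]] += weight
    return [Fraction(b, sum(belief)) for b in belief]


for node in nodes:
    print(node, marginal(node), marginal(node) == brute_force_marginal(node))
```

On a junction tree the same scheme applies with cliques in place of variables and separators in place of links, so the cost depends on clique size rather than on the number of variables.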
Structured systems’ success stories include... • Genomics & bioinformatics • DNA & protein sequencing, gene mapping, evolutionary genetics • Spatial statistics • image analysis, environmetrics, geographical epidemiology, ecology • Temporal problems • longitudinal data, financial time series, signal processing
http://www.stats.bris.ac.uk/~peter P.J.Green@bristol.ac.uk …thanks to many