
structure in models and data

A graphic account of structure in models and data. Peter Green, University of Bristol. RSS Manchester Local Group, 5 June 2002. What do I mean by structure? The key idea is conditional independence: x and z are conditionally independent given y if p(x,z|y) = p(x|y)p(z|y)


Presentation Transcript


  1. A graphic account of structure in models and data Peter Green, University of Bristol. RSS Manchester Local Group, 5 June 2002

  2. What do I mean by structure? The key idea is conditional independence: x and z are conditionally independent given y if p(x,z|y) = p(x|y)p(z|y) … implying, for example, that p(x|y,z) = p(x|y) CI turns out to be a remarkably powerful and pervasive idea in probability and statistics
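As a concrete illustration (not from the talk), a three-variable chain x -> y -> z satisfies this definition, and the factorisation p(x,z|y) = p(x|y)p(z|y) can be verified numerically for made-up probability tables:

```python
# Numerical check of conditional independence on a toy joint distribution.
# The chain x -> y -> z (a hypothetical example) has
# p(x, y, z) = p(x) p(y|x) p(z|y), which implies x and z are CI given y.
import itertools

p_x = {0: 0.3, 1: 0.7}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
p_z_given_y = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.5, 1: 0.5}}

def p_xyz(x, y, z):
    return p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]

def p_y(y):
    return sum(p_xyz(x, y, z) for x in (0, 1) for z in (0, 1))

def p_xy(x, y):
    return sum(p_xyz(x, y, z) for z in (0, 1))

def p_yz(y, z):
    return sum(p_xyz(x, y, z) for x in (0, 1))

# p(x, z | y) should factorise as p(x | y) p(z | y) for every (x, y, z)
for x, y, z in itertools.product((0, 1), repeat=3):
    lhs = p_xyz(x, y, z) / p_y(y)
    rhs = (p_xy(x, y) / p_y(y)) * (p_yz(y, z) / p_y(y))
    assert abs(lhs - rhs) < 1e-12
```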

  3. How to represent this structure? • The idea of graphical modelling: we draw graphs in which nodes represent variables, connected by lines and arrows representing relationships • We separate logical (the graph) and quantitative (the assumed distributions) aspects of the model

  4. Graphical models connect many strands of statistics: contingency tables, Markov chains, spatial statistics, genetics, regression, AI, statistical physics, sufficiency, covariance selection

  5. Graphical modelling [1] • Assuming structure to do probability calculations • Inferring structure to make substantive conclusions • Structure in model building • Inference about latent variables

  6. Basic DAG on nodes a, b, c, d. In general the joint density factorises as the product over nodes of p(node | parents); for example, for this DAG, p(a,b,c,d) = p(a)p(b)p(c|a,b)p(d|c)

  7. Basic DAG [figure: a -> c <- b, c -> d]: p(a,b,c,d) = p(a)p(b)p(c|a,b)p(d|c)
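The factorisation can be checked mechanically. The sketch below uses made-up conditional probability tables (the slide specifies only the graph) and confirms that the product defines a valid joint distribution in which d is independent of (a, b) given c:

```python
# A minimal sketch of the factorisation p(a)p(b)p(c|a,b)p(d|c), with
# made-up binary probability tables (not taken from the slides).
import itertools

p_a = {0: 0.6, 1: 0.4}
p_b = {0: 0.5, 1: 0.5}
p_c_given_ab = {(a, b): {0: 0.2 + 0.3 * (a + b) / 2,
                         1: 0.8 - 0.3 * (a + b) / 2}
                for a in (0, 1) for b in (0, 1)}
p_d_given_c = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.1, 1: 0.9}}

def joint(a, b, c, d):
    return p_a[a] * p_b[b] * p_c_given_ab[(a, b)][c] * p_d_given_c[c][d]

# The product is a valid joint distribution
total = sum(joint(*v) for v in itertools.product((0, 1), repeat=4))
assert abs(total - 1.0) < 1e-12

# d is independent of (a, b) given c: p(d | a, b, c) = p(d | c)
for a, b, c, d in itertools.product((0, 1), repeat=4):
    p_abc = sum(joint(a, b, c, dd) for dd in (0, 1))
    assert abs(joint(a, b, c, d) / p_abc - p_d_given_c[c][d]) < 1e-12
```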

  8. A natural DAG from genetics [figure: a pedigree with ABO blood-group genotypes AB, AO, AO, OO, OO]

  9. A natural DAG from genetics [figure: the same pedigree, with the individual alleles (A, O; A, B; A, O) shown as separate nodes above the genotypes AB, AO, AO, OO, OO]

  10. DNA forensics example(thanks to Julia Mortera) • A blood stain is found at a crime scene • A body is found somewhere else! • There is a suspect • DNA profiles on all three - crime scene sample is a ‘mixed trace’: is it a mix of the victim and the suspect?

  11. DNA forensics in Hugin • Disaggregate problem in terms of paternal and maternal genes of both victim and suspect. • Assume Hardy-Weinberg equilibrium • We have profiles on 8 STR markers - treated as independent (linkage equilibrium)

  12. DNA forensics in Hugin

  13. DNA forensics The data: 2 of the 8 markers show more than 2 alleles at the crime scene, implying a mixture of 2 or more people

  14. DNA forensics Population gene frequencies for D7S820 (used as ‘prior’ on ‘founder’ nodes) [Hugin screenshot]

  15. DNA forensics Results (suspect+victim vs. unknown+victim):

  16. How does it work? (1) Manipulate the DAG into the corresponding (undirected) conditional independence graph: draw an (undirected) edge between variables α and β if they are not conditionally independent given all other variables
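Step (1) is moralisation followed by dropping directions: every node is joined to its parents, and every pair of parents sharing a child is "married" with an edge. A minimal sketch on the a, b, c, d DAG from earlier:

```python
# Sketch of step (1), moralisation: marry all co-parents of each node,
# then drop edge directions. Uses the a, b, c, d DAG from the earlier
# slide (c has parents a and b; d has parent c).
from itertools import combinations

parents = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}

edges = set()
for child, pars in parents.items():
    for p in pars:                      # child-parent edges, undirected
        edges.add(frozenset((child, p)))
    for u, v in combinations(pars, 2):  # "marry" co-parents
        edges.add(frozenset((u, v)))

assert frozenset(("a", "b")) in edges   # a and b get married via c
assert len(edges) == 4                  # a-b, a-c, b-c, c-d
```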

  17. How does it work? (2) If necessary, add edges so it is triangulated (=decomposable)

  18. (3) Construct junction tree [figure: for the graph on vertices 1-7, the cliques 12, 267, 236, 3456 form a tree with separators 2, 26, 36]. For any 2 cliques C and D, C∩D is a subset of every node between them in the junction tree
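The defining property quoted on the slide (the running intersection property) can be verified directly for this tree. The sketch below encodes the four cliques from the figure as a chain and checks it:

```python
# Check the running-intersection property for the junction tree on the
# slide: cliques {1,2}, {2,6,7}, {2,3,6}, {3,4,5,6} in a chain, with
# separators {2}, {2,6}, {3,6}.
cliques = [{1, 2}, {2, 6, 7}, {2, 3, 6}, {3, 4, 5, 6}]  # chain order

# For any two cliques C and D, C ∩ D must be contained in every clique
# on the (unique) path between them in the tree.
for i in range(len(cliques)):
    for j in range(i + 1, len(cliques)):
        inter = cliques[i] & cliques[j]
        for k in range(i, j + 1):
            assert inter <= cliques[k]

# The separators are the intersections of adjacent cliques:
seps = [cliques[k] & cliques[k + 1] for k in range(3)]
assert seps == [{2}, {2, 6}, {3, 6}]
```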

  19. (4) Probability propagation - passing messages around junction tree

  20. Initialisation of potential representation, for the chain A - B - C with cliques AB, BC and separator B: the AB potential holds p(A|B) (B=0: 3/4, 1/4; B=1: 2/3, 1/3), the BC potential holds p(B,C) = p(C)p(B|C), built from p(C) = (.7, .3), p(B|C=0) = (3/7, 4/7) and p(B|C=1) = (1/3, 2/3), and the separator potential on B is identically 1

  21. Passing message from BC to AB (1) marginalise, multiply: summing the BC potential over C gives a new separator potential (B=0: .4, B=1: .6); the AB potential is multiplied by the ratio of new to old separator values (3/4 × .4/1, 1/4 × .4/1; 2/3 × .6/1, 1/3 × .6/1)

  22. Passing message from BC to AB (2) assign: the rescaled AB potential (B=0: .3, .1; B=1: .4, .2) and the new separator potential are stored

  23. After equilibration - marginal tables: each potential now holds the corresponding marginal: B: (.4, .6); AB = p(A,B): (B=0: .3, .1; B=1: .4, .2); BC = p(B,C): (B=0: .3, .1; B=1: .4, .2)
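This small message pass can be checked numerically. The sketch below (plain Python, not Hugin) initialises the AB potential to p(A|B) and the BC potential to p(B,C), passes the single message from BC to AB, and confirms that both potentials then hold the corresponding marginals:

```python
# The two-clique example from the slides: chain A - B - C with cliques
# AB and BC and separator B. phi_AB holds p(A|B), phi_BC holds p(B,C).
phi_AB = {(0, 0): 3/4, (1, 0): 1/4, (0, 1): 2/3, (1, 1): 1/3}  # (A, B)
phi_BC = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.4, (1, 1): 0.2}  # (B, C)
phi_B = {0: 1.0, 1: 1.0}                                        # separator

# Message BC -> AB: marginalise phi_BC over C, then rescale phi_AB by
# the ratio of new to old separator potential.
new_B = {b: sum(phi_BC[(b, c)] for c in (0, 1)) for b in (0, 1)}
for (a, b) in phi_AB:
    phi_AB[(a, b)] *= new_B[b] / phi_B[b]
phi_B = new_B

# After this single pass, every potential is the corresponding marginal:
assert abs(phi_B[0] - 0.4) < 1e-12 and abs(phi_B[1] - 0.6) < 1e-12
assert abs(phi_AB[(0, 0)] - 0.3) < 1e-12   # p(A=0, B=0)
assert abs(phi_AB[(1, 1)] - 0.2) < 1e-12   # p(A=1, B=1)
```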

  24. Probabilistic expert systems: Hugin for ‘Asia’ example

  25. Limitations • of message passing: • all variables discrete, or • CG distributions (both continuous and discrete variables, but discrete precede continuous, determining a multivariate normal distribution for them) • of Hugin: • complexity seems forbidding for truly realistic medical expert systems

  26. Graphical modelling [2] • Assuming structure to do probability calculations • Inferring structure to make substantive conclusions • Structure in model building • Inference about latent variables

  27. Conditional independence graph: draw an (undirected) edge between variables α and β if they are not conditionally independent given all other variables

  28. Infant mortality example Data on infant mortality from 2 clinics, by level of ante-natal care (Bishop, Biometrics, 1969):

  29. Infant mortality example Same data broken down also by clinic:

  30. Analysis of deviance

                               Df  Deviance  Resid Df  Resid Dev   P(>|Chi|)
      NULL                                          7    1066.43
      Clinic                    1     80.06         6     986.36   3.625e-19
      Ante                      1      7.06         5     979.30        0.01
      Survival                  1    767.82         4     211.48  5.355e-169
      Clinic:Ante               1    193.65         3      17.83   5.068e-44
      Clinic:Survival           1     17.75         2       0.08   2.524e-05
      Ante:Survival             1      0.04         1       0.04        0.84
      Clinic:Ante:Survival      1      0.04         0  1.007e-12        0.84
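The P(>|Chi|) column can be reproduced from the deviance drops: each term is tested against a chi-squared distribution on its degrees of freedom, and on 1 df the tail probability has the closed form erfc(sqrt(x/2)). A quick check in Python (standard library only):

```python
# Reproduce the chi-squared tail probabilities in the deviance table.
from math import erfc, sqrt

def chisq1_pvalue(x):
    """P(chi^2_1 > x): the upper tail probability on 1 df."""
    return erfc(sqrt(x / 2))

# The Clinic:Survival term: deviance drop 17.75 on 1 df
p = chisq1_pvalue(17.75)
assert 2.4e-5 < p < 2.6e-5        # table shows 2.524e-05

# The Ante:Survival term is clearly non-significant:
assert chisq1_pvalue(0.04) > 0.8  # table shows 0.84
```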

  31. Infant mortality example [figure: graph with clinic linked to both survival and ante]: survival and clinic are dependent, and ante and clinic are dependent, but survival and ante are conditionally independent given clinic

  32. Prognostic factors for coronary heart disease Analysis of a 2^6 contingency table (Edwards & Havranek, Biometrika, 1985): strenuous physical work? smoking? family history of CHD? blood pressure > 140? ratio of β and α lipoproteins > 3? strenuous mental work?

  33. How does it work? Hypothesis testing approaches: Tests on deviances, possibly penalised (AIC/BIC, etc.), MDL, cross-validation... Problem is how to search model space when dimension is large

  34. How does it work? Bayesian approaches: Typically place prior on all graphs, and conjugate prior on parameters (hyper-Markov laws, Dawid & Lauritzen), then use MCMC (see later) to update both graphs and parameters to simulate posterior distribution

  35. For example, Giudici &amp; Green (Biometrika, 2000) use the junction tree representation for fast local updates to the graph [figure: the 7-vertex graph and its junction tree, with cliques 12, 267, 236, 3456 and separators 2, 26, 36]

  36. [figure: the junction tree after one local update to the 7-vertex graph: the clique 12 becomes 127, with separator 27 to clique 267; cliques 236, 3456 and separators 26, 36 are unchanged]

  37. Graphical modelling [3] • Assuming structure to do probability calculations • Inferring structure to make substantive conclusions • Structure in model building • Inference about latent variables

  38. DAG for a trivial Bayesian model: θ -> y (parameter θ, data y)

  39. Modelling with undirected graphs Directed acyclic graphs are a natural representation of the way we usually specify a statistical model - directionally: • disease -> symptom • past -> future • parameters -> data ….. However, sometimes (e.g. spatial models) there is no natural direction

  40. Scottish lip cancer data The rates of lip cancer in 56 counties in Scotland have been analysed by Clayton and Kaldor (1987) and Breslow and Clayton (1993) (the analysis here is based on the example in the WinBugs manual)

  41. Scottish lip cancer data (2) The data include • the observed and expected cases (expected numbers based on the population and its age and sex distribution in the county), • a covariate measuring the percentage of the population engaged in agriculture, fishing, or forestry, and • the "position" of each county expressed as a list of adjacent counties.

  42. Scottish lip cancer data (3)

      County  Obs cases  Exp cases  x (% in agric.)    SMR  Adjacent counties
           1          9        1.4               16  652.2  5,9,11,19
           2         39        8.7               16  450.3  7,10
         ...        ...        ...              ...    ...  ...
          56          0        1.8               10    0.0  18,24,30,33,45,55

  43. Model for lip cancer data (1) Graph [figure: DAG with nodes for the regression coefficient, the covariate, the random spatial effects, the expected counts and the observed counts]

  44. Model for lip cancer data (2) Distributions • Data: O_i ~ Poisson(μ_i) • Link function: log μ_i = log E_i + α0 + α1 x_i / 10 + b_i • Random spatial effects: b ~ intrinsic CAR normal with precision τ • Priors: α0 flat, α1 normal with precision 1.0E-5, τ ~ Gamma(r, d)

  45. WinBugs for lip cancer data • Bugs and WinBugs are systems for estimating the posterior distribution in a Bayesian model by simulation, using MCMC • Data analytic techniques can be used to summarise (marginal) posteriors for parameters of interest

  46. Bugs code for lip cancer data

      model {
        b[1:regions] ~ car.normal(adj[], weights[], num[], tau)
        b.mean <- mean(b[])
        for (i in 1 : regions) {
          O[i] ~ dpois(mu[i])
          log(mu[i]) <- log(E[i]) + alpha0 + alpha1 * x[i] / 10 + b[i]
          SMRhat[i] <- 100 * mu[i] / E[i]
        }
        alpha1 ~ dnorm(0.0, 1.0E-5)
        alpha0 ~ dflat()
        tau ~ dgamma(r, d)
        sigma <- 1 / sqrt(tau)
      }
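The deterministic core of this model is easy to restate outside BUGS. The sketch below is a hypothetical Python transcription of the link-function and SMRhat lines only (no CAR prior, no fitting); the parameter values are illustrative defaults, not posterior estimates:

```python
# Deterministic part of the BUGS model, restated in Python: the Poisson
# mean combines the log-offset log(E), an intercept, the scaled
# covariate, and the spatial effect. Parameter values are illustrative.
from math import exp, log

def poisson_mean(E, x, b, alpha0=0.0, alpha1=0.5):
    """mu[i] from the model: log(mu) = log(E) + alpha0 + alpha1*x/10 + b."""
    return exp(log(E) + alpha0 + alpha1 * x / 10 + b)

def smr_hat(E, x, b, **kw):
    """Fitted SMR on the usual percentage scale: 100 * mu / E."""
    return 100 * poisson_mean(E, x, b, **kw) / E

# With all effects zero, the fitted SMR is the baseline 100:
assert abs(smr_hat(1.4, 0, 0, alpha0=0.0, alpha1=0.0) - 100) < 1e-9
```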

