Alex Lewin Centre for Biostatistics Imperial College, London

Mixture models for classifying differentially expressed genes
Alex Lewin, Centre for Biostatistics, Imperial College, London
Joint work with Natalia Bochkina and Sylvia Richardson
BBSRC Exploiting Genomics grant

Modelling differential expression
• Many different methods/models for differential expression
  • t-test
  • t-test with stabilised variances (EB)
  • Bayesian hierarchical models
  • mixture models
• Choice whether to model alternative hypothesis or not
• Our model:
  • Model the alternative hypothesis
  • Fully Bayesian

Mixture model features
• Gene means and fold differences: linear model on the log scale
• Gene variances: borrow information across genes by assuming exchangeable variances
• Mixture prior on fold difference parameters
• Point mass prior for ‘null hypothesis’

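Fully Bayesian mixture model for differential expression
• 1st level
  $y_{g1r} \mid \alpha_g, d_g, \sigma_{g1} \sim N(\alpha_g - \tfrac{1}{2} d_g,\ \sigma_{g1}^2)$
  $y_{g2r} \mid \alpha_g, d_g, \sigma_{g2} \sim N(\alpha_g + \tfrac{1}{2} d_g,\ \sigma_{g2}^2)$
• 2nd level
  $\sigma_{gs}^2 \mid a_s, b_s \sim IG(a_s, b_s)$
  $d_g \sim p_0 \delta_0 + p_1 G_{-}(1.5, \lambda_1) + p_2 G_{+}(1.5, \lambda_2)$
  (the point mass $\delta_0$ is H0; the two Gamma components are the explicit modelling of the alternative)
• 3rd level
  Gamma hyperprior for $\lambda_1, \lambda_2, a_s, b_s$
  Dirichlet distribution for $(p_0, p_1, p_2)$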
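
To make the three-component prior on the fold differences concrete, here is a minimal sketch that draws d_g from it. It is an illustration only: the function name `sample_fold_differences`, the fixed values of the weights and of λ1, λ2 (which the model gives hyperpriors rather than fixing), and the reading of the second Gamma argument as a rate are all assumptions, and G− is taken to be a Gamma distribution reflected onto the negative axis.

```python
import numpy as np

def sample_fold_differences(n_genes, probs, lam1, lam2, seed=0):
    """Draw d_g from p0*delta_0 + p1*G-(1.5, lam1) + p2*G+(1.5, lam2).

    Assumption: lam1 and lam2 are rates, so the NumPy scale is 1/lam.
    """
    rng = np.random.default_rng(seed)
    # Allocation: 0 = null, 1 = negative Gamma component, 2 = positive Gamma component
    z = rng.choice(3, size=n_genes, p=probs)
    neg = -rng.gamma(shape=1.5, scale=1.0 / lam1, size=n_genes)
    pos = rng.gamma(shape=1.5, scale=1.0 / lam2, size=n_genes)
    # Null genes get exactly zero fold difference (the point mass)
    d = np.select([z == 1, z == 2], [neg, pos], default=0.0)
    return z, d

# Hypothetical values, chosen only for illustration
z, d = sample_fold_differences(1000, probs=(0.8, 0.1, 0.1), lam1=2.0, lam2=2.0)
```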

Decision Rules
• In full Bayesian framework, introduce latent allocation variable $z_g = 0, 1$ for gene g in null, alternative
• For each gene, calculate posterior probability of belonging to unmodified component: $p_g = \Pr(z_g = 0 \mid \text{data})$
• Classify using cut-off on $p_g$ (Bayes rule corresponds to 0.5)
• For any given cut-off on $p_g$, can estimate FDR, FNR. For gene-list S, est. (FDR | data) = $\sum_{g \in S} p_g / |S|$
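
The decision rule can be sketched in a few lines, given the posterior null probabilities p_g. The FDR estimate follows the formula on the slide; the FNR estimate (average of 1 − p_g over the genes not in S) is the analogous quantity and is an assumption here, as are the function name `classify_and_estimate` and the choice to treat a tie at the cut-off as "not declared".

```python
import numpy as np

def classify_and_estimate(p_null, cutoff=0.5):
    """Classify genes by a cut-off on p_g = Pr(z_g = 0 | data) and estimate FDR and FNR."""
    p_null = np.asarray(p_null, dtype=float)
    declared = p_null < cutoff            # gene list S: declared differentially expressed
    n_declared = int(declared.sum())
    n_rest = int((~declared).sum())
    # est. FDR = sum of p_g over S, divided by |S|
    est_fdr = p_null[declared].sum() / n_declared if n_declared else 0.0
    # est. FNR (assumed analogue) = sum of (1 - p_g) over genes outside S, divided by their count
    est_fnr = (1.0 - p_null[~declared]).sum() / n_rest if n_rest else 0.0
    return declared, est_fdr, est_fnr

# Hypothetical posterior probabilities for five genes
declared, est_fdr, est_fnr = classify_and_estimate([0.01, 0.2, 0.6, 0.95, 0.99])
```

With these five made-up values the first two genes fall below the 0.5 cut-off, so the estimated FDR is (0.01 + 0.2) / 2.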

Simulation Study
Explore performance of fully Bayesian mixture in different situations:
• Non-standard distribution of DE genes
• Small number of DE genes
• Small number of replicate arrays
• Asymmetric distributions of over- and under-expressed genes
Simulated data: 50 simulated data sets for each of several different set-ups.

Simulation Study
2500 genes, 8 replicates in each experimental condition
$d_g \sim p_0 \delta_0 + p_1 (f\,\mathrm{Unif}() + (1 - f)\,N()) + p_2 (f\,\mathrm{Unif}() + (1 - f)\,N())$
$\sigma_{gs} \sim \mathrm{logNorm}(-1.8, 0.5)$ (logNorm based on data)

Non-standard distributions of DE genes
True π0 = 0.8
f = 0.3: Av. est. π0 = 0.805 ± 0.010
f = 0.5: Av. est. π0 = 0.797 ± 0.010
f = 0.8: Av. est. π0 = 0.781 ± 0.010
[Figure: one panel per value of f, Gamma distributions superimposed]

Small number of DE genes / Small number of replicate arrays
True π0   Replicates   Av. FDR   Av. FNR   Av. est. π0
0.95      8            7.0 %     2.0 %     0.947 ± 0.007
0.95      3            17.9 %    3.6 %     0.956 ± 0.009
0.99      8            9.2 %     0.6 %     0.990 ± 0.003
0.99      3            17.6 %    0.9 %     0.995 ± 0.007

Asymmetric distributions of over/under-expressed genes
$d_g \sim p_0 \delta_0 + p_1 (0.6\,\mathrm{Unif}(0.01, 1.7) + 0.4\,N(1.7, 0.8)) + p_2 (0.6\,\mathrm{Unif}(-0.7, -0.01) + 0.4\,N(-0.7, 0.8))$
True π0 = 0.9     Av. est. π0 = 0.897 ± 0.007
True π1 = 0.09    Av. est. π1 = 0.093 ± 0.003
True π2 = 0.01    Av. est. π2 = 0.011 ± 0.006
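
A data set of this kind could be generated roughly as follows. This is a sketch under stated assumptions, not the code used for the study: the second argument of N(·, 0.8) is read as a standard deviation, logNorm(-1.8, 0.5) is read as the mean and standard deviation of log σ_gs, the gene means α_g are set to zero (the slides do not give them), and the function name `simulate_asymmetric` is made up. The 2500 genes and 8 replicates come from the simulation set-up slide.

```python
import numpy as np

def simulate_asymmetric(n_genes=2500, n_reps=8, probs=(0.9, 0.09, 0.01), seed=0):
    """Simulate two-condition log-expression data with asymmetric DE genes."""
    rng = np.random.default_rng(seed)
    # Allocation: 0 = null, 1 = over-expressed, 2 = under-expressed
    z = rng.choice(3, size=n_genes, p=probs)
    # Within each DE component: uniform part with probability 0.6, normal part otherwise
    use_unif = rng.random(n_genes) < 0.6
    over = np.where(use_unif, rng.uniform(0.01, 1.7, n_genes),
                    rng.normal(1.7, 0.8, n_genes))    # 0.8 assumed to be a std. dev.
    under = np.where(use_unif, rng.uniform(-0.7, -0.01, n_genes),
                     rng.normal(-0.7, 0.8, n_genes))  # 0.8 assumed to be a std. dev.
    d = np.select([z == 1, z == 2], [over, under], default=0.0)
    # One standard deviation per gene and condition
    sigma = rng.lognormal(mean=-1.8, sigma=0.5, size=(n_genes, 2))
    alpha = np.zeros(n_genes)  # assumption: gene means are not specified in the slides
    y1 = rng.normal((alpha - 0.5 * d)[:, None], sigma[:, [0]], size=(n_genes, n_reps))
    y2 = rng.normal((alpha + 0.5 * d)[:, None], sigma[:, [1]], size=(n_genes, n_reps))
    return z, d, y1, y2

z, d, y1, y2 = simulate_asymmetric()
```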

Additional Checks
1) FDR / FNR can be estimated well
[Figure: true FDR vs. est. FDR, and true FNR vs. est. FNR]
2) Model works when there are no DE genes
50 simulations of same set-up: Av. est. π0 = 0.999
No genes are declared to be DE.

Comparison with conjugate mixture prior
Replace
$d_g \sim p_0 \delta_0 + p_1 G_{-}(1.5, \lambda_1) + p_2 G_{+}(1.5, \lambda_2)$
with
$d_g \sim p_0 \delta_0 + p_1 N(0, c \sigma_g^2)$
NB: We estimate both c and p0 in fully Bayesian way.
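
One reason the normal alternative is convenient is that, conditional on the other parameters, the posterior null probability has a closed form. The sketch below shows this for a single gene under simplifying assumptions that are not the full model: σ_g², c and p0 are treated as known (the talk estimates c and p0 rather than fixing them), both conditions share the variance σ_g², and the observed difference of condition means is the only data used. The function names are hypothetical.

```python
import math

def normal_pdf(x, var):
    """Density of N(0, var) at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)

def posterior_null_prob(ybar_diff, sigma2, n1, n2, p0, c):
    """Pr(z_g = 0 | ybar_diff) when d_g ~ p0*delta_0 + (1 - p0)*N(0, c*sigma2).

    ybar_diff is the difference of the two condition means; given d_g it is
    N(d_g, v) with v = sigma2 * (1/n1 + 1/n2).
    """
    v = sigma2 * (1.0 / n1 + 1.0 / n2)
    null = p0 * normal_pdf(ybar_diff, v)                       # d_g = 0
    alt = (1.0 - p0) * normal_pdf(ybar_diff, v + c * sigma2)   # d_g integrated out
    return null / (null + alt)

# Hypothetical values: sigma_g = 0.2, 8 replicates per condition, p0 = 0.9, c = 10
p_small = posterior_null_prob(0.0, 0.04, 8, 8, p0=0.9, c=10.0)
p_large = posterior_null_prob(1.0, 0.04, 8, 8, p0=0.9, c=10.0)
```

An observed difference of zero pushes the null probability above its prior value p0, while a difference that is large relative to the sampling variance drives it towards zero.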

Application to Mouse data
Mouse wildtype (WT) and knock-out (KO) data (Affymetrix)
~ 22700 genes, 8 replicates in each of WT and KO
Gamma prior: Est. π0 = 0.996 ± 0.001, declares 59 genes DE

Summary
• Good performance of fully Bayesian mixture model
  • can estimate proportion of DE genes in variety of situations
  • accurate estimation of FDR / FNR
• Different mixture priors give similar classification results
• Gives reasonable results for real data
