Simultaneous Normalization and Differential Expression: A Bayesian Approach

Simultaneous Normalization and Differential Expression
Alex Lewin, Sylvia Richardson (IC Epidemiology), Tim Aitman (IC Microarray Centre)
In collaboration with Anne-Mette Hein, Natalia Bochkina (IC Epidemiology), Helen Causton (IC Microarray Centre), Peter Green and Graeme Ambler (Bristol)

Expression level dependent normalization
Many gene expression data sets need normalization that depends on expression level. Usually normalization is performed in a pre-processing step, before the model for differential expression is fitted. Such analyses ignore the fact that the expression level itself is measured with variability, and ignoring this variability biases the function used for normalization.

Simultaneous normalization and differential expression
We propose a Bayesian model which includes array effects (normalization) in the differential expression model. We show (on simulated data) that ignoring the variability in the expression level leads to a greater number of false positives.

Bayesian hierarchical model for differential expression
Data: $y_{gsr}$ = log gene expression for gene $g$, replicate $r$
$\alpha_g$ = gene effect
$\delta_g$ = differential effect for gene $g$ between 2 conditions
$\beta_{r(g)s}$ = array effect (expression-level dependent)
$\sigma_{gs}^2$ = gene variance
• 1st level
$y_{g1r} \sim N(\alpha_g - \tfrac{1}{2}\delta_g + \beta_{r(g)1},\ \sigma_{g1}^2)$,
$y_{g2r} \sim N(\alpha_g + \tfrac{1}{2}\delta_g + \beta_{r(g)2},\ \sigma_{g2}^2)$,
$\sum_r \beta_{r(g)s} = 0$,
$\beta_{r(g)s}$ = function of $\alpha_g$, parameters $\{a\}$ and $\{b\}$
• 2nd level
Priors for $\alpha_g$, $\delta_g$, coefficients $\{a\}$ and $\{b\}$
$\sigma_{gs}^2 \sim \text{lognormal}(\mu_s, \tau_s)$

Details of array effects (Normalization)
Piecewise polynomial with unknown break points:
$\beta_{r(g)s}$ = quadratic in $\alpha_g$ for $a_{rs}^{(k-1)} \le \alpha_g \le a_{rs}^{(k)}$, with coefficients $(b_{rsk}^{(1)}, b_{rsk}^{(2)})$, $k = 1, \dots,$ #breakpoints
• Locations of break points not fixed
• Must do sensitivity checks on # break points
• Cubic fits well for the data we are interested in
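As a rough illustration of what a piecewise quadratic array effect looks like, the sketch below evaluates a continuous piecewise quadratic curve at a given gene effect. The slide only names break points and two coefficients per segment, so the exact parameterization here (each segment starting from the value reached at the end of the previous one, and a free starting value `start`) is an assumption made for this example, as are the function and argument names.

```python
def piecewise_quadratic(alpha, breaks, coeffs, start=0.0):
    """Evaluate a continuous piecewise quadratic at gene effect `alpha`.

    breaks: sorted break points a_0 < a_1 < ... < a_K (hypothetical layout)
    coeffs: K pairs (b1, b2), one per segment [a_{k-1}, a_k]
    start:  value of the curve at a_0 (assumed, not given in the slides)
    """
    if len(coeffs) != len(breaks) - 1:
        raise ValueError("need exactly one (b1, b2) pair per segment")
    # Clamp alpha to the range covered by the break points.
    alpha = min(max(alpha, breaks[0]), breaks[-1])
    value = start
    for k, (b1, b2) in enumerate(coeffs):
        lo, hi = breaks[k], breaks[k + 1]
        # Distance travelled inside this segment (full width if alpha lies beyond it).
        x = min(alpha, hi) - lo
        value += b1 * x + b2 * x * x
        if alpha <= hi:
            break
    return value
```

In the model itself the break points and coefficients are unknown parameters with priors; here they are simply passed in as fixed numbers.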

Mouse Data
3 wildtype (normal) mice compared with 3 mice with Cd36 knocked out
Figure: 3 replicate arrays (wildtype mouse data). Model: posterior means $E(\beta_{r(g)s} \mid \text{data})$ v. $E(\alpha_g \mid \text{data})$. Data: $y_{gsr} - E(\alpha_g \mid \text{data})$.

Simulated Data
• 1000 genes with 3 replicates under 2 conditions
• Expression levels $\alpha_g$ between 0 and 10 (log scale)
• $\sigma_{g1}^2 \sim \text{logNormal}(-1.8, 1)$, $\sigma_{g2}^2 \sim \text{logNormal}(-2.2, 1)$
• 900 genes: $\delta_g = 0$
• 50 genes: $\delta_g \sim N(\log(3), 0.1^2)$
• 50 genes: $\delta_g \sim N(-\log(3), 0.1^2)$
• Array effects $\beta_{r(g)s}$ cubic functions of $\alpha_g$
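The simulation design above can be sketched in a few lines of standard-library Python. Several details are not stated on the slide and are assumptions here: expression levels are drawn uniformly on [0, 10], the second argument of each normal distribution is read as a variance of 0.1 squared, the cubic coefficients are arbitrary small random numbers, and the array effects are centred across replicates so that they sum to zero within each condition. All names are hypothetical.

```python
import math
import random


def cubic(coef, x):
    """Cubic without intercept: c1*x + c2*x^2 + c3*x^3."""
    c1, c2, c3 = coef
    return c1 * x + c2 * x ** 2 + c3 * x ** 3


def simulate(n_null=900, n_up=50, n_down=50, n_reps=3, seed=1):
    rng = random.Random(seed)
    n_genes = n_null + n_up + n_down
    # Gene effects: uniform on [0, 10] is an assumption.
    alpha = [rng.uniform(0.0, 10.0) for _ in range(n_genes)]
    # Differential effects: 900 null genes, 50 up, 50 down.
    delta = ([0.0] * n_null
             + [rng.gauss(math.log(3), 0.1) for _ in range(n_up)]
             + [rng.gauss(-math.log(3), 0.1) for _ in range(n_down)])
    log_means = (-1.8, -2.2)   # log-normal location for conditions 1 and 2
    signs = (-0.5, 0.5)        # -delta/2 in condition 1, +delta/2 in condition 2
    # Hypothetical cubic coefficients, one triple per (condition, replicate).
    coefs = [[(rng.uniform(-0.1, 0.1),
               rng.uniform(-0.02, 0.02),
               rng.uniform(-0.002, 0.002)) for _ in range(n_reps)]
             for _ in range(2)]
    y, beta = [], []           # indexed as y[g][s][r], beta[g][s][r]
    for g in range(n_genes):
        y_g, beta_g = [], []
        for s in range(2):
            raw = [cubic(coefs[s][r], alpha[g]) for r in range(n_reps)]
            mean_raw = sum(raw) / n_reps
            b = [v - mean_raw for v in raw]   # enforce sum over replicates = 0
            sd = math.sqrt(rng.lognormvariate(log_means[s], 1.0))
            mu = alpha[g] + signs[s] * delta[g]
            y_g.append([rng.gauss(mu + b[r], sd) for r in range(n_reps)])
            beta_g.append(b)
        y.append(y_g)
        beta.append(beta_g)
    return {"alpha": alpha, "delta": delta, "beta": beta, "y": y}
```

Centring a set of cubics across replicates leaves each array effect a cubic function of the gene effect, so the sum-to-zero constraint and the cubic shape are compatible.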

Array Effects and Variability for Simulated Data

Two-step method
• Use loess smoothing to obtain array effects $\beta^{loess}_{r(g)s}$
• Subtract loess array effects from data: $y^{loess}_{gsr} = y_{gsr} - \beta^{loess}_{r(g)s}$
• Run our model on $y^{loess}_{gsr}$ with no array effects
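The pre-normalization step can be sketched as follows. The slide does not say what is smoothed against what, so this example assumes that, within one condition, each array's deviation from the gene mean is smoothed against the gene mean. The smoother is a plain local linear regression with tricube weights (a simplified stand-in for loess, without robustness iterations); `frac` and all function names are hypothetical.

```python
import math


def loess(x, y, frac=0.5):
    """Local linear fit with tricube weights, evaluated at every x[i]."""
    n = len(x)
    k = max(2, int(math.ceil(frac * n)))      # number of neighbours per window
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12             # window half-width
        sw = swx = swy = swxx = swxy = 0.0
        for xi, yi in zip(x, y):
            u = abs(xi - x0) / h
            if u >= 1.0:
                continue
            w = (1.0 - u ** 3) ** 3           # tricube weight
            sw += w
            swx += w * xi
            swy += w * yi
            swxx += w * xi * xi
            swxy += w * xi * yi
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:                # degenerate window: weighted mean
            fitted.append(swy / sw)
        else:
            slope = (sw * swxy - swx * swy) / denom
            intercept = (swy - slope * swx) / sw
            fitted.append(intercept + slope * x0)
    return fitted


def loess_normalize(y_s, frac=0.5):
    """y_s[g][r]: data for one condition. Returns the data minus loess array effects."""
    n_reps = len(y_s[0])
    gene_mean = [sum(row) / n_reps for row in y_s]
    effects = []
    for r in range(n_reps):
        # Assumed definition: array effect = smooth of (array value - gene mean).
        resid = [row[r] - m for row, m in zip(y_s, gene_mean)]
        effects.append(loess(gene_mean, resid, frac))
    return [[y_s[g][r] - effects[r][g] for r in range(n_reps)]
            for g in range(len(y_s))]
```

The normalized values would then be passed to the differential expression model with the array-effect terms switched off.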

Two-step method
• $y^{loess}_{gsr} = y_{gsr} - \beta^{loess}_{r(g)s}$
• $y^{model}_{gsr} = y_{gsr} - E(\beta_{r(g)s} \mid \text{data})$
• Results from the 2 different two-step methods are much closer to each other than to the full model results.

Decision rules for selecting differentially expressed genes
If $P(\delta_g > \delta_{cut} \mid \text{data}) > p_{cut}$ then gene $g$ is called differentially expressed.
We used $\delta_{cut} = \log(3)$ (corresponds to null hypothesis).
Various $p_{cut}$: choose this according to acceptable error rate (e.g. False Discovery Rate).
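Given posterior draws of the differential effect for each gene (for instance from an MCMC run), the rule above reduces to counting draws. The sketch below applies the rule as written on the slide; the input layout, the function name and the default `p_cut=0.8` are assumptions for illustration only.

```python
import math


def call_differential(delta_samples, delta_cut=math.log(3), p_cut=0.8):
    """delta_samples[g] is a list of posterior draws of the differential effect of gene g.

    Returns (probs, calls): the estimated posterior probability that the
    effect exceeds delta_cut, and whether that probability exceeds p_cut.
    """
    probs = [sum(d > delta_cut for d in draws) / len(draws)
             for draws in delta_samples]
    calls = [p > p_cut for p in probs]
    return probs, calls


# Toy usage with made-up draws for two genes.
probs, calls = call_differential([[2.0, 2.0, 2.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
                                 p_cut=0.5)
```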

Full model v. two-step method
Plot: observed False Discovery Rate against $p_{cut}$ (averaged over 5 simulations). Solid line for full model, dashed line for pre-normalized method.
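On simulated data the truly differentially expressed genes are known, so the observed False Discovery Rate at a given threshold is simply the fraction of called genes that are truly null. A minimal sketch of the curve described above, assuming per-gene posterior probabilities are already available for each simulated data set (names and input layout are hypothetical, and an empty call list is given an FDR of 0 by convention):

```python
def observed_fdr(probs, is_de, p_cut):
    """Fraction of called genes (prob > p_cut) that are truly not differentially expressed."""
    called = [p > p_cut for p in probs]
    n_called = sum(called)
    if n_called == 0:
        return 0.0                      # convention when nothing is called
    false_pos = sum(c and not t for c, t in zip(called, is_de))
    return false_pos / n_called


def mean_fdr_curve(simulations, p_cuts):
    """simulations: list of (probs, is_de) pairs, one per simulated data set."""
    curve = []
    for p_cut in p_cuts:
        values = [observed_fdr(probs, is_de, p_cut)
                  for probs, is_de in simulations]
        curve.append(sum(values) / len(values))
    return curve
```

Running this once with probabilities from the full model and once with those from the pre-normalized method would give the two lines being compared.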

Discussion
• More false positives if normalization is carried out in a pre-processing step.
• The larger the slope of the array effects, the larger the difference between the full and pre-normalized models.
• Lewin, A., Richardson, S., Marshall, C., Glazier, A. and Aitman, T. (2004) Bayesian Modelling of Differential Gene Expression. (under revision), available at http://www.bgx.org.uk/
