Bayesian detection of non-sinusoidal periodic patterns in circadian expression data. Darya Chudova, Alexander Ihler, Kevin K. Lin, Bogi Andersen and Padhraic Smyth. BIOINFORMATICS, Gene expression, Vol. 25 no. 23 2009, pages 3114-3120.
Outline • Introduction • Methodology • Experimental Results • Conclusion
Introduction • Cyclical biological processes: • Cell cycle, hair growth cycle, mammary cycle and circadian rhythms • These processes produce coordinated periodic expression of thousands of genes. • Existing computational methods are biased toward discovering genes that follow sine-wave patterns. • The objective is to identify or rank the genes that are most likely to be periodically regulated.
Introduction • Existing methods fall into two major categories: • Frequency domain • Compute the spectrum of the average expression profile for each probe. • Test the significance of the dominant frequency against a suitable null hypothesis such as uncorrelated noise. • Not well suited for short time courses. • Time domain • Identify sinusoidal expression patterns. • Simple and computationally efficient. • Not effective at finding periodic signals that violate the sinusoidal assumption.
Introduction • This article presents a general statistical framework for detecting periodic profiles from time course data. • Analyze the similarity of observed profiles across the cycles. • Discover periodic transcripts of arbitrary shape from replicated gene expression profiles. • Provide an empirical Bayes procedure for estimating the parameters of the prior distribution. • Derive closed-form expressions for the posterior probability of periodicity.
Introduction • Expression profiles from the murine liver time course data set. • Two of these probe sets (Nr1d1 and Arntl) correspond to well-established clock-control genes.
Methodology • Probabilistic mixture model: • Differentially expressed genes • change their expression level in response to changes in experimental conditions • Background genes • remain constant throughout the experiment • Coordinated expression across multiple cycles • Model periodic phenomena
Methodology • Model the data using a mixture of three components: background, differentially expressed, and periodically expressed profiles. • Compute the posterior probability that a given probe set was generated by the periodic component.
Methodology • A probabilistic model for periodicity • $N$ probe sets are observed over $C$ cycles of known length. • Each cycle is represented by the same grid of $T$ time points, indexed from 1 to $T$. • The number of replicate observations is specified separately for each probe set $i$, time point $j$ and cycle $c$. • $y_{icjk}$: the expression intensity value for probe set $i$, time point $j$ and replicate $k$ in cycle $c$. • $Y_i$: the entire set of observations for probe set $i$.
Methodology • Our probabilistic model for the expression data $Y_i$ consists of three components: background (b), differentially expressed but aperiodic (d), and periodically expressed (p) profiles. • Let $z_i$ denote the component associated with probe set $i$. • Each of the three component models consists of a Normal/Inverse Gamma (NIG) prior distribution on the latent profile and additional Normal noise on the observations.
Methodology • The Normal/Inverse Gamma (NIG) prior is a flexible and computationally convenient distribution commonly used as a prior model for latent expression levels and replicate variability. • Scalar variables $\mu$ and $\sigma^2$ are distributed as NIG with four parameters: a location mean, a location scale, and the inverse Gamma parameters $(a, b)$. • $IG(x; a, b)$: inverse Gamma distribution with $a$ degrees of freedom and scale parameter $b$, evaluated at $x$.
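For reference, the NIG prior can be written out in its common textbook form. This is a sketch under assumptions: the names $\mu_0$ (location mean) and $\lambda$ (location scale) are not from the slides, $a$ is taken as the shape parameter of the inverse Gamma, and the source may use a slightly different parameterization.

```latex
\sigma^2 \sim IG(a, b), \qquad
\mu \mid \sigma^2 \sim N\!\left(\mu_0, \frac{\sigma^2}{\lambda}\right),
\qquad
IG(x; a, b) = \frac{b^{a}}{\Gamma(a)}\, x^{-(a+1)} \exp\!\left(-\frac{b}{x}\right), \quad x > 0 .
```

Because this prior is conjugate to Normal observations, the latent mean and variance can be integrated out analytically, which is what makes closed-form posterior probabilities possible.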
Methodology • Three types of unknown quantities: • The prior parameters • Determined via an empirical Bayes procedure • Subsequently treated as known and fixed • Probe set-specific hidden variables: the latent profiles (consisting of a mean and a variance) for each component. • The component identity $z_i$, indicating from which component the data were generated.
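Methodology • Figure: graphical model of the mixture. The plate is repeated N times, once per probe set, and contains the observed profiles Y and the latent variables: Z (component identity) and the latent profile {mean, variance}.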
Methodology • The background component model: • The NIG prior is shared by all background probe sets and parameterized by four scalars. • The observations $Y_i$ are modeled as independent samples from a Gaussian distribution with a scalar mean $\mu_i$ and variance $\sigma_i^2$.
Methodology • The differentially expressed component model: • The latent mean $\mu_i$ and variance $\sigma_i^2$ are $(C \times T)$-dimensional vectors. • The prior distribution for this component is defined by four $(C \times T)$-dimensional parameters. • Model the observations as independent given $\mu_i$ and $\sigma_i^2$.
Methodology • The periodic component model: • Assume repeated expression of the same pattern across multiple cycles. • The latent mean $\mu_i$ and variance $\sigma_i^2$ are $T$-dimensional variables encoding expression levels and replicate variability in the 'ideal' cycle.
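The three components differ only in how many observations share one latent mean and variance. A hedged way to write this down, assuming $y_{icjk}$ denotes the intensity for probe set $i$, cycle $c$, time point $j$ and replicate $k$ (the subscripted forms of $\mu$ and $\sigma^2$ are illustrative notation, not taken from the slides):

```latex
\begin{aligned}
\text{background (b):} \quad & y_{icjk} \mid \mu_i, \sigma_i^2 \sim N(\mu_i, \sigma_i^2)
  && \text{one scalar pair for the whole profile} \\
\text{differential (d):} \quad & y_{icjk} \mid \mu_{icj}, \sigma_{icj}^2 \sim N(\mu_{icj}, \sigma_{icj}^2)
  && \text{one pair per cycle and time point } (C \times T) \\
\text{periodic (p):} \quad & y_{icjk} \mid \mu_{ij}, \sigma_{ij}^2 \sim N(\mu_{ij}, \sigma_{ij}^2)
  && \text{one pair per time point, shared across cycles } (T)
\end{aligned}
```

Read this way, the periodic component sits between the other two: it is more flexible than a constant background but more constrained than an arbitrary time-varying profile.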
Methodology • The complete set of prior parameters includes the prior probabilities of the component identity $z$ (corresponding to the relative frequencies of background, differentially expressed, and periodic probe sets).
Methodology • Inference • Detect periodic expression by computing the posterior probability of the periodic component
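The posterior probability of each component follows from Bayes' rule: the prior component probability times the marginal likelihood of the data under that component, normalized over the three components. The Python sketch below illustrates the idea under simplifying assumptions that are not from the source: the prior parameters `mu0`, `lam`, `a`, `b` are fixed scalars shared by every position (rather than estimated by empirical Bayes), the component priors are equal, and all function names and the toy data are hypothetical. The marginal likelihood is the standard conjugate result for a Normal likelihood with an NIG prior.

```python
import math

def nig_log_marginal(ys, mu0=0.0, lam=1.0, a=1.0, b=1.0):
    """Log marginal likelihood of observations that share one latent
    (mean, variance) pair under a Normal/Inverse Gamma prior."""
    n = len(ys)
    ybar = sum(ys) / n
    ss = sum((y - ybar) ** 2 for y in ys)
    lam_n = lam + n
    a_n = a + n / 2.0
    b_n = b + 0.5 * ss + lam * n * (ybar - mu0) ** 2 / (2.0 * lam_n)
    return (math.lgamma(a_n) - math.lgamma(a)
            + a * math.log(b) - a_n * math.log(b_n)
            + 0.5 * (math.log(lam) - math.log(lam_n))
            - 0.5 * n * math.log(2.0 * math.pi))

def component_posteriors(y, priors=None):
    """y[c][j] is the list of replicates for cycle c and time point j.
    Returns posterior probabilities for 'b', 'd' and 'p'."""
    if priors is None:
        priors = {"b": 1 / 3, "d": 1 / 3, "p": 1 / 3}  # assumed equal priors
    C, T = len(y), len(y[0])
    log_ml = {}
    # Background: every observation shares a single latent pair.
    log_ml["b"] = nig_log_marginal([v for cyc in y for reps in cyc for v in reps])
    # Differential: each (cycle, time point) cell has its own latent pair.
    log_ml["d"] = sum(nig_log_marginal(y[c][j])
                      for c in range(C) for j in range(T))
    # Periodic: each time point has one latent pair shared across cycles.
    log_ml["p"] = sum(nig_log_marginal([v for c in range(C) for v in y[c][j]])
                      for j in range(T))
    log_post = {m: math.log(priors[m]) + log_ml[m] for m in log_ml}
    top = max(log_post.values())  # subtract the max for numerical stability
    norm = sum(math.exp(v - top) for v in log_post.values())
    return {m: math.exp(v - top) / norm for m, v in log_post.items()}

def toy_data(profile, cycles=2, eps=0.1):
    """Hypothetical data: the same profile in every cycle, two replicates."""
    return [[[p + eps, p - eps] for p in profile] for _ in range(cycles)]

post_periodic = component_posteriors(toy_data([0.0, 3.0, 0.0, -3.0]))
post_flat = component_posteriors(toy_data([0.0, 0.0, 0.0, 0.0]))
```

On the toy inputs, a profile that repeats across cycles favors the periodic component, while a flat profile favors the background component, because the simpler model is preferred when it explains the data equally well.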
Methodology • An analysis of variance periodicity detector • The resulting inferential test for periodicity is quite close to a simplified, non-Bayesian test based on analysis of variance (ANOVA). • Construct the ANOVA test: • Divide the data into groups by their associated time points, regardless of cycle number. • All replicates from every cycle $c = 1, \dots, C$ at a given time point fall into the same group.
Methodology • Test whether the data support separation into these groups: • whether the amount of variation between groups is significantly larger than the variation found within the groups. • High values of the ratio of these quantities indicate that most of the variability in the observations can be explained by a time-dependent, cycle-independent profile.
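The ANOVA detector described above can be sketched as a one-way F ratio in which each time point forms one group and cycle membership is ignored. The code below is a minimal illustration, not the authors' implementation; the function name and the toy inputs are hypothetical, and it returns only the F ratio (a P-value would come from the F distribution with the degrees of freedom noted in the comments).

```python
def anova_periodicity_f(y):
    """y[c][j] is the list of replicates for cycle c and time point j.
    Groups observations by time point j, pooling all cycles."""
    T = len(y[0])
    groups = [[v for cyc in y for v in cyc[j]] for j in range(T)]
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(v for g in groups for v in g) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = T - 1          # number of groups minus one
    df_within = n_total - T     # observations minus number of groups
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical inputs: two cycles, four time points, two replicates each.
repeating = [[[p + 0.1, p - 0.1] for p in [0.0, 3.0, 0.0, -3.0]]
             for _ in range(2)]
non_repeating = [[[p + 0.1, p - 0.1] for p in [0.0, 3.0, 0.0, -3.0]],
                 [[p + 0.1, p - 0.1] for p in [0.0, -3.0, 0.0, 3.0]]]

f_repeating = anova_periodicity_f(repeating)
f_non_repeating = anova_periodicity_f(non_repeating)
```

When the second cycle reverses the pattern of the first, the per-time-point means cancel out and the ratio collapses, which is the behavior the test relies on to separate periodic from merely time-varying profiles.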
Methodology • Estimating the parameters of the prior distribution: • Develop an empirical Bayes procedure to determine the prior parameters. • Determine a tentative assignment of probe sets to each component. • Use this assignment to find approximate maximum likelihood estimates of the location scale parameter and of the inverse Gamma parameters $(a, b)$; the location mean is set to 0 in all three components.
Methodology • To find a tentative initial assignment of probe sets for estimating the prior parameters: • Run the ANOVA detectors of differential expression and periodicity. • Parameters of the differential expression component: • use probe sets that vary significantly over time (P<0.01). • Parameters of the background component: • use probe sets that fail this test (P>0.1). • Parameters of the periodic component: • use probe sets with P<0.001 in the periodicity test, which yields a number of probe sets similar to that previously identified in the literature.
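The thresholding step can be sketched as follows. This is an illustration only: the function name, the argument names and the example P-values are hypothetical, and the sketch assumes that the differential and background sets are both derived from the differential expression test while the periodic set comes from the periodicity test. The sets are not forced to be mutually exclusive, since the slides do not say whether they are.

```python
def tentative_assignment(p_diff, p_periodic,
                         diff_cut=0.01, background_cut=0.1,
                         periodic_cut=0.001):
    """Return index lists of probe sets used to estimate each prior.
    p_diff[i] and p_periodic[i] are ANOVA P-values for probe set i."""
    return {
        # Probe sets that vary significantly over time.
        "d": [i for i, p in enumerate(p_diff) if p < diff_cut],
        # Probe sets that clearly fail the differential expression test.
        "b": [i for i, p in enumerate(p_diff) if p > background_cut],
        # Probe sets with strong evidence of a repeated pattern.
        "p": [i for i, p in enumerate(p_periodic) if p < periodic_cut],
    }

# Hypothetical P-values for four probe sets.
groups = tentative_assignment(p_diff=[0.5, 0.005, 0.0001, 0.05],
                              p_periodic=[0.7, 0.2, 0.0002, 0.03])
```

Probe sets with intermediate P-values (such as the last one in the example) fall into none of the sets, so ambiguous profiles do not influence the estimated priors.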
Experimental Results • Demonstrate that the model can effectively identify both sinusoidal and non-sinusoidal periodic expression patterns. • It is widely believed that 5-10% of transcribed genes may be under circadian regulation, with some studies suggesting a higher proportion, up to 50% in murine liver. • The datasets analyzed in this article contain gene expression profiles of liver and skeletal muscle tissues in mice.
Experimental Results • Sine-wave detection: • Use the sine-wave matching algorithm of Straume (2004). • It identifies 848 distinct rhythmic probe sets in liver and 383 such probe sets in skeletal muscle. • Model-based detection: • Among the top 25 probe sets, nine were not among the top 400 ranked by sine-wave matching. • Profiles that peak or drop at a single time point are poorly matched to a sinusoidal shape.
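To see why a single-point peak matches a sinusoid poorly, one can fit a sinusoid whose period equals the cycle length and measure the fraction of variance it explains. The sketch below is a plain least-squares fit, not the algorithm of Straume (2004); the function name and the six-point profiles are hypothetical, and the projection formulas assume at least three equally spaced time points covering exactly one cycle.

```python
import math

def sinusoid_r2(profile):
    """Fraction of variance explained by the best-fitting sinusoid with
    period equal to the cycle length (equally spaced time points)."""
    T = len(profile)
    mean = sum(profile) / T
    angles = [2.0 * math.pi * t / T for t in range(T)]
    # On a uniform one-period grid the constant, cosine and sine terms are
    # orthogonal, so least-squares coefficients reduce to projections.
    b = (2.0 / T) * sum(y * math.cos(w) for y, w in zip(profile, angles))
    c = (2.0 / T) * sum(y * math.sin(w) for y, w in zip(profile, angles))
    fitted = [mean + b * math.cos(w) + c * math.sin(w) for w in angles]
    ss_res = sum((y - f) ** 2 for y, f in zip(profile, fitted))
    ss_tot = sum((y - mean) ** 2 for y in profile)
    return 1.0 - ss_res / ss_tot

# Hypothetical profiles over six time points in one cycle.
spike = [0.0, 0.0, 5.0, 0.0, 0.0, 0.0]
sine = [math.sin(2.0 * math.pi * t / 6) for t in range(6)]

r2_spike = sinusoid_r2(spike)
r2_sine = sinusoid_r2(sine)
```

A spike spreads its variance evenly over all harmonics, so the fundamental sinusoid captures only a minority of it even when the spike recurs perfectly in every cycle.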
Experimental Results • Tns3 is the only probe set ranked in the top 25 by the sine-wave method but outside the top 400 by the model. • It conforms to the sine-wave pattern but has a very small amplitude, and is assigned to the background component by the model. • All of the other probe sets ranked this highly by the sine-wave method received posterior probabilities of periodicity >0.9 from our model.
Conclusion • We argue that in typical experiments with only a small number of samples per cycle, we should test for arbitrary patterns which are repeated between cycles, rather than parametric shapes. • To this end, we propose a Bayesian mixture model for identifying patterns of unconstrained shape, which stand out as both differentially and periodically expressed.