Analysis of time-course gene expression data. Shyamal D. Peddada, Biostatistics Branch, National Inst. Environmental Health Sciences (NIH), Research Triangle Park, NC
Outline of the talk • Some objectives for performing “long series” time-course experiments • Single cell-cycle experiment • A nonlinear regression model • Phase angle of a cell cycle gene • Inference • Open research problems • Multiple cell-cycle experiments • “Coherence” between multiple cell-cycle experiments • Illustration • Open research problems
Objectives Some genes play an important role during the cell division cycle process. They are known as “cell-cycle genes”. Objectives: Investigate various characteristics of cell-cycle and/or circadian genes such as: • Amplitude of initial expression • Period • Phase angle of expression (angle of maximum expression for a cell cycle gene)
A brief description • G1 phase ("GAP 1"): For many cells, this phase is the major period of cell growth during their lifespan. • S ("Synthesis") phase: DNA replication occurs.
A brief description • G2 phase ("GAP 2"): Cells prepare for M phase. The G2 checkpoint prevents cells from entering mitosis if DNA has been damaged since the last division, providing an opportunity for DNA repair and stopping the proliferation of damaged cells. • M ("Mitosis") phase: Nuclear division (chromosomes separate) and cytoplasmic division (cytokinesis) occur. Mitosis is further divided into 4 phases.
Whitfield et al. (Molecular Biology of the Cell, 2002) Basic design is as follows: • Experimental units: Human cancer cells (HeLa) • Microarray platform: cDNA chips with approx. 43000 probes (i.e. roughly 29000 genes) • 3 different patterns of time points (i.e. 3 different experiments) One of the goals of these experiments was to identify periodically expressed genes.
Whitfield et al. (Molecular Biology of the Cell, 2002) Experiment 1: (26 time points) HeLa cancer cells arrested in the S-phase using double thymidine block. • Sampling times after arrest (hrs): • 0 1 2 3 4 5 6 7 8 9 10 11 12 14 15 16 18 20 22 24 26 28 32 36 40 44.
Whitfield et al. (2002) Experiment 2: (47 time points) HeLa cancer cells arrested in the S-phase using double thymidine block. • Sampling times after arrest (hrs): • every hour between 0 and 46.
Whitfield et al. (2002) Experiment 3: (19 time points) HeLa cancer cells arrested in the M-phase using thymidine followed by nocodazole. • Sampling times after arrest (hrs): • 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36.
Whitfield et al. (2002) Phase marker genes:
Cell Cycle Phase    Genes
----------------    -----
G1/S                CCNE1, CDC6, PCNA, E2F1
S                   RFC4, RRM2
G2                  CDC2, TOP2A, CCNA2, CCNF
G2/M                STK15, CCNB1, PLK, BUB1
M/G1                VEGFC, PTTG1, CDKN3, RAD21
Questions • Can we describe the gene expression of a cell-cycle gene as a function of time? • Can we determine the phase angle for a given cell-cycle gene? i.e. can we quantify the previous table in terms of angles on a circle? • What is the period of expression for a given gene? • Can we test the hypothesis that all cell-cycle genes share the same time period? • Etc.
Some important observations • Gene expression has a sinusoidal shape • Gene expression for a given gene is an average value of mRNA levels across a large number of cells • Duration of cell cycle varies stochastically across cells • Initially cells are synchronized but over time they fall out of synchrony • Gene expression of a cell-cycle gene is expected to “decrease/decay” over time. This is because of items 2 and 4 listed above!
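The decay described in the last observation can be illustrated with a small simulation. This is only a sketch under stated assumptions: each cell is taken to oscillate as a pure cosine, the cell-to-cell periods are assumed log-normally distributed, and the values `mean_period` and `spread` are hypothetical, not estimates from the Whitfield et al. data.

```python
import math
import random


def population_average(t, periods):
    """Average of cos(2*pi*t / T_i) over cells whose periods are T_i."""
    return sum(math.cos(2.0 * math.pi * t / p) for p in periods) / len(periods)


rng = random.Random(0)   # fixed seed so the illustration is reproducible
mean_period = 15.0       # hypothetical typical cycle length (hours)
spread = 0.1             # hypothetical spread of log-periods across cells

# Assumption: period of cell i is mean_period * exp(spread * z_i), z_i ~ N(0, 1).
periods = [mean_period * math.exp(spread * rng.gauss(0.0, 1.0)) for _ in range(5000)]

# Cells start in synchrony (t = 0) and drift apart, so the average flattens.
for t in (0.0, 15.0, 30.0, 45.0):
    print(f"t = {t:5.1f} h   average signal = {population_average(t, periods):+.3f}")
```

Because the microarray measures the average over many cells, the growing spread in cell-cycle position shrinks the observed amplitude even though every individual cell keeps oscillating at full strength.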
Random Periods Model (PNAS, 2004) • a and b: background drift parameters • K: the initial amplitude • T: the average period • $\sigma$: the attenuation parameter • $\phi$: the phase angle
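The model equation itself did not survive on this slide. A hedged sketch, assuming the form usually quoted for the random-periods model, $f(t) = a + bt + \frac{K}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \cos\!\left(\frac{2\pi t}{T e^{\sigma z}} + \phi\right) e^{-z^2/2}\, dz$, is given below. The symbol names `sigma` and `phi`, the function name `rpm_expression`, and the integration settings are assumptions made for illustration.

```python
import math


def rpm_expression(t, a, b, K, T, sigma, phi, n_steps=4000, z_max=8.0):
    """Mean expression at time t under the assumed random-periods form.

    The integral over the standard normal variable z is approximated with
    a midpoint rule on [-z_max, z_max].
    """
    h = 2.0 * z_max / n_steps
    total = 0.0
    for i in range(n_steps):
        z = -z_max + (i + 0.5) * h
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        period = T * math.exp(sigma * z)   # period of a cell with deviation z
        total += math.cos(2.0 * math.pi * t / period + phi) * density
    return a + b * t + K * total * h


# Hypothetical parameter values, chosen only to show the attenuation.
for t in (0.0, 15.0, 30.0, 45.0):
    print(t, round(rpm_expression(t, a=0.0, b=0.0, K=1.0, T=15.0, sigma=0.1, phi=0.0), 3))
```

With `sigma = 0` the integral collapses to a single undamped cosine, which is a convenient sanity check on the numerical integration.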
Whitfield et al. (2002) Phase marker genes:
Phase    Genes                         Phase angles (radians)
-----    -----                         ----------------------
G1/S     CCNE1, CDC6, PCNA, E2F1       0.56, 5.96, 5.87, 5.83
S        RFC4, RRM2                    5.47, 5.36
G2       CDC2, TOP2A, CCNA2, CCNF      4.24, 3.74, 3.55, 3.25
G2/M     STK15, CCNB1, PLK, BUB1       3.06, 2.67, 2.61, 2.51
M/G1     VEGFC, PTTG1, CDKN3, RAD21    2.66, 2.40, 2.25, 1.81
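Phase angles live on a circle, so summaries of the table above need circular arithmetic: in the G1/S row, 0.56 and 5.96 are close neighbours across the 0 / 2π boundary. The following sketch computes a circular mean per phase from the tabulated angles. The circular mean is a standard summary added here for illustration and is not something the slides themselves report.

```python
import math


def circular_mean(angles):
    """Mean direction of angles in radians, returned in [0, 2*pi)."""
    s = sum(math.sin(x) for x in angles)
    c = sum(math.cos(x) for x in angles)
    return math.atan2(s, c) % (2.0 * math.pi)


# Phase angles copied from the table above (radians).
phase_angles = {
    "G1/S": [0.56, 5.96, 5.87, 5.83],
    "S": [5.47, 5.36],
    "G2": [4.24, 3.74, 3.55, 3.25],
    "G2/M": [3.06, 2.67, 2.61, 2.51],
    "M/G1": [2.66, 2.40, 2.25, 1.81],
}

for phase, angles in phase_angles.items():
    naive = sum(angles) / len(angles)
    print(f"{phase:5s} circular mean = {circular_mean(angles):.2f}   arithmetic mean = {naive:.2f}")
```

For G1/S the arithmetic mean lands far from all four genes, while the circular mean stays within the cluster, which is why ordinary averages are avoided for angular data.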
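A hypothesis of biological interest Do all cell cycle genes share the same period T and the same attenuation parameter $\sigma$, while the other 4 parameters are gene specific? i.e. $H_0: T_g = T$ and $\sigma_g = \sigma$ for every gene $g$.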
An Important Feature • Correlated data • Temporal correlation within gene • Gene-to-gene correlations
Test Statistic • Wald statistic for heteroscedastic linear and non-linear models • Zhang, Peddada and Rogol (2000) • Shao (1992) • Wu (1986)
The Null Distribution • Because of the underlying correlation structure, the asymptotic approximation is not appropriate. • Use the moving-blocks bootstrap technique on the residuals of the nonlinear model. • Künsch (1989)
Moving-blocks Bootstrap • Step 1: Fit the null model to the data and compute the residuals. • Step 2: Draw a simple random sample (with replacement) from all possible blocks of consecutive residuals of a specified size.
Moving-blocks Bootstrap • Step 3: Add these residuals to the fitted curve under the null hypothesis to obtain the bootstrap data set • Step 4: Using the bootstrap data fit the model under the alternate hypothesis and compute the Wald statistic.
Moving-blocks Bootstrap • Step 5: Repeat the above steps a large number of times. • Step 6: The bootstrap p-value is the proportion of the above Wald statistics that exceed the Wald statistic determined from the actual data.
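Steps 2, 3 and 6 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the model fitting and the Wald statistic (Steps 1 and 4) are left to the caller, and truncating the last block so the series keeps its original length is an assumption.

```python
import random


def moving_blocks_resample(residuals, block_size, rng):
    """Step 2: concatenate randomly chosen blocks of consecutive residuals."""
    n = len(residuals)
    if not 1 <= block_size <= n:
        raise ValueError("block_size must be between 1 and len(residuals)")
    n_starts = n - block_size + 1          # number of possible overlapping blocks
    resampled = []
    while len(resampled) < n:
        start = rng.randrange(n_starts)    # blocks are drawn with replacement
        resampled.extend(residuals[start:start + block_size])
    return resampled[:n]                   # assumption: truncate to original length


def bootstrap_dataset(fitted_null, residuals, block_size, rng):
    """Step 3: add resampled residuals to the curve fitted under the null."""
    resampled = moving_blocks_resample(residuals, block_size, rng)
    return [f + e for f, e in zip(fitted_null, resampled)]


def bootstrap_p_value(observed_stat, bootstrap_stats):
    """Step 6: proportion of bootstrap statistics exceeding the observed one."""
    exceed = sum(1 for s in bootstrap_stats if s > observed_stat)
    return exceed / len(bootstrap_stats)
```

Resampling whole blocks rather than single residuals keeps the short-range temporal correlation within each block, which is the reason for using this scheme on time-course data.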
Analysis of experiment 2 • The bootstrap p-value for testing the null hypothesis using the Experiment 2 data of Whitfield et al. (2002) is 0.12, so the hypothesis is not rejected at conventional significance levels. Thus our model is biologically plausible.
Statistical inferences on the phase angle Multiple experiments
Some questions of interest • How to evaluate or combine results from multiple cell division cycle experiments? • Are the results “consistent” across experiments? • How to evaluate this? • What could be a possible criterion?
Data: the RPM estimate of the phase angle of each cell-cycle gene g from each experiment i (i = 1, ..., k).
Representation using a circle Consider 4 cell cycle genes A, B, C, D. The vertical line in the circle denotes the reference line. The angles are measured counter-clockwise from it. Thus the sequential order of expression in this example is A, B, D, C. [Figure: circle with genes A, B, C, D marked]
“Coherence” in multiple cell-cycle experiments • A group of cell cycle genes is said to be coherent across experiments if the sequential order of their phase angles is preserved across experiments. [Figure: three circles (Exp 1, Exp 2, Exp 3), each marked with genes A, B, C, D]
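The definition of coherence can be checked mechanically: sort the genes by phase angle in each experiment and compare the resulting cyclic orders, ignoring where on the circle the sequence starts. The sketch below does this for a pair of experiments; the function names and the example angles are hypothetical, and ties between angles are not handled.

```python
import math


def cyclic_order(phase_angles):
    """Gene names sorted counter-clockwise by phase angle (mod 2*pi)."""
    return sorted(phase_angles, key=lambda g: phase_angles[g] % (2.0 * math.pi))


def is_coherent(exp_a, exp_b):
    """True if both experiments place the genes in the same cyclic order."""
    order_a = cyclic_order(exp_a)
    order_b = cyclic_order(exp_b)
    if sorted(order_a) != sorted(order_b):
        raise ValueError("experiments must contain the same genes")
    start = order_b.index(order_a[0])      # rotate order_b to begin at the same gene
    return order_b[start:] + order_b[:start] == order_a


# Hypothetical phase angles (radians) with sequential order A, B, D, C.
exp1 = {"A": 0.5, "B": 1.5, "D": 3.0, "C": 5.0}
exp2 = {g: x + 2.0 for g, x in exp1.items()}     # same order, circle rotated
exp3 = {"A": 0.5, "B": 3.0, "D": 1.5, "C": 5.0}  # B and D swapped

print(is_coherent(exp1, exp2))
print(is_coherent(exp1, exp3))
```

A pure rotation of the circle leaves the cyclic order unchanged, so only genuine reorderings of genes count against coherence.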
Geometric Representation • We shall represent phase angles from multiple cell cycle experiments using concentric circles. • Each circle represents an experiment. • Same gene from a pair of experiments is connected by a line segment. • A figure with non-intersecting lines indicates perfect coherence. • If there is no coherence at all then there will be many intersecting lines.
Estimated Phase Angles • Due to statistical errors in estimation, the estimated phase angles from multiple cell cycle experiments need not preserve the sequential order even though the true phase angles are in a sequential order.
[Figure: two circles, Experiment A and Experiment B] Question: Can we determine a rotation matrix A such that we can rotate the circle representing Experiment A to obtain the circle representing Experiment B?
Angle of rotation for a rigid body • Yes! By solving the following minimization problem:
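The objective function shown on this slide is not recoverable from the text. As an assumption, one standard way to pose such a problem is least squares on the unit circle: choose the angle $\theta$ minimizing $\sum_g \lVert b_g - R(\theta)\, a_g \rVert^2$, where $a_g$ and $b_g$ are the unit vectors for gene $g$ in the two experiments. This reduces to maximizing $\sum_g \cos(\beta_g - \alpha_g - \theta)$, which has the closed-form solution sketched below. The function name and the example angles are hypothetical.

```python
import math


def estimate_rotation(angles_a, angles_b):
    """Least-squares rotation angle taking experiment A onto experiment B.

    Minimizing sum ||b_g - R(theta) a_g||^2 over unit vectors is the same as
    maximizing sum cos(beta_g - alpha_g - theta); the maximizer is the
    circular mean of the differences beta_g - alpha_g.
    """
    diffs = [b - a for a, b in zip(angles_a, angles_b)]
    s = sum(math.sin(d) for d in diffs)
    c = sum(math.cos(d) for d in diffs)
    return math.atan2(s, c) % (2.0 * math.pi)


# Hypothetical phase angles: experiment B is experiment A rotated by 1 radian.
angles_a = [0.3, 1.2, 2.9, 4.4]
angles_b = [(x + 1.0) % (2.0 * math.pi) for x in angles_a]
print(estimate_rotation(angles_a, angles_b))
```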
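The Basic Idea • Consider a rigid body rotating in a plane. Suppose the body is perfectly rigid with no deformations. • Let $A_i$ denote the 2x2 rotation matrix from experiment i to i+1 (k+1 = 1). Then composing the rotations around the full cycle returns the body to its starting position: $A_k A_{k-1} \cdots A_1 = I$.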
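The Basic Idea • Equivalently, if $\theta_i$ denotes the angle of rotation from experiment i to i+1, then under perfect rigid body motion we should have $\sum_{i=1}^{k} \theta_i \equiv 0 \pmod{2\pi}$.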
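The rigid-body condition can be checked numerically: composing the rotations around the full cycle of experiments should give the identity matrix, or equivalently the rotation angles should sum to a multiple of 2π. The sketch below uses hypothetical angles and hypothetical function names; `closure_gap` is an illustrative measure of departure from rigidity, not a statistic taken from the talk.

```python
import math


def rotation_matrix(theta):
    """2x2 counter-clockwise rotation matrix for angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matmul2(p, q):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def compose(thetas):
    """Apply the rotations in order: experiment 1 -> 2 -> ... -> k -> 1."""
    product = [[1.0, 0.0], [0.0, 1.0]]
    for theta in thetas:
        product = matmul2(rotation_matrix(theta), product)
    return product


def closure_gap(thetas):
    """Angular distance of the summed rotations from a multiple of 2*pi."""
    total = sum(thetas) % (2.0 * math.pi)
    return min(total, 2.0 * math.pi - total)


rigid = [1.0, 2.0, 2.0 * math.pi - 3.0]   # hypothetical angles summing to 2*pi
print(compose(rigid), closure_gap(rigid))
```

With estimated rather than true rotation angles the gap will generally be nonzero, and its size gives a rough sense of how far the experiments are from behaving like one rigid body.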
Problem! • In the present context we do NOT necessarily have a rigid body! • Not all experiments are performed with the same precision. • The time axis may not be the same across experiments. • The number of time points may not be the same across experiments. • Etc.
Consequence • Rotation matrix A alone may not be enough to bring two circles to congruence! • An additional “association/scaling” parameter may be needed, as seen in the previous figure!
Circular-Circular regression model for a pair of experiments (Downs and Mardia, 2002) • For , let denote a pair of angular variables. • Suppose is von-Mises distributed with mean direction and concentration parameter