270 likes | 425 Views
(2) Ratio statistics of gene expression levels and applications to microarray data analysis. Bioinformatics, Vol. 18, no. 9, 2002 Yidong Chen, Vishnu Kamat, Edward R. Dougherty, Michael L. Bittner, Paul S. Meltzer1, and Jeffery M. Trent. Outline. Introduction Ratio Statistics
E N D
(2) Ratio statistics of gene expression levels and applications to microarray data analysis Bioinformatics, Vol. 18, no. 9, 2002 Yidong Chen, Vishnu Kamat, Edward R. Dougherty, Michael L. Bittner, Paul S. Meltzer1, and Jeffery M. Trent
Outline • Introduction • Ratio Statistics • Quality Metric for Ratio Statistics • Conclusion
Introduction • Motivation Expression-based analysis for large families of genes has recently become possible owing to the development of cDNA microarrays, which allow simultaneous measurement of transcript levels for thousands of genes. For each spot on a microarray, signals in two channels must be extracted from their backgrounds. This requires algorithms to extract signals arising from tagged mRNA hybridized to arrayed cDNA locations and algorithms to determine the significance of signal ratios.
Introduction • Results 1.estimation of signal ratios from the two channels, and the significance of those ratios. 2. a refined hypothesis test is considered in which the measured intensities forming the ratio are assumed to be combinations of signal and background. The new method involves a signal-to-noise ratio, and for a high signal-to-noise ratio the new test reduces (with close approximation) to the original test. The effect of low signal-to-noise ratio on the ratio statistics constitutes the main theme of the paper. 3. a quality metric is formulated for spots
Ratio Statistics assuming a constant coefficient of variation • Consider a microarray having n genes, with red and green fluorescent expression values labeled by and , respectively. • Hypothesis test: • Assumption:
Ratio Statistics assuming a constant coefficient of variation (cont.) • Ratio test statistics: • Assuming and to be normally and identically distributed, has the density function
Ratio Statistics assuming a constant coefficient of variation (cont.) • self-self experiment • Duplicate
Ratio Statistics assuming a constant coefficient of variation (cont.) • Confidence interval 1. Integrating the ratio density function 2. The C.I. is determined by the parameter c, one can either use the par. derived from pre-selected housekeeping genes or a set of duplicate genes.
Ratio Statistics for low signal-to-noise ratio • The actual expression intensity measurement is of the form
Ratio Statistics for low signal-to-noise ratio (cont.) • Null hypothesis of interest: test statistics:
Ratio Statistics for low signal-to-noise ratio (cont.) • Major difference: 1. the assumption of a constant cv applies to and , not to and 2. the density of is not applicable • SNR (signal-to-noise ratio)
SNR (signal-to-noise ratio) • Assuming that are independent,
Confidence interval for the test statistics • Assumption:
Confidence interval for the test statistics (cont.) • Under the assumption of constant cv for the signal (without the background),
Correction of background estimation • Owing to interaction between the fluorescent signal and background, local-background estimation is often biased. • To estimate the bias difference, we find the relationship between the red and green intensities under the null hypothesis by assuming a linear relation, G = aR+b.
Correction of background estimation (cont.) • Simulation 1. generate 10,000 data points from exp. dist. with 2,000 to simulate 10,000 gene expression levels, 2. The intensity measurement for each channel is further simulated by using a normal dist. with mean intensity from the exp. dist. and a constant cv of 0.2 3. simulate background level by a normal dist. (1) no bias: background level ~ N (0,100) (2) some bias: background level ~ N (b,100)
Scatter plot of simulated expression data dog-leg effect
Correction of background estimation (cont.) • G = aR+b we employ a chi-square fitting method that minimizes
Quality Metric for Ratio Statistics • For a given cDNA target, the following factors affect ratio measurement quality: • Weak fluorescent intensities • A smaller than normal detected target area • A very high local background level • A high standard deviation of target intensity
(1)Fluorescent intensity measurement quality • Under the null hypothesis, the signal means are equal, so that
(3)Background flatness quality • Define background flatness
Typical target shap cv=0.48 cv=0.81 cv=0.45 cv=0.98 cv=0.31 cv=0.59 (4)Signal intensity consistency quality