
Division of Human Cancer Genetics Ohio State University

Theoretical and experimental comparisons of gene expression indexes for oligonucleotide microarrays. William J. Lemon, Jeffrey J.T. Palatini, Ralf Krahe, Fred A. Wright. Division of Human Cancer Genetics, Ohio State University.



  1. Theoretical and experimental comparisons of gene expression indexes for oligonucleotide microarrays. William J. Lemon, Jeffrey J.T. Palatini, Ralf Krahe, Fred A. Wright. Division of Human Cancer Genetics, Ohio State University

  2. Measuring gene expression with the Affymetrix GeneChip • cRNA from sample mRNA is put on the chip • intensity of binding reflects gene expression (Figure labels: polyA; coding portion of gene X; Perfect Match (PM): 25 bases complementary to region of gene; Mismatch (MM): middle base is different)

  3. Reproducibility of Probe Sensitivities Li, C and Wong, WH, Proc. Natl. Acad. Sci. USA, 98:31-36, 2001.

  4. The Li-Wong Model. Li-Wong Full (LWF); identifiability constraint; Li-Wong Reduced (LWR). Li, C and Wong, WH, Proc. Natl. Acad. Sci. USA, 98:31-36, 2001.

  5. The Li-Wong Model. Li-Wong Full (LWF); identifiability constraint; Li-Wong Reduced (LWR). (Annotations: ith array; jth probe pair; total no. probe pairs.) Li, C and Wong, WH, Proc. Natl. Acad. Sci. USA, 98:31-36, 2001.

  6. The Li-Wong Model. Li-Wong Full (LWF); identifiability constraint; Li-Wong Reduced (LWR). (Annotations: ith array; jth probe pair; total no. probe pairs; expression; sensitivities.) Li, C and Wong, WH, Proc. Natl. Acad. Sci. USA, 98:31-36, 2001.
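
For reference, the model in the cited Li and Wong (2001) paper can be written as below. Here θ_i is the expression index on array i, φ_j and α_j are probe sensitivities, ν_j is a probe-specific baseline, and J is the total number of probe pairs. The superscripts on the error terms are a labeling choice made here and may not match the slide's notation.

```latex
% Li-Wong Full (LWF): PM and MM intensities are modeled separately
\begin{align}
MM_{ij} &= \nu_j + \theta_i \alpha_j + \epsilon^{MM}_{ij}, \\
PM_{ij} &= \nu_j + \theta_i \alpha_j + \theta_i \phi_j + \epsilon^{PM}_{ij}.
\end{align}

% Li-Wong Reduced (LWR): only the PM - MM differences are modeled
\begin{equation}
y_{ij} = PM_{ij} - MM_{ij} = \theta_i \phi_j + \epsilon_{ij},
\qquad
\sum_{j=1}^{J} \phi_j^2 = J \quad \text{(identifiability constraint)}.
\end{equation}
```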

  7. How to compare gene expression indexes? • We get maximum likelihood estimates for θ using either full data (LWF) or reduced data (LWR) • The Affymetrix software computes Average Difference (AD) and Log-Average (LA) • We gain insight by assuming the Li-Wong model is true. Then what are the consequences? • For large sample sizes, the α's and φ's will be well-estimated
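
The indexes being compared can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code or the Affymetrix implementation. It makes three assumptions: the reduced-model fit is taken as the rank-1 least-squares fit of the PM − MM matrix (which coincides with maximum likelihood under equal-variance Gaussian errors), no outlier handling is done, and AD and LA are the plain untrimmed averages. The function names are hypothetical.

```python
import numpy as np


def fit_lwr(y):
    """Least-squares fit of y[i, j] ~ theta[i] * phi[j] with sum(phi**2) == J.

    y is an (arrays x probe pairs) matrix of PM - MM differences.
    The best rank-1 approximation from the SVD gives the minimizer.
    """
    y = np.asarray(y, dtype=float)
    n_probes = y.shape[1]
    u, s, vt = np.linalg.svd(y, full_matrices=False)
    phi = vt[0] * np.sqrt(n_probes)             # rescale so sum(phi**2) == J
    theta = u[:, 0] * s[0] / np.sqrt(n_probes)  # matching scale for theta
    if phi.sum() < 0:                           # resolve the sign ambiguity of the SVD
        phi, theta = -phi, -theta
    return theta, phi


def average_difference(pm, mm):
    """Untrimmed Average Difference: mean of PM - MM over probe pairs."""
    return (np.asarray(pm, dtype=float) - np.asarray(mm, dtype=float)).mean(axis=1)


def log_average(pm, mm):
    """Untrimmed Log-Average: mean of log10(PM / MM) over probe pairs."""
    return np.log10(np.asarray(pm, dtype=float) / np.asarray(mm, dtype=float)).mean(axis=1)
```

With noise-free data generated from the reduced model, `fit_lwr` returns the generating θ and φ exactly (up to floating-point error), which is a convenient sanity check before applying it to real arrays.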

  8. Compare LW estimators directly. Comparing to AD is tricky, but with a correction factor AD is also an unbiased estimate of θ:
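
One way such a correction factor can arise is sketched below. It works under the reduced model y_ij = θ_i φ_j + ε_ij with Σφ_j² = J, and assumes that the sensitivities φ_j are known (the large-sample setting) and that the errors are independent with variance σ². This is an illustrative derivation under those assumptions and is not necessarily the expression shown on the slide.

```latex
\begin{align}
AD_i &= \frac{1}{J}\sum_{j=1}^{J} y_{ij},
\qquad
E[AD_i] = \theta_i \bar{\phi},
\qquad
\bar{\phi} = \frac{1}{J}\sum_{j=1}^{J} \phi_j, \\
\hat{\theta}_i^{AD} &= \frac{AD_i}{\bar{\phi}},
\qquad
\mathrm{Var}\bigl(\hat{\theta}_i^{AD}\bigr) = \frac{\sigma^2}{J\,\bar{\phi}^2}, \\
\hat{\theta}_i^{LWR} &= \frac{\sum_j y_{ij}\phi_j}{\sum_j \phi_j^2},
\qquad
\mathrm{Var}\bigl(\hat{\theta}_i^{LWR}\bigr) = \frac{\sigma^2}{\sum_j \phi_j^2} = \frac{\sigma^2}{J}, \\
\mathrm{RE}(\mathrm{LWR}, \mathrm{AD})
&= \frac{\mathrm{Var}\bigl(\hat{\theta}_i^{AD}\bigr)}{\mathrm{Var}\bigl(\hat{\theta}_i^{LWR}\bigr)}
= \frac{1}{\bar{\phi}^2} \ge 1 .
\end{align}
```

The final inequality holds because the squared mean of the φ_j cannot exceed the mean of their squares, which the identifiability constraint fixes at 1. Under these assumptions the corrected AD is unbiased but never more efficient than the LWR estimate.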

  9. This also gives insight into “perfect match only” analyses: • RE(full, PM-only)= and Furthermore, PM-only is always at least twice as efficient as LWR
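
The "at least twice as efficient" statement can be reproduced with a short calculation under the full model. The sketch assumes that ν_j, α_j and φ_j are known, that the PM and MM errors are independent with common variance σ², and that α_j and φ_j are nonnegative. These are assumptions made here for illustration; the slide's own expressions may differ.

```latex
\begin{align}
\mathrm{Var}\bigl(\hat{\theta}_i^{PM}\bigr) &= \frac{\sigma^2}{\sum_j (\alpha_j + \phi_j)^2},
&
\mathrm{Var}\bigl(\hat{\theta}_i^{LWF}\bigr) &= \frac{\sigma^2}{\sum_j \alpha_j^2 + \sum_j (\alpha_j + \phi_j)^2}, \\
\mathrm{Var}\bigl(\hat{\theta}_i^{LWR}\bigr) &= \frac{2\sigma^2}{\sum_j \phi_j^2}
&&\text{since } \mathrm{Var}(PM_{ij} - MM_{ij}) = 2\sigma^2, \\
\mathrm{RE}(\text{full}, \text{PM-only}) &= 1 + \frac{\sum_j \alpha_j^2}{\sum_j (\alpha_j + \phi_j)^2},
&
\mathrm{RE}(\text{PM-only}, \mathrm{LWR}) &= \frac{2\sum_j (\alpha_j + \phi_j)^2}{\sum_j \phi_j^2} \ge 2 .
\end{align}
```

The factor of 2 comes from differencing two noisy measurements, and the remaining ratio is at least 1 whenever each (α_j + φ_j)² is at least φ_j².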

  10. Empirical Comparisons • We propose that an expression index is “good” if it has a high correlation with the underlying true expression (which is usually unknown). • This correlation can be estimated using a specially designed mixing experiment • If r is the correlation coefficient between the measured index and true expression, the “relative efficiency” of two indexes θ and η can be estimated as
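
A later slide reports Median(r2/(1-r2)) for each index, so one plausible reading is that the relative efficiency of two indexes is the ratio of their r²/(1 − r²) values. The sketch below implements that reading. It is an assumption and not a confirmed reconstruction of the slide's formula, and the function names are hypothetical.

```python
import numpy as np


def efficiency_score(index_values, true_expression):
    """r**2 / (1 - r**2), where r is the Pearson correlation with true expression."""
    r = np.corrcoef(index_values, true_expression)[0, 1]
    return r ** 2 / (1.0 - r ** 2)


def relative_efficiency(index_a, index_b, true_expression):
    """Assumed form: ratio of efficiency scores; > 1 favors index_a."""
    return (efficiency_score(index_a, true_expression)
            / efficiency_score(index_b, true_expression))
```

For a single gene, `true_expression` would hold the known relative levels across arrays from the mixing design. The score diverges as r approaches 1, so a perfectly correlated index needs special handling.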

  11. Experimental Design (6 replicates for each condition) • Human Fibroblasts (GM 08330) • Cell culture: 20% FBS, 5 passages • Serum starvation: 0.1% FBS, 48h; harvest total RNA • Serum stimulation: 20% FBS, 24h; harvest total RNA • RNA extraction • Produce 50:50 group (groups: Starved, 50:50, Stimulated) • Produce duplicates each day for 3d • Add Bacterial Control Genes (labels: Dap, Thr, Lys, Phe; Dap, Thr; Lys, Phe) • Synthesize cDNA, cRNA; fragment • Add Hybridization Control Genes (BioB, BioC, BioD, Cre) • Hybridize (HuGeneFL) • Data Reduction • Gene Expression Indexes

  12. Overall intensity higher in Stimulated (Figure: mean probe intensity per array; groups: Stim, 50:50, Starved)

  13. BIN1 expression. True expression = average of Stim, Starved (Figure groups: Stim, Starved, 50:50)

  14. Coefficients of variation for assay (individual probes) and gene expression indexes
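
As a reminder of the quantity being compared, the coefficient of variation is the standard deviation divided by the mean, computed across replicate arrays. The sketch below is a minimal illustration; the function name and the use of the sample standard deviation (`ddof=1`) are choices made here, not details taken from the slide.

```python
import numpy as np


def coefficient_of_variation(values, axis=0):
    """Sample standard deviation divided by the mean along the replicate axis."""
    values = np.asarray(values, dtype=float)
    return values.std(axis=axis, ddof=1) / values.mean(axis=axis)
```

Applied to a (replicates x probes) matrix it gives one CV per probe, and applied to a (replicates x genes) matrix of index values it gives one CV per gene, so the two can be compared on the same scale.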

  15. Correlation matrix of 18 arrays as a colorized image for each expression index. (Figure panels: LWR, LWF, AD, LA; arrays grouped as Starved, 50:50, Stim)

  16. Comparing Models: Cluster Analysis (Figure: dendrograms of the 18 arrays, Strv 1-6, Stim 1-6 and 50:50 1-6; panels: Full Model, Reduced Model, Affymetrix Ave Diff, Affymetrix Log Ave)

  17. Relative Efficiency (Figure: Median(r²/(1-r²)) for LA, AD, LWF and LWR; panels: Unscaled, Scaled)

  18. Correlation of duplicate measurements of 149 genes. (Figure panel labels: LWF median r=.74; LWR median r=.43; LWF median r=.08; LWF median r=.17)

  19. Number of unexpressed genes • Only 0.2% of the LW estimates are negative • 50:50 group has fewest negative estimates • Could this indicate very few unexpressed genes? (Figure groups: Starved, Stim, 50:50)

  20. A conservative approach to estimating number of unexpressed genes • Let U denote number of unexpressed genes • Genes are ranked according to expression index • This is useful if we can get a random sample of unexpressed genes (Figure labels: Unexpressed population; Gene expression index)

  21. We use the spiked-out bacterial control genes as a sample of “unexpressed” genes • The 4 genes are represented 3 times each (different portions of mRNA), for a total of 12 probe sets • Based on this reasoning, we estimate that greater than 88% of the genes are expressed, even in the Starved samples

  22. Very low estimated expression for truly absent genes when using LWF (Figure: rank of expression index variance across the 6 Stimulated arrays versus rank of index mean; panels: AD, LWF; highlighted: truly absent in stim group)

  23. Present/absent calls • We use the statistic • to declare genes present/absent (absolute call) • we find the vast majority of genes on the array appear to be present • for the spiked in/out genes, we find vastly improved present/absent calling using LW estimates

  24. ROC curve - spiked in/out genes (Figure curves: LWF-Z, LWR-Z, Untrimmed AD, LA, Untrimmed LA, AD, Absolute Call)
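
An ROC curve for present/absent calling can be built from any per-gene score together with the known spiked-in/spiked-out status. The sketch below is a generic construction, not the authors' procedure. The function names are hypothetical, higher scores are assumed to indicate "present", and tied scores are not given special treatment.

```python
import numpy as np


def roc_points(scores, is_present):
    """Return (fpr, tpr) arrays obtained by lowering the call threshold one gene at a time."""
    scores = np.asarray(scores, dtype=float)
    is_present = np.asarray(is_present, dtype=bool)
    order = np.argsort(-scores, kind="stable")  # highest score first
    hits = is_present[order]
    tpr = np.concatenate([[0.0], np.cumsum(hits) / hits.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(~hits) / (~hits).sum()])
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoid rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

Each expression index, or a statistic derived from it, supplies its own `scores` vector, so the curves for different indexes can be overlaid and compared by area.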

  25. Variability in estimates (Figure: log(variance) versus log(mean); panels: Reduced Model, Full Model; groups: Stim, 50:50, Starved)

  26. Conclusions • Model-based estimators are superior to simple averaging • we have demonstrated this using both analytic considerations and experimental data • a carefully designed experiment can be used to address many issues • Many more genes may be expressed than previously thought

  27. Other issues / future work • Spiking genes might be used to calibrate and normalize arrays • The relationship between variance and mean of expression indexes may be useful in planning experiments • Our data may be useful for future work, especially in producing indexes that are resistant to probe saturation • All primary data, this PowerPoint presentation and a preprint are available at http://thinker.med.ohio-state.edu
