
Sample Size Selection for Microarray based Gene Expression Studies



  1. Sample Size Selection for Microarray based Gene Expression Studies
     Gregory R. Warnes, Pfizer Global R&D
     Fasheng Li, Smith Hanley Consulting Group

  2. Outline
     • What is the context?
     • What is the problem?
     • What are possible approaches?
     • What approach was chosen and why?
     • How was the approach implemented?
     • What do the results look like?
     • Future plans?
     • References
     Industry/FDA Statistics Workshop: September 18-19, 2003

  3. What is Pfizer Global R&D?
     • What do we do? Lots!
       • Pharmaceutical research and development
       • Associated basic science, medical, and technological research
     • How are we doing? Very well
       • 2003 R&D budget: $7.1 billion
       • 33 major research projects across 10 major therapeutic categories
       • 12,000 employees
       • 6 major research sites

  4. How are we using Gene Expression Technologies?
     • Determine regulatory and metabolic pathways
     • Identify potential biomarkers
     • Identify potential targets
     • Determine mechanism of action (desired and undesired)
     • Evaluate / predict safety
     • Determine mechanism of toxicity

  5. What is the problem?
     • Gene expression assays are expensive
       • ~ $2,000 per sample for Affymetrix experiments
       • Good experimental design is important
     • A huge number of variables is measured on each experimental unit
       • 9,300 variables for the Affymetrix S98 Yeast Genechip™
       • 16,000 variables for the Affymetrix RAE230a Rat Genechip™
       • 23,000 + 23,000 = 46,000 variables for the Affymetrix U133A and U133B Human Genechips™
     • Sample size calculations are hard

  6. Standard sample size calculation
     For a single outcome variable, given
     • a simple design (e.g., two-sample t-test)
     • effect size δ (ideally, the minimum practically significant difference)
     • population variance σ²
     • significance level α (probability of a false positive when there is no true effect)
     • power 1-β (probability of a true positive given the defined effect size)
     it is straightforward to compute the required sample size n (see e.g. Cochran & Cox (1957)).
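The single-variable calculation on this slide can be sketched with the usual normal approximation for a two-sided, two-sample comparison of means, n per group ≈ 2σ²(z_{1-α/2} + z_{1-β})² / δ². This is an illustration only, not the authors' code: the function name `n_per_group` and its defaults are assumptions, and an exact t-based calculation (such as R's power.t.test, which the talk used) typically returns a slightly larger n.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample
    comparison of means, using the normal approximation.

    delta : effect size (difference in means to detect)
    sd    : population standard deviation (assumed equal in both groups)
    alpha : significance level
    power : desired power (1 - beta)
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value for a two-sided test
    z_beta = z.inv_cdf(power)             # quantile matching the target power
    n = 2 * (sd * (z_alpha + z_beta) / delta) ** 2
    return math.ceil(n)                   # round up to a whole sample


# Detecting a difference of one standard deviation at alpha = 0.05, power = 80%
print(n_per_group(delta=1.0, sd=1.0))     # -> 16
```

Halving the effect size roughly quadruples the required n, which is why the choice of δ matters so much in practice.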

  7. Gene expression sample size calculation
     When there are thousands of outcome variables which are not independent, many problems arise:
     • How to handle multiple comparisons?
     • How to deal with dependencies?
     • One effect size or many?
     • One power or many?
     • With many variables, how to get a single answer?

  8. What are possible approaches?
     Two extremes:
     • Treat each variable (gene) as a separate and independent problem, then summarize
       + easy to set up, understand, explain
       + available data can be used
       - may not be sufficiently realistic, hence accuracy may suffer
     • Model the entire system, including realistic error structure and interdependencies
       + may be more accurate (if the model is good)
       - more initial work to set up / compute
       - may require substantial new data to be realistic
       - may be hard to understand, explain

  9. What approach was chosen and why?
     • We chose to treat each variable (gene) as a separate and independent problem, then summarize
     • Why?
       • First approximations usually yield useful information with minimal effort.
       • Answers were needed immediately.
       • At best, results would only be used for general guidance.
       • A more realistic error model didn't work: we tried fitting the model from Zien et al. (2002), which requires high-dimensional numerical integration via MCMC or equivalent. However, the model appears to be non-identifiable.

  10. How was the approach implemented?
     • Compute the variance of each gene (variable) from existing studies
     • Assume a two-sample t-test on log(expression)
     • Bonferroni-adjust the significance level: α_i = α / #variables
     • Generate plots of the cumulative number of genes:
       • Fixed α_i, δ, 1-β vs. sample size (e.g. n = 5/group, 6/group, …)
       • Fixed α_i, δ, n vs. power (e.g. 1-β = 60%, 70%, 80%, …)
       • Fixed α_i, 1-β, n vs. effect size (δ = 1.5x, 2.0x, 2.5x, …)
     • Run twice:
       • 'candidate' genes (less stringent Bonferroni adjustment)
       • all genes
     • Implemented in R [Ihaka & Gentleman, 1996] using the power.t.test function.
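The per-gene procedure above can be sketched as follows. This is a hedged illustration, not the original R implementation: it swaps power.t.test for a normal-approximation power formula, the helper names `approx_power` and `genes_powered` are hypothetical, the per-gene standard deviations are made up, and the fold change is converted to a difference on the natural-log scale (the slide does not say which log base was used).

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def approx_power(n, delta, sd, alpha):
    """Normal-approximation power of a two-sided, two-sample test
    with n samples per group (stand-in for an exact t-based calculation)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = delta / (sd * math.sqrt(2 / n))   # standardized mean difference
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)


def genes_powered(gene_sds, n, fold_change, alpha=0.05, target_power=0.80):
    """Count genes whose power reaches target_power at n samples per group,
    after a Bonferroni adjustment across all genes in gene_sds."""
    alpha_i = alpha / len(gene_sds)           # Bonferroni: alpha / #variables
    delta = math.log(fold_change)             # effect size on the log scale (assumed natural log)
    return sum(
        approx_power(n, delta, sd, alpha_i) >= target_power
        for sd in gene_sds
    )


# Made-up per-gene standard deviations of log(expression), for illustration only
example_sds = [0.1, 0.2, 0.3, 0.5, 0.8]
for n in (3, 5, 10, 20):
    print(n, genes_powered(example_sds, n, fold_change=2.0))
```

Tabulating the count against n, against the target power, or against the fold change gives the three families of cumulative plots listed on the slide; passing only the candidate genes shrinks the Bonferroni divisor and so loosens the per-gene threshold.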

  11. What do the results look like? Standard Deviations: Focus Group

  12. What do the results look like? Fixed α_i, δ, 1-β vs. Sample Size: Focus Group

  13. What do the results look like? Fixed α_i, δ, n vs. Power: Focus Group

  14. What do the results look like? Fixed α_i, 1-β, n vs. Fold Change: Focus Group

  15. What do the results look like? Standard Deviations: All Genes

  16. What do the results look like? Fixed α_i, δ, 1-β vs. Sample Size: All Genes

  17. What do the results look like? Fixed α_i, δ, n vs. Power: All Genes

  18. What do the results look like? Fixed α_i, 1-β, n vs. Fold Change: All Genes

  19. Future plans?
     • A web-applet backed by R to perform the calculations

  20. Future plans?
     • Provide a web-applet backed by R to perform the calculations
     • Use a library of gene variation information in normal samples (structured by organism, Affymetrix chip type, cell type, normalization/scaling method)
     • Extend to more complicated designs (2-way ANOVA, repeated measures, etc.)
     • Support other types of multiple comparison adjustment, e.g. false discovery rate (FDR)
     • Develop models that deal with correlations between genes

  21. References
     • Two-sample t-test sample size:
       • Cochran WG, Cox GM (1957). Experimental Designs (2nd Ed). 17-28.
     • General sample size calculations:
       • Chow SC, Liu JP (1998). Design and Analysis of Clinical Trials: Concept and Methodologies. Wiley-Interscience. Chapter 10, 424-482.
       • Chow SC, Shao J, Wang H (2003). Sample Size Calculation in Clinical Research. Marcel Dekker. [New, looks interesting]
     • Gene expression experiments sample size:
       • Zien A, Fluck J, Zimmer R, Lengauer T (2002). Microarrays: How Many Do You Need? RECOMB02, Meyers G, Hannenhalli S, Istrail S, Pevzner P, Waterman M, eds. 321-330.
     • Statistical analysis software:
       • Ihaka R, Gentleman R, et al (2003). http://www.r-project.org [web site]
       • Ihaka R, Gentleman R (1996). R: A Language for Data Analysis and Graphics. Journal of Computational and Graphical Statistics, Vol 5, Number 3: 299-314.
     • Web applet software:
       • Warnes GR (2003). http://www.analytics.washington.edu/Zope/projects/RSessionDA/ [web site]
     • Me:
       • http://www.warnes.net

  22. Finis
