
Fei Ye, PhD Vanderbilt Center for Quantitative Sciences

Statistical Considerations in High-Throughput RNAi Screens for Identifying Genes Mediating Sensitivity to Chemotherapeutic Drugs. Fei Ye, PhD, Vanderbilt Center for Quantitative Sciences, Department of Biostatistics, Vanderbilt School of Medicine. CQS Summer Institute 2014.





Presentation Transcript


  1. Statistical Considerations in High-Throughput RNAi Screens for Identifying Genes Mediating Sensitivity to Chemotherapeutic Drugs. Fei Ye, PhD, Vanderbilt Center for Quantitative Sciences, Department of Biostatistics, Vanderbilt School of Medicine. CQS Summer Institute 2014

  2. Background • RNA interference (RNAi) is a valuable tool for modulating gene expression through the introduction of short RNA molecules, such as small interfering RNAs (siRNAs) and short hairpin RNAs (shRNAs). • RNAi has emerged as a powerful technology for knocking down specific genes, both for functional analysis and for therapeutic purposes, particularly now that much more is known about the specific genes involved in disease processes. • RNAi studies in human tumor cell lines, using synthetic siRNAs/shRNAs that target defined gene families or genome-wide libraries, have identified modulators of drug sensitivity (hits). • Large-scale systematic RNAi screens test hundreds, or even thousands, of siRNAs/shRNAs to identify hits.

  3. What is RNA interference (RNAi)?

  4. Challenges in analyzing high-throughput RNAi data • Efficiently and accurately identify genes that, when lost, significantly reduce or increase cell viability in response to chemotherapy. • Reduce false positives and false negatives: - in the wet lab: (1) technical and procedural improvements; (2) more replicate measurements; - in the statistical analysis: (1) control of the type I error rate; (2) sufficient power; (3) accounting for the combined effects of RNAi and the chemotherapeutic drug.

  5. RNAi screening approach identifies genes that increase sensitivity to paclitaxel in breast cancer cells

  6. Real data analysis - background • Paclitaxel is a potent anti-microtubule agent used to treat patients with locally advanced and metastatic breast cancer. Despite its wide use, paclitaxel-based chemotherapy produces a complete response in only a small proportion of patients; many patients respond incompletely or are resistant to treatment.

  7. Real data analysis – materials and methods • Performed a loss-of-function RNAi screen to identify genes that modulate paclitaxel sensitivity. • Targeted a subset of genes (n = 428) that are frequently deregulated in breast cancer and known to be associated with a targeted pharmacological agent. • Selected 36 candidate genes from the screen and designed two independent siRNAs per gene for validation in two breast cancer cell lines, MDA-MB-231 and MDA-MB-468.

  8. shRNA screen (workflow diagram). Labels: ~20,000 cells; transfect with a subset of the human genome pGIPZ shRNAmir plasmid library (n = 1078); divide into vehicle control (DMSO) or 5 nM paclitaxel; incubate 72-96 h.

  9. siRNA screen (workflow diagram). Breast cancer cell lines MDA-MB-231 and MDA-MB-468 were reverse-transfected with siRNAs and divided into vehicle control (DMSO) or 5 nM paclitaxel. The entire experiment was performed in triplicate.

  10. Real data analysis – preprocessing and normalization • Sources of noise. • Ideally, mixed-effects models should be used to estimate both fixed effects (RNAi and treatment) and random effects (batch, plate, day, etc.). • Baseline correction. • Within-plate normalization: data from each plate were normalized to a non-silencing (NS) control, giving a relative measure of the target-gene knockdown effect and controlling for the effects of siRNA transfection. Other approaches: ‘% control’, ‘normalized % inhibition’, Z score, B score, etc. • Across-plate normalization: makes measurements comparable across culture plates by removing systematic plate-to-plate variation. Approaches: median/mean centering, standardization methods, etc.
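The two normalization steps above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in the study: the function names `normalize_to_ns` and `median_center`, the toy plate values, and the choice of dividing by the mean NS signal and then subtracting each plate's median are assumptions made for the example.

```python
from statistics import mean, median

def normalize_to_ns(plate_values, ns_values):
    """Within-plate step: divide every well by the plate's mean non-silencing (NS) signal."""
    ns_mean = mean(ns_values)
    return [v / ns_mean for v in plate_values]

def median_center(plates):
    """Across-plate step: subtract each plate's median so all plates are centered at 0."""
    return [[v - median(plate) for v in plate] for plate in plates]

# Two hypothetical plates whose raw signals differ by a factor of 2.
raw_plates = [[0.9, 1.2, 0.6], [1.8, 2.4, 1.2]]
ns_controls = [[1.0, 1.0], [2.0, 2.0]]

within = [normalize_to_ns(p, ns) for p, ns in zip(raw_plates, ns_controls)]
centered = median_center(within)
```

Dividing by the NS control removes the plate-specific scale (the two toy plates become identical), and median centering then places every plate on a common center; a Z score or B score would replace the second step.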

  11. Robustness? Well-to-well variation (figure panels: replicate plates, controls, shRNA/siRNA). Within-plate normalization?

  12. Replicability? Plate-to-plate variation (figure panels: replicate plates, controls, shRNA/siRNA). Global normalization?

  13. Reproducibility? Experimental variation. *Non-targeting siRNA control: does not target any mRNA in the genome. $\%\,\text{cell growth} = \dfrac{\text{siRNA for gene } x}{\text{non-targeting siRNA}} \times 100$. Spearman correlation: $\rho_{\text{Spearman}} = 0.79$-$0.89$.
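A sketch of the two quantities on this slide, assuming %cell growth is computed from viability values and that reproducibility is assessed by correlating %cell growth across two replicate runs. The helper names and replicate values are hypothetical; Spearman's ρ is computed here as the Pearson correlation of average ranks, which handles ties.

```python
def percent_cell_growth(viability_gene_sirna, viability_nontargeting):
    """%cell growth for gene x relative to the non-targeting siRNA control."""
    return 100.0 * viability_gene_sirna / viability_nontargeting

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation = Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical %cell growth for four genes in two replicate experiments.
rep1 = [95.0, 40.0, 110.0, 70.0]
rep2 = [90.0, 45.0, 100.0, 80.0]
rho = spearman_rho(rep1, rep2)
```

Because the two runs rank the four genes identically, ρ is 1 here; the 0.79-0.89 range reported on the slide reflects real experimental variation.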

  14. Sensitivity Index • To identify genes that, when targeted, promote paclitaxel sensitivity or resistance, we calculated a sensitivity index (SI) score for each shRNA. The SI score accounts for the individual and combined effects of the shRNA and the drug on cell viability (Swanton et al., Cancer Cell 11, 498-512, 2007). • Effect of the siRNA or shRNA: $R_c/C_c$. Effect of the drug: $C_d/C_c$. • Expected combined effect: $\mathrm{ECE} = (R_c/C_c) \times (C_d/C_c)$. Observed combined effect: $\mathrm{OCE} = R_d/C_c$. • $\mathrm{SI} = \mathrm{ECE} - \mathrm{OCE}$; SI > 0: sensitive; SI < 0: resistant. Range: -1 to 1. • Rc: RNAi, untreated; Cc: control, untreated; Cd: control, drug-treated; Rd: RNAi, drug-treated.
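The SI calculation can be written as a small function. This sketch follows the definitions on the slide; the function name and the example viabilities are hypothetical.

```python
def sensitivity_index(rc, cc, cd, rd):
    """SI = ECE - OCE, using the viabilities Rc, Cc, Cd, Rd defined on the slide."""
    rnai_effect = rc / cc              # effect of the siRNA/shRNA alone
    drug_effect = cd / cc              # effect of the drug alone
    ece = rnai_effect * drug_effect    # expected combined effect if the two combine multiplicatively
    oce = rd / cc                      # observed combined effect
    return ece - oce

# Hypothetical viabilities for one shRNA.
si = sensitivity_index(rc=0.9, cc=1.0, cd=0.7, rd=0.3)
```

Here ECE = 0.9 × 0.7 = 0.63 and OCE = 0.3, so SI = 0.33 > 0: the knockdown leaves fewer viable cells under drug than the two separate effects would predict, which marks a sensitizing hit.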

  15. Selection of hits from the shRNA screen • Each gene is targeted by 2-11 shRNAs/clones. • For genes with more than 3 shRNAs, a bootstrap algorithm was used to estimate the variability of the mean SI by resampling that gene's shRNAs with replacement, and a 95% bootstrap confidence interval (CI) was calculated for each gene. • For genes with 3 or fewer shRNAs, only the mean SI was calculated, and hits among these genes were selected with a more stringent cutoff.
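The gene-level summary can be sketched as a percentile bootstrap. The function name, the 10,000 resamples, and the percentile method for the 95% interval are assumptions; the slides do not state how many resamples were drawn, nor exactly which interval-based rule (or which stricter cutoff for genes with ≤ 3 shRNAs) was used to call hits.

```python
import random
from statistics import mean

def gene_si_summary(si_values, n_boot=10_000, alpha=0.05, seed=1):
    """Mean SI for one gene, plus a percentile bootstrap CI when the gene has > 3 shRNAs."""
    m = mean(si_values)
    if len(si_values) <= 3:
        return m, None                 # too few shRNAs to bootstrap; judged by the mean alone
    rng = random.Random(seed)
    k = len(si_values)
    # Resample the gene's shRNA-level SI values with replacement and record each mean.
    boot_means = sorted(mean(rng.choices(si_values, k=k)) for _ in range(n_boot))
    lower = boot_means[int(n_boot * alpha / 2)]
    upper = boot_means[int(n_boot * (1 - alpha / 2)) - 1]
    return m, (lower, upper)

# Hypothetical SI values for the shRNAs of one gene.
mean_si, ci = gene_si_summary([0.20, 0.25, 0.30, 0.35, 0.40])
```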

  16. Plot of SI scores for all shRNAs

  17. Paclitaxel sensitivity index for indicated genes from shRNA screen

  18. Top sensitizing gene targets from siRNA screen and the corresponding chemical inhibitors

  19. Statistical approaches • Methods used to find genes whose knockdown sensitizes cells to the drug, or confers resistance to it, include fold change, the t test, the Z-factor and their variants, the Wilcoxon rank-sum test, and the sensitivity index (SI). • It is unclear whether: • the drug effect, the RNAi effect, and their interaction are all considered; • variation among replicates is taken into account in the estimation; • decision error rates (false positives and false negatives) are appropriately controlled.

  20. A Linear Model with an Interaction Term • Disadvantage of SI: it ignores the variation among replicates. • Assuming normally distributed errors, we can model cell viability (Y) for each siRNA with the explanatory variables treatment ($x_1$, yes/no), RNAi ($x_2$, yes/no), and their interaction ($x_1 x_2$): D1 (full model): $Y = a + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2 + \varepsilon$; D2 (reduced model, drug effect only): $Y = a' + b'_1 x_1 + \varepsilon$. • A goodness-of-fit chi-square statistic is calculated from the difference between the deviance of the reduced model D2 and the residual deviance of the full model D1, with 2 degrees of freedom (one each for the dropped terms $b_2$ and $b_3$).
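Because both covariates are binary, the full model D1 is saturated for the 2 × 2 design (its fitted values are the four cell means) and the reduced model D2 fits the two drug-group means, so the test can be sketched without a regression library. Assumptions: the deviance difference is scaled by the residual variance of D1 (the slide does not say how the Gaussian deviance is scaled), the helper name `interaction_test` and the triplicate data are hypothetical, and the p-value uses the fact that a chi-square variable with 2 degrees of freedom has survival function exp(-x/2).

```python
import math
from statistics import mean

def interaction_test(obs):
    """obs: (x1_drug, x2_rnai, y) tuples with x1, x2 in {0, 1}; returns (chi2, p)."""
    # Full model D1: fitted values are the four (x1, x2) cell means.
    cells = {}
    for x1, x2, y in obs:
        cells.setdefault((x1, x2), []).append(y)
    cell_mean = {key: mean(v) for key, v in cells.items()}
    rss_full = sum((y - cell_mean[(x1, x2)]) ** 2 for x1, x2, y in obs)
    # Reduced model D2 (drug effect only): fitted values are the x1 group means.
    groups = {}
    for x1, _, y in obs:
        groups.setdefault(x1, []).append(y)
    group_mean = {key: mean(v) for key, v in groups.items()}
    rss_reduced = sum((y - group_mean[x1]) ** 2 for x1, _, y in obs)
    # Scaled deviance difference (assumed scaling: residual variance of D1, n - 4 df).
    dispersion = rss_full / (len(obs) - 4)
    chi2 = (rss_reduced - rss_full) / dispersion
    p_value = math.exp(-chi2 / 2)      # chi-square survival function for 2 df
    return chi2, p_value

# Hypothetical triplicate viabilities for one siRNA: (drug, RNAi, viability).
obs = [(0, 0, 1.00), (0, 0, 1.02), (0, 0, 0.98),
       (1, 0, 0.70), (1, 0, 0.72), (1, 0, 0.68),
       (0, 1, 0.95), (0, 1, 0.97), (0, 1, 0.93),
       (1, 1, 0.20), (1, 1, 0.22), (1, 1, 0.18)]
chi2, p = interaction_test(obs)
```

Unlike SI, this statistic uses the replicate spread (through the dispersion), so the same mean pattern with noisier replicates would give a smaller chi-square.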

  21. A simulation study • Number of true hits: Uniform{10, 11, …, 60} out of 900+ genes. • Viability measurements of non-hits: $N(\mu_{NH}, \sigma^2)$, with σ ∈ {0.2, 0.4, 0.6, 0.8}. • True hits follow the same distribution with a shifted mean: $N(\mu_{NH} C, \sigma^2)$, with C > 1 for an antagonizing effect and C < 1 for a sensitizing effect. • The parameter D tunes the strength of the treatment effect. • The parameter K (> 1) is defined such that non-silencing control wells have mean $\mu_{ctl} = K \mu_{rna}$, where $\mu_{rna} = \mu_{NH}$ (untreated) or $\mu_{NH} D$ (treated).
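One possible reading of this design as code is sketched below. It is not the author's simulation code: the function name, the default values (µNH = 0.8, D = 0.7, C = 0.5, K = 1.1, three replicates), drawing control wells per gene rather than per plate, and the assumption that the hit shift C multiplies the treated mean µNH·D are all illustrative choices.

```python
import random

def simulate_screen(n_genes=900, sigma=0.2, mu_nh=0.8, C=0.5, D=0.7, K=1.1,
                    n_reps=3, seed=0):
    """Hypothetical generator following one reading of the simulation design."""
    rng = random.Random(seed)
    n_hits = rng.randint(10, 60)                    # Uniform{10, 11, ..., 60}, inclusive
    hits = set(rng.sample(range(n_genes), n_hits))
    data = {}
    for g in range(n_genes):
        c = C if g in hits else 1.0                 # only true hits get the mean shift C
        mu_rc = mu_nh                               # untreated RNAi wells
        mu_rd = mu_nh * D * c                       # treated RNAi wells (assumed: D and C multiply)
        data[g] = {
            "Rc": [rng.gauss(mu_rc, sigma) for _ in range(n_reps)],
            "Rd": [rng.gauss(mu_rd, sigma) for _ in range(n_reps)],
            "Cc": [rng.gauss(mu_nh * K, sigma) for _ in range(n_reps)],      # mu_ctl = K * mu_NH
            "Cd": [rng.gauss(mu_nh * D * K, sigma) for _ in range(n_reps)],  # mu_ctl = K * mu_NH * D
        }
    return hits, data

hits, data = simulate_screen()
```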

  22. FPR & FNR

| | Truth: + | Truth: - | |
|---|---|---|---|
| Test (SI): + | TP = #agreed | FP = #claimed positives - #agreed | #claimed positives |
| Test (SI): - | FN = #true hits - #agreed | TN = N - TP - FP - FN | |
| | #true hits (10~60) | | N (900+) |

FNR = FN/(TP+FN) = 1 - sensitivity; FPR = FP/(FP+TN) = 1 - specificity.
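The table translates directly into a small scoring function for a simulated screen. The function name and the toy gene sets are hypothetical; the counts follow the table row by row.

```python
def error_rates(true_hits, claimed, n_total):
    """Return (FPR, FNR) for a set of claimed hits against the known true hits."""
    tp = len(true_hits & claimed)      # #agreed
    fp = len(claimed) - tp             # claimed positives that are not true hits
    fn = len(true_hits) - tp           # true hits that were missed
    tn = n_total - tp - fp - fn
    fnr = fn / (tp + fn)               # 1 - sensitivity
    fpr = fp / (fp + tn)               # 1 - specificity
    return fpr, fnr

# Toy example: 10 genes, 4 true hits, 3 genes claimed by the test.
fpr, fnr = error_rates(true_hits={1, 2, 3, 4}, claimed={3, 4, 5}, n_total=10)
```

Here TP = 2, FP = 1, FN = 2 and TN = 5, so FNR = 0.5 and FPR = 1/6.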

  23. Power analysis: weak drug effect (low concentration). Panels: low σ, low D, high C; moderate σ, low D, high C; high σ, low D, high C.

  24. Power analysis: strong drug effect (high concentration). Panels: low σ, high D, high C; moderate σ, high D, high C; high σ, high D, high C. Why?

  25. In the Case of Skewed Data • Gamma distributions Ga(r, λ) were used instead of normal distributions. • The shape (r) and scale (λ) parameters were calculated by solving µ = rλ and σ² = rλ². • The skewness values considered were 0.5, 1, 1.5, and 2.
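Solving the two moment equations gives r = µ²/σ² and λ = σ²/µ, and a gamma distribution has skewness 2/√r. The sketch below (hypothetical helper `gamma_params`) uses these facts; with µ = 0.8 and σ = 0.4, 0.6, 0.8 it reproduces the shape/scale/skewness values quoted for untreated non-hits on slides 26-28. Python's `random.gammavariate(alpha, beta)` takes the shape and scale in that order.

```python
import math
import random

def gamma_params(mu, sigma):
    """Shape r and scale lam matching mean mu and SD sigma, plus the implied skewness."""
    shape = mu ** 2 / sigma ** 2       # r, from mu = r*lam and sigma^2 = r*lam^2
    scale = sigma ** 2 / mu            # lam
    skewness = 2 / math.sqrt(shape)    # skewness of a gamma distribution
    return shape, scale, skewness

shape, scale, skew = gamma_params(0.8, 0.4)
rng = random.Random(0)
sample = [rng.gammavariate(shape, scale) for _ in range(5)]   # skewed viabilities
```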

  26. Unskewed vs. skewed distributions (untreated non-hits): scale = 0.2, shape = 4, skewness = 1.

  27. Unskewed vs. skewed distributions (untreated non-hits): scale = 0.45, shape = 1.78, skewness = 1.5.

  28. Unskewed vs. skewed distributions (untreated non-hits): scale = 0.8, shape = 1, skewness = 2.

  29. Summary points for skewed data • Ratio-based methods can be unstable: the SI method is affected most by skewness. • The t test is affected more by skewness when the number of replicates is small. • The linear model (LM) is quite stable. • Fold change (FC) is very unstable. • Transform the data if they are heavily skewed.

  30. Recommendations a. Noise can be measured by the coefficient of variation (CV) or the variance-to-mean ratio (VMR): VMR < 0.2, low noise; 0.2 ≤ VMR < 0.5, moderate noise; VMR ≥ 0.5, high noise. b. The drug effect can be estimated by Cd/Cc. c. The RNAi effect can be estimated by Rc/Cc.
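The three footnotes can be turned into a quick check before choosing a design. The helper names are hypothetical, and using the sample variance (rather than the population variance) in the VMR is an assumption, since the slide does not specify which one.

```python
from statistics import mean, variance

def noise_level(values):
    """Classify replicate noise by the variance-to-mean ratio (sample variance assumed)."""
    vmr = variance(values) / mean(values)
    if vmr < 0.2:
        return "low", vmr
    if vmr < 0.5:
        return "moderate", vmr
    return "high", vmr

def drug_effect(cd, cc):
    return cd / cc    # footnote b: Cd/Cc

def rnai_effect(rc, cc):
    return rc / cc    # footnote c: Rc/Cc

level, vmr = noise_level([0.5, 1.0, 1.5])   # VMR = 0.25 / 1.0
```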

  31. Increase the power of your study Possible ways to improve the power of your study: • Choose the most powerful statistical method appropriate for your study. • Reduce variability. • Increase the number of replicates. • Use a relatively low drug concentration, if appropriate. “…Here we show that several of these targets sensitize lung cancer cells to paclitaxel concentrations 1,000-fold lower than otherwise required for a significant response, and we identify mechanistic relationships …” (Whitehurst et al., Nature 2007)
