
Microarray analysis: The CCBR’s perspective

Microarray analysis: The CCBR’s perspective. Manjula Kasoji, CCBR, 09/29/2014. Common pitfalls: number of replicates; source, quantity and quality of RNA; batch effects; adequate expression signal; time series experiments; non-target tissue contamination. No replicates, no statistics.


Presentation Transcript


  1. Microarray analysis: The CCBR’s perspective. Manjula Kasoji, CCBR, 09/29/2014

  2. Common pitfalls • Number of replicates • Source, quantity and quality of RNA • Batch effects • Adequate expression signal • Time series experiments • Non-target tissue contamination

  3. No Replicates, No Statistics. A project with no replicates may give you some information, but it is not possible to do statistics on it.
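
A minimal sketch of why a group with a single sample cannot support statistics: the within-group variance, which every test statistic needs, cannot be estimated from one observation. The expression values and the helper name `sample_variance_or_none` are hypothetical and used only for illustration.

```python
import statistics

def sample_variance_or_none(values):
    """Return the sample variance, or None when it cannot be estimated."""
    try:
        return statistics.variance(values)
    except statistics.StatisticsError:
        # Fewer than two observations: within-group variability is unknown.
        return None

# Hypothetical log2 expression values for one gene (illustrative numbers only).
single_sample_group = [8.1]
replicated_group = [8.1, 7.9, 8.4]

no_rep_variance = sample_variance_or_none(single_sample_group)
rep_variance = sample_variance_or_none(replicated_group)

print("variance with 1 sample:", no_rep_variance)
print("variance with 3 replicates:", rep_variance)
```

With one sample per group the variance is undefined, so no p-value can be computed; with three replicates an estimate exists, although it may still be imprecise.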

  4. How many biological replicates are needed in a quantitative study? Figure labels: >= 3 replicates, >= 4-5 replicates, >= 7 replicates. • More replicates if • High biological variability • Contamination by non-target tissues • Subtle treatment effect • Multiple treatments • Mechanism of action • Network analysis • And many more…

  5. Quality, quantity and source of RNA influence sample clustering. Depending on the source of RNA, sometimes even with the required number of replicates, samples do not cluster well • Embryonic tissue • Knock-out efficiency may also play a role in good sample clustering. Figure legend: Knockout 1, Knockout 2, Restoration of Knockout 1, Control

  6. Randomization and consistent processing will help avoid batch effects. Figure labels: Treatment (A, B), Biological Replicate, Technical Replicate, Array, Batch
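
One way to put the randomization advice into practice is to shuffle the samples within each treatment group and then deal them across processing batches, so that no batch is dominated by one treatment. This is only a sketch under assumed inputs: the sample names, the two-batch layout, and the function `assign_batches` are hypothetical, not part of the CCBR workflow.

```python
import random
from collections import Counter

def assign_batches(samples, n_batches, seed=0):
    """Shuffle samples within each treatment, then deal them round-robin into batches.

    samples: list of (sample_name, treatment) tuples.
    Returns a dict mapping sample_name -> batch index.
    """
    rng = random.Random(seed)  # fixed seed so the layout can be reproduced
    by_treatment = {}
    for name, treatment in samples:
        by_treatment.setdefault(treatment, []).append(name)

    assignment = {}
    slot = 0
    for treatment in sorted(by_treatment):
        names = by_treatment[treatment]
        rng.shuffle(names)  # random order within the treatment group
        for name in names:
            assignment[name] = slot % n_batches
            slot += 1
    return assignment

# Hypothetical design: two treatments (A, B), four biological replicates each.
samples = [(f"A{i}", "A") for i in range(1, 5)] + [(f"B{i}", "B") for i in range(1, 5)]
assignment = assign_batches(samples, n_batches=2)

# Count how many samples of each treatment ended up in each batch.
batch_counts = Counter((assignment[name], treatment) for name, treatment in samples)
for (batch, treatment), count in sorted(batch_counts.items()):
    print(f"batch {batch}, treatment {treatment}: {count} samples")
```

Because treatment and batch are balanced, a batch effect can later be separated from the treatment effect instead of being confounded with it.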

  7. Example of batch effect. Figure labels: Cell lines A, B, C, D, E; Treated 1, Treated 2, Control 1, Control 2; Batch 1 (scan date 02/22/2011), Batch 2 (scan date 08/12/2011)

  8. Batch effects can be visualized via clustering as well. Figure labels: Batch A, Batch B. • Summary: • Batch effects can be avoided by good experimental design and randomization. • Batch effects can be visualized on a PCA plot and by clustering.

  9. Weak signal expression across samples confounds analysis results. Figure labels: Group1, Group2, Group3, Group4. • Poor clustering of samples • Genes regulated by gene A induced upon DNA damage • 4 different conditions

  10. Weak signal leads to very few or no significant, differentially expressed genes • What can we do in this situation? • Relax the statistical thresholds (accept a less stringent p-value cutoff and a lower FC cutoff) • Caveat: this will increase the number of false positives and will negatively influence downstream analysis. • Summary: • Sufficient number of replicates • Randomization • Validation
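
The trade-off in relaxing thresholds can be sketched with a simple filter. The strict cutoffs (p-value 0.05, FC 2) are the ones quoted later in the deck; the relaxed cutoffs (p-value 0.10, FC 1.5), the gene table, and the function `select_degs` are assumptions made for illustration. The sketch uses raw p-values and ignores multiple-testing correction.

```python
import math

def select_degs(results, p_cutoff, fc_cutoff):
    """Return genes whose p-value is below p_cutoff and whose absolute
    log2 fold change is at least log2(fc_cutoff).

    results: list of (gene, p_value, log2_fold_change) tuples.
    """
    min_abs_log2fc = math.log2(fc_cutoff)
    return [
        gene
        for gene, p_value, log2_fc in results
        if p_value < p_cutoff and abs(log2_fc) >= min_abs_log2fc
    ]

# Hypothetical test results: (gene, p-value, log2 fold change).
results = [
    ("gene1", 0.001, 1.8),
    ("gene2", 0.03, -1.2),
    ("gene3", 0.04, 0.7),
    ("gene4", 0.08, 1.1),
    ("gene5", 0.20, 0.3),
    ("gene6", 0.60, -0.1),
]

strict = select_degs(results, p_cutoff=0.05, fc_cutoff=2.0)
relaxed = select_degs(results, p_cutoff=0.10, fc_cutoff=1.5)

print("strict :", strict)
print("relaxed:", relaxed)
```

Every gene that passes the strict filter also passes the relaxed one; the extra genes admitted by the relaxed filter are the ones most likely to be false positives, which is why validation matters.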

  11. Adding time points to an experiment can be useful for finding biological relevance • Comparing immune system response in a knockout mouse model to a human model after treatment with endotoxin. • Only one time point in mouse (24 hours); 6 time points in human data (0, 2, 4, 6, 9, 24 hours). Figure labels: WT-Mouse, KO-Mouse, Human 0 hr, Human 2 hr, Human 4 hr, Human 6 hr, Human 9 hr, Human 24 hr

  12. A successful project: Sufficient number of replicates and samples of a group cluster well Principal Component Analysis • Effect of cell density and drug treatment on cell survival and growth. • Two conditions and 4 samples per group.

  13. Diagnosing outliers PCA plots are a good way to flag outliers

  14. Diagnosing outliers: Quality control • arrayQualityMetrics() from R/Bioconductor • Metrics measured: • Between array comparison (distance between arrays, PCA) • Array intensity distribution (box plots, density plots) • Affymetrix specific plots on raw data (RLE: Relative Log Expression) • Affymetrix specific plots on raw data (NUSE: Normalized Unscaled Standard Error) • Individual array quality (MA plots) • Spatial distribution of intensities • If a sample outlier fails more than one QC metric: • that sample should be re-run if possible; • otherwise, it should be removed from the analysis.
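
To give a feel for one of the metrics listed above, here is a simplified sketch of the idea behind Relative Log Expression: for each gene, subtract its median across arrays, then look at each array's distribution of deviations, which should be centered near zero. This is not the arrayQualityMetrics implementation; the expression values, the flagging threshold of 0.5, and the function name are all assumptions for illustration.

```python
import statistics

def relative_log_expression(log_expr):
    """Compute per-array deviations from each gene's across-array median.

    log_expr: dict mapping array name -> list of log2 expression values,
              with genes in the same order for every array.
    Returns a dict mapping array name -> list of RLE values.
    """
    arrays = list(log_expr)
    n_genes = len(log_expr[arrays[0]])
    gene_medians = [
        statistics.median(log_expr[a][g] for a in arrays) for g in range(n_genes)
    ]
    return {
        a: [log_expr[a][g] - gene_medians[g] for g in range(n_genes)]
        for a in arrays
    }

# Hypothetical log2 expression values for four arrays and four genes.
log_expr = {
    "array1": [8.0, 6.0, 10.0, 7.0],
    "array2": [8.1, 5.9, 10.1, 7.1],
    "array3": [7.9, 6.1, 9.9, 6.9],
    "array4": [9.5, 7.4, 11.6, 8.3],  # systematically shifted upward
}

rle = relative_log_expression(log_expr)

# Flag arrays whose median RLE is far from zero (threshold is an assumption).
RLE_MEDIAN_THRESHOLD = 0.5
flagged = [a for a, values in rle.items()
           if abs(statistics.median(values)) > RLE_MEDIAN_THRESHOLD]

print("flagged arrays:", flagged)
```

An array flagged this way is only a candidate outlier; as the slide says, it should fail more than one QC metric before it is re-run or removed.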

  15. Diagnosing outliers: Quality control Density Plot Box Plot Heat map

  16. Sufficient number of replicates and good quality lead to sufficient number of DEGs Significant, differentially expressed genes (DEGs), p-value 0.05, FC 2 • Summary: • Sufficient replicates and good quality samples yield a successful project. • Outliers can be diagnosed by visualization on a PCA plot and checking technical QC metrics to ensure that the outlier is not due to biological variability.

  17. Downstream analysis: Functional enrichment using IPA • Question: Which genes are associated with the growth-suppressive effect of low cell density on cell proliferation and survival? • Time 1 = low cell density, Time 2 = high density Top 5 Bio-functions Time 1, Treated vs. Non-treated Time 2, Treated vs. Non-treated 78 210 224 Subset of the 10 genes specifically involved in the Cellular Growth and Proliferation function that are also predicted to be growth suppressive.

  18. Visualization of networks in IPA Interaction network Interaction network expanded to include connections to upstream molecules

  19. After the analysis • Submit data to public repository and provide required metadata

  20. What you need to provide to CCBR • Give us a visit before you begin your experiment • Raw data (e.g. .CEL files) • Metadata (type of array, platform, species, experimental design information, processing dates) • http://ccrifx.cancer.gov/apps/site/example_microarray • Your goals and participation • Submit your project request • https://ccrifx.cancer.gov/apps/project_request/request_project 1 CCBR Investigator 3 2 Microarray Facility 4

  21. If you want to perform the analysis on your own, you need to… • Learn appropriate QC methods, different statistical tests, and experimental designs • Know what is in your tool box • Command line • Affymetrix Power Tools (APT): for Macs, command line only; free • R/Bioconductor packages • GUI tools • Affymetrix Expression Console (PC only): free • Partek • Gene Set Enrichment Analysis (GSEA) • Ingenuity Pathway Analysis (IPA) • To take this further • Know how to run command line programs • Learn how to script (R/Bioconductor) • Learn different R packages

  22. Recap • Appropriate experimental design • Sufficient replicates to have statistical power • Consistent processing to avoid batch effects • Raw data and metadata • Visualization • Validation • Continuous interaction with CCBR

  23. Acknowledgements CCRIFX • Fathi Elloumi, PhD • Parthav Jailwala, MS • Li Jia, MS • Manjula Kasoji, MS • Anjan Purkayastha, PhD • Anand S Merchant, MD, PhD • Eric Stahlberg, PhD CCR experts • Maggie Cam, PhD • Sean Davis, MD, PhD • Max Lee, PhD • Peter FitzGerald, PhD • David Goldstein, PhD Sequencing Facility • Yongmei Zhao, MS • Bao Tran, MS ABCC • Brian Luke, PhD • Uma Mudunuri, MS • Bob Stephens, PhD • Ming Yi, PhD • Jack Collins, PhD

  24. Questions?? Contact • CCBR home page: http://ccrifx.cancer.gov/apps/site/default • CCBR email: ccrifx_support@mail.nih.gov • Building 37, room 1123 • Building 41, room B620 • Office hours: Fridays 9:30am - 11:30am
