Microarray analysis: The CCBR’s perspective. Manjula Kasoji, CCBR, 09/29/2014. Common pitfalls: number of replicates; source, quantity and quality of RNA; batch effects; adequate expression signal; time series experiments; non-target tissue contamination. No replicates, no statistics.
Microarray analysis: The CCBR’s perspective Manjula Kasoji CCBR 09/29/2014
Common pitfalls • Number of replicates • Source, quantity and quality of RNA • Batch effects • Adequate expression signal • Time series experiments • Non-target tissue contamination
No Replicates, No Statistics A project with no replicates may give you some information, but statistical testing is not possible without them
How many biological replicates are needed in a quantitative study? Tiers shown in the figure: >= 3 replicates, >= 4-5 replicates, >= 7 replicates • More replicates if • High biological variability • Contamination by non-target tissues • Subtle treatment effect • Multiple treatments • Mechanism of action • Network analysis • And many more…
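Why high variability and subtle effects call for more replicates can be illustrated with a standard normal-approximation sample-size formula for a two-group comparison. This is a minimal sketch, not the method behind the replicate tiers on the slide: the function name, the effect sizes and the standard deviations are illustrative assumptions, the calculation covers a single gene with no multiple-testing correction, and the normal approximation tends to underestimate the requirement at small sample sizes.

```python
from math import ceil
from statistics import NormalDist


def replicates_per_group(effect, sd, alpha=0.05, power=0.8):
    """Approximate replicates per group for a two-group comparison.

    effect: difference in mean log2 expression to detect (assumed value)
    sd:     per-group standard deviation of log2 expression (assumed value)
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = z.inv_cdf(power)           # desired statistical power
    n = 2 * ((z_alpha + z_beta) * sd / effect) ** 2
    return ceil(n)


# Hypothetical scenarios: doubling the variability or halving the effect
# both raise the number of replicates needed.
baseline = replicates_per_group(effect=1.0, sd=0.5)
noisy = replicates_per_group(effect=1.0, sd=1.0)
subtle = replicates_per_group(effect=0.5, sd=0.5)
print("baseline:", baseline)
print("high variability:", noisy)
print("subtle effect:", subtle)
```

Because the requirement grows with the square of sd / effect, a modest increase in noise or a modest drop in effect size has a large impact on the number of replicates.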
Quality, quantity and Source of RNA influence sample clustering Depending on the source of RNA, sometimes even with the required number of replicates, samples do not cluster well • Embryonic tissue • Knock-out efficiency may also play a role in good sample clustering. Knockout 1 Knockout 2 Restoration of Knockout 1 Control
Randomization and consistent processing will help avoid batch effects [Figure: design diagram for treatments A and B with the levels Treatment, Biological Replicate, Technical Replicate, Array, Batch]
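One way to randomize while keeping treatments balanced across batches can be sketched as follows. The function name, sample IDs and batch count are hypothetical, and this is only one possible scheme, not a procedure prescribed by the slides.

```python
import random
from collections import defaultdict


def assign_batches(samples, n_batches, seed=0):
    """Spread each treatment group across batches in random order.

    samples: list of (sample_id, treatment) tuples
    Returns a dict mapping sample_id -> batch index (0-based).
    """
    rng = random.Random(seed)  # fixed seed so the design is reproducible
    by_treatment = defaultdict(list)
    for sample_id, treatment in samples:
        by_treatment[treatment].append(sample_id)

    assignment = {}
    for treatment in sorted(by_treatment):
        ids = by_treatment[treatment]
        rng.shuffle(ids)                   # random order within the group
        offset = rng.randrange(n_batches)  # random starting batch
        for i, sample_id in enumerate(ids):
            assignment[sample_id] = (offset + i) % n_batches
    return assignment


# Hypothetical design: treatments A and B, four biological replicates each.
samples = [(f"A{i}", "A") for i in range(1, 5)] + [(f"B{i}", "B") for i in range(1, 5)]
batches = assign_batches(samples, n_batches=2)
for sample_id, batch in sorted(batches.items()):
    print(sample_id, "-> batch", batch + 1)
```

With this scheme no batch contains only one treatment, so a batch shift cannot be mistaken for a treatment effect.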
Example of batch effect Cell line A Treated 1 Batch 1 – Scan Date 02/22/2011 Cell line B Control 1 Batch 2 – Scan Date 08/12/2011 Cell line C Treated 2 Cell line D Control 2 Cell line E
Batch effects can be visualized via clustering as well Batch A Batch B • Summary: • Batch effects can be avoided by good experimental design and randomization. • Batch effects can be visualized on a PCA plot and by clustering.
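A crude numerical counterpart to inspecting a clustering plot is to ask whether each sample's nearest neighbor shares its batch or its treatment. The sketch below uses invented toy profiles and hypothetical function names; it stands in for, and does not replace, a PCA plot or hierarchical clustering.

```python
from math import dist


def nearest_neighbor_labels(profiles, labels):
    """For each sample, return the label of its nearest other sample."""
    result = {}
    for name, vec in profiles.items():
        nearest = min(
            (other for other in profiles if other != name),
            key=lambda other: dist(vec, profiles[other]),  # Euclidean distance
        )
        result[name] = labels[nearest]
    return result


def agreement(profiles, labels):
    """Fraction of samples whose nearest neighbor carries the same label."""
    nn = nearest_neighbor_labels(profiles, labels)
    return sum(nn[s] == labels[s] for s in profiles) / len(profiles)


# Invented log2 expression profiles (three genes per sample).
profiles = {
    "s1": [5.0, 5.1, 5.2],  # treated, batch 1
    "s2": [5.1, 5.0, 5.4],  # control, batch 1
    "s3": [7.0, 7.2, 7.1],  # treated, batch 2
    "s4": [7.1, 7.0, 7.3],  # control, batch 2
}
batch = {"s1": 1, "s2": 1, "s3": 2, "s4": 2}
treatment = {"s1": "treated", "s2": "control", "s3": "treated", "s4": "control"}

print("neighbors sharing batch:", agreement(profiles, batch))
print("neighbors sharing treatment:", agreement(profiles, treatment))
```

When samples group by batch far more strongly than by treatment, as in this toy data, a batch effect is the likely explanation.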
Weak signal expression across samples confounds analysis results Group1 Group2 Group3 Group4 • Poor clustering of samples • Genes regulated by gene A induced upon DNA damage • 4 different conditions
Weak signal leads to very few or no significant, differentially expressed genes • What can we do in this situation? • Relax the statistical thresholds (use less stringent p-value and fold-change (FC) cutoffs) • Caveat is that this will increase the number of false positives and will negatively influence downstream analysis. • Summary: • Sufficient number of replicates • Randomization • Validation
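The trade-off in relaxing thresholds can be made concrete with a small filter. This is a sketch on invented data: the gene names, p-values, fold changes and both sets of cutoffs are hypothetical, and the false-positive estimate assumes unadjusted p-values for genes with no true change.

```python
from math import log2


def select_degs(results, p_cutoff, fc_cutoff):
    """Genes passing both a p-value cutoff and an absolute fold-change cutoff.

    results: dict gene -> (p_value, fold_change), fold_change as a ratio > 0
    """
    return sorted(
        gene
        for gene, (p, fc) in results.items()
        # abs(log2) treats up- and down-regulation symmetrically
        if p < p_cutoff and abs(log2(fc)) >= log2(fc_cutoff)
    )


def expected_false_positives(n_null_tests, p_cutoff):
    """Expected number of unchanged genes passing an unadjusted p-value cutoff."""
    return n_null_tests * p_cutoff


# Invented results for five genes.
results = {
    "geneA": (0.001, 3.0),
    "geneB": (0.03, 0.4),
    "geneC": (0.04, 1.6),
    "geneD": (0.08, 2.2),
    "geneE": (0.5, 1.1),
}

print("strict: ", select_degs(results, p_cutoff=0.05, fc_cutoff=2.0))
print("relaxed:", select_degs(results, p_cutoff=0.1, fc_cutoff=1.5))
print("expected false positives among 20000 unchanged genes:")
print("  p < 0.05:", expected_false_positives(20000, 0.05))
print("  p < 0.10:", expected_false_positives(20000, 0.1))
```

The relaxed cutoffs admit more genes, but the expected number of chance findings grows in proportion to the p-value cutoff, which is why validation matters more as thresholds loosen.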
Adding time points to an experiment can be useful for finding biological relevance • Comparing immune system response in knockout mouse model to human model after treatment with endotoxin. • Only one time point in mouse: 24 hours • 6 time points in human data (0, 2, 4, 6, 9, 24 hours) WT-Mouse KO-Mouse Human 0 hr Human 2 hr Human 4 hr Human 6 hr Human 9 hr Human 24 hr
A successful project: Sufficient number of replicates and samples of a group cluster well Principal Component Analysis • Effect of cell density and drug treatment on cell survival and growth. • Two conditions and 4 samples per group.
Diagnosing outliers PCA plots are a good way to flag outliers
Diagnosing outliers: Quality control • arrayQualityMetrics() from R/Bioconductor • Metrics measured: • Between-array comparison (distance between arrays, PCA) • Array intensity distribution (box plots, density plots) • Affymetrix-specific plots on raw data (RLE, Relative Log Expression) • Affymetrix-specific plots on raw data (NUSE, Normalized Unscaled Standard Error) • Individual array quality (MA plots) • Spatial distribution of intensities • If a sample outlier fails more than one QC metric, that sample should be: • re-run if possible, or • removed from the analysis.
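The idea behind the RLE metric and the "fails more than one QC metric" rule can be sketched in a few lines. This is a simplified Python illustration of the concept, not the arrayQualityMetrics implementation; the array names, expression values and function names are invented for the example.

```python
from statistics import median


def rle_medians(log_expr):
    """Median relative log expression (RLE) per array.

    log_expr: dict array_name -> list of log2 expression values,
              with probesets in the same order for every array.
    """
    arrays = list(log_expr)
    n_probes = len(log_expr[arrays[0]])
    # Reference profile: per-probeset median across all arrays.
    reference = [median(log_expr[a][i] for a in arrays) for i in range(n_probes)]
    # A well-behaved array has deviations centered near zero.
    return {
        a: median(x - ref for x, ref in zip(log_expr[a], reference))
        for a in arrays
    }


def flag_arrays(qc_failures, max_failures=1):
    """Names of arrays that fail more than max_failures QC metrics."""
    return sorted(a for a, failed in qc_failures.items() if len(failed) > max_failures)


# Invented log2 expression values for four arrays and four probesets.
log_expr = {
    "arr1": [8.0, 6.0, 10.0, 7.0],
    "arr2": [8.2, 5.9, 10.1, 7.1],
    "arr3": [7.9, 6.1, 9.9, 6.9],
    "arr4": [9.5, 7.4, 11.6, 8.4],
}
rle = rle_medians(log_expr)
for name, value in rle.items():
    print(name, round(value, 3))

# Hypothetical record of which QC metrics each array failed.
qc_failures = {"arr1": [], "arr2": ["MA"], "arr3": [], "arr4": ["RLE", "NUSE"]}
print("flagged:", flag_arrays(qc_failures))
```

An array whose median RLE sits far from zero, like arr4 here, deviates systematically from the other arrays and deserves a closer look before it is re-run or removed.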
Diagnosing outliers: Quality control Density Plot Box Plot Heat map
Sufficient number of replicates and good sample quality lead to a sufficient number of DEGs Significant, differentially expressed genes (DEGs): p-value cutoff 0.05, fold-change (FC) cutoff 2 • Summary: • Sufficient replicates and good quality samples yield a successful project. • Outliers can be diagnosed by visualization on a PCA plot and checking technical QC metrics to ensure that the outlier is not due to biological variability.
Downstream analysis: Functional enrichment using IPA • Question: Which genes are associated with the growth-suppressive effect of low cell density on cell proliferation and survival? • Time 1 = low cell density, Time 2 = high density Top 5 Bio-functions Time 1, Treated vs. Non-treated Time 2, Treated vs. Non-treated 78 210 224 Subset of the 10 genes specifically involved in the Cellular Growth and Proliferation function that are also predicted to be growth suppressive.
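Functional enrichment is commonly scored with an over-representation test: given a list of DEGs, how surprising is its overlap with a functional gene set? The sketch below uses a hypergeometric upper tail, which is one widely used approach and is not claimed to be how IPA computes its scores. The function name and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_universe, n_in_set, n_degs, n_overlap):
    """P(overlap >= n_overlap) under the hypergeometric distribution.

    n_universe: genes measured on the array
    n_in_set:   genes annotated to the function of interest
    n_degs:     differentially expressed genes
    n_overlap:  DEGs that are also annotated to the function
    """
    upper = min(n_in_set, n_degs)
    total = comb(n_universe, n_degs)
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 300 DEGs, 10 of which fall in a 200-gene function,
# where about 3 would be expected by chance out of 20000 genes.
p = enrichment_p_value(n_universe=20000, n_in_set=200, n_degs=300, n_overlap=10)
print("enrichment p-value:", p)
```

A small p-value indicates that the function is over-represented among the DEGs relative to chance; testing many functions at once still calls for multiple-testing correction.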
Visualization of networks in IPA Interaction network Interaction network expanded to include connections to upstream molecules
After the analysis • Submit data to public repository and provide required metadata
What you need to provide to CCBR • Give us a visit before you begin your experiment • Raw data (e.g. .CEL files) • Metadata (type of array, platform, species, experimental design information, processing dates) • http://ccrifx.cancer.gov/apps/site/example_microarray • Your goals and participation • Submit your project request • https://ccrifx.cancer.gov/apps/project_request/request_project 1 CCBR Investigator 3 2 Microarray Facility 4
If you want to perform the analysis on your own, you need to… • Learn appropriate QC methods, different statistical tests, and experimental designs • Know what is in your tool box • Command line • Affymetrix Power Tools (APT): for Macs, command line only; free • R/Bioconductor packages • GUI tools • Affymetrix Expression Console (PC only; free) • Partek • Gene Set Enrichment Analysis (GSEA) • Ingenuity Pathway Analysis (IPA) • To take this further • Know how to run command line programs • Learn how to script (R/Bioconductor) • Learn different R packages
Recap • Appropriate experimental design • Sufficient replicates to have statistical power • Consistent processing to avoid batch effects • Raw data and metadata • Visualization • Validation • Continuous interaction with CCBR
Acknowledgements CCRIFX Fathi Elloumi, PhD Parthav Jailwala, MS Li Jia, MS Manjula Kasoji, MS Anjan Purkayastha, PhD Anand S Merchant, MD, PhD Eric Stahlberg, PhD • CCR experts • Maggie Cam, PhD • Sean Davis, MD, PhD • Max Lee, PhD • Peter FitzGerald, PhD • David Goldstein, PhD • Sequencing Facility • Yongmei Zhao, MS • Bao Tran, MS ABCC Brian Luke, PhD Uma Mudunuri, MS Bob Stephens, PhD Ming Yi, PhD Jack Collins, PhD
Questions? Contact • CCBR home page: http://ccrifx.cancer.gov/apps/site/default • CCBR email: ccrifx_support@mail.nih.gov • Building 37, room 1123 • Building 41, room B620 • Office hours: Fridays 9:30am - 11:30am