Microarray Analysis of Gene Expression in Huntington's Disease Peripheral Blood - a Platform Comparison CodeLink compatible
Microarray Analysis of Gene Expression in Huntington's Disease Peripheral Blood - a Platform Comparison
• General microarray data analysis workflow
  • From raw data to biological significance
  • Comparison statistics and correction for multiple testing
• GeneSifter Overview
• Gene Expression in Huntington's Disease Peripheral Blood
  • Identification of biological themes
  • Platform comparison
Analysis Workflow
Each stage is listed with the step that leads to the next stage.
• Raw data
  • Data upload
• Normalized, scaled data
  • Comparison statistics, correction for multiple testing
• Differentially expressed genes
  • Up and down regulated, magnitude, clustering
• Identify and partition expression patterns
  • Annotation (UniGene, Entrez Gene, Gene Ontologies, etc.)
• Gene Summaries
  • Ontology report, pathway report, z-score
• Biological themes (Pathways, molecular function, etc.)
microarraysuccess.com Experiment Design
Experimental design determines what can be inferred from the data and how much confidence can be assigned to those inferences. Careful experimental design and the presence of biological replicates are essential to the successful use of microarrays.
• Type of experiment
  • Two groups
  • Three or more groups
  • Time series
  • Dose response
  • Multiple treatment
The type of experiment and the number of groups affect the statistical methods used to detect differential expression.
• Replicates
  • The more the better, but at least 3
  • Biological better than technical
Rigorous statistical inferences cannot be made with a sample size of one. The more replicates, the stronger the inference.
Supporting material - Kathleen Kerr, Experimental Design and Other Issues in Microarray Studies - http://ra.microslu.washington.edu/presentation/documents/KerrNAS.pdf
microarraysuccess.com Differential Expression
The fundamental goal of microarray experiments is to identify genes that are differentially expressed in the conditions being studied. Comparison statistics help identify differentially expressed genes; cluster analysis identifies patterns of gene expression and segregates subsets of genes based on these patterns.
• Statistical Significance
  • Fold change
Fold change does not address the reproducibility of the observed difference and cannot be used to determine statistical significance.
• Comparison statistics
  • 2 groups: t-test, Welch's t-test, Wilcoxon Rank Sum
  • 3 or more groups: ANOVA, Kruskal-Wallis
Comparison tests require replicates and use the variability within the replicates to assign a confidence level as to whether the gene is differentially expressed.
Supporting material - Draghici S. (2002) Statistical intelligence: effective analysis of high-density microarray data. Drug Discov Today, 7(11 Suppl): S55-63.
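To illustrate why fold change alone says nothing about reproducibility, here is a minimal sketch in plain Python. The signal values and the helper names (`fold_change`, `welch_t`) are made up for illustration and are not GeneSifter functions. Both hypothetical genes show the same 2-fold change, but only the one with consistent replicates gets a large Welch's t statistic.

```python
import math
from statistics import mean, variance

def fold_change(group1, group2):
    """Ratio of mean signals (group2 over group1); ignores replicate variability."""
    return mean(group2) / mean(group1)

def welch_t(group1, group2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = variance(group1), variance(group2)  # sample variances
    se2 = v1 / n1 + v2 / n2                      # squared standard error of the difference
    t = (mean(group2) - mean(group1)) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

# Hypothetical replicate signals (three replicates per group)
control = [100.0, 102.0, 98.0]
gene_tight = [200.0, 204.0, 196.0]   # consistent replicates
gene_noisy = [60.0, 200.0, 340.0]    # same mean, widely scattered replicates

for name, treated in [("tight", gene_tight), ("noisy", gene_noisy)]:
    t, df = welch_t(control, treated)
    print(name, "fold change:", fold_change(control, treated),
          "t:", round(t, 2), "df:", round(df, 2))
```

Turning the t statistic into a p-value requires the t distribution with `df` degrees of freedom, which a statistics package would normally supply.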
microarraysuccess.com Differential Expression
• Correction for multiple testing - Methods for adjusting the p-value from a comparison test based on the number of tests performed. These adjustments help to reduce the number of false positives in an experiment.
• FWER: Family Wise Error Rate corrections adjust the p-value so that it reflects the chance of at least 1 false positive being found in the list.
  • Bonferroni, Holm, Westfall and Young maxT
• FDR: False Discovery Rate corrections adjust the p-value so that it reflects the expected proportion of false positives in the list.
  • Benjamini and Hochberg, SAM
FWER corrections are more conservative, but FDR corrections are usually acceptable for "discovery" experiments, i.e. where a small number of false positives is acceptable.
Dudoit, S., et al. (2003) Multiple hypothesis testing in microarray experiments. Statistical Science 18(1): 71-103.
Reiner, A., et al. (2003) Identifying differentially expressed genes using false discovery rate controlling procedures. Bioinformatics 19(3): 368-375.
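The three step-wise corrections named above can be sketched in a few lines of plain Python. This is a textbook-style illustration with made-up p-values, not the code GeneSifter runs. Westfall and Young maxT and SAM are permutation-based and are not shown.

```python
def bonferroni(pvals):
    """FWER: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def holm(pvals):
    """FWER: step-down Bonferroni; the smallest p is multiplied by m, the next by m-1, and so on."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max  # enforce monotone adjusted p-values
    return adjusted

def benjamini_hochberg(pvals):
    """FDR: step-up procedure; p * m / rank, made monotone from the largest p down."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank among the ascending p-values
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four genes
raw = [0.01, 0.04, 0.03, 0.005]
print("Bonferroni:", bonferroni(raw))
print("Holm:      ", holm(raw))
print("BH (FDR):  ", benjamini_hochberg(raw))
```

For every gene the Benjamini and Hochberg value is no larger than the Holm value, which is no larger than the Bonferroni value. That ordering is what "FWER is more conservative" means in practice.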
GeneSifter – Microarray Data Analysis
• Accessibility
  • Web-based
  • Secure
  • Data management
  • Data Annotation (MIAME)
  • Multiple upload tools: CodeLink, Affymetrix, Illumina, Agilent, Custom
• Differential Expression - Powerful, accessible tools for determining Statistical Significance
  • R based statistics, Bioconductor
  • Comparison Tests: t-test, Welch's t-test, Wilcoxon Rank sum test, ANOVA
  • Correction for Multiple Testing: Bonferroni, Holm, Westfall and Young maxT, Benjamini and Hochberg
  • Unsupervised Clustering: PAM, CLARA, Hierarchical clustering
  • Silhouettes
GeneSifter – Microarray Data Analysis
• Integrated tools for determining Biological Significance
  • One Click Gene Summary™
  • Ontology Report
  • Pathway Report
  • Search by ontology terms
  • Search by KEGG terms or Chromosome
The GeneSifter Data Center
• Free resource
  • Training
  • Research
  • Publishing
• 5 areas
  • Cardiovascular
  • Cancer
  • Neuroscience
  • Immunology
  • Oral Biology
• Access to:
  • Data
  • Analysis summary
  • Tutorials
  • WebEx
The GeneSifter Data Center www.genesifter.net/dc
GeneSifter - Analysis Examples
• Data Upload
  • CodeLink
• 2 groups (Huntington's Blood vs Healthy Blood)
  • Differential expression: Fold change, Quality, t-test, False discovery rate
• 3+ groups (Time series, dose response, etc.)
  • Differential expression: Fold change, Quality, ANOVA, False discovery rate
• Visualization
  • Hierarchical clustering
  • PCA
• Partitioning
  • PAM
  • Silhouettes
• Biological significance
  • Gene Annotation
  • Ontology report
  • Pathway report
Background - Huntington's Disease
• Huntington's Disease (HD)
  • Autosomal dominant neurodegenerative disease
  • Motor impairment
  • Cognitive decline
  • Various psychiatric symptoms
  • Onset 30-50 years
• Mutant Huntingtin protein (polyglutamine)
  • Affects transcriptional regulation
  • Transcription effects may occur outside of CNS
Pairwise Analysis
Human blood expression for Huntington's disease versus control, CodeLink
CodeLink Human 20K Bioarray
Borovecki F, Lovrecic L, Zhou J, Jeong H, Then F, Rosas HD, Hersch SM, Hogarth P, Bouzou B, Jensen RV, Krainc D. Genome-wide expression profiling of human blood reveals biomarkers for Huntington's disease. Proc Natl Acad Sci U S A. 2005 Aug 2;102(31):11023-8.
Background - Data
• Genome-wide expression profiling of human blood reveals biomarkers for Huntington's disease
  • Borovecki F, Lovrecic L, Zhou J, Jeong H, Then F, Rosas HD, Hersch SM, Hogarth P, Bouzou B, Jensen RV, Krainc D.
  • Proc Natl Acad Sci U S A. 2005 Aug 2;102(31):11023-8.
• Collected peripheral blood samples
  • 14 Controls
  • 12 Symptomatic HD patients
  • 5 Presymptomatic HD patients
• Identified the 322 most differentially expressed genes (Control vs. Symptomatic HD) using the U133A array
• Used CodeLink 20K to confirm genes identified using the Affymetrix platform
• Focused on the 12 genes that showed the most significant difference between Control and HD
• Data available from GEO
Pairwise Analysis
• Select group 1: 14 normal
• Select group 2: 12 Huntington's
Pairwise Analysis
• Already normalized (median)
• t-test
• Quality filter – 0.75 (filters out genes with signal less than 0.75)
• Benjamini and Hochberg (FDR)
• Log transform data
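The quality filter and log transform settings can be sketched as below. Two points are assumptions, because the slide does not specify them: a gene is dropped only when its mean signal is below the threshold in both groups, and the log transform is base 2. The function name `prepare` and the sample data are hypothetical.

```python
import math
from statistics import mean

def prepare(genes, threshold):
    """Filter low-signal genes, then log2-transform the remaining signals.

    genes: {gene_id: (group1_signals, group2_signals)}, already normalized, all > 0.
    Assumed rule: drop a gene when both group means fall below the threshold.
    """
    prepared = {}
    for gene_id, (g1, g2) in genes.items():
        if max(mean(g1), mean(g2)) < threshold:
            continue  # signal too low in both groups to be trusted
        prepared[gene_id] = ([math.log2(x) for x in g1],
                             [math.log2(x) for x in g2])
    return prepared

# Hypothetical median-normalized signals
genes = {
    "gene_A": ([1.0, 2.0], [4.0, 8.0]),
    "gene_B": ([0.1, 0.2], [0.3, 0.5]),  # below 0.75 in both groups
}
print(prepare(genes, threshold=0.75))
```

The same sketch would apply to MAS 5 signal data with a threshold of 50; only the scale of the signal changes.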
Biological Significance
Gene Annotation Sources
• UniGene - organizes GenBank sequences into a non-redundant set of gene-oriented clusters. Gene titles are assigned to the clusters and these titles are commonly used by researchers to refer to that particular gene.
• LocusLink (Entrez Gene) - provides a single query interface to curated sequence and descriptive information, including function, about genes.
• Gene Ontologies - The Gene Ontology™ Consortium provides controlled vocabularies for the description of the molecular function, biological process and cellular component of gene products.
• KEGG - Kyoto Encyclopedia of Genes and Genomes provides information about both regulatory and metabolic pathways for genes.
• Reference Sequences - The NCBI Reference Sequence project (RefSeq) provides reference sequences for both the mRNA and protein products of included genes.
GeneSifter maintains its own copies of these databases and updates them automatically.
Ontology Report: z-score
• R = total number of genes meeting selection criteria
• N = total number of genes measured
• r = number of genes meeting selection criteria with the specified GO term
• n = total number of genes measured with the specified GO term
Reference: Scott W Doniger, Nathan Salomonis, Kam D Dahlquist, Karen Vranizan, Steven C Lawlor and Bruce R Conklin; MAPPFinder: using Gene Ontology and GenMAPP to create a global gene-expression profile from microarray data, Genome Biology 2003, 4:R7
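With these definitions, the z-score given in the cited MAPPFinder paper is z = (r - n(R/N)) / sqrt( n (R/N) (1 - R/N) (1 - (n-1)/(N-1)) ). It compares the observed count r with the count expected by chance, n(R/N). A minimal Python sketch follows; the function name `go_z_score` and the example counts are illustrative only.

```python
import math

def go_z_score(r, n, R, N):
    """z-score for one GO term, following the MAPPFinder formula.

    r: genes meeting the selection criteria that carry the GO term
    n: genes measured that carry the GO term
    R: genes meeting the selection criteria
    N: genes measured
    """
    expected = n * R / N  # count expected if the term were unrelated to the selection
    var = n * (R / N) * (1 - R / N) * (1 - (n - 1) / (N - 1))
    return (r - expected) / math.sqrt(var)

# Hypothetical counts: 1000 genes measured, 100 selected, 50 carry the term
print(go_z_score(r=5, n=50, R=100, N=1000))   # term found as often as expected
print(go_z_score(r=15, n=50, R=100, N=1000))  # term over-represented in the list
```

A positive z-score means the term appears in the gene list more often than expected, a negative one less often. The formula is undefined when the variance term is zero, for example when every measured gene carries the term.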
Pairwise Analysis - Summary
Human blood expression for Huntington's disease versus control, CodeLink
• 12 HD, 14 Control
• t-test, Benjamini and Hochberg (FDR)
• ~20,000 genes measured, 5684 genes in the resulting list
• Pattern selection, then z-scores:
• 2606 increased in HD
  • Biological processes: Protein biosynthesis (104), Ubiquitin cycle (123), RNA splicing (53)
  • KEGG: Oxidative phosphorylation (35), Apoptosis (22)
• 3078 decreased in HD
  • Biological processes: Neurogenesis (90), Cell adhesion (120), Sodium ion transport (29), G-protein coupled receptor signaling (114)
  • KEGG: Neuroactive ligand-receptor interaction (56)
Pairwise Analysis
Human blood expression for Huntington's disease versus control, Affymetrix
U133A Human Genome Array, MAS 5 signal
Borovecki F, Lovrecic L, Zhou J, Jeong H, Then F, Rosas HD, Hersch SM, Hogarth P, Bouzou B, Jensen RV, Krainc D. Genome-wide expression profiling of human blood reveals biomarkers for Huntington's disease. Proc Natl Acad Sci U S A. 2005 Aug 2;102(31):11023-8.
Pairwise Analysis - Affymetrix
• Already normalized (median)
• t-test
• Quality filter – 50 (filters out genes with signal less than 50)
• Benjamini and Hochberg (FDR)
• Log transform data
Pairwise Analysis – Gene List Human blood expression for Huntington’s disease versus control, Affymetrix
Platform comparison – Biological themes Affymetrix
Cluster by Samples – All Genes Affymetrix CodeLink
Cluster by Samples – ? Affymetrix CodeLink
Cluster by Samples – Y Chrom. Genes Affymetrix CodeLink
Platform Comparison - Summary
                        CodeLink    Affymetrix
Transcripts total       19729       22283
Increased in HD         2606        1976
Overlap (LL genes)      41%         65%
• Top BP Ontologies: Ubiquitin cycle, RNA splicing, Regulation of translation, Apoptosis
• Clustering of samples
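An overlap expressed as a percentage of each platform's own list can be computed as in the sketch below. It assumes both gene lists have already been mapped to a shared identifier such as LocusLink IDs. The function name and the IDs are made up, and the sketch does not reproduce the figures in the table.

```python
def overlap_percentages(genes_a, genes_b):
    """Percent of each list that is also present in the other list."""
    a, b = set(genes_a), set(genes_b)  # sets collapse multiple probes per gene
    shared = a & b
    return 100.0 * len(shared) / len(a), 100.0 * len(shared) / len(b)

# Hypothetical LocusLink IDs for genes increased in HD on each platform
platform_a = [1, 2, 3, 4]
platform_b = [3, 4, 5, 6, 7, 8, 9, 10]
pct_a, pct_b = overlap_percentages(platform_a, platform_b)
print(f"{pct_a:.0f}% of list A and {pct_b:.0f}% of list B are shared")
```

Because the shared count is divided by a different list size on each side, the two percentages differ even though they describe the same set of shared genes.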
Platform Comparison - Summary
                        CodeLink          Affymetrix
Increased in HD         2606              1976
Decreased in HD         3078              986
Unique ontology         Oxidative Phos.   IL-6 Biosynthesis
MicroarraySuccess.com
Seven Keys to Successful Microarray Data Analysis
• Experiment Design
  • Type of experiment: Two groups, Time series, Dose Response, Multiple treatments
  • Replicates: The more the better, Technical vs. biological
• Platform Selection
  • Platforms: cDNA, Oligo, One color, Two color
  • Feature Extraction Software
  • File formats
• Data Management
  • Databases
  • Raw Data: Storing, Retrieving
  • Experiment Annotation: Samples, Protocols
• System Access
  • Usability: Intuitive, Special training
  • System Access: Single user desktop, Single user server, Web-based
  • Sharing data: In the lab, Collaboration
• Differential Expression
  • Normalization
  • Differential Expression: Fold change, Comparison statistics, FWER/FDR
  • Pattern Identification: Clustering, Visualization, Partitioning
• Biological Significance
  • Gene Annotation: UniGene, LocusLink, Gene Ontology, KEGG, OMIM
  • Single Genes: Gene Summaries
  • Gene Lists: Ontology Report, Pathway Report
• Data Publication
  • MIAME: What is it?, Publication
  • Public databases: GEO, ArrayExpress, SMD
  • Using public data: Meta analysis
Academic partner – University of Washington
Thank You
www.genesifter.net
Trial account, tutorials, sample data and Data Center
Eric Olson
eric@genesifter.net
206.283.4363