Explore microarray data analysis techniques such as normalization, clustering, and pattern analysis to interpret gene expression data accurately. Understand factors affecting gene expression changes and discover methods to identify differentially expressed genes. Learn about statistical tests, pattern finding, and clustering methods for efficient data interpretation. Enhance your knowledge in microarray analysis for comprehensive gene expression studies.
Microarray analysis • Quantitation of Gene Expression • Expression Data to Networks • Reading: Ch 16 • BIO520 Bioinformatics, Jim Lund
Microarray data • Image quantitation. • Normalization • Find genes with significant expression differences • Annotation • Clustering, pattern analysis, network analysis
Sources of Non-Biological Variation • Dye bias: differences in heat and light sensitivity, efficiency of dye incorporation • Differences in the amount of labeled cDNA hybridized to each channel in a microarray experiment (Channel is used to refer to a combination of a dye and a slide.) • Variation across replicate slides • Variation across hybridization conditions • Variation in scanning conditions • Variation among technicians doing the lab work.
Factors which impact on the signal level • Amount of mRNA • Labeling efficiencies • Quality of the RNA • Laser/dye combination • Detection efficiency of photomultiplier or CCD
[Figure: HeLa vs. HepG2]
M vs. A Plot • M = log(Red) − log(Green) • A = (log(Green) + log(Red)) / 2
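The M and A definitions above can be computed per spot as in the following minimal sketch. The log base (base 2 here), the function name `ma_values`, and the intensity values are assumptions for illustration; the slide does not specify them.

```python
import math

def ma_values(red, green):
    """Return (M, A) for one spot from its red and green intensities.

    Assumes base-2 logs and strictly positive intensities (hypothetical choices).
    """
    log_r = math.log2(red)
    log_g = math.log2(green)
    m = log_r - log_g        # M: log ratio of the two channels
    a = (log_g + log_r) / 2  # A: mean log intensity
    return m, a

# Made-up (red, green) intensities for three spots.
spots = [(1000.0, 1000.0), (4000.0, 1000.0), (250.0, 1000.0)]
ma = [ma_values(r, g) for r, g in spots]
```

Plotting M against A puts equally expressed spots near M = 0, so an intensity-dependent dye bias shows up as a curve away from that line.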
Types of normalization • To total signal (linear normalization) • LOESS (LOcally WEighted polynomial regreSSion). • To “house keeping genes” • To genomic DNA spots (Research Genetics) or mixed cDNA’s • To internal spikes
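As a rough illustration of the first option, normalization to total signal can be sketched as a single scale factor that equalizes the summed intensity of the two channels. The function name `normalize_to_total` and the intensities are hypothetical, and scaling the green channel (not the red) is an arbitrary choice made here.

```python
def normalize_to_total(red, green):
    """Scale the green channel so both channels have the same total signal."""
    factor = sum(red) / sum(green)
    return [g * factor for g in green], factor

# Made-up intensities in which the red channel is uniformly twice as bright.
red = [200.0, 400.0, 1000.0, 400.0]
green = [100.0, 200.0, 500.0, 200.0]
green_scaled, factor = normalize_to_total(red, green)
```

Because this applies one global factor, it cannot remove a bias that changes with spot intensity; that is the case LOESS is meant to handle.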
Microarray analysis • Data exploration: expression of gene X? • Statistical analysis: which genes show large, reproducible changes? • Clustering: grouping genes by expression pattern. • Knowledge-based analysis: Are amine synthesis genes involved in this experiment?
Fold change: the crudest method of finding differentially expressed genes [Figure: HeLa vs. HepG2; regions labeled ">2-fold expression change"]
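A fold-change filter of this kind can be sketched as below. The gene names, expression values, and the function `fold_change_hits` are invented for illustration; only the 2-fold cutoff comes from the slide.

```python
import math

def fold_change_hits(expr_a, expr_b, min_fold=2.0):
    """Return genes whose expression differs by more than min_fold in either direction."""
    cutoff = math.log2(min_fold)
    hits = []
    for gene in expr_a:
        log_ratio = math.log2(expr_b[gene] / expr_a[gene])
        if abs(log_ratio) > cutoff:  # up- or down-regulated beyond the cutoff
            hits.append(gene)
    return hits

# Hypothetical expression values for three genes in two cell lines.
hela = {"geneA": 100.0, "geneB": 100.0, "geneC": 100.0}
hepg2 = {"geneA": 450.0, "geneB": 120.0, "geneC": 20.0}
hits = fold_change_hits(hela, hepg2)
```

The filter looks at a single ratio per gene and ignores replicate variability, which is why it is called crude.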
What do we mean by differentially expressed? • Statistically, our gene is different from the other genes. [Figure: distribution of measurements for gene of interest (probability of a given value of the ratio) and distribution of average ratios for all genes (number of genes vs. log ratio)]
Finding differentially expressed genes: What affects our certainty that a gene is up- or down-regulated? [Figure: probe signal, Sample A vs. Sample B] • Number of sample points • Difference in means • Standard deviations of sample
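One common way to combine these three quantities is a t statistic. The sketch below uses Welch's unequal-variance form, which is chosen here for illustration and is not stated on the slide; the function name `welch_t` and all sample values are made up.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch t statistic: difference in means divided by its standard error."""
    mean_diff = statistics.mean(sample_b) - statistics.mean(sample_a)
    std_err = math.sqrt(
        statistics.variance(sample_a) / len(sample_a)
        + statistics.variance(sample_b) / len(sample_b)
    )
    return mean_diff / std_err

# Same difference in means (2.0) in every comparison; only spread or n changes.
t_tight = welch_t([10.0, 10.2, 9.8], [12.0, 12.2, 11.8])  # small standard deviation
t_noisy = welch_t([10.0, 12.0, 8.0], [12.0, 14.0, 10.0])  # large standard deviation
t_more = welch_t([10.0, 12.0, 8.0] * 2, [12.0, 14.0, 10.0] * 2)  # more sample points
```

With the difference in means held fixed, the statistic grows when the standard deviations shrink or when more sample points are added.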
Practical views on statistics • With appropriate biological replicates, it is possible to select statistically meaningful genes/patterns. • Sensitivity and selectivity are inversely related - e.g. increased selection of true positives WILL result in more false positives and fewer false negatives. • False negatives are lost opportunities, false positives cost $’s and waste time. • Even with conservative statistics, a typical set of experiments yields more genes/pathways/patterns than one can sensibly follow - so use conservative statistics to protect against false positives when designing follow-on experiments.
Statistical Tests • Student’s t-test • Correct for multiple testing! (Holm-Bonferroni) • False discovery rate. • Significance Analysis of Microarrays (SAM) • http://www-stat.stanford.edu/~tibs/SAM/ • ANOVA • Principal components analysis • Special methods for periodic patterns in data.
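The two multiple-testing approaches named above can be sketched as follows. The Holm-Bonferroni step-down procedure is the one named on the slide; the Benjamini-Hochberg step-up procedure is assumed here as one standard way to control the false discovery rate, since the slide does not say which method it has in mind. The p-values are invented.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return booleans, True where the null hypothesis is rejected (Holm step-down)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            break  # stop at the first p-value that fails its threshold
    return rejected

def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans, True where the null is rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    last = 0  # largest rank whose p-value passes its threshold
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            last = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= last:
            rejected[idx] = True
    return rejected

# Hypothetical per-gene p-values, deliberately unsorted.
p = [0.02, 0.001, 0.3, 0.012, 0.035]
holm = holm_bonferroni(p)
fdr = benjamini_hochberg(p)
```

On the same p-values the FDR procedure rejects at least as many hypotheses as Holm-Bonferroni, reflecting its less strict error criterion.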
Volcano plot: log(fold change) vs. p-value [Figure: x-axis log(fold change), y-axis p-value]
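The coordinates of a volcano plot can be sketched as below. Plotting the p-value as -log10(p) and using base-2 logs for fold change are conventions assumed here, not details from the slide; the function name `volcano_point` and the data are hypothetical.

```python
import math

def volcano_point(fold_change, p_value):
    """Return (x, y) = (log2 fold change, -log10 p-value) for one gene."""
    return math.log2(fold_change), -math.log10(p_value)

# Made-up (fold change, p-value) pairs for three genes.
genes = {"geneA": (4.0, 0.001), "geneB": (1.1, 0.5), "geneC": (0.25, 0.01)}
points = {name: volcano_point(fc, p) for name, (fc, p) in genes.items()}
```

Genes with both a large change and a small p-value land in the upper left and upper right corners of the plot.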
Pattern finding • In many cases, the patterns of differential expression are the target (as opposed to specific genes) • Clustering or other approaches for pattern identification - find genes which behave similarly across all experiments or experiments which behave similarly across all genes • Classification - identify genes which best distinguish 2 or more classes. • The statistical reliability of the pattern or classifier is still an issue and similar considerations apply - e.g. cluster analysis of random noise will still produce clusters, but they will be meaningless.
What is clustering? • Group similar objects together. • Genes with similar expression patterns. • Objects in the same cluster (group) are more similar to each other than objects in different clusters.
Clustering • What is clustering? • Similarity/distance metrics • Hierarchical clustering algorithms • Made popular by Stanford, e.g. [Eisen et al. 1998] • K-means • Made popular by many groups, e.g. [Tavazoie et al. 1999] • Self-organizing map (SOM) • Made popular by Whitehead, e.g. [Tamayo et al. 1999]
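Of the algorithms listed, K-means is the simplest to sketch. The version below is a minimal illustration under stated assumptions: Euclidean distance, the first k profiles as starting centers (real implementations usually pick starting centers at random), and a fixed number of iterations. The function name `kmeans` and the expression profiles are invented.

```python
import math

def kmeans(points, k, iterations=20):
    """Minimal K-means: returns (labels, centers) for a list of numeric profiles."""
    centers = [list(p) for p in points[:k]]  # simplification: first k points as seeds
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each profile goes to its nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical expression profiles (one row per gene, one column per experiment).
profiles = [
    (1.0, 2.0, 3.0),
    (5.0, 5.0, 0.0),
    (1.1, 2.1, 2.9),
    (5.2, 4.8, 0.1),
    (0.9, 1.9, 3.2),
]
labels, centers = kmeans(profiles, k=2)
```

Each gene ends up with a cluster label, and genes with similar profiles share a label.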
Typical Tools • SAM (Significance Analysis of Microarrays), Stanford • GeneSpring • Affymetrix GeneChip Operating System (GCOS) • Cluster/Treeview • R statistics package microarray analysis libraries.
How to define similarity? • Similarity metric: a measure of pairwise similarity or dissimilarity • Examples: • Correlation coefficient • Euclidean distance [Figure: raw matrix X (n genes × p experiments) converted to similarity matrix Y (n genes × n genes)]
Similarity metrics • Euclidean distance • Correlation coefficient • Euclidean clustering = magnitude & direction • Correlation clustering = direction
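The contrast between the two metrics can be seen in a small sketch. The profiles are invented, and Pearson correlation is assumed as the correlation coefficient, which the slide does not specify.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two expression profiles."""
    return math.dist(x, y)

def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

low = [1.0, 2.0, 3.0, 2.0]          # baseline profile
high = [10.0, 20.0, 30.0, 20.0]     # same shape, ten times the magnitude
opposite = [3.0, 2.0, 1.0, 2.0]     # similar magnitude, opposite direction

r_same_shape = pearson(low, high)
r_opposite = pearson(low, opposite)
d_same_shape = euclidean(low, high)
d_opposite = euclidean(low, opposite)
```

Correlation treats `low` and `high` as identical because only direction matters, while Euclidean distance puts `low` closer to `opposite` because magnitude dominates.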
Self-organizing maps (SOM) [Kohonen 1995] • Basic idea: • map high dimensional data onto a 2D grid of nodes • Neighboring nodes are more similar than points far away
Things learned from microarray gene expression experiments • Pathways not known to be involved • Ontology? • Novel genes involved in a known pathway • “like” and “unlike” tissues
Transcription Factors: Regulatory Networks • Identify co-regulated genes • Search for common motifs (transcription factor binding sites) • Evaluate known motifs/factors • Search for new ones. • Programs: MEME, etc.
mRNA-protein Correlation • YPD: should have relevant data • will yeast be typical? • Electrophoresis 18:533 • 23 proteins on 2D gels • r=0.48 for mRNA vs. protein • Post-transcriptional and post-translational regulation important!
Other microarray formats • Single nucleotide polymorphism (SNP) chips • Oligos with each of 4 nt at each SNP. • Chromosomal IP chips (ChIP:chip) • Determine transcription factor binding sites • Promoter DNA on the chip. • Alternative splicing chips • Long oligos, covering alternatively spliced exons, or all exons. • Genome tiling chips
ChIP:chip--Identification of Transcription Factor Binding Sites • Cross-link transcription factors to DNA with formaldehyde • Pull out transcription factor of interest via immunoprecipitation with an antibody or by tagging the factor of interest with an isolatable epitope (e.g. GST fusion). • Fractionate the DNA associated with the transcription factor, reverse the cross-links, label and hybridize to an array of promoter DNA. • Brown et al. (2001) Nature 409:533-8
On to Proteomics: DNA → RNA → Protein