Analysis of High-throughput Gene Expression Profiling
Why Measure Gene Expression? 1. It determines which genes are induced or repressed in response to a developmental phase or an environmental change. 2. Sets of genes whose expression rises and falls under the same conditions are likely to have related functions. 3. Features such as a common regulatory motif can be detected within co-expressed genes. 4. A pattern of gene expression may be used as an indicator of abnormal cellular regulation. • A useful tool for cancer diagnosis
Why Measure Gene Expression on a Large Scale? Traditional vs. High-throughput Approaches
Techniques Used to Detect Gene Expression Level • Microarray (single or dual channel) • SAGE • EST/cDNA library • Northern Blots • Subtractive hybridisation • Differential hybridisation • Representational difference analysis (RDA) • DNA/RNA Fingerprinting (RAP-PCR) • Differential Display (DD-PCR) • aCGH: array CGH (DNA level) High-throughput
(DNA) Microarray 1. Developed around 1987. 2. Employs methods previously exploited in the immunoassay context: specific binding and marking techniques. 3. Two types of probes: Format I: probe cDNA (500~5,000 bases long) is immobilized on a solid surface such as glass; widely considered to have been developed at Stanford University; traditionally called DNA microarrays. Format II: an array of oligonucleotide (20~80-mer) probes is synthesized either in situ (on-chip) or by conventional synthesis followed by on-chip immobilization; developed at Affymetrix, Inc. Many companies are manufacturing oligonucleotide-based chips using alternative in situ synthesis or deposition technologies. Historically called DNA chips.
Microarray • Single Channel: sub-type classification • Dual Channel: differential expression gene screening • Tissue microarray • Protein microarray • ……
Array CGH • Detects DNA copy number variation using a microarray approach • A hot topic in recent research, especially in cancer research
Microarray Analysis Which genes are up-regulated, down-regulated, co-regulated, not regulated? • gene discovery • pattern discovery • inferences about biological processes • classification of biological processes
SAGE • Experimental technique designed to gain a quantitative measure of gene expression. • ~10-20 base “tags” are produced (immediately adjacent to the 3’ end of the 3’-most NlaIII restriction site). • The SAGE technique does not measure the expression level of a gene directly; it quantifies a "tag" that represents the transcription product of a gene.
SAGE Tags are isolated and concatemerized. Relative expression levels can be compared between cells in different states.
The Algorithms and Challenges of High-throughput Gene Expression Analysis
Seeing is believing? No, errors need to be corrected first.
SAGE: • A typical experiment requires ~30,000 gene expression comparisons in which a normal and a diseased cell are compared. • The results depend on the size and reliability of the SAGE libraries. • Statistical measures are used to filter candidate genes and reduce the dimensionality of the data, but it is tedious and time-consuming to adjust these measures until a good set is found.
SAGE • TPM (tags per million): a simple normalization method, TPM = Count × 1,000,000 / TotalCount • Bayesian approach: http://cancerres.aacrjournals.org/cgi/content/full/59/21/5403
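The TPM formula on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original deck: the function name `tags_per_million` and the tag counts are hypothetical.

```python
def tags_per_million(tag_counts):
    """Scale raw SAGE tag counts to tags per million (TPM).

    Implements TPM = Count * 1,000,000 / TotalCount, where TotalCount
    is the total number of tags sequenced in the library.
    """
    total = sum(tag_counts.values())
    if total == 0:
        raise ValueError("library contains no tags")
    return {tag: count * 1_000_000 / total for tag, count in tag_counts.items()}


# Hypothetical tag counts for one small library (illustrative values only).
library = {"TAG_A": 30, "TAG_B": 50, "TAG_C": 20}
tpm = tags_per_million(library)
```

Because every library is rescaled to the same nominal size, TPM values from libraries of different depths can be compared directly, although small libraries still give noisier estimates.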
Microarray: Sources of errors • systematic • random [Figure: log signal intensity vs. log RNA abundance]
Sources of Errors (Cont.) • Printing and/or tip problems • Labeling and dye effects (differing amounts of RNA labeled between the 2 channels) • Differences in the power of the two lasers (or other scanner problems) • Difference in DNA concentration on arrays (plate effects) • Spatial biases in ratios across the surface of the microarray due to uneven hybridization • cDNA arrays cannot distinguish alternatively spliced forms
Errors that cannot be corrected by statistics • Competitive hybridization of different targets on the chip • Failure to distinguish different splicing forms • Misinterpretation of time course data when there are not sufficient points • Misinterpretation of relative intensity
Does a clustered time course really mean co-expression? Picture taken from http://genomics.stanford.edu/yeast/additional_figures_link.html Yes, a known system (such as the cell cycle) can be studied this way; but what about unknown systems?
Normalization by iterative linear regression 1. Fit a line (y = mx + b) to the data set. 2. Set aside outliers (residuals > 2 × s.e.). 3. Repeat until r² changes by < 0.001. 4. Then apply the slope and intercept to the original dataset. D Finkelstein et al. http://www.camda.duke.edu/CAMDA00/abstracts.asp
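The procedure on this slide (fit a line, set aside outliers, repeat until r² stabilizes, then apply the fit to the original data) could be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: "s.e." is read as the residual standard error, "apply slope and intercept" is read as mapping y to (y - b) / m, and all names and data values are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + b; returns (m, b, r2)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return m, b, r2


def iterative_fit(xs, ys, k=2.0, tol=0.001, max_iter=50):
    """Refit after setting aside outliers until r2 changes by less than tol."""
    keep = list(range(len(xs)))
    prev_r2 = None
    for _ in range(max_iter):
        kx = [xs[i] for i in keep]
        ky = [ys[i] for i in keep]
        m, b, r2 = fit_line(kx, ky)
        if prev_r2 is not None and abs(r2 - prev_r2) < tol:
            break
        prev_r2 = r2
        residuals = [y - (m * x + b) for x, y in zip(kx, ky)]
        dof = len(keep) - 2
        if dof <= 0:
            break
        # Assumption: "s.e." means the residual standard error of the fit.
        se = math.sqrt(sum(r * r for r in residuals) / dof)
        new_keep = [i for i, r in zip(keep, residuals) if abs(r) <= k * se]
        if len(new_keep) == len(keep) or len(new_keep) < 3:
            break
        keep = new_keep
    return m, b


# Hypothetical log intensities: channel 2 = 2 * channel 1 + 1, plus one outlier.
channel1 = [float(x) for x in range(1, 11)]
channel2 = [2.0 * x + 1.0 for x in channel1]
channel2[4] = 30.0  # the outlier spot

m, b = iterative_fit(channel1, channel2)
# Assumption: applying the fit means rescaling channel 2 onto channel 1.
normalized = [(y - b) / m for y in channel2]
```

The outlier is excluded only from the fit; the final slope and intercept are still applied to every spot, including the ones that were set aside.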
Normalization (Curvilinear) G Tseng et al., NAR 2001
After Normalization …… • Differentially expressed (DE) gene screening: t-test, t-statistics, SVM • Clustering: hierarchical, SOM, k-means • Network (pathway) analysis: BioCarta, KEGG, GO databases; Bayesian network learning; topology • …
Bioinformatics challenges 1. data management 2. utilizing data from multiple experiments 3. utilizing data from multiple groups * with different technologies * with only processed data available
Integrated Bioinformatics Analysis of Gene Expression Profiling
Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Daniel R. Rhodes et al., PNAS 2004, 101, 9309-9314. T-test. Q values (estimated false discovery rates) were calculated as Q = (P × n) / i, where P is the P value, n is the total number of genes, and i is the sorted rank of the P value.
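A small sketch of the Q value calculation, assuming the rank-based form Q = (P × n) / i implied by the definitions of P, n and i on this slide, with ranks assigned in ascending order of P value. The function name and the P values are hypothetical, and no capping at 1 or monotonicity adjustment is applied, since the slide does not mention either.

```python
def q_values(p_values):
    """Estimate false discovery rates as Q = P * n / i.

    n is the total number of genes and i is the 1-based rank of each
    P value after sorting in ascending order. Ties keep input order.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda k: p_values[k])
    q = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        q[idx] = p_values[idx] * n / rank
    return q


# Hypothetical t-test P values for four genes (illustrative values only).
p = [0.01, 0.04, 0.03, 0.20]
q = q_values(p)
```

The results are returned in the same order as the input, so each Q value lines up with its gene. Note that the gene with P = 0.04 ends up with a smaller Q than the gene with P = 0.03, which is why many implementations add a monotonicity step on top of this formula.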
Cont. Meta-Profiling. The purpose of meta-profiling is to address the hypothesis that a selected set of differential expression signatures shares a significant intersection of genes (a meta-signature), thus inferring a biological relatedness.