Different Plant Hormones Regulate Similar Processes through Largely Nonoverlapping Transcriptional Responses.
Jennifer L. Nemhauser (1,3,4), Fangxin Hong (1,2,3), and Joanne Chory (1,2,*)
(1) Plant Biology Laboratory and (2) Howard Hughes Medical Institute, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
(3) These authors contributed equally to this work.
(4) Present address: Department of Biology, University of Washington, Seattle, WA 98195, USA
How are diverse environmental signals integrated? [Figure: salt stress and ABA (Achard, 2003)]
Comparison of global transcriptional responses to various hormone treatments • AtGenExpress: a project that aims to uncover the transcriptome of the model organism Arabidopsis thaliana. • The project generated a gene expression profiling database using Affymetrix GeneChip arrays to profile RNA samples covering Arabidopsis development, stress responses, light responses, nutrient responses, chemical responses, and genotype comparisons.
Hormones tested: abscisic acid (ABA); gibberellic acid 3 (GA); indole-3-acetic acid (IAA; auxin); 1-aminocyclopropane-1-carboxylic acid (ACC; ethylene precursor); zeatin (CK; cytokinin); brassinolide (BL; brassinosteroid); methyl jasmonate (MJ; jasmonate).
Step 1: Within-Array Quality Control Checks • Scaling factors • Percent present • Average background • Hybridization controls • Internal controls
Scaling factors: The scaling factor provides a measure of the brightness of the array. Non-biological factors (amount and quality of the cRNA, amount of stain, or other experimental variation) can contribute to the overall variability in hybridization intensities. Scaling is a mathematical technique to minimize differences in overall signal intensities between two or more arrays. Within a particular set of experiments, the scaling factors of all arrays should be close to each other (within threefold of each other).
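The threefold rule above is easy to check across a batch. The sketch below is a minimal illustration, assuming the scaling factors have already been read off each array's report; the function name is hypothetical and the only rule it encodes is the threefold spread from the text.

```python
def scaling_factors_consistent(scaling_factors, max_fold=3.0):
    """Return True if every scaling factor lies within `max_fold` of every other.

    Equivalent to checking max(SF) / min(SF) <= max_fold.
    The function name and signature are hypothetical.
    """
    if len(scaling_factors) == 0:
        raise ValueError("need at least one scaling factor")
    if min(scaling_factors) <= 0:
        raise ValueError("scaling factors must be positive")
    return max(scaling_factors) / min(scaling_factors) <= max_fold


# Scaling factors from a hypothetical batch of four arrays.
batch = [1.8, 2.4, 3.1, 4.9]
print(scaling_factors_consistent(batch))  # 4.9 / 1.8 is about 2.7, so True
```

Comparing only the largest and smallest values is enough: if those two are within threefold, every other pair is too.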
Percent present: Microarray Suite 5.0 (MAS5) assigns each probe set a Present, Marginal, or Absent detection call. A gene is called present if its specific hybridization intensity is significantly above array background and noise levels. The percentage of probe sets called present, out of all probe sets on the array, is also used as a quality control measure. Although this value depends heavily on the specific experiment (that is, on how many genes you expect to be expressed), an extremely low value raises suspicion about the quality of an array. Duplicate arrays are also expected to have similar percent present values. For most expression chips, this percentage should be between 40 and 60. A low percent present may also indicate RNA degradation or incomplete cDNA synthesis.
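A small sketch of the percent present check follows, assuming the detection calls are available as a list of 'P'/'M'/'A' strings per array. The 40 to 60% range comes from the text; the 5-point tolerance for duplicate arrays is a hypothetical choice, since the text only says duplicates should be "similar".

```python
from collections import Counter


def percent_present(calls):
    """Percentage of probe sets with a Present ('P') detection call.

    `calls` is a sequence of MAS5-style detection calls: 'P', 'M', or 'A'.
    """
    if not calls:
        raise ValueError("no detection calls supplied")
    return 100.0 * Counter(calls)["P"] / len(calls)


def percent_present_flags(pp_values, low=40.0, high=60.0, max_diff=5.0):
    """Flag arrays outside the 40-60% range, and flag the set if duplicate
    arrays differ by more than `max_diff` points (hypothetical tolerance)."""
    out_of_range = [not (low <= pp <= high) for pp in pp_values]
    duplicates_disagree = max(pp_values) - min(pp_values) > max_diff
    return out_of_range, duplicates_disagree


rep1 = ["P", "P", "A", "M", "P", "A", "P", "A", "P", "A"]  # 5 of 10 present
rep2 = ["P", "A", "A", "M", "A", "A", "P", "A", "A", "A"]  # 2 of 10 present
pps = [percent_present(rep1), percent_present(rep2)]
print(pps, percent_present_flags(pps))  # [50.0, 20.0] ([False, True], True)
```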
Average background: Average background is examined to determine whether it is consistent across arrays. Affymetrix has indicated that typical background averages range from 20 to 100, but there is no statistically defined range within which these values must fall.
Hybridization controls: Four hybridization controls are added to each sample to monitor hybridization efficiency. These controls represent genes that should not be present in the sample mRNA, and they are added before hybridization in controlled, escalating quantities.
Internal controls (housekeeping genes): Two housekeeping genes (β-actin and GAPDH) are used to evaluate RNA and assay quality. Three probe sets are designed per control: the first measures the intensity of the 3' end of the gene, the second the 5' end, and the third the middle of the gene. The ratio of the 3' intensity to the 5' intensity should theoretically be about 1. According to Wilson et al. (2004), ratios above 1.25 for GAPDH and above 3 for β-actin should be considered outliers.
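The outlier rule above can be written as a per-array check. This is a sketch only: the input layout (a dictionary of 3'/5' signal pairs per control gene) is a hypothetical convenience, not an Affymetrix report format, and the thresholds are the Wilson et al. (2004) values quoted in the text.

```python
# Outlier thresholds for the 3'/5' ratio quoted above (Wilson et al., 2004).
RATIO_THRESHOLDS = {"GAPDH": 1.25, "beta-actin": 3.0}


def ratio_outliers(signals, thresholds=RATIO_THRESHOLDS):
    """Return {gene: 3'/5' ratio} for control genes whose ratio exceeds its threshold.

    `signals` maps a control gene name to a (3' signal, 5' signal) pair;
    this input layout is hypothetical.
    """
    outliers = {}
    for gene, (signal_3p, signal_5p) in signals.items():
        if signal_5p <= 0:
            raise ValueError(f"5' signal for {gene} must be positive")
        ratio = signal_3p / signal_5p
        if ratio > thresholds[gene]:
            outliers[gene] = ratio
    return outliers


array_signals = {"GAPDH": (2600.0, 2000.0), "beta-actin": (3000.0, 1500.0)}
print(ratio_outliers(array_signals))  # {'GAPDH': 1.3}
```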
Step 2: Data Preparation. All data manipulations were performed in R (http://www.r-project.org/). 1. Normalization: gcRMA package of BioConductor (Wu et al., 2004). Genes were then filtered using two criteria: (1) detectable expression under at least one condition, defined as a MAS5 (Affymetrix) normalized expression value larger than 100 units; (2) variable expression across conditions (p value < 0.10 from ANOVA modeling). 2. Quality test for each of the 24 conditions studied (8 treatments, 3 time points each), using least-squares regression (R²) scatter plots. 11,603 genes out of 22,400 present on the microarray were used for further analysis.
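The two filtering criteria and the replicate R² check can be sketched with NumPy and SciPy. This is a hedged illustration, not the authors' R code. It assumes that criterion 1 compares each condition's mean MAS5 signal with 100, and that the ANOVA is run on log-scale gcRMA values; the text does not say exactly how either step was applied. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import f_oneway


def replicate_r2(rep1, rep2):
    """R^2 of the least-squares fit between two replicate arrays (squared Pearson r)."""
    r = np.corrcoef(rep1, rep2)[0, 1]
    return r ** 2


def filter_genes(mas5, log_expr, conditions, min_signal=100.0, alpha=0.10):
    """Boolean mask of genes kept by the two filtering criteria.

    mas5       : genes x arrays, MAS5 signal values
    log_expr   : genes x arrays, log-scale (e.g. gcRMA) expression values
    conditions : condition label for each array (column)
    """
    conditions = np.asarray(conditions)
    labels = np.unique(conditions)
    # Criterion 1 (assumed reading): mean MAS5 signal > min_signal in at least one condition.
    cond_means = np.column_stack(
        [mas5[:, conditions == c].mean(axis=1) for c in labels])
    expressed = (cond_means > min_signal).any(axis=1)
    # Criterion 2: one-way ANOVA across conditions, run gene by gene (axis=1).
    groups = [log_expr[:, conditions == c] for c in labels]
    _, pvals = f_oneway(*groups, axis=1)
    return expressed & (pvals < alpha)


mas5 = np.array([[150.0, 160.0, 900.0, 950.0, 140.0, 150.0],
                 [20.0, 25.0, 30.0, 28.0, 22.0, 24.0]])
conditions = ["mock", "mock", "IAA", "IAA", "BL", "BL"]
print(filter_genes(mas5, np.log2(mas5), conditions))  # [ True False]
```

The `axis` argument of `scipy.stats.f_oneway` requires a reasonably recent SciPy (1.7 or later).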
Step 3: High-Sensitivity Analysis. 1. Identify genes whose expression was affected, using a low-stringency/high-sensitivity moderated linear model (p < 0.01; Smyth, 2004) from the BioConductor package Limma. Genes were classified as upregulated (higher signal under treated conditions for at least one time point), downregulated (lower signal under treated conditions for at least one time point), or complex (upregulation at some time points and downregulation at others). 2. Gene annotation using Gene Ontology and the A. thaliana genome annotation from TAIR (implemented in R as the annotation package ath1121501).
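The up/down/complex classification can be expressed as a short rule over a gene's per-time-point results. The sketch below assumes the moderated-model output for one gene is available as log fold changes (treated minus mock) and p values per time point; the function name is hypothetical.

```python
def classify_response(logfc, pvals, alpha=0.01):
    """Classify one gene from its per-time-point results.

    logfc : log fold changes (treated minus mock), one per time point
    pvals : matching p values from the moderated linear model
    Returns 'up', 'down', 'complex', or None if no time point is significant.
    """
    significant = [(fc, p) for fc, p in zip(logfc, pvals) if p < alpha]
    up = any(fc > 0 for fc, _ in significant)
    down = any(fc < 0 for fc, _ in significant)
    if up and down:
        return "complex"
    if up:
        return "up"
    if down:
        return "down"
    return None


# Hypothetical genes measured at three time points.
print(classify_response([0.2, 1.5, 2.0], [0.30, 0.004, 0.001]))   # up
print(classify_response([-1.0, 0.9, 0.1], [0.001, 0.005, 0.50]))  # complex
```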
Chi-square test of the significance of overlap between pairs of hormone treatments (results from the high-sensitivity analysis)
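One way to test whether two hormones share more targets than expected by chance is a chi-square test on a 2 x 2 table (in A or not, in B or not) over all analysed genes. The sketch below assumes that layout; the exact table construction used in the paper is not given here, and the gene sets are hypothetical.

```python
from scipy.stats import chi2_contingency


def overlap_chi2(targets_a, targets_b, background):
    """Chi-square test of whether two hormone target sets overlap more than chance.

    Builds the 2 x 2 table (in A / not in A) x (in B / not in B) over `background`.
    Returns (observed overlap, expected overlap, p value).
    """
    a, b, u = set(targets_a), set(targets_b), set(background)
    if not (a <= u and b <= u):
        raise ValueError("target sets must be subsets of the background")
    table = [[len(a & b), len(a - b)],
             [len(b - a), len(u - (a | b))]]
    # For 2 x 2 tables SciPy applies Yates' continuity correction by default.
    chi2, p, dof, expected = chi2_contingency(table)
    return len(a & b), float(expected[0][0]), p


genes = [f"g{i}" for i in range(1000)]  # hypothetical background of 1,000 genes
iaa_targets = genes[:100]               # 100 hypothetical IAA targets
bl_targets = genes[50:200]              # 150 hypothetical BL targets
observed, expected, p = overlap_chi2(iaa_targets, bl_targets, genes)
print(observed, expected, p)  # observed overlap 50 vs. about 15 expected by chance
```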
Table 2. Overlapping Transcriptional Targets of Seven Hormones
Table 1. Known Hormone-Responsive Genes Identified by High-Sensitivity Analysis
Figure 2. Survey of Gene Ontology Categories for Each Hormone Table S8. Counts of selected Gene Ontology (GO) terms among genes identified as responsive to each hormone. Expected (E) counts and observed (O) counts are indicated. Over- (over) or under-representation (under) were assessed using the binomial exact test. Significant differences are shown in red (p<0.05). Note that if expected counts are less than 5, the test is not reliable.
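The binomial exact test described in the Table S8 legend can be sketched for a single GO term. This assumes the expected count is the number of responsive genes times the term's frequency among all analysed genes; the legend does not state how E was computed, and the function name and numbers are hypothetical. `scipy.stats.binomtest` needs SciPy 1.7 or later.

```python
from scipy.stats import binomtest


def go_term_test(observed, n_responsive, term_size, n_background):
    """Binomial exact test for one GO term among hormone-responsive genes.

    Assumes expected = n_responsive * (term_size / n_background).
    Returns (expected count, direction, p value, reliable).
    """
    p0 = term_size / n_background
    expected = n_responsive * p0
    pvalue = binomtest(observed, n_responsive, p0, alternative="two-sided").pvalue
    if observed > expected:
        direction = "over"
    elif observed < expected:
        direction = "under"
    else:
        direction = "as expected"
    reliable = expected >= 5  # per the legend: the test is unreliable if E < 5
    return expected, direction, pvalue, reliable


# Hypothetical: 30 of 500 responsive genes carry a term held by 400 of 11,603 genes.
print(go_term_test(30, 500, 400, 11603))
```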
Figure 3 Distribution of Genes Regulated by One or More Hormones
Stringent Analysis: Robust differentially expressed genes were identified using two statistical methods: 1. a moderated linear model (Smyth, 2004; BioConductor package Limma); 2. rank product (RP; Breitling et al., 2004; R package RankProd). A false discovery rate (FDR; Benjamini and Hochberg, 1995) multiple-hypothesis-testing correction was applied to each p value, and significance was defined as an FDR-adjusted p value < 0.05.
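The two building blocks named above can be sketched in NumPy. The Benjamini-Hochberg adjustment below follows the standard step-up procedure. The rank product function computes only the RP statistic for up-regulation (geometric mean of within-comparison ranks); RankProd converts RP into p values by permutation, which is not reproduced here. Both function names are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg FDR-adjusted p values (same values as R's p.adjust(p, "BH"))."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Step-up: each adjusted value is the minimum over itself and all larger p values.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def rank_product(logfc):
    """Rank product statistic for up-regulation.

    logfc: genes x replicate comparisons. Within each comparison, rank 1 is the
    largest fold change; RP is the geometric mean of a gene's ranks, so small
    values mean consistently high ranks.
    """
    logfc = np.asarray(logfc, dtype=float)
    ranks = np.column_stack([rankdata(-logfc[:, j]) for j in range(logfc.shape[1])])
    return np.exp(np.log(ranks).mean(axis=1))


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
print(rank_product([[2.0, 1.8], [0.1, 0.2], [-1.0, -0.5]]))
```

With the FDR < 0.05 cut-off from the text, only the first of the four example p values would be called significant.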
Results from Stringent Analysis • Figure 5. Hormone-Specific Marker Genes. Marker gene criteria: (1) identified by both RP and Limma (FDR p value < 0.05) for a particular hormone; (2) not identified for any other hormone using either method; (3) not identified for any other hormone using the criteria of our sensitive analysis (uncorrected p value < 0.01).
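The three marker criteria reduce to set operations. In this sketch, criterion 3 is read as applying to the other hormones, and the per-hormone gene sets (and the function name) are hypothetical inputs standing in for the RP, Limma, and sensitive-analysis results.

```python
def marker_genes(hormone, rp_hits, limma_hits, sensitive_hits):
    """Hormone-specific markers under the three criteria listed above.

    rp_hits, limma_hits : {hormone: set of genes with FDR p < 0.05} per method
    sensitive_hits      : {hormone: set of genes with uncorrected p < 0.01}
    Criterion 3 is read here as applying to the other hormones.
    """
    markers = rp_hits[hormone] & limma_hits[hormone]            # criterion 1
    for other in rp_hits:
        if other == hormone:
            continue
        markers -= rp_hits[other] | limma_hits.get(other, set())  # criterion 2
        markers -= sensitive_hits.get(other, set())              # criterion 3
    return markers


rp = {"IAA": {"g1", "g2", "g3"}, "BL": {"g4"}}
limma = {"IAA": {"g1", "g2", "g3"}, "BL": {"g2", "g4"}}
sensitive = {"IAA": {"g1", "g2", "g3", "g5"}, "BL": {"g3", "g4"}}
print(marker_genes("IAA", rp, limma, sensitive))  # {'g1'}
```

Here g2 is dropped by criterion 2 (Limma also calls it for BL) and g3 by criterion 3 (it passes the sensitive cut-off for BL).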
Conclusion and discussion 1. A high-sensitivity analysis revealed a surprisingly low number of common target genes. 2. Different hormones appear to regulate distinct members of protein families. The paper concludes that there is no core transcriptional growth-regulatory module in young Arabidopsis seedlings. (Are changes in gene regulation more likely to be retained after gene duplication?) • 1. The experiments are limited to young Arabidopsis seedlings and to time points of up to 3 hours. • 2. I think the main reason for the low number of common target genes is antagonism between different pathways. In a network, the more information a node integrates, the more stable it is.
Merits of this paper 1. It examines the overlap in transcriptional effects among the various hormones at a global level. 2. It identifies robust marker genes specific to each hormone. These studies reveal that a major part of the early hormone response in plants is specific and independent of the effects of other hormones. 3. The authors also used mutant data to rule out the possibility that some hormone responses may be near saturation in wild-type plants.
gcrma package • The gcrma package is part of the Bioconductor project. gcrma adjusts for background intensities in Affymetrix array data, which include optical noise and non-specific binding (NSB). The main function, gcrma, converts background-adjusted probe intensities to expression measures using the same normalization and summarization methods as RMA (Robust Multiarray Average).
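The normalization step that gcrma shares with RMA is quantile normalization. The sketch below shows only that step on a toy probes x arrays matrix; it does not reproduce gcrma's GC-content-based background adjustment or the median-polish summarization, and ties are broken by sort order rather than averaged (a simplification).

```python
import numpy as np


def quantile_normalize(x):
    """Quantile-normalize a probes x arrays matrix.

    Each array's values are replaced by the mean, across arrays, of the values
    at the same rank, so every column ends up with the same distribution.
    """
    x = np.asarray(x, dtype=float)
    reference = np.sort(x, axis=0).mean(axis=1)  # mean of each quantile
    out = np.empty_like(x)
    order = np.argsort(x, axis=0)
    for j in range(x.shape[1]):
        out[order[:, j], j] = reference
    return out


raw = np.array([[5.0, 4.0, 3.0],
                [2.0, 1.0, 4.0],
                [3.0, 4.0, 6.0],
                [4.0, 2.0, 8.0]])
print(quantile_normalize(raw))
```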
Scaling factor: The scaling factor provides a measure of the brightness of the array. The “brightness” (image intensity) varies from array to array. Non-biological factors (amount and quality of the cRNA, amount of stain, or other experimental variation) can contribute to the overall variability in hybridization intensities. To reliably compare data from multiple arrays, the intensities of the arrays must be brought to the same level. Scaling is a mathematical technique used by the Microarray Suite software (MAS) to minimize differences in overall signal intensities between two or more arrays, allowing more reliable detection of biologically relevant changes. MAS calculates the overall intensity of an array by averaging the intensity values of every probe set on the array, excluding the top and bottom 2% of probe set intensities. This average intensity is then multiplied by the scaling factor to bring it to an arbitrary target intensity value (usually 1500) set by the user. Scaling thus normalizes a number of experiments to one target intensity, allowing comparison between any two experiments. Within a particular set of experiments, the scaling factors of all arrays should be close to each other (within threefold of each other).
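The computation described above (a 2% trimmed mean at each end, then a factor that maps it onto the target intensity) can be sketched with `scipy.stats.trim_mean`. This is an illustration of the described arithmetic, not the MAS implementation; the function name and the toy signals are hypothetical.

```python
import numpy as np
from scipy.stats import trim_mean


def mas_scale(signals, target=1500.0, trim=0.02):
    """Scale one array's probe set signals to a common target intensity.

    The overall intensity is the mean after dropping the top and bottom `trim`
    fraction of probe sets; the scaling factor maps it onto `target`.
    Returns (scaling factor, scaled signals).
    """
    signals = np.asarray(signals, dtype=float)
    overall = trim_mean(signals, trim)
    sf = target / overall
    return sf, signals * sf


sf, scaled = mas_scale(np.arange(1, 101))  # 100 hypothetical probe set signals
print(round(sf, 2))                        # 1500 / 50.5
```

With 100 signals, 2% trimming drops two values at each end, so the trimmed mean is that of 3 to 98, which is 50.5.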
Called present • Essentially, a gene is called present if its specific hybridization intensity is significantly above array background and noise levels.
One-way ANOVA • Typical application: testing for equality of the means of several univariate samples. Assumptions: normal distribution, similar variances, and similar sample sizes. Data needed: two or more columns of measured or counted data. One-way ANOVA (analysis of variance) is a statistical procedure for testing the null hypothesis that several univariate samples (in columns) have the same mean. The samples are assumed to be close to normally distributed and to have similar variances. If the sample sizes are equal, these two assumptions are not critical. • See Brown & Rothery (1993) or Davis (1986) for details. • Levene's test for homogeneity of variance (homoskedasticity), that is, whether variances are equal as assumed by ANOVA, is also given. If Levene's test is significant, meaning that the variances are unequal, you can use the unequal-variance (Welch) version of ANOVA, with the F, df, and p values given. • If the ANOVA shows significant inequality of the means (small p), you can go on to study the given table of "post-hoc" pairwise comparisons, based on Tukey's HSD test. The Studentized Range Statistic Q is given in the lower left triangle of the array, and the probabilities p(equal) in the upper right. Sample sizes do not have to be equal for the version of Tukey's test used.
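The same workflow (Levene's test, classic one-way ANOVA, then Tukey's HSD when the ANOVA is significant) can be sketched with SciPy. This is an illustration with hypothetical data, not the software the slide describes. Note that SciPy's `levene` centers on the median by default (the Brown-Forsythe variant), `tukey_hsd` requires SciPy 1.8 or later, and the Welch version of ANOVA is not shown.

```python
from scipy.stats import f_oneway, levene, tukey_hsd


def one_way_summary(*groups, alpha=0.05):
    """Levene's test, one-way ANOVA, and (if significant) Tukey's HSD p values."""
    result = {
        "levene_p": levene(*groups).pvalue,
        "anova_p": f_oneway(*groups).pvalue,
        "tukey_p": None,
    }
    if result["anova_p"] < alpha:
        # Matrix of pairwise p values; entry [i, j] compares group i with group j.
        result["tukey_p"] = tukey_hsd(*groups).pvalue
    return result


mock = [4.1, 4.3, 4.0, 4.2]
treated = [5.0, 5.2, 5.1, 4.9]
other = [4.2, 4.0, 4.1, 4.3]
summary = one_way_summary(mock, treated, other)
print(summary["anova_p"], summary["tukey_p"])
```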
Spiked controls • Presence of spiked control cRNAs: BioB, BioC, BioD, and CRE serve as hybridization controls and are spiked at the following concentrations: BioB 1.5 pM, BioC 5.0 pM, BioD 25.0 pM, CRE 100 pM. We specifically look at the average difference values, which should increase with concentration, BioB being the lowest and CRE the highest.
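The expected ordering of the spike-in signals can be checked per array. In this sketch the concentrations are those listed above, while the dictionary input layout and the function name are hypothetical.

```python
# Spike-in concentrations (pM) listed above.
SPIKE_CONC_PM = {"BioB": 1.5, "BioC": 5.0, "BioD": 25.0, "CRE": 100.0}


def spikes_increase(signals):
    """True if control signals rise strictly with spike-in concentration.

    `signals` maps each control name to its measured value on one array
    (the dictionary layout is hypothetical).
    """
    ordered = sorted(SPIKE_CONC_PM, key=SPIKE_CONC_PM.get)
    values = [signals[name] for name in ordered]
    return all(low < high for low, high in zip(values, values[1:]))


print(spikes_increase({"BioB": 80, "BioC": 260, "BioD": 1300, "CRE": 5200}))  # True
print(spikes_increase({"BioB": 80, "BioC": 60, "BioD": 1300, "CRE": 5200}))   # False
```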
3'/5' ratio of housekeeping genes: This is a measure of the efficiency of the cDNA synthesis reaction. Reverse transcriptase synthesizes cDNA starting from the 3' end of an mRNA and ending at the 5' end. All Affymetrix arrays contain probes for the 3', middle, and 5' regions of housekeeping genes such as GAPDH and actin. The ratio of signal intensity from the 3' probes to that from the 5' probes indicates how many cDNA synthesis reactions went to completion (full-length cDNA synthesized). An ideal ratio is 1, whereas a higher value indicates that many cDNAs were started but not completed. The 3'/5' ratio for the housekeeping genes should be at most 3; if it is above 3, some sensitivity of the assay may be lost.
Partial RNA degradation during cell lysis could cause a variable extent of bias in the quantification of different transcripts. • In the software of the commonly used system (Bioanalyzer 2100, Agilent), quantification of 18S and 28S rRNA is compromised because the calculation is based on area measurements that depend heavily on how the start and end points of peaks are defined (Fig. 1a).
Question addressed: whether different hormones converge on a common set of transcriptional targets in Arabidopsis seedlings. Genetic screens for hormone insensitivity have provided evidence for nonredundant roles of hormones. For example, despite the apparent overlap in the growth-promotion activities of GA, auxin, and BRs, loss of any individual pathway results in severe dwarfism.
Potential Problems with the t-Tests When There Are Few Degrees of Freedom per Gene • Variance estimates based on few degrees of freedom can be unreliable. • This can be particularly problematic if our model for the data is not quite right. • Variances that are severely underestimated can lead to false positives while variances that are severely overestimated can lead to a loss of power for detecting DE genes.
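Hierarchical Modeling/Empirical Bayes Methods: Suppose the data for gene g follow a distribution f(y | θ_g), where f is a known type of distribution that depends on an unknown vector of parameters θ_g, and suppose the θ_g are in turn drawn from a distribution π(θ | φ) that depends on an unknown vector of hyperparameters φ that can be estimated from the data.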
Evaluation of the Moderated t-Statistic • Smyth (2004) simulated data only according to the proposed hierarchical model, with complete independence across all genes. • The performance of the moderated t-statistic was shown to be superior to the simple two-sample t-statistic and other approaches with respect to ranking genes for differential expression. • The validity of the p-values computed from moderated t-statistics was not examined.
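The moderated t-statistic of Smyth (2004) replaces each gene's residual variance s² with a posterior value shrunk toward a common prior s0², with d0 prior degrees of freedom, and refers the result to a t distribution with d0 + d degrees of freedom. The sketch below takes d0 and s0² as given; limma's eBayes estimates them from all genes, which is not reproduced here. The two example genes are hypothetical. They show the behaviour that addresses the few-degrees-of-freedom problem: a severely underestimated variance is pulled up (smaller t), and an overestimated one is pulled down (larger t).

```python
import numpy as np
from scipy.stats import t as t_dist


def moderated_t(beta, s2, d, v, d0, s0_sq):
    """Moderated t-statistics in the form given by Smyth (2004).

    beta      : estimated contrast per gene
    s2        : residual variance per gene, on d residual degrees of freedom
    v         : unscaled variance factor of the contrast (e.g. 1/n1 + 1/n2)
    d0, s0_sq : prior degrees of freedom and prior variance (given, not estimated)
    """
    beta = np.asarray(beta, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    # Posterior variance: shrink each gene's s2 toward the common prior value.
    s2_post = (d0 * s0_sq + d * s2) / (d0 + d)
    t_mod = beta / np.sqrt(s2_post * v)
    p = 2 * t_dist.sf(np.abs(t_mod), d0 + d)
    return t_mod, p


# Two hypothetical genes with the same effect; 4 arrays per group (v = 0.5, d = 6).
t_mod, p = moderated_t(beta=[1.0, 1.0], s2=[0.01, 1.0], d=6, v=0.5, d0=4, s0_sq=0.25)
t_ord = np.array([1.0, 1.0]) / np.sqrt(np.array([0.01, 1.0]) * 0.5)
print(t_ord, t_mod)  # first t shrinks, second grows
```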