Study of the Transcriptome of the Prematurely Aging Yeast Mutant dna2-1 Using a New Method Allowing Comparative DNA Microarray Analysis Isabelle Lesur Thesis defense – 04/25/05
1/48 Introduction
[Diagram: young S. cerevisiae cells become old S. cerevisiae cells through aging.]
Transcriptome study; microarray data; development of a large-scale automated comparison method.
2/48 Content
I - Experimental study of the causes of aging in S. cerevisiae
* Aging in yeast
* The dna2-1 prematurely aging mutant
* Experimental approach
* Experimental results
II - Comparative analysis of DNA microarray experiments
* Motivation
* A weighted ontology for microarray experiments
* Validation
* Processing of the compared datasets
III - Conclusion and perspectives
3/48 Aging in yeast
Asymmetric division. Size increases with aging.
[Figure: cell lineage with generation labels g1, g2, g3, g4, ..., gn, ending in death.]
4/48 The dna2-1 prematurely aging mutant
[Figure: survival curves; y-axis: percent survival (0 to 100); x-axis: generations (1 to 41).]
Median life spans (generations): dna2-1 8.1 +/- 2.5; DNA2+ 30.4 +/- 5.4.
(Hoopes L. L. M. and J. L. Campbell. 2002. MBC)
5/48 Experimental procedure
[Diagram labels: Yeast A and Yeast B; total RNA extraction; RT-PCR of the mRNA; ORFs deposit; hybridization; reading; data extraction.]
6/48 Experimental approach
1- Control (young cells vs. young cells)
[Two MA plots after Lowess regression normalization; y-axis: M = log2(G/R); x-axis: A = log2(sqrt(R*G)).]
- Wild-type strain: 94.39% of genes within two-fold variation, M ∈ [-1, 1]
- dna2-1 mutant strain: 94.36% of genes within two-fold variation, M ∈ [-1, 1]
Selection of genes with variation in expression level: |M| > 1
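The selection rule on this slide can be sketched as follows. This is only an illustration: the intensity values, gene names and function names are hypothetical, and the normalization step is not reproduced. The code computes M and A for each spot and keeps the genes with |M| > 1, i.e. more than a two-fold change.

```python
import math

def ma_values(red, green):
    """Return (M, A) for one spot: M = log2(G/R), A = log2(sqrt(R*G))."""
    m = math.log2(green / red)
    a = math.log2(math.sqrt(red * green))
    return m, a

def select_varying(spots, threshold=1.0):
    """Return the names of genes whose |M| exceeds the threshold, in input order."""
    selected = []
    for gene, red, green in spots:
        m, _ = ma_values(red, green)
        if abs(m) > threshold:
            selected.append(gene)
    return selected

# Hypothetical (gene, R, G) intensities; not data from the thesis.
spots = [
    ("geneA", 100.0, 400.0),  # M = 2: four-fold higher in G
    ("geneB", 200.0, 210.0),  # M close to 0: unchanged
    ("geneC", 800.0, 100.0),  # M = -3: eight-fold lower in G
]
print(select_varying(spots))
```

With these made-up values, geneA and geneC pass the |M| > 1 filter and geneB does not.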
7/48 2- Wild-type experiments (young cells vs. old cells)
[Two MA plots after Lowess regression normalization; y-axis: M = log2(G/R); x-axis: A = log2(sqrt(R*G)).]
- 627 genes upregulated in old wild-type cells (10.22% of the genome)
- 387 genes downregulated in old wild-type cells (6.30% of the genome)
8/48 3- dna2-1 experiments (young cells vs. old cells)
[Two MA plots after Lowess regression normalization; y-axis: M = log2(G/R); x-axis: A = log2(sqrt(R*G)).]
- 898 genes upregulated in old dna2-1 cells (14.63% of the genome)
- 656 genes downregulated in old dna2-1 cells (10.69% of the genome)
10/48 Glucose metabolism and energy production
[Figure: bar chart, WT vs. dna2-1; y-axis: ratio of expression level (old cells / young cells); x-axis: gene name; gene groups: lipid metabolism, TCA cycle, oxidative phosphorylation, Mig1-repressed genes, glycogen production, glyoxylate cycle, gluconeogenesis.]
Shift from glycolysis toward gluconeogenesis and energy production associated with aging.
Compared with Lin S. S., J. K. Manchester, and J. I. Gordon. 2001. J. Biol. Chem.
11/48 The Environmental Stress Response (ESR)
[Venn diagrams; the region counts below are assigned so that they add up to the set totals shown on the slide.]
- dna2-1 (1553 genes) vs. ESR (868 genes): 406 genes in common, 1147 dna2-1 only, 462 ESR only
- Wild-type (1014 genes), dna2-1 (1553 genes) and ESR (868 genes): 522 wild-type only, 850 dna2-1 only, 352 ESR only, 297 wild-type and dna2-1 only, 110 wild-type and ESR only, 321 dna2-1 and ESR only, 85 in all three
Old cells react as if they were growing under external stress conditions.
Compared with Gasch A. P., P. T. Spellman, C. M. Kao, O. Carmel-Harel, M. B. Eisen, G. Storz, D. Botstein, and P. O. Brown. 2000. Mol. Biol. Cell.
12/48 The Telomerase Deletion Response (TDR)
[Venn diagram: dna2-1 (1553 genes) vs. tlc1Δ (652 genes): 340 genes common to tlc1Δ and old dna2-1 cells, 1213 dna2-1 only, 312 tlc1Δ only.]
Similarity between expression profiles associated with aging of dna2-1 and the TDR.
Compared with: Nautiyal S., J. L. DeRisi, and E. H. Blackburn. 2002. PNAS USA; Teng S. C., C. Epstein, Y. L. Tsai, H. W. Cheng, H. L. Chen, and J. J. Lin. 2002. Biochem Biophys Res Commun.
13/48 Genes belonging to the TDR and upregulated in old dna2-1 cells
[Figure: bar chart, WT vs. dna2-1; y-axis: ratio of expression level (old cells / young cells); x-axis: gene name; gene groups: DNA synthesis, telomere deletion "signature" genes, DNA-damage signature genes, carbohydrate metabolism, oxidative phosphorylation, ESR, unknown, others.]
Compared with: Nautiyal S., J. L. DeRisi, and E. H. Blackburn. 2002. PNAS USA; Teng S. C., C. Epstein, Y. L. Tsai, H. W. Cheng, H. L. Chen, and J. J. Lin. 2002. Biochem Biophys Res Commun.
14/48 Genes belonging to the TDR and downregulated in old dna2-1 cells
[Figure: bar chart, WT vs. dna2-1; y-axis: ratio of expression level (old cells / young cells); x-axis: gene name; gene groups: ribosomal genes, histones.]
Compared with: Nautiyal S., J. L. DeRisi, and E. H. Blackburn. 2002. PNAS USA; Teng S. C., C. Epstein, Y. L. Tsai, H. W. Cheng, H. L. Chen, and J. J. Lin. 2002. Biochem Biophys Res Commun.
15/48 DNA damage repair
[Figure: bar chart, WT vs. dna2-1; y-axis: ratio of expression level (old cells / young cells); x-axis: gene name; gene groups: DNA-damage "signature" genes, DNA-damage checkpoint pathway, post-replication repair pathway, recombinational repair pathway, mismatch DNA repair, NER pathway.]
Activation of numerous genes repairing DNA.
16/48 Result summary
[Diagram; arrows between boxes A and B are labeled either "causes" or "is part of".]
Boxes: Metabolic response; Energy; ESR; TDR (similarity to telomerase deletion mutant); DNA-damage response; DNA repair; DNA-damage signature genes; Activation of the RAD52 pathway (specific WT).
Other labels: Caloric restriction; DNA damaging agents; HEO; Lifespan increases.
Lesur, I. and J. L. Campbell. 2004. MBC
18/48 The need of the biologist
• Biological study:
  • characterization of aging in yeast
  • identification of a specific need of the biologist: large-scale automatic comparison of microarray experiments
3 steps required:
• Establishment of criteria of comparability
• Development of large-scale practical methods for comparison of real experiments
  • requires data in a format that can be processed automatically (MIAME)
  • requires a tool to structure information (ontology)
• Integration of these methods in a test platform (not discussed here)
19/48 Selection criteria in databanks
Objective: reproduce the decision-making process of the biologist.
[Diagram labels: microarray experiments; filtering, normalization, clustering, annotation, visualization; analysis; interpretation of microarray data. Second path: comparison of microarray experiments; analysis; better interpretation of microarray data.]
20/48 Formalization of the comparison process
[Diagram: Microarray Experiment 1 and Microarray Experiment 2, each described by biological attributes and statistical attributes, are linked by a comparison that is either quantitative or qualitative. Other labels: comparison of values; statistical conversions of numerical data.]
21/48 The MIAME standard
Minimum Information About a Microarray Experiment
Specification of the minimum amount of information needed to fully describe a microarray experiment, interpret it and verify the results.
Objective: guide the development of microarray databases and data management software.
Information is given with maximum use of controlled vocabularies, which are needed to enable database queries and automated data analysis.
Brazma et al. 2001. Nature Genetics.
22/48 An ontology for microarray experiments
Use of a controlled vocabulary to describe microarray experiments. This controlled vocabulary is structured as a tree:
- vertices = classes
- edges = "is-a" relation between each pair of classes
[Figure: example tree with classes A to J, showing the least upper bound (l.u.b.) of <E,J> and the distance <E,J>, used for comparison or conversion.]
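To make the two tree notions on this slide concrete, here is a minimal sketch of a least upper bound and a path distance in an "is-a" tree. The edges in `PARENT` are hypothetical: the slide's actual tree cannot be recovered from the transcript, so only the class letters are reused, and the function names are illustrative.

```python
# Hypothetical is-a tree stored as child -> parent; "A" is the root.
PARENT = {
    "B": "A", "C": "A",
    "D": "B", "E": "B",
    "F": "C", "G": "C",
    "H": "F", "I": "F",
    "J": "G",
}

def ancestors(node, parent):
    """Return the path from node up to the root, starting with node itself."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def least_upper_bound(x, y, parent):
    """Return the deepest class that is an ancestor of (or equal to) both x and y."""
    above_y = set(ancestors(y, parent))
    for node in ancestors(x, parent):
        if node in above_y:
            return node
    raise ValueError("classes are not in the same tree")

def distance(x, y, parent):
    """Return the number of edges on the path from x to y through their l.u.b."""
    lub = least_upper_bound(x, y, parent)
    return ancestors(x, parent).index(lub) + ancestors(y, parent).index(lub)

print(least_upper_bound("E", "J", PARENT), distance("E", "J", PARENT))
```

In this made-up tree, E and J only meet at the root A, five edges apart, while siblings such as H and I meet at their parent F, two edges apart.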
23/48 MGED ontology for microarray experiments (MGED: Microarray Gene Expression Data)
Microarray experiment:
- Measurements: images, quantification, specifications, …
- Experimental design: author, experimental dimensions, …
- Array design: array description, quality controls, …
- Samples: extraction, preparation, labeling, …
- Hybridization: procedures, parameters, …
- Normalization: types, values, algorithm
http://www.mged.sourceforge.net/ontologies/MGEDontology.php
24/48 Our ontology for microarray experiments (69 classes)
Microarray experiment:
- Biological attributes: author, dimensions, hybridization, measurements, type of experiment, sample, scanner, protocol, multiple, reversed, …
  - sample: organism, extraction, isolation, amplification, labeling, treatment, nucleic acid extracted, …
  - type of experiment: dose-response, time-course, interaction detection, …
- Statistical attributes:
  - filtering: upper threshold, lower threshold
  - normalization: global normalization, intensity-dependent normalization
Other labels: comparison of values; minimal cost conversion.
25/48 Cost model for dataset comparison
Most arbitrary pairs of classes cannot reasonably be compared.
- Each edge between two classes is associated with an edge cost in [0,1]: cost 0 = 100% penalty, cost 1 = 0% penalty.
- Each path between two classes of the ontology (i.e. two values of an attribute) is unique (we consider the shortest path) and is associated with a local cost.
- The costs along a path are multiplicative, which gives symmetry and transitivity:
  - cost(A, B) = cost(B, A)
  - cost(A, C) = cost(A, B) * cost(B, C)
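The multiplicative rule can be illustrated with a short sketch. The tree, the edge costs and the function names below are assumptions made up for the example; the thesis's actual edge costs are not given in the transcript. The local cost between two classes is the product of the edge costs along the path joining them through their common ancestor.

```python
# Hypothetical is-a tree (child -> parent) and hypothetical edge costs in [0, 1].
PARENT = {"B": "A", "C": "A", "D": "B", "E": "B"}
EDGE_COST = {("B", "A"): 0.5, ("C", "A"): 0.0, ("D", "B"): 1.0, ("E", "B"): 0.9}

def path_to_root(node):
    """Return the list of classes from node up to the root."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def local_cost(x, y):
    """Multiply the edge costs on the path between x and y (1.0 when x == y)."""
    up_x, up_y = path_to_root(x), path_to_root(y)
    common = next(n for n in up_x if n in up_y)  # least upper bound of x and y
    cost = 1.0
    for path in (up_x, up_y):
        for child, parent in zip(path, path[1:]):
            if child == common:
                break  # reached the common ancestor; edges above it are not on the path
            cost *= EDGE_COST[(child, parent)]
    return cost

print(local_cost("D", "E"), local_cost("E", "D"))  # symmetric
print(local_cost("D", "A"), local_cost("D", "B") * local_cost("B", "A"))  # transitive
print(local_cost("D", "C"))  # a zero-cost edge on the path forbids the comparison
```

Because a single edge of cost 0 drives the whole product to 0, one incompatible step on the path is enough to make two classes non-comparable.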
26/48 Relevant attributes
1- Only relevant attributes describing a microarray experiment are taken into consideration to decide whether or not to compare two experiments.
2- The relevant attributes do not all have the same influence on the decision-making process.

Relevant attribute          Variable Ck   Weight Wk
Organism                    CO            0.3
Type of experiment          CT            0.2
Nucleic acid extracted      CN            0.2
Isolation                   CI            0.1
Type of label               CL            0.1
Filtering lower threshold   CFmin         0.05
Filtering upper threshold   CFmax         0.05
ΣWk = 1
27/48 Global comparison cost
The global comparison cost C associated with the comparison of two microarray experiments is computed as follows:
C = Σ_{k ∈ {T, L, N, I, O, Fmin, Fmax}} Wk · Ck, if all Ck ≠ 0
C = 0, otherwise
with 0 ≤ C ≤ 1:
- C = 0: no comparison between the two experiments is allowed
- 0 < C < 1: a qualitative comparison between the two experiments is allowed
- C = 1: a quantitative comparison between the two experiments is allowed
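A minimal sketch of this rule, using the weights from the relevant-attribute slide (26/48). The function names, the dictionary layout and the floating-point tolerance are assumptions for illustration, and the local costs in the example are made up.

```python
# Weights Wk from slide 26/48, keyed by the attribute index k.
WEIGHTS = {"O": 0.3, "T": 0.2, "N": 0.2, "I": 0.1, "L": 0.1, "Fmin": 0.05, "Fmax": 0.05}

def global_cost(local_costs):
    """Return sum of Wk * Ck, or 0 as soon as any local cost Ck is 0."""
    if any(local_costs[k] == 0 for k in WEIGHTS):
        return 0.0
    return sum(WEIGHTS[k] * local_costs[k] for k in WEIGHTS)

def comparison_kind(cost, tol=1e-9):
    """Map a global cost to the kind of comparison allowed (tol absorbs float rounding)."""
    if cost == 0:
        return "none"
    if abs(cost - 1.0) <= tol:
        return "quantitative"
    return "qualitative"

# Hypothetical local costs: identical experiments except for a related organism.
identical = {k: 1.0 for k in WEIGHTS}
other_organism = dict(identical, O=0.5)
incompatible = dict(identical, N=0.0)

for costs in (identical, other_organism, incompatible):
    c = global_cost(costs)
    print(round(c, 3), comparison_kind(c))
```

The all-or-nothing test on the Ck is what distinguishes this from a plain weighted average: one forbidden attribute cancels the comparison regardless of how well the others match.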
28/48 4 steps in the comparison process
Inputs: biologist's experiment, ontology, repository, bank of orthologous genes.
- Algo 1: identification of comparable microarray experiments. Output: list of experiments from the repository comparable to the biologist's experiment (sorted from the most comparable to the least comparable).
- Algo 2: statistical conversion of two comparable datasets (the biologist's experiment and one comparable experiment from the list). Filtering thresholds: Fmin, Fmax. Normalization techniques: Nb, Nc.
- Algo 3: quantitative comparison of two datasets. Output: ecomb.
- Algo 4: qualitative comparison of two datasets. Output: equal (e with subscript "qual").
29/48 Search for comparable experiments
Inputs:
- Biologist B: eb = {sb, Db}
- Ontology O
- Repository R: E = {e1, e2, …, en}, with en = {sn, Dn} and Dn = {d1n, d2n, …, dknn}
Outputs:
- Ec = {ec1, ec2, …, ecm}, where ∀ ec ∈ Ec, ec is comparable to eb
- List {(dic, Ccb)} sorted by Ccb
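The output of this search step can be sketched as a filter followed by a sort. The function name is hypothetical and the cost computation itself is not reproduced; as sample input, the sketch takes the global comparison costs reported on the validation slide (31/48) for the Lesur et al. 2004 experiment.

```python
def comparable_experiments(costs):
    """Keep experiments with a non-zero cost and sort them, most comparable first."""
    kept = [(name, cost) for name, cost in costs.items() if cost > 0]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Global comparison costs against Eb = Lesur et al. 2004, as listed on slide 31/48.
costs = {
    "Lund et al. 2002": 0.775,
    "Murphy et al. 2003": 0.7,
    "Nautiyal et al. 2002": 1.0,
    "Teng et al. 2002": 1.0,
    "Gasch et al. 2000": 1.0,
    "Gasch et al. 2001": 1.0,
    "Spellman et al. 1998": 0.925,
    "Lin et al. 2001": 0.925,
    "Shepard et al. 2003": 0.0,
    "Ren et al. 2000": 0.0,
}

for name, cost in comparable_experiments(costs):
    print(f"{cost:.3f}  {name}")
```

The two experiments with cost 0 are dropped, and ties keep their input order because Python's sort is stable.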
30/48 Validation of the method
Objective: assessing the validity of our criteria.
Eb = Lesur et al. 2004
E = {Lund et al. 2002; Murphy et al. 2003; Nautiyal et al. 2002; Teng et al. 2002; Gasch et al. 2000; Gasch et al. 2001; Spellman et al. 1998; Lin et al. 2001; Shepard et al. 2003; Ren et al. 2000}
31/48 Validation of the method
Objective: assessing the validity of our criteria.
Eb = Lesur et al. 2004

Experiment in E         Global comparison cost
Lund et al. 2002        0.775
Murphy et al. 2003      0.7
Nautiyal et al. 2002    1
Teng et al. 2002        1
Gasch et al. 2000       1
Gasch et al. 2001       1
Spellman et al. 1998    0.925
Lin et al. 2001         0.925
Shepard et al. 2003     0
Ren et al. 2000         0
33/48 Validation with a test coverage
7 relevant attributes; classes of equivalence; 288 representative biological feature sheets.
• Each pair of biological feature sheets is associated with two costs:
  • one cost computed by our algorithm
  • one cost manually estimated using the rules defined in our method
Results: 100% identity between the two costs; correct identification of the non-comparable pairs of experiments.
34/48 Distribution of the comparison costs
[Figure: distribution of the comparison costs, with the threshold of comparability = 0.675 separating non-comparable experiments from comparable experiments. Annotation: 12800 cost ≠ 0.]
35/48 Validation on real data
2 public repositories used:
- ArrayExpress (http://www.ebi.ac.uk/arrayexpress): 268 experiments (268 datasets)
- Stanford Microarray Database (SMD) (http://genome-www5.stanford.edu): 21 experiments (70 datasets)
Our repository content: 289 experiments (338 datasets)
36/48 Distribution of the comparison costs
[Figure: distribution of the 83521 comparison costs in three zones: no comparison allowed, qualitative comparison, quantitative comparison.]
- No ArrayExpress experiment usable
- Correct identification of experiments manually compared to our aging experiment
37/48 Comparison of our aging experiment with the experiments stored in our repository
[Figure zones: no comparison allowed, qualitative comparison, quantitative comparison.]
- 11 experiments (65 datasets) identified as comparable with our experiment on aging
- 6 experiments correctly combined with our experiment
- 3 experiments correctly compared qualitatively with our experiment
38/48 Identification of new experiments comparable with our aging experiment
• 2 experiments newly identified as qualitatively comparable to our experiment:
1. Trinklein et al. MBC. 2004. Study of the transcriptional response to heat shock in fibroblasts from wild-type mice and mice lacking the heat shock transcription factor 1 gene (HSF1).
2. Arbeitman et al. Science. 2002. Gene expression patterns during the life cycle of Drosophila melanogaster.
39/48 Processing of the compared datasets
[Diagram:]
- Inputs: biologist's experiment eb = {sb, Db}; quantitatively compared experiment ec1 = {sc1, Dc1}; qualitatively compared experiment ec2 = {sc2, Dc2}
- Statistical conversion (filtering on the same threshold, checking for normalization), applied on both sides, gives: e'c2 = {sc2, D'c2}, e'b = {sb, D'b}, e'c1 = {sc1, D'c1}, e''b = {sb, D''b}
- Results (the diagram labels the final operations "combination" and "union"): equal = {squal, Dqual} and ecomb = {scomb, Dcomb}
40/48 Combined dataset
ecomb = {scomb, Dcomb}
scomb = sc1 ∪ sb
D1 = {{g, rg1} : g ∈ GD1}
D2 = {{g, rg2} : g ∈ GD2}
Dcomb = D1 ∪ D2 = {{g, r'g} : g ∈ GD1 ∪ GD2}, where
- r'g = rg1 ∪ rg2, if g ∈ GD1 ∩ GD2
- r'g = rg1 ∪ Ø, if g ∈ GD1 \ GD2
- r'g = Ø ∪ rg2, if g ∈ GD2 \ GD1
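The combination rule can be sketched with dictionaries. Representing each dataset as a mapping from gene to a set of results is an assumption made for the example, as are the gene names, values and function name; a gene present in only one dataset simply keeps the results it has.

```python
def combine(d1, d2):
    """Return the combined dataset: one entry per gene of either input, results unioned."""
    combined = {}
    for gene in d1.keys() | d2.keys():
        # A missing gene contributes the empty set, matching the two "∪ Ø" cases.
        combined[gene] = set(d1.get(gene, set())) | set(d2.get(gene, set()))
    return combined

# Hypothetical datasets: gene -> set of (experiment label, log ratio) results.
d1 = {"gene1": {("c1", 1.4)}, "gene2": {("c1", -0.2)}}
d2 = {"gene2": {("b", -0.3)}, "gene3": {("b", 2.1)}}

d_comb = combine(d1, d2)
for gene in sorted(d_comb):
    print(gene, sorted(d_comb[gene]))
```

Here gene2 ends up with the results from both datasets, while gene1 and gene3 each keep their single result.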