Learning rule-based models from gene expression time profiles annotated with Gene Ontology terms
Jan Komorowski and Astrid Lægreid
Joint work with
• Torgeir R. Hvidsten, Herman Midelfart, Astrid Lægreid and Arne K. Sandvik
J. Komorowski and A. Lægreid
Selected Challenges in Gene-expression Analysis
• Functional similarity corresponds to expression similarity, but:
  • functionally correlated genes may be dissimilar in expression (e.g. anti-coregulated)
  • genes usually have multiple functions
  • measurements may be approximate and contradictory
• Can we obtain clusters of biologically related genes?
• Can we build models that classify unknown genes into functional classes, that are human-legible, and that handle approximate and often contradictory data?
• How can we re-use biological knowledge?
Data
• Data material
  • Serum-starved fibroblasts, 8,613 genes
  • Serum added to the medium at time = 0
  • Starved fibroblasts used as reference
  • Gene activity measured at various time points
  • 493 genes found to be differentially expressed
• Results
  • 278 genes known (3 repeats)
  • 212 genes unknown (uncharacterized)
  • 211 genes given a hypothetical function with 88% quality
Figure: Fibroblast serum response, samples for microarray analysis (serum added at time 0; samples at time points 0, 1, 4, 8 and 24, as cells go from quiescent, non-proliferating to proliferating).
Figure: Processes over time points 0, 1, 4, 8 and 24 (quiescent, non-proliferating to proliferating): re-entry into the cell cycle, stress response, protein synthesis, organelle biogenesis, transcription, cell motility, lipid synthesis.
Figure: Dynamic processes over time points 0, 1, 4, 8 and 24 (quiescent, non-proliferating to proliferating); labels: immediate early, delayed immediate early, intermediate, late; primary, secondary, tertiary.
Figure: Protein appears after the transcript (primary, secondary and tertiary responses over time points 0, 1, 4, 8 and 24, quiescent, non-proliferating to proliferating).
Figure: Protein dynamics are not always similar to transcript dynamics (gene, transcript and protein over time points 0, 1, 4, 8 and 24).
Molecular mechanisms of transcriptional response
Figure labels: serum = signal; effectors = cellular response; immediate early response genes; delayed immediate early response genes; immediate early response factors; secondary transcription factors; intermediate/late response genes.
The dynamics of cellular processes
Figure: processes over time points 1, 4, 8 and 24 (quiescent, non-proliferating to proliferating): stress response, cell motility, cell adhesion, DNA synthesis, energy metabolism, protein synthesis, cell cycle regulation, lipid synthesis, negative regulation of cell proliferation.
Methodology
1. Mining functional classes from an ontology
2. Extracting features for learning
3. Inducing minimal decision rules using rough sets, e.g.
   0 - 4(Increasing) AND 6 - 10(Decreasing) AND 14 - 18(Constant) => GO(cell proliferation)
4. Predicting the function of unknown genes using the rules
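Step 3 relies on rough sets, which tolerate contradictory examples by separating objects that certainly belong to a class from those that only possibly belong to it. A minimal sketch of that core idea, the lower and upper approximations of a class under indiscernibility, assuming a toy decision table: the genes g1 to g5, their feature values and the attribute names "0-4" and "6-10" are hypothetical. It does not compute reducts or induce minimal rules the way ROSETTA does.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def approximations(table: Dict[str, Dict[str, str]], attrs: List[str],
                   decision: str, target: str) -> Tuple[Set[str], Set[str]]:
    """Lower and upper rough-set approximations of {x : decision(x) == target}."""
    # Group objects that are indiscernible on the condition attributes.
    blocks: Dict[tuple, Set[str]] = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    concept = {obj for obj, row in table.items() if row[decision] == target}
    lower: Set[str] = set()
    upper: Set[str] = set()
    for block in blocks.values():
        if block <= concept:   # every object in the block has the target class
            lower |= block
        if block & concept:    # at least one object in the block has it
            upper |= block
    return lower, upper


# Hypothetical decision table: template features per interval plus a GO class.
table = {
    "g1": {"0-4": "Increasing", "6-10": "Decreasing", "GO": "cell proliferation"},
    "g2": {"0-4": "Increasing", "6-10": "Decreasing", "GO": "cell proliferation"},
    "g3": {"0-4": "Increasing", "6-10": "Constant", "GO": "cell proliferation"},
    "g4": {"0-4": "Increasing", "6-10": "Constant", "GO": "stress response"},
    "g5": {"0-4": "Constant", "6-10": "Constant", "GO": "stress response"},
}
lower, upper = approximations(table, ["0-4", "6-10"], "GO", "cell proliferation")
print(sorted(lower), sorted(upper), sorted(upper - lower))
```

Genes g3 and g4 share the same features but different processes, so they fall into the boundary region (upper minus lower). Rules supported by the lower approximation are certain; rules covering the boundary are only possible, which is how contradictory data is accommodated rather than discarded.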
Gene Ontology
Biological processes from GO
• Amino acid and derivative metabolism
• Protein targeting
• Energy pathways
• DNA metabolism
• Lipid metabolism
• Transport
• Ion homeostasis
• Intracellular traffic
• Organelle organization and biogenesis
• Cell death
• Cell motility
• Stress response
• Cell surface receptor linked signal transduction
• Oncogenesis
• Cell cycle
• Cell adhesion
• Intracellular signaling cascade
• Developmental processes
• Blood coagulation
• Circulation
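Step 1 maps specific GO annotations onto a fixed set of general processes such as those listed above. A minimal sketch, assuming each annotation is propagated upward along the ontology's parent links and kept only where it reaches a selected class. The DAG fragment, gene names and selected set below are hypothetical toy data, not real GO content.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

# Hypothetical toy fragment of a GO-like DAG: term -> parent terms.
PARENTS: Dict[str, List[str]] = {
    "DNA replication": ["DNA metabolism"],
    "DNA repair": ["DNA metabolism", "Stress response"],
    "DNA metabolism": ["biological process"],
    "Stress response": ["biological process"],
    "Cell motility": ["biological process"],
    "biological process": [],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return the term itself plus every ancestor reachable via parent links."""
    seen = {term}
    queue = deque([term])
    while queue:
        for parent in parents.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def map_to_classes(annotations: Dict[str, Iterable[str]],
                   parents: Dict[str, List[str]],
                   selected: Set[str]) -> Dict[str, Set[str]]:
    """Map each gene's specific terms onto the selected general classes."""
    return {gene: set().union(*(ancestors(t, parents) & selected for t in terms))
            for gene, terms in annotations.items()}


selected = {"DNA metabolism", "Stress response", "Cell motility"}
annotations = {"geneA": ["DNA repair"], "geneB": ["Cell motility"], "geneC": ["DNA replication"]}
print(map_to_classes(annotations, PARENTS, selected))
```

Because a term can have several parents, a single annotation may land in more than one general class (geneA here), which matches the observation that genes usually have multiple functions.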
Hierarchical Clustering of the Fibroblast Data
It’s not a cluster!
Gene Ontology vs. Clusters found by Iyer et al.
Template-based feature synthesis
• 12 measurement points, 55 possible intervals of length > 2
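The interval count can be checked directly. One reading consistent with the number 55 is that an interval must span at least three consecutive measurement points: of the C(12, 2) = 66 (start, end) pairs, the 11 two-point intervals are excluded. This is an assumption about what "length > 2" means; indices below are 0-based positions, not time values.

```python
from itertools import combinations
from typing import List, Tuple


def candidate_intervals(n_points: int, min_points: int = 3) -> List[Tuple[int, int]]:
    """All (start, end) index pairs covering at least min_points consecutive points."""
    return [(i, j) for i, j in combinations(range(n_points), 2)
            if j - i + 1 >= min_points]


intervals = candidate_intervals(12)
print(len(intervals))  # 55
```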
Examples of template definitions
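The slide's own template definitions are not preserved here. As a hypothetical illustration only, the three templates used in the rules (Increasing, Decreasing, Constant) could be defined over an interval with a single threshold on, for example, log ratios; the threshold of 0.5 and the profile values are invented.

```python
from typing import Optional, Sequence


def match_template(profile: Sequence[float], start: int, end: int,
                   threshold: float = 0.5) -> Optional[str]:
    """Label profile[start..end] (inclusive indices) with a template, or None.

    Hypothetical definitions: 'Constant' if all values stay within `threshold`
    of each other, else 'Increasing'/'Decreasing' if the net change between
    the interval's endpoints exceeds `threshold`.
    """
    segment = profile[start:end + 1]
    if max(segment) - min(segment) <= threshold:
        return "Constant"
    change = segment[-1] - segment[0]
    if change > threshold:
        return "Increasing"
    if change < -threshold:
        return "Decreasing"
    return None  # e.g. a rise followed by a fall: no template applies


# Toy profile over 12 measurement points (invented values).
profile = [0.0, 0.1, 0.2, 1.0, 2.0, 1.5, 0.4, 0.3, 0.3, 0.35, 0.3, 0.3]
for start, end in [(0, 2), (2, 4), (4, 6), (3, 5)]:
    print(start, end, match_template(profile, start, end))
```

Applying such a function to every candidate interval turns each time profile into a row of categorical features, which is the kind of input the rough-set rule induction works on.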
Rule example 1
Rule example 2
Classification using template-based rules
Figure: a rule set of the form IF … THEN …, e.g. IF 0 - 4(Constant) AND 0 - 10(Increasing) THEN GO(prot. met. and mod.) OR …; a matching rule adds its votes (+4 in the example) to its process.
Votes are normalized, and processes whose vote fraction exceeds a selection threshold are chosen as predictions.
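The voting scheme described above can be sketched as follows. Assumptions: each rule predicts a single process (the slide's "OR …" suggests rules may name several), each firing rule casts a fixed number of votes, and the rules, processes and vote counts are hypothetical. The class `Rule` and function `predict` are illustrative names, not part of ROSETTA.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Rule:
    conditions: List[Tuple[str, str]]  # (interval, template) pairs, all required
    process: str                       # predicted GO process (one per rule here)
    votes: int                         # votes cast when the rule fires (assumed)

    def fires(self, features: Dict[str, str]) -> bool:
        # The rule fires only if every condition matches the gene's features.
        return all(features.get(iv) == tpl for iv, tpl in self.conditions)


def predict(rules: List[Rule], features: Dict[str, str],
            threshold: float) -> Dict[str, float]:
    """Return processes whose normalized vote fraction exceeds the threshold."""
    tally: Counter = Counter()
    for rule in rules:
        if rule.fires(features):
            tally[rule.process] += rule.votes
    total = sum(tally.values())
    if total == 0:
        return {}  # no rule fired: the gene stays unclassified
    return {p: v / total for p, v in tally.items() if v / total > threshold}


rules = [
    Rule([("0-4", "Constant"), ("0-10", "Increasing")], "protein metabolism and modification", 4),
    Rule([("0-4", "Increasing")], "cell proliferation", 3),
    Rule([("0-10", "Increasing")], "protein metabolism and modification", 2),
    Rule([("6-10", "Decreasing")], "stress response", 1),
]
gene = {"0-4": "Constant", "0-10": "Increasing", "6-10": "Decreasing"}
print(predict(rules, gene, threshold=0.3))
```

Lowering the threshold lets more processes through (here 0.1 would also return "stress response"), trading precision for coverage and yielding several predicted functions per gene.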
Cross-validation estimates, Iyer et al.
• A: coverage 84%, precision 50%
• B: coverage 71%, precision 60%
• C: coverage 39%, precision 90%
Coverage = TP/(TP+FN)
Precision = TP/(TP+FP)
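The two measures can be computed from sets of predicted and annotated (gene, process) pairs; coverage as defined here is what is elsewhere called recall or sensitivity. This sketch assumes micro-averaging over all pairs, which the slides do not specify, and uses invented toy data.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (gene, process)


def coverage_precision(predicted: Set[Pair], annotated: Set[Pair]) -> Tuple[float, float]:
    """Coverage = TP/(TP+FN), precision = TP/(TP+FP), over (gene, process) pairs."""
    tp = len(predicted & annotated)
    fp = len(predicted - annotated)
    fn = len(annotated - predicted)
    coverage = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return coverage, precision


annotated = {("g1", "cell cycle"), ("g1", "DNA metabolism"),
             ("g2", "stress response"), ("g3", "cell motility")}
predicted = {("g1", "cell cycle"), ("g2", "stress response"), ("g2", "cell motility"),
             ("g3", "cell motility"), ("g3", "lipid metabolism")}
print(coverage_precision(predicted, annotated))  # (0.75, 0.6)
```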
Cross-validation estimates, Cho et al.
• Coverage 58%, precision 61%
Coverage = TP/(TP+FN)
Precision = TP/(TP+FP)
Protein Metabolism and Modification
Figure legend: A – annotations, B – false negatives, C – false positives, D – true positives, E – predicted unknown gene
Re-classification of the Known Genes
Co-classifications for the Unknown Genes
Conclusions
• Our methodology
  • incorporates background biological knowledge
  • handles noise and incompleteness in microarray data well
  • can be objectively evaluated
  • predicts multiple functions per gene
  • can reclassify known genes and suggest possible new functions for them
  • can provide hypotheses about the function of unknown genes
• Experimental work is needed to confirm our predictions
Genomic ROSETTA: http://www.idi.ntnu.no/~aleks/rosetta