Identifying Differentially Expressed Genes in Time Series Microarrays
Jonathan J. Smith (1), Hsun-Hsien Chang (2), Marco F. Ramoni (2)
(1) Department of Mathematics, MIT
(2) Division of Health Sciences and Technology, Harvard-MIT
New England Statistics Symposium, April 17, 2010
Background
• Microarray technologies profile the expression of thousands of genes in parallel on a single chip.
• Comparative analysis of gene expression across tissue states extracts signature genes for disease diagnosis.
• Differentially expressed genes across tissue states are identified using t-statistics, fold change, signal-to-noise ratio (SNR), principal component analysis (PCA), etc.
• Research trend:
  • Microarray technologies are becoming cheaper.
  • Time series gene expression microarrays are collected to study biological functions.
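As a point of reference for the static methods listed above, the following minimal sketch computes three of them for two tissue states. It assumes log2-transformed expression values, a Welch (unequal variance) t-statistic, and the SNR form (mean1 - mean2) / (sd1 + sd2); the function name `static_scores` and these specific choices are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def static_scores(x1, x2):
    """Per-gene scores comparing two tissue states.

    x1, x2: arrays of shape (genes, samples) holding log2 expression
    values (an assumption; the slides do not fix the scale).
    Returns (welch_t, log2_fold_change, snr), each of shape (genes,).
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    m1, m2 = x1.mean(axis=1), x2.mean(axis=1)
    # Sample standard deviations (ddof=1) within each tissue state.
    s1, s2 = x1.std(axis=1, ddof=1), x2.std(axis=1, ddof=1)
    n1, n2 = x1.shape[1], x2.shape[1]
    # Welch t-statistic: mean difference over its standard error.
    welch_t = (m1 - m2) / np.sqrt(s1**2 / n1 + s2**2 / n2)
    # On the log2 scale, fold change is a difference of means.
    log2_fc = m1 - m2
    # Signal-to-noise ratio: mean difference over summed deviations.
    snr = (m1 - m2) / (s1 + s2)
    return welch_t, log2_fc, snr
```

Each score uses only one snapshot per sample, so the ordering of measurements in time plays no role in any of them.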
Approach
• Challenge: existing methods (t-statistics, fold change, SNR, PCA) do not extend to longitudinal expression analysis because they do not represent temporal information well.
• Proposal: use the framework of Bayesian networks to capture both the functional and temporal dependencies.
Bayesian Networks
• Bayesian networks are directed acyclic graphs in which:
  • Nodes correspond to random variables.
  • Directed arcs encode the conditional probabilities of the target nodes given the source nodes.
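The graph structure determines how the joint distribution factorizes. In the standard formulation, written here with the notation pa(X_i) for the parents of node X_i (a symbol not used on the slides), each variable is conditioned only on the source nodes of its incoming arcs:

```latex
p(X_1, \dots, X_n) = \prod_{i=1}^{n} p\bigl(X_i \mid \mathrm{pa}(X_i)\bigr)
```

A node without incoming arcs contributes its marginal distribution p(X_i).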
Representation of Functional Dependence
• Phenotypes (tissue state 1, tissue state 2) are modeled by a binomial variable, Pheno.
• The expression of gene G in human subjects (Case 1, Case 2, ..., Case M) is modeled by a log-normal variable.
• Arc Pheno → G: the gene is dependent on the phenotypes.
• No arc between Pheno and G: the gene is independent of the phenotypes.
Representation of Temporal Dependence
• For each case (Case 1, Case 2, ..., Case M) in tissue states 1 and 2, gene G is measured at time points 1, ..., T: G(1) → G(2) → ... → G(T).
• The time series expression of gene G is considered a 1st order Markov chain.
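One way to make the 1st order Markov assumption concrete is a linear Gaussian transition on log expression, log G(t) = a + b * log G(t-1) + noise. The slides do not state the parametric form of the transition, so this form, the least-squares fit, and the name `ar1_fit` are assumptions made for illustration only.

```python
import numpy as np

def ar1_fit(log_expr):
    """Fit a first-order Markov (AR(1)) transition to log expression.

    log_expr: array of shape (cases, T), one time series per case.
    Assumed model: y(t) = a + b * y(t-1) + e, with e ~ N(0, sigma2).
    Returns (beta, sigma2, loglik), where beta = [a, b] and loglik is
    the maximized log-likelihood of the T-1 transitions per case.
    """
    y = np.asarray(log_expr, dtype=float)
    prev = y[:, :-1].ravel()   # y(t-1) for every transition
    curr = y[:, 1:].ravel()    # y(t) for every transition
    design = np.column_stack([np.ones_like(prev), prev])
    beta, *_ = np.linalg.lstsq(design, curr, rcond=None)
    resid = curr - design @ beta
    n = curr.size
    sigma2 = resid @ resid / n  # maximum-likelihood noise variance
    loglik = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    return beta, sigma2, loglik
```

Because each value is conditioned only on its immediate predecessor, the series likelihood is a product of pairwise transition terms, which is what the flattening into (prev, curr) pairs exploits.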
Differentially Expressed Time Series
• Pheno not connected to the chain G(1) → G(2) → ... → G(T): the expression series is independent of the phenotypes.
• Pheno connected to G(1), G(2), ..., G(T): the expression series is dependent on the phenotypes.
• The phenotype variable modulates gene expression at every time point.
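Read as Bayesian networks, the two structures imply different factorizations of the joint distribution. The labels M_I (independent) and M_D (dependent) are introduced here for convenience and do not appear on the slides; the dependent form assumes an arc from Pheno to every G(t), as the last bullet suggests.

```latex
\begin{aligned}
M_I:\quad p(\mathrm{Pheno}, G(1), \dots, G(T)) &= p(\mathrm{Pheno})\, p(G(1)) \prod_{t=2}^{T} p\bigl(G(t) \mid G(t-1)\bigr) \\
M_D:\quad p(\mathrm{Pheno}, G(1), \dots, G(T)) &= p(\mathrm{Pheno})\, p\bigl(G(1) \mid \mathrm{Pheno}\bigr) \prod_{t=2}^{T} p\bigl(G(t) \mid G(t-1), \mathrm{Pheno}\bigr)
\end{aligned}
```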
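Identify Function-Dependent Genes
$$\text{Bayes Factor} = \frac{p(M_D \mid \text{Data})}{p(M_I \mid \text{Data})} = \frac{p(\text{Data} \mid M_D)}{p(\text{Data} \mid M_I)}$$
Here M_D denotes the network in which the expression series depends on the phenotypes and M_I the network in which it is independent (the slide shows the two network diagrams in place of these symbols; the dependent model is taken as the numerator, so that large values favor dependence). The second equality holds when both models have equal prior probability.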
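The slides do not say how the marginal likelihoods p(Data | model) are computed. As a rough sketch only, the code below substitutes the BIC approximation, log BF ≈ (BIC_independent - BIC_dependent) / 2, for an exact marginal likelihood, and assumes linear Gaussian transitions on log expression. All function names are hypothetical, and this is not presented as the authors' implementation.

```python
import numpy as np

def gauss_loglik(design, y):
    """Maximized log-likelihood of y ~ N(design @ beta, sigma2).

    Also returns the number of free parameters (coefficients + variance).
    """
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    n = y.size
    sigma2 = resid @ resid / n
    loglik = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    return loglik, design.shape[1] + 1

def chain_loglik(y):
    """Log-likelihood of one Markov chain model fitted to y (cases, T):
    a Gaussian for the first time point plus AR(1) transitions."""
    first = y[:, 0]
    ll0, k0 = gauss_loglik(np.ones((first.size, 1)), first)
    prev, curr = y[:, :-1].ravel(), y[:, 1:].ravel()
    ll1, k1 = gauss_loglik(np.column_stack([np.ones_like(prev), prev]), curr)
    return ll0 + ll1, k0 + k1

def log_bayes_factor_bic(log_expr, pheno):
    """BIC approximation to log BF (dependent over independent).

    log_expr: (cases, T) log expression of one gene.
    pheno: length-cases array of tissue-state labels.
    """
    y = np.asarray(log_expr, dtype=float)
    pheno = np.asarray(pheno)
    n_obs = y.size
    # Independent model: one chain shared by all cases.
    ll_i, k_i = chain_loglik(y)
    # Dependent model: separate chain parameters per tissue state.
    ll_d, k_d = 0.0, 0
    for state in np.unique(pheno):
        ll, k = chain_loglik(y[pheno == state])
        ll_d += ll
        k_d += k
    bic_i = -2.0 * ll_i + k_i * np.log(n_obs)
    bic_d = -2.0 * ll_d + k_d * np.log(n_obs)
    return 0.5 * (bic_i - bic_d)
```

A positive value favors the phenotype-dependent model; genes could then be ranked or thresholded on this score. The threshold itself would be a further choice that the slides do not specify.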
Clinical Study on Breast Cancer
• Breast cancer is the most prevalent cancer in women. Identification of genes inducing breast cancer will help drug development.
• We used breast cancer microarray data from Gene Expression Omnibus (accession number GSE11352).
• Our method identified 40 genes that may drive breast cancer development.
• Biologists confirmed that these genes are involved in cell death, developmental disorder, and endocrine system disorder (all are prerequisites of breast cancer).
Conclusion
• Developed a Bayesian network method for identifying differentially expressed genes in longitudinal expression microarray data.
  • Functional dependence: genes modulated by phenotypes.
  • Temporal dependence: gene expression time series modeled by a 1st order Markov chain.
• Used the Bayes factor to select differentially expressed genes.