
Transcriptional Diagnosis by Bayesian Network

Hsun-Hsien Chang and Marco F. Ramoni. Children’s Hospital Informatics Program, Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School. March 17, 2009.


Presentation Transcript


  1. Transcriptional Diagnosis by Bayesian Network
     Hsun-Hsien Chang and Marco F. Ramoni
     Children’s Hospital Informatics Program, Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School
     March 17, 2009

  2. Background
     • Microarray technology enables profiling the expression of thousands of genes in parallel on a single chip.
     • Comparative analysis of gene expression across tissue states extracts signature genes for disease diagnosis.
     • Challenge:
        • The number of variables (i.e., genes) is much greater than the number of observations (i.e., biological samples), which leads to overfitting.
     • Existing methods:
        • Gene selection: compute statistics (e.g., t-statistics, SNR, PCA) of individual genes and select high-ranking genes.
        • Classification model: create a classification function of the selected genes.
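The per-gene ranking described under "Existing methods" can be sketched as follows. This is a minimal illustration, not code from the presentation: the function name `rank_genes`, the choice of Welch's t-statistic, and the SNR definition (difference of class means over the sum of class standard deviations) are assumptions made for the example.

```python
import numpy as np

def rank_genes(expr, labels, top_k=10):
    """Rank genes by the absolute value of a two-class Welch t-statistic.

    expr   : array of shape (n_samples, n_genes), expression values
    labels : array of shape (n_samples,), 0 or 1 for the two tissue states
    Returns (indices of the top_k genes, t-statistics, SNR values).
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    a, b = expr[labels == 0], expr[labels == 1]

    mean_a, mean_b = a.mean(axis=0), b.mean(axis=0)
    sd_a, sd_b = a.std(axis=0, ddof=1), b.std(axis=0, ddof=1)

    # Welch t-statistic, computed independently for every gene (column).
    t = (mean_a - mean_b) / np.sqrt(sd_a**2 / len(a) + sd_b**2 / len(b))
    # Signal-to-noise ratio (assumed definition: mean gap over summed SDs).
    snr = (mean_a - mean_b) / (sd_a + sd_b)

    order = np.argsort(-np.abs(t))  # largest |t| first
    return order[:top_k], t, snr
```

The `top_k` argument plays the role of the cut-off threshold: each gene is scored in isolation and the cut-off is chosen separately from the classifier, which is the limitation the next slide raises.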

  3. Proposed Approach
     • Issues:
        • The assumption of gene independence is inadequate.
        • Other genes may be collinearly expressed with the signature.
        • Selection and classification are two non-integrated steps, and a cut-off threshold is needed to select high-ranking genes.
     • Proposed strategies:
        • Adopt a systems biology approach to infer the functional dependence among genes.
        • Use the dependence network for tissue discrimination.
        • Integrate gene selection and the classification model in a Bayesian network framework.

  4. Data Representation by Bayesian Network
     [Figure labels: Tissue state 1, Tissue state 2; Case 1, Case 2, ..., Case M; Gene 1, Gene 2, ..., Gene N; network nodes Pheno, G1, G2, ..., GN.]
     • Bayesian networks are directed acyclic graphs where:
        • Nodes correspond to random variables.
        • Directed arcs encode conditional probabilities of the target nodes given the source nodes.
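The two properties above imply the standard factorization of the joint distribution of a Bayesian network. The notation below (variables X_1, ..., X_n and Pa(X_i) for the parents of X_i) is introduced here for illustration and does not appear on the slide. The second line assumes, purely as an example, a three-node network with arcs from Pheno to G1 and from Pheno to G2.

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)

% Assumed example: arcs Pheno -> G1 and Pheno -> G2
P(\mathrm{Pheno}, G_1, G_2) = P(\mathrm{Pheno})\, P(G_1 \mid \mathrm{Pheno})\, P(G_2 \mid \mathrm{Pheno})
```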

  5. Gene Selection by Bayes Factor
     [Figure: networks over Pheno and genes G1, G2, ..., Gp, Gq, ..., GN, with a step labeled "gene selection by Bayes factor".]
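The slide does not state how the Bayes factor is computed. One common construction, sketched here under explicit assumptions, compares the marginal likelihood of a gene's values when the gene depends on the phenotype against the marginal likelihood when it does not. The sketch assumes expression has been discretized into integer levels and uses a symmetric Dirichlet prior with parameter `alpha`; both choices, and the function names, are hypothetical and not taken from the presentation.

```python
from math import lgamma

def log_marginal(counts, alpha=1.0):
    """Log marginal likelihood of a sequence of categorical observations,
    summarized by their counts, under a symmetric Dirichlet(alpha) prior."""
    k, n = len(counts), sum(counts)
    out = lgamma(k * alpha) - lgamma(k * alpha + n)
    for c in counts:
        out += lgamma(alpha + c) - lgamma(alpha)
    return out

def log_bayes_factor(gene_levels, phenotype, n_levels, alpha=1.0):
    """log BF of 'gene depends on phenotype' versus 'gene is independent'.

    gene_levels : discretized expression, integers in range(n_levels)
    phenotype   : tissue-state label for each sample (any hashable value)
    """
    pooled = [0] * n_levels
    by_state = {}
    for g, p in zip(gene_levels, phenotype):
        pooled[g] += 1
        by_state.setdefault(p, [0] * n_levels)[g] += 1

    # Dependent model: a separate distribution per phenotype state.
    log_dep = sum(log_marginal(c, alpha) for c in by_state.values())
    # Independent model: a single distribution shared by all samples.
    log_ind = log_marginal(pooled, alpha)
    return log_dep - log_ind
```

Under this construction a gene would be kept when its log Bayes factor is positive, so the selection rule comes from the model comparison itself rather than from a separately tuned rank cut-off.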

  6. Collinearity Elimination via Network Learning
     [Figure: networks over Pheno and genes G1, G2, ..., Gp, Gq, ..., GN, with a step labeled "collinearity elimination".]

  7. Sample Classification
     [Figure: network over Pheno, G1, G2, Gp, Gq, GN, with genes colored green or blue.]
     • The phenotype variable is independent of the blue genes, given the green genes.
     • Technically, the green genes form the Markov blanket of the phenotype variable, and they are the signature genes used for phenotype determination.
     • Tissue classification:
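The Markov blanket of a node in a directed acyclic graph consists of its parents, its children, and the other parents of its children. The sketch below extracts it from a parent map. The function name and the toy network are hypothetical; the actual arcs of the network on the slide are not recoverable from the transcript.

```python
def markov_blanket(parents, target):
    """Return the Markov blanket of `target` in a DAG.

    parents : dict mapping each node to the set of its parent nodes
    """
    children = {node for node, ps in parents.items() if target in ps}
    blanket = set(parents.get(target, ())) | children
    for child in children:
        blanket |= set(parents[child])  # co-parents of each child
    blanket.discard(target)
    return blanket

# Toy network (assumed for illustration only):
# Pheno -> Gp, Pheno -> Gq, Gp -> G1, Gq -> G2, {Gp, Gq} -> GN
toy = {
    "Pheno": set(),
    "Gp": {"Pheno"},
    "Gq": {"Pheno"},
    "G1": {"Gp"},
    "G2": {"Gq"},
    "GN": {"Gp", "Gq"},
}
signature = markov_blanket(toy, "Pheno")
```

In this toy network `signature` contains only Gp and Gq. G1, G2, and GN are conditionally independent of Pheno given those two genes, so they add nothing to the classification once Gp and Gq are observed.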

  8. Algorithm Summary
     [Flowchart labels: Gene Selection by Bayes Factor; Collinearity Elimination; Sample Classification (sensitivity analysis); Optimize Hyperparameters; Optimize Performance.]

  9. Discriminate Lung Carcinoma Subtypes
     • Adenocarcinoma (AC) and squamous cell carcinoma (SCC) are major subtypes of lung cancer:
        • AC and SCC differ in survival, chance of metastasis, and response to chemotherapy and targeted therapy.
        • Physicians lack confidence in correct recognition when there are multiple primary carcinomas.
     • Training:
        • 58 ACs and 53 SCCs.
        • 77 genes selected in the network.
        • 25 signature genes.

  10. Bayesian Network for Lung Carcinoma

  11. Large-Scale Testing on Independent Samples
     • 422 samples (232 ACs and 190 SCCs) aggregated from 7 cohorts (including Caucasians, African-Americans, Chinese).
     • Accuracy = 95.2% AUROC.

  12. Comparisons with Other Popular Methods
     • Higher classification accuracy.
     • Small-sized signature to avoid overfitting.

  13. KRT6 Family Characterizes the Lung Carcinoma Discrimination

  14. KRT6 Family Characterizes the Lung Carcinoma Discrimination
     • Keratin-6 family genes (KRT6A, KRT6B, KRT6C) are important for distinguishing lung cancer subtypes.
     • They account for 95% of the accuracy of the whole 25-gene signature.
     • They are located on chromosome 12q12-q13.
     • They yield a nonlinear, concave discriminative surface.

  15. Verification by Chr12q12-q13 Aberrations
     • Investigate DNA copy number changes using comparative genomic hybridization (CGH) arrays.
     • 12 ACs and 13 SCCs from Vrije University Medical Center, the Netherlands.
     • A dumbbell discriminative surface achieves 80% classification accuracy.
     • Treat the average CGH values of genes occupying q12, q13, and q12-13, respectively, as three features to construct a Naïve Bayes classifier.
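A Naïve Bayes classifier over three continuous features can be sketched as below. The slide does not say which likelihood was used for the CGH features; the Gaussian likelihood, the class name, and the small variance floor are assumptions made for this example, so the decision surface it produces need not match the one reported.

```python
import numpy as np

class GaussianNaiveBayes:
    """Naive Bayes with an independent Gaussian per feature and class."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Small floor keeps the variance strictly positive.
        self.vars_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        self.log_priors_ = np.log(np.array([(y == c).mean() for c in self.classes_]))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # diff has shape (n_samples, n_classes, n_features)
        diff = X[:, None, :] - self.means_[None, :, :]
        log_lik = -0.5 * (
            np.log(2.0 * np.pi * self.vars_)[None, :, :] + diff**2 / self.vars_[None, :, :]
        ).sum(axis=2)
        # Pick the class with the highest log posterior (up to a constant).
        return self.classes_[np.argmax(log_lik + self.log_priors_, axis=1)]
```

In the setting of the slide, each row of `X` would hold the three averaged CGH values (q12, q13, q12-13) for one sample and `y` would hold the AC or SCC label.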

  16. Conclusion
     • Reverse engineer regulatory network information for tissue classification.
     • Adopt the systems biology approach to infer the gene dependency network.
        • Select genes by Bayes factor.
        • Eliminate collinearity via network learning.
     • Integrate gene selection and the classification model in a single Bayesian network framework.
     • Demonstrate the promising translational value of the systems biology approach in clinical studies.
