Identifying Causal Genes and Dysregulated Pathways in Complex Diseases • Yoo-Ah Kim, NIH / NLM / NCBI • Nov. 6th, 2010
Complex Diseases • Associated with the effects of multiple genes, as opposed to single-gene diseases • The combination of genomic alterations may vary strongly among patients • Different alterations often dysregulate the same components, leading to the same disease phenotype • Difficult to study and treat • Examples: cancer, heart disease, diabetes
Copy Number Variations (CNVs) • Two copies of each gene are generally assumed to be present in a genome • Genomic regions may be deleted or duplicated, causing CNVs • Some CNVs are associated with susceptibility or resistance to diseases such as cancer • Figure: copy number variations in 158 glioblastoma patients
Identifying Genomic Causes in Complex Diseases • Identify genotypic causes in individual patients as well as dysregulated pathways • Systems biology approach • Genome-wide search • Graph-theoretic algorithms: circuit flow, set cover • Data: 158 glioblastoma multiforme patients
Glioblastoma multiforme (GBM) • The most common and most aggressive type of primary brain tumor in humans
Expression as a Quantitative Trait • Genotype: copy number variations • Phenotype: gene expression
eQTL (expression Quantitative Trait Loci) Analysis • We assume that the genetic variation is the cause and the expression change is the effect, but we do not know the molecular pathways behind the relation • Figure labels: putative causal gene/locus, putative target gene
Method Outline • A. Target gene selection: gene expression • B. eQTL: find associations between expression and copy number • C. Circuit flow algorithm: molecular interactions, candidate causal genes • D. Causal gene selection: weighted multiset cover • Figure labels (panels A to D): target genes g1 to gm and tag loci s1 to sn by cases; interaction types TF-DNA, phosphorylation event, protein-protein; current from + to - between target gene gm and tag SNP sn; causal genes by cases
Target Gene Selection • Select a representative set of disease genes • Filter differentially expressed genes for each case • Multi-set cover • Figure: gene expression matrix, genes (Gene 1, Gene 2, Gene 3, ...) by samples (controls, disease cases)
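The per-case filtering step can be illustrated with a minimal sketch. The slide does not say how differential expression is called, so the z-score test against the control samples, the function name `differentially_expressed`, and the cutoff `z_cutoff` are all assumptions made for illustration only.

```python
import numpy as np

def differentially_expressed(controls, cases, z_cutoff=2.0):
    """Flag genes whose expression in a case deviates from the controls.

    controls: array of shape (n_genes, n_controls)
    cases:    array of shape (n_genes, n_cases)
    Returns a boolean array of shape (n_genes, n_cases).
    The z-score rule and z_cutoff are assumptions, not taken from the slides.
    """
    mean = controls.mean(axis=1, keepdims=True)
    std = controls.std(axis=1, ddof=1, keepdims=True)
    # Genes that are constant across controls are never flagged.
    std = np.where(std == 0, np.inf, std)
    z = (cases - mean) / std
    return np.abs(z) >= z_cutoff
```

The resulting gene-by-case boolean matrix is the kind of input a multi-set cover over cases would need: each gene "covers" the cases in which it is flagged.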
eQTL • Associations between the expression of target genes and copy number variations of genomic loci • Linear regression for every pair of tag locus and target gene • Figure: tag loci by cases, target genes by cases
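A minimal sketch of the pairwise regression described on this slide, assuming copy number and expression are given as locus-by-case and gene-by-case matrices. The function name `eqtl_associations` and the `r2_threshold` cutoff are hypothetical; the slides do not state which significance criterion was used.

```python
import numpy as np

def eqtl_associations(copy_number, expression, r2_threshold=0.5):
    """Regress each target gene's expression on each tag locus's copy number.

    copy_number: array of shape (n_loci, n_cases)
    expression:  array of shape (n_genes, n_cases)
    Returns (locus_index, gene_index, slope, r_squared) for every pair whose
    r_squared reaches r2_threshold (a hypothetical cutoff).
    """
    hits = []
    for i, x in enumerate(copy_number):
        if np.var(x) == 0:
            continue  # a locus with no variation cannot explain anything
        for j, y in enumerate(expression):
            total = np.sum((y - y.mean()) ** 2)
            if total == 0:
                continue  # a gene with constant expression has nothing to explain
            slope, intercept = np.polyfit(x, y, 1)  # least-squares line
            residual = np.sum((y - (slope * x + intercept)) ** 2)
            r2 = 1.0 - residual / total
            if r2 >= r2_threshold:
                hits.append((i, j, float(slope), float(r2)))
    return hits
```

A real analysis would typically replace the fixed cutoff with a p-value and a multiple-testing correction, since every locus is tested against every gene.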
Finding Candidate Causal Genes • Figure: target genes, genotypic variations, and candidate genes C1 to C5 (which of them is causal?)
Finding Candidate Causal Genes • Candidate genes (C1 to C5) are linked to a target gene D through an interaction network • Interaction types: protein-protein interactions, phosphorylation events, transcription factor interactions
Finding Candidate Causal Genes • Current flows (+ to -) through the interaction network between target gene D and candidate genes C1 to C5 • The resistance of an edge (u, v) is set to be inversely proportional to (|corr(expr(u), expr(D))| + |corr(expr(v), expr(D))|) / 2
Finding Candidate Causal Genes • Current flow: compute the amount of current entering each causal gene by solving a system of linear equations
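The linear system in question can be sketched as a resistor network solved with Kirchhoff's current law. This is an illustrative sketch, not the authors' implementation: it assumes the target gene D is held at potential 1, every candidate causal gene is grounded at 0, the network is connected, and edge conductances (the reciprocal of resistance) are nonzero. The function and argument names are hypothetical.

```python
import numpy as np

def current_into_causal_genes(edges, corr_with_target, target, causal_genes):
    """Solve for node potentials and return the current entering each causal gene.

    edges:            list of undirected edges (u, v)
    corr_with_target: dict node -> correlation of its expression with the target
    target:           source node, held at potential 1 (assumption)
    causal_genes:     sink nodes, held at potential 0 (assumption)
    """
    nodes = sorted({n for edge in edges for n in edge})
    index = {name: k for k, name in enumerate(nodes)}
    n = len(nodes)
    G = np.zeros((n, n))  # symmetric conductance matrix
    for u, v in edges:
        # Conductance = 1 / resistance, i.e. the mean absolute correlation.
        g = (abs(corr_with_target[u]) + abs(corr_with_target[v])) / 2
        G[index[u], index[v]] += g
        G[index[v], index[u]] += g
    laplacian = np.diag(G.sum(axis=1)) - G

    fixed = {index[target]: 1.0}
    fixed.update({index[c]: 0.0 for c in causal_genes})
    fixed_idx = list(fixed)
    free = [k for k in range(n) if k not in fixed]

    V = np.zeros(n)
    V[fixed_idx] = list(fixed.values())
    if free:
        # Kirchhoff's law at every free node: net current is zero.
        A = laplacian[np.ix_(free, free)]
        b = -laplacian[np.ix_(free, fixed_idx)] @ V[fixed_idx]
        V[free] = np.linalg.solve(A, b)

    # Current entering a grounded node c is the sum of g(c, n) * (V[n] - V[c]).
    return {c: float(G[index[c]] @ (V - V[index[c]])) for c in causal_genes}
```

For example, with edges D-a, a-C1, and D-C2 and all correlations equal to 1, the intermediate node a settles at potential 0.5, so C1 receives 0.5 units of current and the directly connected C2 receives 1.0.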
Final Causal Gene Selection • A putative causal gene explains a disease case if: • its corresponding tag locus has a copy number alteration in that case • its affected target genes (i.e., genes sending a significant amount of current to the causal gene) are differentially expressed in the disease case • Figure: causal genes by cases
Final Causal Gene Selection • Figure: causal genes by cases, with a WEIGHT annotation for the weighted cover
Final Causal Gene Selection • Find a smallest set of genes covering (almost) all cases at least k' times • Minimum weighted multi-set cover
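Minimum weighted multi-set cover is NP-hard in general, so a greedy heuristic is the usual way to approximate it. The sketch below is one such heuristic, not necessarily the procedure used in the talk. It assumes weights act as costs to be minimized, and the names `greedy_multiset_cover`, `k`, and `min_fraction` (the share of cases that must be covered k times, modelling "almost all") are hypothetical.

```python
def greedy_multiset_cover(covers, weights, cases, k=2, min_fraction=1.0):
    """Greedily pick genes until enough cases are covered at least k times.

    covers:       dict gene -> set of cases the gene explains
    weights:      dict gene -> positive cost (assumption: lower is better)
    cases:        iterable of all disease cases
    min_fraction: share of cases that must reach k-fold coverage
    """
    demand = {c: k for c in cases}  # remaining coverage needed per case
    remaining = set(covers)
    chosen = []

    def satisfied():
        return sum(1 for d in demand.values() if d == 0)

    while satisfied() < min_fraction * len(demand):
        best, best_ratio = None, 0.0
        for gene in sorted(remaining):  # sorted for deterministic tie-breaking
            gain = sum(1 for c in covers[gene] if demand.get(c, 0) > 0)
            if gain and gain / weights[gene] > best_ratio:
                best, best_ratio = gene, gain / weights[gene]
        if best is None:
            break  # no remaining gene helps; full coverage is not achievable
        chosen.append(best)
        remaining.remove(best)
        for c in covers[best]:
            if demand.get(c, 0) > 0:
                demand[c] -= 1
    return chosen
```

Each gene is used at most once, so a case can only reach k-fold coverage if at least k distinct genes explain it. Lowering `min_fraction` below 1.0 lets the loop stop before outlier cases force many extra genes into the solution.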
Dysregulated Pathways • Causal paths between a target gene and a causal gene • A causal path is a maximum current path • Figure: network connecting candidate genes C1 to C5 and target gene D
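The slide does not define "maximum current path" precisely. One plausible reading, sketched here as an assumption, is a greedy walk that starts at the target gene and always follows the edge carrying the most current until it reaches a causal gene. The inputs (node potentials from a solved circuit and per-edge conductances) and the function name are hypothetical.

```python
def max_current_path(potential, conductance, target, causal_genes):
    """Walk from the target along the highest-current edge at each step.

    potential:   dict node -> solved voltage
    conductance: dict (u, v) -> conductance of the undirected edge
    Returns the list of nodes on the path, or None if the walk gets stuck.
    """
    neighbors = {}
    for (u, v), g in conductance.items():
        neighbors.setdefault(u, []).append((v, g))
        neighbors.setdefault(v, []).append((u, g))

    path = [target]
    node = target
    while node not in causal_genes:
        # Current leaving `node` toward each neighbor: g * (V_node - V_neighbor).
        flows = [(g * (potential[node] - potential[v]), v)
                 for v, g in neighbors.get(node, [])]
        current, nxt = max(flows, default=(0.0, None))
        if current <= 0:
            return None  # no downhill edge, so no causal gene is reachable
        path.append(nxt)
        node = nxt
    return path
```

Because current only flows from higher to lower potential, the potential strictly decreases along the walk, so it cannot revisit a node and always terminates.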
Results • 701 candidate causal genes from the circuit flow algorithm (STEP C) • 128 causal genes from set cover (STEP D)
Causal Genes • The selected causal gene set includes many known cancer-implicated genes • Functional analysis using DAVID (BSOSC Review, November 2008)
PTEN as causal gene • Figure legend: nodes are causal genes, TFs, and kinases; edges are TF-DNA and protein-protein interactions; scale: fold change (-, 0, +)
EGFR as causal and target gene • Figure legend: nodes include causal EGFR, target EGFR, other causal genes, TFs, and kinases; edges are phosphorylation, TF-DNA, and protein-protein interactions; scale: fold change (-, 0, +)
Conclusion • A novel computational method to simultaneously identify causal genes and dysregulated pathways • Circuit flow algorithm • Multi-set cover • Augmenting eQTL evidence with interaction information uncovers potential causal genes as well as intermediate nodes on molecular pathways • The method can be applied to any disease system where genetic variations play a fundamental causal role
Acknowledgements • Teresa M. Przytycka • Stefan Wuchty • Other group members • Dong Yeon Cho • Yang Huang • Damian Wojtowicz • Jie Zheng
EGFR as causal and target gene: Causal Paths • Figure legend: nodes include causal EGFR, target EGFR, other causal genes, TFs, and kinases; edges are phosphorylation, TF-DNA, and protein-protein interactions; scale: fold change (-, 0, +)
PTEN as causal gene: Causal Paths • Figure legend: nodes are causal genes, TFs, and kinases; edges are TF-DNA and protein-protein interactions; scale: fold change (-, 0, +)
Our Method • Integrate several types of data • Gene expression • Copy number variations • Molecular interactions
Methods and Results • Method • Model the expression change of disease genes as a function of genomic alterations • Translate the propagation of information from a potential causal gene to a disease gene into the flow of electric current through a network of molecular interactions • Multi-set cover: select the most prominent causal genes • Figure labels: causal genes, disease gene gm, tag SNP sn, + and - terminals • Result • Validated the approach by testing the enrichment of the selected causal genes for known GBM/glioma-related genes