RECOMB SATELLITE MEETING NEW-YORK, NOVEMBER 2010. GENIE – GEne Network Inference with Ensemble of trees. Van Anh Huynh-Thu Department of Electrical Engineering and Computer Science, Systems and Modeling, University of Liege, Belgium. Inference of GRNs.
Inference of GRNs
• Gene regulatory networks (GRNs) are the behind-the-scenes players in gene expression
• How do we determine the regulators of each gene?
• Input:
  • Gene expression data in different conditions/time points
  • A subset of the genes that contains all the regulators (without this subset, GENIE's accuracy plummets)
Underlying Model
• Every reverse engineering tool assumes an underlying model
• GENIE assumes that the GRN is a Boolean network
• Therefore, the regulation of each gene is a Boolean function
GENIE Strategy Outline
• Make no strong assumptions about the possible regulatory interactions (linearity, for example, is a strong assumption)
• Treat time series as static experiments
• Solve the problem for each gene separately, and combine the results
• The final output is a ranking of potential interactions in descending order of confidence
Tree-based Ensemble Methods
• A regulation function is a binary tree: at each node, a binary test on a different regulator is performed
• The prediction is at the leaf
• For each gene, randomly select sets of samples and grow a tree from each one (the root is the single gene that best splits K random conditions of the target, and so on)
• Rank the regulators according to their importance in the trees
Ranking of regulators
• #S is the number of samples that reach the node N
• #St (#Sf) is the number of samples with output true (false)
• Var() is the variance of the output
• To avoid bias towards highly variable genes, the expression values are first normalized to unit variance
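The formula image for this slide did not survive, but the symbols defined above fit a variance-reduction importance of the form I(N) = #S·Var(S) - #St·Var(St) - #Sf·Var(Sf), where "output true (false)" is read as the outcome of the node's test. The sketch below assumes that form. To keep it short it scores each regulator with a single-split tree (a stump) instead of a full tree ensemble, so it illustrates the per-gene decomposition and the ranking step only. All function names and the toy data are hypothetical.

```python
from statistics import pvariance


def unit_variance(values):
    """Rescale values to unit population variance (constant vectors are left as-is)."""
    sd = pvariance(values) ** 0.5
    return list(values) if sd == 0 else [v / sd for v in values]


def split_importance(outputs, test_results):
    """Assumed importance of one node: #S*Var(S) - #St*Var(St) - #Sf*Var(Sf)."""
    s_true = [y for y, t in zip(outputs, test_results) if t]
    s_false = [y for y, t in zip(outputs, test_results) if not t]
    total = len(outputs) * pvariance(outputs)
    for part in (s_true, s_false):
        if part:  # an empty side contributes nothing
            total -= len(part) * pvariance(part)
    return total


def best_stump_importance(regulator, target):
    """Largest importance over all thresholds on one regulator (a depth-1 tree)."""
    thresholds = sorted(set(regulator))[:-1]  # the largest value would not split anything
    return max(
        (split_importance(target, [x <= th for x in regulator]) for th in thresholds),
        default=0.0,
    )


def rank_interactions(expression, regulators):
    """Solve one problem per target gene, then pool every (score, regulator, target)."""
    scored = []
    for target, raw in expression.items():
        y = unit_variance(raw)  # normalize the target, as the slide describes
        for reg in regulators:
            if reg != target:
                scored.append((best_stump_importance(expression[reg], y), reg, target))
    return sorted(scored, reverse=True)  # descending confidence


# Toy data: four conditions, two candidate regulators, one downstream gene.
expression = {
    "tf": [0, 0, 1, 1],
    "noise": [0, 1, 0, 1],
    "target": [1.0, 1.2, 3.0, 3.2],
}
ranking = rank_interactions(expression, regulators=["tf", "noise"])
for score, reg, tgt in ranking:
    print(f"{reg} -> {tgt}: {score:.3f}")
```

In a real ensemble the importance of a regulator would be accumulated over every node that tests it and averaged over all trees; the stump keeps only the best single node.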
The Genetic Landscape of the Cell Charles Boone University of Toronto, Donnelly Center
Synthetic Genetic Arrays
• A single-mutant strain (query gene) is crossed with all other single mutants
• Double mutants are selected
• Currently done for budding yeast, E. coli and S. pombe
[Figure label: No growth]
Genetic Interactions
• Positive interaction: the double knockout is more viable than would be expected from the separate contributions of the single knockouts
• Negative interaction: the double knockout is less viable than would be expected from the separate contributions of the single knockouts
• They crossed ~1,700 yeast single mutants with ~3,800 single mutants, and after filtering failures they got ~5.4 million double mutants
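The slide does not say how the "expected" double-knockout viability is computed. A minimal sketch, assuming the common multiplicative null model (expected fitness = product of the two single-mutant fitnesses), is given below. The function names and the cutoff value are hypothetical and chosen only for illustration.

```python
def interaction_score(f_a, f_b, f_ab):
    """Observed double-mutant fitness minus the multiplicative expectation f_a * f_b."""
    return f_ab - f_a * f_b


def classify_interaction(f_a, f_b, f_ab, cutoff=0.08):
    """Label a gene pair; `cutoff` is an illustrative threshold, not a published value."""
    eps = interaction_score(f_a, f_b, f_ab)
    if eps > cutoff:
        return "positive"  # double mutant fitter than expected
    if eps < -cutoff:
        return "negative"  # double mutant sicker than expected
    return "none"


# Fitness values are relative to wild type (1.0 = wild-type growth).
print(classify_interaction(0.8, 0.9, 0.90))
print(classify_interaction(0.8, 0.9, 0.40))
print(classify_interaction(0.8, 0.9, 0.72))
```

Under this assumption two single mutants with fitness 0.8 and 0.9 are expected to give a double mutant with fitness 0.72, so only deviations from that value count as interactions.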
Yeast Interaction Map
• Edges are interactions that pass the cutoff threshold (170,000)
• Proximity in the layout reflects similarity in interaction profiles
• Colored sets = GO enrichment
Proximity between clusters and related functions
• Proximate clusters: both require cytoskeleton genes
Zoom in on pathway
[Figure labels: Required for polarization and growth; Cell division; Translation; Budding. Legend: red = negative, green = positive]
• Interactions between pathways and complexes were often monochromatic
Positive vs. negative interactions
[Figure label: No interaction]
• Negative interactions are about twice as prevalent as positive ones
Degree distribution
• Hubs are less numerous
• Severe fitness defects in single mutants correlate with degree
Correlation between degree and gene properties
[Figure labels: # morphological phenotypes; # chemical perturbations; unstable structure. Legend: black = PPI]
Genetic interactions between cellular processes Cell cycle is more buffered?
Hubs in the chemical interaction network match hubs in the GI network
• Single mutant + chemical = chemical interaction
• Hydroxyurea blocks DNA synthesis
• Erodoxin (new): similar to a protein folding-related gene
[Figure label: DNA repair]
Discovering Master Regulators of Alcohol Addiction William Shin Center for Computational Biology and Bioinformatics Columbia University
Rat Model of Alcohol Addiction
[Figure: experimental design. Labels: Control; Alcohol Self Administration; Alcohol Vapor Treatment (chronic alcohol addiction) = Dependent; No Alcohol Vapor = Non Dependent]
Rat model of alcohol addiction
[Figure: alcohol intake during early withdrawal. Y-axis: alcohol responding (0.5 hr), 0 to 100. Series: Baseline; Non-dependent (exposed to air); Dependent (exposed to alcohol vapor). Phase labels: Alcohol self-administration (lever pressing); Induction of alcohol dependence. A * marker appears in the plot.]
Identification of TF-target interactions
• Rat brain regions were sliced and used as microarray samples
• 92 samples from Dependent, Non-Dependent and Control rats across 8 regions that are known sites of action for addictive drugs
• Applied ARACNE to this data
  • Information-theory based (mutual information, MI)
  • Tests triplets of genes for indirect interactions
• 130,000 TF-target interactions in total
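The triplet test mentioned above can be sketched as follows, assuming it works like a data processing inequality filter: in every fully connected triplet, the edge with the lowest MI is treated as indirect and dropped. The `dpi_prune` name, the `tolerance` parameter and the MI values are hypothetical, and estimating MI from expression data is left out.

```python
from itertools import combinations


def dpi_prune(mi, tolerance=0.0):
    """Drop the weakest edge of every fully connected gene triplet.

    mi maps frozenset({gene_a, gene_b}) to a mutual-information estimate.
    Edges are only marked during the scan and removed at the end, so the
    result does not depend on the order in which triplets are visited.
    """
    genes = sorted({g for pair in mi for g in pair})
    to_remove = set()
    for a, b, c in combinations(genes, 3):
        edges = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if not all(e in mi for e in edges):
            continue  # the test only applies when all three edges exist
        weakest = min(edges, key=lambda e: mi[e])
        strongest_two = [e for e in edges if e != weakest]
        if mi[weakest] < min(mi[e] for e in strongest_two) * (1 - tolerance):
            to_remove.add(weakest)
    return {e: v for e, v in mi.items() if e not in to_remove}


# Toy triplet: TF1 regulates A, A regulates B, so TF1-B is only indirect.
mi = {
    frozenset(("TF1", "A")): 0.9,
    frozenset(("A", "B")): 0.8,
    frozenset(("TF1", "B")): 0.3,
}
kept = dpi_prune(mi)
print(sorted(tuple(sorted(e)) for e in kept))
```

A nonzero `tolerance` keeps edges whose MI is only slightly below the other two, which trades fewer false negatives for more indirect edges.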
Screening of false positives
• TF1 shadows TF2: TF2 appears enriched only because it shares common targets with TF1
• The master regulators are the enriched TFs not shadowed by any other
[Figure labels: TF1; Targets of TF1; TF2; Targets of TF2]
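One way to read the shadowing rule is: TF2 is shadowed by TF1 if TF2 looks enriched with all of its targets but no longer does once the targets it shares with TF1 are set aside. The slide does not say how enrichment is measured, so the sketch below assumes a one-sided hypergeometric test of a TF's targets against a hypothetical set of signature genes (for example, genes that differ between dependent and non-dependent rats). Every name, gene ID and threshold here is illustrative.

```python
from math import comb


def enrichment_p(targets, signature, universe_size):
    """Hypergeometric P(overlap >= observed) for a target set against a signature set."""
    n, big_k = len(targets), len(signature)
    k = len(targets & signature)
    tail = sum(
        comb(big_k, i) * comb(universe_size - big_k, n - i)
        for i in range(k, min(n, big_k) + 1)
    )
    return tail / comb(universe_size, n)


def is_shadowed(tf2_targets, tf1_targets, signature, universe_size, alpha=0.05):
    """True if TF2 is enriched overall but not on the targets it does not share with TF1."""
    own_targets = tf2_targets - tf1_targets
    enriched_overall = enrichment_p(tf2_targets, signature, universe_size) <= alpha
    enriched_alone = enrichment_p(own_targets, signature, universe_size) <= alpha
    return enriched_overall and not enriched_alone


# Toy example: 100 genes, 10 of them in the signature (IDs 0..9).
signature = set(range(10))
tf1 = {0, 1, 2, 3, 4, 5, 6, 7}
tf2 = {0, 1, 2, 3, 4, 50, 51, 52, 53, 54}  # signature hits are all shared with TF1

print(is_shadowed(tf2, tf1, signature, universe_size=100))
print(is_shadowed(tf1, tf2, signature, universe_size=100))
```

Under these assumptions a master-regulator candidate is an enriched TF for which `is_shadowed` is false against every other enriched TF.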
siRNA validation has a 50-75% success rate
• Not all targets have been tested yet