DREAM4 Puzzle – inferring network structure from microarray data Qiong Cheng
Outline • Gene Network • Gene Regulatory Systems and Related Work • FunGen: Reconstructing Biological Networks Using Conditional Correlation Analysis • ARACNE: Algorithm for Reconstructing Accurate Cellular Network
Gene Network • Directed network • nodes: genes • edges: regulation • including loops • Scale-free • Degree distribution: power law $P(k) \sim k^{-\lambda}$
Genetic Network Generation Schematic • Source: de Jong H. Modeling and simulation of genetic regulatory systems: a literature review. J Comput Biol 2002;9(1):67-103
Random Network Model • ER model • each pair of nodes is connected by an edge with probability p • edges are independent of each other • Poisson degree distribution, $P(k) = e^{-\langle k \rangle} \langle k \rangle^{k} / k!$, which decays rapidly for large k • BA model • Scale-free degree distribution ($P(k) \sim k^{-\lambda}$) • Process: new nodes preferentially attach to existing high-degree nodes • Source: http://arxiv.org/pdf/cond-mat/0010278
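The two models on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the generator used by DREAM: the function names `er_graph`, `ba_graph`, and `degrees`, the seed clique of m + 1 nodes, and the "stub list" trick for degree-proportional sampling are all assumptions made for the example.

```python
import random
from collections import Counter


def er_graph(n, p, rng):
    """ER model: link every node pair independently with probability p."""
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]


def ba_graph(n, m, rng):
    """BA model: each new node attaches to m distinct existing nodes,
    picked with probability proportional to their current degree."""
    # Assumed starting point: a fully connected seed of m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each node appears in `stubs` once per incident edge, so a uniform
    # draw from `stubs` is a degree-proportional draw over nodes.
    stubs = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges


def degrees(n, edges):
    """Degree of every node 0..n-1 in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return [deg[i] for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    er = er_graph(200, 0.02, rng)
    ba = ba_graph(200, 2, rng)
    print("ER max degree:", max(degrees(200, er)))
    print("BA max degree:", max(degrees(200, ba)))
```

Plotting a histogram of `degrees(...)` for each graph on log-log axes is one way to compare the rapidly decaying ER distribution with the heavy tail of the BA model.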
Random Network Model • Module extraction from a source random scale-free network (used by DREAM3) • Hierarchical scale-free network • Extraction: random seed node + iteratively adding the neighbor nodes with the highest modularity Q • Source: Marbach D, Schaffter T, Mattiussi C, Floreano D (2009) Generating Realistic in silico Gene Networks for Performance Assessment of Reverse Engineering Methods. J Comput Biol 16(2):229–239
Microarray Data Distributions • Benford’s law (in base 10): $P(D) = \log_{10}(1 + D^{-1})$ • Zipf’s law • Log-normal distribution: a potential distribution for normalizing the bulk of the corrected spot intensities • Noise • Source: “Make Sense Of Microarray Data Distributions”
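As a quick check of Benford's law in base 10, the sketch below evaluates P(D) = log10(1 + 1/D) for the leading digits 1 to 9. The helper names `benford` and `first_digit` are hypothetical, introduced only for this illustration.

```python
import math


def benford(d):
    """Benford probability that the leading decimal digit equals d (1..9)."""
    return math.log10(1 + 1 / d)


def first_digit(x):
    """Leading decimal digit of a positive number, read off scientific notation."""
    return int(f"{x:e}"[0])


probs = {d: benford(d) for d in range(1, 10)}

if __name__ == "__main__":
    for d, p in probs.items():
        print(d, round(p, 4))
```

The nine probabilities sum to 1 because the sum of the logarithms telescopes to log10(10). To compare a data set against the law, one could tabulate `first_digit` over the spot intensities and set the observed frequencies next to `probs`.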
Reverse Engineering • Clustering + … • Correlation measures + … • Optimization method • Bayesian network (conditional independence via DAG) • Markov chains • Dynamic Bayesian network • Expectation maximization (max likelihood) • Genetic algorithm (GA) • Neural network • Simulation • Piecewise-linear differential equations • Stochastic equations • Stochastic/hybrid Petri net • Boolean network • Regression techniques
FunGen: Reconstructing Biological Networks Using Conditional Correlation Analysis • Synthetic network • Network dynamics • Simulation protocol: perturbation • Conditional correlation • Correlation is symmetric • Matrix is non-symmetric • May lead to indirect connections • False positives (indirect connections) + false negatives (noise) • error = FP/(FP+TN) + FN/(FN+TP) • Reduce false positives • Choose optimal ρ_opt • Triangle reduction construction
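The error measure on this slide is the false-positive rate plus the false-negative rate. A minimal sketch of how it could be computed is given below; it is not FunGen's code. It assumes directed edges written as (regulator, target) index pairs, counts every ordered pair of distinct genes as a candidate edge (self-loops excluded), and treats a rate with a zero denominator as 0. The name `reconstruction_error` is hypothetical.

```python
def reconstruction_error(true_edges, predicted_edges, n_genes):
    """error = FP/(FP+TN) + FN/(FN+TP) over all ordered pairs i != j."""
    true_edges, predicted_edges = set(true_edges), set(predicted_edges)
    candidates = n_genes * (n_genes - 1)      # assumed: no self-loops
    tp = len(true_edges & predicted_edges)    # predicted and real
    fp = len(predicted_edges - true_edges)    # predicted but not real
    fn = len(true_edges - predicted_edges)    # real but missed
    tn = candidates - tp - fp - fn            # correctly left out
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    fn_rate = fn / (fn + tp) if fn + tp else 0.0
    return fp_rate + fn_rate


if __name__ == "__main__":
    truth = {(0, 1), (1, 2)}
    guess = {(0, 1), (0, 2)}   # (0, 2) is an indirect connection
    print(reconstruction_error(truth, guess, n_genes=3))
```

With these definitions a perfect reconstruction scores 0, and both predicting no edges and predicting every edge score 1, so sweeping the correlation threshold and keeping the value with the lowest error is one way to read "choose optimal ρ_opt".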
ARACNE: Algorithm for Reconstructing Accurate Cellular Network • Assumes two-way interactions: pairwise potentials determine all statistical dependencies, with uniform marginal distributions • Mutual information (MI) = measure of relatedness • Independence: MI is zero iff the two genes are statistically independent • Data processing inequality (DPI): if genes g1 and g3 interact only through g2, then $I(g_1, g_3) \le \min[I(g_1, g_2), I(g_2, g_3)]$ • ARACNE starts with the MI-based network, then for every fully connected gene triplet removes the edge with the smallest MI • Ignores the direction of the edges • Reconstructs tree-network topologies exactly • Higher-order potential interactions are not accounted for (ARACNE opens 3-gene loops) • A two-gene interaction is detected iff there are no alternate paths.
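The triplet-pruning step can be sketched as follows. This is an illustration under stated assumptions, not the ARACNE implementation: the MI matrix is taken as given (MI estimation is not shown), `dpi_prune` and its `threshold` parameter are hypothetical names, every triplet is tested against the initial network with removals applied only at the end, and an edge is removed only when it is strictly the weakest of its triplet.

```python
from itertools import combinations


def dpi_prune(mi, threshold=0.0):
    """Drop the weakest edge of every fully connected gene triplet.

    mi: symmetric matrix (list of lists) of pairwise mutual information.
    Returns the surviving undirected edges as a set of (i, j) with i < j.
    """
    n = len(mi)
    # Initial network: every pair whose MI exceeds the threshold.
    edges = {(i, j) for i, j in combinations(range(n), 2) if mi[i][j] > threshold}
    to_remove = set()
    for i, j, k in combinations(range(n), 3):
        triplet = [(i, j), (i, k), (j, k)]
        if not all(e in edges for e in triplet):
            continue  # the DPI is only applied to closed 3-gene loops
        ranked = sorted(triplet, key=lambda e: mi[e[0]][e[1]])
        weakest, second = ranked[0], ranked[1]
        if mi[weakest[0]][weakest[1]] < mi[second[0]][second[1]]:
            to_remove.add(weakest)  # treated as an indirect interaction
    return edges - to_remove


if __name__ == "__main__":
    # g1 - g2 - g3 chain: the g1-g3 dependence is only indirect.
    mi = [[0.0, 0.8, 0.3],
          [0.8, 0.0, 0.6],
          [0.3, 0.6, 0.0]]
    print(sorted(dpi_prune(mi)))
```

Because every closed triplet loses an edge, a true 3-gene loop is always opened, which is the limitation the slide points out.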
ARACNE – Example & Evaluation • Example: synthetic networks (ER, BA) • Performance assessed via Precision-Recall curves (PRCs)
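A precision-recall curve for a reconstructed network can be traced by ranking candidate edges by score (for example MI) and lowering the cut-off one edge at a time. The sketch below is a generic illustration, not the evaluation script behind the slides; the function name and the convention that undirected edges are stored as sorted tuples are assumptions.

```python
def precision_recall_points(scored_edges, true_edges):
    """Return (recall, precision) after admitting each edge in score order.

    scored_edges: dict mapping an edge (i, j), i < j, to its score.
    true_edges:   set of edges in the reference network (must be non-empty).
    """
    ranked = sorted(scored_edges, key=scored_edges.get, reverse=True)
    points, true_positives = [], 0
    for rank, edge in enumerate(ranked, start=1):
        if edge in true_edges:
            true_positives += 1
        recall = true_positives / len(true_edges)
        precision = true_positives / rank
        points.append((recall, precision))
    return points


if __name__ == "__main__":
    scores = {(0, 1): 0.9, (0, 2): 0.7, (1, 2): 0.5}
    truth = {(0, 1), (1, 2)}
    for recall, precision in precision_recall_points(scores, truth):
        print(f"recall={recall:.2f} precision={precision:.2f}")
```

Precision-recall is a natural choice here because real and synthetic gene networks are sparse: true non-edges vastly outnumber true edges, so a measure based on the false-positive rate can look good even when most predictions are wrong.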
(Demo) Sample input data file • File: Input_file_name.exp • N = 3 genes, M = 2 microarrays • Input file has N+1 = 4 lines • The header line has M+2 = 4 fields; each gene line has 2M+2 = 6 fields
AffyID  HG_U95Av2  SudHL6.CHP  ST486.CHP
G1  G1  16.477367  0.69939363  20.150969  0.5297595
G2  G2  7.6989274  0.55935365  26.04019   0.5445875
G3  G3  8.8098955  0.5445875   21.554955  0.31372303
Callouts: header line • annotation name (second column) • microarray chip names (SudHL6.CHP, ST486.CHP) • each chip contributes a (value, p-value) pair, e.g. 16.477367 and 0.69939363 for chip 1 • Source: ARACNE slides
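The layout of the sample input file can be read with a short parser. This is a sketch for the three-gene sample only, not ARACNE's own reader: it assumes fields are separated by whitespace and contain no embedded spaces, and `parse_exp` is a hypothetical name.

```python
SAMPLE_EXP = """\
AffyID HG_U95Av2 SudHL6.CHP ST486.CHP
G1 G1 16.477367 0.69939363 20.150969 0.5297595
G2 G2 7.6989274 0.55935365 26.04019 0.5445875
G3 G3 8.8098955 0.5445875 21.554955 0.31372303
"""


def parse_exp(text):
    """Return (chip_names, {gene_id: [(value, p_value), ...]})."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    chips = header[2:]                      # header has M + 2 fields
    data = {}
    for fields in rows:
        gene_id = fields[0]                 # fields[1] is the annotation name
        numbers = [float(x) for x in fields[2:]]
        if len(numbers) != 2 * len(chips):  # each gene line has 2M + 2 fields
            raise ValueError(f"{gene_id}: expected {2 * len(chips)} numbers")
        # Pair up (value, p-value) for each chip, in header order.
        data[gene_id] = list(zip(numbers[0::2], numbers[1::2]))
    return chips, data


if __name__ == "__main__":
    chips, data = parse_exp(SAMPLE_EXP)
    for gene, pairs in data.items():
        print(gene, dict(zip(chips, pairs)))
```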
(Demo, cont’d) Sample output data file • File: input_data_file_name[non-default_param_vals].adj • Number of lines = N = number of genes • Each line: AffyID:ID#, followed by pairs of associated gene ID# and MI value
G1:0   8 0.064729
G2:1   2 0.0298643  7 0.0521425
G3:2   1 0.0298643
G4:3   8 0.0427217
G5:4   5 0.403516
G6:5   4 0.403516  6 0.582265
G7:6   5 0.582265  9 0.38039
G8:7   1 0.0521425  8 0.743262
G9:8   0 0.064729  3 0.0427217  7 0.743262  9 0.333104
G10:9  6 0.38039  8 0.333104
[Figure: diagram of the resulting network, nodes labeled 1 to 10]
Source: ARACNE slides
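The adjacency output can be turned into an undirected edge list with a few lines of Python. This sketch covers only the layout shown in the sample (an `AffyID:ID#` field followed by neighbor ID and MI value pairs, whitespace-separated); `parse_adj` is a hypothetical name, and any header or comment lines a real file might carry are not handled.

```python
SAMPLE_ADJ = """\
G1:0 8 0.064729
G2:1 2 0.0298643 7 0.0521425
G3:2 1 0.0298643
G4:3 8 0.0427217
G5:4 5 0.403516
G6:5 4 0.403516 6 0.582265
G7:6 5 0.582265 9 0.38039
G8:7 1 0.0521425 8 0.743262
G9:8 0 0.064729 3 0.0427217 7 0.743262 9 0.333104
G10:9 6 0.38039 8 0.333104
"""


def parse_adj(text):
    """Return {(i, j): mi} with i < j, merging the two listings of each edge."""
    edges = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue
        _name, index = fields[0].rsplit(":", 1)
        source = int(index)
        # Remaining fields alternate: neighbor ID#, MI value.
        for neighbor, mi in zip(fields[1::2], fields[2::2]):
            edge = tuple(sorted((source, int(neighbor))))
            edges[edge] = float(mi)
    return edges


if __name__ == "__main__":
    for (i, j), mi in sorted(parse_adj(SAMPLE_ADJ).items()):
        print(i, j, mi)
```

Each edge appears twice in the file, once on each endpoint's line, so keying the dictionary on the sorted pair collapses the duplicates.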