Simulation and Application on learning gene causal relationships Xin Zhang
Introduction
• High-throughput genetic technologies make it possible to study how genes interact with each other;
• We use simulation to evaluate how well the IC algorithm learns gene causal relationships;
• We present an algorithm (the mIC algorithm) for learning causal relationships given topological ordering information;
• We apply the mIC algorithm to a melanoma dataset.
Steps for Simulation Study
• Construct a causal network N;
• Generate datasets from the causal network;
• Learn from the simulated data with a causal algorithm (e.g. the IC algorithm) to obtain a network N´;
• Compare the original network N with the obtained network N´ with respect to precision and recall.
Modeling and simulation of a causal Boolean network (BN)
[Figure: genes A and B feed a function f that determines gene C, with C = f(A, B)]
• Boolean network:
• Construct a causal structure;
• Assign parameters (proper functions) to each node that has causal parents;
• Assign a probability distribution.
Constructing Boolean Network
1. Generate M BNs with up to 3 causal parents for each node;
2. For each BN, generate a random proper function for each node;
3. Assign random probabilities for the root gene(s);
4. Given one configuration, get the probability distribution;
5. Collect 200 data points for each network;
6. Repeat steps 3-5 for all M networks.
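The construction steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the rule of drawing parents from lower-numbered genes (to keep the graph acyclic), and the choice to sample root genes independently and then propagate values deterministically are all assumptions made for the example.

```python
import random
from itertools import product

def depends_on_all_inputs(table, k):
    """True if flipping each input changes the output for at least one row."""
    return all(
        any(table[b] != table[b[:i] + (1 - b[i],) + b[i + 1:]] for b in table)
        for i in range(k)
    )

def random_boolean_network(n_genes, max_parents=3, rng=None):
    """Hypothetical generator: random DAG, proper functions, root probabilities."""
    rng = rng if rng is not None else random.Random()
    parents, tables, root_prob = {}, {}, {}
    for g in range(n_genes):
        # Assumption: parents come from lower-numbered genes, so no cycles arise.
        k = rng.randint(0, min(max_parents, g))
        parents[g] = sorted(rng.sample(range(g), k))
        if k == 0:
            root_prob[g] = rng.random()  # P(gene = 1) for a root gene
            continue
        while True:  # redraw until every parent influences the output
            table = {b: rng.randint(0, 1) for b in product((0, 1), repeat=k)}
            if depends_on_all_inputs(table, k):
                tables[g] = table
                break
    return parents, tables, root_prob

def sample_states(parents, tables, root_prob, n_points=200, rng=None):
    """Draw root genes at random, then compute every other gene from its parents."""
    rng = rng if rng is not None else random.Random()
    data = []
    for _ in range(n_points):
        state = {}
        for g in sorted(parents):  # gene order is a topological order here
            if not parents[g]:
                state[g] = int(rng.random() < root_prob[g])
            else:
                state[g] = tables[g][tuple(state[p] for p in parents[g])]
        data.append([state[g] for g in sorted(parents)])
    return data

network = random_boolean_network(5, rng=random.Random(0))
data = sample_states(*network, rng=random.Random(1))
```

Repeating the two calls M times with different seeds would give one dataset of 200 points per network.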
Constructing Causal Structure
[Figure: example causal structure over nodes A, B, C, D, E]
Proper function (1)
Proper function: a function that reflects the influence of the operators.
Example: by simplifying f, c is a function of a alone, with c = a. b is a pseudo predictor of c and has no effect on c, so f is not a proper function.
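A small check makes the idea of a pseudo predictor concrete. The sketch below assumes that "proper" means every predictor can change the output for at least one setting of the others. The function `f` is a hypothetical stand-in chosen so that it simplifies to c = a; it is not necessarily the example from the slide.

```python
from itertools import product

def is_proper(f, n):
    """True if each of the n Boolean inputs can change f for some assignment."""
    for i in range(n):
        influential = False
        for bits in product((0, 1), repeat=n):
            flipped = list(bits)
            flipped[i] ^= 1  # flip only input i
            if f(*bits) != f(*flipped):
                influential = True
                break
        if not influential:
            return False  # input i is a pseudo predictor
    return True

# Hypothetical example: (a AND b) OR (a AND NOT b) simplifies to a.
f = lambda a, b: (a & b) | (a & (1 - b))
g = lambda a, b: a & b

print(is_proper(f, 2))  # False: b never affects the output
print(is_proper(g, 2))  # True: both a and b matter
```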
Proper function (2)
• Definition:
• With n predictors, the number of proper functions is given by:
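For small n the count can be checked by brute force. The sketch below assumes that a proper function is one that depends on every one of its n predictors. It enumerates all truth tables and compares the result with the standard inclusion-exclusion count of such functions. Whether that closed form is written the same way as the formula on the slide is an assumption.

```python
from itertools import product
from math import comb

def count_proper_brute_force(n):
    """Enumerate all 2^(2^n) truth tables and keep those where every input matters."""
    inputs = list(product((0, 1), repeat=n))
    count = 0
    for outputs in product((0, 1), repeat=len(inputs)):
        table = dict(zip(inputs, outputs))
        if all(
            any(table[b] != table[b[:i] + (1 - b[i],) + b[i + 1:]] for b in inputs)
            for i in range(n)
        ):
            count += 1
    return count

def count_proper_inclusion_exclusion(n):
    """Subtract functions that ignore at least one input (inclusion-exclusion)."""
    return sum((-1) ** (n - k) * comb(n, k) * 2 ** (2 ** k) for k in range(n + 1))

for n in (1, 2, 3):
    print(n, count_proper_brute_force(n), count_proper_inclusion_exclusion(n))
```

Under this definition there are 2, 10 and 218 proper functions for n = 1, 2 and 3.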
Steps of learning gene causal relationships
• Step 1: obtain the probability distribution and sample data;
• Step 2: apply the algorithms to find causal relations;
• Step 3: compare the original and obtained networks using precision and recall;
• Step 4: repeat steps 1-3 for every random network.
Comparing two networks
[Figure: original network and obtained network, each over nodes A, B, C, D]
Precision and Recall
• The original graph is a DAG, while the obtained graph has both directed and undirected edges;
• Recall = ATP / (ATP + AFN), Precision = ATP / (ATP + AFP)
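The two ratios can be illustrated with a short sketch. The surviving text does not define how ATP, AFP and AFN treat undirected edges, so the rule used here is an assumption: a directed edge counts as a true positive only if the original has the same edge with the same orientation, and an undirected edge counts if the original connects the two nodes in either direction. The function name and the example graphs are hypothetical.

```python
def precision_recall(original, directed, undirected=()):
    """Compare an obtained partially directed graph with the original DAG.

    original, directed: iterables of (parent, child) pairs.
    undirected: iterable of (x, y) pairs with no orientation.
    """
    original = set(original)
    directed = set(directed)
    undirected = {frozenset(e) for e in undirected}

    tp = sum(1 for e in directed if e in original)
    # Assumed rule: an undirected edge matches either orientation in the original.
    tp += sum(1 for e in undirected
              if any(frozenset(o) == e for o in original))

    fp = len(directed) + len(undirected) - tp  # obtained edges with no match
    fn = len(original) - tp                    # original edges not recovered
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

original = {("A", "C"), ("B", "C"), ("C", "D")}
obtained_directed = {("A", "C"), ("D", "C")}  # D -> C has the wrong orientation
obtained_undirected = {("B", "C")}
print(precision_recall(original, obtained_directed, obtained_undirected))
```

In this example two of the three obtained edges match, and two of the three original edges are recovered, so both ratios are 2/3.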
Observational equivalence and Transitive Closure
[Figure: two DAGs over nodes A, B, C, D, labeled OE]
• Two DAGs are said to be observationally equivalent (OE) if they have the same skeleton and the same set of v-structures;
• Transitive closure (TC): A -> B -> C with A -> C;
• cc(x,y) is true if there is a directed or an undirected edge from x to y;
• pcc(x,y) is true if there is a path from x to y consisting of properly directed and undirected edges;
• pcc(x,y) := cc(x,y) | (pcc(x,z) ∧ pcc(z,y)) for some z
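Both notions are easy to compute on small graphs. The sketch below is an illustration with hypothetical function names: `observationally_equivalent` compares skeletons and v-structures of two DAGs, and `pcc_closure` applies the recursive definition of pcc with a Warshall-style loop, treating an undirected edge as traversable in both directions.

```python
from itertools import combinations

def skeleton(dag):
    """Edges with orientation removed."""
    return {frozenset(e) for e in dag}

def v_structures(dag):
    """Pairs of nonadjacent parents a, b that share a child c (a -> c <- b)."""
    skel = skeleton(dag)
    found = set()
    for (a, c1), (b, c2) in combinations(sorted(dag), 2):
        if c1 == c2 and a != b and frozenset((a, b)) not in skel:
            found.add((frozenset((a, b)), c1))
    return found

def observationally_equivalent(dag1, dag2):
    return skeleton(dag1) == skeleton(dag2) and v_structures(dag1) == v_structures(dag2)

def pcc_closure(nodes, directed, undirected=()):
    """All (x, y) with pcc(x, y): a path along directed or undirected edges."""
    reach = set(directed)            # cc(x, y) from directed edges
    for x, y in undirected:          # undirected edges work in both directions
        reach.add((x, y))
        reach.add((y, x))
    for z in nodes:                  # pcc(x, y) if pcc(x, z) and pcc(z, y)
        for x in nodes:
            if (x, z) in reach:
                for y in nodes:
                    if (z, y) in reach:
                        reach.add((x, y))
    return reach

chain = {("A", "B"), ("B", "C")}          # A -> B -> C
reversed_chain = {("B", "A"), ("C", "B")} # A <- B <- C
collider = {("A", "B"), ("C", "B")}       # A -> B <- C

print(observationally_equivalent(chain, reversed_chain))  # True
print(observationally_equivalent(chain, collider))        # False
print(("A", "C") in pcc_closure("ABC", chain))            # True
```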
How to improve IC algorithm
• The original IC algorithm did not give good results when learning gene causal relationships;
• A possible way to improve its performance is to incorporate extra information;
• Knowing the topological ordering of the regulatory network would help improve the learning result.
Gene topological ordering
• Whether a specific gene is the causal parent of another gene;
• Whether one gene appears before another gene in a pathway;
• Whether a gene is at the beginning or at the end of the pathway;
IC algorithm + topological ordering information
mIC algorithm
• The mIC algorithm is based on IC, but combines topological ordering information with steady-state data to infer causality;
• 3 steps of the mIC algorithm:
• Find conditional independence: for each pair of genes gi and gj in a dataset, test pairwise independence. If they are dependent, search for a set Sij = {gk | gi and gj are independent given gk, with i<k<j or j<k<i}. Construct an undirected graph G such that gi and gj are connected by an edge if and only if they are pairwise dependent and no Sij can be found;
• Find v-structures: for each pair of nonadjacent genes gi and gj with a common neighbor gk, if gk ∉ Sij, and k>i, k>j, add arrowheads pointing at gk, such that gi -> gk <- gj;
• Orient more edges according to rules: orient the remaining undirected edges without creating new cycles or v-structures.
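The first two steps can be sketched as follows. This is a simplified illustration rather than the authors' code: genes are assumed to be numbered by their position in the topological ordering, and `indep(i, j, k)` is a hypothetical independence oracle (with `k=None` for the unconditional test) that a real implementation would replace with a statistical test on the data. The third step, orienting the remaining edges, is left out.

```python
from itertools import combinations

def mic_steps_1_and_2(n_genes, indep):
    """Return (directed edges, remaining undirected edges) after steps 1 and 2."""
    undirected = set()
    sepset = {}
    # Step 1: build the undirected graph G.
    for i, j in combinations(range(n_genes), 2):
        if indep(i, j, None):
            sepset[(i, j)] = set()  # pairwise independent: no edge, empty Sij
            continue
        # Only genes ordered between gi and gj are candidates for Sij.
        sepset[(i, j)] = {k for k in range(i + 1, j) if indep(i, j, k)}
        if not sepset[(i, j)]:
            undirected.add(frozenset((i, j)))

    # Step 2: orient v-structures gi -> gk <- gj with k after both i and j.
    directed = set()
    for i, j in combinations(range(n_genes), 2):
        if frozenset((i, j)) in undirected:
            continue  # gi and gj must be nonadjacent
        for k in range(j + 1, n_genes):
            if (frozenset((i, k)) in undirected
                    and frozenset((j, k)) in undirected
                    and k not in sepset[(i, j)]):
                directed.add((i, k))
                directed.add((j, k))

    remaining = {e for e in undirected if tuple(sorted(e)) not in directed}
    return directed, remaining

# Toy oracles (hypothetical): a collider 0 -> 2 <- 1 and a chain 0 -> 1 -> 2.
collider_oracle = lambda i, j, k: {i, j} == {0, 1} and k is None
chain_oracle = lambda i, j, k: {i, j} == {0, 2} and k == 1

print(mic_steps_1_and_2(3, collider_oracle))
print(mic_steps_1_and_2(3, chain_oracle))
```

With the collider oracle both edges into gene 2 are oriented. With the chain oracle the edges 0 - 1 and 1 - 2 stay undirected, because gene 1 separates genes 0 and 2.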
Melanoma dataset
• The 10 genes involved in this study were chosen from the 587 genes in the melanoma data;
• Previous studies have identified WNT5A as a gene of interest involved in melanoma;
• Controlling the influence of WNT5A in the regulation can reduce the chance of melanoma metastasizing.
Applying mIC algorithm on Melanoma Dataset
Partial biological prior knowledge: MMP3 is expected to be at the end of the pathway.
• Pirin causally influences WNT5A: to maintain the level of WNT5A, we need to control WNT5A either directly or through pirin;
• WNT5A directly causes MART-1.
Conclusion
• We evaluated the IC algorithm using simulated data;
• We presented the mIC algorithm, which can infer gene causal relationships from steady-state data with gene topological ordering information;
• We performed simulations based on Boolean networks to evaluate the performance of the causal algorithms;
• We applied the mIC algorithm to real biological microarray data, the melanoma dataset;
• The results showed that some of the important causal relationships associated with the WNT5A gene were identified by the mIC algorithm.