Using Bayesian Networks to Analyze Expression Data N. Friedman, M. Linial, I. Nachman, D. Pe’er @ Hebrew University
What I will cover • Domain background • Overview of their work • Causal networks vs. Bayes networks • Application • Results
What is gene expression? • The process by which the information in a gene is used to synthesize a functional gene product (a protein or RNA). • Think of it as a menu for a holiday dinner. • You need certain ingredients in the right amounts to pull it off. • Too much or too little of something can lead to odd results.
Advances in technology led to DNA microarrays. • A snapshot of the expression levels of thousands of genes in a cell at a given moment in time. • No more having to compare genes one at a time. • Most computational analysis has focused on clustering algorithms. • Group genes with similar expression patterns together. • Useful for finding co-regulated genes, but not for uncovering the structure of the regulatory process.
Overview • How can we discover key relations in cellular systems from large amounts of microarray data? • Propose a Bayesian network framework for discovering gene interactions from microarray data. • Goal: identify statistical dependencies between genes. • Understand interactions from multiple expression measurements.
Overview • Uncover properties of the network by examining dependence and conditional independence among the genes' expression levels. • How does one gene interact with another? • This information can then be used to suggest causal influence.
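To make "dependence" versus "conditional independence" concrete, here is a minimal Python sketch on simulated data (not from the paper). It assumes a hypothetical linear chain X → Y → Z of three genes with Gaussian noise: X and Z are strongly correlated, yet their partial correlation given Y is close to zero. Partial correlation is just one possible test, used here purely for illustration.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def partial_corr(x, z, y):
    """Correlation of x and z after controlling for y (first-order partial correlation)."""
    rxz, rxy, rzy = pearson(x, z), pearson(x, y), pearson(z, y)
    return (rxz - rxy * rzy) / math.sqrt((1 - rxy ** 2) * (1 - rzy ** 2))

# Hypothetical chain: gene X regulates gene Y, which regulates gene Z.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(5000)]
y = [xi + rng.gauss(0, 0.5) for xi in x]
z = [yi + rng.gauss(0, 0.5) for yi in y]

print(round(pearson(x, z), 2))          # marginal dependence between X and Z
print(round(partial_corr(x, z, y), 2))  # dependence left once Y is accounted for
```

The first number is large and the second is near zero: X tells us about Z only through Y.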
Bayesian Network • Useful for several reasons: • Well suited to describing processes made up of locally interacting components. • A well-understood set of algorithms, used successfully in many domains. • Can be used to infer causal structure, even though Bayesian networks are not defined as causal models. • Probabilistic, so they handle noisy data fairly well.
Causal Network • Structurally very similar to a typical Bayesian network. • A Bayesian network with the strict requirement that every edge is causal. • X → Y means X directly influences Y. • If many learned networks share the same directed path from X to Y, this suggests a causal relationship between X and Y.
Bayes vs. Causal • A Bayesian network encodes (conditional) dependence. • A causal network encodes direct causal relationships. • Different Bayesian networks can be equivalent, i.e., represent the same set of distributions. • X → Y is equivalent to Y → X. • Causal Network • This equivalence cannot hold: by definition, X → Y (X causes Y) and Y → X (Y causes X) are different claims.
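The equivalence claim can be checked numerically. This sketch (illustrative numbers, not from the paper) takes a two-gene network X → Y, computes its joint distribution, re-factors the same joint as Y → X with Bayes' rule, and confirms that both structures describe the identical distribution. This is why observational data alone cannot tell them apart.

```python
from fractions import Fraction as F

# Network X -> Y: parameters P(X) and P(Y | X) for binary genes (illustrative values).
p_x = {0: F(3, 10), 1: F(7, 10)}
p_y_given_x = {0: {0: F(9, 10), 1: F(1, 10)},
               1: {0: F(2, 10), 1: F(8, 10)}}

# Joint implied by X -> Y: P(x, y) = P(x) P(y | x).
joint = {(x, y): p_x[x] * p_y_given_x[x][y] for x in (0, 1) for y in (0, 1)}

# Re-parameterize as Y -> X: P(y) and P(x | y), derived from the same joint.
p_y = {y: joint[(0, y)] + joint[(1, y)] for y in (0, 1)}
p_x_given_y = {y: {x: joint[(x, y)] / p_y[y] for x in (0, 1)} for y in (0, 1)}

# Joint implied by Y -> X: P(x, y) = P(y) P(x | y).
joint_rev = {(x, y): p_y[y] * p_x_given_y[y][x] for x in (0, 1) for y in (0, 1)}

print(joint == joint_rev)  # True: exact rational arithmetic, no rounding
```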
Learning Causal Patterns • We need to determine a causal interpretation of the learned network. • Observation • Passive measurement of the domain. • Intervention • Setting the values of some variables using outside forces (e.g., knocking out a gene).
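The difference between observation and intervention can be shown on a hypothetical two-gene causal network X → Y. The sketch below borrows the do(·) notation from Pearl's causal framework, which the slides do not use, and the probabilities are made up. Observing Y = 1 changes our belief about X, forcing Y = 1 from outside does not, and forcing X = 1 does change Y.

```python
from fractions import Fraction as F

# Hypothetical causal network X -> Y over binary genes (illustrative numbers).
p_x1 = F(3, 10)                            # P(X = 1)
p_y1_given_x = {0: F(1, 10), 1: F(8, 10)}  # P(Y = 1 | X = x)

def p_x(x):
    return p_x1 if x == 1 else 1 - p_x1

def p_y_given_x(y, x):
    return p_y1_given_x[x] if y == 1 else 1 - p_y1_given_x[x]

# Observation: condition on seeing Y = 1 (Bayes' rule).
p_y1 = sum(p_x(x) * p_y_given_x(1, x) for x in (0, 1))
obs_x1_given_y1 = p_x(1) * p_y_given_x(1, 1) / p_y1

# Intervention do(Y = 1): the edge into Y is cut, so X keeps its own distribution.
do_x1_given_y1 = p_x(1)

# Intervention do(X = 1): X is a cause of Y, so Y responds through P(Y | X).
do_y1_given_x1 = p_y_given_x(1, 1)

print(obs_x1_given_y1, do_x1_given_y1, do_y1_given_x1)
```

With these numbers the script prints `24/31 3/10 4/5`: seeing Y = 1 raises the probability of X = 1 from 3/10 to 24/31, whereas setting Y = 1 leaves it at 3/10.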
Causal Markov assumption • Given the values of a variable's immediate causes, the variable is independent of its earlier causes. • Once we know the state of a gene's parents, its more distant ancestors tell us nothing more about that gene.
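A small exact check of this property on a hypothetical chain A → B → C (made-up probabilities): build the joint from the network, then verify that P(C | A, B) equals P(C | B) for every configuration, so the grandparent A carries no extra information once the parent B is known.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical chain A -> B -> C over binary genes (illustrative numbers).
p_a = {0: F(1, 2), 1: F(1, 2)}
p_b_given_a = {0: {0: F(4, 5), 1: F(1, 5)}, 1: {0: F(1, 4), 1: F(3, 4)}}
p_c_given_b = {0: {0: F(9, 10), 1: F(1, 10)}, 1: {0: F(3, 10), 1: F(7, 10)}}

# Full joint over (a, b, c) from the network factorization.
joint = {(a, b, c): p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]
         for a, b, c in product((0, 1), repeat=3)}

def cond_c1(b, a=None):
    """P(C = 1 | B = b) or, if a is given, P(C = 1 | A = a, B = b), computed from the joint."""
    rows = {k: v for k, v in joint.items() if k[1] == b and (a is None or k[0] == a)}
    return sum(v for k, v in rows.items() if k[2] == 1) / sum(rows.values())

for a, b in product((0, 1), repeat=2):
    # Knowing the grandparent A adds nothing once the parent B is known.
    assert cond_c1(b, a) == cond_c1(b)
print("C is independent of A given B")
```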
Analyzing Expression Data • Consider probability distributions over all possible states of the system (which can include environmental conditions, etc.). • The state of the system is described by a set of random variables. • Each random variable denotes the expression level of one gene. • Learn the joint distribution over all of these variables.
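For reference, the standard way a Bayesian network G makes this joint distribution tractable is to factor it into one local conditional distribution per gene, given that gene's parents in G. The notation ($X_1, \dots, X_n$ for the genes, $\mathbf{Pa}^G(X_i)$ for the parents of $X_i$ in G) is introduced here for illustration:

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathbf{Pa}^G(X_i)\bigr)
```

Learning the network therefore means choosing G and estimating each local term $P(X_i \mid \mathbf{Pa}^G(X_i))$.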
Learning from expression data is difficult because it involves transcript levels of thousands of genes! • However, gene networks are sparse (each gene is directly influenced by only a few others), so Bayesian networks are still well suited.
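A quick count shows why sparsity matters. This sketch assumes, purely for illustration, discrete genes with 3 expression levels and at most 3 parents per gene, and compares the free parameters of an unrestricted joint distribution over 250 genes with an upper bound for a sparse Bayesian network.

```python
def full_joint_params(n_genes, n_states):
    """Free parameters of an unrestricted joint distribution over n_genes discrete variables."""
    return n_states ** n_genes - 1

def sparse_network_params(n_genes, n_states, max_parents):
    """Upper bound on free parameters of a Bayesian network in which every gene
    has at most max_parents parents (one multinomial per parent configuration)."""
    per_gene = (n_states - 1) * n_states ** max_parents
    return n_genes * per_gene

# Illustrative sizes: 250 genes, 3 expression levels, at most 3 parents per gene.
full = full_joint_params(250, 3)
sparse = sparse_network_params(250, 3, 3)
print(len(str(full)), sparse)  # number of digits in the full count, then the sparse bound
```

The unrestricted joint needs 3^250 - 1 parameters (a 120-digit number), while the sparse network needs at most 13,500.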
Learning the model • Markov relations: a feature indicating whether two genes are related in a joint biological process (one is in the other's Markov blanket). • Order relations: a feature capturing a global property of the network (X is an ancestor of Y in all equivalent networks). • An order relation suggests some causal influence of X on Y, but it is not certain.
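Both kinds of feature can be read off a learned graph. The sketch below (hypothetical graph, plain Python) computes a gene's Markov blanket (its parents, its children, and its children's other parents) and checks whether one gene is an ancestor of another. It works on a single DAG; the paper's order relation is defined over all networks equivalent to the learned one, which this simplification ignores.

```python
def markov_blanket(parents, x):
    """Markov blanket of x in a DAG given as {node: set_of_parents}:
    its parents, its children, and its children's other parents."""
    children = {n for n, ps in parents.items() if x in ps}
    spouses = {p for c in children for p in parents[c]}
    return (set(parents[x]) | children | spouses) - {x}

def is_ancestor(parents, x, y):
    """True if there is a directed path from x to y (walks parent links back from y)."""
    stack, seen = list(parents[y]), set()
    while stack:
        node = stack.pop()
        if node == x:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False

# Hypothetical network: A -> C <- B, C -> D
g = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}}
print(sorted(markov_blanket(g, "A")))                     # ['B', 'C']
print(is_ancestor(g, "A", "D"), is_ancestor(g, "D", "A"))  # True False
```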
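Confidence of features • Learn m different networks $\hat{G}_1, \dots, \hat{G}_m$ (from bootstrap resamples of the data) and, for each feature of interest f, compute its confidence $\mathrm{conf}(f) = \frac{1}{m}\sum_{i=1}^{m} f(\hat{G}_i)$, • where f(G) is 1 if f is a feature of G and 0 otherwise.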
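A minimal sketch of the bootstrap confidence estimate, assuming samples are stored as rows (one list of expression values per experiment). The structure learner `toy_learn_edges` is a hypothetical stand-in that links two genes when their absolute correlation exceeds a threshold; it is not the Bayesian score-based search used in the paper, and the threshold and m are arbitrary choices.

```python
import math
import random

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return cov / den if den else 0.0  # guard against a constant column

def toy_learn_edges(samples, threshold=0.5):
    """Hypothetical stand-in for structure learning: connect genes i < j when
    |correlation| exceeds threshold. Not the search procedure from the paper."""
    genes = range(len(samples[0]))
    cols = [[s[g] for s in samples] for g in genes]
    return {(i, j) for i in genes for j in genes
            if i < j and abs(pearson(cols[i], cols[j])) > threshold}

def bootstrap_confidence(samples, feature, m=200, seed=0):
    """conf(f) = (1/m) * sum_i f(G_i), with G_i learned from the i-th resample."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(m):
        resample = [rng.choice(samples) for _ in samples]  # draw rows with replacement
        hits += feature(toy_learn_edges(resample))         # True counts as 1
    return hits / m

# Simulated data: 76 experiments, gene 1 tracks gene 0, gene 2 is independent.
rng = random.Random(1)
data = []
for _ in range(76):
    g0 = rng.gauss(0, 1)
    data.append([g0, g0 + rng.gauss(0, 0.3), rng.gauss(0, 1)])

conf01 = bootstrap_confidence(data, lambda edges: (0, 1) in edges)
conf02 = bootstrap_confidence(data, lambda edges: (0, 2) in edges)
print(conf01, conf02)
```

On this simulated data the dependent pair (0, 1) should receive confidence close to 1 and the independent pair (0, 2) confidence close to 0.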
Learning the network structure • Issues • Extremely large search space (super-exponential in the number of variables). • Use simple statistics to identify a small set of candidate parents for each gene before building the network. • This restricts the search to networks in which the parents of each variable Xi are drawn from its candidate set.
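Two short sketches for this slide. The first counts labeled DAGs with Robinson's recurrence to show how quickly the search space grows; the second is a simplified candidate-parent step that keeps, for each gene, the k other genes with the highest absolute correlation. The correlation criterion is an assumption for illustration; the paper's sparse candidate procedure may use different statistics.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def num_dags(n):
    """Number of labeled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * math.comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return cov / den if den else 0.0

def candidate_parents(columns, k):
    """Hypothetical candidate selection: for each gene (one list of values per gene),
    keep the k other genes with the highest |correlation|."""
    result = {}
    for g, col in enumerate(columns):
        others = [h for h in range(len(columns)) if h != g]
        others.sort(key=lambda h: abs(pearson(col, columns[h])), reverse=True)
        result[g] = others[:k]
    return result

print([num_dags(n) for n in range(1, 6)])  # [1, 3, 25, 543, 29281]
cols = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 6], [2, 1, 2, 1, 2]]
print(candidate_parents(cols, 1))
```

Even 5 genes admit 29,281 DAGs, while restricting each gene to k candidates shrinks the choices per gene to subsets of k genes.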
Different local probability models • Multinomial Model • Treat each variable as discrete and learn a multinomial distribution describing the possible states of each child given the states of its parents. • Linear Gaussian Model • A linear regression model for the child given its parents, with Gaussian noise.
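A sketch of both local models for a single child gene, in plain Python. The three-level discretization and its 0.5 threshold are illustrative assumptions. `fit_multinomial` estimates the conditional probability table by counting, and `fit_linear_gaussian` fits the child as a linear function of its parents plus Gaussian noise by ordinary least squares, solving the normal equations directly (adequate for a handful of parents).

```python
from collections import Counter, defaultdict

def discretize(value, threshold=0.5):
    """Map an expression value (e.g. a log ratio) to -1 / 0 / +1 (assumed scheme)."""
    if value < -threshold:
        return -1
    if value > threshold:
        return 1
    return 0

def fit_multinomial(child, parents):
    """Maximum-likelihood P(child state | parent states) by counting.
    child: list of discrete states; parents: list of tuples of parent states."""
    counts = defaultdict(Counter)
    for c, pa in zip(child, parents):
        counts[pa][c] += 1
    return {pa: {c: n / sum(cnt.values()) for c, n in cnt.items()}
            for pa, cnt in counts.items()}

def fit_linear_gaussian(child, parents):
    """Least-squares fit of child = b0 + sum_j b_j * parent_j + N(0, sigma^2).
    parents: list of tuples of real-valued parent levels; assumes X^T X is invertible."""
    rows = [(1.0,) + tuple(p) for p in parents]  # prepend intercept column
    d = len(rows[0])
    # Normal equations A b = r with A = X^T X and r = X^T y.
    a = [[sum(x[i] * x[j] for x in rows) for j in range(d)] for i in range(d)]
    r = [sum(x[i] * y for x, y in zip(rows, child)) for i in range(d)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(d):
        piv = max(range(col, d), key=lambda i: abs(a[i][col]))
        a[col], a[piv] = a[piv], a[col]
        r[col], r[piv] = r[piv], r[col]
        for i in range(d):
            if i != col:
                f = a[i][col] / a[col][col]
                a[i] = [ai - f * ac for ai, ac in zip(a[i], a[col])]
                r[i] -= f * r[col]
    b = [r[i] / a[i][i] for i in range(d)]
    resid = [y - sum(bi * xi for bi, xi in zip(b, x)) for x, y in zip(rows, child)]
    sigma2 = sum(e * e for e in resid) / len(resid)  # ML estimate of the noise variance
    return b, sigma2

cpt = fit_multinomial([1, 1, 0, -1], [(1,), (1,), (1,), (0,)])
b, sigma2 = fit_linear_gaussian([1.0, 3.0, 5.0, 7.0], [(0.0,), (1.0,), (2.0,), (3.0,)])
print(cpt, b, sigma2)
```

The multinomial model can capture arbitrary (non-linear) combinatorial effects but loses information in discretization; the linear Gaussian model keeps the continuous values but can only express linear effects.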
Results • Applied to cell-cycle expression patterns. • 76 gene expression measurements. • Each measurement treated as an independent sample. • Used the bootstrap together with the sparse candidate search algorithm to extract learned features. • Performed on only 250 genes.
Test robustness • Tested the confidence assessment on a randomized data set, created by independently permuting the order of experiments for each gene. • Features learned from the randomized data received much lower confidence, because the permutation destroys any real dependencies between genes. • This indicates that high-confidence features are not artifacts of the bootstrap estimation.
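A sketch of the randomization used in this robustness test, on simulated data (not the paper's): each gene's values are shuffled independently across experiments, which preserves each gene's marginal distribution but destroys the dependencies between genes. A correlation that is strong in the original data should essentially vanish after permutation.

```python
import math
import random

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def permute_per_gene(columns, seed=0):
    """Shuffle the order of experiments independently for each gene (one list per gene)."""
    rng = random.Random(seed)
    shuffled = []
    for col in columns:
        col = list(col)  # copy so the original data is untouched
        rng.shuffle(col)
        shuffled.append(col)
    return shuffled

# Simulated data: 76 experiments, gene 1 closely tracks gene 0.
rng = random.Random(2)
g0 = [rng.gauss(0, 1) for _ in range(76)]
g1 = [x + rng.gauss(0, 0.3) for x in g0]
real = [g0, g1]
rand = permute_per_gene(real)
print(round(pearson(*real), 2), round(pearson(*rand), 2))
```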
Extracted plausible biological knowledge without using prior biological knowledge. • The framework builds a much “richer” structure from the data than clustering techniques do. • Capable of discovering candidate causal relationships between genes from expression data.