Reconstructing Gene Networks

Reconstructing Gene Networks • Presented by Andrew Darling • Based on the research article “Towards Reconstruction of Gene Networks from Expression Data by Supervised Learning” by Soinov, Krestyaninova, and Brazma

Outline • Why study another microarray algorithm? • Background info • Methods • Results • Discussion • Conclusion

Why study another microarray algorithm? • Study of microarray data continues • Still unclear on what the data means • Still unclear on how the genome works • Confirm existing knowledge about gene networks using existing datasets • Proof of concept in a new algorithm using existing knowledge and datasets • This algorithm actually explains its reasoning

Background information • What is a gene network? • What is supervised learning? • What are decision trees / classifiers? • Why use classifiers?

What is a gene network? • A model of genes affecting other genes • Which other genes affect a given gene • How other genes affect a given gene • Positive, negative, complicated • Several model types – graphs, nodes, edges • Boolean ( on – off ) • Bayesian network ( conditional probability ) • Differential equations ( derivatives, integrals )
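To make the Boolean ( on – off ) model type concrete, here is a minimal sketch of a three-gene Boolean network. The gene names A, B, C and their rules are hypothetical and chosen only for illustration; they do not come from the paper or from any real organism.

```python
from typing import Callable, Dict, List

State = Dict[str, bool]

# Hypothetical three-gene network; names and rules are illustrative only.
RULES: Dict[str, Callable[[State], bool]] = {
    "A": lambda s: not s["C"],         # C represses A (negative effect)
    "B": lambda s: s["A"],             # A activates B (positive effect)
    "C": lambda s: s["A"] and s["B"],  # C needs both A and B (combined effect)
}


def step(state: State) -> State:
    """Apply every rule to the current state at once (synchronous update)."""
    return {gene: rule(state) for gene, rule in RULES.items()}


def simulate(state: State, n_steps: int) -> List[State]:
    """Return the starting state followed by n_steps successive states."""
    trajectory = [state]
    for _ in range(n_steps):
        state = step(state)
        trajectory.append(state)
    return trajectory
```

Calling `simulate({"A": True, "B": False, "C": False}, 3)` walks the network forward three steps. The synchronous update is one common convention for Boolean networks and is an assumption of this sketch.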

Gene network - example

What is supervised learning? • The paper was unclear on the subject • Perhaps a reference to the type of algorithm used • It may have involved human interaction with the software • Possibly, the software produced the classifiers in the form of a decision tree, then users interpreted the output into classification rules

What are decision trees / classifiers? • Acyclic directed graph - tree • Each graph explains what other genes affect a specific gene • Inner nodes are gene products of other genes • Edges are thresholds of concentration of the gene products of the other genes – rules of the tree • Leaf nodes are effects on transcription of the specific gene • Each graph is a classifier for a specific gene
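The tree structure described above can be sketched as a small data structure: inner nodes test the expression of another gene against a threshold, and leaves hold the predicted state of the specific gene. The gene names G1 and G2, the thresholds, and the "up" / "down" labels are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    state: str  # predicted state of the target gene, e.g. "up" or "down"


@dataclass
class Node:
    gene: str         # other gene whose product is tested at this inner node
    threshold: float  # expression threshold labelling the two outgoing edges
    low: "Tree"       # branch followed when expression <= threshold
    high: "Tree"      # branch followed when expression > threshold


Tree = Union[Leaf, Node]


def classify(tree: Tree, expression: Dict[str, float]) -> str:
    """Follow edges from the root until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.high if expression[tree.gene] > tree.threshold else tree.low
    return tree.state


def rules(tree: Tree, path: Tuple[str, ...] = ()) -> List[str]:
    """Flatten the tree into one readable rule per leaf."""
    if isinstance(tree, Leaf):
        return [" and ".join(path) + " -> " + tree.state]
    low = rules(tree.low, path + (f"{tree.gene} <= {tree.threshold}",))
    high = rules(tree.high, path + (f"{tree.gene} > {tree.threshold}",))
    return low + high


# Hypothetical classifier for one target gene.
example = Node("G1", 0.5,
               low=Leaf("down"),
               high=Node("G2", 1.0, low=Leaf("up"), high=Leaf("down")))
```

`classify(example, {"G1": 0.8, "G2": 0.3})` follows the G1 "high" edge and then the G2 "low" edge, and `rules(example)` lists each root-to-leaf path as a classification rule.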

Classifiers – model of gene networks • Expression of gene is function of transcription • Transcription of gene is in discrete states • Expressed more than average • Expressed less than average • Transcription state affected by amount of other gene products (expression of other genes) • Use yeast cell cycle data to test algorithm and previous knowledge to judge accuracy
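The two discrete transcription states can be illustrated by comparing each measurement of a gene with that gene's average over the series. This is a minimal sketch: the labels "up" and "down" and the choice to count a value exactly equal to the average as "down" are assumptions, not details from the paper.

```python
from statistics import mean
from typing import List


def discretize(profile: List[float]) -> List[str]:
    """Label each value 'up' if above the profile's average, else 'down'."""
    avg = mean(profile)
    # Assumption: a value exactly equal to the average counts as "down".
    return ["up" if value > avg else "down" for value in profile]
```

For example, the profile `[0.2, 1.4, -0.5, 0.9]` has an average of 0.5, so the second and fourth time points are labelled "up" and the others "down".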

Why use classifiers? • The products affecting a specific gene are listed in the tree • Allows for continuous values for concentrations • Each additional dataset refines the decision information • Decision trees are easy to read and interpret
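One way a tree learner can work with continuous concentrations is to search for the threshold on a gene's expression that best separates the target gene's states. The sketch below scores candidate cut points by information gain. It is a simplified illustration of the idea, not the full C4.5 procedure, and taking the midpoint between neighbouring values as the cut is an assumption of this sketch.

```python
from collections import Counter
from math import log2
from typing import List, Optional, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_threshold(values: List[float],
                   labels: List[str]) -> Tuple[Optional[float], float]:
    """Return (cut, gain) for the cut point with the highest information gain."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_cut = 0.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point between identical values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        remainder = (len(left) * entropy(left)
                     + len(right) * entropy(right)) / len(pairs)
        gain = base - remainder
        if gain > best_gain:
            best_gain, best_cut = gain, cut
    return best_cut, best_gain
```

With expression values `[1, 2, 3, 4]` and target states `["down", "down", "up", "up"]`, the best cut falls between 2 and 3, where the two states separate perfectly.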

Classifier - example

Methods • Use induction algorithm to generate decision trees • Program called C4.5 • Apply program three ways • Regulation of target gene as a function of other genes at same time (simultaneous) • Regulation of target gene as a function of other genes at previous times (time delay) • Regulation as a function of change of other genes (changes)
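The three ways of applying the program differ mainly in how the training examples are assembled from a time series. The sketch below builds (features, label) pairs for each variant. The function names, the dictionary layout, and the one-step lag are hypothetical, and pairing a change in the other genes with the target state at the later time point is an assumption of this sketch rather than a detail confirmed by the slides.

```python
from typing import Dict, List, Tuple

Example = Tuple[Dict[str, float], str]


def simultaneous(regulators: List[Dict[str, float]],
                 target_states: List[str]) -> List[Example]:
    """Other genes at time t paired with the target state at time t."""
    return list(zip(regulators, target_states))


def time_delay(regulators: List[Dict[str, float]],
               target_states: List[str], lag: int = 1) -> List[Example]:
    """Other genes at time t - lag paired with the target state at time t."""
    return [(regulators[t - lag], target_states[t])
            for t in range(lag, len(target_states))]


def changes(regulators: List[Dict[str, float]],
            target_states: List[str]) -> List[Example]:
    """Change in other genes from t - 1 to t paired with the target state at t."""
    rows = []
    for t in range(1, len(target_states)):
        delta = {gene: regulators[t][gene] - regulators[t - 1][gene]
                 for gene in regulators[t]}
        rows.append((delta, target_states[t]))
    return rows
```

Each function returns a list that could be handed to a tree learner such as C4.5. The time-delay and changes variants produce one fewer example than the simultaneous variant because the first time point has no predecessor.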

Results - given • These genes • and yeast datasets • Spellman, Cho, … • Cdc28 • Alpha-factor

Results – produced this

Results – with this accuracy

Discussion • Accuracy ranged between 70% and 94% on systems with known interactions, which raises some concern • Does that imply that the microarray data are wrong or that the algorithm is flawed?

Conclusions • Decision trees and classifiers seem a better way to explain gene expression • This paper did not do a good job of explaining how to make / use them • Reference to the algorithm itself was almost specious
