Learning Bayesian Networks with microarray data
Goal: use well-known Bayesian network learning algorithms to analyze microarray data
Challenge in microarray data analysis techniques
• Prior techniques (clustering, PCA, SVM):
  • Group together genes with similar expression patterns
  • Do not reveal structural relations between genes
• The challenge:
  • Extract meaningful information from the expression data
  • Discover interactions between genes based on the measurements
Use of Bayesian networks in microarray data analysis
[Diagram: Expression profiles → Constructed Bayesian network → three uses]
• Sample classification: disease diagnosis
• Gene-gene relation analysis: activation or inhibition
• Gene regulatory network analysis: global view on the relations among genes
Bayesian networks: a short example
[Diagram nodes: Clean spark plug, Fuel, Fuel meter, Start]
• Evidence: my car does not start.
• Reasoning: an empty tank and dirty spark plugs both become more probable, and therefore the probability that the fuel meter reads empty also increases.
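The reasoning in the car example can be checked numerically. The following is a minimal sketch using inference by enumeration. The arcs (Fuel → Fuel meter, Fuel → Start, Clean spark plug → Start) and every probability value are assumptions made up for illustration; none of them come from the slides.

```python
from itertools import product

# All numbers below are illustrative assumptions, not values from the slides.
p_fuel = {True: 0.9, False: 0.1}          # P(tank contains fuel)
p_clean = {True: 0.9, False: 0.1}         # P(spark plug is clean)
p_meter_full = {True: 0.95, False: 0.05}  # P(meter shows fuel | fuel)
p_start = {                                # P(car starts | fuel, clean)
    (True, True): 0.99,
    (True, False): 0.01,
    (False, True): 0.0,
    (False, False): 0.0,
}

NAMES = ("fuel", "clean", "meter", "start")


def joint(fuel, clean, meter, start):
    """Joint probability as the product of each node given its parents."""
    p = p_fuel[fuel] * p_clean[clean]
    p *= p_meter_full[fuel] if meter else 1 - p_meter_full[fuel]
    ps = p_start[(fuel, clean)]
    p *= ps if start else 1 - ps
    return p


def query(target, evidence=None):
    """P(target | evidence), summing the joint over all consistent worlds."""
    evidence = evidence or {}
    num = den = 0.0
    for values in product([True, False], repeat=len(NAMES)):
        world = dict(zip(NAMES, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(**world)
        den += p
        if all(world[k] == v for k, v in target.items()):
            num += p
    return num / den


no_start = {"start": False}
for name in ("fuel", "clean", "meter"):
    before = query({name: False})
    after = query({name: False}, no_start)
    print(f"P({name}=False): prior {before:.3f}, given no start {after:.3f}")
```

Under these assumed numbers, all three posteriors rise once "does not start" is observed, which is the pattern the slide describes. Enumeration is exponential in the number of nodes, so it only suits toy networks like this one.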
Bayesian networks: a short example
The Bayesian directed acyclic graph describes the joint probability P(X_1, X_2, …, X_n):
$$P(X_1, X_2, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa(X_i))$$
where Pa(X_i) are the parents of node X_i.
[Diagram: Fuel → Fuel meter standing, with the arc labeled P(FMS|F)]
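As a worked instance, the factorization can be written out for the car example. This assumes the arcs Fuel → Fuel meter standing, Fuel → Start and Clean spark plug → Start; only the first arc is shown on this slide, the other two are inferred from the reasoning on the previous one. F, SP, FMS and S abbreviate Fuel, Clean spark plug, Fuel meter standing and Start.

```latex
P(F, SP, FMS, S) = P(F)\, P(SP)\, P(FMS \mid F)\, P(S \mid F, SP)
```

If all four variables are binary, this needs 1 + 1 + 2 + 4 = 8 parameters, compared with 2^4 - 1 = 15 for an unrestricted joint table.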
Learning the gene network with Bayesian methods
• Deals with noisy data
• Has a good statistical foundation
• Compact and intuitive representation
• The number of possible DAGs with 10 nodes is about 4.2 × 10^18
• #samples << #features in microarray experiments
• The graph must be acyclic
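The size of the search space can be reproduced with Robinson's recurrence for the number of labeled DAGs. This is a small sketch; the function name `count_dags` is chosen here for illustration.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labeled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes that have no incoming arcs.
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


for n in (3, 5, 10):
    print(n, count_dags(n))
```

For 10 nodes the result is 4175098976430598143, about 4.2 × 10^18, which is why exhaustive search over structures is not an option even for small gene sets.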
Already achieved by using advanced algorithms
Friedman used a specialized learning method (the Sparse Candidate algorithm, SCA), permuted the dataset to learn 200 networks, and selected some special features from these networks to create a final network:
• Dominant genes
• Functionally related pairs
• Clusters of dominated genes
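The step of selecting features across many learned networks can be sketched as a frequency count. This is not Friedman's implementation: it assumes each learned network is already available as a set of directed edges, and the gene names and the 0.8 threshold are hypothetical.

```python
from collections import Counter


def edge_confidence(networks):
    """Fraction of the learned networks in which each directed edge occurs."""
    counts = Counter(edge for net in networks for edge in set(net))
    return {edge: c / len(networks) for edge, c in counts.items()}


def confident_edges(networks, threshold=0.8):
    """Edges that occur in at least `threshold` of the networks."""
    return {e for e, c in edge_confidence(networks).items() if c >= threshold}


# Hypothetical result of learning four networks on resampled data.
networks = [
    {("geneA", "geneB"), ("geneB", "geneC")},
    {("geneA", "geneB")},
    {("geneA", "geneB"), ("geneC", "geneB")},
    {("geneA", "geneB"), ("geneB", "geneC")},
]
print(edge_confidence(networks))
print(confident_edges(networks))
```

An edge that appears in nearly every network is more likely to reflect structure in the data than an artifact of one particular sample, which is the motivation for learning many networks rather than one.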
My results using less advanced methods: reproducing Page
• Data set: 74 myeloma samples and 31 healthy samples (Affymetrix)
• Genes selected and discretized on the basis of entropy (information gain)
• The learned 'Markov blanket' used to classify examples is a naive Bayesian network
• 100% classification score
• Only 15 out of 30 genes needed
Problem: we compare ill vs. healthy samples, which differ strongly
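Entropy-based selection and discretization can be illustrated with a single binary split per gene. This is a minimal sketch and not necessarily the procedure used for these results: it picks the threshold that maximizes information gain with respect to the class label. The expression values and labels are made up.

```python
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        result -= p * log2(p)
    return result


def best_split(values, labels):
    """Threshold with the highest information gain, and that gain."""
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    best_gain, best_thr = 0.0, None
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold fits between equal values
        thr = (lo + hi) / 2
        left = [lab for v, lab in pairs if v <= thr]
        right = [lab for v, lab in pairs if v > thr]
        remainder = (len(left) * entropy(left)
                     + len(right) * entropy(right)) / len(pairs)
        gain = base - remainder
        if gain > best_gain:
            best_gain, best_thr = gain, thr
    return best_thr, best_gain


# Made-up expression levels of one gene in six samples.
values = [1.0, 1.2, 0.9, 3.1, 3.4, 2.9]
labels = ["healthy"] * 3 + ["myeloma"] * 3
print(best_split(values, labels))
```

Ranking genes by their best gain gives a selection criterion, and the chosen threshold turns each gene into a two-valued variable that a discrete Bayesian network learner can use.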
My results: van 't Veer experiment
• The 70 metastasis-predicting genes found by van 't Veer in breast cancer samples are used to learn a network
• Two networks are learned:
  • Markov blanket to classify: only 16 of the 70 genes are needed to score 95% correct (van 't Veer scores 84%!)
  • PDAG (partially directed acyclic graph): an 'interesting' global network, but its significance is not clear
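The reason a Markov blanket can classify with a subset of the genes is that, given its blanket (parents, children, and the children's other parents), a node is conditionally independent of every other node in the network. The sketch below reads the blanket off a DAG; the representation (a dict mapping each node to its set of parents) and the toy network are assumptions for illustration.

```python
def markov_blanket(dag, target):
    """Parents, children and co-parents of `target`.

    `dag` maps each node to the set of its parents.
    """
    parents = set(dag.get(target, ()))
    children = {node for node, ps in dag.items() if target in ps}
    co_parents = {p for child in children for p in dag[child]} - {target}
    return parents | children | co_parents


# Hypothetical network: g4 is connected to g3 but lies outside the blanket.
dag = {
    "outcome": {"g1"},
    "g1": set(),
    "g2": {"outcome", "g3"},
    "g3": set(),
    "g4": {"g3"},
}
print(markov_blanket(dag, "outcome"))
```

In this toy network only g1, g2 and g3 are needed to predict the outcome node; g4 adds nothing once they are known.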
Further plans
• Use other Bayesian network learners and try to assess the significance and robustness of the resulting networks
• Discretization methods have a large influence on the resulting network: try different methods
• Gene selection method: use prior knowledge to select a group of genes (pathways)
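How much the discretization method matters can be seen on a single gene with one outlier. The sketch below compares two common schemes, equal-width and equal-frequency binning; these are generic examples and not necessarily the methods planned here, and the values are made up.

```python
def equal_width(values, bins=3):
    """Bin index per value, using intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back when all values are equal
    return [min(int((v - lo) / width), bins - 1) for v in values]


def equal_frequency(values, bins=3):
    """Bin index per value, with (roughly) the same count in each bin."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank * bins // len(values)
    return result


# Made-up expression levels with one outlier.
values = [0.1, 0.2, 0.3, 0.4, 0.5, 9.0]
print(equal_width(values))      # the outlier gets a bin of its own
print(equal_frequency(values))  # the values are spread over all bins
```

The same gene ends up as a nearly constant variable under one scheme and as an evenly spread one under the other, so any dependencies a learner finds for it can differ accordingly.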
Conclusion
Experiment for a few more months!