
Predicting gene functions using hierarchical multi-label decision tree ensembles

Celine Vens, Leander Schietgat, Jan Struyf, Hendrik Blockeel, Dragi Kocev, Sašo Džeroski. K.U.Leuven Department of Computer Science. Hierarchical Multi-Label Classification (HMC) for Gene Function Prediction.


Presentation Transcript


  1. Predicting gene functions using hierarchical multi-label decision tree ensembles. Celine Vens, Leander Schietgat, Jan Struyf, Hendrik Blockeel, Dragi Kocev, Sašo Džeroski. K.U.Leuven Department of Computer Science

  2. Hierarchical Multi-Label Classification (HMC) for Gene Function Prediction
     • Classification: a common machine learning task, e.g.,
       • Given: genes with known function
       • Task: predict function for new genes
     • Special case: hierarchical multi-label classification (HMC)
       • a gene can have multiple functions
       • functions are organized in a hierarchy
         • tree (e.g., MIPS FunCat)
         • DAG (e.g., Gene Ontology)
     • Hierarchy constraint: if a gene is labeled with function X, then it is also labeled with all parents of X
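The hierarchy constraint on this slide can be illustrated with a minimal Python sketch for the tree case. It assumes class labels are written as slash-separated paths (in the style of the labels 5/1 and 40/3 shown on a later slide), so that the ancestors of a label are its path prefixes. The function names are hypothetical, and the DAG case (e.g., Gene Ontology) would need an explicit parent map instead.

```python
def ancestors(label):
    """Return the proper ancestors of a slash-separated label, root first.

    Assumption: a label such as "40/3" has the single parent "40".
    """
    parts = label.split("/")
    return ["/".join(parts[:i]) for i in range(1, len(parts))]


def close_under_hierarchy(labels):
    """Add every ancestor of every label, so the set obeys the constraint."""
    closed = set(labels)
    for label in labels:
        closed.update(ancestors(label))
    return closed


def satisfies_constraint(labels):
    """True if the label set already contains all ancestors of its members."""
    return close_under_hierarchy(labels) == set(labels)


gene_labels = {"40/3", "5/1"}
print(sorted(close_under_hierarchy(gene_labels)))
print(satisfies_constraint(gene_labels))
```

Here a gene labeled only with 40/3 and 5/1 violates the constraint until 40 and 5 are added.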

  3. Predictions in Functional Genomics
     • S. cerevisiae (13 datasets) and A. thaliana (12 datasets)
       • two of biology’s model organisms
       • most genes are annotated, ideal for testing purposes
       • method can be applied to other organisms
     • Data
       • based on sequence statistics, phenotype, secondary structure, homology, microarray data, …

  4. Predictive Clustering Trees
     • Our focus is on decision trees
     • Advantages: fast to build, noise-resistant, fast to apply, accurate predictions, easy to interpret, …
     • General framework: predictive clustering trees (PCTs)
     [Figure: Input (table of genes G1, G2, … with features A1, A2, …, An and known functions marked in class columns 1, …, 5, 5/1, …, 40, 40/3, 40/16, …) → Algorithm (PCT-algo: top-down induction of PCTs) → Output (PCT)]

  5. Decision Trees for HMC: Different Approaches
     • Our approach: learns one single tree for all classes
     • Special-purpose approach: learns one tree per class + hierarchy constraint
     • Standard approach: learns one tree per class

  6. Predictive Clustering Forests
     • Ensembles
       • Less interpretability
       • Better performance
     • Algorithm: Clus-HMC-Ens
     [Figure: Training set → 50 bootstrap replicates → Clus-HMC runs (L1, L2, L3, …, Ln) → 50 PCTs; Test set → 50 predictions → combined prediction]
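The ensemble scheme on this slide (bootstrap replicates, one model per replicate, a combined prediction) can be sketched in Python as follows. This is only an illustration under stated assumptions: the base learner `fit_stub` is a hypothetical stand-in that predicts per-class label frequencies and is not Clus-HMC, and combining the 50 predictions by averaging is an assumption, since the slide only says "combined prediction".

```python
import random


def fit_stub(sample, classes):
    """Hypothetical stand-in learner (NOT Clus-HMC).

    It ignores the features and predicts, for each class, the fraction of
    training genes in its sample that carry that class.
    """
    n = len(sample)
    freq = {c: sum(c in labels for _, labels in sample) / n for c in classes}
    return lambda features: freq


def fit_ensemble(train, classes, n_models=50, seed=0):
    """Train one model per bootstrap replicate (sampling with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        replicate = [rng.choice(train) for _ in train]
        models.append(fit_stub(replicate, classes))
    return models


def predict_ensemble(models, features, classes):
    """Combine per-model scores by averaging (an assumed combination rule)."""
    return {c: sum(m(features)[c] for m in models) / len(models) for c in classes}


# Toy training set: (features, set of class labels) per gene.
train = [([0.1], {"40", "40/3"}), ([0.7], {"40"}), ([0.4], {"5", "40"})]
classes = ["5", "40", "40/3"]
models = fit_ensemble(train, classes)
print(predict_ensemble(models, [0.5], classes))
```

The result is a score per class in [0, 1], which can then be thresholded to obtain a label set.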

  7. Decision Trees for HMC: Different Approaches
     • Our approach: learns one single tree for all classes
     • Variant of our approach: learns forest
     • Special-purpose approach: learns one tree per class + hierarchy constraint
     • Standard approach: learns one tree per class

  8. Evaluation
     • Evaluation: precision-recall
       • precision: percentage of predicted functions that are correct (TP/(TP+FP))
       • recall: percentage of actual functions predicted by the algorithm (TP/(TP+FN))
     • Average PR curve
       • Consider (instance, class) couples
       • A couple is true if the instance has the class, and predicted true if the instance is predicted to have the class
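The precision and recall definitions over (instance, class) couples can be written out as a short Python sketch. It assumes each couple has a predicted score and that a couple counts as predicted true when its score reaches a threshold, so that sweeping the threshold traces a PR curve. The function names and the convention for empty denominators are assumptions, not taken from the slides.

```python
def precision_recall(scores, truth, threshold):
    """Precision and recall pooled over all (instance, class) couples.

    scores: dict mapping (instance, class) -> predicted score
    truth:  set of (instance, class) couples that are actually true
    """
    predicted = {couple for couple, s in scores.items() if s >= threshold}
    tp = len(predicted & truth)   # predicted true and actually true
    fp = len(predicted - truth)   # predicted true but actually false
    fn = len(truth - predicted)   # actually true but not predicted
    # Assumed convention: an empty denominator yields 1.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


def pr_curve(scores, truth, thresholds):
    """One (precision, recall) point per threshold."""
    return [precision_recall(scores, truth, t) for t in thresholds]


scores = {("g1", "40"): 0.9, ("g1", "5"): 0.4, ("g2", "40"): 0.6, ("g2", "5"): 0.2}
truth = {("g1", "40"), ("g2", "5")}
print(pr_curve(scores, truth, [0.1, 0.5, 0.95]))
```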

  9. [Figure panels: S. cerevisiae-FunCat (hom), A. thaliana-GO (seq), S. cerevisiae-FunCat (expr), A. thaliana-GO (interpro)]
     • Clus-HMC-Ens better than Clus-HMC (average AUC improvement of 7%)
     • Clus-HMC better than C4.5H (state-of-the-art system for HMC) (at the same recall as C4.5H, average precision improvement of 20.9%)

  10. Comparison with SVMs (Barutcuoglu et al.)
     • Learn one SVM per class
     • Correct for hierarchy constraint (HC) violations with a Bayesian model

  11. Conclusions
     • Clus-HMC outperforms (or is comparable to) state-of-the-art methods on functional genomics tasks
     • Ensembles of Clus-HMC are able to boost performance, if the user is willing to give up on interpretability
     • “Revenge of the decision trees”
