
INFOBIOMED Activity Report 12/02/2007

Mattia CF Prosperi Informa CRO Università degli studi “Roma TRE”. INFOBIOMED Activity Report 12/02/2007. Summary. HIV-1 tools Sub-typing Phylogenetic analysis BLAST classification and search Mutation extractor Various features Genotype/phenotype prediction Co-receptor usage prediction


Presentation Transcript


  1. Mattia CF Prosperi • Informa CRO • Università degli studi “Roma TRE” • INFOBIOMED Activity Report • 12/02/2007

  2. Summary • HIV-1 tools • Sub-typing • Phylogenetic analysis • BLAST classification and search • Mutation extractor • Various features • Genotype/phenotype prediction • Co-receptor usage prediction • Replication capacity prediction • In-vivo therapy optimisation • Database integration • Dissemination • Perspectives

  3. Machine Learners • Multiple Linear Regression (MLR) or logistic regression • Linear functions are desirable • Decision Trees (DT) • Understandable • Non-linear • Poor predictive power (better with boosting/bagging) • Support Vector Machines (SVM) • Linear and non-linear kernels • Good generalisation • Black box • Neural Networks • Universal approximators • Overtraining problems • Fuzzy Systems • ANFIS: universal approximator • Interpretability of rules • Problems with input space modelling

  4. Feature selection • Filter, Wrapper, Embedded • Filter: univariate analysis • Embedded: Trees (splits), RFE • Wrapper: optimises the error by searching the power set of features • Wrapper searches: • Forward, backward, random, simulated annealing, simplex, genetic algorithms • Bubble selection: • new idea! • Accomplishes MDL (minimum description length)
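The forward wrapper search named on this slide can be sketched as a greedy loop. This is a minimal illustration, not the author's implementation: `forward_selection` and `error_fn` are hypothetical names, and `error_fn` stands in for whatever cross-validated error the wrapped learner reports on a given feature subset.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def forward_selection(
    features: Sequence[str],
    error_fn: Callable[[FrozenSet[str]], float],
) -> Tuple[List[str], float]:
    """Greedy forward wrapper search.

    Starts from the empty set and, at each step, adds the single feature
    that lowers the error most. Stops when no addition improves the error.
    error_fn is an assumed callback returning the (e.g. cross-validated)
    error of a model trained on the given feature subset.
    """
    selected: List[str] = []
    remaining = list(features)
    best_err = error_fn(frozenset())
    while remaining:
        # Score every one-feature extension of the current subset.
        trials = [(error_fn(frozenset(selected + [f])), f) for f in remaining]
        err, feat = min(trials, key=lambda t: t[0])
        if err >= best_err:
            break  # no candidate improves the error
        selected.append(feat)
        remaining.remove(feat)
        best_err = err
    return selected, best_err


def toy_error(subset: FrozenSet[str]) -> float:
    """Toy error: features 'a' and 'b' help, every feature costs a small penalty."""
    return 1.0 - 0.4 * ("a" in subset) - 0.3 * ("b" in subset) + 0.05 * len(subset)
```

Calling `forward_selection(["a", "b", "c"], toy_error)` explores at most a quadratic number of subsets instead of the full power set, which is the usual reason for preferring a greedy search over an exhaustive one.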

  5. Validation • Loss functions: • classification • Cross-entropy • Misclassification error • AUC • regression • RMSE • correlation • Multiple runs of k-fold cross validation • Construction of Gaussian error distribution • Model comparison with adj. Student’s t-test • Alternative: rank sum test on leave-one-out cross validation?
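The slide does not say which adjustment is applied to the Student's t-test. One commonly used choice for repeated k-fold cross-validation is the corrected resampled t-test of Nadeau and Bengio, which inflates the variance to account for overlapping training sets. The sketch below assumes that variant; the function name and arguments are illustrative.

```python
import math
import statistics
from typing import Sequence


def corrected_resampled_t(
    diffs: Sequence[float], n_train: int, n_test: int
) -> float:
    """Corrected resampled t statistic (assumed variant of the 'adj.' t-test).

    diffs   : per-fold differences in loss between two models, collected
              over all folds of all cross-validation runs
    n_train : number of training examples in each fold
    n_test  : number of test examples in each fold
    The statistic is compared with a t distribution with len(diffs) - 1
    degrees of freedom.
    """
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two fold differences")
    mean_d = statistics.fmean(diffs)
    var_d = statistics.variance(diffs)  # sample variance (n - 1 denominator)
    if var_d == 0.0:
        raise ValueError("zero variance: statistic undefined")
    # The naive test uses var_d / n; the correction adds n_test / n_train.
    return mean_d / math.sqrt((1.0 / n + n_test / n_train) * var_d)
```

The corrected statistic is always smaller in magnitude than the naive one, so it is less likely to declare a difference between models significant when the folds share most of their training data.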

  6. In-vitro: genotype → phenotype • Multiple Linear Regression • SVD • Stepwise feature selection • Chi-squared filter • Fuzzy Regression • Fuzzy filter

  7. In-vitro: Replication Capacity • Univariate analysis (chi-square, Wilcoxon) • Multivariate analysis (PCA, MLR) • Wrapper feature selection, embedded RFE over SVM • DT, IBR, SVM • Multiple cross-validation
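As an illustration of the univariate chi-square step, the sketch below ranks mutations by the chi-square statistic of a 2x2 table (mutation present/absent against a binary outcome). It assumes the outcome has been dichotomised, which the slide does not state; the function names and the toy mutation labels are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic of the table [[a, b], [c, d]],
    without continuity correction. Returns 0.0 if a margin is empty."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def rank_mutations(
    presence: Dict[str, Sequence[bool]], outcome: Sequence[bool]
) -> List[Tuple[str, float]]:
    """Rank mutations by chi-square association with a binary outcome.

    presence maps a mutation label to one flag per sample;
    outcome holds one flag per sample (assumed dichotomised).
    """
    scores = []
    for name, flags in presence.items():
        a = sum(1 for f, y in zip(flags, outcome) if f and y)
        b = sum(1 for f, y in zip(flags, outcome) if f and not y)
        c = sum(1 for f, y in zip(flags, outcome) if not f and y)
        d = sum(1 for f, y in zip(flags, outcome) if not f and not y)
        scores.append((name, chi_square_2x2(a, b, c, d)))
    return sorted(scores, key=lambda t: t[1], reverse=True)
```

A filter of this kind scores each mutation independently, so it is cheap but blind to interactions, which is why the slide pairs it with multivariate and wrapper methods.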

  8. In-vitro: Co-Receptor • Method: local smoothing kernel (IBR) • k-Nearest Neighbour algorithm • BLAST-based distance function • Similarity weighted linear kernel • Nadaraya-Watson evaluation (weighted mean) • Multiple cross validation • Parameter optimisation through AUC maximisation
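The combination of k-nearest neighbours with a similarity-weighted Nadaraya-Watson estimate can be sketched as follows. The slide uses a BLAST-based distance; here a simple fraction-of-identical-positions similarity stands in for it, purely as an assumption to keep the example self-contained. The labels (1.0 and 0.0 for the two co-receptor classes) and function names are also illustrative.

```python
from typing import Callable, List, Sequence, Tuple


def identity_similarity(a: str, b: str) -> float:
    """Stand-in similarity (NOT BLAST): fraction of identical positions."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def nw_knn_predict(
    query: str,
    train: Sequence[Tuple[str, float]],
    k: int,
    similarity: Callable[[str, str], float] = identity_similarity,
) -> float:
    """Nadaraya-Watson estimate over the k most similar training sequences.

    The prediction is the similarity-weighted mean of the neighbours'
    labels (a linear kernel on similarity). train holds (sequence, label)
    pairs with numeric labels.
    """
    scored: List[Tuple[float, float]] = sorted(
        ((similarity(query, seq), label) for seq, label in train),
        key=lambda t: t[0],
        reverse=True,
    )[:k]
    total = sum(w for w, _ in scored)
    if total == 0.0:
        # All neighbours have zero similarity: fall back to a plain mean.
        return sum(y for _, y in scored) / len(scored)
    return sum(w * y for w, y in scored) / total
```

The output is a score between the two label values, so k (and any kernel parameter) can be tuned by maximising AUC over cross-validation folds, as the slide describes.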

  9. In-vivo • Predicting the actual viral load changes following treatment switches is a challenging task. Both the individual variability of immune responses to infections and the large number of possible therapies add much noise to the system and make it quite complex. Other treatment-related factors such as pharmacokinetics and patient adherence to therapy play a crucial role in the control of virus replication and the development of resistance.

  10. Standard Datum • Treatment switch constraints (clinical trial-like): • Therapy switch drugs • Baseline RNA/CD4 [-40,+7] from start_date • Baseline sequence [-90,+15] from start_date (<= stop_date) • Follow-up RNA/CD4 [+30,+90] from start_date (<= stop_date) • Notes: noise is a problem • There are “strange” mono-/bi-therapies (e.g. only one PI) • Some sequences are badly recorded • Still problems with RNA/CD4 “0” values • Data extraction leads to a large reduction of the data set
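The window constraints above can be sketched as a small selection helper. Two assumptions are made that the slide does not state: the window offsets are in days, and when several measurements fall inside a window the one closest to start_date is kept. `pick_in_window` is a hypothetical name.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

Measurement = Tuple[date, float]  # (sampling date, value)


def pick_in_window(
    measurements: List[Measurement],
    start_date: date,
    lo_days: int,
    hi_days: int,
    stop_date: Optional[date] = None,
) -> Optional[Measurement]:
    """Return the measurement inside [start + lo_days, start + hi_days].

    If stop_date is given, the window is additionally capped at stop_date.
    Among several candidates the one closest to start_date is returned
    (an assumed tie-breaking rule); None means the datum is incomplete.
    """
    lo = start_date + timedelta(days=lo_days)
    hi = start_date + timedelta(days=hi_days)
    if stop_date is not None:
        hi = min(hi, stop_date)
    candidates = [m for m in measurements if lo <= m[0] <= hi]
    if not candidates:
        return None
    return min(candidates, key=lambda m: abs((m[0] - start_date).days))
```

For example, `pick_in_window(rna, start, -40, 7)` would select the baseline RNA value and `pick_in_window(rna, start, 30, 90, stop_date=stop)` the follow-up value. A treatment switch for which either call returns None would be discarded, which is one way the extraction shrinks the data set.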

  11. Feature derivation • Phenotypes: calculated through MLR • Accumulation of resistance mutations • List of resistance-associated mutations taken from IAS-USA • Mutagenetic trees • Activity score • Function of compounds active in-vitro: for now it is simply the sum of phenotype values for the drugs present in the therapy
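Two of the derived features can be sketched directly from the slide: the activity score as a plain sum of predicted phenotype values over the drugs in the therapy, and the accumulation of resistance mutations as a count against a reference list. Drug names, mutation labels, and phenotype values in the usage below are placeholders, not real data or the actual IAS-USA list.

```python
from typing import Iterable, Mapping


def activity_score(therapy: Iterable[str], phenotype: Mapping[str, float]) -> float:
    """Sum of the predicted phenotype values of the drugs in the therapy.

    phenotype maps a drug name to its predicted phenotype value
    (for instance the output of the MLR genotype-to-phenotype model).
    """
    drugs = list(therapy)
    missing = [d for d in drugs if d not in phenotype]
    if missing:
        raise KeyError(f"no phenotype prediction for: {missing}")
    return sum(phenotype[d] for d in drugs)


def resistance_mutation_count(
    sequence_mutations: Iterable[str], resistance_list: Iterable[str]
) -> int:
    """Number of mutations in the sequence that appear in the reference list."""
    return len(set(sequence_mutations) & set(resistance_list))
```

With placeholder values `{"drug_a": 0.5, "drug_b": 1.5, "drug_c": 2.0}`, a therapy of `drug_a` and `drug_c` scores 2.5. A plain sum ignores interactions between compounds, which is presumably why the slide flags it as a provisional choice.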

  12. Optional attributes • Drug history: for each drug: • Total time of use • Time since last use • Exponential decreasing function • CD4/RNA ratio • Subtype, risk group, age, sex, ethnicity… • Sub-optimal treatment, naive status, therapy line • Some can reduce the data set size (or missing values must be handled)
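The per-drug history attributes can be sketched as below. The slide only names an "exponential decreasing function"; the form used here, a weight that halves every `half_life_days` since the last use, is an assumption, as are the function name, the default half-life, and the requirement that exposure periods do not overlap.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple


def drug_history_features(
    periods: List[Tuple[date, date]],
    reference_date: date,
    half_life_days: float = 365.0,
) -> Dict[str, Optional[float]]:
    """Summarise past exposure to one drug before reference_date.

    periods holds non-overlapping (start, stop) exposure intervals.
    Returns the total days of use, the days since last use, and an
    exponentially decaying weight (assumed form: 0.5 ** (days / half-life)).
    """
    # Keep only exposure that happened before the reference date.
    past = [(s, min(e, reference_date)) for s, e in periods if s < reference_date]
    if not past:
        return {"total_days": 0, "days_since_last_use": None, "decay_weight": 0.0}
    total_days = sum((e - s).days for s, e in past)
    days_since = (reference_date - max(e for _, e in past)).days
    return {
        "total_days": total_days,
        "days_since_last_use": days_since,
        "decay_weight": 0.5 ** (days_since / half_life_days),
    }
```

A drug never taken before the reference date yields a weight of 0.0 and no "days since last use" value, which is one instance of the missing-value handling the slide mentions.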

  13. In-vivo: classification results • Boosted/Bagged Decision Trees, SVM, local smoothing kernel methods

  14. In-vivo: Fuzzy regression • Fuzzy relations to associate resistance/susceptibility factors between mutations and drugs • Fuzzy formulae to calculate combined treatments activities • Differential equations as de-fuzzifiers • Fuzzy feature selection

  15. Publications • Fuzzy modelling • “Evolutionary Fuzzy Modelling for Drug Resistant HIV-1 Treatment Optimisation” (book chapter accepted, Springer) • In-vivo modelling • “Statistical Comparison of Machine Learning Techniques for Treatment Optimisation of Drug-Resistant HIV-1” (submitted to IEEE-CBMS2007) • In-vitro Co-Receptor usage • “HIV-1 Coreceptor Usage Prediction via Indexed Local Kernel Smoothing Methods and Grid-Based Multiple Statistical Validation” (submitted to IEEE-CBMS2007)

  16. Perspectives • Complex networks • Macroscopic evolution (subtype) • Microscopic evolution (resistance mutations)
