Mattia CF Prosperi, Informa CRO, Università degli studi “Roma TRE” • INFOBIOMED Activity Report • 12/02/2007
Summary • HIV-1 tools • Sub-typing • Phylogenetic analysis • BLAST classification and search • Mutation extractor • Various features • Genotype/phenotype prediction • Co-receptor usage prediction • Replication capacity prediction • In-vivo therapy optimisation • Database integration • Dissemination • Perspectives
Machine Learners • Multiple Linear Regression (MLR) or Logistic Regression • Linear functions are desirable • Decision Trees (DT) • Understandable • Non-linear • Poor predictive power (better with boosting/bagging) • Support Vector Machines (SVM) • Linear and non-linear kernels • Good generalisation • Black box • Neural Networks • Universal approximators • Overtraining problems • Fuzzy Systems • ANFIS: universal approximator • Interpretability of rules • Problems with input space modelling
Feature selection • Filter, Wrapper, Embedded • Filter: univariate analysis • Embedded: Trees (splits), RFE • Wrapper: minimises error by searching the power set of features • Wrapper searches: • Forward, backward, random, simulated annealing, simplex, genetic algorithms • Bubble selection: • new idea! • Accomplishes MDL
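The forward variant of the wrapper search can be sketched as a greedy loop over feature subsets. This is a minimal illustration, not the method used in the project: `toy_error` is a hypothetical stand-in for a cross-validated error estimate, and the mutation names and numbers are placeholders.

```python
from typing import Callable, FrozenSet, List, Sequence


def forward_selection(
    features: Sequence[str],
    error: Callable[[FrozenSet[str]], float],
) -> List[str]:
    """Greedy forward wrapper: add the feature that lowers the error most."""
    selected: List[str] = []
    remaining = list(features)
    best_err = error(frozenset())
    while remaining:
        # Score every one-feature extension of the current subset.
        candidates = [(error(frozenset(selected + [f])), f) for f in remaining]
        err, feat = min(candidates)
        if err >= best_err:
            break  # no extension improves the error: stop
        selected.append(feat)
        remaining.remove(feat)
        best_err = err
    return selected


def toy_error(subset: FrozenSet[str]) -> float:
    """Hypothetical error: two informative features, small cost per feature."""
    gain = {"M184V": 0.30, "K103N": 0.20}  # placeholder values
    return 1.0 - sum(gain.get(f, 0.0) for f in subset) + 0.01 * len(subset)


chosen = forward_selection(["L90M", "M184V", "K103N", "T215Y"], toy_error)
print(chosen)
```

In practice `error` would wrap a learner and a validation scheme; backward search starts from the full set and removes features instead.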
Validation • Loss functions: • classification • Cross-entropy • Misclassification error • AUC • regression • RMSE • correlation • Multiple runs of k-fold cross-validation • Construction of Gaussian error distribution • Model comparison with adjusted Student’s t-test • Alternative: rank sum test on leave-one-out cross-validation?
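The repeated k-fold scheme and the model comparison can be sketched as follows. The slide does not say which adjustment of the t-test is meant; this sketch assumes the common variance correction for overlapping training sets, which inflates the variance by the test/train size ratio. The function names are illustrative.

```python
import math
import random
import statistics
from typing import Iterator, List, Sequence, Tuple


def repeated_kfold_indices(
    n_samples: int, k: int, runs: int, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for `runs` reshuffled k-fold splits."""
    rng = random.Random(seed)
    for _ in range(runs):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for i, test in enumerate(folds):
            train = [j for m, fold in enumerate(folds) if m != i for j in fold]
            yield train, test


def corrected_t_statistic(
    diffs: Sequence[float], n_train: int, n_test: int
) -> float:
    """t statistic on per-fold error differences between two models.

    Assumed correction: variance scaled by (1/n + n_test/n_train),
    where n is the number of (run, fold) differences.
    """
    n = len(diffs)
    var = statistics.variance(diffs)
    if var == 0:
        raise ValueError("all differences are identical; t is undefined")
    return statistics.mean(diffs) / math.sqrt((1.0 / n + n_test / n_train) * var)
```

The statistic is compared against a Student's t distribution with n - 1 degrees of freedom, where each entry of `diffs` is the loss of model A minus the loss of model B on the same fold.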
In-vitro: genotype → phenotype • Multiple Linear Regression • SVD • Stepwise feature selection • Chi-squared filter • Fuzzy Regression • Fuzzy filter
In-vitro: Replication Capacity • Univariate analysis (chi-squared, Wilcoxon) • Multivariate analysis (PCA, MLR) • Wrapper feature selection, embedded RFE over SVM • DT, IBR, SVM • Multiple cross-validation
In-vitro: Co-Receptor • Method: local smoothing kernel (IBR) • k-Nearest Neighbour algorithm • BLAST-based distance function • Similarity weighted linear kernel • Nadaraya-Watson evaluation (weighted mean) • Multiple cross validation • Parameter optimisation through AUC maximisation
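A similarity-weighted k-nearest-neighbour prediction with Nadaraya-Watson evaluation can be sketched as below. The BLAST-based distance is not reproduced here: `identity_similarity` is a hypothetical stand-in (fraction of matching positions), and the sequences and labels are toy values.

```python
from typing import Callable, List, Sequence, Tuple


def identity_similarity(a: str, b: str) -> float:
    """Stand-in similarity: fraction of identical aligned positions."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def nadaraya_watson_knn(
    query: str,
    training: Sequence[Tuple[str, float]],
    k: int,
    similarity: Callable[[str, str], float] = identity_similarity,
) -> float:
    """Weighted mean of the labels of the k most similar training sequences."""
    scored: List[Tuple[float, float]] = sorted(
        ((similarity(query, seq), label) for seq, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(w for w, _ in scored)
    if total == 0:
        # No neighbour is similar at all: fall back to the unweighted mean.
        return sum(y for _, y in scored) / len(scored)
    return sum(w * y for w, y in scored) / total


# Toy data: label 1.0 for one co-receptor class, 0.0 for the other.
train = [("AAAA", 1.0), ("AATT", 0.0), ("TTTT", 0.0)]
score = nadaraya_watson_knn("AAAT", train, k=2)
```

The returned score lies in [0, 1] for binary labels, so k and any kernel parameters can be tuned by maximising AUC over cross-validation folds.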
In-vivo • Predicting the actual viral load changes following treatment switches is a challenging task. Both the individual variability of immune responses to infections and the large number of possible therapies add much noise to the system and make it quite complex. Other treatment-related factors, such as pharmacokinetics and patient adherence to therapy, play a crucial role in the control of virus replication and the development of resistance.
Standard Datum • Clinical trial-like constraints on treatment switches: • Therapy switch drugs • Baseline RNA/CD4 within [-40,+7] days of start_date • Baseline sequence within [-90,+15] days of start_date (<= stop_date) • Follow-up RNA/CD4 within [+30,+90] days of start_date (<= stop_date) • Notes: noise is a problem • There are “strange” mono- and bi-therapies (e.g. a single PI only) • Some sequences are badly recorded • Still problems with RNA/CD4 “0” values • Data extraction led to a large reduction in data set size
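The window constraints can be sketched as a date filter. This assumes the offsets are in days and that measurements arrive as (date, value) pairs; the function name and the sample values are illustrative, and the rule for choosing among several candidates in a window is left open because the slide does not state it.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

Measurement = Tuple[date, float]


def window_candidates(
    measurements: List[Measurement],
    start: date,
    lo_days: int,
    hi_days: int,
    stop: Optional[date] = None,
) -> List[Measurement]:
    """Keep measurements dated within [start+lo_days, start+hi_days].

    If `stop` is given, the upper bound is also capped at the stop date.
    """
    lo = start + timedelta(days=lo_days)
    hi = start + timedelta(days=hi_days)
    if stop is not None:
        hi = min(hi, stop)
    return [(d, v) for d, v in measurements if lo <= d <= hi]


# Toy viral load series (log10 copies/ml) around a switch on 2005-03-01.
start, stop = date(2005, 3, 1), date(2005, 9, 1)
rna = [
    (date(2005, 2, 20), 4.8),
    (date(2005, 3, 20), 4.1),
    (date(2005, 5, 1), 2.3),
    (date(2005, 7, 15), 1.7),
]
baseline = window_candidates(rna, start, -40, 7)
follow_up = window_candidates(rna, start, 30, 90, stop=stop)
```

A switch with an empty baseline or follow-up list would be discarded, which is one source of the data reduction noted on the slide.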
Feature derivation • Phenotypes: calculated through MLR • Accumulation of resistance mutations • List of resistance-associated mutations taken from IAS-USA • Mutagenetic trees • Activity score • Function of compounds active in-vitro: currently it is simply the sum of phenotype values for the drugs present in the therapy
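The current activity score (sum of predicted phenotypes over the drugs in the therapy) can be sketched as follows. The coefficient table `MLR_COEFFS` is entirely hypothetical: the drug names, mutations, and weights are placeholders for whatever the fitted MLR models contain.

```python
from typing import Dict, Iterable, Set

# Hypothetical MLR models: one intercept plus one weight per mutation.
MLR_COEFFS: Dict[str, Dict[str, float]] = {
    "3TC": {"intercept": 0.1, "M184V": 2.0},
    "EFV": {"intercept": 0.0, "K103N": 1.5},
}


def predicted_phenotype(
    drug: str, mutations: Set[str], coeffs: Dict[str, Dict[str, float]] = MLR_COEFFS
) -> float:
    """Linear prediction: intercept plus weights of the mutations present."""
    model = coeffs[drug]
    return model["intercept"] + sum(
        w for name, w in model.items() if name != "intercept" and name in mutations
    )


def activity_score(
    therapy: Iterable[str],
    mutations: Set[str],
    coeffs: Dict[str, Dict[str, float]] = MLR_COEFFS,
) -> float:
    """Sum of predicted phenotype values over the drugs in the therapy."""
    return sum(predicted_phenotype(d, mutations, coeffs) for d in therapy)


score = activity_score(["3TC", "EFV"], {"M184V"})
```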
Optional attributes • Drug history: for each drug: • Total time in use • Time since last use • Exponentially decreasing function • CD4/RNA ratio • Subtype, risk group, age, sex, ethnicity… • Sub-optimal treatment, naive status, therapy line • Some can reduce the data set size (or missing values must be handled)
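The per-drug history attributes can be sketched as below. The slide names an exponentially decreasing function without giving its form; this sketch assumes a decay on time since last use, parameterised by a hypothetical `half_life_days`.

```python
import math
from datetime import date
from typing import Dict, List, Tuple

Period = Tuple[date, date]  # (start, stop) of one exposure to the drug


def drug_history_features(
    periods: List[Period], today: date, half_life_days: float = 365.0
) -> Dict[str, float]:
    """Total exposure, time since last use, and a decayed exposure weight."""
    if not periods:
        # Drug never used: no exposure and no residual weight.
        return {"total_days": 0.0, "days_since_last_use": math.inf, "decay": 0.0}
    total = sum((stop - start).days for start, stop in periods)
    since = min((today - stop).days for _, stop in periods)
    # Assumed form: weight halves every `half_life_days` after the last use.
    decay = math.exp(-math.log(2) * since / half_life_days)
    return {
        "total_days": float(total),
        "days_since_last_use": float(since),
        "decay": decay,
    }


history = [
    (date(2003, 1, 1), date(2003, 1, 31)),
    (date(2004, 1, 1), date(2004, 1, 11)),
]
features = drug_history_features(history, today=date(2005, 1, 10))
```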
In-vivo: classification results • Boosted/Bagged Decision Trees, SVM, local smoothing kernel methods
In-vivo: Fuzzy regression • Fuzzy relations to associate resistance/susceptibility factors between mutations and drugs • Fuzzy formulae to calculate the activity of combined treatments • Differential equations as de-fuzzifiers • Fuzzy feature selection
Publications • Fuzzy modelling • “Evolutionary Fuzzy Modelling for Drug Resistant HIV-1 Treatment Optimisation” (book chapter accepted, Springer) • In-vivo modelling • “Statistical Comparison of Machine Learning Techniques for Treatment Optimisation of Drug-Resistant HIV-1” (submitted to IEEE-CBMS2007) • In-vitro Co-Receptor usage • “HIV-1 Coreceptor Usage Prediction via Indexed Local Kernel Smoothing Methods and Grid-Based Multiple Statistical Validation” (submitted to IEEE-CBMS2007)
Perspectives • Complex networks • Macroscopic evolution (subtype) • Microscopic evolution (resistance mutations)