WP4 - Data interoperability and management
Integrated web portal for HIV-1 pol gene analysis
R. Ricci, M. Zazzi, M. Prosperi (Informa)
V. Maojo, G. de la Calle Velasco (UPM)
INFOBIOMED portal for HIV-1 pol gene analysis
• General objectives
  • Realise a web portal specialising in HIV pol gene analysis, providing freely accessible tools for (at least):
    • HIV-1 subtyping and characterisation of recombinant forms
    • in vitro drug susceptibility prediction
    • collection of HIV-1 treatment response clinical records and in vivo treatment optimisation
  • Integration of HIV-1 databases from local hospital centres
  • The portal will be accessible through the official Infobiomed website, as well as from the ARCA website (https://www.hivarca.net/Pubblica/Index.asp) and from various academic institutions
• Who may be interested
  • Scientists interested in HIV evolution and/or HIV drug resistance and willing to share data held in well-organised databases in a single repository accessible on the web
• Planned intra-NoE collaborations
  • UPM
    • design of a “metaDB” and implementation of an automatic tool for the generation of the appropriate relational schema between source and destination databases
  • Custodix?
    • anonymisation of sensitive data where required
Current status and collaborations
• Implementation status
  • Subtyping tool: implemented, web-interfaced and being tested by expert users through different mirror sites (beta version)
  • Recombination tool: implemented, web-interfaced and being tested by expert users through different mirror sites (beta version)
  • In vitro drug susceptibility tool (genotype/phenotype predictor): implemented, web-interfaced, cross-validated and tested; continuously updated (fully operational version)
  • Collection of HIV-1 treatment response clinical records is ongoing, through the active participation of local Italian clinical centres, with the data hosted on the ARCA site
  • In vivo treatment optimisation: models mined from data through different statistical learning techniques
    • Two papers submitted
    • Web application still at the planning stage
  • Batch tools for automated mutation extraction from consensus sequences, with sequence quality assessment (handling of frameshifts and ambiguities), implemented and integrated into the database sequence recording
HIV-1 pol subtypes, recombinant and URFs tools and repository /1
• Objectives
  • Assign an input pol sequence to one of the HIV subtypes or circulating recombinant forms (CRFs)
  • In case of insufficient similarity to any CRF, evaluate its mosaic structure and compare it to other available unique recombinant forms (URFs; note that a new CRF is defined when the same URF is repeatedly found in at least 3 non-linked subjects)
• Features
  • Fast subtyping using BLAST on consensus sequences + CRFs
  • BLAST search for new subtypes, CRFs or URFs on
    • ARCA pol DB
    • Los Alamos HIV-1 pol database
    • GenBank
  • Real-time analyses, wrapped environment
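The fast-subtyping step above can be illustrated with a minimal sketch. It is not the portal's implementation: the portal runs BLAST, whereas this sketch swaps in plain percent identity on pre-aligned sequences to show the "best-matching reference, else flag for further analysis" logic. The function names, the toy sequences and the `min_identity` threshold are all assumptions made for illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned positions where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def assign_subtype(query: str, references: dict, min_identity: float = 80.0):
    """Return (best reference name or None, all scores).

    None means no reference is similar enough, i.e. the sequence would be
    passed on to recombination analysis. The threshold is hypothetical.
    """
    scores = {name: percent_identity(query, seq) for name, seq in references.items()}
    best = max(scores, key=scores.get)
    if scores[best] < min_identity:
        return None, scores
    return best, scores


# Toy, made-up sequences standing in for subtype consensus references.
references = {"B": "ACGTACGTAC", "C": "ACGTTCGAAC"}
best, scores = assign_subtype("ACGTACGTAA", references)
print(best, scores)
```

A sequence that falls below the threshold for every reference returns `None`, which corresponds to the "insufficient similarity" branch in the objectives.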
HIV-1 pol subtypes, recombinant and URFs tools and repository /2
• Features (cont.)
  • Phylogenetic analysis of HIV-1 pol sequences:
    • Multiple alignment (CLUSTALW) with consensus sequences
    • Neighbour-Joining tree (100 bootstrap replicates)
    • Maximum Likelihood tree (with phylogenetic signal detection through TREEPUZZLE)
    • Bootscan for recombination detection
  • Graphical exportable output (txt, pdf, jpg) of trees, phylogenetic signals and recombination plots
  • Analyses carried out in real time through a user-friendly interface
HIV-1 pol subtypes, recombinant and URFs tools and repository /3
• HIV-1 subtyping tool home page
• BLAST
• PHYLIP
• TreePuzzle
HIV-1 pol subtypes, recombinant and URFs tools and repository /4
• BLAST analysis on different databases
• Fast subtyping
• URF detection
• CRF detection
HIV-1 pol subtypes, recombinant and URFs tools and repository /5
• Phylogenetic analysis
  • PHYLIP
    • Quality assessment with bootstrap
  • TreePuzzle
    • Quartet puzzling
  • Easily portable output formats
HIV-1 pol subtypes, recombinant and URFs tools and repository /6
• Recombinant detection tool
  • Sliding window sequence scan
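A sliding-window scan can be sketched as follows. This is a simplified illustration, not the portal's tool: a true bootscan builds bootstrapped trees for each window, whereas this sketch only compares per-window identity against pre-aligned reference sequences. The window size, step, function names and toy sequences are assumptions.

```python
def window_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length windows."""
    if not a:
        return 0.0
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def sliding_window_scan(query: str, references: dict, window: int = 4, step: int = 2):
    """For each window, return (start, closest reference name, identity)."""
    results = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        scores = {
            name: window_identity(q, seq[start:start + window])
            for name, seq in references.items()
        }
        best = max(scores, key=scores.get)
        results.append((start, best, scores[best]))
    return results


def breakpoints(results):
    """Window starts at which the closest reference changes."""
    return [
        results[i][0]
        for i in range(1, len(results))
        if results[i][1] != results[i - 1][1]
    ]


# Toy, made-up aligned sequences: the query resembles "A" first, then "B".
references = {"A": "AAAAAAAAAAAA", "B": "CCCCCCCCCCCC"}
scan = sliding_window_scan("AAAAAAACCCCC", references)
print(scan)
print(breakpoints(scan))
```

A change in the closest reference along the sequence suggests a mosaic structure; in practice the window and step would be hundreds of nucleotides rather than the toy values used here.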
HIV-1 in vitro drug susceptibility prediction /1
• Objectives
  • Predict in vitro drug susceptibility of HIV reverse transcriptase and protease, expressed as fold-resistance (IC50 of the input isolate divided by IC50 of the reference wild-type isolate), based on an input HIV pol sequence
• Features
  • Batch procedure implemented in the database sequence recording:
    • Alignment with the wild-type consensus reference sequence (global and local)
    • Handling of frameshifts, stop codons and ambiguous positions
    • Extraction of mutations with respect to wild type
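The fold-resistance definition and the mutation-extraction step can be sketched as below. This is a minimal illustration under stated assumptions: the sequences are already aligned amino-acid strings of equal length, gaps (`-`) and ambiguous residues (`X`) are simply skipped, and frameshift or stop-codon handling is left out. The function names and toy sequences are hypothetical and do not describe the portal's batch procedure.

```python
def extract_mutations(reference: str, query: str) -> list:
    """List differences from the reference in the usual 'I3V' notation.

    Assumes both sequences are aligned and of equal length. Gaps and
    ambiguous residues in the query are skipped (an assumption).
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    mutations = []
    for position, (ref, obs) in enumerate(zip(reference, query), start=1):
        if obs in ("-", "X"):
            continue
        if obs != ref:
            mutations.append(f"{ref}{position}{obs}")
    return mutations


def fold_resistance(ic50_isolate: float, ic50_reference: float) -> float:
    """Fold-resistance = IC50 of the isolate / IC50 of the wild-type reference."""
    if ic50_reference <= 0:
        raise ValueError("reference IC50 must be positive")
    return ic50_isolate / ic50_reference


# Toy, made-up values for illustration only.
mutations = extract_mutations("PQITLW", "PQVTLX")
fold = fold_resistance(5.0, 0.5)
print(mutations, fold)
```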
HIV-1 in vitro drug susceptibility prediction /2
• Features (cont.)
  • Genotype/phenotype prediction
    • Updated with the newest drugs
    • Training and validation on large data sets (Stanford public data and our own clinical practice, usually >=1000 genotype/phenotype pairs for each drug)
    • Linear regression model (simple, robust and still competitive with complex models such as SVMs)
HIV-1 in vitro drug susceptibility prediction /3
• Features (cont.)
  • Genotype/phenotype prediction
    • Statistical analyses to find relevant mutations (univariate chi-squared, multivariate)
    • Stepwise, Lasso and Ridge shrinkage feature selection functions explored
    • Final model has few variables and maintains the highest predictive power (with 10-fold cross-validation)
    • Real-valued fold-change prediction, summary of biological and clinical cutoffs
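As a rough illustration of a shrinkage-regularised linear model evaluated by 10-fold cross-validation, the sketch below fits a ridge regression on binary mutation indicators using only the standard library. It is not the authors' model: the data are synthetic, the three "mutations" are placeholders, the penalty `lam` is arbitrary, and treating the target as a log fold change is an assumption.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ridge(X, y, lam=0.1):
    """Least squares with an L2 penalty on every weight except the intercept."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    for i in range(1, p):
        xtx[i][i] += lam
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(w, x):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def cross_validate(X, y, k=10, lam=0.1, seed=0):
    """Mean squared error over k held-out folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    squared_error = 0.0
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w = fit_ridge([X[i] for i in train], [y[i] for i in train], lam)
        squared_error += sum((predict(w, X[i]) - y[i]) ** 2 for i in fold)
    return squared_error / len(y)


# Synthetic data (NOT real measurements): 3 placeholder mutation indicators,
# with the third one deliberately irrelevant to the simulated target.
rng = random.Random(1)
X = [[rng.randint(0, 1) for _ in range(3)] for _ in range(60)]
y = [0.1 + 1.0 * r[0] + 0.5 * r[1] + rng.gauss(0, 0.05) for r in X]
weights = fit_ridge(X, y, lam=0.1)
cv_mse = cross_validate(X, y, k=10, lam=0.1)
print(weights, cv_mse)
```

Comparing `cv_mse` across different penalties or feature subsets is one simple way to keep a model with few variables without giving up predictive power.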
HIV-1 in vitro drug susceptibility prediction /4
• MLR (multiple linear regression) genotype/phenotype predictor currently hosted on the ARCA site
HIV treatment response repository with treatment optimisation tool
• Objectives
  • Collect clinical records (from clinical trials or observational cohorts), including HIV genotype and viral load as well as patient CD4 counts at the time of a treatment switch or initiation, coupled with follow-up data during continued therapy with that regimen
• Features
  • Query sequence handling as in geno/pheno prediction
  • “Quasi” clinical-trial view on the ARCA db
    • Collection of treatment change episodes with
      • Baseline RNA, CD4, genotype, HAART…
      • 12-week follow-up
  • Comparison of the user query with the view, collection of similar cases
  • Follow-up prediction for the query
    • Case-based reasoning, k-nearest neighbour algorithm, kernel methods, local optimisation, feature selection
• Current status
  • Two papers submitted
  • Implementation at the planning stage
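The "collect similar cases, then predict follow-up" idea can be sketched with a plain k-nearest neighbour lookup. This is only an illustration of the technique named above, not the submitted models: the `Episode` fields, the distance function (Jaccard distance on mutation sets plus scaled baseline differences), the scaling constants and the toy records are all assumptions.

```python
from dataclasses import dataclass

RNA_SCALE = 2.0    # hypothetical scale for log10 viral load differences
CD4_SCALE = 500.0  # hypothetical scale for CD4 count differences


@dataclass(frozen=True)
class Episode:
    """One treatment change episode (hypothetical, simplified fields)."""
    mutations: frozenset
    baseline_log_rna: float
    baseline_cd4: float
    followup_log_rna: float  # value at the 12-week follow-up


def distance(mutations, log_rna, cd4, episode: Episode) -> float:
    """Jaccard distance on mutations plus scaled baseline differences."""
    union = mutations | episode.mutations
    jaccard = 1.0 - len(mutations & episode.mutations) / len(union) if union else 0.0
    return (
        jaccard
        + abs(log_rna - episode.baseline_log_rna) / RNA_SCALE
        + abs(cd4 - episode.baseline_cd4) / CD4_SCALE
    )


def predict_followup(mutations, log_rna, cd4, episodes, k=2) -> float:
    """Average follow-up viral load of the k most similar stored episodes."""
    nearest = sorted(episodes, key=lambda e: distance(mutations, log_rna, cd4, e))[:k]
    return sum(e.followup_log_rna for e in nearest) / len(nearest)


# Toy, made-up episodes for illustration only.
episodes = [
    Episode(frozenset({"M184V"}), 5.0, 200.0, 2.0),
    Episode(frozenset({"M184V", "K103N"}), 4.8, 250.0, 3.0),
    Episode(frozenset(), 3.0, 600.0, 1.7),
]
prediction = predict_followup(frozenset({"M184V"}), 5.0, 220.0, episodes, k=2)
print(prediction)
```

A real implementation would also have to match on the regimen itself and choose `k` and the distance weights by validation.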
Integration of HIV-1 databases from local hospital centres /1
• Objectives
  • Realise a web-based or downloadable wizard-like tool to facilitate export of data between a user’s database and the internal database used to implement the functions available on the HIV pol analysis portal
  • Define a standard (possibly based on ARCA)
• Type of data to be handled
  • HIV pol sequences likely to be recombinants, particularly if accompanying information is available (e.g. country of origin, year of detection)
  • HIV pol sequences matched with in vitro drug susceptibility data measured by a published or documentable assay
  • HIV pol sequences coupled with treatment used and follow-up data (HIV RNA load and/or CD4 counts)
Integration of HIV-1 databases from local hospital centres /2
• Features
  • Automatically handle:
    • Heterogeneity conflicts
    • Semantic conflicts
    • Descriptive conflicts
    • Structural conflicts
• Open problems
  • Heterogeneity of data sources
    • Different schemas, technology…
    • Heterogeneous data
  • Need to maintain local autonomy and preferences
  • Ethical and security issues
    • Privacy, security
    • Anonymization of sensitive data
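One way to picture the handling of descriptive and structural conflicts is a per-centre mapping from local field names and units to a common target schema. The sketch below is purely illustrative: the target fields, the source field names and the converters are hypothetical, and it does not describe the planned metaDB tool or any agreed standard.

```python
import math

# Hypothetical target schema for the shared repository.
TARGET_FIELDS = ("record_id", "log10_rna", "cd4")


def make_mapper(mapping: dict):
    """Build a converter from a centre's local record layout to the target schema.

    `mapping` maps each target field to (source field, conversion function).
    """
    missing = [f for f in TARGET_FIELDS if f not in mapping]
    if missing:
        raise KeyError(f"mapping lacks target fields: {missing}")

    def convert(record: dict) -> dict:
        return {
            target: fn(record[source])
            for target, (source, fn) in mapping.items()
        }

    return convert


# Hypothetical centre storing viral load as copies/mL under its own field names.
centre_a = make_mapper({
    "record_id": ("PatID", str),
    "log10_rna": ("VL_copies", lambda v: math.log10(float(v))),
    "cd4": ("CD4", int),
})
row = centre_a({"PatID": 17, "VL_copies": "100000", "CD4": "350"})
print(row)
```

Keeping the mapping as data, separate from the conversion code, lets each centre retain its own schema while still exporting to the shared one.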
Integration of HIV-1 databases from local hospital centres /3
• Current status
  • Meeting held in Madrid (June ’06) with UPM
  • Discussion ongoing about:
    • Federated vs centralised approach
    • Metadata standards
    • Source quality