Integrated web portal for HIV-1 pol gene analysis

WP4 - Data interoperability and management. Integrated web portal for HIV-1 pol gene analysis. R. Ricci, M. Zazzi, M. Prosperi (Informa); V. Maojo, G. de la Calle Velasco (UPM).


Presentation Transcript


  1. WP4 - Data interoperability and management: Integrated web portal for HIV-1 pol gene analysis. R. Ricci, M. Zazzi, M. Prosperi (Informa); V. Maojo, G. de la Calle Velasco (UPM)

  2. INFOBIOMED portal for HIV-1 pol gene analysis
     • General objectives
       • Realise a web portal specialised in HIV pol gene analysis, providing freely accessible tools for (at least):
         • HIV-1 subtyping and characterisation of recombinant forms
         • in vitro drug susceptibility prediction
         • collection of HIV-1 treatment response clinical records and in vivo treatment optimisation
       • Integration of HIV-1 databases from local hospital centres
       • The portal will be accessible through the official Infobiomed website, as well as from the ARCA website (https://www.hivarca.net/Pubblica/Index.asp) and from various academic institutions
     • Who may be interested
       • Scientists interested in HIV evolution and/or HIV drug resistance and willing to share data contained in well-organised databases in a single repository accessible on the web
     • Planned intra-NoE collaborations
       • UPM
         • design of a “metaDB” and implementation of an automatic tool for generating the appropriate relational schema between source and destination databases
       • Custodix?
         • sensitive data anonymisation where required

  3. Current status and collaborations
     • Implementation status
       • Subtyping tool: implemented with a web interface; being tested by expert users through different mirror sites (beta version)
       • Recombination tool: implemented with a web interface; being tested by expert users through different mirror sites (beta version)
       • In vitro drug susceptibility tool (genotype/phenotype predictor): implemented with a web interface, cross-validated and tested; continuously updated (fully operational version)
       • Collection of HIV-1 treatment response clinical records is ongoing, through the active participation of local Italian clinical centres, with the data hosted on the ARCA site
       • In vivo treatment optimisation: models mined from data through different statistical learning techniques
         • Two papers submitted
         • Web application still in the planning stage
       • Batch tools for automated mutation extraction from consensus sequences, with sequence quality assessment (handling of frameshifts and ambiguities), implemented and integrated in the database sequence recording

  4. HIV-1 pol subtypes, recombinant and URFs tools and repository /1
     • Objectives
       • Assign an input pol sequence to one of the HIV subtypes or circulating recombinant forms (CRFs)
       • In case of insufficient similarity to any CRF, evaluate its mosaic structure and compare it to other available unique recombinant forms (URFs; note that a new CRF is defined when the same URF is repeatedly found in at least 3 non-linked subjects)
     • Features
       • Fast subtyping using BLAST on consensus sequences + CRFs
       • BLAST search for new subtypes, CRFs or URFs on
         • ARCA pol DB
         • Los Alamos HIV-1 pol database
         • GenBank
       • Real-time analyses, wrapped environment

  5. HIV-1 pol subtypes, recombinant and URFs tools and repository /2
     • Features (cont.)
       • Phylogenetic analysis of HIV-1 pol sequences:
         • Multiple alignment (CLUSTALW) with consensus sequences
         • Neighbour-Joining tree (100 bootstrap replicates)
         • Maximum Likelihood tree (with phylogenetic signal detection through TREEPUZZLE)
         • Bootscan for recombination detection
       • Graphical exportable output (txt, pdf, jpg) of trees, phylogenetic signals and recombination plots
       • Analyses carried out in real time through a user-friendly interface

  6. HIV-1 pol subtypes, recombinant and URFs tools and repository /3
     • HIV-1 subtyping tool home page
       • BLAST
       • PHYLIP
       • TreePuzzle

  7. HIV-1 pol subtypes, recombinant and URFs tools and repository /4
     • BLAST analysis on different databases
       • Fast subtyping
       • URF detection
       • CRF detection

  8. HIV-1 pol subtypes, recombinant and URFs tools and repository /5
     • Phylogenetic analysis
       • PHYLIP
         • Quality assessment with bootstrap
       • TreePuzzle
         • Quartet puzzling
       • Easily portable output formats

  9. HIV-1 pol subtypes, recombinant and URFs tools and repository /6
     • Recombinant detection tool
       • Sliding window sequence scan
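
The sliding window sequence scan named on slide 9 can be illustrated with a minimal sketch. It is not the portal's implementation and it is simpler than bootscanning (which builds bootstrapped trees per window): it only reports, for each window, the reference with the highest percent identity. The function names, the window and step sizes, and the assumption that all sequences are already aligned to the same length are hypothetical choices made for illustration.

```python
def percent_identity(a, b):
    """Percentage of matching positions, skipping columns with a gap or N."""
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def sliding_window_scan(query, references, window=200, step=20):
    """Return (window_start, best_reference_name, identity) for each window.

    Assumption: `query` and every sequence in `references` (a dict of
    name -> sequence) are upper-case and pre-aligned to the same length.
    """
    results = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        scores = {
            name: percent_identity(query[start:end], ref[start:end])
            for name, ref in references.items()
        }
        best = max(scores, key=scores.get)
        results.append((start, best, scores[best]))
    return results
```

A change of the best-matching reference along the sequence hints at a possible breakpoint; a candidate recombinant would still need confirmation by the phylogenetic tools listed on the earlier slides.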

  10. HIV-1 in-vitro drug susceptibility prediction /1
      • Objectives
        • Predict in vitro drug susceptibility of HIV reverse transcriptase and protease, expressed as fold-resistance (IC50 of the input isolate divided by IC50 of the reference wild-type isolate), based on an input HIV pol sequence
      • Features
        • Batch procedure implemented in the database sequence recording:
          • Alignment with the wild-type consensus reference sequence (global and local)
          • Handling of frameshifts, stop codons and ambiguous positions
          • Extraction of mutations relative to wild type
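
As a rough illustration of the two ideas on slide 10, the sketch below computes fold-resistance as defined there and lists mutations relative to a wild-type reference. It is a simplified assumption, not the batch procedure itself: it works on amino-acid strings that are already aligned, uses `-` for gaps and `X` for ambiguous positions, and does not attempt frameshift handling. The function names are hypothetical.

```python
def fold_resistance(ic50_isolate, ic50_wild_type):
    """Fold-resistance: IC50 of the isolate divided by IC50 of the wild type."""
    if ic50_wild_type <= 0:
        raise ValueError("wild-type IC50 must be positive")
    return ic50_isolate / ic50_wild_type


def extract_mutations(wild_type, query):
    """List substitutions such as 'K2R' between two pre-aligned protein strings.

    Positions are numbered along the wild-type sequence, so alignment
    columns where the wild type has a gap (insertions) are not counted.
    Gaps and ambiguous residues ('X') in the query are skipped; a stop
    codon ('*') is reported like any other change.
    """
    if len(wild_type) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    mutations = []
    position = 0
    for wt, q in zip(wild_type, query):
        if wt == "-":
            continue  # insertion relative to the reference
        position += 1
        if q == wt or q in "-X":
            continue
        mutations.append(f"{wt}{position}{q}")
    return mutations
```

For example, an isolate with an IC50 of 5.0 against a wild-type IC50 of 0.5 (same units) has a fold-resistance of 10.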

  11. HIV-1 in-vitro drug susceptibility prediction /2
      • Features (cont.)
        • Genotype/phenotype prediction
          • Updated with the newest drugs
          • Training and validation on large data sets (Stanford public data and our own clinical practice, usually >=1000 genotype/phenotype pairs for each drug)
          • Linear regression model (simple, robust and still competitive with complex models like SVMs)

  12. HIV-1 in-vitro drug susceptibility prediction /3
      • Features (cont.)
        • Genotype/phenotype prediction
          • Statistical analyses to find relevant mutations (univariate chi-squared, multivariate)
          • Stepwise, Lasso and Ridge shrinkage feature selection methods explored
          • Final model has few variables and maintains the highest predictive power (with 10-fold cross-validation)
          • Real-valued fold-change prediction, with a summary of biological and clinical cutoffs
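
The combination of a shrinkage-penalised linear model and 10-fold cross-validation mentioned on slides 11 and 12 can be sketched in plain Python as follows. This is an illustrative assumption, not the model hosted on ARCA: it uses ridge regression only, assumes each row is a vector of 0/1 mutation indicators and each target is a (typically log-transformed) fold change, and all function names and the default penalty are hypothetical.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ridge(rows, targets, penalty=1.0):
    """Return [intercept, w1, ...]; the intercept is not penalised."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(p)]
    for j in range(1, p):
        xtx[j][j] += penalty  # ridge shrinkage; keep it > 0 so the system is solvable
    return solve(xtx, xty)


def predict(weights, row):
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))


def cross_validated_mse(rows, targets, folds=10, penalty=1.0, seed=0):
    """Mean squared error over held-out samples in a k-fold split."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    errors = []
    for k in range(folds):
        test_idx = set(order[k::folds])
        train = [i for i in order if i not in test_idx]
        w = fit_ridge([rows[i] for i in train], [targets[i] for i in train], penalty)
        errors += [(predict(w, rows[i]) - targets[i]) ** 2 for i in test_idx]
    return sum(errors) / len(errors)
```

Comparing `cross_validated_mse` across several `penalty` values, or across candidate mutation subsets, is one simple way to select a model with few variables without judging it on the data it was fitted to.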

  13. HIV-1 in-vitro drug susceptibility prediction /4
      • MLR genotype/phenotype predictor currently hosted on the ARCA site

  14. HIV treatment response repository with treatment optimisation tool
      • Objectives
        • Collect clinical records, from clinical trials or observational cohorts, including HIV genotype and viral load as well as patient CD4 counts at the time of a treatment switch or initiation, coupled with follow-up data during continued therapy with that regimen
      • Features
        • Query sequence handling as in geno/pheno prediction
        • “Quasi” clinical-trial view on the ARCA db
          • Collection of treatment change episodes with
            • Baseline RNA, CD4, genotype, HAART…
            • 12-week follow-up
        • Comparison of user query with the view, collection of similar cases
        • Follow-up prediction for the query
          • Case-based reasoning, k-nearest neighbour algorithm, kernel methods, local optimisation, feature selection
      • Current status
        • Two papers submitted
        • Implementation in the planning stage
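
The "collection of similar cases" and k-nearest neighbour prediction listed on slide 14 can be sketched as below. This is a minimal illustration under stated assumptions, not the planned tool: each stored treatment change episode is reduced to a numeric feature vector (for instance scaled baseline RNA, CD4 and mutation indicators) paired with a numeric follow-up outcome, distance is plain Euclidean, and the names `distance` and `knn_predict` are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two equally long feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(query, cases, k=3):
    """Predict the follow-up outcome as the mean outcome of the k nearest cases.

    `cases` is a list of (feature_vector, outcome) pairs. Features should be
    scaled to comparable ranges beforehand, otherwise one variable (such as
    a raw CD4 count) dominates the distance.
    """
    if not cases:
        raise ValueError("no stored cases to compare against")
    nearest = sorted(cases, key=lambda case: distance(query, case[0]))[:k]
    return sum(outcome for _, outcome in nearest) / len(nearest)
```

Returning the `nearest` cases alongside the prediction would also support the case-based reasoning view, in which the user inspects the similar episodes rather than only a single predicted value.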

  15. Integration of HIV-1 databases from local hospital centres /1
      • Objectives
        • Realise a web-based or downloadable wizard-like tool to facilitate export of data between a user’s database and the internal database used to implement the functions available on the HIV pol analysis portal
        • Define a standard (possibly based on ARCA)
      • Type of data to be handled
        • HIV pol sequences likely to be recombinants, particularly if accompanying information is available (e.g. country of origin, year of detection)
        • HIV pol sequences matched with in vitro drug susceptibility data measured by a published or documentable assay
        • HIV pol sequences coupled with treatment used and follow-up data (HIV RNA load and/or CD4 counts)

  16. Integration of HIV-1 databases from local hospital centres /2
      • Features
        • Automatically handle:
          • Heterogeneity conflicts
          • Semantic conflicts
          • Descriptive conflicts
          • Structural conflicts
      • Open problems
        • Heterogeneity of data sources
          • Different schemas, technology…
          • Heterogeneous data
        • Need to maintain local autonomy and preferences
        • Ethical and security issues
          • Privacy, security
          • Anonymisation of sensitive data
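
To make the structural-conflict and anonymisation points on slide 16 more concrete, here is a small sketch of mapping one source record onto a destination schema while replacing the patient identifier. Everything in it is hypothetical: the field names in `FIELD_MAP`, the destination schema and the secret key are invented for illustration and do not describe the ARCA database or the planned metaDB. Note also that a keyed hash gives pseudonymisation, which is weaker than full anonymisation.

```python
import hashlib
import hmac

# Hypothetical mapping from one hospital's column names to a shared schema.
FIELD_MAP = {
    "pat_id": "patient_id",
    "vl": "hiv_rna_copies_ml",
    "cd4": "cd4_cells_ul",
    "seq": "pol_sequence",
}


def pseudonymise(identifier, secret_key):
    """Replace an identifier with a keyed SHA-256 digest (truncated)."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def map_record(source, field_map, secret_key):
    """Rename known fields, pseudonymise the patient id, report unknown fields."""
    destination = {}
    unmapped = []
    for name, value in source.items():
        target = field_map.get(name)
        if target is None:
            unmapped.append(name)  # left for manual review, never copied
            continue
        destination[target] = value
    if "patient_id" in destination:
        destination["patient_id"] = pseudonymise(
            str(destination["patient_id"]), secret_key
        )
    return destination, unmapped
```

Keeping one such mapping per source database, rather than changing the hospital's own schema, is one way to respect the local autonomy noted under the open problems; semantic conflicts such as differing units would need explicit conversion rules on top of the renaming shown here.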

  17. Integration of HIV-1 databases from local hospital centres /3
      • Current status
        • Meeting held in Madrid (June ’06) with UPM
        • Discussion ongoing about:
          • Federated vs centralised approach
          • Metadata standards
          • Source quality
