Estimating the frequency of superinfection in a large European collaborative HIV database. István Bartha (1), M. Assel, P. Sloot, M. Zazzi, C. Torti, E. Schülter, A. De Luca, A. Sönnerborg, A.B. Abecasis, A.-M. Vandamme, R. Paredes, D. van de Vijver, V. Müller (1). (1) Eötvös Loránd University, Budapest
Background • Superinfection: infection of an already HIV+ individual • Can transmit drug-resistant strains and accelerate disease progression • Few studies on the prevalence of superinfection • No routine testing • Is it possible to detect superinfection from routine genotypic data?
Data • Virolab and EuResist collaborative HIV databases (Italy, Spain, Belgium, Sweden, Germany) • At least 2 sequences from each patient • Most of the patients are treated • 4656 patients
Data - sequences • Sequences of the RT and PR regions: 14196 sequences, about 1 kb long on average
Definition of superinfection • Last common ancestor of a patient’s sequences is not in the patient
Problems • Building a reliable phylogenetic tree in feasible time • Pipeline: 4656 patients -> rough trees by RAxML -> 303 patients -> MrBayes -> results • 1-1 month on a 4500-core cluster
Phase I. - Maximum Likelihood trees • Maximum Likelihood trees with RAxML v.7.2.6 on 14196 sequences starting from randomized Maximum Parsimony trees • Initial starting tree affects the final results -> repeated 100 times • Fast & rough
Phase II. Bayesian trees with MrBayes • Each patient in Phase II was analyzed by MrBayes 3.1.2 (modified by Alexandros Stamatakis) • Provides estimate of reliability (posterior probabilities of the clades)
Phase II - Analysis of Bayesian trees • Remove all branches with weak support • Check whether the patient’s sequences form a monophyletic cluster or fail to do so
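The two steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the study: the `Node` class, the function names, the toy sequence labels (`A1`, `A2`, `B1`, `C1`) and the 0.95 support threshold are all hypothetical, and the tree is assumed to be rooted, with each internal node carrying the posterior probability of its clade.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """A node of a rooted tree; leaves have a name and no children."""
    name: Optional[str] = None
    support: float = 1.0  # posterior probability of the clade below this node
    children: List["Node"] = field(default_factory=list)


def collapse_weak(node: Node, threshold: float) -> Node:
    """Return a copy of the tree in which internal nodes with support below
    `threshold` are dissolved, so their children attach to the parent."""
    new_children: List[Node] = []
    for child in node.children:
        c = collapse_weak(child, threshold)
        if c.children and c.support < threshold:
            new_children.extend(c.children)  # dissolve the weak clade
        else:
            new_children.append(c)
    return Node(node.name, node.support, new_children)


def leaf_names(node: Node) -> Set[str]:
    """Collect the names of all leaves below (or at) this node."""
    if not node.children:
        return {node.name}
    names: Set[str] = set()
    for child in node.children:
        names |= leaf_names(child)
    return names


def is_monophyletic(root: Node, tips: Set[str]) -> bool:
    """True if some clade in the tree contains exactly the given tips."""
    if leaf_names(root) == tips:
        return True
    return any(is_monophyletic(c, tips) for c in root.children)


def leaf(name: str) -> Node:
    return Node(name=name)


# Toy trees: patient A has sequences A1 and A2; B1 and C1 are other patients.
clustered = Node(children=[
    Node(support=0.99, children=[leaf("A1"), leaf("A2")]),
    Node(support=0.98, children=[leaf("B1"), leaf("C1")]),
])
split = Node(children=[
    Node(support=0.99, children=[leaf("A1"), leaf("B1")]),
    Node(support=0.97, children=[leaf("A2"), leaf("C1")]),
])

THRESHOLD = 0.95  # hypothetical cutoff; the slides do not state the value
patient = {"A1", "A2"}
for label, tree in [("clustered", clustered), ("split", split)]:
    pruned = collapse_weak(tree, THRESHOLD)
    print(label, is_monophyletic(pruned, patient))
```

Note that in this sketch a weakly supported patient clade collapses into a polytomy and is then reported as non-monophyletic. A stricter rule might instead require a well-supported clade that separates the patient's sequences before calling a superinfection.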
Validating the method • How many false negatives? • We analyzed 150 patients rejected by Phase I.
Descriptive statistics about the selected superinfected patients
Significant effects • Data center has a significant effect (p = 0.002792) • Follow-up time has a significant effect (p = 0.015): 6% increase in the probability of being superinfected per year of follow-up
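To get a feel for how a 6% per-year effect accumulates over longer follow-up, the short sketch below compounds it multiplicatively. This rests on an assumption that the slides do not confirm: that the 6% figure acts as a constant per-year multiplier (for example, an odds ratio of 1.06 from a logistic regression). The function name `cumulative_multiplier` is hypothetical.

```python
def cumulative_multiplier(years: float, per_year: float = 1.06) -> float:
    """Multiplier after `years` of follow-up, assuming a constant
    multiplicative effect of `per_year` for each year (an assumption)."""
    return per_year ** years


for years in (1, 5, 10):
    print(f"{years:>2} years: x{cumulative_multiplier(years):.2f}")
```

Under this reading, five years of follow-up corresponds to roughly a one-third increase relative to baseline, and ten years to somewhat less than a doubling.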
Discussion • 68 patients (1.4%) identified • Unknown number of false-negative patients • Arbitrary threshold on branch support - needs independent validation • Deep analysis should be run on all patients - limited by computer time • Improve pre-selection
Acknowledgment Viktor Müller Anna Abecasis Anne-Mieke Vandamme M. Assel, P. Sloot, M. Zazzi, C. Torti, E. Schülter, A. De Luca, A. Sönnerborg, R. Paredes, D. van de Vijver
Detection of superinfection • Overlapping timescales of within-host and among-host evolution • Unable to differentiate between superinfecting strains and viral strains that have persisted for a long time - this is a general limitation of any distance-based method