1 / 21

VirVarSeq vs ViVaMBC

Pictured above: The structure of HIV. VirVarSeq vs ViVaMBC. Bie Verbist | NCS Brugge | 10-10-2014. Statistical methods for improved variant calling of massively parallel sequencing data. OUTLINE Viral dynamics Massive parallel sequencing Variant calling VirVarSeq ViVaMBC Results

Download Presentation

VirVarSeq vs ViVaMBC

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Pictured above: The structure of HIV. • VirVarSeq vs ViVaMBC • BieVerbist | NCS Brugge | 10-10-2014 • Statistical methods for improved variant calling of massively parallel sequencing data.

  2. OUTLINE • Viral dynamics • Massive parallel sequencing • Variant calling • VirVarSeq • ViVaMBC • Results • HCV plasmids • HCV clinical sample

  3. Viral dynamics A virus is a small infectious agent that replicates only inside the living cells of other organisms. • High replication rate (1011 replications a day for HIV) • High mutation rate Viral population consist of closely related subgroups, viral quasispecies, which we want to identify and quantify.

  4. Viral dynamics Drug-sensitive variants Drug-resistant variant Number of virusus in population Heterogeneous viral population Undetectable Time Before treatment On treatment

  5. Sequencing Sanger sequencing • detection limit: 20-30% • no accurate estimate of frequency Massively parallel sequencing ACGGTTTCCGTCTGGG ACGGTTTCTGTCTGGG ACGGTTTCCGTCTGGG ACGGTTTCTGTCTGGG ACGGTTTCTGTCTGGG ACGGTTTCTGTCTGGG ACGGTTTCTGTCTGGG ACGATTTCTGTCTGGG • detection limit << 20% • more accurate estimate of frequency

  6. Massively parallel sequencing Fragmentation Amplification Viral population DNA Fragments Sequencing by synthesis Example, one fragment: A A G T G C A C G T T T A C G T C

  7. Massively parallel sequencing Viral population @HWUSI-EAS1524:17:FC:1:120:19254:21417 1:N:0:GATCAG GATCGGAAGAGCACACGTCTGAACTCCAGTCACGATCAGATCTCGTATGCCGTCTTCTGCTTGAAAAAAA + G@GG@GG@GGHHHBH>GEGDGGBGEGG?GGHHHH>GEGBG@?BEF?DBB<GDGGGGFGG3GGEBA>EC:; @HWUSI-EAS1524:17:FC:1:120:9430:21420 1:N:0:GATCAG ATCGGAAGAGCACACGNCTGAACTCCAGTCACGATCAGATCCCGTATGCCGTCTTCTGCTTGAAAAAAAA + DDDDDDDDDD2DDDDD#DDDDDDDDDDDDDDDDDDDDDDD2:8:7;<@>;DDDDDDDDDDD:DDDDD### @HWUSI-EAS1524:17:FC:1:120:12760:21420 1:N:0:GATCAG ATCATACTGTCTTACTNTGATAAAACCTCCAATTCCCCCTANCATTNTTGGTTNCCATCTTCCTTGCAAA + HHHHHHHHHHHHHHHG#GGGFFFF@HHHHHHGHHHHHHHHF#FFEB#BBBA>B#BFFFFFHHHHHHHHHG

  8. Variant calling Distinguish low-frequency variants from sequencing error. VirVarSeq ViVaMBC Adaptive filtering approach based on quality scores. Verbist et al. 2014, Bioinformatics. doi: 10.1093/ bioinformatics/btu587. Model based clustering approach which models the error probabilities based on quality scores. Verbist et al. 2014, BMC bioinformatics. under revision.

  9. VirVarSeq Extract reads that cover codon of interest Filter based on the quality scores. Build a codon table ... ... ... ... Reference Reads ... CGA CGA CCA CGA CGA CGA CCA CCA CGT CGA CGA GGA CGA CGA CCA CGA CGA CGA CCA CCA CGT CGA CGA GGA ... ... ... ... ... Codon Table Filtering ... ... ... ... ... ... ... ... ... ... * codon = nucleotide triplets which specifies a single amino acid

  10. VirVarSeq QIT Image or graphic goes here Definition of the Q-threshold (QIT) : Fit mixture distribution on Q-scores with 3 components: • Point prob around Q 2 • Error distribution • Reliable call distribution Intersection point is threshold.

  11. ViVaMBC Extract reads that cover codon of interest Perform Model Based Clustering Model the error probability Clusters unknown, EM algorithm ... ... Reference Reads ... CGA CGA CCA CGA CGA CGA CCA CCA CGT CGA CGA GGA ... ... ... CCA ... ... CCA GGA Codon Table ... Clustering CCA ... CGT ... ... CGA ... ... CGA CGA ... CGA CGA ... CGA CGA ... Cluster medoids = variant Size of Cluster = Frequency N° Clusters = N° variants ...

  12. Results – HCV plasmids Two plasmids • Amino acids 1 to 181 of NS3 region • differ at two codon positions (36 and 155) • mixed 4 different proportions

  13. Results – HCV plasmids Other variants (11481 max) are false positives. VirVarSeq reports: • more false positives • with frequencies going up to 0,5%

  14. Results - HCV clinical sample Image or graphic goes here VirVarSeq ViVaMBC VirVarSeq reports more variants. Above 1% methods in agreement, even above 0.5%. Few false pos in GC region for ViVaMBC ?

  15. VirVarSeq vs ViVaMBC When applying reporting limits of 1% or 0.5%, methods are in agreement. Below this limit, trade-off between sensitivity and specificity, with VirVarSeq less specific. VirVarSeq Adaptive approach Easy development Runs fast ViVaMBC • More elegant • Longer development time • Longer run time

  16. Acknowledgements Promoters: Prof.Dr.O.Thas1, Prof.Dr.L.Clement1 and Prof.Dr.L.Bijnens2 Yves Wetzels, Tobias Verbeke, Joris Meys1 for IT support Scientists within discovery sciences2 Non-clinical statistics team2 2 2 1

  17. bverbis2@its.jnj.com • 10-10-2014 • Thank you

  18. Back-up

  19. ViVaMBC • Notation: • ri: best base calls of read i (i=1 ... n) • si: second best base calls of read i (i=1 ... n) • zij: zij=1 when read i belongs to haplotype j (j=1...k) • τj: probability to belong to haplotype j • Complete Data Likelihood:

  20. ViVaMBC • Complete Data Likelihood: Likelihood depends on cluster membership zij  EM algorithm

  21. Library preparation Sequencing by synthesis

More Related