Bioinformatics tools for viral quasispecies reconstruction from next-generation sequencing data and vaccine optimization • PD: Ion Măndoiu, UConn • Co-PDs: Mazhar Khan, UConn; Rachel O’Neill, UConn; Alex Zelikovsky, GSU
Outline • Background & aims of the project • Bioinformatics tools for quasispecies spectrum reconstruction from NGS reads • Experimental validation on IBV data • Summary and ongoing work
Infectious Bronchitis Virus (IBV) • Group 3 coronavirus • Biggest single cause of economic loss in US poultry farms • Young chickens: coughing, tracheal rales, dyspnea • Broiler chickens: reduced growth rate • Layers: egg production drops 5-50%; thin-shelled eggs with watery albumin • Worldwide distribution, with dozens of serotypes in circulation • Co-infection with multiple serotypes is not uncommon, creating conditions for recombination • [Figures: IBV-infected egg defects; IBV-infected embryo vs. normal embryo]
IBV Vaccination • Broadly used, most commonly with attenuated live vaccines • Short-lived protection • Layers need to be re-vaccinated multiple times during their lifespan • Vaccines might undergo selection in vivo and regain virulence [Hilt, Jackwood, and McKinley 2008]
RNA Virus Replication • High mutation rate (~10⁻⁴ per nucleotide per replication cycle) [Lauring & Andino, PLoS Pathogens 2011]
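To make the rate concrete, the sketch below converts a per-nucleotide mutation rate into the expected number of new mutations per genome copy and the probability that a copy is error-free, using a Poisson approximation. The ~27.6 kb genome length is an approximate IBV figure supplied for illustration, and the function names are hypothetical.

```python
import math


def expected_mutations(mu: float, genome_len: int) -> float:
    """Expected number of new mutations per genome copy (mu = rate per nucleotide)."""
    return mu * genome_len


def p_error_free(mu: float, genome_len: int) -> float:
    """Probability that a copy carries no new mutation (Poisson approximation)."""
    return math.exp(-expected_mutations(mu, genome_len))


if __name__ == "__main__":
    mu = 1e-4            # rate from the slide, assumed per nucleotide per replication
    genome_len = 27_600  # approximate IBV genome length (~27.6 kb), illustrative only
    print(f"expected mutations per copy: {expected_mutations(mu, genome_len):.2f}")
    print(f"fraction of error-free copies: {p_error_free(mu, genome_len):.3f}")
```

With these inputs each copy carries about 2.8 new mutations on average and only about 6% of copies are error-free, so the progeny of a single genome quickly becomes a cloud of related variants.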
Evolution of IBV • Quasispecies identified by cloning and Sanger sequencing in both IBV-infected poultry and commercial vaccines [Jackwood, Hilt, and Callison 2003; Hilt, Jackwood, and McKinley 2008]
How Do Quasispecies Contribute to Virus Persistence and Evolution? • Variants differ in: • Virulence • Ability to escape the immune response • Resistance to antiviral therapies • Tissue tropism [Lauring & Andino, PLoS Pathogens 2011]
Project Aims • Develop bioinformatics tools for accurate reconstruction of quasispecies sequences and their frequencies from next-generation reads • Study quasispecies persistence and evolution of IBV in commercial layer flocks following vaccination • Use results of this study to optimize vaccine development and vaccination protocols
Next-Generation Sequencing • Illumina HiSeq 2000: up to 6 billion paired-end reads/run; 35-100 bp read length • Roche/454 FLX Titanium: 400-600 Mb/run (~1 million reads); read length up to 1,000 bp • Ion Torrent PGM: 1-10 million reads/run; read length up to 400 bp • SOLiD 4/5500: 1.4-2.4 billion paired-end reads/run; 35-50 bp read length • Image source: http://www.economist.com/node/16349358
Shotgun vs. Amplicon Reads • Shotgun reads • starting positions distributed ~uniformly • Amplicon reads • reads have predefined start/end positions covering fixed overlapping windows
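The difference between the two read types can be sketched with two toy generators: shotgun start positions drawn uniformly along the genome, and amplicon reads confined to fixed, overlapping windows. Both functions and their parameters are hypothetical; real amplicon boundaries are set by primer design, not by a fixed step.

```python
import random


def shotgun_read_starts(genome_len, read_len, n_reads, seed=0):
    """Toy shotgun model: read start positions drawn uniformly along the genome."""
    rng = random.Random(seed)
    # randint is inclusive, so the last possible read ends exactly at genome_len
    return sorted(rng.randint(0, genome_len - read_len) for _ in range(n_reads))


def amplicon_windows(region_len, amplicon_len, overlap):
    """Toy amplicon model: fixed, overlapping windows tiled across a region.

    Every read from amplicon k starts at windows[k][0] and ends at windows[k][1].
    Windows that would run past region_len are not emitted.
    """
    if not 0 <= overlap < amplicon_len:
        raise ValueError("overlap must be in [0, amplicon_len)")
    step = amplicon_len - overlap
    windows = []
    start = 0
    while start + amplicon_len <= region_len:
        windows.append((start, start + amplicon_len))
        start += step
    return windows


if __name__ == "__main__":
    print(shotgun_read_starts(genome_len=2000, read_len=400, n_reads=5))
    print(amplicon_windows(region_len=1000, amplicon_len=400, overlap=100))
```

Shotgun starts vary from run to run, while amplicon reads always share the same boundaries, which is what makes the overlap regions between consecutive amplicons the only place where reads can be linked.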
Reconstruction from Shotgun Reads: ViSpA • Pipeline: shotgun reads → read error correction → read alignment → preprocessing of aligned reads → read graph construction → contig assembly → frequency estimation → quasispecies sequences with frequencies • User-specified parameters: (A) number of mismatches; (B) mutation rate
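The read graph construction step can be illustrated with a generic overlap graph over reads already aligned to a reference: an edge u → v is added when v extends u to the right and the two reads disagree at no more than a user-chosen number of overlapping positions. This is a minimal sketch under those assumptions; how ViSpA itself applies its mismatch parameter (for example, to collapse contained reads) may differ, and all names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedRead:
    name: str
    start: int  # 0-based reference coordinate of the first aligned base
    seq: str    # aligned bases (gaps assumed resolved during preprocessing)

    @property
    def end(self) -> int:
        return self.start + len(self.seq)  # exclusive


def overlap_mismatches(u: AlignedRead, v: AlignedRead) -> int:
    """Count mismatches where u and v overlap (assumes u.start <= v.start < u.end)."""
    lo, hi = v.start, min(u.end, v.end)
    return sum(u.seq[p - u.start] != v.seq[p - v.start] for p in range(lo, hi))


def build_read_graph(reads, max_mismatches):
    """Directed overlap graph: edge u -> v when v properly extends u to the right
    and the two reads differ at no more than max_mismatches overlapping positions."""
    reads = sorted(reads, key=lambda r: (r.start, r.end))
    edges = {r.name: [] for r in reads}
    for i, u in enumerate(reads):
        for v in reads[i + 1:]:
            if v.start >= u.end:
                break  # reads are sorted by start, so no later read overlaps u
            if v.start == u.start or v.end <= u.end:
                continue  # one read contains the other: not a left-to-right extension
            if overlap_mismatches(u, v) <= max_mismatches:
                edges[u.name].append(v.name)
    return edges
```

Paths through such a graph spell out candidate contigs; raising the mismatch tolerance adds edges, which merges more reads into each path at the cost of possibly joining reads from different variants.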
Reconstruction from Amplicon Reads: VirA • Inputs: error-corrected read data in SAM/BAM format; reference in FASTA format • Pipeline: estimate amplicons → amplicon read graph → max-bandwidth paths → frequency estimation → viral population variants with frequencies
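A max-bandwidth (widest) path maximizes the smallest edge weight along the path. The sketch below finds one in a layered graph with one layer of distinct amplicon sequences per amplicon, using dynamic programming in genome order. The layered representation and the meaning of the weights (for instance, the number of reads supporting a link between sequences of consecutive amplicons) are assumptions for illustration, not VirA's actual data structures.

```python
def max_bandwidth_path(layers, weights):
    """Widest (max-bottleneck) path through a layered amplicon graph.

    layers:  list of at least two lists of node ids, one per amplicon in genome order
    weights: dict mapping (u, v) -> edge weight, for u in layer i and v in layer i+1
    Returns (path, bandwidth), or (None, 0) if no complete path exists.
    """
    # best[v] = (bandwidth of the widest path ending at v, predecessor of v)
    best = {v: (float("inf"), None) for v in layers[0]}
    for prev_layer, layer in zip(layers, layers[1:]):
        for v in layer:
            candidates = [
                (min(best[u][0], weights[(u, v)]), u)
                for u in prev_layer
                if u in best and (u, v) in weights
            ]
            if candidates:
                best[v] = max(candidates, key=lambda c: c[0])
    ends = [v for v in layers[-1] if v in best]
    if not ends:
        return None, 0
    v = max(ends, key=lambda x: best[x][0])
    bandwidth = best[v][0]
    path = [v]
    while best[v][1] is not None:  # first-layer nodes have no predecessor
        v = best[v][1]
        path.append(v)
    return path[::-1], bandwidth
```

The bottleneck criterion is used here because the weakest link bounds how much read support a full-length variant can have. A natural extension, assumed here rather than taken from VirA, is to repeatedly extract the widest path, record its bandwidth as that variant's abundance, and subtract it from the path's edges.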
Amplicon Sequencing Challenges • Multiple reads from consecutive amplicons may match over their overlap, so it can be ambiguous which reads should be joined • Distinct quasispecies may be indistinguishable in an amplicon interval
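The second challenge can be checked directly for a given set of variants: within each amplicon window, group the variants whose sequences are identical there. The function name and input format are hypothetical, and the sequences are assumed to be aligned to common coordinates.

```python
def indistinguishable_groups(variants, windows):
    """For each amplicon window, list groups of variants with identical sequence there.

    variants: dict name -> sequence, all aligned to the same coordinates
    windows:  list of (start, end) amplicon intervals, end exclusive
    """
    result = {}
    for start, end in windows:
        by_segment = {}
        for name, seq in variants.items():
            by_segment.setdefault(seq[start:end], []).append(name)
        # only groups with two or more members are ambiguous
        result[(start, end)] = [sorted(names) for names in by_segment.values() if len(names) > 1]
    return result


if __name__ == "__main__":
    variants = {"q1": "AAAACCCC", "q2": "AAAAGCCC", "q3": "TAAACCCC"}
    print(indistinguishable_groups(variants, [(0, 4), (4, 8)]))
```

In this toy example a read reading AAAA in the first window could come from q1 or q2, and a read reading CCCC in the second window could come from q1 or q3; only by linking reads across the overlap (or by reasoning about frequencies) can q1 be separated from the other two.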
IBV Genome [figure source: Rev. Bras. Cienc. Avic. vol. 12 no. 2, Campinas, Apr./June 2010] • RT-PCR of S1 using redesigned primers
Experiment 1 • Sample M42: 53 plasmid clones • 10-clone pool mixed at known frequencies: C1 20%, C2 20%, C3 15%, C4 15%, C5 10%, C6 10%, C7 4%, C8 4%, C9 1%, C10 1% • 454 reads from the sample → assembled quasispecies V1, V2, V3, …, Vn • 454 reads from the pool → assembled quasispecies PV1, PV2, PV3, …, PVk
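Because the pool frequencies are known, a reconstruction can be scored by how far its estimated frequency profile lies from the true mix. The sketch below uses total variation distance, which is our choice of metric rather than one stated on the slides; it assumes each reconstructed PV has already been mapped to a clone name, a step not shown here.

```python
# Known mixing proportions of the 10-clone pool (from the slide).
POOL = {
    "C1": 0.20, "C2": 0.20, "C3": 0.15, "C4": 0.15, "C5": 0.10,
    "C6": 0.10, "C7": 0.04, "C8": 0.04, "C9": 0.01, "C10": 0.01,
}


def total_variation(p, q):
    """Total variation distance between two frequency profiles (0 = identical, 1 = disjoint).

    Keys missing from one profile are treated as frequency 0, so unmatched
    clones and spurious predicted variants both count as error.
    """
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
```

Calling `total_variation(POOL, estimate)` with a hypothetical `estimate` dict keyed by clone name gives a single number in [0, 1]; the rare clones C9 and C10 contribute at most 0.01 each, so this metric says little about whether low-frequency variants were recovered.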
[Figures: How well we predicted the Sanger clones; How accurate our predictions are]
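The two figure titles correspond naturally to sensitivity (the fraction of Sanger clones matched by some prediction) and positive predictive value (the fraction of predictions that match some clone). The sketch below computes both under a Hamming-distance matching rule; the metric names, the distance threshold, and the assumption of equal-length aligned sequences are ours, not taken from the slides.

```python
def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def sensitivity_ppv(true_seqs, predicted_seqs, max_dist=0):
    """Return (sensitivity, PPV) where a match means Hamming distance <= max_dist."""
    matched_true = sum(
        1 for t in true_seqs if any(hamming(t, p) <= max_dist for p in predicted_seqs)
    )
    matched_pred = sum(
        1 for p in predicted_seqs if any(hamming(t, p) <= max_dist for t in true_seqs)
    )
    return matched_true / len(true_seqs), matched_pred / len(predicted_seqs)
```

Relaxing `max_dist` trades strictness for tolerance of residual sequencing errors, so results are usually reported for several thresholds.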
Neighbor-Joining Tree for Sanger clones and ViSpA Reconstructed Sequences
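For readers unfamiliar with the method, the sketch below is a compact textbook version of neighbor-joining: repeatedly join the pair minimizing the Q-criterion, assign branch lengths, and replace the pair with a new node. It takes a pairwise distance table (for example, p-distances between aligned S1 sequences, an assumption here) and returns a Newick string plus the list of joins. It is not the software used to draw the tree on this slide.

```python
def neighbor_joining(labels, dist):
    """labels: unique taxon names; dist: dict mapping (x, y) -> distance.

    Returns (newick_string, joins), where joins lists (i, j, branch_i, branch_j).
    """
    d = {frozenset(k): v for k, v in dist.items()}
    nodes = list(labels)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        total = {x: sum(d[frozenset((x, y))] for y in nodes if y != x) for x in nodes}
        # Q-criterion: Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k)
        best = None
        for a in range(n):
            for b in range(a + 1, n):
                i, j = nodes[a], nodes[b]
                q = (n - 2) * d[frozenset((i, j))] - total[i] - total[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        dij = d[frozenset((i, j))]
        li = dij / 2 + (total[i] - total[j]) / (2 * (n - 2))
        lj = dij - li
        u = f"({i}:{li:g},{j}:{lj:g})"  # the new node is labelled by its Newick subtree
        joins.append((i, j, li, lj))
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (d[frozenset((i, k))] + d[frozenset((j, k))] - dij) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    return f"({a},{b}:{d[frozenset((a, b))]:g});", joins
```

The resulting tree is unrooted; the final edge is written on one side only, so any rooting shown in a figure is a separate choice.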
Summary • Developed software tools for quasispecies reconstruction from both shotgun and amplicon next-generation reads • Code and executables freely available at http://alla.cs.gsu.edu/~software/VISPA/vispa.html (ViSpA) and http://alan.cs.gsu.edu/vira/ (VirA) • ViSpA plugin developed for Ion Torrent users, available on the Ion Community site • Experimental results on both simulated and real data show improved accuracy tradeoffs compared to previous methods • Tools are applicable to quasispecies studies of other viruses
Ongoing Work • Deployment of ViSpA and VirA on Galaxy servers maintained at UConn and GSU • Tool validation on Ion Torrent reads • Comparison of shotgun- and amplicon-based reconstruction methods • Combining long- and short-read technologies • Quasispecies persistence studies using longitudinal sampling
Tool Validation for Ion Torrent Reads • Shotgun IBV reads generated using an Ion 316 chip • 2,384,007 reads (1,177,740 after SAET correction) • Mean read length 203.58 bp • ViSpA results: 23 quasispecies with estimated frequency > 0.5% (2,200 in total)
Longitudinal Sampling • Amplicon / shotgun sequencing
Contributors • Georgia State University: Bassam Tork, Ekaterina Nenastyeva, Alex Artyomenko, Serghei Mangul, Nicholas Mancuso, Alexander Zelikovsky • University of Connecticut: Rachel O’Neill, Ph.D.; Mazhar Khan, Ph.D.; Hongjun Wang, Ph.D.; Craig Obergfell; Andrew Bligh • University of Maryland: Irina Astrovskaya, Ph.D.