Next Generation Sequencing in Virus and Parasite Research

zoe-bowen

Presentation Transcript


  1. Next Generation Sequencing in Virus and Parasite Research

  2. Four main projects in the lab. Applications presented: WGS, Annotation, Population Diversity, Pathogen Discovery. Sanger read vs. GS-FLX read: 100 Mb | 500 Mb per run; ~250 bp, 500 bp, >800 bp.

  3. Brugia malayi Genome Project. Parasitic nematode, causes lymphatic filariasis. Genome size ~100 Mb. Sanger (cloning bias). 6 chromosomes in 8250 pieces. Total scaffolds: ~8250. Longest scaffold: 6.5 Mb. Total bases in scaffolds: 71 Mb. Total span of scaffolds: 80 Mb.

  4. Brugia malayi Genome Project, PHASE II – Use Next-Gen Data. Closing the Genome; Curating the Data. Next-generation sequencing; Fingerprint maps; Mapping 5’ and 3’ UTRs; Functional annotation; DATABASE. Re-assemble genome (Hybrid Sanger-GSFLX assembly); Re-annotate (Confirm UTRs by GSFLX).

  5. GS-FLX Sequencing of Worm gDNA and cDNA. Mix of random reads and paired reads; avg read length: ~220 bp. Whole plate; 4-well gasket. Paired-ends and WGS; UTRs (5’UTR, 3’UTR); gDNA; SL. ~100 Mb; 5 runs = 5X coverage of the genome.

  6. SEQUENCE ASSEMBLY. Mapping of paired and non-paired reads onto genomic assembly. 20 Mb of Brugia reads = ~0.25X coverage. Hits: 100% | 80%. Paired-ends. No apparent bias.
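
The coverage figures on slides 5 and 6 are consistent with the usual ratio of sequenced bases to genome size. The Python sketch below reproduces them under two assumptions that the slides do not state explicitly: that each GS-FLX run yields about 100 Mb (the lower per-run figure on slide 2), and that the ~0.25X figure is computed against the 80 Mb scaffold span from slide 3.

```python
def fold_coverage(sequenced_bases: float, genome_size: float) -> float:
    """Average fold coverage = total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sequenced_bases / genome_size


MB = 1_000_000

# Slide 5: 5 runs at an ASSUMED ~100 Mb per run, over a ~100 Mb genome.
wgs_coverage = fold_coverage(5 * 100 * MB, 100 * MB)

# Slide 6: 20 Mb of reads over the 80 Mb scaffold span (ASSUMED denominator).
mapped_coverage = fold_coverage(20 * MB, 80 * MB)

print(f"{wgs_coverage:.1f}X, {mapped_coverage:.2f}X")
```

If the 100 Mb genome size were used as the denominator for slide 6 instead, the same function would give 0.2X.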

  7. Sequencing UTRs of B. malayi (variable length). Diagram labels: mRNA (AAAA), CIP, TAP, RNA oligo with MmeI site, RNA ligase, RT-PCR, NlaIII, unique sequence, SAGE tag, DITAGS, concatenated SAGE tags.

  8. Sequencing Results One sequence run 5’UTR SL 3’UTR ~50Mb of data in ~400,000 reads

  9. Data processing. Raw data: remove linker, small tags (<10), identical, junk. BLAST against: genome, EST, CDS, exon. Unmatched tags: BLAST against small contigs, mitochondrion, bacterial, singletons.
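
The first filtering stage on slide 9 (remove linker, small tags, identical tags, junk) could be sketched in Python as follows. This is an illustration only, not the lab's pipeline: the function name `clean_tags`, the linker sequence, and the definition of "junk" as any tag containing characters other than A, C, G, T are all assumptions. Only the length cutoff of 10 comes from the slide.

```python
def clean_tags(raw_tags, linker, min_len=10):
    """Strip a leading linker, then drop short, junk and duplicate tags."""
    seen = set()
    kept = []
    for tag in raw_tags:
        tag = tag.upper()
        # Remove the linker if the tag starts with it (hypothetical linker).
        if tag.startswith(linker):
            tag = tag[len(linker):]
        # Slide 9: discard small tags (< 10 bases).
        if len(tag) < min_len:
            continue
        # ASSUMPTION: "junk" means any non-ACGT character, e.g. N calls.
        if set(tag) - set("ACGT"):
            continue
        # Keep only the first copy of identical tags.
        if tag in seen:
            continue
        seen.add(tag)
        kept.append(tag)
    return kept


LINKER = "GATC"  # hypothetical linker sequence, for illustration only
raw = [
    "GATCACGTACGTACGTAA",    # kept after linker removal
    "GATCACGTACGTACGTAA",    # identical, dropped
    "GATCACG",               # too short after linker removal
    "GATCACGTNNNNACGTACGT",  # contains N, treated as junk
    "TTTTGGGGCCCCAAAA",      # no linker, kept as-is
]
filtered = clean_tags(raw, LINKER)
print(filtered)
```

The surviving tags would then go to the BLAST stages listed on the slide.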

  10. Mapping of Tags. Tracks: EST, 3’-tag, SL-tag, 5’-tag. Gene shown: 40S ribosomal protein S18.

  11. Intra-Host Diversity of Influenza A Virus. Drug-resistant and sensitive variants. Antigenic variants.

  12. Mapped GS-FLX Sequence Reads on antigenic domain of Hemagglutinin. 566 aa; 1,757 nt; HA1, HA2. 450 bp amplicons.

  13. Mapped Translated GS-FLX Reads on Epitopes of HA1 Domain A B E D D B D D E C

  14. Patterns: Non-synonymous mutations are predominantly in epitope regions (13/19 sites). Epitopes: A D A A A B B. #reads: 2 3 122 1 122 1 2.

  15. Identifying rare variants: Drug resistance mutation. Matrix segment in H1N1 isolate. #reads: 4 137 4 2 1 171 78 1 1 1 1 4 1 1 1 35. Resistant H1N1: 1/437 = 0.2%. agt (S) → aat (N), N31S.
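
The 0.2% figure on slide 15 is the fraction of reads that carry the variant at the site. A minimal Python sketch of that arithmetic, using only the 1 and 437 quoted on the slide (the function name is illustrative):

```python
def variant_frequency(variant_reads: int, total_reads: int) -> float:
    """Fraction of reads covering a site that carry the variant base."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= variant_reads <= total_reads:
        raise ValueError("variant_reads must be between 0 and total_reads")
    return variant_reads / total_reads


# Slide 15: 1 variant read out of 437 reads at the site.
freq = variant_frequency(1, 437)
print(f"{freq:.1%}")  # 0.2%
```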

  16. SNP Analyses: Probability that Polymorphism is Real. Columns: Base#, A, C, G, N, T, GAP, SNP probability. pbShort (polybayes) - Marth Lab, Boston College.

  17. Error Correction (homopolymer tracks)

  18. Signal Processing: Length Distribution, adjusting the stringency of quality filters. 75,000 – avg length 200; 70,000 – avg length 195. Changes length distribution: reads slightly shorter BUT average quality is higher. Legend: Higher stringency, Default. Axis: Read length.

  19. Signal Processing: Quality Distribution. Reduce the # of bases BUT increase the proportion of bases of HIGH QUALITY. 15 Million bp; 14 Million bp. Legend: Default, Higher stringency. Axis: Quality Score.
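
The base totals on slide 19 appear to follow from the read counts and average lengths on slide 18. The Python sketch below checks that arithmetic. It assumes that "75,000" and "70,000" are read counts, that the first pair is the default filter setting, and that the second is the higher-stringency setting; the slides give the numbers but do not label them this explicitly.

```python
def total_bases(n_reads: int, avg_length: int) -> int:
    """Approximate yield in bases = number of reads x average read length."""
    return n_reads * avg_length


# ASSUMED mapping of slide 18 numbers to filter settings.
default_yield = total_bases(75_000, 200)  # 15,000,000 bp
strict_yield = total_bases(70_000, 195)   # 13,650,000 bp, ~14 million

# Fraction of bases given up by the stricter filter.
fraction_lost = 1 - strict_yield / default_yield

print(default_yield, strict_yield, f"{fraction_lost:.0%}")
```

Under these assumptions the stricter filter gives up roughly 9% of the bases in exchange for the higher average quality described on the slide.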

  20. Whole Virus Genome Sequencing. Limitation of read length, BUT: • Isolate single genome (limiting dilution, other?) • Random prime or specific primers with barcodes • Use barcode to amplify • Multiplex: 20 barcodes, 16-well gasket = 320 samples
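
The multiplexing capacity quoted on slide 20 is the product of the number of barcodes and the number of gasket regions. A small Python sketch that enumerates the combinations; the barcode and region labels are hypothetical placeholders, not names used in the presentation.

```python
from itertools import product

# Hypothetical labels: 20 barcodes and 16 gasket regions, as on slide 20.
barcodes = [f"BC{i:02d}" for i in range(1, 21)]
regions = [f"R{i:02d}" for i in range(1, 17)]

# Each sample is identified by a unique (region, barcode) pair.
sample_slots = list(product(regions, barcodes))

print(len(sample_slots))  # 320
```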

  21. Virus Genomic Library Construction - Discovery. 1a. Reverse transcription (RT): RNA; cDNA or ssDNA; NNNN primers. 1b. DNA extension from random primers (Klenow exo- DNA polymerase): dsDNA. 2. Amplification from tags (PCR). 3. Size selection & sequencing: select 500 bp amplicons for emulsion PCR and pyrosequencing.

  22. Multiplexing by Barcoding Pools

  23. Post-Processing Pipeline. Barcodes mapped onto reads; NUCMER; MySQL db; reads clustered and reduced to a unique set; BLASTN; BLASTX.
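
The first two pipeline steps on slide 23 (assigning reads to samples by barcode, then reducing them to a unique set) can be illustrated with a simplified Python sketch. It is a stand-in, not the pipeline itself: it uses exact prefix matching where the slide lists NUCMER, and exact-duplicate removal where the slide says reads were clustered. The barcode sequences and function names are hypothetical.

```python
def demultiplex(reads, barcodes):
    """Assign each read to the sample whose barcode is its exact prefix.

    Returns (reads per sample with the barcode trimmed, unassigned reads).
    """
    by_sample = {name: [] for name in barcodes}
    unassigned = []
    for read in reads:
        for name, barcode in barcodes.items():
            if read.startswith(barcode):
                by_sample[name].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read.
            unassigned.append(read)
    return by_sample, unassigned


def unique_reads(reads):
    """Drop exact duplicates while preserving first-seen order."""
    return list(dict.fromkeys(reads))


# Hypothetical barcode sequences, for illustration only.
BARCODES = {"TagA": "ACGT", "TagB": "TGCA"}
reads = ["ACGTGGGTTT", "ACGTGGGTTT", "TGCACCCAAA", "NNNNGGGTTT"]

by_sample, unassigned = demultiplex(reads, BARCODES)
unique_by_sample = {name: unique_reads(r) for name, r in by_sample.items()}
print(unique_by_sample, unassigned)
```

A real pipeline would also need to tolerate sequencing errors within the barcode, which exact prefix matching does not.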

  24. 26,750 contigs → BLASTN → 56% match human DNA. 12,889 contigs → BLASTX → 120 match viruses.

  25. Oral Microbiome Project TagA TagB TagC TagD BU128 WV409 BK026 BR095 BU128 WV409 BK026 BR095 WV001 WV213 BK044 BU130 WV001 WV213 BK044 BU130 BR009 WV597 WV631 BU133 BR009 WV597 WV631 BU133 BR023 WV041 BU137 WV628 BR023 WV041 BU137 WV628 Family Family Family Family VIRAL VIRAL BACTERIAL BACTERIAL BACTERIAL VIRAL VIRAL BACTERIAL 1 5 2 6 3 7 4 8 Pool HIGH LOW HIGH LOW Periodontal Disease Caries

  26. Sequencing of PCR Amplicons 250bp in size Bacterial Diversity Heat Maps: Sequencing of 16S rRNA variable region

  27. Acknowledgments. Ghedin Lab, School of Medicine: Jay DePasse, Adam Fitch, Xu Zhang. School of Dental Medicine: Mary Marazita. Graduate School of Public Health: Robert Ferrell, Mike Barmaba. GPCL: Debby Hollingshead, Paul Wood, Janette Lamb. Funding: NIDCR/NIH, CTSI, JDRF, Burroughs-Wellcome Fund.
