Next Generation Sequencing in Virus and Parasite Research
Four main projects in the lab
• Applications presented: WGS, Annotation, Population Diversity, Pathogen Discovery
• Read types: Sanger read, GS-FLX read
• Yield: 100 Mb | 500 Mb per run
• Read lengths: ~250 bp, 500 bp, >800 bp
Brugia malayi Genome Project
• Parasitic nematode, causes lymphatic filariasis
• Genome size ~100 Mb
• Sanger (cloning bias)
• 6 chromosomes in 8250 pieces
• Total scaffolds: ~8250
• Longest scaffold: 6.5 Mb
• Total bases in scaffolds: 71 Mb
• Total span of scaffolds: 80 Mb
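The scaffold figures on this slide can be related to each other with a little arithmetic. The sketch below is only an illustration: it assumes that the difference between "total span" and "total bases" is gap sequence inside scaffolds, and the function and key names are hypothetical.

```python
# Figures quoted on the slide (Mb = million bases).
TOTAL_SCAFFOLDS = 8250
BASES_IN_SCAFFOLDS_MB = 71.0
SPAN_OF_SCAFFOLDS_MB = 80.0
LONGEST_SCAFFOLD_MB = 6.5


def assembly_summary(n_scaffolds, bases_mb, span_mb, longest_mb):
    """Derive simple summary numbers from the quoted assembly figures."""
    gap_mb = span_mb - bases_mb  # assumed to be gaps within scaffolds
    return {
        "gap_mb": gap_mb,
        "gap_fraction": gap_mb / span_mb,
        "mean_scaffold_span_kb": span_mb * 1000 / n_scaffolds,
        "longest_share_of_span": longest_mb / span_mb,
    }


summary = assembly_summary(
    TOTAL_SCAFFOLDS, BASES_IN_SCAFFOLDS_MB, SPAN_OF_SCAFFOLDS_MB, LONGEST_SCAFFOLD_MB
)
for key, value in summary.items():
    print(f"{key}: {value:.4g}")
```

Under that assumption roughly 9 Mb (about 11% of the span) is gap, and the mean scaffold spans a little under 10 kb, which is why closing the genome is the focus of the next phase.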
Brugia malayi Genome Project, PHASE II: Use Next-Gen Data
• Closing the Genome: next-generation sequencing, fingerprint maps
• Curating the Data: mapping 5’ and 3’ UTRs, functional annotation
• DATABASE
• Re-assemble genome (hybrid Sanger-GSFLX assembly)
• Re-annotate (confirm UTRs by GSFLX)
GS-FLX Sequencing of Worm gDNA and cDNA
• Mix of random reads and paired reads
• Avg read length: ~220 bp
• Whole plate (gDNA): Paired-Ends and WGS
• 4-well gasket (UTRs): 5’UTR, 3’UTR, SL
• ~100 Mb; 5 runs = 5X coverage of the genome
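The "5 runs = 5X" figure follows from dividing sequenced bases by genome size. A minimal sketch of that arithmetic, assuming about 100 Mb of sequence per run and a 100 Mb genome (both taken from the slides; the function names are hypothetical):

```python
import math

GENOME_SIZE = 100_000_000    # ~100 Mb genome, as quoted on the slides
BASES_PER_RUN = 100_000_000  # ~100 Mb per GS-FLX run (assumed yield)
AVG_READ_LEN = 220           # average read length in bp


def fold_coverage(total_bases, genome_size):
    """Average number of times each genome position is sequenced."""
    return total_bases / genome_size


def reads_needed(target_coverage, genome_size, avg_read_len):
    """Number of reads of a given average length needed for a target coverage."""
    return math.ceil(target_coverage * genome_size / avg_read_len)


print(fold_coverage(5 * BASES_PER_RUN, GENOME_SIZE))   # 5.0
print(reads_needed(1, GENOME_SIZE, AVG_READ_LEN))      # reads for 1X coverage
```

At ~220 bp per read, each 1X of coverage corresponds to roughly 450,000 reads.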
SEQUENCE ASSEMBLY
• Mapping of paired and non-paired reads onto genomic assembly
• 20 Mb of Brugia reads = ~0.25X coverage
• Hits: 100% | 80%
• Paired-ends
• No apparent bias
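As a rough guide to what 0.25X means, the sketch below divides 20 Mb of reads by the 80 Mb scaffold span (an assumption about which denominator was used) and then applies the idealised Poisson model of random shotgun sequencing, in which the expected fraction of positions covered at least once is 1 - e^(-c). Real data depart from this model, so treat the result as an illustration only.

```python
import math


def expected_fraction_covered(coverage):
    """Expected fraction of positions hit at least once under a Poisson model."""
    return 1.0 - math.exp(-coverage)


READS_MB = 20.0  # Mb of Brugia reads mapped
SPAN_MB = 80.0   # total span of scaffolds (assumed denominator)

coverage = READS_MB / SPAN_MB
fraction = expected_fraction_covered(coverage)
print(f"{coverage:.2f}X -> about {fraction:.0%} of positions covered at least once")
```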
Sequencing UTRs of B. malayi (variable length)
• mRNA (poly-A tail)
• CIP, TAP
• RNA ligase: RNA oligo with MmeI site
• RT-PCR
• NlaIII
• Unique sequence SAGE tag
• DITAGS
• Concatenated SAGE tags
Sequencing Results
• One sequence run: 5’UTR, SL, 3’UTR
• ~50 Mb of data in ~400,000 reads
Data processing
• Raw data
• Remove: linker, small tags (<10), identical, junk
• Blast against: Genome, EST, CDS, Exon
• Unmatched tags
• Blast against: small contigs, mitochondrion, bacterial, singletons
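The "Remove" step above can be sketched in a few lines. This is not the lab's pipeline: the linker sequence in the example is hypothetical, and "junk" is interpreted here, as an assumption, to mean any tag containing a character other than A, C, G or T.

```python
VALID_BASES = set("ACGT")


def strip_linker(seq, linker):
    """Remove one copy of the linker from the 5' end, if present."""
    return seq[len(linker):] if seq.startswith(linker) else seq


def clean_tags(raw_reads, linker, min_len=10):
    """Strip the linker, then drop small, junk and identical tags."""
    seen = set()
    unique = []
    for read in raw_reads:
        tag = strip_linker(read.strip().upper(), linker)
        if len(tag) < min_len:       # small tags (<10)
            continue
        if set(tag) - VALID_BASES:   # "junk": non-ACGT characters (assumption)
            continue
        if tag in seen:              # identical tags
            continue
        seen.add(tag)
        unique.append(tag)
    return unique


example_reads = [
    "GGATCCACGTACGTACGT",    # kept after linker removal
    "GGATCCACGTACGTACGT",    # identical, dropped
    "GGATCCACG",             # too short after linker removal
    "GGATCCACGTNNNNACGTAC",  # contains N, treated as junk
    "TTGCATGCATGCA",         # no linker, kept
]
print(clean_tags(example_reads, linker="GGATCC"))
```

The surviving tags would then go to the BLAST searches listed on the slide.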
Mapping of Tags
• Example gene: 40S ribosomal protein S18
• Tracks: EST, 3’-tag, SL-tag, 5’-tag
Intra-Host Diversity of Influenza A Virus
• Drug resistant and sensitive variants
• Antigenic variants
Mapped GS-FLX Sequence Reads on Antigenic Domain of Hemagglutinin
• Hemagglutinin: 566 aa, 1,757 nt (HA1, HA2)
• Amplicons: 450 bp
Mapped Translated GS-FLX Reads on Epitopes of HA1 Domain
• Epitope labels shown: A, B, E, D, D, B, D, D, E, C
Patterns: Non-synonymous mutations are predominantly in epitope regions (13/19 sites)
Epitope:  A   D   A    A   A    B   B
# reads:  2   3   122  1   122  1   2
Identifying Rare Variants: Drug Resistance Mutation
• Matrix segment in H1N1 isolate
• # reads: 4, 137, 4, 2, 1, 171, 78, 1, 1, 1, 1, 4, 1, 1, 1, 35
• Resistant H1N1: 1/437 = 0.2%
• agt (S), aat (N): N31S
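The codon change and frequency on this slide can be checked with the standard genetic code. The sketch below assumes the change runs from aat to agt, as the label N31S suggests; the helper names are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_change(ref_codon, alt_codon, position):
    """Return a label such as 'N31S' and whether the change alters the amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return f"{ref_aa}{position}{alt_aa}", kind


def variant_frequency(variant_reads, total_reads):
    """Fraction of reads carrying the variant."""
    return variant_reads / total_reads


label, kind = classify_change("aat", "agt", 31)  # direction assumed from "N31S"
freq = variant_frequency(1, 437)
print(label, kind, f"{freq:.1%}")
```

A single nucleotide change in the middle codon position is enough to switch asparagine to serine, and one read in 437 corresponds to the 0.2% quoted.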
SNP Analyses: Probability that Polymorphism is Real
• Table columns: Base#, A, C, G, N, T, GAP, SNP probability
• Tool: pbShort (PolyBayes), Marth Lab, Boston College
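To give a feel for why a probability is attached to each candidate SNP, the sketch below uses a plain binomial model: if every read had the same per-base error rate, how likely is it to see at least k variant reads out of n by error alone? This is a simplified stand-in and not the PolyBayes/pbShort algorithm, and the 1% error rate is an assumed value chosen only for illustration.

```python
import math


def log_binom_pmf(k, n, p):
    """Log of the binomial probability of exactly k successes in n trials."""
    return (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )


def prob_at_least_k_errors(k, n, error_rate):
    """P(X >= k) for X ~ Binomial(n, error_rate), with 0 < error_rate < 1."""
    return sum(math.exp(log_binom_pmf(i, n, error_rate)) for i in range(k, n + 1))


ASSUMED_ERROR_RATE = 0.01  # hypothetical per-base error rate

# A variant seen in 1 of 437 reads is easily explained by error alone ...
p_single = prob_at_least_k_errors(1, 437, ASSUMED_ERROR_RATE)
# ... whereas 122 variant reads at the same depth is not.
p_many = prob_at_least_k_errors(122, 437, ASSUMED_ERROR_RATE)
print(f"{p_single:.3f}", f"{p_many:.3g}")
```

Under this toy model a variant supported by a single read cannot be separated from sequencing error by counts alone, which is the motivation for a tool that weighs base quality when scoring rare variants.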
Signal Processing: Length Distribution (adjusting the stringency of quality filters)
• Default: 75,000 reads, avg length 200
• Higher stringency: 70,000 reads, avg length 195
• Changes the length distribution: reads are slightly shorter BUT average quality is higher
• Axis: read length
Signal Processing: Quality Distribution
• Default: 15 million bp
• Higher stringency: 14 million bp
• Higher stringency reduces the # of bases BUT increases the proportion of bases of HIGH QUALITY
• Axis: quality score
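The read counts and average lengths on the length-distribution slide reproduce the base totals on this one. A short check, assuming 75,000 and 70,000 are read counts for the default and higher-stringency filters respectively:

```python
def total_bases(n_reads, avg_len):
    """Total sequenced bases from a read count and an average read length."""
    return n_reads * avg_len


default_bp = total_bases(75_000, 200)    # default filter
stringent_bp = total_bases(70_000, 195)  # higher-stringency filter
fraction_lost = 1 - stringent_bp / default_bp

print(default_bp, stringent_bp, f"{fraction_lost:.0%}")
```

The stricter filter gives 13.65 million bases, which rounds to the 14 million shown, so about 9% of bases are traded for higher average quality.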
Whole Virus Genome Sequencing
Limitation of read length, BUT:
• Isolate single genome (limiting dilution, other?)
• Random primers or specific primers with barcodes
• Use barcode to amplify
• Multiplex: 20 barcodes × 16-well gasket = 320 samples
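The multiplexing scheme above amounts to identifying each sample by its gasket well plus its barcode. The sketch below shows the counting and a simple exact-prefix demultiplexer; the barcode sequences and names are hypothetical, and real pipelines usually tolerate mismatches, which this does not.

```python
from collections import defaultdict

N_BARCODES = 20
N_WELLS = 16
SAMPLES_PER_RUN = N_BARCODES * N_WELLS  # 320 samples, as on the slide


def demultiplex(reads, barcodes):
    """Bin (well, sequence) reads by well and by exact 5' barcode match.

    barcodes maps barcode sequence -> barcode name. The barcode is trimmed
    from assigned reads; reads with no match go to an 'unassigned' bin.
    """
    bins = defaultdict(list)
    for well, seq in reads:
        for barcode_seq, name in barcodes.items():
            if seq.startswith(barcode_seq):
                bins[(well, name)].append(seq[len(barcode_seq):])
                break
        else:
            bins[(well, "unassigned")].append(seq)
    return dict(bins)


example_barcodes = {"ACGT": "bc01", "TGCA": "bc02"}  # hypothetical barcodes
example_reads = [(1, "ACGTGGGG"), (1, "TGCACCCC"), (2, "ACGTAAAA"), (2, "NNNNAAAA")]
print(SAMPLES_PER_RUN)
print(demultiplex(example_reads, example_barcodes))
```

The same barcode can be reused in every well because the well number disambiguates it.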
Virus Genomic Library Construction (Discovery)
• 1a. Reverse transcription (RT): RNA to cDNA or ssDNA, using tagged random primers (NNNN)
• 1b. DNA extension from random primers (Klenow exo- DNA polymerase) to give dsDNA
• 2. Amplification from tags (PCR)
• 3. Size selection & sequencing: select 500 bp amplicons for emulsion PCR and pyrosequencing
Post-Processing Pipeline
• Barcodes mapped onto reads
• NUCMER
• MySQL db
• Reads clustered and reduced to a unique set
• BLASTN
• BLASTX
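The slide does not say how reads are clustered. As a minimal sketch of the "reduced to a unique set" idea, the code below collapses exact duplicates and treats a read and its reverse complement as the same sequence; both choices are assumptions, and a real pipeline would likely cluster near-identical reads as well.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(seq):
    """Pick one representative for a read and its reverse complement."""
    seq = seq.upper()
    return min(seq, reverse_complement(seq))


def reduce_to_unique(reads):
    """Map each unique canonical read to the number of times it was seen."""
    return Counter(canonical(read) for read in reads)


example = ["ACGTT", "AACGT", "acgtt", "GGGCC"]
unique = reduce_to_unique(example)
print(dict(unique))
```

Only the unique representatives then need to be searched with BLASTN and BLASTX, while the counts preserve abundance information.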
• BLASTN: 26,750 contigs, 56% match human DNA
• BLASTX: 12,889 contigs, 120 match viruses
Oral Microbiome Project
• Tags: TagA, TagB, TagC, TagD
• Samples (one family per row; each row appears twice on the slide):
  Family: BU128 WV409 BK026 BR095
  Family: WV001 WV213 BK044 BU130
  Family: BR009 WV597 WV631 BU133
  Family: BR023 WV041 BU137 WV628
• Pools (in slide order): 1, 5, 2, 6, 3, 7, 4, 8
• Pool labels (in slide order): VIRAL, VIRAL, BACTERIAL, BACTERIAL, BACTERIAL, VIRAL, VIRAL, BACTERIAL
• Groups: HIGH, LOW, HIGH, LOW
• Phenotypes: Periodontal Disease, Caries
Bacterial Diversity Heat Maps: Sequencing of 16S rRNA variable region
• Sequencing of PCR amplicons 250 bp in size
Acknowledgments
• Ghedin Lab, School of Medicine: Jay DePasse, Adam Fitch, Xu Zhang
• School of Dental Medicine: Mary Marazita
• Graduate School of Public Health: Robert Ferrell, Mike Barmaba
• GPCL: Debby Hollingshead, Paul Wood, Janette Lamb
• Funding: NIDCR/NIH, CTSI, JDRF, Burroughs-Wellcome Fund