Clinical grade next-generation sequencing of UM
Elisha Roberson, Ph.D.
Depts. of Internal Medicine and Genetics
Washington University in St. Louis
All talk content can be tweeted and blogged @thatdnaguy
Sanger ≠ next-gen sequencing
• Sanger
  • Consensus of a population of molecules
  • ~600-800 bp sequence
  • Very low error rate
  • Low-throughput
  • Targeted
  • Expensive* (256 bp/$)
• NGS
  • Single molecules
  • 35 bp – 3+ kb
  • High error rates (1-15+%)
  • High-throughput
  • Can shotgun
  • Cheap* (11.5 Mbp/$)
*Our lab's current Sanger & HiSeq2500 costs
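To put the two footnoted prices side by side, here is a small arithmetic sketch. Only the two bp/$ figures come from the slide; the 3.1 Gbp genome size and the helper name `raw_cost` are assumptions made for illustration, and the result covers raw per-base cost only (no library prep, labor, or analysis).

```python
# Figures taken from the slide footnote.
SANGER_BP_PER_DOLLAR = 256
HISEQ_BP_PER_DOLLAR = 11.5e6  # 11.5 Mbp/$

# Assumption for illustration: approximate human genome size in bp.
GENOME_SIZE_BP = 3.1e9


def raw_cost(n_bases, bp_per_dollar):
    """Dollars needed to generate n_bases at a given bp-per-dollar rate."""
    return n_bases / bp_per_dollar


# How many times more sequence one dollar buys on the HiSeq.
fold_difference = HISEQ_BP_PER_DOLLAR / SANGER_BP_PER_DOLLAR

# Raw sequence needed for 30X average coverage of one genome.
bases_for_30x = 30 * GENOME_SIZE_BP

if __name__ == "__main__":
    print(f"HiSeq yields about {fold_difference:,.0f}-fold more bases per dollar")
    print(f"30X genome, HiSeq raw cost:  ${raw_cost(bases_for_30x, HISEQ_BP_PER_DOLLAR):,.0f}")
    print(f"30X genome, Sanger raw cost: ${raw_cost(bases_for_30x, SANGER_BP_PER_DOLLAR):,.0f}")
```

The point of the comparison is the ratio, not the absolute dollar amounts, which change with any lab's pricing.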
UM Clinical sequencing
• CLIA/CAP
  • Detect actionable somatic variants
  • Insurance pre-approval
  • Submit DNA
  • Wait a long time
  • Pay $7000+ / sample
• Research (discovery, epidemiology, etc.)
  • IRB-approved DNA collection
  • You sequence and interpret
My sequencing wishlist
• Tissue
  • Fresh tissue
  • Large amount of high-quality DNA
  • No PCR amplification
• Sequencing (whole genome)
  • Low-error, long reads
  • Paired-end
  • 30X or greater germline coverage
  • 60X or greater tumor coverage
• Bioinformatics
  • De novo assembly of both germline and tumor, compare to reference & each other
  • Genotyping algorithm that is pair-aware
NGS technologies
• ABI – SOLiD
  • Emulsion PCR, then sequencing by di-base ligation
• Illumina – Solexa
  • Fluorescent sequencing by synthesis of clusters, each clonally amplified from a single molecule
• Life Tech – Ion Torrent
  • Semiconductor (pH) sequencing of beads, each clonally amplified from a single molecule
• PacBio – SMRT
  • Zero-mode waveguide with a single polymerase per well
Where should I get DNA???
• Germline
  • NO blood
  • Sequencing single molecules!!!
  • Spit kits, washed skin biopsies
• Tumor tissue
  • Fresh primary tumor
  • Fresh metastatic tumor
  • FFPE?
Calculating sequencing depth
• NGS technology dependent
• How deeply do you want to see mosaicism?
• General guidelines
  • 30X germline
  • 60X or higher tumor
• BUT sequencing coverage follows a Poisson distribution
  • i.e. 30X average coverage ≠ all targets at 30X
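The Poisson point can be made concrete with a short sketch. It assumes an idealized Poisson model for per-base depth (real coverage is usually more dispersed because of GC content and capture bias) and a simple binomial model for reads carrying a mosaic variant. The function names and the example thresholds are illustrative, not part of the talk.

```python
import math


def poisson_pmf(k, mean):
    """P(depth == k) under a Poisson model, computed in log space for stability."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def fraction_at_least(min_depth, mean_depth):
    """Expected fraction of positions covered at min_depth or deeper."""
    below = sum(poisson_pmf(k, mean_depth) for k in range(min_depth))
    return 1.0 - below


def prob_detect(depth, allele_fraction, min_alt_reads):
    """P(at least min_alt_reads reads carry the variant) at a fixed depth.

    Assumes each read independently carries the variant with probability
    allele_fraction (binomial sampling, sequencing errors ignored).
    """
    miss = sum(
        math.comb(depth, k)
        * allele_fraction ** k
        * (1.0 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - miss


if __name__ == "__main__":
    # Illustrative values only: 30X mean depth, 5% mosaic fraction, 3 supporting reads.
    print("Fraction of bases at >=30X with 30X mean:", fraction_at_least(30, 30))
    print("Fraction of bases at >=20X with 30X mean:", fraction_at_least(20, 30))
    print("P(detect 5% variant at 60X):", prob_detect(60, 0.05, 3))
    print("P(detect 5% variant at 120X):", prob_detect(120, 0.05, 3))
```

Under this model only about half of all positions reach the nominal mean depth, which is why the depth needed to see low-fraction mosaicism should be planned from the tail of the distribution rather than from the average.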
Coverage also varies by tissue*
*Plot available on Figshare
FASTQ Preprocessing
• Demultiplex samples
  • Discard reads with no index
• Convert to PHRED quality scale
  • -10 × log10(probability of base error)
• Remove adapter contamination
  • cutadapt
• Trim low-quality trailing bases and 3' Ns
  • No 5' trimming!!!
• Run FastQC!
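The PHRED conversion and the 3'-only trimming rule can be sketched in a few lines. This is a minimal illustration, not a replacement for cutadapt or a dedicated trimmer: it assumes quality strings use ASCII offset 33 (Sanger / Illumina 1.8+ encoding; older Illumina pipelines used offset 64), and the `min_q=20` threshold is a hypothetical default chosen for the example.

```python
import math


def phred_from_error(p_error):
    """PHRED score: Q = -10 * log10(probability of base error)."""
    return -10.0 * math.log10(p_error)


def error_from_phred(q):
    """Inverse of the PHRED formula."""
    return 10.0 ** (-q / 10.0)


def decode_quality(qual_string, offset=33):
    """Convert a FASTQ quality string to integer PHRED scores.

    Assumption: offset 33; pass offset=64 for old Illumina-scaled files.
    """
    return [ord(char) - offset for char in qual_string]


def trim_3prime(seq, qual_string, min_q=20, offset=33):
    """Trim trailing Ns and low-quality bases from the 3' end only.

    The 5' end is never touched, matching the "No 5' trimming" rule.
    min_q=20 is an illustrative threshold, not a value from the talk.
    """
    end = len(seq)
    while end > 0 and (
        seq[end - 1] in "Nn" or ord(qual_string[end - 1]) - offset < min_q
    ):
        end -= 1
    return seq[:end], qual_string[:end]


if __name__ == "__main__":
    print(phred_from_error(0.001))
    print(decode_quality("II#"))
    print(trim_3prime("ACGTNN", "IIII##"))
    print(trim_3prime("NACGT", "#IIII"))  # 5' N is left in place
```

Leaving the 5' end alone keeps the read's start coordinate intact, which matters later when duplicates are identified by alignment position.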
Alignment to reference
• Current human reference is GRCh37
• Repeat masking
  • Hard masking: repeats are replaced with N
  • Soft masking: repeats are lowercase
  • Prefer soft masking
• Reference aligners generally have low memory use
  • Most use the Burrows-Wheeler transform
  • bowtie 1 & 2
  • bwa-aln & bwa-mem
  • Novoalign (high memory)
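The difference between the two masking styles is easy to show on a toy sequence. The sketch below assumes a soft-masked input in which lowercase letters mark repeats; the function names are made up for the example.

```python
def hard_mask(soft_masked_seq):
    """Convert soft masking (lowercase repeats) to hard masking (N)."""
    return "".join("N" if base.islower() else base for base in soft_masked_seq)


def masked_fraction(soft_masked_seq):
    """Fraction of the sequence annotated as repeat in a soft-masked string."""
    if not soft_masked_seq:
        return 0.0
    masked = sum(1 for base in soft_masked_seq if base.islower())
    return masked / len(soft_masked_seq)


if __name__ == "__main__":
    toy = "ACGTacgtacgtTTGA"  # toy sequence, lowercase = repeat
    print(hard_mask(toy))
    print(masked_fraction(toy))
    # Soft masking is reversible: the original bases are still there.
    print(toy.upper())
```

Hard masking throws the repeat bases away, so reads from those regions can never align there. Soft masking keeps the bases and leaves the decision to the aligner, which is one reason to prefer it.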
De novo assembly
• Most use De Bruijn graphs built from k-mers of the sequence
• Mostly very high memory usage
  • Depends on depth and number of k-mers
  • Try running digital normalization (diginorm, C. Titus Brown) first
• Assemblers
  • ABySS
  • MIRA
  • SOAPdenovo
  • Velvet
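A toy De Bruijn graph shows why memory grows with the number of distinct k-mers. This is a minimal sketch that ignores reverse complements, sequencing errors, and k-mer counts, all of which real assemblers handle; the function names are illustrative.

```python
from collections import defaultdict


def kmers(seq, k):
    """All overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def de_bruijn(reads, k):
    """Build a De Bruijn graph: nodes are (k-1)-mers, edges are k-mers.

    Each k-mer adds an edge from its first k-1 bases to its last k-1 bases.
    """
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def distinct_kmers(reads, k):
    """Number of distinct k-mers, a rough proxy for graph size."""
    return len({kmer for read in reads for kmer in kmers(read, k)})


if __name__ == "__main__":
    toy_reads = ["ACGTAC", "GTACGG"]  # toy overlapping reads
    for node, successors in sorted(de_bruijn(toy_reads, 4).items()):
        print(node, "->", sorted(successors))
    print("distinct 4-mers:", distinct_kmers(toy_reads, 4))
```

Every sequencing error creates up to k new k-mers, so deeper data adds graph nodes even when it adds no new genome. Digital normalization reduces that by discarding reads from regions that are already well covered.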
Post-processing
• Picard tools
  • Convert to BAM format
  • Add read-group tags
  • Mark duplicates
• Genome Analysis Toolkit (GATK)
  • Local realignment around indels
  • Base quality score recalibration
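The idea behind duplicate marking can be illustrated without any tool-specific details. The sketch below is a simplified stand-in, not Picard's implementation: it assumes single-end reads described by hypothetical dictionary keys (`name`, `chrom`, `pos`, `strand`, `qual_sum`), groups reads that share a 5' alignment position and strand, and keeps the one with the highest summed base quality.

```python
from collections import defaultdict


def mark_duplicates(reads):
    """Return the names of reads flagged as duplicates.

    reads: list of dicts with hypothetical keys
        name, chrom, pos (5' alignment position), strand, qual_sum.
    Reads sharing (chrom, pos, strand) are treated as copies of one
    original molecule; the highest-quality read in each group is kept.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[(read["chrom"], read["pos"], read["strand"])].append(read)

    duplicates = set()
    for group in groups.values():
        best = max(group, key=lambda r: r["qual_sum"])
        duplicates.update(r["name"] for r in group if r is not best)
    return duplicates


if __name__ == "__main__":
    toy_reads = [
        {"name": "r1", "chrom": "chr3", "pos": 1000, "strand": "+", "qual_sum": 3500},
        {"name": "r2", "chrom": "chr3", "pos": 1000, "strand": "+", "qual_sum": 3100},
        {"name": "r3", "chrom": "chr3", "pos": 1000, "strand": "-", "qual_sum": 2900},
        {"name": "r4", "chrom": "chr3", "pos": 1042, "strand": "+", "qual_sum": 3300},
    ]
    print(sorted(mark_duplicates(toy_reads)))
```

Because duplicates are defined by the 5' position, trimming the 5' end during preprocessing would make true duplicates look like distinct molecules.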
Genotyping
• General
  • Genome Analysis Toolkit
    • UnifiedGenotyper
  • SAMtools
    • mpileup
• Somatic-specific
  • MuTect
  • SomaticSniper
  • Virmid
Variant filtering strategies – sequential evolution
• Initiating event
• Mosaic primary
• Mutation with metastatic advantage leaves the eye
• Metastasis has primary mutations & metastatic mutation (maybe in primary?) & new mutations
Variant filtering strategies – parallel evolution
• Initiating event
• Mosaic primary
• Very little overlap in mutations between primary and met!
• Metastasis has few primary mutations & metastatic mutation (not in primary) & new mutations
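Both evolution models lead to the same bookkeeping: subtract germline calls, then compare what is left in the primary and the metastasis. The sketch below shows that with Python sets. The variant tuples are invented toy data, and the function names and the use of a Jaccard index as the overlap measure are assumptions for illustration.

```python
def classify_somatic(germline, primary, metastasis):
    """Split somatic calls into shared, primary-only, and metastasis-only.

    Each argument is a set of variant keys, e.g. (chrom, pos, ref, alt).
    """
    somatic_primary = primary - germline
    somatic_met = metastasis - germline
    return {
        "shared": somatic_primary & somatic_met,
        "primary_only": somatic_primary - somatic_met,
        "met_only": somatic_met - somatic_primary,
    }


def somatic_overlap(germline, primary, metastasis):
    """Jaccard index of somatic calls in primary vs. metastasis (0 to 1)."""
    somatic_primary = primary - germline
    somatic_met = metastasis - germline
    union = somatic_primary | somatic_met
    if not union:
        return 0.0
    return len(somatic_primary & somatic_met) / len(union)


if __name__ == "__main__":
    # Toy variant keys, invented for the example.
    germline = {("chr1", 100, "A", "G")}
    primary = {("chr1", 100, "A", "G"), ("chr3", 200, "C", "T"), ("chr9", 300, "G", "A")}
    metastasis = {("chr1", 100, "A", "G"), ("chr3", 200, "C", "T"), ("chr8", 400, "T", "C")}

    groups = classify_somatic(germline, primary, metastasis)
    for label, variants in groups.items():
        print(label, sorted(variants))
    print("overlap:", somatic_overlap(germline, primary, metastasis))
```

A high overlap is what the sequential model predicts, and a low overlap is what the parallel model predicts. Metastasis-only calls are worth rechecking in the primary at low allele fraction, since the metastatic mutation may be present there as a minor mosaic clone.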
Variant confirmation
• Sanger sequencing
• Fluidigm
• TaqMan
• castPCR
• Sequenom
• Illumina GoldenGate
Discover in a focused set with sequencing.
Type with these technologies in everything.