Ion Torrent Semiconductor Sequencing
Mike Lelivelt, Ph.D., Director of Bioinformatics
The content provided herein may relate to products that have not been officially released and is subject to change without notice.
Who am I? – Mike Lelivelt
• Ph.D. from Univ of N Carolina in Microbial Genetics
• Post-Doc at Univ of WI Madison in Yeast Genomics
• 9 years at Affymetrix – software developer outreach
• 2 years at Partek – data analysis for arrays & NGS
• 3 years at Ion Torrent/Life Tech in bioinformatics
• Familiar with the challenges of turning genomic-scale assays into discrete, actionable decisions via software.
• I’m here to educate about semiconductor sequencing.
• I’m here to listen to your needs.
Confidential and Proprietary - DO NOT DUPLICATE
Opening thoughts…
• "The wonderful thing about standards is that there are so many of them to choose from." –Andrew Tanenbaum
• Standards are driven more by the technology than we’d like to admit
• Each technology platform serves multiple applications
• A data standard implies a file format, but it’s really more about understanding data process flows
• The broad scope of NGS will drive multi-marker haplotypes and introduce allele frequency measurements into the decision process.
• Software is a tough business model. We’ll need to work together on this.
Simple Natural Chemistry
Eliminate sources of error:
• Modified bases
• Fluorescent bases
• Laser detection
Eliminate read length limitations:
• Unnatural bases
• Protect/de-protect
• Slow cycle time
Sequence is determined by measuring the hydrogen ions (H+) released (1 per base added per DNA strand) during 2nd strand synthesis, as complementary bases (A, C, G or T) are sequentially incorporated by DNA polymerase.
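The relationship between released hydrogen ions and sequence can be sketched in a few lines of Python. This is only an illustration, not the Torrent Suite base caller: the repeating flow order `TACG`, the function name `flows_to_bases`, and the signal values are assumptions made for the example. It treats each normalized signal as proportional to the number of bases incorporated during that flow, so rounding the signal gives the homopolymer length.

```python
# Illustrative sketch only; FLOW_ORDER and flows_to_bases are hypothetical names.
FLOW_ORDER = "TACG"  # assumed repeating order in which nucleotides are flowed


def flows_to_bases(signals, flow_order=FLOW_ORDER):
    """Convert normalized per-flow signals into a base sequence.

    One H+ is released per incorporated base, so a signal near 2.0 is read
    as a homopolymer of length 2, a signal near 0.0 as no incorporation.
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        count = max(0, int(signal + 0.5))  # round to nearest, never negative
        bases.append(nucleotide * count)
    return "".join(bases)


# Made-up signal values for nine consecutive flows (T, A, C, G, T, A, C, G, T).
signals = [0.1, 1.2, 0.3, 2.1, 0.1, 0.2, 2.1, 3.1, 0.0]
print(flows_to_bases(signals))  # AGGCCGGG
```

Simple rounding is the weakest possible caller; it is used here only to make the one-ion-per-base idea concrete.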
Torrent Browser runs on Torrent Server
Local compute and storage with an integrated web interface
• Torrent Server – hardware appliance
• Torrent Browser – easy web access to Ion data
• Plugins for secondary analysis, e.g. variant calling
For Research Use Only. Not intended for any animal or human therapeutic or diagnostic use.
Data Flow Leverages Several Formats
• Incorporation for 1 flow (DAT)
• Incorporation over many flows (DAT)
• Raw signals per flow (WELLS)
  0.1 1.2 0.3 2.1 0.1 0.2 2.1 3.1 0.0 0.2 2.1 3.1 0.0 0.1 1.2 0.3 2.1 0.0 0.0 0.0 3.2 1.4 0.1 1.3 1.0 0.2 0.1
• Processed incorporations (SFF), but moving to unmapped BAM
  0 1 0 2 0 0 3 0 0 1 0 4 0 1 0 3 0 2 2 0 0 1 0 0 0 3 0 4 0 1 1 0 2 0 0 3 0 0 3 0 4 0 4 0 1 0 3 0 2 0 0 0 1 1
• Flow space converted to base space (FASTQ)
  @7D8NM:4:9
  GGGATCAGGCTGTCGAACGCGTGATTACATCTAGCTA
  +
  AA*ABBBB?BBBBBBBABBB@@@BB?BABABCDA!@$
• TMAP → BAM (binary)
• TVC → Variant Call Format (VCF)
  ##FORMAT=<ID=DP,Number=1
  ##FORMAT=<ID=HQ,Number=2
  #CHROM POS ID REF ALT QUAL FILTER
  20 14370 rs6054257 G A 29 PASS
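To make the base-space step concrete, here is a minimal Python sketch that parses the FASTQ record shown on this slide and decodes its quality string. It assumes the common Phred+33 (Sanger) quality encoding; the function name `parse_fastq_record` is hypothetical and not part of any Ion Torrent tool.

```python
def parse_fastq_record(lines):
    """Parse one four-line FASTQ record into (read_id, sequence, scores).

    Assumes Phred+33 encoding: quality score = ord(character) - 33.
    """
    header, sequence, separator, quality = (line.strip() for line in lines)
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a four-line FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    scores = [ord(ch) - 33 for ch in quality]
    return header[1:], sequence, scores


record = [
    "@7D8NM:4:9",
    "GGGATCAGGCTGTCGAACGCGTGATTACATCTAGCTA",
    "+",
    "AA*ABBBB?BBBBBBBABBB@@@BB?BABABCDA!@$",
]
read_id, seq, scores = parse_fastq_record(record)
print(read_id, len(seq), min(scores), max(scores))  # 7D8NM:4:9 37 0 35
```

Each base carries exactly one quality character, which is why the two strings must have equal length.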
What is raw data? Do you really want it?

Process Description     File Type   314 chip   316 chip   318 chip
Raw Voltage Data        DAT         40 GB      180 GB     320 GB
Signal Processing       WELLS       1 GB       8 GB       12 GB
Base Calls - Flow       SFF/BAM     1 GB       5 GB       8 GB
Base Calls - Base       FASTQ       0.3 GB     1.5 GB     2 GB
Base Calls - Aligned    BAM         0.1 GB     0.6 GB     3.5 GB

*1.5 v run 200bp runs (440 flows, 110 cycles), Nov 2011
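The size of the reduction from raw voltage data to base calls is easy to work out from the table. The short Python sketch below does the arithmetic; the dictionary `SIZES_GB` and the function `reduction_factor` are hypothetical names introduced for this example, and the figures are the approximate per-run sizes quoted on the slide.

```python
# Approximate per-run file sizes in GB, copied from the table above.
SIZES_GB = {
    "DAT":   {"314": 40.0, "316": 180.0, "318": 320.0},
    "WELLS": {"314": 1.0,  "316": 8.0,   "318": 12.0},
    "FASTQ": {"314": 0.3,  "316": 1.5,   "318": 2.0},
}


def reduction_factor(chip, source="DAT", target="FASTQ"):
    """Return how many times smaller the target file is than the source."""
    return SIZES_GB[source][chip] / SIZES_GB[target][chip]


for chip in ("314", "316", "318"):
    print(f"{chip} chip: DAT is about {reduction_factor(chip):.0f}x larger than FASTQ")
```

On these numbers the raw DAT files are roughly 120 to 160 times larger than the corresponding FASTQ output, which is the practical reason to ask whether raw data is really needed.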
Questions to Address
• Are allele calls alone sufficient to call HLA types?
  • Likely not. More data is usually better.
• Should HLA software be required to call novel alleles?
  • Speak no evil. See no evil. Hear no evil.
  • But software will serve the market.
• Should novel alleles be submitted to IMGT/HLA?
  • Balance between social curation & data security. More than just allele info?
• How should data be formatted to handle NGS richness?
  • Format is a snapshot in time.
All products mentioned in this presentation are for Research Use Only, not intended for any animal or human therapeutic or diagnostic use.