CSU IDRC Next Generation Sequencing Core Genomic Sequencing Services
Semiconductor DNA Sequencing • Ion Proton • Ion Torrent • “Sequencing on a Chip”
Semiconductor Sequencing in a Nutshell • “It’s a computational pH meter”
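Why "a computational pH meter": the chip is flooded with one nucleotide type at a time. Each time the polymerase incorporates a base it releases a hydrogen ion, and the sensor reads this as a pH (voltage) change roughly proportional to how many bases were added in that flow. The sketch below turns per-flow signals into a read by rounding each signal to a homopolymer length. The cycling flow order `TACG`, the signal values and the function name `call_bases` are illustrative assumptions. This is not the Ion Torrent base caller, which also corrects for phasing and signal decay.

```python
from itertools import cycle


def call_bases(flow_signals: list[float], flow_order: str = "TACG") -> str:
    """Convert per-flow pH signals into bases (simplified, hypothetical sketch).

    Each flow exposes the chip to a single nucleotide. The normalized signal is
    assumed proportional to the number of bases incorporated, so rounding it
    gives the homopolymer length for that flow (0 = no incorporation).
    """
    bases = []
    for nucleotide, signal in zip(cycle(flow_order), flow_signals):
        count = max(0, round(signal))  # small negative noise clamps to zero bases
        bases.append(nucleotide * count)
    return "".join(bases)


# Hypothetical normalized signals for eight flows: T, A, C, G, T, A, C, G
signals = [1.02, 0.05, 2.10, 0.96, -0.03, 0.98, 0.12, 3.05]
print(call_bases(signals))  # TCCGAGGG
```

A signal near 2 or 3 means a run of identical bases in a single flow. Telling 7 from 8 in a long homopolymer is the hardest part of this scheme.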
Metagenomics • Environmental samples of communities of organisms • water, soil samples • human & animal microbiomes • mine tailings, oil spills • deep sea, polar ice • etc. etc.
Metagenomics Pipeline • Torrent/Proton sequencers • NCBI nucleotide databases • CSU Cray supercomputer; Oak Ridge Titan supercomputer • MEGAN
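MEGAN, the last pipeline stage, assigns each read to the lowest common ancestor (LCA) of the taxa matched by its BLAST hits. A read that hits two species in the same family is credited only to that family. The sketch below shows the core idea on root-to-taxon lineages. The function name and the abbreviated lineages are illustrative assumptions, not sample results. Real MEGAN also filters hits by bit score and other thresholds before computing the LCA.

```python
def lowest_common_ancestor(lineages: list[tuple[str, ...]]) -> tuple[str, ...]:
    """Return the longest shared prefix of root-to-taxon lineages (hypothetical helper)."""
    if not lineages:
        return ()  # read with no hits: leave unassigned
    shared = []
    # zip stops at the shortest lineage, so the LCA never goes deeper than any hit
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return tuple(shared)


# Illustrative BLAST hits for one read (abbreviated lineages)
hits = [
    ("Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales",
     "Enterobacteriaceae", "Escherichia", "Escherichia coli"),
    ("Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales",
     "Enterobacteriaceae", "Salmonella", "Salmonella enterica"),
]
print(lowest_common_ancestor(hits)[-1])  # Enterobacteriaceae
```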
Metagenomics Tools • Ion Proton Sequencer • In: sample DNA • Out: 50M DNA fragments • NCBI nucleotide database • DNA fragments • 15M+ records • Do the math: • 50M × 15M ≈ 7.5 × 10¹⁴ queries • mpiBLAST • Highly parallelized BLAST algorithm • NGS sample DNA • Query NCBI DB • CSU Cray XT6m • 2,016 CPU cores
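The arithmetic above, together with the way mpiBLAST spreads the work: it segments the database into fragments and lets each worker search the queries against its own fragment before the results are merged. The sketch below checks the product and splits the 15M records evenly over the 2,016 Cray cores. The counts come from the slide. The `partition` helper and its even contiguous split are hypothetical simplifications. The product is an upper bound on read-record pairs. BLAST's seed-and-extend heuristic avoids fully aligning every pair.

```python
def partition(n_records: int, n_workers: int) -> list[range]:
    """Split record indices 0..n_records-1 into n_workers near-equal contiguous fragments."""
    base, extra = divmod(n_records, n_workers)
    fragments = []
    start = 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)  # first `extra` workers take one more record
        fragments.append(range(start, start + size))
        start += size
    return fragments


reads = 50_000_000        # DNA fragments from one Ion Proton run (slide figure)
db_records = 15_000_000   # NCBI nucleotide records (slide figure)
cores = 2_016             # CSU Cray XT6m CPU cores (slide figure)

pairs = reads * db_records
print(f"{pairs:.1e} read-record pairs")  # 7.5e+14 read-record pairs
fragments = partition(db_records, cores)
largest = max(len(f) for f in fragments)
print(largest, "records in the largest fragment")  # 7441 records in the largest fragment
```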
Metagenomics • Dr. Toni Piaggio, National Wildlife Research Center, Fort Collins • Florida Everglades water samples (4) • “What species are in the water?” • CSU NextGen Sequencing Core: Ion Proton; 2 weeks • CSU Cray: 1,000 cores, 24 hours, 4 runs; 1 week • Results
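A rough cost for the Cray step, assuming the 24 hours is per run and the 4 runs were executed one after another (both assumptions, since the slide does not say). Under those assumptions the search needs 4 days of wall-clock time, which fits inside the one-week turnaround, and about 96,000 core-hours. That is roughly 11 years on a single core.

```python
cores = 1_000
hours_per_run = 24   # assumption: the slide's "24 hours" is per run
runs = 4             # assumption: runs executed sequentially

core_hours = cores * hours_per_run * runs
wall_clock_hours = hours_per_run * runs
single_core_years = core_hours / (24 * 365)

print(core_hours, wall_clock_hours, round(single_core_years, 1))  # 96000 96 11.0
```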
Metagenomics • Rarefaction curves • Estimate species richness • Asymptotic? • Find rare species
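A rarefaction curve plots the expected number of species seen in a random subsample of n reads against n. If the curve flattens toward an asymptote, further sequencing is unlikely to reveal many new species. If it is still rising, rare species remain undetected. The sketch below uses the classic analytical (Hurlbert) rarefaction formula, E[S_n] = Σ_i [1 − C(N − N_i, n) / C(N, n)], where N is the total read count and N_i the reads assigned to species i. The function names and the per-species counts are hypothetical. For millions of reads, the exact binomial coefficients become slow, and a log-gamma form or random subsampling is the practical choice.

```python
from math import comb


def rarefy(counts: list[int], n: int) -> float:
    """Expected number of species in a random subsample of n reads (Hurlbert rarefaction)."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total number of reads")
    denom = comb(total, n)
    # comb(total - c, n) counts subsamples that miss species i entirely;
    # math.comb returns 0 when n > total - c, so that species is always seen
    return sum(1 - comb(total - c, n) / denom for c in counts if c > 0)


def rarefaction_curve(counts: list[int], step: int) -> list[tuple[int, float]]:
    """(subsample size, expected species) points from 0 up to the full sample."""
    total = sum(counts)
    return [(n, rarefy(counts, n)) for n in range(0, total + 1, step)]


# Hypothetical read counts per species: two common, two singletons
for n, species in rarefaction_curve([5, 3, 1, 1], step=2):
    print(n, round(species, 2))
```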
Computational Resources Strong scaling • Oak Ridge Titan Cray XK7 Supercomputer • 300K CPU cores; 50M GPU cores • mpiBLAST • NCBI nucleotide DB • Query 100% of sample DNA • CSU Cray XT6m Supercomputer • 2,016 CPU cores • mpiBLAST • NCBI nucleotide DB • Query 1% of sample DNA
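Strong scaling keeps the problem size fixed and adds processors. The ideal speedup on p cores is p, but Amdahl's law caps it according to the fraction of work that cannot run in parallel. The sketch below uses the two CPU core counts from the slide. The 99.9% parallel fraction is a hypothetical value for illustration, not a measurement of mpiBLAST.

```python
def amdahl_speedup(p: int, parallel_fraction: float) -> float:
    """Strong-scaling speedup on p processors for a fixed problem size (Amdahl's law)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / p)


def efficiency(p: int, parallel_fraction: float) -> float:
    """Fraction of ideal linear speedup achieved: S(p) / p."""
    return amdahl_speedup(p, parallel_fraction) / p


PARALLEL_FRACTION = 0.999  # hypothetical: 99.9% of the search parallelizes

for cores in (2_016, 300_000):  # CSU Cray XT6m and Titan CPU core counts from the slide
    s = amdahl_speedup(cores, PARALLEL_FRACTION)
    e = efficiency(cores, PARALLEL_FRACTION)
    print(f"{cores:>7} cores: speedup {s:,.0f}, efficiency {e:.1%}")
```

Even a 0.1% serial fraction caps the speedup near 1,000 regardless of core count. Hundreds of thousands of cores therefore only pay off when almost all of the BLAST search runs in parallel.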
Summary • Big Data Issues • Semiconductor sequencer data • Large-scale database queries • High-performance computing