Sequencing Technologies
C. elegans: a case for INDELs
Speed:
• 100 million Illumina reads
• Alignment time: 93 min (17,800 reads/s)
• Assembly time: 100 min
Indels:
• INDEL validation rate: 89.3% (216)
• SNP validation rate: 97.8% (229)
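As a quick sanity check on the throughput figure (my own back-of-the-envelope arithmetic, not from the slides), 100 million reads aligned in 93 minutes works out to roughly the quoted rate:

```python
# Back-of-the-envelope check of the quoted alignment throughput.
reads = 100_000_000            # 100 million Illumina reads
alignment_minutes = 93

reads_per_second = reads / (alignment_minutes * 60)
print(f"{reads_per_second:,.0f} reads/s")  # ~17,921 reads/s, consistent with the quoted ~17,800
```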
P. stipitis: Co-assembly
• Capillary
• 454 FLX
• 454 GS20
• Illumina
Scaling Up
• M. musculus
• H. sapiens
• D. melanogaster
• C. elegans
• P. stipitis
• H. sapiens ENCODE region
• H. sapiens CAPON region
• M. musculus mtDNA
Performance: Aligner
Using the P. stipitis (15.4 Mbp) 454 FLX data set: 932,565 reads basecalled by PyroBayes†.
† Quinlan et al. Pyrobayes: an improved base caller for SNP discovery in pyrosequences. Nature Methods (2008)
Accuracy: Synthetic Data Sets
• 1 million simulated reads
• Variant rates: 1 per 1.3 kb and 1 per 7.2 kb
• Source: H. sapiens X chromosome
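For readers unfamiliar with this kind of benchmark, here is a toy sketch of how planted-variant test sets are commonly built (my own illustration; the authors' simulation pipeline is not described on the slide): mutate a reference at a fixed rate, then check whether the aligner recovers the planted variants.

```python
import random

def plant_snps(reference: str, rate_bp: int, seed: int = 0):
    """Introduce, on average, one SNP per `rate_bp` bases; return the
    mutated sequence and the positions of the planted variants."""
    rng = random.Random(seed)
    bases, planted = list(reference), []
    for pos, base in enumerate(bases):
        if rng.random() < 1 / rate_bp:
            bases[pos] = rng.choice([b for b in "ACGT" if b != base])
            planted.append(pos)
    return "".join(bases), planted

# ~1 SNP per 1.3 kb, the denser of the two rates quoted on the slide.
mutated, snps = plant_snps("ACGT" * 5_000, rate_bp=1_300)
print(len(snps), "SNPs planted in", len(mutated), "bp")
```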
Reasons to use MOSAIK?
“One tool, many technologies, many applications”
• Fast
• Accurate
• Multiprocessor (OpenMP)
• Co-assemblies
• Gapped alignments
• Widely used
(Near) Future Development
• All technologies
  • Pacific Biosciences
  • Helicos
• All application areas
• Adapter trimming
• Coverage graphs
• Optimization
• Improved paired-end read support
• File format standardization (SAF & SRF)
1000 Genomes Project
• Many samples with light coverage (1000 dg)
  • 100 samples from 10 populations at 2x coverage
  • Find 90% of the 1% frequency variants per population
• Trios with moderate coverage (990 dg)
  • 30 trios at 11x coverage
• If you’re looking for SNPs, are your tools and methods robust?
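The trio numbers check out if "dg" is read as diploid-genome equivalents of sequence coverage (my interpretation of the abbreviation):

```python
# 30 trios x 3 individuals x 11x coverage per individual.
trios, members, depth = 30, 3, 11
print(trios * members * depth, "genome-equivalents")  # 990, matching "(990 dg)"
```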
Scaling Up: Disk Footprint
• Current situation: files created by MOSAIK are not optimized for speed or size
  • Assembly can take a long time (slow disk speed)
• Hypothetical solution
  • Optimize the file formats
  • Ditch the built-in index
  • Keep data sorted by aligned location (see the sketch after this list)
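A minimal sketch of the "sorted by aligned location" idea (the record layout and file name here are hypothetical, not MOSAIK's actual format): sorting by (reference, position) lets consumers stream the file sequentially instead of consulting a separate index.

```python
import struct
from operator import itemgetter

# Hypothetical alignment records: (reference_index, position, read_name).
alignments = [
    (0, 1_500_200, "read_42"),
    (0,       317, "read_07"),
    (1,    88_412, "read_19"),
]

# Coordinate sort: downstream tools can then scan the file in one pass.
alignments.sort(key=itemgetter(0, 1))

with open("alignments.bin", "wb") as out:
    for ref, pos, name in alignments:
        encoded = name.encode()
        # Fixed-width header (ref index, position, name length), then the name.
        out.write(struct.pack("<IIB", ref, pos, len(encoded)) + encoded)
```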
Scaling Up: Memory Footprint
• Current situation: the entire human genome is stored with all associated hash locations (toy illustration below)
  • Optimized hash table ≈ 55 GB RAM
• File-based hash table (BerkeleyDB)
  • User selects how much RAM to use
  • Dreadfully slow performance
  • Large disk footprint ≈ 65 GB file
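To see why the in-memory table is so large, here is a toy k-mer index (an illustration only; MOSAIK's real hashing scheme is not detailed on these slides). Every genome position contributes an entry, so the ~3.1 billion positions of the human genome push even an optimized table into the tens of gigabytes.

```python
from collections import defaultdict

def build_hash_table(reference: str, hash_size: int) -> dict:
    """Map each k-mer of length `hash_size` to every position where it occurs."""
    table = defaultdict(list)
    for pos in range(len(reference) - hash_size + 1):
        table[reference[pos:pos + hash_size]].append(pos)
    return table

demo = build_hash_table("ACGTACGTGGTA", hash_size=4)
print(demo["ACGT"])  # [0, 4] -- one list entry per genome position
```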
Scaling Up: Speed & Sensitivity
• Current situation: as the hash size increases, speed increases but sensitivity decreases
• Hypothetical solution: use small hash sizes and require a clustering of seed hits spanning a predefined length (sketched below)
• Status: implemented but not tested
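A sketch of what such a clustering filter could look like (hypothetical; the slide only states the idea): small seeds keep sensitivity high, and full gapped alignment is attempted only where consecutive seed hits span at least a predefined length.

```python
def passes_cluster_filter(seed_hits, hash_size, min_cluster_span):
    """seed_hits: sorted reference positions of a read's seed matches.
    Return True if a run of nearby hits covers >= min_cluster_span bases."""
    if not seed_hits:
        return False
    run_start = prev = seed_hits[0]
    for pos in seed_hits[1:]:
        if pos - prev > hash_size:              # a gap breaks the cluster
            run_start = pos
        prev = pos
        if prev + hash_size - run_start >= min_cluster_span:
            return True
    return prev + hash_size - run_start >= min_cluster_span

# Isolated random hits fail the span test; genuine matches pass.
print(passes_cluster_filter([100, 104, 108, 112], hash_size=8, min_cluster_span=20))  # True
print(passes_cluster_filter([100, 900], hash_size=8, min_cluster_span=20))            # False
```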
BORK! BORK! BORK! (translated: when will MOSAIK get published?)
Acknowledgements
Boston College: Gabor Marth, Derek Barnett, Michele Busby, Weichun Huang, Aaron Quinlan, Chip Stewart, Thomas Seyfried, Mike Kiebish
Washington University School of Medicine: Elaine Mardis, Jarret Glasscock, Vincent Magrini
Agencourt: Douglas Smith, Wei Tao